Creating Agents¶

Guide for developing custom AI agents for BioML-bench.

Agent Requirements¶

LLM agents must:

Run autonomously in Docker containers without human intervention
Read and comprehend biomedical task descriptions from /home/data/description.md
Analyze data and design appropriate ML approaches
Implement and execute complete ML pipelines (preprocessing, training, evaluation)
Generate predictions and write them to /home/submission/submission.<ext> (e.g., submission.csv, submission.h5ad)
Handle domain-specific data (SMILES for molecules, medical images, protein sequences, etc.)

Unlike traditional ML models, agents must solve tasks from scratch using only the task description and training data.

Agent Structure¶

agents/my-agent/
├── Dockerfile           # Container build instructions
├── config.yaml         # Agent configuration
├── start.sh            # Execution entry point
├── requirements.txt    # Python dependencies
└── src/                # Agent source code (if needed)
    └── agent.py

Configuration (`config.yaml`)¶

See example agents/aide/config.yaml.

my-agent:
  start: my-agent/start.sh
  dockerfile: my-agent/Dockerfile
  env_vars:
    OPENAI_API_KEY: "${OPENAI_API_KEY}"
  privileged: false  # Set true only if necessary

Docker Setup (`Dockerfile`)¶

See example agents/aide/Dockerfile.

FROM biomlbench-env

# Copy agent code
COPY src/ /home/agent/src/
COPY requirements.txt /home/agent/

# Install dependencies
WORKDIR /home/agent
RUN /opt/conda/bin/conda run -n agent pip install -r requirements.txt

# Set working directory
WORKDIR /home

Agent Interface¶

Agents are LLM-based systems that autonomously solve biomedical ML tasks end-to-end. They must:

Understand the task by reading the description and analyzing the data
Design an ML approach appropriate for the biomedical problem
Implement, train, and evaluate models
Generate predictions for submission

Example Agent Structure¶

# src/agent.py
import pandas as pd
from pathlib import Path

def main():
    # 1. Read and understand the task
    with open('/home/data/description.md', 'r') as f:
        task_description = f.read()
        # Agent parses task type, evaluation metric, clinical context

    # 2. Analyze the data
    train_df = pd.read_csv('/home/data/train.csv')
    test_df = pd.read_csv('/home/data/test_features.csv')
    sample_submission = pd.read_csv('/home/data/sample_submission.csv')

    # 3. ... Agent Logic ...

    # 4. Create submission in required format
    submission = pd.DataFrame({
        'id': test_df['id'],
        'prediction': predictions  # Column name from sample_submission
    })

    submission.to_csv('/home/submission/submission.csv', index=False)

if __name__ == "__main__":
    main()

Agent Input Data¶

All agent input data is mounted at /home/data/ in the container:

description.md: Complete task description including:
Biomedical background and clinical relevance
Task type (regression, classification, segmentation) and evaluation metric
Dataset information (sample counts, features, targets)
Data format specifications and file descriptions
Baseline approaches and molecular representation strategies
Scientific references and original benchmark links
train/ or train.<ext>: Training data with features and targets (e.g., train.csv, train.h5ad)
test_features/ or test_features.<ext>: Test features with unique IDs (no target labels) (e.g., test_features.csv, test_features.h5ad)
sample_submission/ or sample_submission.<ext>: Expected submission format with dummy predictions (e.g., sample_submission.csv, sample_submission.h5ad)
human_baselines.csv: Human performance baselines (when available)

Building and Testing¶

# Build agent
./scripts/build_agent.sh my-agent # e.g., ./scripts/build_agent.sh aide

# Test agent
biomlbench run-agent --agent my-agent --task-id caco2-wang # e.g., biomlbench run-agent --agent aide --task-id caco2-wang