
BioML-bench Documentation

Adding Tasks

Guide for adding new biomedical benchmark tasks to BioML-bench.

Task Requirements

Every task needs:

- Pre-split data for training and testing
- Clear evaluation metrics
- A description of the task

Note: If you are adding benchmarks from a new database, you'll also need to add a new data source module in biomlbench/data_sources/. See biomlbench/data_sources/kaggle.py and biomlbench/data_sources/polaris.py for examples.

Implementation Steps

1. Create the task directory structure
2. Configure task metadata
3. Implement data preparation
4. Define evaluation logic
5. Write the task description
6. Test and validate

Task Directory Structure

```text
tasks/
└── my-source/                       # Data source folder (e.g., polarishub, manual)
    └── my-biomedical-task/          # Task directory
        ├── config.yaml              # Task configuration
        ├── description.md           # Task description
        ├── prepare.py               # Data preparation script
        ├── leaderboard.csv          # Leaderboard with human performance baselines (optional but recommended)
        └── grade.py                 # Evaluation logic
```

Running `biomlbench prepare` writes the prepared data to a cache directory:

```text
~/.cache/biomlbench/data/my-source/my-biomedical-task/prepared/   # Generated by: biomlbench prepare -t <task_id>
├── dataset-0/                       # This folder must be created by prepare()
│   ├── public/                      # Public data
│   │   ├── train.<ext>              # Training data (e.g., train.csv, train.h5ad)
│   │   ├── test_features.<ext>      # Test features
│   │   └── sample_submission.<ext>  # Example submission
│   └── private/                     # Private data
│       └── answers.<ext>            # Test set answers
└── dataset-1/                       # Multiple datasets allowed per task (e.g., for K-fold cross-validation)
    ├── public/                      # Same directory structure as dataset-0 above
    └── ...
```
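If you want to check a task folder before running the CLI, the following is a minimal sketch (not part of BioML-bench) that tests the layout above. The helper names `missing_task_files` and `prepared_datasets` are hypothetical; the sketch assumes `leaderboard.csv` is optional and that prepared datasets follow the `dataset-N` naming shown above.

```python
from pathlib import Path

# File names taken from the task layout above; leaderboard.csv is treated as optional
REQUIRED_TASK_FILES = ("config.yaml", "description.md", "prepare.py", "grade.py")


def missing_task_files(task_dir: Path) -> list[str]:
    """Return the required task files that are absent from task_dir."""
    return [name for name in REQUIRED_TASK_FILES if not (task_dir / name).is_file()]


def prepared_datasets(prepared_dir: Path) -> list[Path]:
    """Return complete dataset-N folders (with public/ and private/), sorted by N."""
    found = []
    for path in prepared_dir.glob("dataset-*"):
        suffix = path.name[len("dataset-"):]
        # Skip folders without a numeric suffix or missing either split
        if suffix.isdigit() and (path / "public").is_dir() and (path / "private").is_dir():
            found.append((int(suffix), path))
    # Sort numerically so dataset-10 comes after dataset-2
    return [path for _, path in sorted(found)]
```

For example, `missing_task_files(Path("tasks/my-source/my-biomedical-task"))` returns an empty list once all four required files exist.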

Task Configuration (`config.yaml`)

```yaml
id: my-source/my-biomedical-task
name: "My Biomedical Task"
task_type: drug_discovery  # or medical_imaging, protein_engineering
domain: pharmacokinetics   # specific biomedical domain
difficulty: medium         # easy, medium, hard

data_source:
  type: kaggle            # or polaris, custom
  competition_id: my-task

dataset:
  answers: my-source/my-biomedical-task/prepared/private/answers.csv
  sample_submission: my-source/my-biomedical-task/prepared/public/sample_submission.csv

grader:
  name: rmse
  grade_fn: biomlbench.tasks.my-source.my-biomedical-task.grade:grade

preparer: biomlbench.tasks.my-source.my-biomedical-task.prepare:prepare

biomedical_metadata:
  modality: "molecular_properties"
  organ_system: "liver"
  data_type: "regression"
  clinical_relevance: "drug_metabolism"
```
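The `grade_fn` and `preparer` values are `module:function` strings. As a quick sanity check on a new config, a hypothetical helper like the one below resolves such a string with `importlib`. This is a sketch of the general pattern only; BioML-bench's own loader may behave differently.

```python
import importlib
from typing import Any, Callable


def resolve_import_string(spec: str) -> Callable[..., Any]:
    """Resolve a "package.module:function" string to the callable it names."""
    module_name, sep, attr_name = spec.partition(":")
    # Both halves must be present, e.g. "pkg.mod:fn"
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)  # raises ImportError if the module path is wrong
    fn = getattr(module, attr_name)                # raises AttributeError if the function is missing
    if not callable(fn):
        raise TypeError(f"{spec!r} does not name a callable")
    return fn
```

With BioML-bench installed, passing the `grade_fn` value from your config should return your `grade` function; an `ImportError` or `AttributeError` usually points to a typo in `config.yaml`.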

Data Preparation (`prepare.py`)

See the example in biomlbench/tasks/polarishub/tdcommons-caco2-wang/prepare.py.

```python
from pathlib import Path

import pandas as pd


def prepare(raw: Path, public: Path, private: Path) -> None:
    """
    Prepares the task data into public/private directories.

    Args:
        raw: Directory with the raw data
        public: Directory for public data (training examples and inputs for test examples)
        private: Directory for private data (answers for test examples)
    """
    # Load and process the raw data in `raw`
    # Write train.<ext>, test_features.<ext>, and sample_submission.<ext> to `public`
    # Write answers.<ext> to `private`
```
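For illustration, here is one possible `prepare` body for a tabular regression task. It is a sketch under assumptions this guide does not state: the raw directory is taken to hold an upstream train/test split as `train.csv` and `test.csv`, each with an `id` column, feature columns, and a `label` column. Adapt the file names, formats, and columns to your dataset.

```python
from pathlib import Path

import pandas as pd


def prepare(raw: Path, public: Path, private: Path) -> None:
    """Example preparer for a pre-split tabular task (hypothetical file layout)."""
    # Assumed raw inputs: an upstream train/test split with "id" and "label" columns
    train = pd.read_csv(raw / "train.csv")
    test = pd.read_csv(raw / "test.csv")

    public.mkdir(parents=True, exist_ok=True)
    private.mkdir(parents=True, exist_ok=True)

    # Public: labelled training data plus unlabelled test features
    train.to_csv(public / "train.csv", index=False)
    test.drop(columns=["label"]).to_csv(public / "test_features.csv", index=False)

    # Sample submission: every test id with a placeholder prediction
    sample = pd.DataFrame({"id": test["id"], "label": 0.0})
    sample.to_csv(public / "sample_submission.csv", index=False)

    # Private: ground-truth labels, read only by grade.py
    test[["id", "label"]].to_csv(private / "answers.csv", index=False)
```

Keeping the labels out of `test_features.csv` and writing them only to `private/` is what prevents agents from seeing the answers.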

Evaluation Logic (`grade.py`)

See the example in biomlbench/tasks/polarishub/tdcommons-caco2-wang/grade.py.

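```python
import numpy as np
import pandas as pd


def grade(submission: pd.DataFrame, answers: pd.DataFrame) -> float:
    """Calculate the task-specific metric (RMSE in this example)."""
    # Rows are compared by position, so the submission must list
    # test examples in the same order as the answers
    y_true = answers["label"].to_numpy(dtype=float)
    y_pred = submission["label"].to_numpy(dtype=float)

    # Implement the domain-specific metric; RMSE shown here
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```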
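The `grade` example above compares rows by position, so it returns a misleading score if a submission lists test examples in a different order. If your data has an identifier column, one way to guard against this is to align on it first. The sketch below assumes a hypothetical `id` column alongside `label` and raises `ValueError` for malformed submissions; BioML-bench may expect a different way of reporting invalid submissions.

```python
import numpy as np
import pandas as pd


def grade(submission: pd.DataFrame, answers: pd.DataFrame) -> float:
    """RMSE after aligning submission rows to answers on a hypothetical "id" column."""
    # Each test id must be predicted exactly once
    if submission["id"].duplicated().any():
        raise ValueError("Submission contains duplicate ids")
    missing = set(answers["id"]) - set(submission["id"])
    if missing:
        raise ValueError(f"Submission is missing {len(missing)} ids")

    # Left-join on id so the submission's row order does not matter
    merged = answers.merge(
        submission[["id", "label"]], on="id", how="left", suffixes=("_true", "_pred")
    )
    y_true = merged["label_true"].to_numpy(dtype=float)
    y_pred = merged["label_pred"].to_numpy(dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A shuffled but otherwise perfect submission now scores 0.0 instead of a large error.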

Testing New Tasks

```bash
# Test preparation
biomlbench prepare -t my-source/my-biomedical-task

# Test with dummy agent
biomlbench run-agent --agent dummy --task-id my-source/my-biomedical-task

# Validate submission
biomlbench grade --submission /path/to/submission.jsonl --output-dir /path/to/output/dir
```
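Before running an agent, it can help to grade the sample submission directly in Python, which exercises `grade.py` on the prepared files without the full pipeline. This sketch assumes CSV files named as in the prepared directory layout (`public/sample_submission.csv` and `private/answers.csv` inside a `dataset-N` folder) and takes the grading function as an argument; `smoke_test_grade` is a hypothetical helper, not a BioML-bench command.

```python
import math
from pathlib import Path
from typing import Callable

import pandas as pd

GradeFn = Callable[[pd.DataFrame, pd.DataFrame], float]


def smoke_test_grade(grade: GradeFn, dataset_dir: Path) -> float:
    """Grade one prepared dataset's sample submission against its private answers."""
    # Assumed CSV layout, matching the prepared directory structure
    submission = pd.read_csv(dataset_dir / "public" / "sample_submission.csv")
    answers = pd.read_csv(dataset_dir / "private" / "answers.csv")

    score = float(grade(submission, answers))
    # A NaN or infinite score often points to a column or dtype mismatch
    if not math.isfinite(score):
        raise ValueError(f"Grader returned a non-finite score: {score}")
    return score
```

For example, after importing `grade` from your task's `grade.py`, call `smoke_test_grade(grade, Path.home() / ".cache/biomlbench/data/my-source/my-biomedical-task/prepared/dataset-0")`.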