Registry API¶

The registry discovers, loads, and organizes biomedical benchmark tasks.

Task `dataclass` ¶

Task(
    id,
    name,
    description,
    grader,
    answers,
    gold_submission,
    sample_submission,
    prepare_fn,
    raw_dir,
    private_dir,
    public_dir,
    checksums,
    leaderboard,
    biomedical_metadata=None,
    human_baselines=None,
    compute_requirements=None,
    data_source=None,
    task_type=None,
    domain=None,
    difficulty=None,
    time_limit=None,
)

Represents a biomedical ML task in the BioML-bench framework.

Extended from the MLE-bench `Competition` class with biomedical-specific metadata.
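To illustrate the shape of the signature above, here is a minimal sketch of a dataclass with required core fields followed by optional metadata that defaults to `None`. `TaskSketch` and `optional_field_names` are hypothetical names used only for this illustration. The field types are assumptions, and only a subset of the real fields is shown.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass
class TaskSketch:
    """Hypothetical, trimmed-down stand-in for Task (not the real class)."""

    # Required fields come first, mirroring the start of the Task signature.
    id: str
    name: str
    description: str
    raw_dir: Path
    # Optional metadata defaults to None (types here are assumed).
    task_type: Optional[str] = None
    domain: Optional[str] = None
    difficulty: Optional[str] = None
    time_limit: Optional[int] = None


def optional_field_names(cls) -> list[str]:
    """Return the names of dataclass fields whose default is None."""
    return [f.name for f in fields(cls) if f.default is None]


task = TaskSketch(
    id="caco2-wang",
    name="Caco-2 Cell Permeability Prediction",
    description="illustrative description",
    raw_dir=Path("raw"),
    domain="admet",
)
```

Because the optional fields default to `None`, code that reads them (for example `task.difficulty`) should be prepared to handle a missing value.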

Registry ¶

Registry(data_dir=DEFAULT_DATA_DIR)

get_task ¶

get_task(task_id)

Fetch a task from the registry by its ID, given in `folder/task` format.

get_tasks_dir ¶

get_tasks_dir()

Retrieves the tasks directory within the registry.

get_splits_dir ¶

get_splits_dir()

Retrieves the splits directory within the repository.

get_tasks_by_domain ¶

get_tasks_by_domain(domain)

List all task IDs for a specific biomedical domain.

get_tasks_by_type ¶

get_tasks_by_type(task_type)

List all task IDs for a specific task type.

get_lite_task_ids ¶

get_lite_task_ids()

List all task IDs for the lite version (low-difficulty tasks).

get_tasks_by_difficulty ¶

get_tasks_by_difficulty(difficulty)

List all task IDs for a specific difficulty level.
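The domain, type, and difficulty lookups above all amount to filtering task IDs by one metadata field. The sketch below shows that idea on a plain dictionary. It is not the registry's implementation: `TASK_METADATA` and `filter_task_ids` are hypothetical, the two `example-*` entries are made up, and the `easy` and `hard` difficulty labels are assumed.

```python
from typing import Mapping, Optional

# Hypothetical metadata table. Only the caco2-wang values come from the
# config example on this page; the other entries are invented for the demo.
TASK_METADATA = {
    "caco2-wang": {
        "task_type": "drug_discovery", "domain": "admet", "difficulty": "medium",
    },
    "example-imaging-task": {
        "task_type": "medical_imaging", "domain": "oncology", "difficulty": "hard",
    },
    "example-admet-task": {
        "task_type": "drug_discovery", "domain": "admet", "difficulty": "easy",
    },
}


def filter_task_ids(
    metadata: Mapping[str, Mapping[str, str]],
    *,
    domain: Optional[str] = None,
    task_type: Optional[str] = None,
    difficulty: Optional[str] = None,
) -> list[str]:
    """Return sorted task IDs matching every criterion that is not None."""
    wanted = {"domain": domain, "task_type": task_type, "difficulty": difficulty}
    return sorted(
        task_id
        for task_id, meta in metadata.items()
        if all(value is None or meta.get(key) == value for key, value in wanted.items())
    )


admet_ids = filter_task_ids(TASK_METADATA, domain="admet")
medium_drug_ids = filter_task_ids(
    TASK_METADATA, task_type="drug_discovery", difficulty="medium"
)
```

Passing several criteria at once narrows the result, which is equivalent to intersecting the lists returned by the single-field methods.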

get_data_dir ¶

get_data_dir()

Retrieves the data directory within the registry.

set_data_dir ¶

set_data_dir(new_data_dir)

Sets the data directory within the registry.

list_task_ids ¶

list_task_ids()

List all task IDs available in the registry, sorted alphabetically.
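One plausible way to produce such a list is to scan the tasks directory for `config.yaml` files and sort the resulting IDs. The sketch below assumes that each task lives in its own directory containing a `config.yaml` and that the ID is the directory path relative to the tasks root. Both are assumptions about the layout, and `discover_task_ids` is a hypothetical helper, not part of the registry API.

```python
import tempfile
from pathlib import Path


def discover_task_ids(tasks_dir: Path) -> list[str]:
    """Return sorted IDs of the sub-directories that contain a config.yaml."""
    return sorted(
        config.parent.relative_to(tasks_dir).as_posix()
        for config in tasks_dir.rglob("config.yaml")
    )


# Demo on a throwaway directory tree; folder and task names are made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for task_id in ("folder-b/task-2", "folder-a/task-1", "folder-a/task-3"):
        task_dir = root / task_id
        task_dir.mkdir(parents=True)
        (task_dir / "config.yaml").write_text("id: placeholder\n")
    (root / "folder-a" / "not-a-task").mkdir()  # no config.yaml, so it is skipped
    found = discover_task_ids(root)

print(found)
```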

Usage Examples¶

Basic Task Access¶

from biomlbench.registry import registry

# Get task information
task = registry.get_task("caco2-wang")
print(f"Task: {task.name} ({task.domain})")

# List all available tasks
all_tasks = registry.list_task_ids()
print(f"Available tasks: {len(all_tasks)}")

Task Configuration Format¶

Each task requires a `config.yaml` file:

id: caco2-wang
name: "Caco-2 Cell Permeability Prediction"
task_type: drug_discovery
domain: admet
difficulty: medium

data_source:
  type: polaris
  benchmark_id: tdcommons/caco2-wang

dataset:
  answers: caco2-wang/prepared/private/answers.csv
  sample_submission: caco2-wang/prepared/public/sample_submission.csv

grader:
  name: mean-absolute-error
  grade_fn: biomlbench.tasks.caco2-wang.grade:grade

preparer: biomlbench.tasks.caco2-wang.prepare:prepare

biomedical_metadata:
  modality: "molecular"
  data_type: "smiles_regression"
  clinical_relevance: "drug_absorption"

compute_requirements:
  recommended_gpu_memory_gb: 4
  estimated_runtime_minutes: 15
  max_dataset_size_gb: 1
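The `grade_fn` and `preparer` entries use a `module.path:function` form. As a rough sketch of how such a string could be resolved to a callable, the helpers below split on the colon and import the module with `importlib`. `split_entry_point` and `load_entry_point` are hypothetical names, and the loader the framework actually uses may differ.

```python
import importlib
from typing import Any


def split_entry_point(spec: str) -> tuple[str, str]:
    """Split 'package.module:attribute' into (module path, attribute name)."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    return module_path, attr


def load_entry_point(spec: str) -> Any:
    """Import the module named in spec and return the referenced attribute."""
    module_path, attr = split_entry_point(spec)
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Splitting needs no import, so it works on the config value as written.
parts = split_entry_point("biomlbench.tasks.caco2-wang.grade:grade")

# Loading is demonstrated with a standard-library target instead.
dumps = load_entry_point("json:dumps")
print(dumps({"difficulty": "medium"}))
```

A string-based loader matters here because a module path containing a hyphen, such as `caco2-wang`, cannot be written in an ordinary `import` statement.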