Agents API

Agent execution and management for BioML-bench. This module runs LLM agents on biomedical tasks within the biomlbench framework.

AgentTask `dataclass`

AgentTask(
    run_id,
    seed,
    image,
    path_to_run_group,
    path_to_run,
    agent,
    task,
    container_config,
)

Represents a single agent-task execution.
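To make the field list concrete, here is a minimal stand-in dataclass with the same field names. The class name `AgentTaskSketch`, the type annotations, and the example values are all assumptions for illustration; the source lists only the field names, not their types.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class AgentTaskSketch:
    """Hypothetical stand-in mirroring the documented AgentTask fields."""

    run_id: str                       # assumed: unique ID for this run
    seed: int                         # assumed: integer seed for the run
    image: str                        # assumed: container image name
    path_to_run_group: Path           # assumed: directory shared by the run group
    path_to_run: Path                 # assumed: directory for this single run
    agent: Any                        # type not documented in the source
    task: Any                         # type not documented in the source
    container_config: Dict[str, Any]  # assumed: mapping of container options


# Illustrative values only; none of these names come from the library.
example = AgentTaskSketch(
    run_id="demo-task_run-0",
    seed=0,
    image="demo-agent:latest",
    path_to_run_group=Path("runs/demo-group"),
    path_to_run=Path("runs/demo-group/demo-task_run-0"),
    agent=None,
    task=None,
    container_config={},
)
```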

check_agent_success `async`

check_agent_success(run_dir, run_logger)

Check if an agent run actually succeeded by examining submission files and logs.

This catches cases where agents fail internally but exit with success codes.
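The idea of checking outputs rather than trusting the exit code can be sketched as follows. This is not the library's implementation: the helper name `looks_successful`, the file layout (`submission/submission.csv`, `logs/agent.log`), and the traceback heuristic are assumptions chosen for illustration, and the sketch is synchronous for brevity.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def looks_successful(run_dir: Path) -> Tuple[bool, str]:
    """Hypothetical check: require a non-empty submission and a clean log."""
    # Assumed layout: the real submission path may differ.
    submission = run_dir / "submission" / "submission.csv"
    if not submission.is_file() or submission.stat().st_size == 0:
        return False, "missing or empty submission file"

    # Assumed heuristic: a Python traceback in the log signals an internal failure.
    log = run_dir / "logs" / "agent.log"
    if log.is_file():
        text = log.read_text(errors="replace")
        if "Traceback (most recent call last)" in text:
            return False, "traceback found in agent log"

    return True, "ok"


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    empty_run_result = looks_successful(Path(tmp))
```

A run with exit code 0 but no submission file would be reported as failed by a check of this kind, which is the case the function description above is concerned with.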

worker `async`

worker(
    idx,
    queue,
    client,
    tasks_outputs,
    retain_container=False,
)

Worker function that processes agent tasks from the queue.
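A queue-draining worker of this general shape can be sketched with `asyncio.Queue`. The names `sketch_worker` and `run_all` are hypothetical, the container client is omitted, and `asyncio.sleep(0)` stands in for actually running an agent; the real worker's control flow is not documented here and may differ.

```python
import asyncio


async def sketch_worker(idx, queue, outputs):
    """Hypothetical worker: pull items until the queue is empty."""
    while True:
        try:
            item = queue.get_nowait()
        except asyncio.QueueEmpty:
            return  # nothing left to process
        try:
            await asyncio.sleep(0)  # placeholder for running the agent
            outputs[item] = {"worker": idx, "success": True}
        except Exception as exc:
            # Record the failure instead of letting one task stop the worker.
            outputs[item] = {"worker": idx, "success": False, "error": str(exc)}
        finally:
            queue.task_done()


async def run_all(items, n_workers=2):
    """Fill the queue, then let n_workers coroutines drain it concurrently."""
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    outputs = {}
    await asyncio.gather(
        *(sketch_worker(i, queue, outputs) for i in range(n_workers))
    )
    return outputs


results = asyncio.run(run_all(["task-a", "task-b", "task-c"]))
```

Writing results into a shared dictionary keyed by task mirrors the `tasks_outputs` parameter in the signature above, though how the real function populates it is an assumption.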

create_task_list_file

create_task_list_file(task_id)

Create a temporary task list file for single task execution.
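One way to produce such a file is with the standard `tempfile` module. The helper name `write_single_task_list`, the `.txt` suffix, and the one-ID-per-line format are assumptions; the source does not specify the file format or the return type.

```python
import tempfile
from pathlib import Path


def write_single_task_list(task_id: str) -> Path:
    """Hypothetical helper: write one task ID to a temporary text file."""
    # delete=False keeps the file on disk after the handle closes,
    # so the caller is responsible for removing it later.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False
    ) as handle:
        handle.write(task_id + "\n")
    return Path(handle.name)


task_list_path = write_single_task_list("demo/task")
```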

get_task_ids

get_task_ids(task_id=None, task_list=None)

Get list of task IDs from either single task or task list file.
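The either/or resolution can be sketched as below. The helper name `resolve_task_ids` is hypothetical, and several behaviors are assumptions rather than documented facts: that a single `task_id` takes precedence, that the list file holds one ID per line, that blank lines and `#` comments are skipped, and that passing neither argument raises `ValueError`.

```python
import tempfile
from pathlib import Path


def resolve_task_ids(task_id=None, task_list=None):
    """Hypothetical resolver: return task IDs from one ID or a list file."""
    if task_id is not None:
        return [task_id]
    if task_list is not None:
        lines = Path(task_list).read_text().splitlines()
        # Assumed format: one ID per line, blank lines and '#' comments ignored.
        return [
            line.strip()
            for line in lines
            if line.strip() and not line.lstrip().startswith("#")
        ]
    raise ValueError("provide either task_id or task_list")


# Demonstration with a throwaway list file.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "tasks.txt"
    list_file.write_text("# demo list\ntask-a\n\ntask-b\n")
    ids_from_file = resolve_task_ids(task_list=list_file)

ids_from_single = resolve_task_ids(task_id="task-c")
```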

run_agent_async `async`

run_agent_async(
    agent_id,
    task_ids,
    n_workers=1,
    n_seeds=1,
    container_config_path=None,
    retain_container=False,
    data_dir=None,
    cpu_only=False,
    fast=False,
)

Run an agent on multiple tasks asynchronously.

Returns: Tuple[str, Path]: The run group ID and path to the generated submission file
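The `task_ids` and `n_seeds` parameters suggest that each task is run once per seed. A minimal sketch of that expansion, assuming one job per (task, seed) pair with seeds numbered from 0, is shown below; the helper name `expand_jobs` is hypothetical and the actual scheduling logic is not documented here.

```python
from itertools import product


def expand_jobs(task_ids, n_seeds=1):
    """Hypothetical expansion: one (task_id, seed) job per task and seed."""
    return [
        (task_id, seed)
        for task_id, seed in product(task_ids, range(n_seeds))
    ]


jobs = expand_jobs(["task-a", "task-b"], n_seeds=2)
```

Under this assumption, the jobs would then be placed on the queue that the workers drain, with `n_workers` controlling how many run concurrently.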

run_agent

run_agent(args)

Main entry point for running agents from the CLI.

Args: args: Parsed command line arguments

Returns: str: The run group ID for this execution
