Utilities API

Core utility functions for BioML-bench.

authenticate_kaggle_api

authenticate_kaggle_api()

Authenticates the Kaggle API and returns an authenticated API object, or raises an error if authentication fails.

read_jsonl

read_jsonl(file_path, skip_commented_out_lines=False)

Reads a JSONL file and returns its contents as a list of dictionaries.

Args:
    file_path (str): Path to the JSONL file.
    skip_commented_out_lines (bool): If True, skip commented-out lines.

Returns:
    list[dict]: List of dictionaries parsed from the JSONL file.
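The behaviour described above can be approximated with the standard library. This is a minimal sketch, not the BioML-bench implementation: the name `read_jsonl_sketch`, the `comment_prefix` parameter, the choice of `#` as the comment marker, the skipping of blank lines, and the `task_id` key in the demo data are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_sketch(file_path, skip_commented_out_lines=False, comment_prefix="#"):
    """Parse one JSON object per line (hypothetical stand-in for read_jsonl)."""
    records = []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if not stripped:
                continue  # assumption: blank lines are ignored
            if skip_commented_out_lines and stripped.startswith(comment_prefix):
                continue  # assumption: a leading "#" marks a commented-out line
            records.append(json.loads(stripped))
    return records


# Demo: the middle line is commented out and therefore skipped.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.jsonl"
    path.write_text(
        '{"task_id": "a"}\n# {"task_id": "b"}\n{"task_id": "c"}\n',
        encoding="utf-8",
    )
    rows = read_jsonl_sketch(path, skip_commented_out_lines=True)

print(rows)
```

With `skip_commented_out_lines=False`, the commented line in this sketch would be passed to `json.loads` and raise a `json.JSONDecodeError`.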

get_runs_dir

get_runs_dir()

Returns an absolute path to the directory storing runs.

get_module_dir

get_module_dir()

Returns an absolute path to the BioML-bench module.

get_repo_dir

get_repo_dir()

Returns an absolute path to the repository directory.

generate_run_id

generate_run_id(task_id, agent_id, run_group=None)

Creates a unique run ID for a specific task and agent combination.

create_run_dir

create_run_dir(task_id=None, agent_id=None, run_group=None)

Creates a directory for the run.

is_compressed

is_compressed(fpath)

Checks if the file is compressed.

compress

compress(src, compressed, exist_ok=False)

Compresses the contents of a source directory to a compressed file.

extract

extract(
    compressed,
    dst,
    recursive=False,
    already_extracted=set(),
)

Extracts the contents of a compressed file to a destination directory.
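To illustrate how `is_compressed`, `compress`, and `extract` might fit together, here is a round-trip sketch built on the standard `zipfile` module. It assumes zip is the archive format, which the reference does not state, and it does not model the `recursive` or `already_extracted` parameters. All `*_sketch` names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def is_compressed_sketch(fpath):
    """Assumption: 'compressed' means 'is a zip archive'."""
    return zipfile.is_zipfile(fpath)


def compress_sketch(src, compressed, exist_ok=False):
    """Zip every file under the directory `src` into the archive `compressed`."""
    src, compressed = Path(src), Path(compressed)
    if compressed.exists() and not exist_ok:
        raise FileExistsError(f"{compressed} already exists")
    with zipfile.ZipFile(compressed, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src so extraction recreates the tree.
                zf.write(path, arcname=path.relative_to(src).as_posix())


def extract_sketch(compressed, dst):
    """Unpack the archive into the directory `dst`."""
    with zipfile.ZipFile(compressed) as zf:
        zf.extractall(dst)


# Demo: compress a small directory tree, then extract it elsewhere.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "src"
    (src / "nested").mkdir(parents=True)
    (src / "nested" / "data.txt").write_text("hello", encoding="utf-8")

    archive = tmp / "src.zip"
    compress_sketch(src, archive)
    archive_detected = is_compressed_sketch(archive)
    plain_detected = is_compressed_sketch(src / "nested" / "data.txt")

    extract_sketch(archive, tmp / "out")
    round_trip = (tmp / "out" / "nested" / "data.txt").read_text(encoding="utf-8")

print(archive_detected, plain_detected, round_trip)
```

The archive is written outside `src` here; writing it inside the source directory would cause this sketch to try to add the archive to itself.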

is_empty

is_empty(dir)

Checks if the directory is empty.
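One common way to perform such a check, shown here as a sketch under the assumption that "empty" means "contains no entries at all" (files or subdirectories), is to test whether iterating the directory yields anything. The name `is_empty_sketch` is hypothetical.

```python
import tempfile
from pathlib import Path


def is_empty_sketch(dir_path):
    """Return True if the directory has no entries (hypothetical stand-in)."""
    # any() stops at the first entry, so large directories are not fully listed.
    return not any(Path(dir_path).iterdir())


# Demo: a fresh temporary directory is empty until a file is created in it.
with tempfile.TemporaryDirectory() as tmp:
    empty_before = is_empty_sketch(tmp)
    (Path(tmp) / "file.txt").write_text("x", encoding="utf-8")
    empty_after = is_empty_sketch(tmp)

print(empty_before, empty_after)
```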

load_yaml

load_yaml(fpath)

Loads a YAML file and returns its contents as a dictionary.

in_ci

in_ci()

Checks if the code is running in GitHub CI.
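GitHub Actions sets the environment variable `GITHUB_ACTIONS` to `true` on its runners, so a check of this kind can be sketched as below. Whether BioML-bench inspects this particular variable is an assumption; the `environ` parameter is hypothetical and exists only to make the sketch testable without touching the real environment.

```python
import os


def in_ci_sketch(environ=None):
    """Return True when running under GitHub Actions (hypothetical stand-in)."""
    env = os.environ if environ is None else environ
    # GitHub Actions exports GITHUB_ACTIONS=true for every workflow step.
    return env.get("GITHUB_ACTIONS") == "true"


print(in_ci_sketch({"GITHUB_ACTIONS": "true"}))
print(in_ci_sketch({}))
```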

import_fn

import_fn(fn_import_string)

Imports a function from a module given a string in the format potentially.nested.module_name:fn_name.

Roughly equivalent to from potentially.nested.module_name import fn_name.
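The `module:function` convention can be resolved with `importlib`. The following is a sketch of that idea rather than the library's code; `import_fn_sketch` is a hypothetical name, and the validation of the input string is an added assumption.

```python
import importlib


def import_fn_sketch(fn_import_string):
    """Resolve 'package.module:fn_name' to the function object it names."""
    module_name, sep, fn_name = fn_import_string.partition(":")
    if not sep or not module_name or not fn_name:
        raise ValueError(
            f"Expected 'module:fn_name', got {fn_import_string!r}"
        )
    module = importlib.import_module(module_name)
    return getattr(module, fn_name)


# Demo: same effect as `from os.path import join`.
join = import_fn_sketch("os.path:join")
joined = join("runs", "task")
print(joined)
```

A missing module raises `ModuleNotFoundError` and a missing attribute raises `AttributeError`, mirroring what the equivalent `from ... import ...` statement would report as an `ImportError`.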

get_path_to_callable

get_path_to_callable(callable)

Retrieves the file path of the module where the given callable is defined.

Args:
    callable (Callable): The callable for which the module path is required.

Returns:
    Path: The relative path to the module file from the current working directory.

Raises:
    AssertionError: If the module does not have a file path.
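The lookup described above can be sketched with `inspect.getmodule`. This is an illustration under assumptions: the name `get_path_to_callable_sketch` is hypothetical, and `os.path.relpath` is used so that modules outside the working directory still yield a relative path, which may differ from how the library computes it.

```python
import inspect
import json
import os
from pathlib import Path


def get_path_to_callable_sketch(fn):
    """Return the defining module's file path, relative to the working directory."""
    module = inspect.getmodule(fn)
    file_path = getattr(module, "__file__", None)
    assert file_path is not None, f"No file path found for the module of {fn!r}"
    return Path(os.path.relpath(file_path, start=Path.cwd()))


# Demo: json.dumps is defined in the json package's __init__.py.
module_path = get_path_to_callable_sketch(json.dumps)
print(module_path.name)

# Built-ins such as len live in a module without a file, so the assertion fires.
try:
    get_path_to_callable_sketch(len)
    builtin_raised = False
except AssertionError:
    builtin_raised = True
print(builtin_raised)
```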

get_diff

get_diff(d, other_d, fromfile='d', tofile='other_d')

Finds the differences between two nested dictionaries and returns a diff string.
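One way to produce such a diff string, sketched here with the standard library, is to serialise both dictionaries deterministically and compare the resulting lines with `difflib.unified_diff`. The use of JSON serialisation and the unified diff format are assumptions; `get_diff_sketch` is a hypothetical name, and the sketch only handles JSON-serialisable values.

```python
import difflib
import json


def get_diff_sketch(d, other_d, fromfile="d", tofile="other_d"):
    """Return a unified diff of two nested dicts (hypothetical stand-in)."""
    # sort_keys makes the output independent of insertion order.
    a = json.dumps(d, indent=2, sort_keys=True).splitlines()
    b = json.dumps(other_d, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, fromfile=fromfile, tofile=tofile, lineterm="")
    )


# Demo: only the nested learning rate differs between the two configs.
old = {"model": {"lr": 0.1, "layers": 2}}
new = {"model": {"lr": 0.01, "layers": 2}}
diff = get_diff_sketch(old, new)
print(diff)
```

Identical dictionaries produce an empty string in this sketch, which makes the result convenient to use in a truthiness check.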

read_csv

read_csv(*args, **kwargs)

Reads a CSV file and returns a DataFrame with custom default kwargs.

get_timestamp

get_timestamp()

Returns the current timestamp in the format YYYY-MM-DDTHH-MM-SS-Z.
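A timestamp in that shape can be produced with `strftime`. The sketch below assumes the trailing `Z` stands for the timezone name and that UTC is used; neither is confirmed by this reference, and `get_timestamp_sketch` is a hypothetical name.

```python
from datetime import datetime, timezone


def get_timestamp_sketch():
    """Return e.g. '2024-01-31T12-00-00-UTC' (assumed UTC, hypothetical stand-in)."""
    # Hyphens replace colons so the value is safe to use in file names.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S-%Z")


stamp = get_timestamp_sketch()
print(stamp)
```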

generate_submission_from_metadata

generate_submission_from_metadata(
    metadata_path,
    output_path=None,
    rel_log_path=Path("logs/"),
    rel_code_path=Path("code/"),
)

Generates a submission.jsonl file from agent run metadata.

This function reads the metadata.json file created by run_agent_async() and creates a JSONL file mapping task IDs to their submission file paths, which can be used directly with the biomlbench grade command.

Args:
    metadata_path: Path to the metadata.json file.
    output_path: Path for the output submission.jsonl file (defaults to the same directory as the metadata file).
    rel_log_path: Path to the log file, relative to the run directory.
    rel_code_path: Path to the code file, relative to the run directory.

Returns:
    Path to the generated submission.jsonl file.