Environment Reference¶

The environment/ directory contains the containerized execution environment for BioML-bench agents, including the base Docker image, grading server, and configuration files.

Overview¶

BioML-bench uses Docker containers to provide secure, isolated execution environments for AI agents. The environment includes:

Base Docker image with biomedical libraries
Grading server for submission validation
Container configuration system

Core Components¶

Base Docker Image (`Dockerfile`)¶

The foundational Docker image provides a complete biomedical ML environment:

Base System:

Ubuntu 22.04 LTS
Python 3.11 via Miniconda
Essential system packages and development tools

Libraries:

RDKit - Molecular informatics toolkit
BioPython - Biological computation library
scikit-learn - Machine learning framework
pandas, numpy - Data processing
TensorFlow with CUDA support
PyTorch with GPU acceleration

Grading Server (`grading_server.py`)¶

A Flask-based validation server that runs inside containers to evaluate agent submissions:

Key Features:

HTTP health check endpoint (/health)
Submission validation against private answers
Real-time evaluation feedback
Resource monitoring and limits

Usage in Containers:

# Server starts automatically via entrypoint.sh
# Agents can check health status
curl http://localhost:5000/health

# Server validates submissions internally
# No direct agent interaction required

Container Entrypoint (`entrypoint.sh`)¶

The entry script that configures the container environment and starts services:

Responsibilities:

User environment setup (non-root execution)
Directory permissions configuration
Grading server initialization
Agent execution orchestration

Execution Flow:

Validate container environment
Set up user permissions and directories
Start grading server in background
Wait for server health check
Execute agent start script
Clean up resources on exit

Security Features:

Non-root user execution (nonroot)
Private directory isolation (/private/)
Read-only data mounts
Resource limit enforcement

Task Instructions¶

Standardized instructions provided to agents. Includes info about general task structure, data, validation, and submission details.

instructions.txt - Complete task instructions.

instructions_obfuscated.txt (TODO: NEEDS TO BE CHECKED) - Minimal instructions to prevent overfitting and data leakage.

Validation Script (`validate_submission.sh`)¶

Shell script for basic submission format validation. Agents are instructed to run this script to validate their submissions before finishing.

Container Configuration¶

Default Configuration (`config/container_configs/default.json`)¶

{
    "mem_limit": null,       # No memory limit (use system default)
    "shm_size": "4G",        # Shared memory for large datasets
    "nano_cpus": 4e9         # 4 CPU cores
}

Custom Configuration Options¶

Resource Limits:

{
    "mem_limit": "8g",          # 8GB memory limit
    "shm_size": "4g",           # 4GB shared memory
    "nano_cpus": 8e9,           # 8 CPU cores
    "gpus": -1,                 # All available GPUs
    "runtime": "sysbox-runc"    # Enhanced security runtime
}

Security Settings:

{
    "privileged": false,        # Disable privileged mode
    "user": "nonroot",          # Non-root user execution
    "read_only": false,         # Allow container writes
    "security_opt": [           # Security options
        "no-new-privileges:true"
    ]
}

Environment Variables¶

System Environment¶

Set automatically by the container:

# Directory paths
DATA_DIR="/home/data"
SUBMISSION_DIR="/home/submission"
LOGS_DIR="/home/logs"
CODE_DIR="/home/code"
AGENT_DIR="/home/agent"

Volume Mounting Strategy¶

Data Volume Structure¶

/home/data/                    # Task data (read-only)
├── description.md             # Task description
├── train.<ext>                # Training data (e.g., train.csv, train.h5ad)
├── test_features.<ext>        # Test features (e.g., test_features.csv, test_features.h5ad)
├── sample_submission.<ext>    # Expected format (e.g., sample_submission.csv, sample_submission.h5ad)
└── human_baselines.<ext>      # Human performance (if available) (e.g., human_baselines.csv, human_baselines.h5ad)

/home/submission/              # Agent output (read-write)
└── submission.<ext>           # Agent predictions (e.g., submission.csv, submission.h5ad)

/private/data/task-id/         # Private evaluation data
└── prepared/private/
    └── answers.<ext>          # Ground truth (inaccessible to agents) (e.g., answers.csv, answers.h5ad)

Build Process¶

Size Optimization: - Multi-stage builds - Cleanup of package managers - Removal of development headers

Example Build Command (NEEDS TO BE CHECKED):

docker build \
    --target final \
    --build-arg INSTALL_HEAVY_DEPENDENCIES=true \
    --tag biomlbench-env \
    -f environment/Dockerfile \
    .