HydraXcel

HydraXcel: Accelerate Your Experiments with Hydra and Accelerate


HydraXcel is a configuration-driven deep learning experiment launcher that combines the power of Facebook Hydra for configuration management, HuggingFace Accelerate for effortless multi-GPU/distributed training, and the UV workflow for streamlined execution. Its goal is to make running fast, reproducible ML experiments simple and consistent. With HydraXcel, you can define your experiment setup declaratively (via YAML files or Python dataclasses), automatically log results to MLflow or Weights & Biases, and launch training on one or many GPUs with minimal code changes.

Key Features

  - Declarative configuration with Hydra, defined as YAML files or Python dataclasses.
  - Multi-GPU and distributed training via HuggingFace Accelerate.
  - Automatic experiment logging to MLflow or Weights & Biases.
  - UV-based CLI scripts for launching training runs and servers with minimal code changes.

With this setup, running experiments or servers is as easy as typing:

uv run train
uv run mlflow_server
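These commands rely on matching entry points declared in your project's pyproject.toml. A sketch of what that might look like (the train module path is hypothetical; the mlflow_server target matches the one shown later in this README):

```toml
[project.scripts]
train = "myproject.train:main"
mlflow_server = "hydraxcel.logging:run_mlflow_server"
```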

Installation

HydraXcel can be included in your project as a Git dependency. We suggest using uv to manage it: add hydraxcel to the dependencies in your pyproject.toml, together with the following source entry.

[tool.uv.sources]
hydraxcel = { git = "https://github.com/carelvniekerk/HydraXcel" }
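The dependency itself is declared alongside this source entry. A minimal sketch (version constraint omitted; your [project] table will also contain your project's own metadata):

```toml
[project]
dependencies = ["hydraxcel"]
```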

(Requires Python 3.13+; currently version 0.1.0a).

Usage Guide

  1. Define your configuration – either as a dataclass in Python or as YAML files.
  2. Wrap your main function with HydraXcel’s decorator (hydraxcel_main).
  3. Run your experiment with Python or UV CLI.
  4. (Optional) Use launch for distributed training with Accelerate.
  5. (Optional) Launch MLflow tracking server.

1. Using a Dataclass Config

# train.py
from dataclasses import dataclass, field
from hydraxcel import hydraxcel_main, LoggingPlatform

@dataclass
class ModelConfig:
    type: str = "ResNet50"
    pretrained: bool = True

@dataclass
class TrainConfig:
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 1e-3
    # A mutable default must use a factory, or the dataclass raises ValueError.
    model: ModelConfig = field(default_factory=ModelConfig)

@hydraxcel_main(
    "MyProject",
    config_class=TrainConfig,
    output_dir_keys=["batch_size"],
    logging_platform=LoggingPlatform.WANDB,
)
def main(cfg: TrainConfig):
    print(f"Training for {cfg.epochs} epochs on batch size {cfg.batch_size}...")

Run examples:

uv run train
uv run train learning_rate=0.0005 batch_size=64 model.type="ResNet101"
uv run train -m epochs=5,10,20

(Using the train script defined in the pyproject.toml.)
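Each key=value argument is parsed by Hydra as a dot-path override into the config tree. A rough stdlib-only illustration of that behaviour (a simplified sketch, not Hydra's actual override grammar, which also supports appends, deletes, and sweeps):

```python
def apply_overrides(cfg: dict, overrides: list[str]) -> dict:
    """Apply Hydra-style key.path=value overrides to a nested dict config."""
    for item in overrides:
        path, _, raw = item.partition("=")
        node = cfg
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        # Coerce numeric values where possible, mirroring Hydra's typed parsing.
        try:
            value: object = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw.strip('"')
        node[leaf] = value
    return cfg

cfg = {"epochs": 10, "batch_size": 32, "learning_rate": 1e-3,
       "model": {"type": "ResNet50", "pretrained": True}}
apply_overrides(cfg, ["learning_rate=0.0005", "batch_size=64", 'model.type="ResNet101"'])
```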

2. Using YAML Configs

# conf/train.yaml
epochs: 10
batch_size: 32
learning_rate: 0.001
model:
  type: "ResNet50"
  pretrained: true

# train.py
from hydraxcel import hydraxcel_main

@hydraxcel_main(
    "MyProject",
    hydra_configs_dir="conf",
    output_dir_keys=["model.type"],
    logging_platform="mlflow",
)
def main(cfg):
    print(cfg)

Run:

python train.py
python train.py learning_rate=0.0005 model.type="ResNet101"
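Hydra's -m/--multirun flag (shown in the dataclass example above) sweeps over comma-separated values, launching one run per combination. Conceptually (a simplified stdlib sketch, not Hydra's actual sweeper):

```python
from itertools import product

def expand_sweep(overrides: dict[str, str]) -> list[dict[str, str]]:
    """Expand comma-separated sweep values into one override set per run."""
    keys = list(overrides)
    choices = [overrides[k].split(",") for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]

# e.g. `-m epochs=5,10,20 batch_size=32,64` expands to 3 x 2 = 6 runs.
runs = expand_sweep({"epochs": "5,10,20", "batch_size": "32,64"})
```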

3. Launching with Accelerate

For distributed training, use HydraXcel’s launch:

# scripts/__init__.py
from pathlib import Path
from hydraxcel import launch

SCRIPTS_DIR = Path(__file__).parent

launch_train = launch(
    SCRIPTS_DIR / "train.py",
    config_name="accelerate",
)

Add in pyproject.toml:

[project.scripts]
myproject-train = "myproject.scripts:launch_train"

Run with UV:

uv run myproject-train -- accelerate.num_processes=4

HydraXcel provides Accelerate configs (accelerate.yaml, plus presets for GPU, FP16, etc.) in examples/configs. Dataclasses can also be used to configure Accelerate.
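A dataclass-based Accelerate config might look like the following sketch. The field names here mirror common `accelerate launch` options and are assumptions; the exact schema HydraXcel expects is defined by its bundled accelerate.yaml:

```python
from dataclasses import dataclass

@dataclass
class AccelerateConfig:
    # Hypothetical fields mirroring common `accelerate launch` flags.
    num_processes: int = 1
    mixed_precision: str = "no"  # "no", "fp16", or "bf16"
    gpu_ids: str = "all"

# Overriding on the command line (accelerate.num_processes=4) would
# produce the equivalent of:
cfg = AccelerateConfig(num_processes=4, mixed_precision="fp16")
```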

4. MLflow Tracking Server

Expose the built-in MLflow server runner:

[project.scripts]
mlflow_server = "hydraxcel.logging:run_mlflow_server"

Run:

uv run mlflow_server

The server starts at http://127.0.0.1:5000 by default. Override the host and port with:

uv run mlflow_server host=0.0.0.0 port=8080

License

HydraXcel is released under the Apache License 2.0. This permissive license allows free academic and commercial use with attribution, in line with the Hydra and HuggingFace projects.


HydraXcel is a work in progress (v0.1.0a). This README currently serves as the main documentation. Contributions welcome! 🚀