Tutorial: hello-spike — 1k-frame ego-AHT training demo
This tutorial walks through running a tiny ego-AHT training loop end-to-end on a CPU in roughly five minutes. You will see the env step, the partner act, the trainer collect rollouts, and a checkpoint emit — every moving part of the M4b training stack against a small but real task (see ADR-002 §Decisions).
What you will build
A 1000-frame training run with:
- The 2-agent MPE Cooperative-Push env from chamber.envs.mpe_cooperative_push.
- A scripted-heuristic frozen partner from M4a's draft zoo.
- The real ego-PPO trainer EgoPPOTrainer (M4b-8a), which wraps HARL's HAPPO actor over a hand-rolled MLP critic and rollout buffer. It is selected as the default by run_training when no trainer_factory is passed.
- A JSONL log carrying the per-step provenance bundle (run id, seed, git SHA, pyproject hash) and .pt checkpoints at frames 500 and 1000.
Prerequisites
- A CONCERTO checkout with make install run (this installs the HARL fork via the SHA pin in pyproject.toml from PR M4b-6).
- No GPU required. Total wall-time: roughly five minutes on a modern laptop CPU.
Run it
Save this as hello_spike.py at the root of your CONCERTO checkout (the
script resolves configs/ relative to its own location) and run it. It uses
the existing Hydra config under configs/training/ego_aht_happo/ with
a few CLI-style overrides — total_frames, checkpoint_every, and
happo.rollout_length are scaled down so the demo fits a five-minute
laptop CPU budget; the production defaults from
configs/training/ego_aht_happo/mpe_cooperative_push.yaml are 100k /
10k / 1024 respectively.
from pathlib import Path

from chamber.benchmarks.training_runner import run_training
from concerto.training.config import load_config

repo_root = Path(__file__).resolve().parent
configs_dir = repo_root / "configs"

cfg = load_config(
    config_path=configs_dir
    / "training"
    / "ego_aht_happo"
    / "mpe_cooperative_push.yaml",
    overrides=[
        "total_frames=1000",
        "checkpoint_every=500",
        "happo.rollout_length=250",
        "artifacts_root=./hello_spike_artifacts",
        "log_dir=./hello_spike_logs",
    ],
)

result = run_training(cfg, repo_root=repo_root)
curve = result.curve  # result is a TrainingResult NamedTuple (curve + trainer)
print(f"run_id={curve.run_id}")
print(f"steps={len(curve.per_step_ego_rewards)}")
print(f"episodes={len(curve.per_episode_ego_rewards)}")
print(f"checkpoints={len(curve.checkpoint_paths)}")
You should see output similar to:
run_id=4f1e8a2c0a3d62e5
steps=1000
episodes=20
checkpoints=2
The exact run_id will differ from machine to machine because the
provenance bundle includes the current git SHA + the hash of
pyproject.toml (see RunContext).
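To make the machine-to-machine difference concrete, here is a minimal sketch of a content-derived run id. It is not the actual RunContext implementation: the helper name derive_run_id, the field ordering, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib


def derive_run_id(seed: int, git_sha: str, pyproject_hash: str) -> str:
    """Hypothetical helper: hash the provenance fields into a short hex id."""
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    payload = "|".join([str(seed), git_sha, pyproject_hash]).encode("utf-8")
    # Truncate to 16 hex characters, the length of the example id above.
    return hashlib.sha256(payload).hexdigest()[:16]


# Made-up provenance values, purely for demonstration.
same_a = derive_run_id(0, "abc123", "deadbeef")
same_b = derive_run_id(0, "abc123", "deadbeef")
other = derive_run_id(0, "abc124", "deadbeef")  # only the git SHA differs
print(same_a == same_b, same_a == other)
```

Under this scheme identical inputs always reproduce the same id, while changing any single provenance field yields a different one, which is the behaviour the tutorial describes.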
Inspect the run
The JSONL log is a self-contained record: every line is a valid JSON
object carrying the run-level provenance plus the per-event step,
event name, and any payload fields. Substitute the run_id printed by
the script for <run-id> below.
head -2 hello_spike_logs/<run-id>.jsonl
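If you prefer to inspect the log from Python, the sketch below parses each line with the standard json module. It deliberately prints the keys it finds instead of assuming a schema; read_jsonl and summarize are illustrative helper names, not part of CONCERTO.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(path: Path) -> None:
    """Print the record count and the field names of the first record."""
    records = read_jsonl(path)
    print(f"{len(records)} records")
    if records:
        # Show the keys rather than assuming which fields are present.
        print("first record keys:", sorted(records[0]))
```

Call it as summarize(Path("hello_spike_logs") / "<run-id>.jsonl"), substituting the run_id as before.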
Each checkpoint has a .pt payload plus a .pt.json sidecar with the
SHA-256 integrity digest and the producing run's metadata
(see save_checkpoint / load_checkpoint).
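The integrity check can be approximated by hand. The following is a sketch under stated assumptions, not the load_checkpoint implementation: the sidecar field name "sha256" is hypothetical, so open a real .pt.json file to find the actual key before relying on it.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large payloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_matches(checkpoint: Path, digest_key: str = "sha256") -> bool:
    """Compare the payload digest with the one recorded in <checkpoint>.json.

    digest_key is a hypothetical field name, not confirmed by the project.
    """
    sidecar = checkpoint.with_name(checkpoint.name + ".json")
    recorded = json.loads(sidecar.read_text(encoding="utf-8"))[digest_key]
    return recorded == file_sha256(checkpoint)
```

A False result means the payload no longer matches the digest recorded for it, for example after a truncated copy.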
The local://artifacts/... URI scheme resolves under
<artifacts_root>/artifacts/, so the on-disk path doubles the
artifacts/ segment — that is intentional and grep-able:
ls hello_spike_artifacts/artifacts
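The doubled segment follows directly from the mapping described above. This sketch assumes the resolver simply strips the local:// prefix and joins the remainder onto artifacts_root; resolve_local_uri and the file name example.pt are hypothetical.

```python
from pathlib import Path

LOCAL_SCHEME = "local://"


def resolve_local_uri(uri: str, artifacts_root: Path) -> Path:
    """Hypothetical resolver: map local://<rest> to <artifacts_root>/<rest>."""
    if not uri.startswith(LOCAL_SCHEME):
        raise ValueError(f"not a local:// URI: {uri!r}")
    relative = uri[len(LOCAL_SCHEME):]
    return artifacts_root / relative


path = resolve_local_uri("local://artifacts/example.pt", Path("hello_spike_artifacts"))
print(path.as_posix())  # hello_spike_artifacts/artifacts/example.pt
```

Because the URI already begins with artifacts/, joining it onto a root directory that is itself named for artifacts produces the repeated segment.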
What's next
This 1k-frame demo validates that the trainer plumbing works end-to-end
but is too short to demonstrate learning. The empirical-guarantee
experiment runs the same setup at 100k frames against the production
config in configs/training/ego_aht_happo/mpe_cooperative_push.yaml
and asserts the per-episode reward has a statistically significant
positive learning slope (one-sided test, alpha=0.05). The trip-wire is
ADR-002 risk-mitigation #1 — a loud signal if the on-policy PPO
substrate under a frozen partner fails to learn. The science narrative
+ the resulting plot live in
docs/explanation/why-aht.md; reproduce
it locally with make empirical-guarantee (~10 seconds CPU
wall-time).
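For intuition about what a one-sided slope test checks, here is a standard-library sketch. It uses a permutation test, which is swapped in for illustration and may differ from the test the experiment actually runs; the rewards are synthetic, not results from any training run.

```python
import random


def ols_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against the episode index 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def positive_slope_p_value(ys: list[float], n_permutations: int = 2000, seed: int = 0) -> float:
    """One-sided permutation p-value for the alternative 'slope > 0'."""
    rng = random.Random(seed)
    observed = ols_slope(ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling breaks any link between reward and episode order.
        rng.shuffle(shuffled)
        if ols_slope(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Synthetic rewards: an upward trend plus Gaussian noise.
noise = random.Random(1)
rewards = [0.05 * i + noise.gauss(0.0, 1.0) for i in range(100)]
p = positive_slope_p_value(rewards)
print(f"slope={ols_slope(rewards):.3f} p={p:.4f} significant={p < 0.05}")
```

The test is one-sided because only a positive slope counts as evidence of learning; a flat or declining reward curve produces a large p-value and fails the assertion.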
To opt out of the default and run with the parameter-free
RandomEgoTrainer reference (smoke fixture only —
does not learn) instead, construct it explicitly and pass it as
trainer_factory to run_training.
On a GPU host
The same Hydra config + the same run_training call drive the ADR-001
Stage-0 rig-validated env (panda_wristcam + fetch +
allegro_hand_right) on a Vulkan-capable Linux box. The only
overrides are env.task=stage0_smoke and (under the runtime block,
added in M4b-9b) runtime.device=cuda; the
configs/training/ego_aht_happo/stage0_smoke.yaml config ships those
defaults. On a CPU-only host the env constructor surfaces a
ChamberEnvCompatibilityError from
make_stage0_training_env, so the tutorial above is
the right entry point for Mac / CPU users.
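To illustrate how dotted overrides such as env.task=stage0_smoke address nested config blocks, here is a simplified stand-in. It is not Hydra's parser or load_config: values stay plain strings, missing intermediate keys are created on the fly, and apply_overrides is a hypothetical helper.

```python
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Apply "a.b=value" overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override needs key=value form: {item!r}")
        # Everything before the last dot names the enclosing blocks.
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Placeholder starting values, not the shipped defaults.
cfg = {"env": {"task": "placeholder"}, "runtime": {"device": "cpu"}}
apply_overrides(cfg, ["env.task=stage0_smoke", "runtime.device=cuda"])
print(cfg)
```

Each dotted key walks one level deeper per segment, so runtime.device lands inside the runtime block mentioned above.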
The reproduction recipe (Dockerfile, prerequisites, the zoo-seed run that
publishes the M4a partner checkpoint) lives in
docs/how-to/run-on-gpu.md, wired up in M4b-9b.