API Reference
Generated from docstrings via mkdocstrings.
CONCERTO — the method
concerto.api
Public Protocols for CONCERTO — the method-level contracts.
All concrete implementations (in chamber.* or concerto.safety,
concerto.training) depend on these Protocols, not on each other.
See ADR-INDEX.md for the full decision record; individual modules in this
package cite the specific ADR they implement.
concerto.safety
CONCERTO safety stack — exp CBF-QP + conformal overlay + OSCBF + braking.
Implements the three-layer safety filter decided in ADR-004 (Wang-Ames-Egerstedt 2017 + Huriot & Sibai 2025 + Morton & Pavone 2025), the per-task numeric bounds plumbing from ADR-006 §Decision, and the three-table reporting format from ADR-014 §Decision.
Public surface (see :mod:`concerto.safety.api` for full docstrings):
- :class:`Bounds`: per-task numeric envelope (ADR-006).
- :class:`SafetyState`: mutable conformal-CBF state (ADR-004).
- :class:`FilterInfo`: telemetry payload (ADR-014).
- :class:`EgoOnlySafetyFilter`, :class:`JointSafetyFilter`: typed Protocols for the two filter shapes (ADR-004 §Public API; spike_004A §Three-mode taxonomy). The legacy single :data:`SafetyFilter` alias remains importable but emits a :class:`DeprecationWarning` on first use; removal target 0.3.0.
- :class:`ConcertoSafetyInfeasible`: QP-infeasibility error (ADR-004).
- :class:`EmergencyController`: per-uid embodiment hook for the hard braking fallback (ADR-004 risk-mitigation #1).
- :class:`CartesianAccelEmergencyController`: default Cartesian sum-and-saturate override for double-integrator agents (ADR-004 risk-mitigation #1).
- :data:`DEFAULT_EPSILON`, :data:`DEFAULT_ETA`, :data:`DEFAULT_WARMUP_STEPS`: conformal defaults (ADR-004 §Decision).
- :func:`solve_qp_stub`: small Clarabel QP benchmark used by the M2 saturation guard test (T3.10 replaced the no-op M2 stub with a real solve at a name-preserved entry point).
FilterInfo
module-attribute
FilterInfo = TypedDict(
"FilterInfo",
{
"lambda": LambdaDict,
"constraint_violation": FloatArray,
"prediction_gap_loss": LambdaDict | None,
"fallback_fired": bool,
"qp_solve_ms": float,
},
)
Telemetry payload returned by the filter Protocols (ADR-014 §Decision).
Returned by :meth:`EgoOnlySafetyFilter.filter` and
:meth:`JointSafetyFilter.filter`.
The fields populate the three-table renderer's row data:
- `lambda`: current conformal slack as a :data:`LambdaDict` (per-pair UID-tuple keys → `float`); feeds Table 3 (conservativeness gap, λ mean/var vs. oracle gt/noLearn). Issue #144 promoted this field from a canonical-pair-order ndarray to the dict form so the per-pair indexing is part of the type system; see :func:`lambda_to_jsonable` / :func:`lambda_from_jsonable` for the ADR-014 v3 wire encoding.
- `constraint_violation`: per-pair non-negative violation signal `max(0, -h_ij)` from the CBF backbone. Zero when the constraint is satisfied; positive when the barrier is in the violated half-space. Counted into Table 2's per-condition violation rates. Stays as a :class:`numpy.ndarray` indexed by the canonical pair-iteration order produced by :func:`make_pair_keys` over the filter's UID set; issue #144's structural fix targets `lambda` only.
- `prediction_gap_loss`: per-pair Huriot & Sibai 2025 §IV.A loss `max(0, predicted_h - actual_h)` against a partner-trajectory predictor, as a :data:`LambdaDict` (same keying as `lambda`). This is the signal that drives the conformal update (`update_lambda`); :data:`None` when no predictor is wired (the CBF backbone alone cannot compute it; see `concerto.safety.conformal.update_lambda_from_predictor`).
- `fallback_fired`: feeds Table 2's "fallback fired" column.
- `qp_solve_ms`: feeds the OSCBF 1 kHz target check (ADR-004 §"OSCBF target").
The constraint-violation and prediction-gap loss are reported as
separate cells of the ADR-014 three-table report because they answer
different questions: constraint_violation is a per-step CBF gap
(did the QP keep us in the safe set this step?), and
prediction_gap_loss is the conformal calibration signal (was the
predictor's forecast of the safe set wrong?).
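A minimal sketch of a payload that satisfies this schema, assuming `LambdaDict` is a `dict[tuple[str, str], float]` keyed by UID pairs and `FloatArray` is a float64 NumPy array. Both aliases are redefined locally here as assumptions rather than imported from `concerto.safety.api`, and the pair keys and barrier values are made-up numbers. The sketch also shows why the TypedDict uses the functional form: `lambda` is a Python keyword and cannot appear as a class-body field name.

```python
from typing import Optional, TypedDict

import numpy as np
import numpy.typing as npt

# Local stand-ins for the aliases in concerto.safety.api (assumed shapes).
FloatArray = npt.NDArray[np.float64]
LambdaDict = dict[tuple[str, str], float]

# Functional TypedDict syntax: "lambda" is a keyword, so the class form cannot declare it.
FilterInfo = TypedDict(
    "FilterInfo",
    {
        "lambda": LambdaDict,
        "constraint_violation": FloatArray,
        "prediction_gap_loss": Optional[LambdaDict],
        "fallback_fired": bool,
        "qp_solve_ms": float,
    },
)

# Hypothetical canonical pair order and per-pair barrier values h_ij for one step.
pair_keys = [("a", "b"), ("a", "c")]
h = np.array([0.4, -0.1])  # the second pair is in the violated half-space
predicted_h = {("a", "b"): 0.5, ("a", "c"): 0.0}
actual_h = {("a", "b"): 0.4, ("a", "c"): -0.1}

info: FilterInfo = {
    "lambda": {key: 0.0 for key in pair_keys},
    # Per-step CBF gap: max(0, -h_ij), indexed by the canonical pair order.
    "constraint_violation": np.maximum(0.0, -h),
    # Conformal calibration signal: max(0, predicted_h - actual_h), keyed like "lambda".
    "prediction_gap_loss": {
        key: max(0.0, predicted_h[key] - actual_h[key]) for key in pair_keys
    },
    "fallback_fired": False,
    "qp_solve_ms": 0.8,
}
```

The first pair illustrates why the two cells are reported separately: it sits inside the safe set (violation 0), yet the predictor over-estimated its barrier by about 0.1.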
AgentControlModel
Bases: Protocol
Per-agent action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A).
The CBF row generator builds pairwise constraints in Cartesian / safety space, then projects to per-agent action-space coefficients via this Protocol. The Protocol mediates between each agent's action space (joint torques for a 7-DOF arm, wheel velocities for a 2-DOF base, Cartesian accelerations for a point mass) and the shared safety space (Cartesian acceleration of the agent's safety body).
Implementations MUST be deterministic on identical `(state, action)` inputs (P6; seeding contract) and MUST be consistent with each other: for any state and any Cartesian acceleration `a` in the image of :meth:`action_to_cartesian_accel`, `action_to_cartesian_accel(state, cartesian_accel_to_action(state, a)) == a` to within floating-point tolerance. A right-inverse exists whenever the Jacobian has full row rank: for over-actuated agents (more columns than rows) there are many, and for exactly-actuated agents (square, non-singular Jacobian) the right-inverse is the unique inverse.
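The consistency requirement can be checked numerically. The sketch below uses a random 3-by-7 matrix as a stand-in for an arm Jacobian (a hypothetical example, not a kinematic model) and the Moore-Penrose pseudo-inverse, which is an exact right-inverse whenever `J` has full row rank. The two helpers take the Jacobian directly instead of an `AgentSnapshot`; that simplification is an assumption of the sketch.

```python
import numpy as np


def action_to_cartesian_accel(jacobian: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Forward map: action-space vector -> Cartesian acceleration."""
    return jacobian @ action


def cartesian_accel_to_action(jacobian: np.ndarray, accel: np.ndarray) -> np.ndarray:
    """Right-inverse via the Moore-Penrose pseudo-inverse (minimum-norm action)."""
    return np.linalg.pinv(jacobian) @ accel


rng = np.random.default_rng(0)
jacobian = rng.standard_normal((3, 7))  # full row rank with probability 1
target = np.array([0.2, -0.1, 0.05])

action = cartesian_accel_to_action(jacobian, target)
round_trip = action_to_cartesian_accel(jacobian, action)

# The contract: forward(right_inverse(a)) == a up to floating-point tolerance.
assert action.shape == (7,)
assert np.allclose(round_trip, target)
```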
Attributes:
- uid (str) – Unique identifier of the agent whose model this is.
- action_dim (int) – Dimensionality of the agent's action vector (e.g. 7 for a 7-DOF arm; 2 for a planar double integrator).
- position_dim (int) – Cartesian dimension of the agent's safety body (typically 2 for the §V toy crossing, 3 for manipulation).
Source code in src/concerto/safety/api.py
action_dim
property
action_dim: int
Dimensionality of the action vector (ADR-004 §Decision).
position_dim
property
position_dim: int
Cartesian dimension of the safety body (ADR-004 §Decision).
uid
property
uid: str
Unique identifier of the agent (ADR-004 §Decision).
action_to_cartesian_accel
action_to_cartesian_accel(
state: AgentSnapshot, action: FloatArray
) -> FloatArray
Map a control-space action to a Cartesian acceleration (ADR-004 §Decision).
Parameters:
- state (AgentSnapshot) – Current :class:`concerto.safety.cbf_qp.AgentSnapshot`. The mapping is state-dependent for any embodiment whose Jacobian depends on configuration (e.g. a 7-DOF arm).
- action (FloatArray) – Action vector of shape `(action_dim,)`, dtype `float64`.
Returns:
- FloatArray – Cartesian acceleration of shape `(position_dim,)`, dtype `float64`.
Source code in src/concerto/safety/api.py
cartesian_accel_to_action
cartesian_accel_to_action(
state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray
Right-inverse for QP projection back into action space (ADR-004 §Decision).
Used by the CBF-QP to express the per-agent action-space slot
coefficients of a Cartesian constraint row, and (via
:class:`~concerto.safety.emergency.JacobianEmergencyController`)
by the braking fallback to translate the Cartesian repulsion
target into joint-space torques. For an over-actuated agent the
right-inverse selects one of the many actions that realise the
same Cartesian acceleration; the damped-least-squares
pseudo-inverse is the canonical choice (see
:class:`JacobianControlModel`) and is the single source of
truth for the Cartesian-to-joint map across the CBF row
projection and the emergency fallback (ADR-004 §Decision; P1.02).
Parameters:
- state (AgentSnapshot) – Current :class:`concerto.safety.cbf_qp.AgentSnapshot`.
- cartesian_accel (FloatArray) – Cartesian acceleration of shape `(position_dim,)`, dtype `float64`.
Returns:
- FloatArray – Action vector of shape `(action_dim,)`, dtype `float64`.
Source code in src/concerto/safety/api.py
max_cartesian_accel
max_cartesian_accel(bounds: Bounds) -> float
Per-agent Cartesian acceleration capacity (ADR-004 §Decision; spike_004A).
Used by the QP-row generator to size the per-pair
alpha_pair = alpha_i_cart + alpha_j_cart and to drive the
proportional Wang-Ames-Egerstedt 2017 §IV budget split for
heterogeneous embodiments. For a double-integrator agent
whose action space is Cartesian acceleration this is
bounds.cartesian_accel_capacity (the L2 magnitude cap).
Parameters:
- bounds (Bounds) – Per-task :class:`Bounds` envelope.
Returns:
- float – Strictly positive scalar acceleration capacity.
Source code in src/concerto/safety/api.py
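A sketch of the per-pair sizing, assuming the proportional split gives each agent a share of the pairwise constraint equal to its capacity divided by `alpha_pair`. This is our reading of the Wang-Ames-Egerstedt 2017 §IV decentralised scheme, and the concerto row generator may differ in detail. The capacities are made-up values for a fast base and a slower arm.

```python
def pair_alpha(alpha_i_cart: float, alpha_j_cart: float) -> float:
    """alpha_pair = alpha_i_cart + alpha_j_cart; both capacities must be positive."""
    if alpha_i_cart <= 0.0 or alpha_j_cart <= 0.0:
        raise ValueError("Cartesian acceleration capacities must be strictly positive")
    return alpha_i_cart + alpha_j_cart


def proportional_shares(alpha_i_cart: float, alpha_j_cart: float) -> tuple[float, float]:
    """Fraction of the pairwise constraint each agent is assumed to absorb."""
    alpha_pair = pair_alpha(alpha_i_cart, alpha_j_cart)
    return alpha_i_cart / alpha_pair, alpha_j_cart / alpha_pair


# Hypothetical capacities: a base with 2.0 and an arm with 0.5.
base_share, arm_share = proportional_shares(2.0, 0.5)
```

With capacities 2.0 and 0.5, `alpha_pair` is 2.5 and the base absorbs 80% of the constraint while the arm absorbs 20%. For symmetric agents each takes half, and `alpha_pair = 2 * cartesian_accel_capacity`.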
Bounds
dataclass
Per-task numeric bounds for the safety stack (ADR-006 §Decision Option C).
The Bounds envelope is the explicit-numeric half of ADR-006's hybrid
decision: bounds gate QP feasibility and reportability while the
conformal slack (:class:SafetyState) governs online adaptation. The
enumerated values track ADR-006 §Consequences — the comm-latency bound
is anchored to the URLLC-3GPP-R17 sweep table from M2;
action_linf_component, cartesian_accel_capacity, and
action_rate are task-specific (Phase-1 fills them; Phase-0 ships
sane defaults). The force_limit field is a Phase-0 stub for
ADR-007 Open Question #4 (per-vendor decomposition, ADR-004 Open
Question #5) — a strategy-interfaced per-vendor handler slots in once
Stage-3 SA resolves the open question.
Field split (issue #146, closed by P1.02; 2026-05-17). The prior
single action_norm field was consumed by two safety layers with
mismatched semantics (L-infinity per-component in the CBF-QP outer
filter; L2 magnitude cap in the Cartesian emergency controller). The
field is now split into two explicit fields, each named for its
semantics:
- `action_linf_component`: per-component L-infinity bound on the action vector (`|u[k]| <= action_linf_component` for every component `k`). Enforced by :class:`concerto.safety.cbf_qp.ExpCBFQP`'s action-space bound rows and (post-Jacobian) by :class:`~concerto.safety.emergency.JacobianEmergencyController`'s per-joint clip.
- `cartesian_accel_capacity`: L2 magnitude cap on the Cartesian acceleration. The semantic source of the per-agent `alpha` in the Wang-Ames-Egerstedt 2017 §III barrier value (`alpha_pair = 2 * cartesian_accel_capacity` for symmetric agents) and of the :class:`~concerto.safety.emergency.CartesianAccelEmergencyController`'s saturation step.
For a d-dimensional action the two are related but not equal:
an action drawn from the L-infinity envelope has worst-case L2
magnitude sqrt(d) * action_linf_component. The conservative
operator pattern is cartesian_accel_capacity =
action_linf_component (matches the pre-split semantics under the
safe-operator workaround the 2026-05-16 docstring named); a less
conservative deployment sets cartesian_accel_capacity =
sqrt(d) * action_linf_component to use the full envelope. Either
pin is the operator's call; the safety stack treats the two fields
as independent inputs.
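The sqrt(d) relation and the two operator pins can be sketched as follows. The helper names are hypothetical. The point is that the corner of the L-infinity box is the worst-case L2 action, and that only the conservative pin guarantees the capacity is reachable in every Cartesian direction: the L2 ball of radius `action_linf_component` fits inside the box, while the larger ball does not.

```python
import math

import numpy as np


def worst_case_l2(action_linf_component: float, d: int) -> float:
    """L2 norm of the box corner (every component at the L-infinity bound)."""
    corner = np.full(d, action_linf_component)
    return float(np.linalg.norm(corner))


def conservative_capacity(action_linf_component: float) -> float:
    """Pin cartesian_accel_capacity to the L2 ball inscribed in the box."""
    return action_linf_component


def full_envelope_capacity(action_linf_component: float, d: int) -> float:
    """Pin cartesian_accel_capacity to the box's worst-case L2 magnitude."""
    return math.sqrt(d) * action_linf_component


d, linf = 3, 2.0  # hypothetical 3-D point mass with |u[k]| <= 2.0
assert math.isclose(worst_case_l2(linf, d), full_envelope_capacity(linf, d))
```

For d = 3 and a per-component bound of 2.0 the two pins give 2.0 and about 3.46.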
Attributes:
- action_linf_component (float) – Per-component L-infinity bound enforced by the CBF-QP outer filter (:class:`concerto.safety.cbf_qp.ExpCBFQP`) and as the post-Jacobian per-joint clip in :class:`~concerto.safety.emergency.JacobianEmergencyController`.
- cartesian_accel_capacity (float) – L2 magnitude cap on Cartesian acceleration. Source of the per-agent `alpha` consumed by the conformal-barrier `alpha_pair` and the saturation step in :class:`~concerto.safety.emergency.CartesianAccelEmergencyController`.
- action_rate (float) – Maximum :math:`\lVert u_k - u_{k-1} \rVert_2` per agent, per task. Bounds the actuator-rate slack under which the conformal CBF must remain feasible (ADR-006 §Decision).
- comm_latency_ms (float) – Default mean comm latency, in milliseconds. Set from the URLLC profile selected for the run (ADR-006 §Consequences; `chamber.comm.URLLC_3GPP_R17`).
- force_limit (float) – Global force threshold for the per-pair contact-force constraint (ADR-004 Open Question #5; a per-vendor handler slots in once Stage-3 SA resolves the decomposition).
Source code in src/concerto/safety/api.py
CartesianAccelEmergencyController
Cartesian-acceleration emergency override (ADR-004 risk-mitigation #1).
Default :class:EmergencyController for double-integrator agents
whose control dimension equals the Cartesian dimension (2D and 3D
point masses; the toy crossing example in Wang-Ames-Egerstedt 2017
§V). The aggregation rule is:
- Sum the per-pair Cartesian repulsion unit vectors.
- Saturate the result to `bounds.cartesian_accel_capacity` in the net push-apart direction: when the sum has non-trivial magnitude, rescale it to `bounds.cartesian_accel_capacity` (max emergency push); when it cancels out, return zero.
- Return the saturated vector; its shape matches `agent_state.position.shape` by construction.
Single-pair case (one dangerous partner): the sum is a single unit
vector with norm 1; the output magnitude is
bounds.cartesian_accel_capacity, matching the original
toy-crossing contract. Two opposing pairs (uid squeezed in the
middle): the sum cancels to ~zero and the output is a zero action
— there is no preferred direction to push in, and a spurious push
would be worse than a no-op. Two non-opposing pairs: the sum
points in the net escape direction and the output is
bounds.cartesian_accel_capacity along that direction.
Not correct for 7-DOF manipulators: see
:class:`JacobianEmergencyController` and the ADR-007 Stage-1 AS
spike scope.
Source code in src/concerto/safety/emergency.py
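A standalone sketch of the three-step rule. It mirrors the documented behaviour but is not the library code. The cancellation threshold `cancel_tol` is an assumption, since the docstring only says "non-trivial magnitude", and the function takes a shape tuple in place of an `AgentSnapshot`.

```python
from typing import Sequence

import numpy as np


def sum_and_saturate(
    position_shape: tuple[int, ...],
    repulsion_vectors: Sequence[np.ndarray],
    cartesian_accel_capacity: float,
    cancel_tol: float = 1e-9,  # hypothetical threshold for "cancels out"
) -> np.ndarray:
    """Sum unit repulsion vectors, then rescale the sum to the L2 capacity."""
    if not repulsion_vectors:
        raise ValueError("expected at least one dangerous pair")
    total = np.zeros(position_shape, dtype=np.float64)
    for vector in repulsion_vectors:
        total += np.asarray(vector, dtype=np.float64)
    norm = float(np.linalg.norm(total))
    if norm <= cancel_tol:
        # Opposing pushes cancel: no preferred direction, so return a zero action.
        return np.zeros(position_shape, dtype=np.float64)
    return cartesian_accel_capacity * total / norm


single = sum_and_saturate((2,), [np.array([1.0, 0.0])], 2.0)
squeezed = sum_and_saturate((2,), [np.array([1.0, 0.0]), np.array([-1.0, 0.0])], 2.0)
corner = sum_and_saturate((2,), [np.array([1.0, 0.0]), np.array([0.0, 1.0])], 2.0)
```

The three calls reproduce the single-pair, squeezed, and non-opposing cases: `single` is `[2, 0]`, `squeezed` is zero, and `corner` has norm 2 along the diagonal.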
compute_override
compute_override(
agent_state: AgentSnapshot,
pairwise_repulsion_vectors: list[FloatArray],
bounds: Bounds,
) -> FloatArray
Sum-and-saturate Cartesian repulsion vectors (ADR-004 risk-mitigation #1).
Parameters:
- agent_state (AgentSnapshot) – This uid's snapshot; only its `position.shape` is used (the override's output shape).
- pairwise_repulsion_vectors (list[FloatArray]) – One Cartesian unit vector per dangerous partner, each pointing away from that partner; the list is non-empty.
- bounds (Bounds) – Per-task :class:`Bounds`; the saturation magnitude is `bounds.cartesian_accel_capacity` (the L2 magnitude cap on Cartesian acceleration). Post-P1.02 the field split closes the prior L-infinity / L2 ambiguity that lived on the unified `action_norm` field: the per-component L-infinity envelope now lives on `bounds.action_linf_component` and is consumed only by the CBF-QP outer filter and (post-Jacobian) by :class:`JacobianEmergencyController`.
Returns:
- FloatArray – Saturated Cartesian acceleration along the net push-apart direction with magnitude `bounds.cartesian_accel_capacity`, or zero when the pairwise vectors cancel out. Shape matches `agent_state.position.shape`, dtype `float64`.
Source code in src/concerto/safety/emergency.py
ConcertoSafetyInfeasible
Bases: RuntimeError
Raised when the safety QP is infeasible (ADR-004 §Decision; ADR-006 §Risks).
The conformal CBF-QP can become infeasible when partner-induced constraints conflict with within-robot OSCBF constraints under aggressive actuator limits — exactly the open problem named in Garg et al. 2024 §7.3 and tracked as ADR-006 R5. Callers MUST route to the braking fallback (ADR-004 risk-mitigation #1) on this error; the conformal QP is not a valid recovery path because Theorem 3 (Huriot & Sibai 2025) only bounds the average loss.
Source code in src/concerto/safety/errors.py
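The routing rule can be sketched as a thin wrapper. Everything here is a stand-in: `ConcertoSafetyInfeasible` is redefined locally so the snippet runs on its own, and `always_infeasible` / `zero_brake` are hypothetical placeholders for a real filter and for :func:`concerto.safety.braking.maybe_brake`, whose signature is not shown on this page.

```python
from typing import Callable, Tuple

import numpy as np


class ConcertoSafetyInfeasible(RuntimeError):
    """Local stand-in for concerto.safety.errors.ConcertoSafetyInfeasible."""


def filtered_or_braked(
    safety_filter: Callable[[np.ndarray], Tuple[np.ndarray, dict]],
    brake: Callable[[np.ndarray], np.ndarray],
    proposed_action: np.ndarray,
) -> Tuple[np.ndarray, bool]:
    """Return (action, fallback_fired); never retry the conformal QP on failure."""
    try:
        safe_action, _info = safety_filter(proposed_action)
        return safe_action, False
    except ConcertoSafetyInfeasible:
        # The conformal guarantee only bounds the average loss, so recovery goes to braking.
        return brake(proposed_action), True


def always_infeasible(u: np.ndarray) -> Tuple[np.ndarray, dict]:
    """Hypothetical filter whose partner and OSCBF constraints always conflict."""
    raise ConcertoSafetyInfeasible("partner and OSCBF constraints conflict")


def zero_brake(u: np.ndarray) -> np.ndarray:
    """Hypothetical braking fallback: command zero action."""
    return np.zeros_like(u)


action, fired = filtered_or_braked(always_infeasible, zero_brake, np.array([0.3, -0.2]))
```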
DoubleIntegratorControlModel
dataclass
Identity action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A §Reduction).
Default :class:AgentControlModel for double-integrator agents
whose action space coincides with Cartesian acceleration (the
Wang-Ames-Egerstedt 2017 §V toy crossing and every Phase-0
homogeneous test). The Jacobian J_i = d cartesian_accel /
d action is the identity, so the CBF row coefficients projected
through this model are element-wise identical to the rows the
pre-refactor _pair_constraint_row emitted. This is the
migration shim: tests that construct homogeneous 2-D pairs pass
a :class:DoubleIntegratorControlModel per uid and run unchanged
through :class:ExpCBFQP in :attr:SafetyMode.CENTRALIZED mode.
Attributes:
- uid (str) – Unique identifier of the agent.
- action_dim (int) – Dimensionality of the agent's action vector (equal to `position_dim` by construction).
Source code in src/concerto/safety/api.py
position_dim
property
position_dim: int
Cartesian dimension of the safety body (ADR-004 §Decision).
For a double integrator the action and position spaces have the same dimension by definition.
action_to_cartesian_accel
action_to_cartesian_accel(
state: AgentSnapshot, action: FloatArray
) -> FloatArray
Return action unchanged (ADR-004 §Decision; spike_004A §Reduction).
The double-integrator identity map.
Source code in src/concerto/safety/api.py
cartesian_accel_to_action
cartesian_accel_to_action(
state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray
Return cartesian_accel unchanged (ADR-004 §Decision; spike_004A §Reduction).
The double-integrator identity right-inverse.
Source code in src/concerto/safety/api.py
max_cartesian_accel
max_cartesian_accel(bounds: Bounds) -> float
Return bounds.cartesian_accel_capacity (ADR-004 §Decision; spike_004A).
For a double integrator the L2 magnitude cap is the Cartesian
acceleration capacity by definition of the embodiment. The
Wang-Ames-Egerstedt 2017 §III barrier formula's alpha is a
Cartesian acceleration capacity (it appears inside
sqrt(2 * alpha * ...)); pre-split this read
bounds.action_norm under the operator pattern
action_norm = capacity / sqrt(d), but the post-split field
names this semantic explicitly (ADR-004 §Revision history
2026-05-17 + ADR-INDEX footnote (a)).
Source code in src/concerto/safety/api.py
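For reference, a sketch of where `alpha` enters, using the pairwise barrier in the form given by Wang-Ames-Egerstedt 2017 with a safety distance `d_safe` (a name introduced here). This follows the paper's formula, not necessarily concerto's exact row generator, and the two double-integrator capacities are made-up values.

```python
import math

import numpy as np


def pairwise_barrier(
    p_i: np.ndarray,
    p_j: np.ndarray,
    v_i: np.ndarray,
    v_j: np.ndarray,
    alpha_pair: float,
    d_safe: float,
) -> float:
    """h_ij = sqrt(2 * alpha_pair * (|dp| - d_safe)) + dp . dv / |dp|."""
    dp = p_i - p_j
    dv = v_i - v_j
    dist = float(np.linalg.norm(dp))
    if dist < d_safe:
        raise ValueError("pair is already inside the safety distance")
    return math.sqrt(2.0 * alpha_pair * (dist - d_safe)) + float(dp @ dv) / dist


# Two point masses with capacity 1.0 each, so alpha_pair = 2 * 1.0 (symmetric case).
h = pairwise_barrier(
    p_i=np.array([0.0, 0.0]),
    p_j=np.array([3.0, 0.0]),
    v_i=np.array([1.0, 0.0]),  # agent i closing on agent j at unit speed
    v_j=np.array([0.0, 0.0]),
    alpha_pair=2.0,
    d_safe=1.0,
)
```

Here the square-root term is sqrt(8) and closing at unit speed subtracts exactly 1, so `h` is about 1.83 and the pair is still safe. A larger `alpha_pair` (more braking capacity) buys back margin through the square-root term.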
EgoOnlySafetyFilter
Bases: Protocol
Contract for an :attr:SafetyMode.EGO_ONLY safety-filter pipeline (ADR-004 §Public API).
The deployment / ad-hoc / black-box-partner mode (ADR-006 §Decision;
reviewer P0-3). The QP decides only the ego agent's action; the
partner's motion enters as a predicted disturbance on the
constraint right-hand side. The structural difference from
:class:JointSafetyFilter is wire-visible: proposed_action is a
single :class:FloatArray (the ego's nominal control), and the
return value's first element is likewise an array — never a dict.
Implementations compose the three layers from ADR-004 — exp CBF-QP backbone (Wang-Ames-Egerstedt 2017), conformal slack overlay (Huriot & Sibai 2025), OSCBF inner filter (Morton & Pavone 2025) — plus the hard braking fallback (ADR-004 risk-mitigation #1) into a single propose-check-replan boundary that wraps any nominal controller without accessing its internals (the property the black-box-partner setting requires per ADR-006 §Decision).
The canonical implementation is
:meth:concerto.safety.cbf_qp.ExpCBFQP.ego_only; third-party
plugin filters that target ego-only deployment SHOULD subclass /
structurally-conform to this Protocol so static checkers can
catch shape mismatches at construction time (spike_004A
§Three-mode taxonomy; reviewer P0-2).
Source code in src/concerto/safety/api.py
filter
filter(
proposed_action: FloatArray,
obs: dict[str, object],
state: SafetyState,
bounds: Bounds,
*,
ego_uid: str,
partner_predicted_states: dict[str, AgentSnapshot],
dt: float,
) -> tuple[FloatArray, FilterInfo]
Project the nominal ego action onto the safe set (ADR-004 §Public API).
Parameters:
- proposed_action (FloatArray) – Ego nominal control input of shape `(control_models[ego_uid].action_dim,)`. The QP minimises `||u - u_hat||^2` with this as the reference (Wang-Ames-Egerstedt 2017 eq. 9).
- obs (dict[str, object]) – Observation dict; the filter reads `obs["comm"]` (M2 contract; see `chamber.comm.api.CommPacket`) and `obs["meta"]["partner_id"]` (M4 contract; `None` ⇒ single-partner mode, no hard fail; identity change ⇒ reset `lambda` to `lambda_safe` and enter warmup per ADR-004 risk-mitigation #2).
- state (SafetyState) – Mutable :class:`SafetyState`; updated in place by the conformal layer each control step.
- bounds (Bounds) – Per-task :class:`Bounds` envelope (ADR-006 §Decision).
- ego_uid (str) – Identifies which uid in `obs["agent_states"]` is the ego.
- partner_predicted_states (dict[str, AgentSnapshot]) – Per-partner-uid predicted :class:`concerto.safety.cbf_qp.AgentSnapshot` at the next control step, used to evaluate the partner's predicted Cartesian acceleration; this enters the constraint RHS as a known drift term.
- dt (float) – Predictor lookahead horizon in seconds (typically the env's control-step duration). Used to convert the predicted velocity delta into a Cartesian acceleration `(v_pred - v_now) / dt` on the constraint RHS (ADR-004 §Decisions, "Predicted-acceleration units"; external-review P0-1, 2026-05-16). Must be strictly positive and consistent with the value passed to the partner-trajectory predictor (e.g. :func:`concerto.safety.conformal.constant_velocity_predict`).
Returns:
- tuple[FloatArray, FilterInfo] – A pair `(safe_action, info)` where `safe_action` is the QP-projected ego action and `info` is a :class:`FilterInfo` telemetry payload feeding the ADR-014 three-table renderer.
Raises:
- ConcertoSafetyInfeasible – When the QP is infeasible even after slack relaxation (ADR-004 §Decision; ADR-006 §Risks). The caller MUST route to the braking fallback in this case.
Source code in src/concerto/safety/api.py
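A sketch of how `dt` turns a predicted next-step velocity into the acceleration term on the constraint RHS. The constant-velocity helper is a hypothetical stand-in for :func:`concerto.safety.conformal.constant_velocity_predict`, whose real signature is not shown here; using the same `dt` for the predictor and the filter is the consistency requirement stated above.

```python
import numpy as np


def constant_velocity_predict(position, velocity, dt):
    """Hypothetical stand-in: roll the partner forward assuming constant velocity."""
    if dt <= 0.0:
        raise ValueError("dt must be strictly positive")
    return position + velocity * dt, velocity.copy()


def predicted_partner_accel(v_now, v_pred, dt):
    """(v_pred - v_now) / dt, in Cartesian acceleration units."""
    if dt <= 0.0:
        raise ValueError("dt must be strictly positive")
    return (v_pred - v_now) / dt


dt = 0.05
v_now = np.array([1.0, 0.0])
_, v_pred = constant_velocity_predict(np.array([0.0, 0.0]), v_now, dt)
drift_cv = predicted_partner_accel(v_now, v_pred, dt)  # zero under constant velocity
drift_turn = predicted_partner_accel(v_now, np.array([1.0, 0.1]), dt)
```

A forecast lateral velocity change of 0.1 over 0.05 s yields a drift of 2.0; leaving out the division by `dt` would understate it by a factor of 1/dt = 20.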
reset
reset(*, seed: int | None = None) -> None
Reset filter state at episode start (ADR-004 §Decision).
Parameters:
- seed (int | None, default: None) – Optional root seed for the filter's deterministic RNG substream (P6). Implementations route this through `concerto.training.seeding.derive_substream` so two CPU runs with the same seed produce byte-identical outputs.
Source code in src/concerto/safety/api.py
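The byte-identical requirement can be illustrated with NumPy's `SeedSequence`. This is a stand-in: the signature of `concerto.training.seeding.derive_substream` is not documented on this page, so `filter_rng` and its stream-name scheme are hypothetical. The sketch hashes the stream name with `zlib.crc32` rather than Python's built-in `hash`, which is salted per process for strings and would break cross-run determinism.

```python
import zlib
from typing import Optional

import numpy as np


def filter_rng(seed: Optional[int], stream: str = "safety_filter") -> np.random.Generator:
    """Derive a per-component generator from a root seed (hypothetical scheme)."""
    if seed is None:
        # No root seed: fresh OS entropy, so runs are not reproducible.
        return np.random.default_rng()
    stream_key = zlib.crc32(stream.encode("utf-8"))  # stable across processes
    return np.random.default_rng(np.random.SeedSequence(seed, spawn_key=(stream_key,)))


first = filter_rng(7).standard_normal(4)
second = filter_rng(7).standard_normal(4)
other = filter_rng(7, stream="predictor").standard_normal(4)
```

`first` and `second` are byte-identical, while `other` draws from an independent substream of the same root seed.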
EmergencyController
Bases: Protocol
Per-uid emergency-override hook (ADR-004 risk-mitigation #1).
Translates the aggregate per-uid repulsion (summed across every
dangerous pair the uid is involved in) into a control-space override
action. The braking fallback consults one controller per uid so
heterogeneous embodiments (double-integrator base, 7-DOF arm,
manipulator end-effector) can supply their own Cartesian-to-control
mapping without :func:concerto.safety.braking.maybe_brake needing
to know the embodiment class.
Implementations MUST be deterministic on identical inputs (P6; seeding contract) and MUST return an array whose shape matches the control-action shape the env expects for this uid — not necessarily the Cartesian shape. A 7-DOF arm controller returns a 7-vector; a 2D double-integrator controller returns a 2-vector.
Source code in src/concerto/safety/emergency.py
compute_override
compute_override(
agent_state: AgentSnapshot,
pairwise_repulsion_vectors: list[FloatArray],
bounds: Bounds,
) -> FloatArray
Return the embodiment-specific emergency override (ADR-004 risk-mitigation #1).
Parameters:
- agent_state (AgentSnapshot) – This uid's :class:`AgentSnapshot` at the current control step. Implementations may read position and velocity to build a Jacobian or any embodiment-specific mapping. The Cartesian default uses only its `position.shape` (the repulsion vectors already carry the direction information).
- pairwise_repulsion_vectors (list[FloatArray]) – List of unit-magnitude Cartesian repulsion vectors, one per dangerous pair this uid is involved in (sign convention: each vector points away from the partner in that pair). The list has at least one element; uids with zero dangerous pairs are not routed through this hook.
- bounds (Bounds) – Per-task :class:`Bounds`; how each field caps the override is up to the implementation. The Cartesian controller uses `bounds.cartesian_accel_capacity` to L2-cap the saturated direction; the Jacobian controller additionally uses `bounds.action_linf_component` for a post-transform per-joint clip.
Returns:
- FloatArray – Override action in this uid's control space, dtype `float64`.
Source code in src/concerto/safety/emergency.py
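A sketch of the one-controller-per-uid dispatch with heterogeneous output shapes. `EmergencyControllerLike`, the two controller classes, `brake_all`, and the `SimpleNamespace` bounds are hypothetical. The Protocol is marked `runtime_checkable` here only so the sketch can assert conformance at runtime; that check sees method names only, while a static checker also compares signatures. The 7-DOF placeholder returns zeros purely to show the shape contract and is not a braking law.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class EmergencyControllerLike(Protocol):
    def compute_override(
        self, agent_state: Any, pairwise_repulsion_vectors: List[np.ndarray], bounds: Any
    ) -> np.ndarray: ...


class PointMassController:
    """2-D double integrator: L2-capped sum of repulsions."""

    def compute_override(self, agent_state, pairwise_repulsion_vectors, bounds):
        total = np.sum(pairwise_repulsion_vectors, axis=0)
        norm = np.linalg.norm(total)
        if norm < 1e-9:
            return np.zeros_like(total)
        return bounds.cartesian_accel_capacity * total / norm


class SevenDofPlaceholder:
    """Returns a joint-space 7-vector; a real arm would go through a Jacobian."""

    def compute_override(self, agent_state, pairwise_repulsion_vectors, bounds):
        return np.zeros(7)


def brake_all(
    controllers: Dict[str, EmergencyControllerLike],
    repulsions: Dict[str, List[np.ndarray]],
    bounds: Any,
) -> Dict[str, np.ndarray]:
    # Only uids with at least one dangerous pair are routed through the hook.
    return {
        uid: controllers[uid].compute_override(None, vectors, bounds)
        for uid, vectors in repulsions.items()
        if vectors
    }


controllers = {"base": PointMassController(), "arm": SevenDofPlaceholder()}
bounds = SimpleNamespace(cartesian_accel_capacity=1.5)
overrides = brake_all(
    controllers,
    {"base": [np.array([0.0, 1.0])], "arm": [np.array([1.0, 0.0, 0.0])], "idle": []},
    bounds,
)
```

The base receives a 2-vector and the arm a 7-vector, and the idle uid never reaches a controller.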
JacobianControlModel
dataclass
Jacobian-aware action ↔ Cartesian map skeleton (ADR-004 §Decision; ADR-007 Stage 1 AS).
The 7-DOF arm case for the AS axis. Takes a Jacobian callable
`jacobian_fn(state) -> J`, where `J` is the action-to-Cartesian map
of shape `(position_dim, action_dim)`, and uses a damped-least-squares
pseudo-inverse for the right-inverse (Nakamura & Hanafusa 1986;
the singularity-robust variant the OSCBF inner filter already
relies on per Morton & Pavone 2025 §IV).
The Stage-1 AS spike will exercise this on the 7-DOF arm; until a
Jacobian callable is supplied via `jacobian_fn`, the skeleton raises
:class:`NotImplementedError` from both :meth:`action_to_cartesian_accel`
and :meth:`cartesian_accel_to_action`. The same loud-fail discipline
ADR-004 §Risks established for
:class:`concerto.safety.emergency.JacobianEmergencyController`
applies here: 7-DOF uids must not silently receive a Cartesian-shaped
action by accident.
Attributes:
- uid (str) – Unique identifier of the agent.
- action_dim (int) – Dimensionality of the action vector (e.g. 7 for a Franka arm).
- position_dim (int) – Cartesian dimension of the safety body (typically 3 for manipulation).
- jacobian_fn (Callable[[AgentSnapshot], FloatArray] | None) – Optional callable mapping an :class:`AgentSnapshot` to the Jacobian matrix of shape `(position_dim, action_dim)`. When :data:`None`, both mapping methods raise :class:`NotImplementedError`.
- damping (float) – Damped-least-squares parameter; default :data:`DEFAULT_DLS_DAMPING`.
- max_cartesian_accel_value (float) – Pre-computed scalar capacity (kg-free acceleration units). Defaults to :data:`numpy.nan` so the placeholder fails loudly on attempted use; supply the embodiment-specific value at construction time.
Source code in src/concerto/safety/api.py
action_to_cartesian_accel
action_to_cartesian_accel(
state: AgentSnapshot, action: FloatArray
) -> FloatArray
Apply J(state) @ action (ADR-004 §Decision; ADR-007 Stage 1 AS).
Raises:
- NotImplementedError – When `jacobian_fn` was not supplied at construction time. The Stage-1 AS spike delivers the concrete kinematic model; until then 7-DOF uids fail loudly per ADR-004 §Risks.
Source code in src/concerto/safety/api.py
cartesian_accel_to_action
cartesian_accel_to_action(
state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray
Apply the damped-least-squares pseudo-inverse (ADR-004 §Decision; ADR-007 Stage 1 AS).
J^T (J J^T + damping^2 I)^{-1} a — Nakamura & Hanafusa
1986 singularity-robust right-inverse, identical in form to
the OSCBF inner filter's Jacobian handling.
Raises:
- NotImplementedError – When `jacobian_fn` was not supplied at construction time.
Source code in src/concerto/safety/api.py
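A NumPy sketch of the damped-least-squares right-inverse. The random 3-by-7 Jacobian and the damping values are illustrative assumptions. With `damping = 0` and a full-row-rank `J` the expression reduces to the Moore-Penrose pseudo-inverse and the round trip is exact. With `damping > 0`, `J @ u` only approximates the target: the method trades a small tracking error for bounded joint commands near singularities.

```python
import numpy as np


def dls_right_inverse(jacobian: np.ndarray, accel: np.ndarray, damping: float) -> np.ndarray:
    """u = J^T (J J^T + damping^2 I)^{-1} a, via a linear solve instead of an explicit inverse."""
    m = jacobian.shape[0]
    gram = jacobian @ jacobian.T + (damping**2) * np.eye(m)
    return jacobian.T @ np.linalg.solve(gram, accel)


rng = np.random.default_rng(1)
jacobian = rng.standard_normal((3, 7))  # hypothetical arm Jacobian, full row rank
target = np.array([0.5, 0.0, -0.25])

exact = dls_right_inverse(jacobian, target, damping=0.0)
damped = dls_right_inverse(jacobian, target, damping=0.1)
exact_residual = np.linalg.norm(jacobian @ exact - target)
damped_residual = np.linalg.norm(jacobian @ damped - target)
```

The damped solution never has a larger norm than the undamped minimum-norm one, which is the property that keeps joint commands bounded as `J` approaches a singularity.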
max_cartesian_accel
max_cartesian_accel(bounds: Bounds) -> float
Return the pre-computed Cartesian acceleration capacity (ADR-004 §Decision).
Raises:
- ValueError – When `max_cartesian_accel_value` was left at its placeholder :data:`numpy.nan`. The Stage-1 AS spike supplies the embodiment-specific value.
Source code in src/concerto/safety/api.py
JointSafetyFilter
Bases: Protocol
Joint-action filter contract for CENTRALIZED / SHARED_CONTROL (ADR-004 §Public API).
The oracle / ablation baseline mode plus the lab-only co-control
mode (ADR-014 three-table report upper-bound row; reviewer P0-3
mode 3). The QP decides every uid's action; per-uid slot widths
come from control_models[uid].action_dim (no implicit "first
uid's shape" fallback per reviewer P0-2). The structural
difference from :class:EgoOnlySafetyFilter is wire-visible:
proposed_action is a dict[uid, FloatArray], and the
return value's first element is likewise a dict.
The canonical implementations are
:meth:concerto.safety.cbf_qp.ExpCBFQP.centralized and
:meth:concerto.safety.cbf_qp.ExpCBFQP.shared_control.
Source code in src/concerto/safety/api.py
filter
filter(
proposed_action: dict[str, FloatArray],
obs: dict[str, object],
state: SafetyState,
bounds: Bounds,
*,
ego_uid: str | None = None,
partner_action_bound: float | None = None,
) -> tuple[dict[str, FloatArray], FilterInfo]
Project the per-uid nominal actions onto the safe set (ADR-004 §Public API).
Parameters:
-
proposed_action(dict[str, FloatArray]) –Per-agent nominal control inputs, keyed by uid. The QP minimises
||u - u_hat||^2with this as the reference (Wang-Ames-Egerstedt 2017 eq. 9). -
obs(dict[str, object]) –Observation dict; same contract as :meth:
EgoOnlySafetyFilter.filter. -
state(SafetyState) –Mutable :class:
SafetyState; updated in place by the conformal layer each control step. -
bounds(Bounds) –Per-task :class:
Boundsenvelope (ADR-006 §Decision). -
ego_uid(str | None, default:None) –Required for :attr:
SafetyMode.SHARED_CONTROL, ignored for :attr:SafetyMode.CENTRALIZED. Identifies which uid inproposed_actionis the ego whose action is unconstrained bypartner_action_bound. -
partner_action_bound(float | None, default:None) –Required for :attr:
SafetyMode.SHARED_CONTROL, ignored for :attr:SafetyMode.CENTRALIZED. Strictly positive L-infinity ball radius around each non-ego uid's proposed action.
Returns:
- tuple[dict[str, FloatArray], FilterInfo] – A pair (safe_action, info); safe_action matches the input shape (one entry per uid in proposed_action).
Raises:
- ConcertoSafetyInfeasible – When the QP is infeasible even after slack relaxation (ADR-004 §Decision; ADR-006 §Risks).
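In :attr:SafetyMode.SHARED_CONTROL, partner_action_bound is an L-infinity ball around each non-ego uid's proposed action, and the ego is left outside that budget. A minimal sketch of how the ball could be expressed as per-uid box bounds on the QP decision variable, assuming NumPy action arrays. shared_control_box_bounds is a hypothetical helper, not part of the CONCERTO API, and the real filter may encode the constraint differently (for example as extra QP rows):

```python
import numpy as np


def shared_control_box_bounds(
    proposed_action: dict[str, np.ndarray],
    ego_uid: str,
    partner_action_bound: float,
) -> dict[str, tuple[np.ndarray, np.ndarray]]:
    """Hypothetical helper: per-uid (lower, upper) box bounds for SHARED_CONTROL.

    The ego uid gets (-inf, +inf); every other uid may move at most
    partner_action_bound away from its proposed action in each coordinate.
    """
    if ego_uid not in proposed_action:
        raise ValueError(f"ego_uid {ego_uid!r} not in proposed_action")
    if not partner_action_bound > 0:  # also rejects NaN
        raise ValueError("partner_action_bound must be strictly positive")
    bounds: dict[str, tuple[np.ndarray, np.ndarray]] = {}
    for uid, u_hat in proposed_action.items():
        u_hat = np.asarray(u_hat, dtype=float)
        if uid == ego_uid:
            # The ego action is not restricted by the partner budget.
            bounds[uid] = (np.full_like(u_hat, -np.inf), np.full_like(u_hat, np.inf))
        else:
            # L-infinity ball of radius partner_action_bound around u_hat.
            bounds[uid] = (u_hat - partner_action_bound, u_hat + partner_action_bound)
    return bounds
```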
Source code in src/concerto/safety/api.py
reset
reset(*, seed: int | None = None) -> None
Reset filter state at episode start (ADR-004 §Decision).
See :meth:EgoOnlySafetyFilter.reset for the determinism
contract (P6 seeding).
Source code in src/concerto/safety/api.py
SafetyMode
Bases: Enum
Safety-filter operating mode (ADR-004 §Decision; spike_004A §Three-mode taxonomy).
Three modes make the filter's "who is being optimised" contract explicit at the constructor site, so callsites cannot accidentally request the oracle solve at deployment (reviewer P0-3):
- :attr:EGO_ONLY – deployment default. The QP decision variable is the ego agent's action only; the partner's motion enters as a predicted disturbance on the constraint RHS. This is the ad-hoc / black-box-partner setting ADR-006 §Decision pins.
- :attr:CENTRALIZED – oracle / ablation only. The QP decision variable is the concatenation of every uid's action, with per-uid slot widths set by control_models[uid].action_dim (no implicit "first uid's shape" fallback). Used as the upper-bound baseline in ADR-014's three-table report. Never used at evaluation against ad-hoc partners.
- :attr:SHARED_CONTROL – lab-only baseline (reviewer P0-3 mode 3). Same shape as :attr:CENTRALIZED, with an additional partner_action_bound parameter that restricts the partner's adjustment magnitude. Measures how much of the centralised headroom comes from the co-control budget versus from partner co-operation.
Mode is a constructor argument, not a per-call flag: the
conformal slack vector SafetyState.lambda_ is sized per pair,
and the pair set differs between :attr:EGO_ONLY (one row per
partner) and :attr:CENTRALIZED (full pairwise table). Flipping
mode mid-episode would silently invalidate the conformal state.
A mode change requires constructing a new filter with a fresh
:class:SafetyState.
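The pair-set mismatch is easy to see by counting the lambda_ keys each mode needs. A sketch, assuming EGO_ONLY keeps one row per (ego, partner) pair and CENTRALIZED (and, by assumption here, SHARED_CONTROL, which shares its shape) keeps the full pairwise table; Mode and lambda_keys_for_mode are hypothetical stand-ins, not the CONCERTO types:

```python
from enum import Enum
from itertools import combinations


class Mode(Enum):
    """Local stand-in for SafetyMode; member names mirror the documented ones."""

    EGO_ONLY = "ego_only"
    CENTRALIZED = "centralized"
    SHARED_CONTROL = "shared_control"


def lambda_keys_for_mode(mode: Mode, uids: list[str], ego_uid: str) -> list[tuple[str, str]]:
    """Hypothetical: the canonical pair keys lambda_ would hold in a given mode."""
    if mode is Mode.EGO_ONLY:
        # One row per partner: (ego, partner), canonicalised lexicographically.
        return sorted(
            (ego_uid, p) if ego_uid < p else (p, ego_uid) for p in uids if p != ego_uid
        )
    # CENTRALIZED (and, by assumption, SHARED_CONTROL): the full pairwise table.
    return list(combinations(sorted(uids), 2))
```

With three agents, EGO_ONLY needs two slack entries and CENTRALIZED needs three, so a SafetyState sized for one mode cannot be carried over to the other.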
Source code in src/concerto/safety/api.py
SafetyState
dataclass
Mutable conformal CBF state (Huriot & Sibai 2025 §IV; ADR-004 §Decision).
The conformal slack dict lambda_ carries one entry per agent
pair, keyed by a lexicographically-ordered UID tuple
(uid_a, uid_b) with uid_a < uid_b. Each entry is updated
each control step by
:func:concerto.safety.conformal.update_lambda according to
Theorem 3's rule
lambda_{k+1} = lambda_k + eta * (eps - l_k). ADR-004
risk-mitigation #2 motivates the partner-swap warmup window: on
partner identity change (detected via obs["meta"]["partner_id"])
the state is reset to lambda_safe (the value guaranteeing QP
feasibility under the worst-case bounded prediction error per ADR-006
Assumption A2) and the next warmup_steps_remaining steps run with
a tighter eps.
Issue #144 / ADR-014 v3 (2026-05-19) promoted lambda_ from a
canonical-pair-order-indexed :class:numpy.ndarray to a
:data:LambdaDict keyed by canonical UID-pair tuples. The
structural fix subsumes the implicit pair-index alignment
invariant the prior Phase-0 amendment (canonical_pair_order)
enforced via runtime asserts: a caller that rebuilds the snapshot
dict in a different insertion order still keys into the same
canonical tuple, so per-pair slack cannot silently misalign with
the pairwise CBF constraint.
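The per-pair rule lambda_{k+1} = lambda_k + eta * (eps - l_k) can be sketched directly on a LambdaDict-shaped dict. This assumes the per-step loss l_k for each pair has already been computed (its definition comes from Huriot & Sibai 2025 and is not reproduced here), uses the documented defaults epsilon = -0.05 and eta = 0.01, and ignores the tighter warmup eps. update_lambda_sketch is hypothetical and is not :func:concerto.safety.conformal.update_lambda:

```python
def update_lambda_sketch(
    lambda_: dict[tuple[str, str], float],
    losses: dict[tuple[str, str], float],
    *,
    epsilon: float = -0.05,
    eta: float = 0.01,
) -> None:
    """Apply lambda_{k+1} = lambda_k + eta * (eps - l_k) in place, per canonical pair."""
    for key, loss in losses.items():
        if key not in lambda_:
            # Keys must already be canonical (uid_a < uid_b) tuples.
            raise KeyError(f"no slack entry for pair {key!r}")
        lambda_[key] = lambda_[key] + eta * (epsilon - loss)
```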
Attributes:
- lambda_ (LambdaDict) – Per-pair slack as a :data:LambdaDict. Construct via :func:make_lambda_dict (or directly when the canonical pair set is already in hand). Mutated in place by :func:concerto.safety.conformal.update_lambda.
- epsilon (float) – Target average loss. ADR-004 §Decision pins the default to -0.05 (conservative manipulation regime).
- eta (float) – Conformal learning rate (default 0.01).
- warmup_steps_remaining (int) – Decremented each step while in warmup; zero outside the warmup window (ADR-004 risk-mitigation #2).
Source code in src/concerto/safety/api.py
__getattr__
__getattr__(name: str) -> object
Re-export the deprecated :data:SafetyFilter alias from the api submodule.
Forwards the lookup so from concerto.safety import SafetyFilter
surfaces the same :class:DeprecationWarning as from
concerto.safety.api import SafetyFilter. ADR-004 §Public API;
spike_004A §Three-mode taxonomy.
Source code in src/concerto/safety/__init__.py
canonical_pair_key
canonical_pair_key(a: str, b: str) -> tuple[str, str]
Return the lexicographically-ordered pair key for (a, b) (ADR-004 §Decision; issue #144).
The canonical key for any agent pair is the unique 2-tuple of UIDs
sorted lexicographically. Callers that already know the canonical
order (e.g. a nested loop with the outer uid lex-smaller than the
inner) can build the tuple directly; this helper is for sites that
take an unordered (uid_a, uid_b) pair (e.g. EGO_ONLY mode's
(ego_uid, partner_uid) rows, where neither side is guaranteed
to be lex-smaller) and need the canonical lookup key for
:data:LambdaDict indexing.
Parameters:
- a (str) – First UID.
- b (str) – Second UID (must differ from a).
Returns:
- tuple[str, str] – (a, b) if a < b lexicographically, otherwise (b, a).
Raises:
- ValueError – If a == b (an agent pair always has two distinct UIDs).
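A sketch that re-implements the documented contract, for readers writing their own lookups; canonical_pair_key_sketch is illustrative, not the shipped source:

```python
def canonical_pair_key_sketch(a: str, b: str) -> tuple[str, str]:
    """Lexicographically ordered 2-tuple of two distinct UIDs."""
    if a == b:
        raise ValueError(f"an agent pair needs two distinct UIDs, got {a!r} twice")
    return (a, b) if a < b else (b, a)
```

In EGO_ONLY mode an (ego_uid, partner_uid) row keys into lambda_ through this mapping regardless of which UID happens to sort first.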
Source code in src/concerto/safety/api.py
canonical_pair_order
canonical_pair_order(uids: Iterable[str]) -> list[str]
Canonical lexicographic UID order for pair iteration (ADR-004 §Decision).
Public utility used by the three pair-iteration entry points
(:func:concerto.safety.conformal.compute_prediction_gap_for_pairs,
:meth:concerto.safety.cbf_qp.ExpCBFQP._filter_ego_only, and
:meth:concerto.safety.cbf_qp.ExpCBFQP._filter_joint); third-party
consumers building their own pair-iteration code over the safety
stack should call this helper to stay aligned with the same
canonical ordering that :class:SafetyState's lambda_ is indexed
against. It returns a list of the UIDs sorted lexicographically;
downstream code iterates upper-triangular pairs (uids[i], uids[j])
for i < j, so the pair index for any (uid_a, uid_b) (with
uid_a < uid_b lexicographically) is stable across callers and
across dict reconstruction (external-review P1, 2026-05-16).
Post-issue-#144 the structural fix (lambda_: LambdaDict) makes
the pair-keying part of the type system: lookups go via the
lexicographic tuple key produced by :func:canonical_pair_key, and
callers no longer rely on the canonical ordering to index a flat
array. The helper is retained as a primitive because the
upper-triangular pair iteration in
:func:make_pair_keys and the cbf_qp call-sites still needs a
deterministic uid ordering to build the canonical set of pair
tuples.
Parameters:
- uids (Iterable[str]) – An iterable of agent UIDs.
Returns:
- list[str] – A new list containing the UIDs sorted lexicographically.
Raises:
- ValueError – If the iterable contains duplicates (a partner-set invariant the upstream filter already enforces, surfaced here as a defensive check).
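The stability claim can be checked with a small sketch that re-implements the documented behaviour (a sorted copy, with duplicates rejected). canonical_pair_order_sketch is illustrative, not the shipped source; two dicts built in different insertion orders yield the same order:

```python
from collections.abc import Iterable


def canonical_pair_order_sketch(uids: Iterable[str]) -> list[str]:
    """Lexicographically sorted copy of the UIDs; duplicates are rejected."""
    ordered = sorted(uids)
    # After sorting, any duplicate sits directly next to its twin.
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur:
            raise ValueError(f"duplicate uid {cur!r}")
    return ordered
```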
Source code in src/concerto/safety/api.py
lambda_from_jsonable
lambda_from_jsonable(
payload: object, *, uids: Iterable[str] | None = None
) -> LambdaDict
Deserialise a :data:LambdaDict from a JSON-decoded payload (ADR-014 §Decision; issue #144).
Accepts either:
- v3 (current) – a list of structured entries [{"a": uid_a, "b": uid_b, "value": λ}, ...] as produced by :func:lambda_to_jsonable. Each entry's pair is canonicalised via :func:canonical_pair_key, so a misordered (b, a) entry on the wire is rebuilt under the canonical (a, b) key.
- v2 (legacy) – a flat list / sequence of floats of length n*(n-1)/2 plus an explicit uids keyword. The values are assumed to be indexed in the canonical pair-iteration order (the implicit invariant that issue #144 closed by promoting to a dict). Reading this form emits a :class:DeprecationWarning, and the legacy reader is targeted for removal in v0.6.0.
Parameters:
- payload (object) – The JSON-decoded structure (a list of dicts for v3, or a list of floats for v2).
- uids (Iterable[str] | None, default: None) – Required for v2; ignored for v3. The UID set whose canonical pair order indexed the legacy ndarray.
Returns:
- LambdaDict – A fresh :data:LambdaDict.
Raises:
- TypeError – When payload is neither a v3 structured list nor a v2 flat float list, or v2 was passed without uids.
- ValueError – When a v3 entry lists a == b (canonical pair keys always have distinct UIDs), or the v2 length does not match the canonical pair count for uids.
Source code in src/concerto/safety/api.py
lambda_to_jsonable
lambda_to_jsonable(d: LambdaDict) -> list[dict[str, Any]]
Serialise a :data:LambdaDict for JSON round-trip (ADR-014 §Decision; issue #144).
JSON does not have tuple keys natively. The structured list form
[{"a": uid_a, "b": uid_b, "value": λ}, ...] is the canonical
wire encoding for v3 :data:LambdaDict payloads — pair-keys are
explicit named fields rather than stringified tuples, so a reader
cannot accidentally split or rejoin the pair on the wrong
delimiter.
Parameters:
- d (LambdaDict) – A :data:LambdaDict.
Returns:
- list[dict[str, Any]] – A list of plain-dict entries with "a", "b", and "value" fields, ordered by the canonical pair-key order from :func:make_pair_keys over the dict's UID set so the wire encoding is deterministic.
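A sketch of the v3 encoding, assuming the input dict is already keyed by canonical tuples; pairs absent from the dict (for example in an EGO_ONLY table) are skipped rather than filled. lambda_to_jsonable_sketch is hypothetical:

```python
import json
from itertools import combinations


def lambda_to_jsonable_sketch(d: dict[tuple[str, str], float]) -> list[dict[str, object]]:
    """Hypothetical encoder: structured entries in canonical pair order."""
    uids = sorted({uid for pair in d for uid in pair})
    # Upper-triangular order over the sorted UID set, mirroring make_pair_keys.
    return [
        {"a": a, "b": b, "value": float(d[(a, b)])}
        for a, b in combinations(uids, 2)
        if (a, b) in d
    ]


if __name__ == "__main__":
    print(json.dumps(lambda_to_jsonable_sketch({("b", "c"): 0.3, ("a", "b"): 0.1})))
```

Because the order comes from the sorted UID set rather than dict insertion order, two dicts with the same contents encode to identical JSON.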
Source code in src/concerto/safety/api.py
make_lambda_dict
make_lambda_dict(
uids: Iterable[str], *, fill: float = 0.0
) -> LambdaDict
Construct a :data:LambdaDict over the canonical pair set (ADR-004 §Decision; issue #144).
The convenience constructor for :class:SafetyState's lambda_
field. Each canonical pair-key produced by :func:make_pair_keys
maps to fill; callers seed every pair to the same scalar
(typically the warm-start lambda_safe from
:func:concerto.safety.conformal.reset_on_partner_swap) and the
conformal layer updates entries in place per step.
Parameters:
- uids (Iterable[str]) – Iterable of agent UIDs.
- fill (float, default: 0.0) – Initial per-pair value (default 0.0).
Returns:
- LambdaDict – A fresh :data:LambdaDict with one entry per canonical pair.
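A sketch of the constructor, built on itertools.combinations over the sorted UIDs (which yields the same upper-triangular order make_pair_keys describes). make_lambda_dict_sketch is hypothetical, and the fill value 0.2 in the check below is arbitrary, standing in for a lambda_safe warm start:

```python
from collections.abc import Iterable
from itertools import combinations


def make_lambda_dict_sketch(
    uids: Iterable[str], *, fill: float = 0.0
) -> dict[tuple[str, str], float]:
    """One entry per canonical pair, every entry seeded to the same scalar."""
    ordered = sorted(uids)
    if len(set(ordered)) != len(ordered):
        raise ValueError("uids must be distinct")
    # combinations() over the sorted list yields (uid_i, uid_j) with i < j.
    return {pair: float(fill) for pair in combinations(ordered, 2)}
```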
Source code in src/concerto/safety/api.py
make_pair_keys
make_pair_keys(
uids: Iterable[str],
) -> list[tuple[str, str]]
Build the canonical upper-triangular pair-key list (ADR-004 §Decision; issue #144).
Returns the list of (uid_i, uid_j) tuples with i < j over
the lexicographically-sorted UID order produced by
:func:canonical_pair_order. Used by :func:make_lambda_dict,
the cbf_qp call-sites, and any other consumer that needs to
enumerate the canonical pair set for a uid collection.
Parameters:
- uids (Iterable[str]) – Iterable of agent UIDs (must not contain duplicates).
Returns:
- list[tuple[str, str]] – A new list of pair-key tuples in canonical iteration order (outer loop over the lex-sorted uids, inner loop over the suffix). The list length is n*(n-1)/2 for n distinct uids.
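The iteration order spelled out above, written as the explicit nested loop it describes; make_pair_keys_sketch is a hypothetical re-implementation, not the shipped source:

```python
from collections.abc import Iterable


def make_pair_keys_sketch(uids: Iterable[str]) -> list[tuple[str, str]]:
    """Outer loop over lex-sorted uids, inner loop over the suffix after it."""
    ordered = sorted(uids)
    if len(set(ordered)) != len(ordered):
        raise ValueError("uids must not contain duplicates")
    keys: list[tuple[str, str]] = []
    for i, outer in enumerate(ordered):
        for inner in ordered[i + 1:]:
            keys.append((outer, inner))
    return keys
```

For four UIDs this produces 4 * 3 / 2 = 6 keys.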
Source code in src/concerto/safety/api.py
solve_qp_stub
solve_qp_stub(
*args: object, **kwargs: object
) -> tuple[float, float]
Real CBF-QP benchmark solve (T3.10; ADR-004 §"OSCBF target").
T3.10 replaces M2's no-op stub with an actual Clarabel solve on a
small representative QP. The function name is preserved for
backward compatibility with tests/property/test_qp_saturation.py
(the M2 acceptance gate) per plan/03 §8 + the M3 brief Hard Rule
4: the saturation guard's regime check (drop_rate >= 10 % or
latency >= 100 ms; ADR-006 §Risks R5) is independent of the QP
body, so after the replacement it still fires only at the URLLC
saturation profile, not at the lossy one.
The QP is pre-built at module load and the solver instance is
module-level so the per-call hot path is just the Clarabel solve
— no allocation or instantiation overhead. The *args /
**kwargs signature is preserved for the saturation_guard
contract (callers pass no args; the placeholder forwarding kept
the signature open for future per-call inputs).
Parameters:
- *args (object, default: ()) – Ignored (M2 backward-compat surface).
- **kwargs (object, default: {}) – Ignored.
Returns:
- tuple[float, float] – (decision_norm, elapsed_seconds), matching the M2 stub contract; the M2 saturation guard reads only the timing envelope, so the decision value is informational.
Source code in src/concerto/safety/__init__.py
concerto.training
CONCERTO training stack — ego-AHT HAPPO wrapper + seeding + logging.
Implements the frozen-partner ego-AHT training algorithm from ADR-002 using the HARL fork (ADR-002 §"HARL dependency"). The seeding module provides the determinism harness required by project principle P6.
CheckStatus
Bases: Enum
Outcome taxonomy for :class:SlopeReport (ADR-002 §Risks #1; review P1-2).
Replaces the pre-amendment binary passed: bool shape so callers
can distinguish "trip-wire cleared" from "data too short" from
"curve contained NaN / inf". The legacy passed boolean is
preserved for v0.5.x backwards-compat but is derived from
status; passed = status is CheckStatus.PASSED.
Values:
- PASSED: slope > 0 and p_value < alpha.
- FAILED: slope non-positive (flat or regressing) OR p_value >= alpha.
- INSUFFICIENT_DATA: the curve has fewer than min_episodes entries. The pre-amendment code returned passed=True here (vacuous-pass-on-short-curves bug, external-review P1-2); the new branch returns this status with passed=False so consumers do not silently consume a too-short curve as a passing trip-wire.
- INVALID: the curve contains NaN or +/- inf. The pre-amendment code would crash inside scipy.stats.linregress (on NaN) or silently propagate inf through the slope; the new branch detects them up front and returns this status with passed=False.
Source code in src/concerto/training/learning_signal_check.py
GuaranteeReport
dataclass
Result of one moving-window non-decreasing-fraction check (T4b.13; ADR-002 §Risks #1).
The dataclass is frozen so a returned report cannot be mutated after
the fact (the project's reproducibility contract: provenance bundles
are immutable). Serialise to JSON via :func:dataclasses.asdict
(used by :func:plot_reward_curve).
The "Guarantee" in the name is a legacy label held over from the
pre-rename empirical_guarantee module — the value is a
trip-wire, not a formal guarantee; see the module docstring +
:class:CheckStatus for the renamed canonical taxonomy. The
dataclass name is preserved for v0.5.x backwards-compat with
consumers that import it directly.
Attributes:
- window (int) – Moving-average window length the report was computed against (default :data:DEFAULT_WINDOW).
- threshold (float) – Pass threshold the report was checked against (default :data:DEFAULT_THRESHOLD).
- n_intervals (int) – Number of consecutive-window-pair intervals examined. Equals max(len(curve) - window, 0) for a non-degenerate run; zero when the curve is empty or shorter than window.
- n_nondecreasing (int) – Count of intervals whose later-window mean is >= the earlier-window mean.
- fraction (float) – n_nondecreasing / n_intervals when n_intervals > 0; 1.0 (vacuously) when n_intervals == 0, so callers can treat the degenerate case uniformly.
- passed (bool) – fraction >= threshold. Vacuously True when n_intervals == 0 (the curve was too short to assert anything; the assertion is documented as vacuous in that regime).
Source code in src/concerto/training/learning_signal_check.py
SlopeReport
dataclass
Result of one slope-test learning-signal check (T4b.13; ADR-002 §Risks #1).
Captures everything an external reader needs to interpret the
trip-wire outcome without re-running the experiment: the
least-squares slope itself, the coefficient of determination, the
one-sided p-value, the alpha threshold the run was checked against,
the number of episodes the regression saw, the :class:CheckStatus
taxon, and the deprecated passed boolean (kept for v0.5.x
backwards-compat).
The dataclass is frozen so a returned report cannot be mutated after
the fact (the project's reproducibility contract: provenance bundles
are immutable). Serialise to JSON via :func:dataclasses.asdict
(used by :func:plot_slope_curve).
Attributes:
- slope (float) – Least-squares slope of per-episode reward versus episode index, in reward-units-per-episode. Positive ⇒ learning; zero or negative ⇒ no learning or regressing.
- intercept (float) – Least-squares intercept. Useful for the regression overlay in :func:plot_slope_curve and as a sanity check on the early-rollout reward scale.
- r_squared (float) – Coefficient of determination R². Low R² plus status=PASSED is acceptable on noisy tasks (the slope is still statistically significant); the project does not gate on R² directly.
- p_value (float) – One-sided p-value for the null "slope <= 0", computed as two_sided_p / 2 when the observed slope is positive and 1 - two_sided_p / 2 when non-positive. The two-sided p-value comes from :func:scipy.stats.linregress.
- alpha (float) – Significance threshold the check was run against (default :data:DEFAULT_ALPHA).
- n_episodes (int) – Number of episodes in the regressed curve. Below :data:DEFAULT_MIN_EPISODES the report carries :attr:CheckStatus.INSUFFICIENT_DATA.
- status (CheckStatus) – Outcome taxon (:class:CheckStatus). Canonical signal; new code should branch on this, not on passed.
- passed (bool) – Deprecated; derived from status (passed = (status is CheckStatus.PASSED)). Kept for v0.5.x consumers; removal target v0.6.0. Construction emits a :class:DeprecationWarning in __post_init__.
Source code in src/concerto/training/learning_signal_check.py
__post_init__
__post_init__() -> None
Derive passed from status + emit the field-deprecation warning.
Two side effects, in order:
1. passed is recomputed from status so the two cannot drift, even when a caller supplies an inconsistent value (e.g. while round-tripping a legacy v1 JSON payload that has passed but no status); the (rare) override uses object.__setattr__ because the dataclass is frozen.
2. A :class:DeprecationWarning is emitted unconditionally, on every construction, signalling that the passed field itself is deprecated. The warning fires whether or not the caller supplied passed explicitly; the goal is to flag the field to consumers, not just inconsistent inputs. New code should branch on report.status is CheckStatus.PASSED and ignore passed entirely; passed is removed in v0.6.0 along with this __post_init__ warning.
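The frozen-dataclass override and the unconditional warning can be sketched as follows. ReportSketch and Status are hypothetical stand-ins carrying only the fields this pattern needs, not the real SlopeReport and CheckStatus:

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Local stand-in for CheckStatus."""

    PASSED = "passed"
    FAILED = "failed"
    INSUFFICIENT_DATA = "insufficient_data"
    INVALID = "invalid"


@dataclass(frozen=True)
class ReportSketch:
    """Hypothetical report with just a status and its deprecated mirror."""

    status: Status
    passed: bool = False  # deprecated mirror of status

    def __post_init__(self) -> None:
        # 1. Re-derive passed so it can never disagree with status.
        derived = self.status is Status.PASSED
        if self.passed != derived:
            # The dataclass is frozen, so bypass its generated __setattr__.
            object.__setattr__(self, "passed", derived)
        # 2. Warn on every construction, consistent input or not.
        warnings.warn(
            "ReportSketch.passed is deprecated; branch on status instead",
            DeprecationWarning,
            stacklevel=3,
        )
```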
Source code in src/concerto/training/learning_signal_check.py
assert_non_decreasing_window
assert_non_decreasing_window(
curve: RewardCurve,
*,
window: int = DEFAULT_WINDOW,
threshold: float = DEFAULT_THRESHOLD,
) -> GuaranteeReport
Check the moving-window-of-K mean for non-decreasing intervals (ADR-002 §Risks #1).
Slides a window of length window over curve.per_episode_ego_rewards
and counts the fraction of consecutive interval pairs whose later-window
mean is >= the earlier-window mean. A run is considered to have
cleared the trip-wire when that fraction is >= threshold.
Plan/05 §6 criterion 4 names the canonical setting:
window=10, threshold=0.8. Callers MUST NOT silently widen the
window or lower the threshold to coerce a passing report — if the
assertion fires under the production config, hand control back to
the maintainer and open a #scope-revision issue (ADR-002 §Risks
#1 mitigation policy).
Degenerate inputs (empty curve, len(curve) < window) return a
report with n_intervals=0 and passed=True. The pass is
vacuous — there is not enough signal to assert anything — and is
only useful as a sentinel for unit tests of the helper itself; do
not rely on it to mask a too-short production run. The canonical
slope test :func:check_positive_learning_slope resolves this
vacuous-pass class of failure (external-review P1-2) by returning
CheckStatus.INSUFFICIENT_DATA with passed=False on
too-short curves.
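A sketch of the statistic on a plain list of per-episode rewards, using the canonical window=10, threshold=0.8 setting as defaults. It follows the documented n_intervals and vacuous-pass rules; non_decreasing_window_fraction is hypothetical and returns a tuple rather than a GuaranteeReport:

```python
from collections.abc import Sequence


def non_decreasing_window_fraction(
    rewards: Sequence[float], *, window: int = 10, threshold: float = 0.8
) -> tuple[int, int, float, bool]:
    """Return (n_intervals, n_nondecreasing, fraction, passed)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    # One moving-window mean per start index: len(rewards) - window + 1 of them.
    means = [
        sum(rewards[i : i + window]) / window
        for i in range(len(rewards) - window + 1)
    ]
    # Consecutive mean pairs; equals max(len(rewards) - window, 0).
    n_intervals = max(len(means) - 1, 0)
    n_nondecreasing = sum(1 for earlier, later in zip(means, means[1:]) if later >= earlier)
    # Degenerate curves pass vacuously, as documented.
    fraction = n_nondecreasing / n_intervals if n_intervals > 0 else 1.0
    return n_intervals, n_nondecreasing, fraction, fraction >= threshold
```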
Parameters:
- curve (RewardCurve) – The reward curve produced by :func:concerto.training.ego_aht.train. Only the per_episode_ego_rewards field is read; the per-step series is deliberately ignored because the assertion is episode-aggregated (Huriot-Sibai 2025 Theorem 7 is stated over the policy-improvement step, which lives at episode granularity).
- window (int, default: DEFAULT_WINDOW) – Window size (default :data:DEFAULT_WINDOW).
- threshold (float, default: DEFAULT_THRESHOLD) – Pass threshold on the non-decreasing fraction (default :data:DEFAULT_THRESHOLD).
Returns:
- GuaranteeReport – A frozen :class:GuaranteeReport. Does NOT raise on failure; the caller decides whether to fail the test, exit the CLI non-zero, or just log the report.
Source code in src/concerto/training/learning_signal_check.py
check_positive_learning_slope
check_positive_learning_slope(
curve: RewardCurve,
*,
alpha: float = DEFAULT_ALPHA,
min_episodes: int = DEFAULT_MIN_EPISODES,
) -> SlopeReport
Least-squares slope check on per-episode reward (ADR-002 §Risks #1; canonical).
Renames the pre-amendment assert_positive_learning_slope (the
"assert" verb overclaimed — the function returns a report rather
than raising, and the slope check is a trip-wire for ADR-002
§Risks #1, not a formal monotonic-improvement guarantee). The old
name remains available via a deprecation shim at
concerto.training.empirical_guarantee.assert_positive_learning_slope;
removal target v0.6.0.
Issue #62 root-cause writeup: the legacy
:func:assert_non_decreasing_window statistic has poor signal-to-noise
on per-episode rewards at the Phase-0 100k-frame budget: the
moving-window-of-10 mean stdev (~6.3) is comparable to the total
drift (~10-20 reward units) the trainer can produce, and the
non-decreasing-fraction sits near 0.5 (a random walk) even when the
policy is materially improving. The slope check integrates over the
whole curve and rejects the null "slope <= 0" with p-values
astronomically below any reasonable alpha when learning is real.
The check:
1. Validate the curve. NaN / +/- inf ⇒ :attr:CheckStatus.INVALID immediately.
2. If n < min_episodes ⇒ :attr:CheckStatus.INSUFFICIENT_DATA with passed=False. (Pre-amendment code returned passed=True here; external-review P1-2 flagged this as silently masking too-short curves.)
3. Fit rewards ~ slope * episode_index + intercept via least squares (:func:scipy.stats.linregress).
4. Take the one-sided p-value: two_sided / 2 when the observed slope is positive, otherwise 1 - two_sided / 2. (scipy reports two-sided by default; the one-sided form is the natural gate for "positive learning".)
5. Pass when slope > 0 AND p_value < alpha ⇒ :attr:CheckStatus.PASSED. Else :attr:CheckStatus.FAILED.
Plan/05 §6 criterion 4 uses alpha=0.05 as the canonical
threshold; the project's production config gives p_value < 1e-14
across seeds, so the gate has very wide margin.
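The five steps above map onto a short function. This is a sketch, not the shipped implementation: Status stands in for CheckStatus, the function returns a tuple instead of a SlopeReport, and min_episodes=10 is an assumed placeholder because the value of DEFAULT_MIN_EPISODES is not shown here (alpha=0.05 follows Plan/05). positive_slope_status is hypothetical:

```python
import math
from collections.abc import Sequence
from enum import Enum

from scipy import stats


class Status(Enum):
    """Local stand-in for CheckStatus."""

    PASSED = "passed"
    FAILED = "failed"
    INSUFFICIENT_DATA = "insufficient_data"
    INVALID = "invalid"


def positive_slope_status(
    rewards: Sequence[float], *, alpha: float = 0.05, min_episodes: int = 10
) -> tuple[Status, float, float]:
    """Return (status, slope, one_sided_p) following the five documented steps."""
    # 1. Reject NaN / +/- inf up front.
    if not all(math.isfinite(r) for r in rewards):
        return Status.INVALID, math.nan, math.nan
    # 2. A too-short curve is reported as such, never as a vacuous pass.
    if len(rewards) < min_episodes:
        return Status.INSUFFICIENT_DATA, math.nan, math.nan
    # 3. Least-squares fit of reward against episode index.
    fit = stats.linregress(list(range(len(rewards))), list(rewards))
    # 4. One-sided p-value for the null "slope <= 0" from scipy's two-sided p.
    p_one = fit.pvalue / 2 if fit.slope > 0 else 1 - fit.pvalue / 2
    # 5. Gate on both the sign and the significance.
    status = Status.PASSED if fit.slope > 0 and p_one < alpha else Status.FAILED
    return status, float(fit.slope), float(p_one)
```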
Parameters:
- curve (RewardCurve) – The reward curve produced by :func:concerto.training.ego_aht.train. Only per_episode_ego_rewards is read.
- alpha (float, default: DEFAULT_ALPHA) – One-sided significance threshold. Default :data:DEFAULT_ALPHA.
- min_episodes (int, default: DEFAULT_MIN_EPISODES) – Minimum episode count for a non-vacuous report. Below this the report carries :attr:CheckStatus.INSUFFICIENT_DATA. Default :data:DEFAULT_MIN_EPISODES.
Returns:
- SlopeReport – A frozen :class:SlopeReport. Does NOT raise on failure; the caller decides whether to fail the test, exit the CLI non-zero, or just log the report.
Source code in src/concerto/training/learning_signal_check.py
plot_reward_curve
plot_reward_curve(
curve: RewardCurve,
*,
report: GuaranteeReport,
out_dir: Path,
) -> tuple[Path, Path]
Emit <run_id>.png + <run_id>.json for the assertion plot (T4b.13; ADR-002 §Risks #1).
Writes two files under out_dir:
- <run_id>.png: matplotlib line plot of the per-episode reward with the moving-window-of-report.window mean overlaid. The report's fraction / passed are stamped into the title so docs/explanation/why-aht.md can embed the figure unchanged.
- <run_id>.json: machine-replayable sidecar carrying the raw per-episode curve plus the report. A future reader regenerates the figure by replaying the JSON through this function.
matplotlib is imported lazily inside the function so the
surrounding module stays importable in environments without the
viz extra (Phase-0 CI, headless containers). Callers that need
the plot must install the viz extra (uv sync --extra viz).
Parameters:
- curve (RewardCurve) – The reward curve produced by :func:concerto.training.ego_aht.train. curve.run_id is used as the filename stem.
- report (GuaranteeReport) – The :class:GuaranteeReport produced by :func:assert_non_decreasing_window. Stamped into the figure title and copied into the JSON sidecar.
- out_dir (Path) – Output directory. Created if missing.
Returns:
- tuple[Path, Path] – (png_path, json_path): the absolute paths of the two emitted files.
Source code in src/concerto/training/learning_signal_check.py
plot_slope_curve
plot_slope_curve(
curve: RewardCurve,
*,
report: SlopeReport,
out_dir: Path,
) -> tuple[Path, Path]
Emit <run_id>.png + <run_id>.json for the slope plot (T4b.13; ADR-002 §Risks #1).
Mirrors :func:plot_reward_curve's contract but overlays the
least-squares regression line instead of the moving-window mean.
The status line in the title carries the slope, p-value, status,
and pass/fail flag so docs/explanation/why-aht.md can embed
the figure unchanged.
Parameters:
- curve (RewardCurve) – The reward curve produced by :func:concerto.training.ego_aht.train. curve.run_id is the filename stem.
- report (SlopeReport) – The :class:SlopeReport produced by :func:check_positive_learning_slope. Stamped into the figure title and copied into the JSON sidecar.
- out_dir (Path) – Output directory. Created if missing.
Returns:
- tuple[Path, Path] – (png_path, json_path): the absolute paths of the two emitted files.
Source code in src/concerto/training/learning_signal_check.py
Deterministic seeding harness for CONCERTO (P6 reproducibility).
Every randomised piece of CONCERTO and CHAMBER (env wrappers, comm channel,
partner zoo, training loop) draws from a numpy Generator seeded via
:func:derive_substream. The function maps a (name, root_seed) pair to a
fresh Substream whose RNG is independent of every other named substream
under the same root seed; reusing the same pair always returns byte-identical
draws.
Cited as ADR-002 §"deterministic seeding harness" (training stack)
and ADR-003 §Decision (the comm channel uses derive_substream("comm.fixed", ...)
to seed its substream).
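A sketch of the (name, root_seed) to substream mapping on top of numpy.random.SeedSequence. Hashing the name with SHA-256 into a 64-bit integer and feeding (root_seed, name_int) as SeedSequence entropy is an assumption about how the real derive_substream works; the *_sketch names are hypothetical. Storing plain integer tuples keeps the frozen dataclass hashable with structural equality:

```python
import hashlib
from dataclasses import dataclass

import numpy as np


def _name_to_int(name: str) -> int:
    # Stable across processes, unlike the built-in hash(), which is salted.
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")


@dataclass(frozen=True)
class SubstreamSketch:
    entropy: tuple[int, ...]
    spawn_key: tuple[int, ...] = ()

    def seed_sequence(self) -> np.random.SeedSequence:
        return np.random.SeedSequence(list(self.entropy), spawn_key=self.spawn_key)

    def default_rng(self) -> np.random.Generator:
        # A fresh Generator on every call: the substream is a seed factory.
        return np.random.default_rng(self.seed_sequence())

    def spawn(self, n: int) -> list["SubstreamSketch"]:
        # Extend the spawn key like SeedSequence.spawn does, but always number
        # children from 0 so repeated calls return the same children.
        return [SubstreamSketch(self.entropy, self.spawn_key + (i,)) for i in range(n)]


def derive_substream_sketch(name: str, *, root_seed: int = 0) -> SubstreamSketch:
    if root_seed < 0:
        raise ValueError("root_seed must be non-negative")
    return SubstreamSketch(entropy=(root_seed, _name_to_int(name)))
```

The name "partner.zoo" in the check below is illustrative; "comm.fixed" is the only substream name the docstrings use.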
Substream
dataclass
Wraps a numpy SeedSequence with convenience constructors (P6).
ADR-002 §"deterministic seeding harness". The frozen dataclass makes the substream identity hashable; equality is structural over the underlying seed-sequence's entropy + spawn key, which is enough for caching.
Source code in src/concerto/training/seeding.py
default_rng
default_rng() -> np.random.Generator
Construct a fresh numpy Generator from this substream (P6).
ADR-002 §"deterministic seeding harness". Two calls return independent generators that produce identical sequences when freshly built — i.e., the substream is a seed factory, not an iterator.
Source code in src/concerto/training/seeding.py
spawn
spawn(n: int) -> list[Substream]
Spawn n independent child substreams (P6).
ADR-002 §"deterministic seeding harness". Use when a producer needs per-worker / per-episode RNG without colliding with sibling streams.
Source code in src/concerto/training/seeding.py
derive_substream
derive_substream(
name: str, *, root_seed: int = 0
) -> Substream
Derive a deterministic named substream from a root seed (P6).
ADR-002 §"deterministic seeding harness"; ADR-003 §Decision (the comm
channel uses derive_substream("comm.fixed", root_seed=...)).
Parameters:
- name (str) – Stable label for the substream. Distinct names yield independent RNG streams under the same root_seed; reusing the pair (name, root_seed) always returns byte-identical draws.
- root_seed (int, default: 0) – Project-wide root seed. Defaults to 0 so call sites can ship working defaults; production runs override it per experiment.
Returns:
- Substream – A :class:Substream whose :meth:Substream.default_rng is the canonical entry point for downstream randomness.
Source code in src/concerto/training/seeding.py
Structured logging for ego-AHT training runs (ADR-002 §Decisions; plan/05 §2).
Every line emitted during a training run carries the RunContext fields
(run_id, seed, git_sha, pyproject_hash) plus a per-event
step so any line in any artefact can be traced back to the exact code
+ config + RNG state that produced it. Phase-0 ships two sinks:
- A local JSONL file (logs/<run_id>.jsonl): the offline-replayable fallback that does not depend on W&B being reachable. Every JSONL line is a self-contained, valid JSON object.
- An opt-in :class:wandb.sdk.wandb_run.Run sink: the caller passes a pre-constructed run object as the wandb_sink kwarg of :func:bind_run_logger. Phase-0 uses W&B's offline mode in tests so the sink can be exercised without a network round-trip. The four :class:RunContext provenance fields are written to W&B's config once at bind time (via :meth:WandbSink.set_config) rather than to every metric event, so they appear as run-level metadata rather than scalar metrics.
Determinism: :func:compute_run_metadata derives run_id from
(seed, git_sha, pyproject_hash, run_kind) so two reruns with identical
code + config + seed produce the same run_id (P6 reproducibility). The
pyproject_hash is the SHA-256 of the project's pyproject.toml so a
silent dependency drift produces a different run_id.
RunContext
dataclass
Per-run provenance bundle bound to every log line (ADR-002 §Decisions; plan/05 §2; P6).
Attributes:
- run_id (str) – Stable 16-hex-char hash derived from the other four fields via :func:compute_run_metadata. Used as the JSONL filename and as the W&B run name, so artefacts collated by run_id from different sinks describe the same run.
- seed (int) – Project-wide root seed; ego-AHT runs derive every RNG substream from this via :func:concerto.training.seeding.derive_substream.
- git_sha (str) – Full SHA of the currently checked-out commit, or :data:GIT_SHA_UNKNOWN when the working tree is not a git repository or the git binary is missing.
- pyproject_hash (str) – SHA-256 of pyproject.toml (full hex digest). A silent dependency drift produces a different hash and therefore a different run_id.
- run_kind (str) – Free-form label describing what this run is for (e.g. "empirical_guarantee", "zoo_seed"). Folded into the run_id hash so two distinct kinds at the same seed do not collide.
- extra (dict[str, str]) – Optional free-form string-to-string metadata attached to every log line (e.g. task name, partner class). Read-only by convention; defaults to an empty dict.
Source code in src/concerto/training/logging.py
WandbSink
Bases: Protocol
Minimal subset of :class:wandb.sdk.wandb_run.Run we depend on (ADR-002 §Decisions).
Plan/05 §2: W&B is opt-in. The Protocol lets tests inject an in-memory
fake without requiring the real wandb package to be importable.
Source code in src/concerto/training/logging.py
log
log(
data: Mapping[str, object], *, step: int | None = None
) -> None
Emit one structured event (W&B-side surface; ADR-002 §Decisions).
Source code in src/concerto/training/logging.py
set_config
set_config(config: Mapping[str, object]) -> None
Store run-level metadata once at bind time (ADR-002 §Decisions; plan/05 §2).
Called by :func:bind_run_logger to pin the four
:class:RunContext provenance fields (run_id, seed,
git_sha, pyproject_hash) on the W&B run as metadata rather
than as per-step metrics.
Source code in src/concerto/training/logging.py
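The in-memory fake that the Protocol makes possible can look like the sketch below. `WandbSinkLike` is a local mirror of the two documented methods, marked `runtime_checkable` only so the demo can use `isinstance`; whether the real `WandbSink` is runtime-checkable is not stated here. `InMemoryWandbSink` is hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Protocol, runtime_checkable


@runtime_checkable
class WandbSinkLike(Protocol):
    """Local mirror of the two WandbSink methods."""

    def log(self, data: Mapping[str, object], *, step: int | None = None) -> None: ...

    def set_config(self, config: Mapping[str, object]) -> None: ...


class InMemoryWandbSink:
    """Hypothetical test fake: records calls instead of talking to W&B."""

    def __init__(self) -> None:
        self.config: dict[str, object] = {}
        self.events: list[tuple[int | None, dict[str, object]]] = []

    def log(self, data: Mapping[str, object], *, step: int | None = None) -> None:
        # Per-event metrics, copied so later caller mutation cannot leak in.
        self.events.append((step, dict(data)))

    def set_config(self, config: Mapping[str, object]) -> None:
        # Run-level metadata: written once at bind time, not per event.
        self.config.update(config)


sink = InMemoryWandbSink()
sink.set_config({"run_id": "0123456789abcdef", "seed": 0})
sink.log({"ego_reward": 1.5}, step=10)
```

The fake never inherits from the Protocol; it matches structurally, which is what lets tests run without the `wandb` package installed.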
bind_run_logger
bind_run_logger(
ctx: RunContext,
*,
jsonl_path: Path,
wandb_sink: WandbSink | None = None,
) -> structlog.BoundLogger
Build a structlog logger that emits JSONL + optionally W&B (ADR-002 §Decisions; plan/05 §2).
The returned logger has every :class:RunContext field bound to it so
every emitted line carries them automatically. Callers add per-event
fields (e.g. step, ego_reward) via logger.info(...).
The function does NOT close the JSONL file; the caller owns its
lifecycle (typically a with open(...) as fh: block held open for
the duration of the run).
Parameters:
- ctx (RunContext) – Run-level provenance bundle.
- jsonl_path (Path) – Filesystem path where the JSONL fallback is written. The parent directory is created if missing.
- wandb_sink (WandbSink | None, default: None) – Optional :class:WandbSink (production: a wandb.run; tests: an in-memory fake). When None, only JSONL is emitted.
Returns:
- BoundLogger – A :class:structlog.BoundLogger wrapping the JSONL (and optionally W&B) sinks. The logger is not module-global; callers pass it explicitly through their training loop so unit tests can substitute in-memory sinks freely.
Source code in src/concerto/training/logging.py
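The bound-context idea can be shown without structlog. The sketch below swaps structlog for a small stdlib class (`SketchBoundLogger` is hypothetical) that merges the bound provenance fields into every event and writes one self-contained JSON object per line, leaving the file handle's lifecycle to the caller.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, TextIO


class SketchBoundLogger:
    """Hypothetical stdlib stand-in for the logger bind_run_logger returns."""

    def __init__(self, fh: TextIO, context: dict[str, Any]) -> None:
        self._fh = fh  # the caller owns (and closes) the file
        self._context = dict(context)

    def info(self, event: str, **fields: Any) -> None:
        # Bound provenance first, then the per-event fields.
        record = {**self._context, "event": event, **fields}
        self._fh.write(json.dumps(record, sort_keys=True) + "\n")
        self._fh.flush()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "logs" / "0123456789abcdef.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        log = SketchBoundLogger(
            fh,
            {
                "run_id": "0123456789abcdef",
                "seed": 0,
                "git_sha": "unknown",
                "pyproject_hash": "ab" * 32,
            },
        )
        log.info("rollout", step=1, ego_reward=0.25)
    lines = path.read_text(encoding="utf-8").splitlines()
```

Because every line carries the full context, any single line can be traced back to its run without reading the rest of the file.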
compute_run_metadata
compute_run_metadata(
*,
seed: int,
run_kind: str,
repo_root: Path,
extra: Mapping[str, str] | None = None,
) -> RunContext
Build :class:RunContext with auto-detected git/pyproject hashes (ADR-002; plan/05 §2).
The run_id is the first 16 hex chars of the SHA-256 of
(seed, git_sha, pyproject_hash, run_kind), matching the
16-hex-char convention used for PartnerSpec.partner_id (plan/04
§3.1) so artefact filenames are uniform across the project.
Parameters:
- seed (int) – Root seed; passed to :func:concerto.training.seeding.derive_substream upstream.
- run_kind (str) – Free-form label (e.g. "empirical_guarantee").
- repo_root (Path) – Working-tree directory to query for git_sha and pyproject.toml.
- extra (Mapping[str, str] | None, default: None) – Optional string-to-string metadata attached to every line.
Returns:
- RunContext – A frozen :class:RunContext ready to bind via :func:bind_run_logger.
Source code in src/concerto/training/logging.py
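The hashing described above can be sketched as follows. The pyproject digest is the full SHA-256 hex of the file bytes, and the run_id is the first 16 hex chars of a SHA-256 over the four fields. How the four fields are serialized before hashing is not documented here, so the `"|"`-join below is an assumption, and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def pyproject_sha256(repo_root: Path) -> str:
    # Full hex SHA-256 of pyproject.toml's raw bytes.
    return hashlib.sha256((repo_root / "pyproject.toml").read_bytes()).hexdigest()


def sketch_run_id(*, seed: int, git_sha: str, pyproject_hash: str, run_kind: str) -> str:
    # Assumed serialization: the real field encoding may differ.
    payload = "|".join([str(seed), git_sha, pyproject_hash, run_kind])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text('[project]\nname = "demo"\n', encoding="utf-8")
    ph = pyproject_sha256(root)

run_a = sketch_run_id(seed=0, git_sha="unknown", pyproject_hash=ph, run_kind="empirical_guarantee")
run_a_again = sketch_run_id(seed=0, git_sha="unknown", pyproject_hash=ph, run_kind="empirical_guarantee")
run_b = sketch_run_id(seed=0, git_sha="unknown", pyproject_hash=ph, run_kind="zoo_seed")
```

Identical code, config, and seed reproduce the same run_id, while a different run_kind at the same seed does not collide.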
Checkpoint save/load with SHA-256 integrity (T4b.12; ADR-002 §Decisions).
The training stack writes torch state-dicts as .pt files referenced
from M4a's :class:~chamber.partners.api.PartnerSpec weights_uri field
(plan/04 §3.8). This module owns the wire format:
- A local://artifacts/<name>.pt URI resolves to <artifacts_root>/artifacts/<name>.pt (the leading artifacts/ segment is kept; see :func:resolve_uri).
- Each .pt payload is paired with a <name>.pt.json metadata sidecar carrying :class:CheckpointMetadata (run_id, seed, step, git_sha, pyproject_hash, sha256). The sidecar is the authoritative provenance record; the .pt file is the opaque tensor payload.
- :func:load_checkpoint re-hashes the loaded .pt payload and refuses to return tensors when the digest disagrees with the sidecar (ADR-002 §Decisions; P6 reproducibility).
Deploy order: payload first, sidecar second. A half-finished deploy (payload landed but sidecar not yet) hits the "missing sidecar" failure path and refuses to load tensors. The reverse order (sidecar first, stale payload) would silently load tensors with the wrong provenance — do not invert.
ADR-009 §Consequences "draft-zoo scoping": the M4a Phase-0 draft-zoo
specs reference local://artifacts/mappo_seed42_step100k.pt and
local://artifacts/happo_seed7_step50k.pt but the concrete frozen-RL
adapters (T4.5 / T4.6) load them only in M4 Phase 3. Until then the
SHA-verified loader fails loudly when the sidecar is missing, by design.
CheckpointError
Bases: RuntimeError
Loud-fail signal for checkpoint I/O failures (ADR-002 §Decisions).
Raised on:
- URI scheme mismatch (only local:// is supported in Phase 0).
- Missing .pt payload or missing sidecar JSON.
- SHA-256 mismatch between the stored sidecar digest and the
re-computed payload digest on load.
Subclasses :class:RuntimeError so a careless caller catching
Exception still sees the failure but can disambiguate via
isinstance(exc, CheckpointError).
Source code in src/concerto/training/checkpoints.py
CheckpointMetadata
dataclass
Provenance + integrity sidecar for one checkpoint (T4b.12; plan/05 §5).
Mirrors the :class:~concerto.training.logging.RunContext provenance
bundle (run_id / seed / git_sha / pyproject_hash) plus the per-
checkpoint step and the integrity-protecting sha256 of the
.pt payload. All fields are required so a sidecar with missing
keys raises :class:CheckpointError on load (loud-fail per ADR-002).
Attributes:
- run_id (str) – 16-hex hash of the run that produced this checkpoint. Cross-references the run's JSONL log file via :class:concerto.training.logging.RunContext.
- seed (int) – Root seed of the producing run; the same value passed to :func:concerto.training.seeding.derive_substream.
- step (int) – Training step at which this checkpoint was taken (e.g. 50_000 for the canonical "50%-reward" tier in ADR-009 §Decision).
- git_sha (str) – Full SHA of the commit the producing run was launched from. "unknown" is permitted (the sentinel :data:concerto.training.logging.GIT_SHA_UNKNOWN).
- pyproject_hash (str) – SHA-256 of pyproject.toml at run launch. Detects silent dependency drift between save and load.
- sha256 (str) – SHA-256 of the raw .pt payload bytes, computed at save time and verified on every load. A mismatch raises :class:CheckpointError.
Source code in src/concerto/training/checkpoints.py
from_dict
classmethod
from_dict(raw: Mapping[str, object]) -> CheckpointMetadata
Build a :class:CheckpointMetadata from a JSON-decoded mapping (ADR-002 §Decisions).
Refuses partial sidecars: every field declared on the dataclass
must be present, and types are coerced (int/str) without
loss. Missing fields raise :class:CheckpointError so a half-
written sidecar is caught at load time, not at first inference.
Parameters:
- raw (Mapping[str, object]) – A json.load(open(sidecar))-style dict.
Returns:
- CheckpointMetadata – A frozen :class:CheckpointMetadata instance.
Raises:
- CheckpointError – If any required field is missing.
Source code in src/concerto/training/checkpoints.py
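The refuse-partial-sidecars rule can be sketched with a frozen dataclass. `SketchCheckpointMetadata` and the local `CheckpointError` are hypothetical stand-ins; the field names come from the attribute list above, and the coercion is a plain `int()` / `str()` call.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, fields


class CheckpointError(RuntimeError):
    """Local stand-in for concerto.training.checkpoints.CheckpointError."""


@dataclass(frozen=True)
class SketchCheckpointMetadata:
    run_id: str
    seed: int
    step: int
    git_sha: str
    pyproject_hash: str
    sha256: str

    @classmethod
    def from_dict(cls, raw: Mapping[str, object]) -> SketchCheckpointMetadata:
        # Refuse partial sidecars: every declared field must be present.
        missing = [f.name for f in fields(cls) if f.name not in raw]
        if missing:
            raise CheckpointError(f"sidecar missing required fields: {missing}")
        # Coerce to the declared int/str types.
        return cls(
            run_id=str(raw["run_id"]),
            seed=int(raw["seed"]),
            step=int(raw["step"]),
            git_sha=str(raw["git_sha"]),
            pyproject_hash=str(raw["pyproject_hash"]),
            sha256=str(raw["sha256"]),
        )


good = {
    "run_id": "0123456789abcdef",
    "seed": 7,
    "step": 50_000,
    "git_sha": "unknown",
    "pyproject_hash": "ab" * 32,
    "sha256": "cd" * 32,
}
meta = SketchCheckpointMetadata.from_dict(good)

partial_error = None
try:
    SketchCheckpointMetadata.from_dict({"run_id": "0123456789abcdef", "seed": 7})
except CheckpointError as exc:
    partial_error = str(exc)
```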
load_checkpoint
load_checkpoint(
*, uri: str, artifacts_root: Path
) -> tuple[dict[str, torch.Tensor], CheckpointMetadata]
Load a torch state-dict + verify SHA-256 (T4b.12; ADR-002 §Decisions; P6).
Refuses to return tensors when:
- The .pt payload is missing.
- The sidecar JSON is missing.
- The sidecar is missing required fields (per
:meth:CheckpointMetadata.from_dict).
- The sidecar's stored sha256 disagrees with the recomputed
digest of the on-disk payload.
All four failure modes raise :class:CheckpointError so a partial
deployment (e.g. payload deployed but sidecar not yet) cannot
silently load tensors that the project has no provenance for.
Parameters:
- uri (str) – Source URI (e.g. "local://artifacts/happo_seed7_step50k.pt").
- artifacts_root (Path) – Filesystem directory the URI resolves against.
Returns:
- tuple[dict[str, Tensor], CheckpointMetadata] – A tuple (state_dict, metadata), where state_dict is the verbatim mapping loaded from the .pt payload and metadata is the verified :class:CheckpointMetadata.
Raises:
- CheckpointError – On any of the four loud-fail conditions above.
Source code in src/concerto/training/checkpoints.py
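The four loud-fail checks can be sketched with the standard library alone. In this hypothetical version raw bytes stand in for the torch state-dict (no `torch.load`), the sidecar is assumed to sit next to the payload as `<name>.pt.json`, and `load_verified_payload` is an illustrative name, not the concerto function.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


class CheckpointError(RuntimeError):
    """Local stand-in for concerto.training.checkpoints.CheckpointError."""


REQUIRED = ("run_id", "seed", "step", "git_sha", "pyproject_hash", "sha256")


def load_verified_payload(payload_path: Path) -> tuple[bytes, dict[str, object]]:
    """Hypothetical loader: nothing is returned unless every check passes."""
    sidecar_path = payload_path.with_name(payload_path.name + ".json")
    if not payload_path.is_file():
        raise CheckpointError(f"missing payload: {payload_path}")
    if not sidecar_path.is_file():
        raise CheckpointError(f"missing sidecar: {sidecar_path}")
    meta = json.loads(sidecar_path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED if key not in meta]
    if missing:
        raise CheckpointError(f"sidecar missing required fields: {missing}")
    payload = payload_path.read_bytes()
    actual = hashlib.sha256(payload).hexdigest()
    if actual != meta["sha256"]:
        raise CheckpointError(f"sha256 mismatch: sidecar {meta['sha256'][:12]}, payload {actual[:12]}")
    return payload, meta


with tempfile.TemporaryDirectory() as tmp:
    pt = Path(tmp) / "artifacts" / "demo.pt"
    pt.parent.mkdir(parents=True)
    pt.write_bytes(b"weights-v1")
    sidecar = {
        "run_id": "0123456789abcdef",
        "seed": 7,
        "step": 50_000,
        "git_sha": "unknown",
        "pyproject_hash": "ab" * 32,
        "sha256": hashlib.sha256(b"weights-v1").hexdigest(),
    }
    pt.with_name(pt.name + ".json").write_text(json.dumps(sidecar), encoding="utf-8")
    payload, meta = load_verified_payload(pt)

    pt.write_bytes(b"weights-v2")  # payload swapped after the sidecar was written
    tamper_error = None
    try:
        load_verified_payload(pt)
    except CheckpointError as exc:
        tamper_error = str(exc)
```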
resolve_uri
resolve_uri(uri: str, *, artifacts_root: Path) -> Path
Resolve local://artifacts/<name>.pt URI to filesystem path (ADR-002; plan/04 §3.8).
Phase-0 supports the local:// scheme only. Non-local:// URIs
raise :class:CheckpointError so Phase-1 distribution schemes
(hf://, s3://) cannot silently fall through to the local
filesystem before an ADR amendment is in place.
The path immediately under local:// is appended to artifacts_root;
e.g. local://artifacts/foo.pt against artifacts_root=/data resolves
to /data/artifacts/foo.pt. The leading artifacts/ segment in the
URI is preserved by design — it makes the URI self-describing in logs
and keeps the on-disk layout grep-able.
Parameters:
- uri (str) – The weights_uri from a :class:~chamber.partners.api.PartnerSpec (plan/04 §3.8).
- artifacts_root (Path) – Filesystem directory the URI is resolved against.
Returns:
- Path – Absolute filesystem path to the .pt payload (the file may not exist yet; :func:save_checkpoint creates it).
Raises:
- CheckpointError – If uri does not start with :data:LOCAL_URI_SCHEME.
Source code in src/concerto/training/checkpoints.py
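The resolution rule reads roughly like the sketch below. The value of `LOCAL_URI_SCHEME` is not shown in this reference, so `"local://"` is inferred from the examples; `sketch_resolve_uri` is hypothetical.

```python
from pathlib import Path

# Assumption: inferred from the local://artifacts/... examples above.
LOCAL_URI_SCHEME = "local://"


class CheckpointError(RuntimeError):
    """Local stand-in for concerto.training.checkpoints.CheckpointError."""


def sketch_resolve_uri(uri: str, *, artifacts_root: Path) -> Path:
    if not uri.startswith(LOCAL_URI_SCHEME):
        # hf://, s3://, ... must not fall through to the local filesystem.
        raise CheckpointError(f"unsupported checkpoint URI scheme: {uri!r}")
    relative = uri[len(LOCAL_URI_SCHEME):]  # keeps the leading "artifacts/" segment
    return (artifacts_root / relative).absolute()


resolved = sketch_resolve_uri("local://artifacts/foo.pt", artifacts_root=Path("/data"))

s3_rejected = False
try:
    sketch_resolve_uri("s3://bucket/foo.pt", artifacts_root=Path("/data"))
except CheckpointError:
    s3_rejected = True
```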
save_checkpoint
save_checkpoint(
*,
state_dict: Mapping[str, Tensor],
uri: str,
metadata: CheckpointMetadata,
artifacts_root: Path,
) -> Path
Save a torch state-dict + sidecar (T4b.12; ADR-002 §Decisions).
The save is two-phase: write the .pt payload first, then compute
its SHA-256 and write the sidecar JSON with the digest stitched in. The
sha256 field on the input metadata is overwritten with the
digest computed at save time, so callers cannot accidentally ship a
stale digest that would make :func:load_checkpoint raise.
Parameters:
- state_dict (Mapping[str, Tensor]) – A Mapping[str, torch.Tensor] exactly as produced by model.state_dict(). Saved verbatim via :func:torch.save.
- uri (str) – Destination URI (e.g. "local://artifacts/happo_seed7_step50k.pt").
- metadata (CheckpointMetadata) – Provenance bundle. The sha256 field is recomputed at save time and overrides whatever the caller passed.
- artifacts_root (Path) – Filesystem directory the URI resolves against.
Returns:
- Path – Absolute filesystem path to the written .pt payload.
Raises:
- CheckpointError – If uri is not a local:// URI.
Source code in src/concerto/training/checkpoints.py
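The payload-first, sidecar-second ordering and the digest override can be sketched as follows. Raw bytes stand in for `torch.save` output, and `SketchMetadata` / `sketch_save` are hypothetical names. If the process dies between the two phases, the payload exists without a sidecar, which is exactly the "missing sidecar" case the loader rejects.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class SketchMetadata:
    run_id: str
    seed: int
    step: int
    git_sha: str
    pyproject_hash: str
    sha256: str = ""


def sketch_save(payload: bytes, payload_path: Path, metadata: SketchMetadata) -> Path:
    """Hypothetical two-phase save; raw bytes stand in for a torch state-dict."""
    payload_path.parent.mkdir(parents=True, exist_ok=True)
    # Phase 1: the payload lands first.
    payload_path.write_bytes(payload)
    # Phase 2: digest the bytes on disk, overriding any caller-supplied value.
    digest = hashlib.sha256(payload_path.read_bytes()).hexdigest()
    stamped = replace(metadata, sha256=digest)
    sidecar = payload_path.with_name(payload_path.name + ".json")
    sidecar.write_text(json.dumps(asdict(stamped), sort_keys=True), encoding="utf-8")
    return payload_path.absolute()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "artifacts" / "happo_seed7_step50k.pt"
    stale = SketchMetadata("0123456789abcdef", 7, 50_000, "unknown", "ab" * 32, sha256="stale")
    written = sketch_save(b"weights", target, stale)
    stored = json.loads(target.with_name(target.name + ".json").read_text(encoding="utf-8"))
```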
Hydra-driven config for ego-AHT training runs (T4b.11; ADR-002 §Decisions; plan/05 §2).
The training stack is configured via Hydra YAML files under
configs/training/<algo>/<task>.yaml. This module ships the
type-safe Pydantic v2 models that those YAMLs validate against:
EgoAHTConfig is the root, with sub-models for the env / partner /
W&B / hyperparameter blocks.
Why two layers (Hydra + Pydantic)?
- Hydra owns the composition and CLI override surface (defaults, group selection, +key=value overrides).
- Pydantic owns validation and Python-side type safety (Path coercion, range checks, frozen-by-default models that keep call sites from mutating config mid-run).
:func:load_config is the canonical entry point: it locates the project's
configs/ directory by walking up from the caller's CWD, composes the
named config via Hydra, and validates it through :class:EgoAHTConfig.
ADR-002 §Decisions ("Hydra config root"): the YAML schema is part of the
public reproducibility contract. Adding a required field is a breaking
change to the config; deprecate-then-remove via a Pydantic
:class:Field default and a DeprecationWarning rather than removing
in place.
EgoAHTConfig
Bases: _FrozenModel
Root config for an ego-AHT training run (T4b.11; ADR-002 §Decisions).
Validates the composed Hydra config and exposes a frozen
Python-side handle for :func:concerto.training.ego_aht.train.
Attributes:
- algo (str) – Trainer-class registry key. Phase-0: "ego_aht_happo"; "ego_aht_hatd3" is the Phase-1 stub.
- seed (int) – Project root seed routed to :func:concerto.training.seeding.derive_substream. Two runs with the same seed produce byte-identical CPU reward curves (plan/05 §6 criterion 7).
- total_frames (int) – Training budget in env frames. T4b.13 = 100_000.
- checkpoint_every (int) – Save a .pt artefact every K frames.
- artifacts_root (Path) – Directory the local://artifacts/... URIs resolve against (plan/04 §3.8).
- log_dir (Path) – Directory where the JSONL logs are written.
- wandb (WandbConfig) – :class:WandbConfig.
- env (EnvConfig) – :class:EnvConfig.
- partner (PartnerConfig) – :class:PartnerConfig.
- happo (HAPPOHyperparams) – :class:HAPPOHyperparams.
- runtime (RuntimeConfig) – :class:RuntimeConfig (device + determinism knobs; ADR-002 §Decisions; plan/08 §4 GPU determinism caveat). Defaults to device="auto" + deterministic_torch=True so an unset YAML on a Mac CPU dev box just works; the production mpe_cooperative_push.yaml pins device: cpu and stage0_smoke.yaml pins device: cuda.
Source code in src/concerto/training/config.py
EnvConfig
Bases: _FrozenModel
Env-side configuration (T4b.13; ADR-002 §Decisions; plan/05 §3.5).
Attributes:
- task (str) – Registered task name. Phase-0 supports "mpe_cooperative_push" (T4b.13's empirical-guarantee env) and "stage0_smoke" (T4b.14's zoo-seed env, deferred to a user-side GPU run). P1.04 adds "stage1_pickplace" (ADR-007 §Stage 1b real-env science-evaluation target).
- episode_length (int) – Truncation horizon in env ticks. The empirical-guarantee experiment defaults to 50 (matching PettingZoo simple_spread).
- agent_uids (tuple[str, str]) – 2-element tuple of the uids the env exposes; the first is the ego, the second is the frozen partner.
- condition_id (str | None) – Stage-1b pre-registered condition_id string (P1.04 / ADR-007 §Stage 1b). Required when task == "stage1_pickplace", because OM-homo and OM-hetero share the ("panda_wristcam", "fetch") agent_uids tuple; the condition_id is the explicit disambiguator. The YAML carries a sensible default; :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory overrides it per (seed, condition) cell via model_copy. None for non-Stage-1b tasks (MPE, Stage-0), so pre-P1.04 configs work unchanged.
Source code in src/concerto/training/config.py
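The two cross-field rules stated above (distinct uids, and condition_id required for "stage1_pickplace") could be enforced as in the sketch below. It swaps the Pydantic validators for a stdlib dataclass `__post_init__`; `SketchEnvConfig`, the MPE uids, and `"demo_condition"` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchEnvConfig:
    """Hypothetical stdlib stand-in for the EnvConfig validation rules."""

    task: str
    agent_uids: tuple[str, str]
    episode_length: int = 50
    condition_id: str | None = None

    def __post_init__(self) -> None:
        if len(self.agent_uids) != 2 or self.agent_uids[0] == self.agent_uids[1]:
            raise ValueError(f"agent_uids must be two distinct uids, got {self.agent_uids!r}")
        if self.task == "stage1_pickplace" and self.condition_id is None:
            # OM-homo and OM-hetero share the same uid tuple, so the
            # condition_id is the only disambiguator.
            raise ValueError("condition_id is required when task == 'stage1_pickplace'")


mpe = SketchEnvConfig(task="mpe_cooperative_push", agent_uids=("agent_0", "agent_1"))
stage1 = SketchEnvConfig(
    task="stage1_pickplace",
    agent_uids=("panda_wristcam", "fetch"),
    condition_id="demo_condition",
)

missing_condition_rejected = False
try:
    SketchEnvConfig(task="stage1_pickplace", agent_uids=("panda_wristcam", "fetch"))
except ValueError:
    missing_condition_rejected = True
```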
HAPPOHyperparams
Bases: _FrozenModel
On-policy ego-AHT HAPPO hyperparameters (ADR-002 §Decisions; plan/05 §3.2).
Defaults match the published HARL Bi-DexHands-style values where they translate; tuned conservatively for the small MPE task to keep the empirical-guarantee experiment within its 30-minute CPU budget (plan/05 §8).
Attributes:
- lr (float) – Adam learning rate for the actor and critic.
- gamma (float) – Discount factor.
- gae_lambda (float) – GAE-λ for advantage estimation.
- clip_eps (float) – PPO clipping epsilon.
- n_epochs (int) – Number of optimisation epochs per rollout.
- rollout_length (int) – Frames collected per epoch before the update step.
- batch_size (int) – SGD minibatch size; rollout_length must be a multiple of batch_size.
- hidden_dim (int) – MLP hidden width for the actor and critic.
Source code in src/concerto/training/config.py
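The "rollout_length must be a multiple of batch_size" constraint is what lets each pass split the rollout into equal minibatches. The sketch below uses a stdlib dataclass in place of Pydantic. It reads n_epochs the standard PPO way (full passes over one rollout), which is an assumption; the numbers are illustrative, not the shipped defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHAPPOHyperparams:
    """Hypothetical stdlib stand-in covering only the minibatch fields."""

    rollout_length: int
    batch_size: int
    n_epochs: int

    def __post_init__(self) -> None:
        if self.batch_size <= 0 or self.rollout_length % self.batch_size != 0:
            raise ValueError(
                f"rollout_length ({self.rollout_length}) must be a multiple of "
                f"batch_size ({self.batch_size})"
            )

    @property
    def minibatches_per_epoch(self) -> int:
        return self.rollout_length // self.batch_size

    @property
    def sgd_steps_per_update(self) -> int:
        # n_epochs passes over the rollout, each split into equal minibatches.
        return self.n_epochs * self.minibatches_per_epoch


hp = SketchHAPPOHyperparams(rollout_length=1024, batch_size=256, n_epochs=4)

ragged_rejected = False
try:
    SketchHAPPOHyperparams(rollout_length=1000, batch_size=256, n_epochs=4)
except ValueError:
    ragged_rejected = True
```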
PartnerConfig
Bases: _FrozenModel
Frozen partner spec (ADR-009 §Decision; plan/05 §3.5).
Mirrors the M4a :class:chamber.partners.api.PartnerSpec fields so
the training loop can build a PartnerSpec directly from the
config without re-validating.
Attributes:
- class_name (str) – Registry key passed to :func:chamber.partners.registry.load_partner. Phase-0 empirical-guarantee runs use "scripted_heuristic".
- seed (int) – Per-partner training seed; use 0 for scripted partners.
- checkpoint_step (int | None) – Training step at which the checkpoint was taken; None for scripted partners.
- weights_uri (str | None) – local://... URI for frozen-RL partners; None for scripted partners.
- extra (dict[str, str]) – Free-form string-to-string metadata routed to PartnerSpec.extra.
Source code in src/concerto/training/config.py
RuntimeConfig
Bases: _FrozenModel
Device + perf knobs (ADR-002 §Decisions; plan/08 §4 GPU determinism caveat).
Phase-0 development runs on Mac CPU (Apple M-series). Production
zoo-seed runs target Linux + CUDA via
scripts/repro/zoo_seed.sh (M4b-9b). The two regimes have
different determinism contracts:
- CPU: byte-identical across runs at the same seed (plan/05 §6 criterion 7; pinned by tests/integration/test_cpu_determinism.py). deterministic_torch=True calls :func:torch.use_deterministic_algorithms once at trainer construction, so any non-deterministic CUDA-style op falls back to a deterministic implementation if one exists, or warns loudly otherwise (warn_only=True).
- CUDA / MPS: plan/08 §4 grants the GPU determinism caveat, since cuDNN / cuBLAS kernels are not bit-deterministic across hardware generations even with deterministic flags. Set deterministic_torch=False on GPU configs to disable the strict-mode flag and avoid the warning spam.
Attributes:
- device (Literal['auto', 'cpu', 'cuda', 'mps']) – "auto" resolves via :func:chamber.utils.device.torch_device (CUDA > MPS > CPU). An explicit "cpu" / "cuda" / "mps" pins the choice and raises if unavailable (the trainer cites the ADR in the error message, so a confused user searching for the message lands on ADR-002 §Decisions).
- deterministic_torch (bool) – When True, the trainer calls :func:torch.use_deterministic_algorithms(True, warn_only=True) once at construction. Documented as a process-global side effect.
Source code in src/concerto/training/config.py
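The `device` rule can be written as a pure function so it is testable without torch. In this hypothetical sketch, availability is passed in rather than probed; the real resolution goes through `chamber.utils.device.torch_device`, whose internals are not shown here.

```python
from __future__ import annotations

from typing import Literal

Device = Literal["auto", "cpu", "cuda", "mps"]


def sketch_resolve_device(requested: Device, *, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical version of the rule: "auto" picks CUDA > MPS > CPU."""
    available = {"cpu": True, "cuda": cuda_available, "mps": mps_available}
    if requested == "auto":
        for candidate in ("cuda", "mps", "cpu"):
            if available[candidate]:
                return candidate
    if requested not in available:
        raise ValueError(f"unknown device {requested!r}")
    if not available[requested]:
        # Explicit pins fail loudly and point at the ADR.
        raise RuntimeError(f"device {requested!r} requested but unavailable (see ADR-002 §Decisions)")
    return requested


cuda_pin_error = None
try:
    sketch_resolve_device("cuda", cuda_available=False, mps_available=True)
except RuntimeError as exc:
    cuda_pin_error = str(exc)
```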
SafetyConfig
Bases: _FrozenModel
Safety-stack runtime configuration for the training rollout (P1.04.5; ADR-007 §Stage 1b).
Phase-1 P1.04.5 wires the CBF-QP outer filter into
:func:concerto.training.ego_aht.train per ADR-007 §Stage 1b
implementation-details paragraph 2. The filter runs once per env
step between the trainer's act and env.step; conformal
slack state.lambda_ accumulates per cell and the
:class:concerto.training.safety_telemetry.SafetyAggregator emits
a safety_telemetry_final JSONL event at end-of-cell carrying
the audit-gate predicate's inputs.
The block is read by
:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory's
per-cell construction path; non-Stage-1b yaml configs default
enabled=False so the filter doesn't engage outside its target
use case (the Phase-0 MPE empirical-guarantee config + the
Stage-0 smoke adapter both keep the un-filtered training rollout).
Kill-switch contract (D10 from the P1.04.5 Plan-subagent design):
when enabled=False the training loop skips the filter entirely
and emits safety_enabled=false in the JSONL summary. The
audit-gate hook reads that flag and emits a non-failing
"safety disabled by operator override; gate skipped" message — the
operator's intent is preserved on the audit trail.
Attributes:
- enabled (bool) – Master switch. True (default) wires the filter and telemetry into the training loop. False skips both; the audit gate reads safety_enabled and treats the cell as non-gated.
- saturation_threshold (float) – Fraction of cartesian_accel_capacity above which the cell is considered saturated. The default 0.9 leaves a 10% margin and is operator-tunable for audit-gate predicate A. ADR-007 §Stage 1b implementation details paragraph 2 names the predicate verbatim (λ_steady_state < cartesian_accel_capacity); the 0.9 factor is the engineering safety margin.
- cbf_gamma (float) – Class-K function gain forwarded to :func:concerto.safety.conformal.update_lambda_from_predictor. The default 5.0 matches the existing :data:concerto.safety.cbf_qp.DEFAULT_CBF_GAMMA.
- lambda_safe (float) – Per-pair slack reset value at cell start (Phase-0 placeholder 0.0 per ADR-004 §Open questions; the derived form is a Stage-2 prerequisite).
- n_warmup_steps (int) – Length of the warmup window after the cell starts (matches :data:concerto.safety.api.DEFAULT_WARMUP_STEPS).
- predictor_kind (Literal['constant_velocity']) – Partner-trajectory predictor (Phase-0/1 default "constant_velocity" per :func:concerto.safety.conformal.constant_velocity_predict; AoI-conditioned variants are Phase-2 per ADR-004 §Open questions).
- tau_brake (float) – Per-step braking-fallback TTC threshold in seconds (P1.04.6; ADR-007 §Stage 1b training-vs-deployment parity). When the smallest pairwise time-to-collision drops below this, :func:concerto.safety.braking.maybe_brake overrides the ego action with the per-uid emergency controller's push-apart response, the per-step backstop that does not depend on QP feasibility (Wang-Ames-Egerstedt 2017 eq. 17; ADR-004 risk-mitigation #1). The default matches :data:concerto.safety.braking.DEFAULT_TAU_BRAKE (100 ms).
Source code in src/concerto/training/config.py
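Predicate A and the kill-switch contract can be combined in a small function. This is a hypothetical sketch of how the gate could evaluate one cell; the function name, signature, and pass messages are assumptions, while the skip message text is quoted from the contract above.

```python
from __future__ import annotations


def sketch_audit_gate(
    *,
    safety_enabled: bool,
    lambda_steady_state: float,
    cartesian_accel_capacity: float,
    saturation_threshold: float = 0.9,
) -> tuple[bool, str]:
    """Hypothetical evaluation of audit-gate predicate A for one cell.

    The ADR predicate is lambda_steady_state < cartesian_accel_capacity;
    saturation_threshold shrinks the right-hand side to leave a margin.
    """
    if not safety_enabled:
        # Kill switch: non-failing, but the operator override is recorded.
        return True, "safety disabled by operator override; gate skipped"
    limit = saturation_threshold * cartesian_accel_capacity
    if lambda_steady_state < limit:
        return True, f"ok: lambda_ss={lambda_steady_state:.3g} < {limit:.3g}"
    return False, f"saturated: lambda_ss={lambda_steady_state:.3g} >= {limit:.3g}"


healthy = sketch_audit_gate(safety_enabled=True, lambda_steady_state=5.0, cartesian_accel_capacity=10.0)
in_margin = sketch_audit_gate(safety_enabled=True, lambda_steady_state=9.5, cartesian_accel_capacity=10.0)
skipped = sketch_audit_gate(safety_enabled=False, lambda_steady_state=99.0, cartesian_accel_capacity=10.0)
```

A steady-state slack of 9.5 against a capacity of 10.0 satisfies the raw ADR predicate but fails the gate, which is the point of the 10% engineering margin.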
WandbConfig
Bases: _FrozenModel
W&B sink toggle + project name (ADR-002 §Decisions; plan/05 §2).
Attributes:
- enabled (bool) – When False (default), :func:bind_run_logger is built without a W&B sink and only the JSONL fallback fires. CI runs and unit tests should leave this off.
- project (str) – W&B project name. Used only when enabled is True.
Source code in src/concerto/training/config.py
load_config
load_config(
*, config_path: Path, overrides: list[str] | None = None
) -> EgoAHTConfig
Compose a Hydra config + validate via :class:EgoAHTConfig (T4b.11; ADR-002 §Decisions).
The function does NOT use Hydra's @hydra.main decorator (which
takes over the process's CWD + argv); it uses the lower-level
:func:hydra.compose API so the call is testable and re-entrant.
Parameters:
- config_path (Path) – Absolute path to the YAML config file (e.g. configs/training/ego_aht_happo/mpe_cooperative_push.yaml). The directory becomes Hydra's config_dir and the stem becomes the config_name it loads. Splitting via the file path keeps the YAML's root-level keys at the root of the composed config (rather than Hydra's path-folding).
- overrides (list[str] | None, default: None) – Optional list of Hydra CLI-style overrides (e.g. ["seed=7", "happo.lr=1e-4"]).
Returns:
- EgoAHTConfig – A frozen :class:EgoAHTConfig ready for :func:concerto.training.ego_aht.train.
Raises:
- ValidationError – If the composed config violates the schema (missing required field, out-of-range hyperparam, duplicate agent uids, etc.).
Source code in src/concerto/training/config.py
Ego-AHT training loop entry point (T4b.11; ADR-002 §Decisions; plan/05 §3.5).
The single user-facing API for an ego-AHT training run is :func:train.
It wires up logging, seeding, per-step rollout collection, checkpoint
emission, and the trainer-factory seam, and returns a :class:RewardCurve
for T4b.13's empirical-guarantee assertion.
Dependency-direction discipline (project plan/10 §2 dependency-direction rule):
concerto.* MUST NOT import from chamber.*. The training loop
therefore takes the env and partner via dependency injection — the
chamber-side runner (chamber.benchmarks.training_runner) builds the
concrete env (:class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv)
and partner (via :func:chamber.partners.registry.load_partner) and
hands them to :func:train. That keeps every concerto.* arrow
pointing only at lower layers.
Trainer-factory injection: this loop lives in concerto.* and so cannot
import from chamber.* (project plan/10 §2). The real ego-PPO trainer
(:class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer) wraps HARL's
HAPPO and lives on the chamber side; the chamber-side
:func:chamber.benchmarks.training_runner.run_training selects it as the
default. :func:train therefore continues to accept trainer_factory
and falls back to :class:RandomEgoTrainer when called directly without
an injected factory — which keeps the loop unit-testable inside
concerto.* without violating the dependency direction.
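The trainer-factory seam amounts to ordinary dependency injection with a default. In the sketch below the factory signature `(action_dim, rng) -> trainer` is an assumption, and `SketchRandomTrainer`, `HalfThrottleTrainer`, and `build_trainer` are hypothetical stand-ins. The point is only that the concerto-side code never names a `chamber.*` class.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any, Protocol

import numpy as np


class ActsLikeEgoTrainer(Protocol):
    """Hypothetical one-method slice of EgoTrainer, enough for this demo."""

    def act(self, obs: Any, *, deterministic: bool = False) -> np.ndarray: ...


# Hypothetical factory signature: (action_dim, rng) -> trainer.
TrainerFactory = Callable[[int, np.random.Generator], ActsLikeEgoTrainer]


class SketchRandomTrainer:
    """Stand-in for the RandomEgoTrainer fallback: uniform actions in [-1, 1)."""

    def __init__(self, action_dim: int, rng: np.random.Generator) -> None:
        self._action_dim = action_dim
        self._rng = rng

    def act(self, obs: Any, *, deterministic: bool = False) -> np.ndarray:
        # This sketch ignores `deterministic`.
        return self._rng.uniform(-1.0, 1.0, size=self._action_dim)


def build_trainer(
    action_dim: int,
    rng: np.random.Generator,
    trainer_factory: TrainerFactory | None = None,
) -> ActsLikeEgoTrainer:
    # The chamber-side runner injects the real trainer; direct callers
    # (e.g. unit tests) get the random fallback.
    factory = trainer_factory if trainer_factory is not None else SketchRandomTrainer
    return factory(action_dim, rng)


class HalfThrottleTrainer:
    """Example of a user-supplied trainer injected through the seam."""

    def __init__(self, action_dim: int, rng: np.random.Generator) -> None:
        self._action = np.full(action_dim, 0.5)

    def act(self, obs: Any, *, deterministic: bool = False) -> np.ndarray:
        return self._action


fallback = build_trainer(2, np.random.default_rng(0))
injected = build_trainer(2, np.random.default_rng(0), trainer_factory=HalfThrottleTrainer)
```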
ADR-002 risk-mitigation #1: this loop is the substrate the empirical-guarantee assertion runs against. Do not silently drop steps, mutate the rollout collection mid-loop, or change the partner-frozen contract without a new ADR.
EgoTrainer
Bases: Protocol
Minimal contract the training loop expects of an ego trainer (T4b.11; ADR-002 §Decisions).
Two concrete implementations satisfy this Protocol:
- :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer: the real HARL-HAPPO-backed ego-PPO trainer (M4b-8a / plan/05 §3.5), and the default selected by :func:chamber.benchmarks.training_runner.run_training.
- :class:RandomEgoTrainer: the Phase-0 reference fallback, used by :func:train when no trainer_factory is injected.
The Protocol is intentionally tiny (:meth:act + :meth:observe +
:meth:update + :meth:state_dict) so the seam stays small enough
that an external user can plug in their own algorithm without
changing :func:train.
Source code in src/concerto/training/ego_aht.py
act
act(
obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]
Sample (or pick deterministically) the ego action (ADR-002 §Decisions).
Source code in src/concerto/training/ego_aht.py
observe
observe(
obs: Mapping[str, Any],
reward: float,
done: bool,
*,
truncated: bool = False,
) -> None
Record a (s, a, r, s', done, truncated) tuple (ADR-002 §Decisions; Pardo 2017).
done keeps the historical "this step ended the episode"
meaning (terminated OR truncated). truncated is a
keyword-only flag that distinguishes time-limit truncation from
true termination — needed by PPO-style trainers to bootstrap the
GAE target correctly (see :func:chamber.benchmarks.ego_ppo_trainer.compute_gae
and project issue #62 for the root-cause writeup). Default
truncated=False keeps legacy callers' behavior unchanged.
Source code in src/concerto/training/ego_aht.py
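Why the truncated flag matters for GAE can be seen in a generic sketch. This is textbook GAE with the termination/truncation split, not `chamber.benchmarks.ego_ppo_trainer.compute_gae`. Here `terminated[t]` corresponds to `done and not truncated` in the `observe` contract. Truncation still bootstraps from the value of the real next observation, while either kind of episode end stops the advantage recursion.

```python
from __future__ import annotations

from collections.abc import Sequence


def sketch_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    terminated: Sequence[bool],
    truncated: Sequence[bool],
    *,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> list[float]:
    """Generic GAE; next_values[t] is V of the real next observation,
    including the final observation of a truncated episode."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Only true termination zeroes the bootstrap term.
        bootstrap = 0.0 if terminated[t] else gamma * next_values[t]
        delta = rewards[t] + bootstrap - values[t]
        # Either kind of episode end stops the recursion at t.
        episode_end = terminated[t] or truncated[t]
        running = delta + (0.0 if episode_end else gamma * lam * running)
        advantages[t] = running
    return advantages


trunc = sketch_gae([1.0], [0.0], [10.0], [False], [True], gamma=0.9)
term = sketch_gae([1.0], [0.0], [10.0], [True], [False], gamma=0.9)
cut = sketch_gae([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [True, False], [False, False])
```

With the same reward and next-state value, the truncated step gets an advantage of about 10.0 while the terminated step gets 1.0. Treating a time-limit cut as termination would silently discard that bootstrap.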
state_dict
state_dict() -> dict[str, Any]
Return the trainer's torch state-dict for checkpointing (ADR-002 §Decisions).
Source code in src/concerto/training/ego_aht.py
update
update() -> None
Run one optimisation epoch on the buffered rollout (ADR-002 §Decisions).
Source code in src/concerto/training/ego_aht.py
EnvLike
Bases: Protocol
Structural Gymnasium-multi-agent env contract the loop needs (T4b.11; ADR-002 §Decisions).
Defined locally so :func:train does not import a concrete class
from chamber.* (project plan/10 §2 dependency-direction rule). The
chamber-side :class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv
satisfies this Protocol structurally.
Source code in src/concerto/training/ego_aht.py
reset
reset(
*,
seed: int | None = None,
options: dict[str, Any] | None = None,
) -> tuple[Mapping[str, Any], Mapping[str, Any]]
Reset to the start of an episode (T4b.11; ADR-002 §Decisions).
Source code in src/concerto/training/ego_aht.py
step
step(
action: dict[str, NDArray[floating]],
) -> tuple[
Mapping[str, Any], float, bool, bool, Mapping[str, Any]
]
Advance one tick (T4b.11; ADR-002 §Decisions).
Takes a concrete dict rather than a Mapping because the
chamber-side :class:MPECooperativePushEnv and the future
ManiSkill v3 wrappers all require dict-keyed access (mutation
of the action dict is intentional in some wrappers' rate-limit
/ collision-injection paths).
Source code in src/concerto/training/ego_aht.py
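Here is a minimal sketch of what "satisfies this Protocol structurally" means in practice. EnvLikeSketch mirrors only the reset/step surface documented above. It is hypothetical and is marked runtime_checkable purely for the demonstration; whether the real EnvLike is runtime-checkable is not stated here. CountdownEnv is a toy env that never inherits from the Protocol.

```python
from __future__ import annotations

from typing import Any, Mapping, Protocol, runtime_checkable


@runtime_checkable
class EnvLikeSketch(Protocol):
    """Hypothetical mirror of the EnvLike method surface (reset + step)."""

    def reset(
        self, *, seed: int | None = None, options: dict[str, Any] | None = None
    ) -> tuple[Mapping[str, Any], Mapping[str, Any]]: ...

    def step(
        self, action: dict[str, Any]
    ) -> tuple[Mapping[str, Any], float, bool, bool, Mapping[str, Any]]: ...


class CountdownEnv:
    """Toy two-agent env; it satisfies EnvLikeSketch without inheriting it."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed: int | None = None, options: dict[str, Any] | None = None):
        self.t = 0
        return {"ego": self.t, "partner": self.t}, {}

    def step(self, action: dict[str, Any]):
        self.t += 1
        obs = {"ego": self.t, "partner": self.t}
        truncated = self.t >= self.horizon  # time limit, not a terminal state
        return obs, 1.0, False, truncated, {}
```

An isinstance check against a runtime_checkable Protocol only verifies that the methods exist, not their signatures. A static type checker is what enforces the full reset/step contract.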
PartnerLike
Bases: Protocol
Structural FrozenPartner contract the loop needs (T4b.11; ADR-009 §Decision).
Defined locally to keep concerto.* free of chamber.* imports.
The M4a :class:chamber.partners.api.FrozenPartner satisfies this
Protocol structurally; both the heuristic and the future frozen-RL
adapters drop in unchanged.
The spec attribute is declared as :class:Any so concrete
implementations can attach any spec type (M4a's
:class:chamber.partners.api.PartnerSpec); the loop only uses it
for diagnostic logging, never structurally. Any is required
here rather than object because Protocol attributes are
invariant and concrete subtypes (PartnerSpec) are not assignable to
a mutable object annotation.
Source code in src/concerto/training/ego_aht.py
act
act(
obs: Mapping[str, Any], *, deterministic: bool = True
) -> NDArray[np.floating]
Pick an action for the partner's own uid (ADR-009 §Decision).
Source code in src/concerto/training/ego_aht.py
reset
reset(*, seed: int | None = None) -> None
Reset internal episode state (ADR-009 §Decision).
Source code in src/concerto/training/ego_aht.py
RandomEgoTrainer
Reference :class:EgoTrainer that samples uniformly from the action space.
Exists so :func:train can be exercised end-to-end before the HARL
fork lands (T4b.4-T4b.7). Not a real learner — :meth:update is a
no-op, :meth:state_dict returns an empty dict — but the loop's
plumbing (env stepping, partner act, checkpoint emission, JSONL
logging) is fully exercised. T4b.13 will swap to EgoAHTHAPPO
via the trainer-factory seam (ADR-002 §Decisions; plan/05 §3.5).
Source code in src/concerto/training/ego_aht.py
__init__
__init__(
*, ego_uid: str, action_dim: int, root_seed: int
) -> None
Build a uniform-random reference trainer (T4b.11; ADR-002 §Decisions).
Parameters:
- ego_uid (str) – Stored for downstream identity propagation (the trainer does not read it at action time, but the loop uses it as the dict-action key).
- action_dim (int) – Length of the ego's action vector.
- root_seed (int) – Project seed; the trainer's RNG is derived via :func:concerto.training.seeding.derive_substream for P6 reproducibility.
Source code in src/concerto/training/ego_aht.py
act
act(
obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]
Sample uniformly from [-1, 1]^action_dim (ADR-002 §Decisions; plan/05 §3.5).
Source code in src/concerto/training/ego_aht.py
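The following minimal sketch of the uniform-random reference behaviour assumes only what the docstrings in this class state: samples are uniform in [-1, 1]^action_dim, observe/update are no-ops, and state_dict is empty. UniformEgoSketch is hypothetical. For self-containment it seeds the standard-library random.Random directly, instead of the numpy Generator that derive_substream produces, and it ignores the deterministic flag.

```python
from __future__ import annotations

import random
from typing import Any, Mapping


class UniformEgoSketch:
    """Hypothetical stand-in for RandomEgoTrainer's sampling behaviour."""

    def __init__(self, *, action_dim: int, seed: int) -> None:
        self.action_dim = action_dim
        self._rng = random.Random(seed)  # assumption: stdlib RNG, not numpy

    def act(self, obs: Mapping[str, Any], *, deterministic: bool = False) -> list[float]:
        # Each component is drawn independently from U(-1, 1); obs is unused.
        return [self._rng.uniform(-1.0, 1.0) for _ in range(self.action_dim)]

    def observe(self, obs: Mapping[str, Any], reward: float, done: bool,
                *, truncated: bool = False) -> None:
        pass  # no rollout buffer

    def update(self) -> None:
        pass  # no learnable parameters

    def state_dict(self) -> dict[str, Any]:
        return {}  # parameter-free
```

Two instances built with the same seed emit identical action sequences, which is the property the P6 reproducibility requirement relies on.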
observe
observe(
obs: Mapping[str, Any],
reward: float,
done: bool,
*,
truncated: bool = False,
) -> None
No-op (ADR-002 §Decisions): reference trainer has no rollout buffer.
Source code in src/concerto/training/ego_aht.py
state_dict
state_dict() -> dict[str, Any]
Return an empty dict (ADR-002 §Decisions: trainer is parameter-free).
Source code in src/concerto/training/ego_aht.py
update
update() -> None
No-op (ADR-002 §Decisions): reference trainer has no learnable params.
Source code in src/concerto/training/ego_aht.py
RewardCurve
dataclass
Result of one ego-AHT training run (T4b.11; T4b.13 consumer; ADR-002 §Decisions).
Attributes:
- run_id (str) – 16-hex run identifier (from :class:concerto.training.logging.RunContext).
- per_step_ego_rewards (list[float]) – One entry per env step. T4b.13's empirical-guarantee assertion runs over the moving-window-of-10 mean of this series.
- per_episode_ego_rewards (list[float]) – Sum-of-step reward per episode. Coarser than per_step_ego_rewards; this is the canonical "epoch reward" that the empirical-guarantee plot embeds in docs/explanation/why-aht.md.
- checkpoint_paths (list[Path]) – Absolute paths of every .pt artefact saved during the run. T4b.14 reads the canonical 50%-step file from this list.
Source code in src/concerto/training/ego_aht.py
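The moving-window-of-10 mean over per_step_ego_rewards can be sketched as a trailing mean over full windows. moving_window_mean is a hypothetical helper. The docstring does not say how the real T4b.13 assertion treats the first window - 1 steps, so this sketch simply emits full windows only.

```python
from __future__ import annotations

from typing import Sequence


def moving_window_mean(values: Sequence[float], window: int = 10) -> list[float]:
    """Trailing mean over each full window; returns len(values) - window + 1 entries."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means
```

The O(n) running sum can accumulate floating-point drift on very long series. Recomputing sum() per window is the slower, drift-free alternative.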
TrainerFactory
Bases: Protocol
Constructor that builds an :class:EgoTrainer (T4b.11; ADR-002 §Decisions).
The seam through which :func:train plugs in the algorithm.
:func:chamber.benchmarks.training_runner.run_training selects
:meth:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config
by default (M4b-8a / plan/05 §3.5).
The factory receives the partner so it can refuse to construct
against a non-frozen partner before any tensor allocation
(ADR-009 §Consequences; plan/05 §6 #3). The default factory
:meth:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config
calls :func:~chamber.benchmarks.ego_ppo_trainer._assert_partner_is_frozen
on the partner as its first construction step.
Source code in src/concerto/training/ego_aht.py
__call__
__call__(
cfg: EgoAHTConfig,
*,
env: EnvLike,
partner: PartnerLike,
ego_uid: str,
) -> EgoTrainer
Build a trainer (ADR-002 §Decisions; plan/05 §3.5; ADR-009 §Consequences).
Source code in src/concerto/training/ego_aht.py
TrainingResult
Bases: NamedTuple
Return value of :func:train and :func:chamber.benchmarks.training_runner.run_training.
Pairs the diagnostic reward stream (:class:RewardCurve) with the
trained ego policy (:class:EgoTrainer) so Phase-1 callers like
:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory
(P1.04 / ADR-007 §Stage 1b) can wrap result.trainer.act in a
per-step closure without paying a checkpoint round-trip cost. The
trainer's internal state (HARL HAPPO actor + critic + optimisers)
survives :func:train's scope through this return value rather
than being reloaded from disk.
NamedTuple (not a bare tuple[RewardCurve, EgoTrainer]) so call
sites use the field names — result.curve /
result.trainer — instead of positional indexing. Tuple-unpack
still works (curve, trainer = run_training(cfg)) for callers
that prefer it.
Attributes:
- curve (RewardCurve) – The diagnostic per-step + per-episode reward stream that T4b.13's empirical-guarantee assertion runs over. Same shape as before P1.04; this NamedTuple is a wrapper, not a replacement.
- trainer (EgoTrainer) – The trained :class:EgoTrainer instance, with the HARL HAPPO actor weights at their post-training state. Use trainer.act(obs, deterministic=True) for evaluation rollouts. When :func:train is called without a trainer_factory (Phase-0 reference path), this is the parameter-free :class:RandomEgoTrainer fallback.
Source code in src/concerto/training/ego_aht.py
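The following sketch shows why a NamedTuple return keeps both call styles working. RewardCurveSketch and TrainingResultSketch are hypothetical, trimmed-down stand-ins, and the run_id value is made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, NamedTuple


@dataclass
class RewardCurveSketch:
    """Hypothetical, trimmed-down stand-in for RewardCurve."""

    run_id: str
    per_step_ego_rewards: list[float] = field(default_factory=list)
    checkpoint_paths: list[Path] = field(default_factory=list)


class TrainingResultSketch(NamedTuple):
    """Pairs the diagnostic reward curve with the trained policy object."""

    curve: RewardCurveSketch
    trainer: Any
```

Named access (result.curve, result.trainer) and positional unpacking (curve, trainer = result) both refer to the same objects, so callers can adopt the named form incrementally.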
train
train(
cfg: EgoAHTConfig,
*,
env: EnvLike,
partner: PartnerLike,
trainer_factory: TrainerFactory | None = None,
repo_root: Path | None = None,
safety_filter: EgoOnlySafetyFilter | None = None,
safety_state: SafetyState | None = None,
safety_bounds: Bounds | None = None,
safety_snapshot_builder: Callable[
[EnvLike], dict[str, AgentSnapshot]
]
| None = None,
safety_dt: float | None = None,
safety_emergency_controllers: Mapping[
str, EmergencyController
]
| None = None,
) -> TrainingResult
Run one ego-AHT training loop (T4b.11; ADR-002 §Decisions; plan/05 §3.5).
Wires up logging, seeding, per-step rollout, checkpoint emission, and
the trainer-factory seam. Returns a :class:TrainingResult (P1.04 /
ADR-007 §Stage 1b) carrying both the diagnostic
:class:RewardCurve and the trained :class:EgoTrainer instance,
so Phase-1 callers (notably
:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory) can
wrap result.trainer.act in a per-cell closure without paying
a checkpoint round-trip.
The pre-P1.04 return type was the bare :class:RewardCurve; the
NamedTuple wrapper is backward-compatible at the tuple-unpack level
(curve, trainer = train(...)) but reads more cleanly at named
call sites (result.curve / result.trainer).
Parameters:
- cfg (EgoAHTConfig) – Validated :class:~concerto.training.config.EgoAHTConfig.
- env (EnvLike) – Concrete env instance satisfying the :class:EnvLike structural Protocol. Built upstream by :func:chamber.benchmarks.training_runner.build_env.
- partner (PartnerLike) – Concrete frozen partner satisfying :class:PartnerLike. Built upstream by :func:chamber.benchmarks.training_runner.build_partner.
- trainer_factory (TrainerFactory | None, default: None) – Constructor that builds the :class:EgoTrainer. When None (the default), :class:RandomEgoTrainer is used as the Phase-0 in-concerto.* fallback. The chamber-side wrapper :func:chamber.benchmarks.training_runner.run_training injects the M4b-8a :class:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer (the real HARL-HAPPO trainer; plan/05 §3.5) here.
- repo_root (Path | None, default: None) – Working-tree root for the run-metadata bundle (defaults to :func:pathlib.Path.cwd).
- safety_filter (EgoOnlySafetyFilter | None, default: None) – Optional CBF-QP outer filter (P1.04.5; ADR-007 §Stage 1b). All five safety kwargs (safety_filter, safety_state, safety_bounds, safety_snapshot_builder, safety_dt) must be passed together when cfg.safety.enabled=True. Passing any of them while the cfg is disabled, or passing only a partial set, raises :class:ValueError (intent-mismatch contract).
- safety_state (SafetyState | None, default: None) – Optional :class:SafetyState, the conformal slack vector (mutated in place at every step by :func:update_lambda_from_predictor).
- safety_bounds (Bounds | None, default: None) – Optional per-task :class:Bounds, the cell's CBF/Cartesian envelope (the right-hand side of audit-gate predicate A is sourced from cartesian_accel_capacity).
- safety_snapshot_builder (Callable[[EnvLike], dict[str, AgentSnapshot]] | None, default: None) – Optional (env) -> dict[str, AgentSnapshot] callable that the loop invokes at every step to build the per-uid Cartesian snapshot map the filter consumes via partner_predicted_states.
- safety_dt (float | None, default: None) – Optional control-step dt in seconds, used for the constant-velocity predictor lookahead and the numerical-difference velocity estimate. Typically env.control_timestep.
- safety_emergency_controllers (Mapping[str, EmergencyController] | None, default: None) – Optional per-uid :class:concerto.safety.emergency.EmergencyController dispatch map (P1.04.6; ADR-007 §Stage 1b Rev 8). Consulted by :func:concerto.safety.braking.maybe_brake at each per-step braking call to translate the aggregate Cartesian repulsion into the per-uid control-space override. None falls back to a per-uid :class:concerto.safety.emergency.CartesianAccelEmergencyController, which is correct for double-integrator agents but a silent dimension-mismatch foot-gun for 7-DOF arms. Wired by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory via :func:chamber.benchmarks.training_runner.build_emergency_controllers. Honoured only when cfg.safety.enabled=True AND the rest of the safety stack is wired.
Returns:
- TrainingResult – :class:TrainingResult NamedTuple carrying curve (the :class:RewardCurve with per-step + per-episode ego rewards and saved checkpoint paths) and trainer (the trained :class:EgoTrainer instance whose act method evaluation rollouts wrap).
Source code in src/concerto/training/ego_aht.py
CHAMBER — the benchmark
chamber.envs
CHAMBER environment wrappers above ManiSkill v3.
ADR-001 §Decision: extend ManiSkill v3 with three thin wrappers covering the four heterogeneity axes. ADR-005 §Decision: SAPIEN 3 + Warp-MPM is the physics substrate, inherited from ManiSkill v3.
ChamberEnvCompatibilityError
Bases: RuntimeError
Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.
ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate the gate; failing loudly forces ADR re-review.
Source code in src/chamber/envs/errors.py
CommShapingWrapper
Bases: Wrapper
Inject comm-channel output into the observation dict as obs["comm"].
Wraps any env whose observation space is a gym.spaces.Dict. On each
reset and step, calls channel.encode(obs) and stores the result under
the "comm" key. The channel itself controls all degradation semantics
(latency, drop, jitter); this wrapper is a pure carrier.
ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.
Parameters:
- env (Env) – Env with a gym.spaces.Dict observation space.
- channel (CommChannel) – A :class:chamber.comm.api.CommChannel instance. In M1 this is a stub :class:chamber.comm.FixedFormatCommChannel that emits an empty packet shell; M2 fills in the real fixed-format encoding.
Raises:
- ChamberEnvCompatibilityError – If the inner observation space is not a gym.spaces.Dict (ADR-001 §Risks).
Source code in src/chamber/envs/comm_shaping.py
__init__
__init__(env: Env, channel: CommChannel) -> None
Validate the obs space and extend it with a comm sub-space.
Source code in src/chamber/envs/comm_shaping.py
reset
reset(**kwargs: object) -> tuple[dict, dict]
Reset the env and channel, injecting comm into the initial obs.
ADR-001 §Decision item 3; ADR-003 §Decision.
Source code in src/chamber/envs/comm_shaping.py
step
step(
action: object,
) -> tuple[dict, object, bool, bool, dict]
Step the env and inject updated comm into obs.
ADR-001 §Decision item 3; ADR-003 §Decision.
Source code in src/chamber/envs/comm_shaping.py
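The carrier pattern above can be sketched without Gymnasium. CommCarrierSketch, CounterChannel and TwoAgentEnv are hypothetical. The sketch assumes the channel exposes a reset() hook, since the docstring only says reset clears "the env and channel". It also skips the observation-space extension the real wrapper performs.

```python
from __future__ import annotations

from typing import Any


class CounterChannel:
    """Hypothetical channel: emits a monotonically increasing sequence number."""

    def reset(self) -> None:
        self.seq = 0

    def encode(self, obs: dict[str, Any]) -> dict[str, int]:
        self.seq += 1
        return {"seq": self.seq}


class TwoAgentEnv:
    """Hypothetical inner env with dict observations."""

    def reset(self, **kwargs: Any):
        return {"ego": 0.0, "partner": 0.0}, {}

    def step(self, action: dict[str, Any]):
        return {"ego": 1.0, "partner": 1.0}, 0.0, False, False, {}


class CommCarrierSketch:
    """Pure carrier: the channel alone decides what ends up in obs["comm"]."""

    def __init__(self, env: Any, channel: Any) -> None:
        self.env = env
        self.channel = channel

    def reset(self, **kwargs: Any):
        obs, info = self.env.reset(**kwargs)
        self.channel.reset()  # assumption: channels expose a reset hook
        return {**obs, "comm": self.channel.encode(obs)}, info

    def step(self, action: dict[str, Any]):
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = {**obs, "comm": self.channel.encode(obs)}
        return obs, reward, terminated, truncated, info
```

Because the wrapper never inspects the packet, swapping in a degraded channel (latency, drop, jitter) changes obs["comm"] without touching the wrapper.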
ConditionConfig
Bases: NamedTuple
Resolved per-condition configuration (ADR-007 §Stage 1b).
Returned by :func:resolve_condition. Pure-Python NamedTuple so
Tier-1 tests can construct fakes without any SAPIEN dependency.
Attributes:
- condition_id (str) – The verbatim pre-registration condition_id string (one of the four Stage-1 strings).
- agent_uids (tuple[str, str]) – Ordered (ego_uid, partner_uid) tuple. The ego is the agent the trainer updates and the safety filter authorises; the partner is the frozen / heuristic partner.
- obs_mode (str) – ManiSkill v3 env-level obs_mode superset string passed to :class:BaseEnv.__init__. The OM channel-filter wrapper (:class:chamber.envs.stage1_obs_filter.Stage1OMChannelFilter) slices this superset per uid down to the condition's actual channel keep-set.
- is_om_condition (bool) – True for the two OM condition_id strings (vision_only / vision_plus_force_torque_plus_proprio). Drives whether the channel-filter wrapper is engaged downstream.
- ft_synthesis_enabled (bool) – True for the OM-hetero condition. The env always injects the synthesised force-torque channel into obs["agent"]["panda_wristcam"]["force_torque"]; the channel filter then masks or keeps it per condition. The flag here is informational only (consumed by the smoke scripts and the docstring report).
Source code in src/chamber/envs/stage1_pickplace.py
MPECooperativePushEnv
Bases: Env
2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).
See module docstring for the task design. The env exposes the standard
Gymnasium reset + step API with multi-agent dict actions / dict
observations keyed by uid, matching :class:tests.fakes.FakeMultiAgentEnv
so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).
Source code in src/chamber/envs/mpe_cooperative_push.py
__init__
__init__(
*,
agent_uids: tuple[str, str] = ("ego", "partner"),
episode_length: int = DEFAULT_EPISODE_LENGTH,
root_seed: int = 0,
) -> None
Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).
Parameters:
- agent_uids (tuple[str, str], default: ('ego', 'partner')) – The two uids the env exposes. The default ("ego", "partner") matches the ego-AHT training-loop convention. Reordering swaps which agent the trainer updates and which is held frozen; the env itself is agnostic.
- episode_length (int, default: DEFAULT_EPISODE_LENGTH) – Truncation horizon in env ticks (default :data:DEFAULT_EPISODE_LENGTH).
- root_seed (int, default: 0) – Project-wide root seed; the env derives a deterministic numpy Generator from it via :func:concerto.training.seeding.derive_substream (P6 reproducibility).
Raises:
- ValueError – If agent_uids does not have exactly two distinct entries.
Source code in src/chamber/envs/mpe_cooperative_push.py
reset
reset(
*,
seed: int | None = None,
options: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], dict[str, Any]]
Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).
Note
This env's seeding semantics intentionally diverge from Gymnasium's
convention, where reset(seed=K) is the complete seed source. Here,
reset(seed=K) is folded with the constructor's root_seed via
:func:concerto.training.seeding.derive_substream using the substream
name "env.mpe_cooperative_push.episode.{K}". As a result, two training
runs at distinct root_seed values do not collide on the same
per-episode RNG, even at the same K. This matches the project-wide
seeding philosophy (P6 + ADR-002 §Decisions): root_seed is the
run-level pin, and the per-episode seed is an episode index scoped
under it.
Parameters:
- seed (int | None, default: None) – Per-episode seed (an episode index, NOT a complete seed). When provided, the env's RNG is re-derived via derive_substream(f"env.mpe_cooperative_push.episode.{seed}", root_seed=self._root_seed). When None, the RNG continues from its existing state (Gymnasium convention).
- options (dict[str, Any] | None, default: None) – Reserved for the Gymnasium API; ignored.
Returns:
- tuple[dict[str, Any], dict[str, Any]] – (obs, info), where obs matches the :attr:observation_space shape and info is empty.
Source code in src/chamber/envs/mpe_cooperative_push.py
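The "episode index scoped under root_seed" idea can be sketched as hashing the pair (root_seed, substream name) into a seed. derive_substream_sketch and episode_rng are hypothetical. The real derive_substream's hashing scheme and its numpy Generator return type are not reproduced here. This sketch uses SHA-256 and the standard-library random.Random instead.

```python
from __future__ import annotations

import hashlib
import random


def derive_substream_sketch(name: str, *, root_seed: int) -> random.Random:
    """Hypothetical stand-in for derive_substream: hash (root_seed, name) to a seed."""
    digest = hashlib.sha256(f"{root_seed}/{name}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def episode_rng(episode_seed: int, *, root_seed: int) -> random.Random:
    # reset(seed=K) is an episode index under root_seed, not a complete seed.
    return derive_substream_sketch(
        f"env.mpe_cooperative_push.episode.{episode_seed}", root_seed=root_seed
    )
```

The same (K, root_seed) pair always reproduces the same stream. Changing either the episode index or the run-level root_seed yields an unrelated stream.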
step
step(
action: dict[str, NDArray[floating]],
) -> tuple[
dict[str, Any], float, bool, bool, dict[str, Any]
]
Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).
Parameters:
- action (dict[str, NDArray[floating]]) – Dict keyed by uid; each value is a 2-D velocity command. Components are clipped to [-VELOCITY_BOUND, +VELOCITY_BOUND] before integration so a misbehaving partner cannot escape the plane.
Returns:
- tuple[dict[str, Any], float, bool, bool, dict[str, Any]] – (obs, reward, terminated, truncated, info). reward is the shared cooperative-coverage scalar (both agents see the same value); the trainer's ego/partner split happens upstream in :func:concerto.training.ego_aht.train. terminated is always False (there is no terminal state); truncated is True on the last tick of the episode.
Raises:
- ValueError – If action is missing a required uid.
Source code in src/chamber/envs/mpe_cooperative_push.py
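The clip-then-integrate step can be sketched in a few lines. integrate, VELOCITY_BOUND = 1.0 and DT = 0.1 are hypothetical. Explicit Euler integration and the position-only state are also assumptions. The docstring only states that components are clipped before integration and that a missing uid raises ValueError.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

VELOCITY_BOUND = 1.0  # hypothetical value; the real constant lives in the env module
DT = 0.1  # hypothetical integration step in seconds


def integrate(
    positions: Mapping[str, tuple[float, float]],
    action: Mapping[str, Sequence[float]],
    uids: Sequence[str],
    *,
    bound: float = VELOCITY_BOUND,
    dt: float = DT,
) -> dict[str, tuple[float, float]]:
    """Clip each 2-D velocity command, then take one explicit Euler step."""
    missing = [uid for uid in uids if uid not in action]
    if missing:
        raise ValueError(f"action is missing uid(s): {missing}")
    new_positions = {}
    for uid in uids:
        # Per-component clip to [-bound, +bound] before integrating.
        vx, vy = (max(-bound, min(bound, float(c))) for c in action[uid])
        x, y = positions[uid]
        new_positions[uid] = (x + vx * dt, y + vy * dt)
    return new_positions
```

With this clipping, a command of (5.0, -5.0) moves an agent no further than (1.0, -1.0) would, so a misbehaving partner's displacement per tick is bounded by bound * dt per axis.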
PartnerIdAnnotationWrapper
Bases: Wrapper
Inject obs["meta"]["partner_id"] on every reset and step (ADR-006 risk #3).
Wraps any env whose observation space is a :class:gym.spaces.Dict.
The wrapper extends the obs space with a "meta" sub-Dict (empty
in Phase-0 — Gymnasium's :class:gym.spaces.Dict does not natively
type string values, so the actual partner_id field is carried
in the runtime observation dict rather than declared in the space).
Note
:meth:gym.spaces.Dict.contains returns False on the
"meta" sub-dict because the declared inner space is empty;
this matches :class:chamber.envs.CommShapingWrapper's
precedent for its "comm" sub-space (no current consumer
validates an obs via observation_space.contains). Adding a
string-typed inner Space would require an upstream Gymnasium
change.
Phase-0 binds partner_id at construction. For a Phase-1
Stage-3 PF spike with mid-episode swap, call :meth:set_partner_id
before the next :meth:step; the wrapper will surface the new
value on the subsequent obs.
Parameters:
- env (Env) – Inner env with a :class:gym.spaces.Dict observation space.
- partner_id (str) – Stable 16-hex-char hash from :attr:chamber.partners.api.PartnerSpec.partner_id. The wrapper does not validate the format; :class:PartnerSpec is the source of truth.
Raises:
- ChamberEnvCompatibilityError – When the inner observation space is not a :class:gym.spaces.Dict (ADR-001 §Risks; matches :class:chamber.envs.CommShapingWrapper's compatibility contract).
Source code in src/chamber/envs/partner_meta.py
partner_id
property
partner_id: str
Return the partner identity the wrapper currently injects (ADR-006 risk #3).
Returns:
- str – The stable 16-hex-char hash bound at construction or via :meth:set_partner_id.
__init__
__init__(env: Env, *, partner_id: str) -> None
Validate the obs space and bind the partner identity (ADR-006 risk #3).
Source code in src/chamber/envs/partner_meta.py
reset
reset(**kwargs: object) -> tuple[dict, dict]
Reset the env and inject partner_id into the initial obs (ADR-006 risk #3).
Source code in src/chamber/envs/partner_meta.py
set_partner_id
set_partner_id(partner_id: str) -> None
Rebind the partner identity for the next reset / step (ADR-004 §risk-mitigation #2).
Used by Phase-1 mid-episode-swap experiments (ADR-007 Stage-3 PF):
call this before the next :meth:step and the wrapper will
surface the new value on the subsequent obs. The change does
NOT trigger a new :meth:reset automatically — the conformal
filter does that itself when it observes the new
partner_id on obs["meta"].
Parameters:
- partner_id (str) – New stable hash from :attr:chamber.partners.api.PartnerSpec.partner_id.
Source code in src/chamber/envs/partner_meta.py
step
step(
action: object,
) -> tuple[dict, object, bool, bool, dict]
Step the env and inject the current partner_id (ADR-006 risk #3).
Source code in src/chamber/envs/partner_meta.py
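Here is a minimal Gymnasium-free sketch of the annotate-on-every-obs behaviour, including a mid-episode rebind. PartnerIdSketch and StaticEnv are hypothetical, and the 16-character ids are made up. The sketch omits the "meta" sub-space the real wrapper adds to observation_space.

```python
from __future__ import annotations

from typing import Any


class PartnerIdSketch:
    """Hypothetical carrier that stamps obs["meta"]["partner_id"] on every obs."""

    def __init__(self, env: Any, *, partner_id: str) -> None:
        self.env = env
        self._partner_id = partner_id

    @property
    def partner_id(self) -> str:
        return self._partner_id

    def set_partner_id(self, partner_id: str) -> None:
        # Takes effect on the next obs; no reset is triggered here.
        self._partner_id = partner_id

    def _annotate(self, obs: dict[str, Any]) -> dict[str, Any]:
        meta = {**obs.get("meta", {}), "partner_id": self._partner_id}
        return {**obs, "meta": meta}

    def reset(self, **kwargs: Any):
        obs, info = self.env.reset(**kwargs)
        return self._annotate(obs), info

    def step(self, action: Any):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._annotate(obs), reward, terminated, truncated, info


class StaticEnv:
    """Hypothetical inner env returning a fixed dict observation."""

    def reset(self, **kwargs: Any):
        return {"ego": 0.0}, {}

    def step(self, action: Any):
        return {"ego": 1.0}, 0.0, False, False, {}
```

A downstream consumer can detect a swap by comparing obs["meta"]["partner_id"] against the value it saw on the previous tick.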
PerAgentActionRepeatWrapper
Bases: Wrapper
Apply different action-repeat counts per agent uid.
Effective control frequency for agent a is base_freq / action_repeat[a].
At each env.step call, an agent whose repeat counter has not elapsed
re-applies its most recent action instead of the newly submitted one.
ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).
Parameters:
- env (Env) – Env exposing a gym.spaces.Dict action space keyed by agent uid.
- action_repeat (Mapping[str, int]) – Mapping from agent uid to a positive integer repeat count. Keys absent from this mapping default to repeat=1 (pass-through).
Raises:
- ChamberEnvCompatibilityError – If the action space is not a gym.spaces.Dict keyed by agent uid (ADR-001 §Risks).
- ValueError – If any repeat count is less than 1.
Source code in src/chamber/envs/action_repeat.py
__init__
__init__(
env: Env, action_repeat: Mapping[str, int]
) -> None
Validate the action space and store per-agent repeat counts.
Source code in src/chamber/envs/action_repeat.py
reset
reset(**kwargs: object) -> tuple
Reset the env and clear all per-agent repeat state.
ADR-001 §Decision item 2: no state leaks across episodes.
Source code in src/chamber/envs/action_repeat.py
step
step(action: dict) -> tuple
Step with per-agent action repeat applied.
ADR-001 §Validation criteria condition (c): the slow agent's effective action update interval matches 1/rate within one env tick.
Source code in src/chamber/envs/action_repeat.py
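The repeat rule above is a latch: a slow agent's action updates once every action_repeat[a] ticks. The following sketch implements that behaviour, separated from the Gymnasium wrapper. ActionRepeatSketch and its counter bookkeeping are hypothetical; the real wrapper's internals may differ.

```python
from __future__ import annotations

from typing import Any, Mapping


class ActionRepeatSketch:
    """Hypothetical action-selection core of a per-agent action-repeat wrapper."""

    def __init__(self, action_repeat: Mapping[str, int]) -> None:
        for uid, k in action_repeat.items():
            if k < 1:
                raise ValueError(f"repeat count for {uid!r} must be >= 1, got {k}")
        self._repeat = dict(action_repeat)
        self.reset()

    def reset(self) -> None:
        # Clear all latched actions so no state leaks across episodes.
        self._held: dict[str, Any] = {}
        self._ticks: dict[str, int] = {}

    def select(self, action: Mapping[str, Any]) -> dict[str, Any]:
        """Return the action actually applied this tick, per uid."""
        applied = {}
        for uid, submitted in action.items():
            k = self._repeat.get(uid, 1)  # absent keys pass through (k = 1)
            if uid not in self._held or self._ticks[uid] >= k:
                # Repeat window elapsed: latch the newly submitted action.
                self._held[uid] = submitted
                self._ticks[uid] = 0
            self._ticks[uid] += 1
            applied[uid] = self._held[uid]
        return applied
```

With {"slow": 2}, submitting actions 0 through 4 on successive ticks applies 0, 0, 2, 2, 4 to the slow agent and 0, 1, 2, 3, 4 to an agent absent from the mapping.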
Stage1ASStateSynthesizer
Bases: ObservationWrapper
Synthesise obs["agent"][uid]["state"] for AS conditions (ADR-007 §Stage 1b).
ManiSkill v3.0.1 obs_mode="state_dict" (used by the Stage-1 AS
conditions) emits per-agent Dict(qpos: Box, qvel: Box, ...) with
no top-level "state" key. But
:meth:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config
reads env.observation_space["agent"][ego_uid]["state"] as a 1-D
Box and derives obs_dim = ego_state_space.shape[0]. ManiSkill v3
obs_mode="state" flattens concat(qpos, qvel) into a single
"state" key with that shape; this wrapper replicates that concat
in obs-space so callers can keep obs_mode="state_dict" for the
synthesised-channel injection (force_torque, tcp_pose, ...)
the OM-hetero condition needs while still satisfying the trainer's
1-D state contract.
Pass-through for OM conditions (is_om_condition=True — those go
through :class:Stage1OMChannelFilter instead) and for envs whose
condition_id is missing or unrecognised (Tier-1 safety: keeps
the wrapper a no-op for envs that don't satisfy the Stage-1b
condition_id contract).
Parameters:
- env (Env[Any, Any]) – Inner env exposing a condition_id attribute.
Raises:
- TypeError – If env.observation_space is not a :class:gym.spaces.Dict.
Source code in src/chamber/envs/stage1_obs_filter.py
__getattr__
__getattr__(name: str) -> Any
Forward attribute access to the inner env (Tier-2 caller contract).
Gymnasium 1.3 removed :class:gym.Wrapper's implicit
__getattr__; existing P1.03 callers of
:func:make_stage1_pickplace_env read SAPIEN-env attributes
directly (env.agent, env._jacobian_provider,
env.condition_config...). Forwarding here keeps that
contract without forcing every caller to switch to
:meth:gym.Wrapper.get_wrapper_attr.
__getattr__ only fires when normal attribute lookup misses;
self.env is set by :meth:gym.Wrapper.__init__ before any
downstream access, so the delegation is safe. The env-name
guard is a defensive backstop against recursive lookup on the
env attr itself (only reachable in pathological half-init
paths).
Source code in src/chamber/envs/stage1_obs_filter.py
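The forwarding contract and its recursion guard can be sketched in plain Python. ForwardingSketch and SapienLikeEnv are hypothetical. The guard matters for a half-initialised instance: without it, looking up self.env inside __getattr__ would call __getattr__("env") again and recurse until RecursionError.

```python
from __future__ import annotations

from typing import Any


class ForwardingSketch:
    """Hypothetical wrapper that forwards unknown attributes to self.env."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails. If "env" itself is missing
        # (a half-initialised instance), fail fast instead of recursing.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)


class SapienLikeEnv:
    """Hypothetical inner env exposing an attribute callers read directly."""

    agent = "panda"
```

Attributes defined on the wrapper itself still win, because __getattr__ only runs after normal lookup misses. Missing names surface as an ordinary AttributeError from the inner env.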
__init__
__init__(env: Env[Any, Any]) -> None
Detect AS-condition envs at construction (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
observation
observation(observation: dict[str, Any]) -> dict[str, Any]
Inject synthesised state per agent (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
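The observation-side transform described for this wrapper amounts to state = concat(qpos, qvel) per agent. The following is a minimal sketch on nested dicts. synthesize_state is hypothetical, and Python lists stand in for the tensors ManiSkill actually returns. The observation-space side (a 1-D Box of length len(qpos) + len(qvel)) is omitted.

```python
from __future__ import annotations

from typing import Any


def synthesize_state(observation: dict[str, Any]) -> dict[str, Any]:
    """Add agent[uid]["state"] = concat(qpos, qvel) without dropping other keys."""
    agents = observation.get("agent", {})
    new_agents = {
        # qpos first, then qvel, matching the concat order described above.
        uid: {**sub, "state": [*sub["qpos"], *sub["qvel"]]}
        for uid, sub in agents.items()
    }
    # Non-agent keys (extra, sensor_data, ...) are carried over unchanged.
    return {**observation, "agent": new_agents}
```

Keeping qpos and qvel alongside the new "state" key preserves the synthesised channels that obs_mode="state_dict" was chosen for, while giving the trainer its 1-D vector.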
Stage1OMChannelFilter
Bases: ObservationWrapper
OM-axis per-condition channel filter (ADR-007 §Stage 1b).
Reads the inner env's condition_id at construction time and
captures the resulting filter policy. The wrapper does not re-read
condition_id on every step — Stage-1b envs are built once per
(seed, condition) cell (see stage1_as.py:253-275) and the
condition is fixed for the env's lifetime.
Parameters:
-
env(Env[Any, Any]) –Inner env exposing a
condition_idattribute. The :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnvclass satisfies this; the wrapper falls back to a pass-through identity ifcondition_idis missing or not one of the two OM strings.
Raises:
- TypeError – If the inner env's observation space is not a :class:gym.spaces.Dict (the wrapper relies on dict-shaped obs).
Source code in src/chamber/envs/stage1_obs_filter.py
condition_id
property
condition_id: str | None
ADR-007 §Stage 1b OM axis: the condition_id resolved at construction.
__getattr__
__getattr__(name: str) -> Any
Forward attribute access to the inner env (Tier-2 caller contract).
Mirrors :meth:Stage1ASStateSynthesizer.__getattr__: Gymnasium 1.3
removed :class:gym.Wrapper's implicit forwarding, so callers of
:func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env
— which now applies this wrapper on top of the AS synthesizer —
would otherwise lose access to SAPIEN-env attributes such as
env.agent, env.obs_mode, env.condition_config.
Source code in src/chamber/envs/stage1_obs_filter.py
__init__
__init__(env: Env[Any, Any]) -> None
Store the per-condition filter policy (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
observation
observation(observation: dict[str, Any]) -> dict[str, Any]
Apply the per-condition channel mask (ADR-007 §Stage 1b).
OM-vision-only zero-masks the per-uid agent-proprio sub-dicts
and the obs["extra"]["force_torque"] channel, and leaves
obs["extra"]["tcp_pose"] and obs["extra"]["goal_pos"]
(the minimal task-relevant fields) plus obs["sensor_data"]
(RGB-D camera) untouched. Every other condition passes through unchanged.
Source code in src/chamber/envs/stage1_obs_filter.py
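The OM-vision-only rule above can be sketched as a pure function. This is a minimal sketch under stated assumptions: obs["agent"] maps each uid to a (possibly nested) dict of numpy arrays, and obs["extra"] is a flat dict of arrays. `mask_om_channels` and `_zeros_like_tree` are hypothetical helpers; only the vision-only condition string comes from the pre-registration list. The real wrapper captures its policy once at construction rather than comparing the condition on every call.

```python
from typing import Any

import numpy as np

VISION_ONLY = "stage1_pickplace_vision_only"


def _zeros_like_tree(node: Any) -> Any:
    """Recursively replace every array leaf with zeros of the same shape/dtype."""
    if isinstance(node, dict):
        return {k: _zeros_like_tree(v) for k, v in node.items()}
    return np.zeros_like(node)


def mask_om_channels(condition_id: str, obs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch of the OM-vision-only channel mask."""
    if condition_id != VISION_ONLY:
        return obs  # every other condition passes through unchanged
    out = dict(obs)  # shallow copy: untouched top-level keys are shared
    out["agent"] = _zeros_like_tree(obs["agent"])  # per-uid proprio sub-dicts
    extra = dict(obs["extra"])
    extra["force_torque"] = np.zeros_like(extra["force_torque"])
    out["extra"] = extra  # tcp_pose and goal_pos are kept as-is
    return out  # obs["sensor_data"] (RGB-D) is left untouched
```

Copying instead of mutating in place keeps the inner env's own observation buffers intact, which matters if anything upstream caches them.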
TextureFilterObsWrapper
Bases: ObservationWrapper
Filter observation channels per agent uid, zero-masking unlisted channels.
Tensor shapes are preserved — masked channels become zero arrays of the same shape and dtype. This keeps downstream policy networks oblivious to which modalities a given partner actually receives.
ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.
Parameters:
- env (Env) – Env whose observation has shape {"agent": {uid: {channel: array}}}.
- keep_per_agent (Mapping[str, Iterable[str]]) – Mapping from agent uid to the iterable of channel keys to keep. Channels not listed are zero-masked. Uids absent from this mapping are passed through unchanged.
Source code in src/chamber/envs/texture_filter.py
__init__
__init__(
env: Env, keep_per_agent: Mapping[str, Iterable[str]]
) -> None
Store per-agent channel keep-sets as frozensets for O(1) lookup.
Source code in src/chamber/envs/texture_filter.py
observation
observation(observation: dict) -> dict
Apply per-agent channel mask, zero-filling unlisted channels.
ADR-001 §Decision item 1: obs modality is a controllable heterogeneity axis; shapes are invariant so vector rollout is unaffected.
Source code in src/chamber/envs/texture_filter.py
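The masking rule fits in a few lines of numpy. This is a minimal sketch, assuming every channel is a numpy array; `build_keep_sets` and `filter_channels` are hypothetical free functions standing in for the wrapper's __init__ and observation methods. np.zeros_like keeps both shape and dtype, which is what makes the mask invisible to a downstream policy network's input layer.

```python
from collections.abc import Iterable, Mapping
from typing import Any

import numpy as np


def build_keep_sets(keep_per_agent: Mapping[str, Iterable[str]]) -> dict[str, frozenset[str]]:
    """Freeze each uid's keep-list so channel membership checks are O(1)."""
    return {uid: frozenset(chans) for uid, chans in keep_per_agent.items()}


def filter_channels(obs: dict[str, Any], keep: Mapping[str, frozenset[str]]) -> dict[str, Any]:
    """Zero-mask unlisted channels per uid; shapes and dtypes are preserved."""
    agents = {}
    for uid, channels in obs["agent"].items():
        if uid not in keep:
            agents[uid] = channels  # uid not configured: pass through unchanged
            continue
        agents[uid] = {
            name: arr if name in keep[uid] else np.zeros_like(arr)
            for name, arr in channels.items()
        }
    return {**obs, "agent": agents}
```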
build_control_models_for_condition
build_control_models_for_condition(
condition_id: str,
*,
jacobian_fn: Callable[[AgentSnapshot], NDArray[float64]]
| None = None,
fetch_action_dim: int | None = None,
panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]
Build per-uid :class:AgentControlModel map for condition_id (ADR-004 §Decision).
Pure Tier-1 function — no SAPIEN dependency. The Jacobian callable
is plumbed in by the env at construction time (see
:meth:Stage1PickPlaceEnv.build_control_models); Tier-1 tests pass
jacobian_fn=None to exercise the model-construction path, which
keeps the placeholder per :class:JacobianControlModel's
loud-fail-on-use contract (concerto/safety/api.py:552).
Parameters:
- condition_id (str) – Stage-1 condition_id (raises if unknown).
- jacobian_fn (Callable[[AgentSnapshot], NDArray[float64]] | None, default: None) – Optional Jacobian callable to embed in the panda uid's :class:JacobianControlModel. None means the placeholder model is returned (Tier-1 path); pass the env's closure-via-env-reference callable (lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])) to get a working model for Tier-2 / production.
- fetch_action_dim (int | None, default: None) – Action dimension of the fetch uid's Box. None (default) is the Tier-1 sentinel: a :class:DoubleIntegratorControlModel with action_dim=2 (just the differential-drive base axis; the arm, body, and gripper are folded into the same model under the canonical "treat the whole action as a 1-D Cartesian acceleration" approximation that slice P1.05.5 will tighten). Tier-2 paths pass the actual env.action_space[uid].shape[0].
- panda_action_dim (int, default: 8) – Action dimension of the panda uid's Box. Defaults to 8 (7 arm joint deltas + 1 mimic-gripper joint delta under pd_joint_delta_pos). Both panda_wristcam and panda_partner share this shape.
Returns:
- dict[str, AgentControlModel] – {uid: AgentControlModel} map keyed by the condition's :attr:ConditionConfig.agent_uids. The panda uid (always the ego, always present) gets a :class:JacobianControlModel; the partner uid gets a :class:JacobianControlModel for panda_partner (AS-homo) or a :class:DoubleIntegratorControlModel for fetch (AS-hetero / OM-*).
Raises:
- ValueError – If condition_id is not in the four-condition table (delegated to :func:resolve_condition).
Source code in src/chamber/envs/stage1_pickplace.py
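The per-uid dispatch described in the Returns section can be sketched with stand-in types. Assumptions: `JacobianModelSketch`, `DoubleIntegratorSketch`, `PARTNER_BY_CONDITION`, and `build_models_sketch` are hypothetical; the real function builds concerto.safety.api control models and delegates the unknown-ID check to resolve_condition. The four condition strings and uid pairs come from the Stage-1b module docstring. How the real function wires a Jacobian for panda_partner is not documented here, so the sketch leaves it as a placeholder.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class JacobianModelSketch:  # hypothetical stand-in for JacobianControlModel
    action_dim: int
    jacobian_fn: Optional[Callable[[object], np.ndarray]]  # None = placeholder


@dataclass
class DoubleIntegratorSketch:  # hypothetical stand-in for DoubleIntegratorControlModel
    action_dim: int


PARTNER_BY_CONDITION = {
    "stage1_pickplace_panda_only_mappo_shared_param": "panda_partner",
    "stage1_pickplace_panda_plus_fetch_ego_aht_happo_per_agent": "fetch",
    "stage1_pickplace_vision_only": "fetch",
    "stage1_pickplace_vision_plus_force_torque_plus_proprio": "fetch",
}


def build_models_sketch(condition_id: str, *, jacobian_fn=None,
                        fetch_action_dim: Optional[int] = None,
                        panda_action_dim: int = 8) -> dict:
    if condition_id not in PARTNER_BY_CONDITION:
        raise ValueError(f"unknown condition_id {condition_id!r}")
    # The ego panda is always present and always gets a Jacobian model.
    models = {"panda_wristcam": JacobianModelSketch(panda_action_dim, jacobian_fn)}
    partner = PARTNER_BY_CONDITION[condition_id]
    if partner == "panda_partner":
        # Assumption: partner Jacobian wiring left as a placeholder here.
        models[partner] = JacobianModelSketch(panda_action_dim, None)
    else:
        # Tier-1 sentinel: None -> 2-D double integrator for the fetch base.
        dim = 2 if fetch_action_dim is None else fetch_action_dim
        models[partner] = DoubleIntegratorSketch(dim)
    return models
```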
make_stage1_pickplace_env
make_stage1_pickplace_env(
*,
condition_id: str,
episode_length: int = DEFAULT_EPISODE_LENGTH,
root_seed: int = 0,
num_envs: int = 1,
render_mode: str | None = None,
render_backend: str | None = None,
force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]
Build a :class:Stage1PickPlaceEnv instance (ADR-007 §Stage 1b).
Factory entry point. SAPIEN / ManiSkill imports are deferred to the
body so python -c "import chamber.envs.stage1_pickplace" works
on a Vulkan-less host (Tier-1 contract). The
:class:Stage1PickPlaceEnv class itself is defined inside the
factory body — same pattern as
:func:chamber.benchmarks.stage0_smoke.make_stage0_env.
Parameters:
- condition_id (str) – One of the four Stage-1 condition_id strings; raises :class:ValueError on any other input (cf. :func:resolve_condition).
- episode_length (int, default: DEFAULT_EPISODE_LENGTH) – Truncation horizon, in env ticks. Default :data:DEFAULT_EPISODE_LENGTH (100). See the module docstring for the budget rationale.
- root_seed (int, default: 0) – Project root seed routed to the env's :func:concerto.training.seeding.derive_substream substream for reproducibility (P6 / ADR-002 determinism rule).
- num_envs (int, default: 1) – ManiSkill vectorisation count; default 1 (Stage-1b paths are single-env per cell). Higher values are supported but untested.
- render_mode (str | None, default: None) – ManiSkill render_mode (None, "human", "rgb_array", etc.). None (default) disables rendering for the spike-runner path.
- render_backend (str | None, default: None) – ManiSkill render_backend ("gpu", "none", …). None (default) falls through to ManiSkill's own default. The Stage-0 smoke env uses a CHAMBER_RENDER_BACKEND env-var probe plus the visual-material strip patch; Stage-1b follows the same pattern but also exposes the backend choice as a constructor kwarg.
- force_torque_noise_sigma (float, default: SYNTHESISED_FT_NOISE_SIGMA_N) – Standard deviation (sigma), in newtons, of the zero-mean Gaussian noise added to the synthesised force-torque channel. Default :data:SYNTHESISED_FT_NOISE_SIGMA_N. Override only with an ADR-007 §Revision history entry (the value is load-bearing for the OM gate; see the module docstring).
Returns:
- Env[Any, Any] – :class:Stage1PickPlaceEnv instance, ready to call reset(seed=K) on. The action space is a :class:gym.spaces.Dict keyed by the resolved :attr:ConditionConfig.agent_uids.
Raises:
- ValueError – If condition_id is unknown (see :func:resolve_condition).
- ChamberEnvCompatibilityError – If ManiSkill / SAPIEN / Vulkan initialisation fails (e.g. a CPU-only host without the render_backend="none" fallback; see ADR-001 §Risks).
Source code in src/chamber/envs/stage1_pickplace.py
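The deferred-import pattern can be illustrated without SAPIEN. In this minimal sketch, `make_env_sketch` and `EnvCompatibilityError` are hypothetical, `importlib.import_module` stands in for the real factory's direct `import` statements, and a stdlib module substitutes for ManiSkill so the example runs anywhere. Defining the env class inside the factory lets its body reference the freshly imported module without any top-level import.

```python
import importlib
from typing import Any


class EnvCompatibilityError(RuntimeError):
    """Hypothetical stand-in for ChamberEnvCompatibilityError."""


def make_env_sketch(backend_module: str = "mani_skill") -> Any:
    """Factory whose heavy dependency is imported only when it is called."""
    # Importing the module that defines this function never touches
    # backend_module, so a bare `import` of it works on a Vulkan-less host.
    try:
        backend = importlib.import_module(backend_module)
    except ImportError as exc:
        raise EnvCompatibilityError(f"cannot load {backend_module!r}") from exc

    class _SketchEnv:
        # Defined inside the factory so its body can use `backend`
        # (in the real env: subclass a BaseEnv) without a top-level import.
        backend_name = backend.__name__

    return _SketchEnv()
```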
resolve_condition
resolve_condition(condition_id: str) -> ConditionConfig
Resolve a pre-registered condition_id to its config (ADR-007 §Stage 1b).
Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the
four-condition coverage and the ValueError raise on bogus IDs.
Parameters:
- condition_id (str) – One of the four verbatim Stage-1 pre-registration strings (see spikes/preregistration/{AS,OM}.yaml).
Returns:
- ConditionConfig – Resolved :class:ConditionConfig carrying the agent_uids tuple, the env-level obs_mode superset, and the boolean flags that downstream wrappers / scripts read.
Raises:
- ValueError – If condition_id is not one of the four Stage-1 pre-registration strings. The message names the four valid options and cites ADR-007 §Discipline (which classes editing prereg YAMLs as a project anti-pattern).
Source code in src/chamber/envs/stage1_pickplace.py
Per-agent action-repeat wrapper.
ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).
PerAgentActionRepeatWrapper
Bases: Wrapper
Apply different action-repeat counts per agent uid.
Effective control frequency for agent a is::
base_freq / action_repeat[a]
At each env.step call, an agent whose repeat counter has not elapsed
re-applies its most recent action instead of the newly submitted one.
ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).
Parameters:
- env (Env) – Env exposing a gym.spaces.Dict action space keyed by agent uid.
- action_repeat (Mapping[str, int]) – Mapping from agent uid to a positive integer repeat count. Keys absent from this mapping default to repeat=1 (pass-through).
Raises:
- ChamberEnvCompatibilityError – If the action space is not a gym.spaces.Dict keyed by agent uid (ADR-001 §Risks).
- ValueError – If any repeat count is less than 1.
Source code in src/chamber/envs/action_repeat.py
__init__
__init__(
env: Env, action_repeat: Mapping[str, int]
) -> None
Validate the action space and store per-agent repeat counts.
Source code in src/chamber/envs/action_repeat.py
reset
reset(**kwargs: object) -> tuple
Reset the env and clear all per-agent repeat state.
ADR-001 §Decision item 2: no state leaks across episodes.
Source code in src/chamber/envs/action_repeat.py
step
step(action: dict) -> tuple
Step with per-agent action repeat applied.
ADR-001 §Validation criteria condition (c): the slow agent's effective action update interval matches 1/rate within one env tick.
Source code in src/chamber/envs/action_repeat.py
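The hold-and-latch bookkeeping behind the repeat rule can be sketched without an env attached. `ActionRepeatSketch` and its `effective` method are hypothetical names, and the counter scheme is one plausible implementation of the documented behaviour, not the wrapper's actual code. With repeat k, an agent latches a new action only every k ticks, so a hypothetical 50 Hz base rate with action_repeat=2 gives 25 Hz (base_freq / action_repeat[a]).

```python
from collections.abc import Mapping
from typing import Any


class ActionRepeatSketch:
    """Hypothetical per-agent action-repeat bookkeeping (no env attached)."""

    def __init__(self, action_repeat: Mapping[str, int]) -> None:
        if any(k < 1 for k in action_repeat.values()):
            raise ValueError("repeat counts must be >= 1")
        self._repeat = dict(action_repeat)
        self.reset()

    def reset(self) -> None:
        # Clear all held actions and counters: no state leaks across episodes.
        self._held: dict[str, Any] = {}
        self._age: dict[str, int] = {}

    def effective(self, action: Mapping[str, Any]) -> dict[str, Any]:
        """Return the action each agent actually applies this tick."""
        out = {}
        for uid, new in action.items():
            k = self._repeat.get(uid, 1)  # absent uid -> repeat=1 (pass-through)
            if uid not in self._held or self._age[uid] >= k:
                # Counter elapsed (or first tick): latch the newly submitted action.
                self._held[uid], self._age[uid] = new, 0
            self._age[uid] += 1
            out[uid] = self._held[uid]  # otherwise re-apply the held action
        return out
```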
Per-agent observation-channel filter.
ADR-001 §Decision item 1 (obs modality heterogeneity axis); ADR-007 §Stage 1 OM axis (rev 3 §Stage 1).
TextureFilterObsWrapper
Bases: ObservationWrapper
Filter observation channels per agent uid, zero-masking unlisted channels.
Tensor shapes are preserved — masked channels become zero arrays of the same shape and dtype. This keeps downstream policy networks oblivious to which modalities a given partner actually receives.
ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.
Parameters:
- env (Env) – Env whose observation has shape {"agent": {uid: {channel: array}}}.
- keep_per_agent (Mapping[str, Iterable[str]]) – Mapping from agent uid to the iterable of channel keys to keep. Channels not listed are zero-masked. Uids absent from this mapping are passed through unchanged.
Source code in src/chamber/envs/texture_filter.py
__init__
__init__(
env: Env, keep_per_agent: Mapping[str, Iterable[str]]
) -> None
Store per-agent channel keep-sets as frozensets for O(1) lookup.
Source code in src/chamber/envs/texture_filter.py
observation
observation(observation: dict) -> dict
Apply per-agent channel mask, zero-filling unlisted channels.
ADR-001 §Decision item 1: obs modality is a controllable heterogeneity axis; shapes are invariant so vector rollout is unaffected.
Source code in src/chamber/envs/texture_filter.py
Communication-channel observation injector (M1 carrier; M2 fills logic).
ADR-001 §Decision item 3 (comm shaping is the outermost wrapper); ADR-003 §Decision (fixed-format + optional learned overlay via CommChannel); ADR-006 §Decision (URLLC-anchored numeric bounds enforced by the channel impl).
The actual comm logic (fixed-format encoding, AoI accounting, optional learned
overlay, URLLC-anchored degradation) lives in :mod:chamber.comm (M2).
This wrapper is a thin adapter that places the comm output into obs["comm"].
CommShapingWrapper
Bases: Wrapper
Inject comm-channel output into the observation dict as obs["comm"].
Wraps any env whose observation space is a gym.spaces.Dict. On each
reset and step, calls channel.encode(obs) and stores the result under
the "comm" key. The channel itself controls all degradation semantics
(latency, drop, jitter); this wrapper is a pure carrier.
ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.
Parameters:
- env (Env) – Env with a gym.spaces.Dict observation space.
- channel (CommChannel) – A :class:chamber.comm.api.CommChannel instance. In M1 this is a stub :class:chamber.comm.FixedFormatCommChannel that emits an empty packet shell; M2 fills in the real fixed-format encoding.
Raises:
- ChamberEnvCompatibilityError – If the inner observation space is not a gym.spaces.Dict (ADR-001 §Risks).
Source code in src/chamber/envs/comm_shaping.py
__init__
__init__(env: Env, channel: CommChannel) -> None
Validate the obs space and extend it with a comm sub-space.
Source code in src/chamber/envs/comm_shaping.py
reset
reset(**kwargs: object) -> tuple[dict, dict]
Reset the env and channel, injecting comm into the initial obs.
ADR-001 §Decision item 3; ADR-003 §Decision.
Source code in src/chamber/envs/comm_shaping.py
step
step(
action: object,
) -> tuple[dict, object, bool, bool, dict]
Step the env and inject updated comm into obs.
ADR-001 §Decision item 3; ADR-003 §Decision.
Source code in src/chamber/envs/comm_shaping.py
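The carrier behaviour described above reduces to one injection helper shared by reset and step. This is a minimal sketch: `ChannelLike` and `CommCarrierSketch` are hypothetical, the `reset()` method on the channel is an assumed name (the docstring only says the channel is reset), and the observation-space extension done in the real __init__ is omitted.

```python
from typing import Any, Protocol


class ChannelLike(Protocol):
    """Hypothetical minimal slice of the CommChannel interface used here."""

    def reset(self) -> None: ...
    def encode(self, obs: dict[str, Any]) -> Any: ...


class CommCarrierSketch:
    def __init__(self, env: Any, channel: ChannelLike) -> None:
        self.env, self.channel = env, channel

    def _inject(self, obs: dict[str, Any]) -> dict[str, Any]:
        # Pure carrier: latency, drop, and jitter all live inside channel.encode.
        return {**obs, "comm": self.channel.encode(obs)}

    def reset(self, **kwargs: Any) -> tuple[dict, dict]:
        obs, info = self.env.reset(**kwargs)
        self.channel.reset()  # assumed method name; clears channel state per episode
        return self._inject(obs), info

    def step(self, action: Any) -> tuple[dict, Any, bool, bool, dict]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._inject(obs), reward, terminated, truncated, info
```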
2-agent MPE Cooperative-Push env for the empirical-guarantee experiment (T4b.13).
Plan/05 §3.5 (decision row "Empirical guarantee Phase-0 verification") +
ADR-002 risk-mitigation #1: the Phase-0 ego-AHT empirical-guarantee
experiment runs on a 2-agent MPE-style cooperative continuous-control
task. This module ships a deterministic hand-rolled equivalent of
PettingZoo's simple_spread_v3 rather than depending on PettingZoo
itself — Phase 0 keeps the dependency footprint minimal (PettingZoo
deprecated MPE in 1.25 and split it into the younger mpe2 package;
either lift would mean a dep-auditor pass for a single Phase-0 task).
The task structure matches:
- Two agents (default uids ego and partner) on the square plane [-1, 1]^2 with continuous 2-D velocity actions clipped to [-1, 1] per axis.
- Two landmarks fixed per episode at random positions sampled from a determinism-harnessed numpy Generator (seed routed via :func:concerto.training.seeding.derive_substream).
- Shared reward (both agents see the same scalar): r = -sum_landmark min_agent ||agent_pos - landmark_pos||. The cooperative-coverage gradient encourages the agents to pick distinct landmarks; a single agent cannot drive r to zero on its own.
- Episode length fixed at :data:DEFAULT_EPISODE_LENGTH; truncation only (no terminal state).
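The shared reward is a one-liner with numpy broadcasting. This is a minimal sketch; `coverage_reward` is a hypothetical name, and the (n, 2) array layout is an assumption about how positions are stored. It also shows why the gradient favours splitting up: two agents parked on the same landmark leave the other landmark's full distance in the sum.

```python
import numpy as np


def coverage_reward(agent_pos: np.ndarray, landmark_pos: np.ndarray) -> float:
    """r = -sum_landmark min_agent ||agent_pos - landmark_pos||.

    agent_pos: (n_agents, 2); landmark_pos: (n_landmarks, 2).
    """
    # dists[i, j] = Euclidean distance from agent i to landmark j.
    dists = np.linalg.norm(agent_pos[:, None, :] - landmark_pos[None, :, :], axis=-1)
    # For each landmark, take its closest agent, then sum and negate.
    return float(-dists.min(axis=0).sum())
```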
The env is intentionally Gymnasium-compatible with CONCERTO's other
wrappers (M2 comm + M3 safety): observation is keyed by obs["agent"][uid]
with the state channel that
:class:chamber.partners.heuristic.ScriptedHeuristicPartner already
reads in its three-tier fallback (plan/04 §3.4). Wrapping this env
through :class:chamber.envs.action_repeat.PerAgentActionRepeatWrapper,
:class:chamber.envs.comm_shaping.CommShapingWrapper, etc. is therefore
a zero-config drop-in.
Determinism (P6): two reset(seed=K) calls on the same root_seed
produce byte-identical landmark layouts and agent start poses; the same
action stream then produces a byte-identical state trajectory and reward
curve. test_mpe_cooperative_push.py::test_determinism_byte_identical_traj
pins this contract.
MPECooperativePushEnv
Bases: Env
2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).
See module docstring for the task design. The env exposes the standard
Gymnasium reset + step API with multi-agent dict actions / dict
observations keyed by uid, matching :class:tests.fakes.FakeMultiAgentEnv
so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).
Source code in src/chamber/envs/mpe_cooperative_push.py
__init__
__init__(
*,
agent_uids: tuple[str, str] = ("ego", "partner"),
episode_length: int = DEFAULT_EPISODE_LENGTH,
root_seed: int = 0,
) -> None
Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).
Parameters:
- agent_uids (tuple[str, str], default: ('ego', 'partner')) – The two uids the env exposes. The default ("ego", "partner") matches the ego-AHT training-loop convention. Reordering swaps which agent the trainer updates and which is held frozen; the env itself is agnostic.
- episode_length (int, default: DEFAULT_EPISODE_LENGTH) – Truncation horizon in env ticks (default :data:DEFAULT_EPISODE_LENGTH).
- root_seed (int, default: 0) – Project-wide root seed; the env derives a deterministic numpy Generator from it via :func:concerto.training.seeding.derive_substream (P6 reproducibility).
Raises:
- ValueError – If agent_uids does not have exactly two distinct entries.
Source code in src/chamber/envs/mpe_cooperative_push.py
reset
reset(
*,
seed: int | None = None,
options: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], dict[str, Any]]
Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).
.. note::
This env's seeding semantics intentionally diverge from
Gymnasium's convention (where ``reset(seed=K)`` is the
*complete* seed source). Here, ``reset(seed=K)`` is folded
with the constructor's ``root_seed`` via
:func:`concerto.training.seeding.derive_substream` using the
substream name ``"env.mpe_cooperative_push.episode.{K}"`` —
so two training runs at distinct ``root_seed`` values do
*not* collide on the same per-episode RNG even at the same
``K``. This matches the project-wide seeding philosophy
(P6 + ADR-002 §Decisions): the ``root_seed`` is the
run-level pin, and per-episode ``seed`` is an episode index
scoped under it.
Parameters:
- seed (int | None, default: None) – Per-episode seed (an episode index, NOT a complete seed). When provided, re-derives the env's RNG via derive_substream(f"env.mpe_cooperative_push.episode.{seed}", root_seed=self._root_seed). When None, continues from the RNG's existing state (Gymnasium convention).
- options (dict[str, Any] | None, default: None) – Reserved for the Gymnasium API; ignored.
Returns:
- tuple[dict[str, Any], dict[str, Any]] – (obs, info), where obs matches the :attr:observation_space shape and info is empty.
Source code in src/chamber/envs/mpe_cooperative_push.py
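The episode-seed folding in the note can be illustrated with a stand-in derivation. The actual body of concerto.training.seeding.derive_substream is not shown here, so `derive_substream_sketch` is hypothetical: it hashes the substream name with SHA-256 and mixes it with root_seed through numpy's SeedSequence. The property it demonstrates is the documented one: the same (root_seed, K) pair reproduces the same stream, and a different root_seed at the same K does not collide.

```python
import hashlib

import numpy as np


def derive_substream_sketch(name: str, *, root_seed: int) -> np.random.Generator:
    """Hypothetical stand-in for concerto.training.seeding.derive_substream."""
    # Hash the name to a stable integer; Python's built-in hash() is salted
    # per process and would break cross-run determinism.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    name_key = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))


def episode_rng(root_seed: int, episode: int) -> np.random.Generator:
    # Episode index K is scoped under the run-level root_seed.
    return derive_substream_sketch(
        f"env.mpe_cooperative_push.episode.{episode}", root_seed=root_seed
    )
```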
step
step(
action: dict[str, NDArray[floating]],
) -> tuple[
dict[str, Any], float, bool, bool, dict[str, Any]
]
Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).
Parameters:
- action (dict[str, NDArray[floating]]) – Dict keyed by uid, each value a 2-D velocity command. Components are clipped to [-VELOCITY_BOUND, +VELOCITY_BOUND] before integration so a misbehaving partner cannot escape the plane.
Returns:
- tuple[dict[str, Any], float, bool, bool, dict[str, Any]] – (obs, reward, terminated, truncated, info). reward is the shared cooperative-coverage scalar (both agents see the same value); the trainer's ego/partner split happens upstream in :func:concerto.training.ego_aht.train. terminated is always False (no terminal state); truncated is True on the last tick of the episode.
Raises:
- ValueError – If action is missing a required uid.
Source code in src/chamber/envs/mpe_cooperative_push.py
Stage-1b pick-place env (ADR-007 §Stage 1b; P1.03 / PR #
Real ManiSkill v3 wrap of the canonical pick-place task with the
panda + fetch / panda + panda_partner robot tuples. Supports the four
Stage-1 condition_id strings the AS / OM pre-registrations name
verbatim (spikes/preregistration/AS.yaml,
spikes/preregistration/OM.yaml; git tags
prereg-stage1-{AS,OM}-2026-05-15):
- AS-homo: stage1_pickplace_panda_only_mappo_shared_param -> agents ("panda_wristcam", "panda_partner").
- AS-hetero: stage1_pickplace_panda_plus_fetch_ego_aht_happo_per_agent -> agents ("panda_wristcam", "fetch").
- OM-homo: stage1_pickplace_vision_only -> agents ("panda_wristcam", "fetch"); vision-only channel keep-set.
- OM-hetero: stage1_pickplace_vision_plus_force_torque_plus_proprio -> same agents; vision + synthesised FT + proprio channel keep-set.
Per ADR-007 §Discipline the pre-registration YAMLs and their git tags carry forward unchanged from Phase 0 — Stage 1b uses the same condition IDs, seed list (5 seeds, 20 episodes per seed = 100 episodes per condition), and episode budget; only the env implementation behind the condition IDs changes. Editing a prereg YAML post-launch is a project anti-pattern (ADR-007 §Discipline).
Stage-1a's MPE stand-in
(:class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv)
sequenced rig validation; this env is the Stage-1b science evaluation
target. Compute budget per ADR-007 §Stage 1b: 5 GPU-h per axis on A100;
:data:DEFAULT_EPISODE_LENGTH = 100 chosen to fit the budget given
~0.8-1.2 s/step at multi-agent SAPIEN scale.
Wrapper-only discipline (ADR-001 §Decision (wrapper-only)):
no patches to mani_skill/; no _* private imports. The env
subclasses :class:mani_skill.envs.sapien_env.BaseEnv directly
(matching the Stage-0 smoke env's _Stage0SmokeEnv pattern, but with
the canonical pick-place reward and a two-robot tuple). The
:func:chamber.envs._sapien_compat.load_agent_with_bare_uids shim
handles the multi-agent uid-suffix strip-rebind that both Stage-0 and
Stage-1 need.
Action space and the AS axis (decision Q3 / ADR-004 §Decision):
All agents use control_mode="pd_joint_delta_pos" so the 7-DOF
panda arm action stays joint-space, not Cartesian — preserving the AS
axis's intended "7-DOF arm vs 2-DOF differential-drive base"
heterogeneity (ADR-007 §Stage 1 AS spike framing). The CBF-QP outer
filter's Cartesian-acceleration interpretation of the action is then
an approximation absorbed by the conformal-slack overlay (Huriot & Sibai
2025 §VI); see the ADR-007 §Stage 1b note for the quantitative
dimensional analysis ("off by 1/dt² (~2500x at dt=0.02s, small-Δp
scale)") and ADR-004 §Open questions / slice P1.05.5 for the
contingent long-term fix (extend :class:AgentSnapshot with qpos).
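The quoted ~2500x factor follows from a back-of-envelope scaling. This sketch rests on an assumption not spelled out in the text: a per-tick delta action of size Δp, read by the filter as an acceleration over one control tick of length Δt, is of order Δp/Δt² (the 1/2 from constant-acceleration kinematics is dropped, as an order-of-magnitude estimate allows):

```latex
\Delta p \approx a\,\Delta t^{2}
\quad\Longrightarrow\quad
a \approx \frac{\Delta p}{\Delta t^{2}},
\qquad
\left.\frac{1}{\Delta t^{2}}\right|_{\Delta t = 0.02\,\mathrm{s}}
= \frac{1}{(0.02\,\mathrm{s})^{2}}
= 2500\ \mathrm{s^{-2}}
```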
Force-torque synthesis (decision Q2 / ADR-007 §Revision history rev. 5):
The OM-hetero "force_torque" channel is synthesised from the
panda gripper's fingertip contact wrenches via
:meth:Articulation.get_net_contact_forces(["panda_leftfinger", "panda_rightfinger"])
plus zero-mean Gaussian noise at
:data:SYNTHESISED_FT_NOISE_SIGMA_N Newtons (default 0.5 N — at the
lower bound of the real Franka wrist-FT noise floor of 0.5-1.5 N).
This is not a real Panda wrist FT sensor reading. Wrist-mounted
virtual FT is a Phase-1.5 ablation (ADR-007 §Open questions; Stage-1b+0.5
follow-up). The sigma value lives in this module + the ADR-007 §Revision
history entry only — never in the prereg YAML (ADR-007 §Discipline:
tag rotation forbidden).
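The synthesis step can be sketched as "combine the fingertip contact forces, then add noise". Assumptions, all ours: each finger's net contact force arrives as a 3-vector, the two fingers are combined by summation, and `synthesise_ft` and `SIGMA_N` are hypothetical names (the 0.5 N default mirrors SYNTHESISED_FT_NOISE_SIGMA_N). The real env reads forces through the Articulation contact API named above; this sketch takes them as plain arrays.

```python
import numpy as np

SIGMA_N = 0.5  # newtons; mirrors the documented default noise sigma


def synthesise_ft(left_force: np.ndarray, right_force: np.ndarray,
                  rng: np.random.Generator, sigma: float = SIGMA_N) -> np.ndarray:
    """Sum the two fingertip net contact forces, then add zero-mean Gaussian noise."""
    # Assumption: summation is how the two finger readings are combined.
    net = np.asarray(left_force, dtype=np.float64) + np.asarray(right_force, dtype=np.float64)
    return net + rng.normal(0.0, sigma, size=net.shape)
```

Drawing the noise from an explicitly passed Generator keeps the synthesised channel reproducible under the project's seeding discipline.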
Cooperation gradient (decision Q in design conversation):
Cube and goal spawn locations follow the upstream PickCubeEnv
defaults for now (cube within ±5 cm of the table origin; goal within
the same envelope, z ∈ [0, 0.3]). The cooperative gradient between
AS-homo (two pandas) and AS-hetero (panda + fetch) is itself a
Stage-1b empirical question; pose tuning is sequenced as a P1.04
follow-up once the first pilot's success-rate landscape is known
(don't gold-plate at design time).
Tier-1 / Tier-2 split (decision 9):
The module's top-level imports are intentionally Tier-1-safe
(numpy, gymnasium, concerto.safety.api types,
chamber.envs.errors). ManiSkill / SAPIEN /
pytorch_kinematics imports happen inside
:func:make_stage1_pickplace_env so
python -c "import chamber.envs.stage1_pickplace" succeeds on a
Vulkan-less host. The pure-function Tier-1 surface
(:func:resolve_condition, :func:build_control_models_for_condition)
is what the Tier-1 tests exercise; Tier-2 SAPIEN-gated tests cover
real env construction.
References:
- ADR-001 §Decision (wrapper-only); ADR-001 §Validation criteria
(Stage-0 pattern this env mirrors).
- ADR-004 §Decision (CBF-QP + JacobianControlModel contract);
ADR-004 §Open questions (AgentSnapshot.qpos extension, slice P1.05.5).
- ADR-005 §Decision (SAPIEN 3 + Warp-MPM substrate, inherited from
ManiSkill v3).
- ADR-007 §Stage 1b (this env's spec); ADR-007 §Discipline (tag
rotation forbidden); ADR-007 §Open questions (FT wrist-mount
ablation, P1.05.5 contingent slice).
- spikes/preregistration/{AS,OM}.yaml (the condition_id strings
this env resolves verbatim).
- :class:concerto.safety.api.JacobianControlModel,
:class:concerto.safety.api.DoubleIntegratorControlModel (the
per-uid control models :meth:Stage1PickPlaceEnv.build_control_models
returns).
ConditionConfig
Bases: NamedTuple
Resolved per-condition configuration (ADR-007 §Stage 1b).
Returned by :func:resolve_condition. Pure-Python NamedTuple so
Tier-1 tests can construct fakes without any SAPIEN dependency.
Attributes:
- condition_id (str) – The verbatim pre-registration condition_id string (one of the four Stage-1 strings).
- agent_uids (tuple[str, str]) – Ordered (ego_uid, partner_uid) tuple. The ego is the agent the trainer updates and the safety filter authorises; the partner is the frozen / heuristic partner.
- obs_mode (str) – ManiSkill v3 env-level obs_mode superset string passed to :class:BaseEnv.__init__. The OM channel-filter wrapper (:class:chamber.envs.stage1_obs_filter.Stage1OMChannelFilter) slices this superset per uid to the condition's actual channel keep-set.
- is_om_condition (bool) – True for the two OM condition_id strings (vision_only / vision_plus_force_torque_plus_proprio). Drives whether the channel-filter wrapper is engaged downstream.
- ft_synthesis_enabled (bool) – True for the OM-hetero condition. The env always injects the synthesised force-torque channel into obs["agent"]["panda_wristcam"]["force_torque"]; the channel filter then masks or keeps it per condition. The flag here is informational only (consumed by the smoke scripts and the docstring report).
Source code in src/chamber/envs/stage1_pickplace.py
build_control_models_for_condition
build_control_models_for_condition(
condition_id: str,
*,
jacobian_fn: Callable[[AgentSnapshot], NDArray[float64]]
| None = None,
fetch_action_dim: int | None = None,
panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]
Build per-uid :class:AgentControlModel map for condition_id (ADR-004 §Decision).
Pure Tier-1 function — no SAPIEN dependency. The Jacobian callable
is plumbed in by the env at construction time (see
:meth:Stage1PickPlaceEnv.build_control_models); Tier-1 tests pass
jacobian_fn=None to exercise the model-construction path, which
keeps the placeholder per :class:JacobianControlModel's
loud-fail-on-use contract (concerto/safety/api.py:552).
Parameters:
- condition_id (str) – Stage-1 condition_id (raises if unknown).
- jacobian_fn (Callable[[AgentSnapshot], NDArray[float64]] | None, default: None) – Optional Jacobian callable to embed in the panda uid's :class:JacobianControlModel. None means the placeholder model is returned (Tier-1 path); pass the env's closure-via-env-reference callable (lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])) to get a working model for Tier-2 / production.
- fetch_action_dim (int | None, default: None) – Action dimension of the fetch uid's Box. None (default) is the Tier-1 sentinel: a :class:DoubleIntegratorControlModel with action_dim=2 (just the differential-drive base axis; the arm + body + gripper are folded into the same model with the canonical "treat the whole action as a 1-D Cartesian acceleration" approximation that P1.05.5 will tighten). Tier-2 paths pass the actual env.action_space[uid].shape[0].
- panda_action_dim (int, default: 8) – Action dimension of the panda uid's Box. Defaults to 8 (7 arm joint deltas + 1 mimic-gripper joint delta under pd_joint_delta_pos). Both panda_wristcam and panda_partner share this shape.
Returns:
- dict[str, AgentControlModel] – {uid: AgentControlModel} map keyed by the condition's :attr:ConditionConfig.agent_uids. The panda uid (always the ego, always present) gets a :class:JacobianControlModel; the partner uid gets a :class:JacobianControlModel for panda_partner (AS-homo) or a :class:DoubleIntegratorControlModel for fetch (AS-hetero / OM-*).
Raises:
- ValueError – If condition_id is not in the four-condition table (delegated to :func:resolve_condition).
Source code in src/chamber/envs/stage1_pickplace.py
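Here is a hedged sketch of the uid-to-model mapping and the placeholder's loud-fail-on-use contract. It uses hypothetical stand-in classes: JacobianModelSketch, DoubleIntegratorSketch and build_models_sketch are not the concerto.safety.api types. Two details are assumptions. First, this page does not document the real placeholder's exception type, so RuntimeError stands in for it. Second, the page does not say whether panda_partner also receives jacobian_fn in the AS-homo case, so this sketch embeds it in the ego only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Union

Matrix = list[list[float]]


@dataclass
class JacobianModelSketch:
    """Hypothetical stand-in for concerto.safety.api.JacobianControlModel."""
    uid: str
    action_dim: int
    jacobian_fn: Optional[Callable[[object], Matrix]] = None

    def jacobian(self, snap: object) -> Matrix:
        # Loud-fail-on-use: building the placeholder is fine (Tier-1 path),
        # but evaluating it without a real callable is an error.
        if self.jacobian_fn is None:
            raise RuntimeError(f"{self.uid}: placeholder model used without a jacobian_fn")
        return self.jacobian_fn(snap)


@dataclass
class DoubleIntegratorSketch:
    """Hypothetical stand-in for concerto.safety.api.DoubleIntegratorControlModel."""
    uid: str
    action_dim: int


ModelSketch = Union[JacobianModelSketch, DoubleIntegratorSketch]


def build_models_sketch(
    agent_uids: Sequence[str],
    *,
    jacobian_fn: Optional[Callable[[object], Matrix]] = None,
    fetch_action_dim: Optional[int] = None,
    panda_action_dim: int = 8,
) -> Dict[str, ModelSketch]:
    ego_uid = agent_uids[0]
    models: Dict[str, ModelSketch] = {}
    for uid in agent_uids:
        if uid == "fetch":
            # None is the Tier-1 sentinel -> action_dim=2, as documented above.
            dim = 2 if fetch_action_dim is None else fetch_action_dim
            models[uid] = DoubleIntegratorSketch(uid, dim)
        else:
            # Assumption: only the ego panda receives jacobian_fn in this sketch.
            fn = jacobian_fn if uid == ego_uid else None
            models[uid] = JacobianModelSketch(uid, panda_action_dim, fn)
    return models
```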
make_stage1_pickplace_env
make_stage1_pickplace_env(
*,
condition_id: str,
episode_length: int = DEFAULT_EPISODE_LENGTH,
root_seed: int = 0,
num_envs: int = 1,
render_mode: str | None = None,
render_backend: str | None = None,
force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]
Build a :class:Stage1PickPlaceEnv instance (ADR-007 §Stage 1b).
Factory entry point. SAPIEN / ManiSkill imports are deferred to the
body so python -c "import chamber.envs.stage1_pickplace" works
on a Vulkan-less host (Tier-1 contract). The
:class:Stage1PickPlaceEnv class itself is defined inside the
factory body — same pattern as
:func:chamber.benchmarks.stage0_smoke.make_stage0_env.
Parameters:
- condition_id (str) – One of the four Stage-1 condition_id strings; raises :class:ValueError on any other input (cf. :func:resolve_condition).
- episode_length (int, default: DEFAULT_EPISODE_LENGTH) – Truncation horizon, in env ticks. Default :data:DEFAULT_EPISODE_LENGTH (100). See the module docstring for the budget rationale.
- root_seed (int, default: 0) – Project root seed, routed to the env's :func:concerto.training.seeding.derive_substream substream for reproducibility (P6 / ADR-002 determinism rule).
- num_envs (int, default: 1) – ManiSkill vectorisation count; default 1 (Stage-1b paths are single-env per cell). Higher values are supported but untested.
- render_mode (str | None, default: None) – ManiSkill render_mode (None, "human", "rgb_array", etc.). None (default) disables rendering for the spike-runner path.
- render_backend (str | None, default: None) – ManiSkill render_backend ("gpu", "none", …). None (default) falls through to ManiSkill's own default. The Stage-0 smoke env uses a CHAMBER_RENDER_BACKEND env-var probe plus the visual-material strip patch; Stage-1b follows the same pattern but also exposes the backend choice as a constructor kwarg.
- force_torque_noise_sigma (float, default: SYNTHESISED_FT_NOISE_SIGMA_N) – Standard deviation (sigma), in newtons, of the zero-mean Gaussian noise added to the synthesised force-torque channel. Default :data:SYNTHESISED_FT_NOISE_SIGMA_N. Override only with an ADR-007 §Revision history entry (the value is load-bearing for the OM gate; see the module docstring).
Returns:
- Env[Any, Any] – :class:Stage1PickPlaceEnv instance, ready to call reset(seed=K) on. The action space is a :class:gym.spaces.Dict keyed by the resolved :attr:ConditionConfig.agent_uids.
Raises:
- ValueError – If condition_id is unknown (see :func:resolve_condition).
- ChamberEnvCompatibilityError – If ManiSkill / SAPIEN / Vulkan initialisation fails (CPU-only host without a fallback render_backend="none"; see ADR-001 §Risks).
Source code in src/chamber/envs/stage1_pickplace.py
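force_torque_noise_sigma is the standard deviation of the zero-mean Gaussian noise added to the synthesised force-torque channel. Below is a stdlib-only sketch of that noise model. add_ft_noise is hypothetical, and the 6-element wrench layout (3 forces, 3 torques) is an assumption. A seeded random.Random stands in for whatever substream derive_substream actually provides.

```python
import random
from typing import Sequence


def add_ft_noise(ft: Sequence[float], sigma_n: float, rng: random.Random) -> list[float]:
    """Hypothetical helper: add i.i.d. zero-mean Gaussian noise (std sigma_n) per element."""
    if sigma_n < 0:
        raise ValueError("sigma must be non-negative")
    return [x + rng.gauss(0.0, sigma_n) for x in ft]
```

In this sketch, reusing the same seed reproduces identical noisy readings, and sigma_n=0 returns the clean channel unchanged.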
resolve_condition
resolve_condition(condition_id: str) -> ConditionConfig
Resolve a pre-registered condition_id to its config (ADR-007 §Stage 1b).
Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the
four-condition coverage and the ValueError raise on bogus IDs.
Parameters:
- condition_id (str) – One of the four verbatim Stage-1 pre-registration strings (see spikes/preregistration/{AS,OM}.yaml).
Returns:
- ConditionConfig – Resolved :class:ConditionConfig carrying the agent_uids tuple, the env-level obs_mode superset, and the boolean flags downstream wrappers / scripts read.
Raises:
- ValueError – If condition_id is not one of the four Stage-1 pre-registration strings. The message names the four valid options and cites ADR-007 §Discipline (which names editing the prereg YAMLs as a project anti-pattern).
Source code in src/chamber/envs/stage1_pickplace.py
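ConditionConfig is a plain NamedTuple, so resolving a condition reduces to a table lookup that Tier-1 tests can exercise without SAPIEN. The sketch below assumes a lookup table keyed by condition_id. Only the two OM rows are filled in, because this page does not spell out the two AS condition_id strings. The obs_mode value "rgbd" and the ft_synthesis_enabled assignments are illustrative assumptions, and resolve_condition_sketch is a hypothetical name.

```python
from types import MappingProxyType
from typing import NamedTuple, Tuple


class ConditionConfig(NamedTuple):
    """Local mirror of the five documented ConditionConfig fields."""
    condition_id: str
    agent_uids: Tuple[str, str]  # (ego_uid, partner_uid)
    obs_mode: str
    is_om_condition: bool
    ft_synthesis_enabled: bool


_VISION_ONLY = "stage1_pickplace_vision_only"
_VISION_FT_PROPRIO = "stage1_pickplace_vision_plus_force_torque_plus_proprio"

# Only the two OM rows: the AS condition_id strings are not listed on this page.
# obs_mode="rgbd" and the ft_synthesis_enabled values are assumptions.
_TABLE = MappingProxyType({
    _VISION_ONLY: ConditionConfig(_VISION_ONLY, ("panda_wristcam", "fetch"), "rgbd", True, False),
    _VISION_FT_PROPRIO: ConditionConfig(_VISION_FT_PROPRIO, ("panda_wristcam", "fetch"), "rgbd", True, True),
})


def resolve_condition_sketch(condition_id: str) -> ConditionConfig:
    """Look up a verbatim condition_id; unknown ids fail loudly with the valid options."""
    try:
        return _TABLE[condition_id]
    except KeyError:
        valid = ", ".join(sorted(_TABLE))
        raise ValueError(
            f"unknown condition_id {condition_id!r}; expected one of: {valid} "
            "(ADR-007 §Discipline: fix the caller, do not edit the prereg YAMLs)"
        ) from None
```

A Tier-1 test can derive fakes from a resolved config with cfg._replace(...), because NamedTuple instances are immutable.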
Stage-1 observation wrappers (ADR-007 §Stage 1b).
Two wrappers, applied in fixed order by
:func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env:
- :class:Stage1ASStateSynthesizer – synthesises obs["agent"][uid]["state"] as concat(qpos, qvel) for AS conditions, bridging ManiSkill v3 obs_mode="state_dict" (which emits per-agent Dict(qpos, qvel, ...)) to the obs["agent"][ego_uid]["state"] contract of :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer. Pass-through for OM conditions and for envs without a recognised condition_id.
- :class:Stage1OMChannelFilter – per-condition channel keep-set for the OM axis. The Stage-1 AS conditions don't filter at all (they use obs_mode="state_dict", which already excludes the camera data the OM axis differentiates on); the wrapper passes those conditions through untouched.
For the two OM conditions:
- stage1_pickplace_vision_only – keep obs["sensor_data"] (the RGB-D camera channels) plus the task-extra fields the policy needs to be functional at the Stage-1b episode budget (tcp_pose, goal_pos). Zero-mask the agent-proprio sub-dicts under obs["agent"][uid] and the synthesised obs["extra"]["force_torque"] channel.
- stage1_pickplace_vision_plus_force_torque_plus_proprio – pass-through (no filtering); the policy sees everything.
ManiSkill v3 obs layout reminder (mani_skill/envs/sapien_env.py:546-630):
obs["agent"][uid] carries proprio (state, joint_pos, joint_vel);
obs["extra"] carries task-level fields from
:meth:_get_obs_extra; obs["sensor_data"] carries camera channels
when the env's obs_mode is image-bearing.
Shape preservation: zero-masked channels become zero arrays of the same
shape and dtype (mirrors
:class:chamber.envs.texture_filter.TextureFilterObsWrapper's pattern).
This keeps downstream policy networks oblivious to which modalities a
given uid actually receives at runtime — only the values change
across conditions, not the shapes.
References:
- ADR-007 §Stage 1b (OM axis spec); spikes/preregistration/OM.yaml
(the condition_id strings this wrapper resolves).
- :class:chamber.envs.texture_filter.TextureFilterObsWrapper (the
parallel pattern for the Stage-0 smoke env's per-uid keep-set).
- :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv (the
inner env that supplies the synthesised FT channel and the
condition_id attribute).
Stage1ASStateSynthesizer
Bases: ObservationWrapper
Synthesise obs["agent"][uid]["state"] for AS conditions (ADR-007 §Stage 1b).
ManiSkill v3.0.1 obs_mode="state_dict" (used by the Stage-1 AS
conditions) emits per-agent Dict(qpos: Box, qvel: Box, ...) with
no top-level "state" key. But
:meth:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config
reads env.observation_space["agent"][ego_uid]["state"] as a 1-D
Box and derives obs_dim = ego_state_space.shape[0]. ManiSkill v3
obs_mode="state" flattens concat(qpos, qvel) into a single
"state" key with that shape; this wrapper replicates that concat
in obs-space so callers can keep obs_mode="state_dict" for the
synthesised-channel injection (force_torque, tcp_pose, ...)
the OM-hetero condition needs while still satisfying the trainer's
1-D state contract.
Pass-through for OM conditions (is_om_condition=True — those go
through :class:Stage1OMChannelFilter instead) and for envs whose
condition_id is missing or unrecognised (Tier-1 safety: keeps
the wrapper a no-op for envs that don't satisfy the Stage-1b
condition_id contract).
Parameters:
- env (Env[Any, Any]) – Inner env exposing a condition_id attribute.
Raises:
- TypeError – If env.observation_space is not a :class:gym.spaces.Dict.
Source code in src/chamber/envs/stage1_obs_filter.py
__getattr__
__getattr__(name: str) -> Any
Forward attribute access to the inner env (Tier-2 caller contract).
Gymnasium 1.3 removed :class:gym.Wrapper's implicit
__getattr__; existing P1.03 callers of
:func:make_stage1_pickplace_env read SAPIEN-env attributes
directly (env.agent, env._jacobian_provider,
env.condition_config...). Forwarding here keeps that
contract without forcing every caller to switch to
:meth:gym.Wrapper.get_wrapper_attr.
__getattr__ only fires when normal attribute lookup misses;
self.env is set by :meth:gym.Wrapper.__init__ before any
downstream access, so the delegation is safe. The env-name
guard is a defensive backstop against recursive lookup on the
env attr itself (only reachable in pathological half-init
paths).
Source code in src/chamber/envs/stage1_obs_filter.py
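The forwarding behaviour and its recursion guard can be sketched in plain Python. ForwardingWrapperSketch is hypothetical and does not subclass gym.Wrapper. It shows why a guard on the env attribute name turns a half-initialised instance into an AttributeError instead of a RecursionError.

```python
from typing import Any


class ForwardingWrapperSketch:
    """Hypothetical minimal wrapper mirroring the delegation described above."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Python only calls this after normal attribute lookup misses.
        # Guarding "env" itself means a half-initialised instance (no self.env
        # yet) raises AttributeError instead of recursing back into here.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)
```

Attributes the wrapper defines itself still take precedence, because __getattr__ is only the fallback path.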
__init__
__init__(env: Env[Any, Any]) -> None
Detect AS-condition envs at construction (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
observation
observation(observation: dict[str, Any]) -> dict[str, Any]
Inject synthesised state per agent (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
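The AS state synthesis amounts to concatenating qpos and qvel per agent. The sketch uses Python lists so it runs without numpy; the real wrapper works on arrays and also rewrites the observation space. synthesise_state is a hypothetical helper. The resulting state length, len(qpos) + len(qvel), is what the trainer would read as obs_dim.

```python
from typing import Any, Dict, Sequence


def synthesise_state(obs: Dict[str, Any], agent_uids: Sequence[str]) -> Dict[str, Any]:
    """Return a copy of obs with obs["agent"][uid]["state"] = qpos followed by qvel."""
    out = dict(obs)
    # Shallow-copy each per-agent dict so the caller's obs is left untouched.
    agents = {uid: dict(sub) for uid, sub in obs["agent"].items()}
    for uid in agent_uids:
        per_agent = agents[uid]
        per_agent["state"] = list(per_agent["qpos"]) + list(per_agent["qvel"])
    out["agent"] = agents
    return out
```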
Stage1OMChannelFilter
Bases: ObservationWrapper
OM-axis per-condition channel filter (ADR-007 §Stage 1b).
Reads the inner env's condition_id at construction time and
captures the resulting filter policy. The wrapper does not re-read
condition_id on every step — Stage-1b envs are built once per
(seed, condition) cell (see stage1_as.py:253-275) and the
condition is fixed for the env's lifetime.
Parameters:
- env (Env[Any, Any]) – Inner env exposing a condition_id attribute. The :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv class satisfies this; the wrapper falls back to a pass-through identity if condition_id is missing or not one of the two OM strings.
Raises:
- TypeError – If the inner env's observation space is not a :class:gym.spaces.Dict (the wrapper relies on dict-shaped obs).
Source code in src/chamber/envs/stage1_obs_filter.py
condition_id
property
condition_id: str | None
ADR-007 §Stage 1b OM axis: the condition_id resolved at construction.
__getattr__
__getattr__(name: str) -> Any
Forward attribute access to the inner env (Tier-2 caller contract).
Mirrors :meth:Stage1ASStateSynthesizer.__getattr__: Gymnasium 1.3
removed :class:gym.Wrapper's implicit forwarding, so callers of
:func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env
— which now applies this wrapper on top of the AS synthesizer —
would otherwise lose access to SAPIEN-env attributes such as
env.agent, env.obs_mode, env.condition_config.
Source code in src/chamber/envs/stage1_obs_filter.py
__init__
__init__(env: Env[Any, Any]) -> None
Store the per-condition filter policy (ADR-007 §Stage 1b).
Source code in src/chamber/envs/stage1_obs_filter.py
observation
observation(observation: dict[str, Any]) -> dict[str, Any]
Apply the per-condition channel mask (ADR-007 §Stage 1b).
OM-vision-only zero-masks the per-uid agent-proprio sub-dicts
and the obs["extra"]["force_torque"] channel; keeps
obs["extra"]["tcp_pose"] + obs["extra"]["goal_pos"]
(the minimal task-relevant fields) plus obs["sensor_data"]
(RGB-D camera) untouched. Other conditions pass through.
Source code in src/chamber/envs/stage1_obs_filter.py
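Here is a stdlib-only sketch of the vision_only mask and the shape-preservation rule. Every masked leaf becomes a zero of the same type, and nested containers keep their structure. zeros_like and om_channel_filter are hypothetical helpers that operate on nested lists rather than the wrapper's arrays. For brevity, om_channel_filter takes condition_id as an argument rather than capturing it at construction.

```python
from typing import Any, Dict, Optional

VISION_ONLY = "stage1_pickplace_vision_only"


def zeros_like(value: Any) -> Any:
    """Shape-preserving zero mask for nested dicts / lists / tuples of numbers."""
    if isinstance(value, dict):
        return {key: zeros_like(sub) for key, sub in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(zeros_like(sub) for sub in value)
    return type(value)(0)  # 0.0 for floats, 0 for ints: the leaf type is kept


def om_channel_filter(obs: Dict[str, Any], condition_id: Optional[str]) -> Dict[str, Any]:
    """Sketch of the vision_only mask; every other condition passes through."""
    if condition_id != VISION_ONLY:
        return obs
    filtered = dict(obs)
    # Zero-mask every per-uid agent-proprio sub-dict.
    filtered["agent"] = {uid: zeros_like(sub) for uid, sub in obs["agent"].items()}
    extra = dict(obs.get("extra", {}))
    if "force_torque" in extra:
        extra["force_torque"] = zeros_like(extra["force_torque"])
    filtered["extra"] = extra  # tcp_pose and goal_pos are kept as-is
    return filtered  # sensor_data (RGB-D) is shared unchanged
```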
Custom errors for CHAMBER environment wrappers.
ADR-001 §Risks — wrappers raise this error rather than silently falling back when a wrapped env does not satisfy the wrapper's structural assumptions.
ChamberEnvCompatibilityError
Bases: RuntimeError
Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.
ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate the gate; failing loudly forces ADR re-review.
Source code in src/chamber/envs/errors.py
chamber.agents
CHAMBER-side agent extensions (ADR-001 §Decision; ADR-007 §Stage 1b).
Hosts robot-agent classes the project needs in addition to ManiSkill v3's built-in registry. Currently:
- :class:PandaPartner – a 7-DOF Franka with uid="panda_partner", used by the Stage-1 AS-homogeneous condition pair (panda_wristcam, panda_partner) so the two-panda case resolves to distinct uid strings under the :func:chamber.envs._sapien_compat.load_agent_with_bare_uids strip-suffix path (otherwise ("panda_wristcam", "panda_wristcam") would collapse to a single dict entry post-strip).
- :class:PandaJacobianProvider – wraps :mod:pytorch_kinematics to expose the 3x7 linear end-effector Jacobian :math:J(q) \in \mathbb{R}^{3 \times 7} that :class:concerto.safety.api.JacobianControlModel needs (ADR-004 §Decision; spike_004A §Per-agent control model).
Wrapper-only discipline (ADR-001 §Decision (wrapper-only)):
:class:PandaPartner is registered via the public
mani_skill.agents.registration.register_agent decorator. No
mani_skill/ source patching; no _* private imports.
Tier-1 safety: this module's top-level imports tolerate missing
ManiSkill/SAPIEN at module-load time. The :class:PandaPartner
registration side-effect is deferred behind a try/except so
python -c "import chamber.agents" succeeds on a Vulkan-less host —
:class:chamber.envs.errors.ChamberEnvCompatibilityError only fires at
env-construction time, never at module-import time (mirrors the
stage0_smoke pattern; ADR-001 §Risks).
PandaJacobianProvider
Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).
Parses the panda URDF once at construction (via
:func:pytorch_kinematics.build_serial_chain_from_urdf) and caches
the resulting :class:pytorch_kinematics.SerialChain. Each
:meth:__call__ evaluates :math:J(q) on the cached chain.
Parameters:
- urdf_path (str | None, default: None) – Absolute path to the panda URDF on disk. The default (None) selects the wristcam URDF ({PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf), since the Stage-1b panda is by convention the wristcam variant; pass panda_v2.urdf to target the no-camera URDF used by :class:PandaPartner.
- ee_link (str, default: 'panda_hand_tcp') – Name of the end-effector link the chain terminates at. Defaults to "panda_hand_tcp", the tool-center-point link the ManiSkill PickCube task uses for self.agent.tcp_pose (see mani_skill/envs/tasks/tabletop/pick_cube.py:136). The serial chain extracted with this end link has exactly 7 joints (the 7 arm joints; the two gripper-finger joints are siblings of panda_hand, not in the chain).
Raises:
- FileNotFoundError – If urdf_path does not exist on disk.
- ValueError – If :func:pytorch_kinematics.build_serial_chain_from_urdf fails to extract a chain ending at ee_link.
Source code in src/chamber/agents/panda_jacobian.py
ee_link
property
ee_link: str
The end-effector link the chain terminates at (read-only).
__call__
__call__(
snap: AgentSnapshot, qpos_arm: NDArray[floating]
) -> NDArray[np.float64]
Compute the 3x7 linear end-effector Jacobian at qpos_arm (ADR-004 §Decision).
Parameters:
- snap (AgentSnapshot) – Current :class:AgentSnapshot; unused at the provider level (the Jacobian is a function of joint state, not Cartesian state) and accepted for Protocol conformance with :class:concerto.safety.api.JacobianControlModel.jacobian_fn. See the module docstring for the closure-via-env-reference rationale and the contingent P1.05.5 follow-up that folds qpos into :class:AgentSnapshot directly.
- qpos_arm (NDArray[floating]) – Joint-angle vector for the 7 panda arm joints, in the same order as mani_skill.agents.robots.panda.Panda.arm_joint_names. Shape (7,); dtype float-coercible to float32.
Returns:
- NDArray[float64] – Linear Jacobian rows of the 6x7 SE(3) Jacobian at qpos_arm, shape (3, 7), dtype float64 (matches the :class:JacobianControlModel arithmetic contract at concerto/safety/api.py:560-561).
Raises:
- ValueError – If qpos_arm does not have shape (7,).
Source code in src/chamber/agents/panda_jacobian.py
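The shape check and the linear slice can be sketched without pytorch_kinematics. linear_jacobian is hypothetical and takes the full 6x7 Jacobian as a callable. It assumes the linear-velocity rows come first (rows 0 to 2) and the angular rows last. Verify that ordering against the backend actually used before relying on it.

```python
from typing import Callable, Sequence

Matrix = list[list[float]]


def linear_jacobian(
    qpos_arm: Sequence[float], full_jacobian: Callable[[Sequence[float]], Matrix]
) -> Matrix:
    """Hypothetical sketch: validate a (7,) input and keep the 3 linear rows of a 6x7 Jacobian."""
    q = [float(x) for x in qpos_arm]
    if len(q) != 7:
        raise ValueError(f"qpos_arm must have shape (7,), got ({len(q)},)")
    j6 = full_jacobian(q)
    if len(j6) != 6 or any(len(row) != 7 for row in j6):
        raise ValueError("expected a 6x7 SE(3) Jacobian")
    # Assumption: rows 0-2 are linear velocity, rows 3-5 angular velocity.
    return [list(row) for row in j6[:3]]
```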
__init__
__init__(
urdf_path: str | None = None,
*,
ee_link: str = "panda_hand_tcp",
) -> None
Parse the URDF and cache the serial chain (ADR-007 §Stage 1b).
Source code in src/chamber/agents/panda_jacobian.py
PandaPartner
Bases: Panda
Second-panda agent for the AS-homogeneous condition (ADR-007 §Stage 1b).
Inherits every Panda hook (arm joints, gripper joints, TCP pose,
grasp / static predicates, controller configs) from
:class:mani_skill.agents.robots.panda.panda.Panda. The only
difference is the URDF: :class:PandaPartner uses panda_v2.urdf
(no on-gripper camera mount), since the AS-homo partner is driven by
the scripted-heuristic / trained policy rather than by camera-fed
perception (the ego owns the camera via panda_wristcam).
Source code in src/chamber/agents/panda_partner.py
PandaPartner — second-panda agent for the AS-homogeneous condition (ADR-007 §Stage 1b).
The Stage-1 AS axis pre-registration (spikes/preregistration/AS.yaml,
git tag prereg-stage1-AS-2026-05-15) names the homogeneous
condition's agent tuple as ("panda_wristcam", "panda_partner").
ManiSkill v3.0.1's built-in agent registry covers panda,
panda_wristcam, panda_stick, fetch, and the Allegro hands
(see mani_skill.agents.registration.REGISTERED_AGENTS) — there is
no panda_partner. The strip-suffix path in
:func:chamber.envs._sapien_compat.load_agent_with_bare_uids would
collapse two panda_wristcam entries to a single agents_dict
key, so the AS-homo "two pandas" case needs a second, distinct uid.
This module defines that uid: a thin :class:Panda subclass mounted on
the no-wristcam panda_v2.urdf (the AS-homo partner doesn't need its
own perception — the ego owns the camera mount via panda_wristcam).
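A toy version of the strip-suffix step shows why a distinct uid matters. This assumes the multi-agent dict keys take the form "<uid>-<index>" before stripping. strip_suffix and bare_uid_dict are hypothetical helpers, not the _sapien_compat implementation.

```python
from typing import Dict, Sequence


def strip_suffix(key: str) -> str:
    """Hypothetical: drop a trailing '-<index>' suffix, e.g. 'panda_wristcam-1' -> 'panda_wristcam'."""
    base, sep, tail = key.rpartition("-")
    return base if sep and tail.isdigit() else key


def bare_uid_dict(uids: Sequence[str]) -> Dict[str, int]:
    """Map bare uid -> agent index; later duplicates silently overwrite earlier ones."""
    suffixed = {f"{uid}-{i}": i for i, uid in enumerate(uids)}  # assumed key form
    return {strip_suffix(key): index for key, index in suffixed.items()}
```

With two panda_wristcam agents, the stripped dict keeps a single entry, so one agent is lost. With panda_partner as the second uid, both survive.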
Wrapper-only discipline (ADR-001 §Decision (wrapper-only)):
:class:PandaPartner is registered via the public
@register_agent() decorator from
mani_skill.agents.registration. No mani_skill/ source patching;
no _* private imports. The base :class:Panda class's
controller_configs, arm_joint_names, gripper_joint_names,
tcp_pose property, is_grasping, and is_static are inherited
unchanged — the only difference from panda_wristcam is the URDF
choice (no on-gripper camera mount).
References:
- ADR-007 §Stage 1b (P1.03 introduces the partner agent for the AS axis).
- spikes/preregistration/AS.yaml (the pre-registration that names
the ("panda_wristcam", "panda_partner") tuple).
- mani_skill.agents.robots.panda.panda_wristcam.PandaWristCam (the
parallel uid; pattern this class mirrors).
Panda 7-DOF end-effector Jacobian provider (ADR-004 §Decision; ADR-007 §Stage 1b).
Wraps :mod:pytorch_kinematics (already a transitive dep of
mani-skill==3.0.1 — :mod:mani_skill.agents.controllers.utils.kinematics
uses it for built-in IK; P1.03 promotes it to a load-bearing direct dep in
pyproject.toml so :class:PandaJacobianProvider has a stable surface
under wrapper-only discipline) to compute the analytical 3x7 linear
end-effector Jacobian :math:J(q) \\in \\mathbb{R}^{3 \\times 7} the
:class:concerto.safety.api.JacobianControlModel needs.
Why the 3x7 linear slice rather than the full 6x7 SE(3) Jacobian:
:class:concerto.safety.cbf_qp.AgentSnapshot carries Cartesian
position (shape (d,), typically d == 3) and velocity
only — no orientation channel. The CBF backbone's pairwise barrier
:math:h_{ij} is a Euclidean-distance constraint; angular Jacobian
rows would be dimensionally orthogonal to it. The damped-pseudoinverse
in :meth:JacobianControlModel.cartesian_accel_to_action
(Nakamura & Hanafusa 1986) operates on the 3x7 slice; consistency with
the safety-stack contract is the design constraint.
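For reference, the damped least-squares (damped pseudoinverse) map applied to the 3x7 slice is u = J^T (J J^T + λ² I)^{-1} a. The sketch below computes it in plain Python. damped_pinv_apply and its damping argument are hypothetical. The sketch covers only the generic formula; it does not show how JacobianControlModel.cartesian_accel_to_action scales or clips the result into a pd_joint_delta_pos action.

```python
from typing import Sequence

Matrix = list[list[float]]


def _solve(a: Matrix, b: Sequence[float]) -> list[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; increase damping")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def damped_pinv_apply(jac: Matrix, accel: Sequence[float], damping: float) -> list[float]:
    """Joint-space command J^T (J J^T + damping^2 I)^-1 a (damped least squares)."""
    rows, cols = len(jac), len(jac[0])
    if len(accel) != rows:
        raise ValueError("accel must have one entry per Jacobian row")
    jjt = [
        [sum(jac[i][k] * jac[j][k] for k in range(cols)) + (damping ** 2 if i == j else 0.0)
         for j in range(rows)]
        for i in range(rows)
    ]
    y = _solve(jjt, accel)  # only a rows x rows (here 3x3) system is solved
    return [sum(jac[i][k] * y[i] for i in range(rows)) for k in range(cols)]
```

Solving the 3x3 system J J^T + λ² I avoids inverting a 7x7 matrix. Any λ > 0 also keeps the system solvable at configurations where J J^T loses rank, at the cost of slightly under-tracking the requested acceleration.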
Note on the closure-via-env-reference workaround (Q3 mod 3 in the P1.03
design conversation):
:class:AgentSnapshot does not carry qpos (extension is sequenced
to slice P1.05.5 per ADR-004 §Open questions; the contingent trigger is
Stage-1b λ-telemetry saturation). The
:class:PandaJacobianProvider.__call__ signature therefore takes
(snap, qpos_arm) — the env's
:meth:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv.build_control_models
constructs a closure lambda snap: provider(snap, env._latest_qpos["panda_wristcam"])
that captures a fresh env reference per (seed, condition) cell
(safe because :func:chamber.benchmarks.stage1_as._run_axis_with_factories
builds the env once per cell and reuses it across the 20 evaluation
episodes within the cell — the closure stays valid for the cell's
lifetime; cited at stage1_as.py:253-275).
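The closure-via-env-reference workaround relies on the closure capturing the env object rather than a copy of its qpos. FakeEnv, fake_provider and make_jacobian_fn below are hypothetical stand-ins that illustrate this late-binding behaviour.

```python
from typing import Any, Callable, Dict, List


class FakeEnv:
    """Hypothetical env stand-in holding the latest arm joint positions per uid."""

    def __init__(self) -> None:
        self._latest_qpos: Dict[str, List[float]] = {"panda_wristcam": [0.0] * 7}


def fake_provider(snap: Any, qpos_arm: List[float]) -> List[float]:
    # Stand-in for PandaJacobianProvider.__call__; echoes the qpos it was given.
    return list(qpos_arm)


def make_jacobian_fn(env: FakeEnv, provider: Callable[[Any, List[float]], Any]) -> Callable[[Any], Any]:
    # The lambda captures the env reference, so each call reads whatever qpos
    # the env stored most recently instead of a snapshot taken at build time.
    return lambda snap: provider(snap, env._latest_qpos["panda_wristcam"])
```

The closure is only meaningful while that env object is the one being stepped. This is why building the env once per (seed, condition) cell matters.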
References:
- ADR-004 §Decision (the JacobianControlModel contract).
- ADR-004 §Open questions (the AgentSnapshot.qpos extension as the
long-term fix; slice P1.05.5).
- ADR-007 §Stage 1b (the slice introducing this provider).
- :class:concerto.safety.api.JacobianControlModel (the consumer).
- mani_skill.agents.controllers.utils.kinematics.Kinematics (the
precedent for using pytorch_kinematics in ManiSkill's own IK).
PandaJacobianProvider
Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).
Parses the panda URDF once at construction (via
:func:pytorch_kinematics.build_serial_chain_from_urdf) and caches
the resulting :class:pytorch_kinematics.SerialChain. Each
:meth:__call__ evaluates :math:J(q) on the cached chain.
Parameters:
-
urdf_path(str | None, default:None) –Absolute path to the panda URDF on disk. The default (
None) selects the wristcam URDF ({PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf) since the Stage-1b panda is by convention the wristcam variant; passpanda_v2.urdfto target the no-camera URDF used by :class:PandaPartner. -
ee_link(str, default:'panda_hand_tcp') –Name of the end-effector link the chain terminates at. Defaults to
"panda_hand_tcp"— the tool-center-point link the ManiSkill PickCube task uses forself.agent.tcp_pose(seemani_skill/envs/tasks/tabletop/pick_cube.py:136). The serial chain extracted with this end-link has exactly 7 joints (the 7 arm joints; the two gripper-finger joints are siblings ofpanda_hand, not in the chain).
Raises:
-
FileNotFoundError–If
urdf_pathdoes not exist on disk. -
ValueError–If :func:
pytorch_kinematics.build_serial_chain_from_urdffails to extract a chain ending atee_link.
Source code in src/chamber/agents/panda_jacobian.py
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 | |
ee_link
property
ee_link: str
The end-effector link the chain terminates at (read-only).
__call__
__call__(
snap: AgentSnapshot, qpos_arm: NDArray[floating]
) -> NDArray[np.float64]
Compute the 3x7 linear end-effector Jacobian at qpos_arm (ADR-004 §Decision).
Parameters:
- snap (AgentSnapshot) – Current :class:AgentSnapshot; unused at the provider level (the Jacobian is a function of joint state, not Cartesian state) and accepted only for Protocol conformance with :class:concerto.safety.api.JacobianControlModel.jacobian_fn. See the module docstring for the closure-via-env-reference rationale and the contingent P1.05.5 follow-up that folds qpos into :class:AgentSnapshot directly.
- qpos_arm (NDArray[floating]) – Joint-angle vector for the 7 panda arm joints, in the same order as mani_skill.agents.robots.panda.Panda.arm_joint_names. Shape (7,); dtype coercible to float32.
Returns:
- NDArray[float64] – Linear rows of the 6x7 SE(3) Jacobian at qpos_arm, shape (3, 7), dtype float64 (matches the :class:JacobianControlModel arithmetic contract at concerto/safety/api.py:560-561).
Raises:
- ValueError – If qpos_arm does not have shape (7,).
Source code in src/chamber/agents/panda_jacobian.py
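The shape and dtype contract of __call__ can be illustrated without a URDF. The following is a minimal sketch, not the provider's code: linear_jacobian and dummy_jacobian are hypothetical names, the 6x7 Jacobian is faked, and the sketch assumes the linear (translational) rows come first, which matches the row order pytorch_kinematics uses for its geometric Jacobian.

```python
import numpy as np
from numpy.typing import NDArray

N_ARM_JOINTS = 7  # panda arm joints in the chain (gripper fingers excluded)


def linear_jacobian(full_jacobian_fn, qpos_arm) -> NDArray[np.float64]:
    """Return the 3x7 linear block of a 6x7 geometric Jacobian.

    `full_jacobian_fn` is a hypothetical callable q -> (6, 7) array whose
    first three rows are assumed to be the linear (translational) part.
    """
    q = np.asarray(qpos_arm, dtype=np.float32)  # float-coercible input
    if q.shape != (N_ARM_JOINTS,):
        raise ValueError(f"qpos_arm must have shape ({N_ARM_JOINTS},), got {q.shape}")
    jac = np.asarray(full_jacobian_fn(q), dtype=np.float64)
    if jac.shape != (6, N_ARM_JOINTS):
        raise ValueError(f"expected a (6, {N_ARM_JOINTS}) Jacobian, got {jac.shape}")
    return jac[:3, :].copy()  # keep linear rows, drop angular rows


def dummy_jacobian(q):
    """Hypothetical stand-in for the cached chain's Jacobian evaluation."""
    return np.arange(42, dtype=np.float64).reshape(6, 7)


J_lin = linear_jacobian(dummy_jacobian, np.zeros(7))
```

Returning float64 regardless of the input dtype mirrors the documented arithmetic contract, so downstream QP code never has to re-cast.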
__init__
__init__(
urdf_path: str | None = None,
*,
ee_link: str = "panda_hand_tcp",
) -> None
Parse the URDF and cache the serial chain (ADR-007 §Stage 1b).
Source code in src/chamber/agents/panda_jacobian.py
chamber.comm
CHAMBER communication stack — fixed-format channel + URLLC degradation.
ADR-003 §Decision (mandatory fixed-format channel + opt-in learned overlay); ADR-006 §Decision (URLLC-anchored numeric bounds for latency, jitter, drop).
Public surface:
- :class:CommChannel: Protocol (re-exported from :mod:chamber.comm.api).
- :class:CommPacket, :class:Pose, :class:TaskStatePredicate, :data:SCHEMA_VERSION: wire format.
- :class:FixedFormatCommChannel: the mandatory fixed-format encoder/decoder.
- :class:AoIClock: per-uid Age-of-Information accounting.
- :class:ChamberCommError, :class:ChamberCommQPSaturationWarning: error and warning types.
Subsequent PRs (T2.4-T2.6) add the learned-overlay slot, the
:class:CommDegradationWrapper, and the URLLC profile table.
AoIClock
Per-uid Age-of-Information accounting (ADR-003 §Decision; ADR-006 §Decision).
Usage::
    clock = AoIClock()
    clock.mark_fresh("panda")  # registers "panda" with AoI=0
    clock.tick(0.01)           # +10 ms for every registered uid
    clock.aoi("panda")         # -> 0.01
Uids are auto-registered on first :meth:mark_fresh. :meth:reset
clears all registrations so a fresh episode begins from a clean slate.
Source code in src/chamber/comm/aoi.py
__init__
__init__() -> None
Create an empty clock with no uids registered (ADR-003 §Decision).
Source code in src/chamber/comm/aoi.py
aoi
aoi(uid: str) -> float
Return the current AoI for uid (ADR-003 §Decision).
Raises:
- KeyError – If uid is not registered (loud-fail per ADR-003).
Source code in src/chamber/comm/aoi.py
mark_fresh
mark_fresh(uid: str) -> None
Mark uid as having received a fresh sample (AoI -> 0).
ADR-003 §Decision: every fresh transmission zeroes that uid's AoI.
Auto-registers uid on first call.
Source code in src/chamber/comm/aoi.py
reset
reset() -> None
Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).
After reset, :meth:aoi raises KeyError for every uid until
the next :meth:mark_fresh. This matches the channel's
CommChannel.reset semantics: a new episode starts with no state.
Source code in src/chamber/comm/aoi.py
snapshot
snapshot() -> dict[str, float]
Return an independent copy of the per-uid AoI map (ADR-003 §Decision).
The returned dict is decoupled from the clock's internal state, so
callers may freely mutate it (typical use: pack into the next
CommPacket["aoi"] field).
Source code in src/chamber/comm/aoi.py
tick
tick(dt: float) -> None
Advance time by dt for every registered uid (ADR-003 §Decision).
Parameters:
- dt (float) – Elapsed simulation time in seconds. Must be non-negative; AoI cannot run backwards (notes/tier2/41 §V definition).
Raises:
- ValueError – If dt is negative.
Source code in src/chamber/comm/aoi.py
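The AoIClock contract (auto-registration, non-negative ticks, loud KeyError, decoupled snapshot, full reset) fits in a few lines of pure Python. This is a minimal sketch of the documented semantics, not the class in src/chamber/comm/aoi.py; MiniAoIClock and the second uid "ur5" are hypothetical.

```python
class MiniAoIClock:
    """Illustrative per-uid AoI clock mirroring the documented semantics."""

    def __init__(self) -> None:
        self._aoi: dict[str, float] = {}

    def mark_fresh(self, uid: str) -> None:
        self._aoi[uid] = 0.0  # fresh sample: AoI drops to zero; auto-registers

    def tick(self, dt: float) -> None:
        if dt < 0:
            raise ValueError(f"dt must be non-negative, got {dt}")
        for uid in self._aoi:  # ages every registered uid
            self._aoi[uid] += dt

    def aoi(self, uid: str) -> float:
        return self._aoi[uid]  # KeyError for unregistered uids (loud fail)

    def snapshot(self) -> dict[str, float]:
        return dict(self._aoi)  # independent copy, safe to mutate

    def reset(self) -> None:
        self._aoi.clear()  # new episode: no registered uids


clock = MiniAoIClock()
clock.mark_fresh("panda")
clock.tick(0.01)
clock.mark_fresh("ur5")  # hypothetical second uid, registered later
clock.tick(0.02)
snap = clock.snapshot()
```

Because wall time is supplied by the caller, the same sequence of calls always yields the same AoI values, which is what makes the clock trivially deterministic.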
ChamberCommError
Bases: RuntimeError
Raised when a comm-channel contract is violated (ADR-003 §Decision).
Examples include malformed packets (missing schema_version,
unrecognised packet keys), unsupported profile lookups, and channel
state that cannot be reconciled (e.g., decoding a packet whose
schema version differs from the channel's).
Source code in src/chamber/comm/errors.py
ChamberCommQPSaturationWarning
Bases: UserWarning
Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).
The QP-saturation guard test in
tests/property/test_qp_saturation.py sweeps the
:data:chamber.comm.profiles.URLLC_3GPP_R17 table and asserts that
the inner CBF QP solve time stays below 1 ms (OSCBF target from
ADR-004). If the most aggressive profile saturates, the wrapper
raises this warning rather than silently allowing it (ADR-006
Risk register R5).
Source code in src/chamber/comm/errors.py
CommChannel
Bases: Protocol
Contract for an inter-agent communication channel (ADR-003 §Decision).
Implementations encode the env-side state into a :class:CommPacket and
decode it back into the dict subset that downstream consumers (safety
filter M3, partner stack M4, evaluation harness M5) expect under
obs["comm"]. ADR-006 §Decision: implementations honour the
URLLC-anchored latency / jitter / drop bounds when composed with
:class:chamber.comm.degradation.CommDegradationWrapper.
Source code in src/chamber/comm/api.py
decode
decode(packet: CommPacket) -> dict[str, object]
Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).
Parameters:
- packet (CommPacket) – A :class:CommPacket previously produced by :meth:encode.
Returns:
- dict[str, object] – The schema-typed subset of the env state expressible via the packet (pose / task-state predicate / AoI), keyed by uid.
Source code in src/chamber/comm/api.py
encode
encode(state: object) -> CommPacket
Encode env state into a comm-channel packet (ADR-003 §Decision).
Parameters:
- state (object) – Env-side observation or raw physics state (impl-defined).
Returns:
- CommPacket – A :class:CommPacket honouring :data:SCHEMA_VERSION.
Source code in src/chamber/comm/api.py
reset
reset(*, seed: int | None = None) -> None
Reset channel state at episode start (ADR-003 §Decision).
Parameters:
- seed (int | None, default: None) – Optional root seed for the channel's deterministic RNG substream (P6 reproducibility). Implementations route this through concerto.training.seeding.derive_substream.
Source code in src/chamber/comm/api.py
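Because CommChannel is a typing.Protocol, any class with matching reset / encode / decode methods satisfies it structurally, without inheriting from it. The sketch below shows that idea with a hypothetical mirror Protocol (CommChannelLike) and a toy EchoChannel; the real CommChannel may not be decorated with runtime_checkable, and the plain-dict packet and version number 1 are assumptions.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class CommChannelLike(Protocol):
    """Illustrative mirror of the three CommChannel methods (not the real class)."""

    def reset(self, *, seed: int | None = None) -> None: ...
    def encode(self, state: object) -> dict: ...
    def decode(self, packet: dict) -> dict[str, object]: ...


class EchoChannel:
    """Hypothetical channel that copies the 'pose' sub-dict through unchanged."""

    SCHEMA_VERSION = 1  # hypothetical version number

    def reset(self, *, seed: int | None = None) -> None:
        pass  # stateless: nothing to clear or re-seed

    def encode(self, state: object) -> dict:
        pose = state.get("pose", {}) if isinstance(state, dict) else {}
        return {"schema_version": self.SCHEMA_VERSION, "pose": dict(pose)}

    def decode(self, packet: dict) -> dict[str, object]:
        return {"pose": dict(packet["pose"])}


channel = EchoChannel()
conforms = isinstance(channel, CommChannelLike)  # checks method names only
```

Note that a runtime_checkable isinstance check only verifies that the methods exist; signature conformance is enforced by a static type checker, not at runtime.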
CommDegradationStats
dataclass
Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).
Each delivered packet contributes one entry to latency_ticks
(the in-queue dwell time, in ticks). Each dropped packet increments
dropped without producing a latency entry.
Source code in src/chamber/comm/degradation.py
CommDegradationWrapper
Compose-time degradation around a fixed-format channel (ADR-006 §Decision).
Wraps any :class:~chamber.comm.api.CommChannel and injects the
URLLC-anchored latency / jitter / drop semantics described in the module
docstring. The inner channel is unaware of degradation; the wrapper is itself a
:class:CommChannel, so it composes transparently with
:class:chamber.envs.comm_shaping.CommShapingWrapper.
Parameters:
- channel (CommChannel) – The inner channel whose packets will be degraded.
- profile (DegradationProfile) – The degradation envelope (see :class:DegradationProfile).
- tick_period_ms (float, default: 1.0) – How many milliseconds each encode call represents. Defaults to 1.0, so the ms-denominated URLLC profiles map one tick to one millisecond.
- root_seed (int, default: 0) – Root seed for the wrapper's RNG (P6 determinism).
Source code in src/chamber/comm/degradation.py
stats
property
stats: CommDegradationStats
Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM).
__init__
__init__(
channel: CommChannel,
profile: DegradationProfile,
*,
tick_period_ms: float = 1.0,
root_seed: int = 0,
) -> None
Build the wrapper (ADR-006 §Decision).
Source code in src/chamber/comm/degradation.py
decode
decode(packet: CommPacket) -> dict[str, object]
Delegate decoding to the inner channel (ADR-003 §Decision).
Source code in src/chamber/comm/degradation.py
encode
encode(state: object) -> CommPacket
Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).
See module docstring for the per-step behaviour spec.
Source code in src/chamber/comm/degradation.py
reset
reset(*, seed: int | None = None) -> None
Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).
Parameters:
- seed (int | None, default: None) – Optional new root seed; if omitted, the previously stored root seed is reused so episode resets stay reproducible.
Source code in src/chamber/comm/degradation.py
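The per-step behaviour lives in the module docstring, which this page does not reproduce, so the following is only a hypothetical sketch of how tick_period_ms and the CommDegradationStats counters could interact: each push is one tick, a sampled latency in ms is converted to whole ticks, and delivered packets record their in-queue dwell while dropped packets only bump a counter. DelayLine, DelayStats, latency_ms_fn, and the round-up-to-whole-ticks rule are all assumptions, not the wrapper's algorithm.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable

import numpy as np


@dataclass
class DelayStats:
    """Mirrors the documented counters: one dwell entry per delivered packet."""

    latency_ticks: list[int] = field(default_factory=list)
    dropped: int = 0


class DelayLine:
    """Hypothetical tick-driven delay line; not CommDegradationWrapper's algorithm."""

    def __init__(
        self,
        drop_rate: float,
        latency_ms_fn: Callable[[np.random.Generator], float],
        tick_period_ms: float = 1.0,
        seed: int = 0,
    ) -> None:
        self.drop_rate = drop_rate
        self.latency_ms_fn = latency_ms_fn  # draws one latency sample in ms
        self.tick_period_ms = tick_period_ms
        self.rng = np.random.default_rng(seed)
        self.pending: list[tuple[int, int, object]] = []  # (enqueue_tick, due_tick, packet)
        self.now = 0
        self.stats = DelayStats()

    def push(self, packet: object) -> list[object]:
        """Advance one tick, enqueue (or drop) `packet`, return packets due now."""
        self.now += 1
        if self.rng.random() < self.drop_rate:
            self.stats.dropped += 1  # dropped packets get no latency entry
        else:
            latency_ms = self.latency_ms_fn(self.rng)
            # Assumption: latency is rounded up to whole ticks.
            delay_ticks = math.ceil(latency_ms / self.tick_period_ms)
            self.pending.append((self.now, self.now + delay_ticks, packet))
        ready = [p for p in self.pending if p[1] <= self.now]
        self.pending = [p for p in self.pending if p[1] > self.now]
        for enqueued_at, _, _ in ready:
            self.stats.latency_ticks.append(self.now - enqueued_at)
        return [pkt for _, _, pkt in ready]


# Usage: a constant 3 ms latency with 1 ms ticks delays each packet by 3 ticks.
line = DelayLine(drop_rate=0.0, latency_ms_fn=lambda rng: 3.0)
outputs = [line.push(name) for name in ["a", "b", "c", "d"]]
```

Filtering the pending list by due tick (rather than popping a FIFO head) lets a jittered packet overtake an earlier, slower one.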
CommPacket
Bases: TypedDict
The mandatory fixed-format wire packet (ADR-003 §Decision).
Every concrete :class:CommChannel produces and consumes packets of
exactly this shape. Downstream:
- The safety filter (M3) reads packet["pose"][uid] and packet["aoi"][uid] deterministically.
- The partner stack (M4) writes packet["pose"][uid] = self.last_pose.
- The evaluation harness (M5) logs from the same surface.
ADR-006 §Decision: AoI is the URLLC-anchored latency proxy.
Source code in src/chamber/comm/api.py
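A rough picture of the packet shape helps when writing a consumer. The sketch below is illustrative only: the top-level keys schema_version, pose, aoi, and learned_overlay appear elsewhere on this page, while the task_state key name, the Pose field names position and quat_wxyz, and the version number 1 are hypothetical placeholders rather than the real :class:CommPacket definition.

```python
from __future__ import annotations

from typing import Any, TypedDict


class PoseSketch(TypedDict):
    position: list[float]   # hypothetical key: xyz in the env base frame
    quat_wxyz: list[float]  # hypothetical key: real part first


class CommPacketSketch(TypedDict):
    schema_version: int
    pose: dict[str, PoseSketch]
    task_state: dict[str, dict[str, Any]]  # assumed key name for the predicates
    aoi: dict[str, float]
    learned_overlay: Any  # None for black-box partners


packet: CommPacketSketch = {
    "schema_version": 1,  # hypothetical version number
    "pose": {"panda": {"position": [0.4, 0.0, 0.2], "quat_wxyz": [1.0, 0.0, 0.0, 0.0]}},
    "task_state": {},
    "aoi": {"panda": 0.0},
    "learned_overlay": None,
}

# Consumer side (e.g. the M3 safety filter): index every field by uid.
panda_position = packet["pose"]["panda"]["position"]
panda_age = packet["aoi"]["panda"]
```

Keying every per-agent field by uid is what lets the safety filter, partner stack, and harness read the same surface without coordinating on agent order.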
DegradationProfile
dataclass
Latency / jitter / drop bounds for a single experimental condition.
ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial
trial corpus. The named profiles in :mod:chamber.comm.profiles cover the
Phase-0 Stage-2 CM sweep table.
Parameters:
- latency_mean_ms (float) – Mean of the Gaussian latency draw (ms).
- latency_std_ms (float) – Standard deviation of the Gaussian latency draw (ms); the draw is clipped to [0, 2 * latency_mean_ms].
- drop_rate (float) – Bernoulli drop probability per packet (per encode call).
Source code in src/chamber/comm/degradation.py
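The documented draw (Gaussian latency clipped to [0, 2 * latency_mean_ms], Bernoulli drop per encode call) can be sketched with a NumPy Generator. ProfileSketch and its sample methods are hypothetical names, and the profile values below are placeholders, not entries from the URLLC table.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileSketch:
    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float

    def sample_latency_ms(self, rng: np.random.Generator) -> float:
        raw = rng.normal(self.latency_mean_ms, self.latency_std_ms)
        # Clip to [0, 2 * mean]: no negative latency, symmetric about the mean.
        return float(np.clip(raw, 0.0, 2.0 * self.latency_mean_ms))

    def sample_drop(self, rng: np.random.Generator) -> bool:
        return bool(rng.random() < self.drop_rate)  # one Bernoulli trial per packet


profile = ProfileSketch(latency_mean_ms=10.0, latency_std_ms=5.0, drop_rate=0.01)  # placeholder values
rng = np.random.default_rng(0)
lats = [profile.sample_latency_ms(rng) for _ in range(1000)]
drops = sum(profile.sample_drop(rng) for _ in range(1000))
```

Because the clipping interval is symmetric about the mean, clipping removes negative latencies without shifting the expected latency of the Gaussian draw.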
FixedFormatCommChannel
Mandatory fixed-format channel (ADR-003 §Decision).
On every :meth:encode call:
1. The internal :class:AoIClock advances by one env tick (+1.0).
2. Every uid present in state["pose"] / state["task_state"] is
:meth:AoIClock.mark_fresh-ed (its AoI drops to zero).
3. The result is bundled into a :class:CommPacket carrying
:data:~chamber.comm.api.SCHEMA_VERSION and a snapshot of every
registered uid's AoI.
:meth:decode is the inverse projection: it returns the schema-typed
subset of the state expressible via the packet (pose / task-state / AoI /
learned overlay). A schema-version mismatch raises
:class:~chamber.comm.errors.ChamberCommError.
Determinism: an internal numpy Generator is seeded via
:func:concerto.training.seeding.derive_substream and re-seeded on
:meth:reset. The encoder is currently fully deterministic from input;
the RNG is held for future randomised-encoding features (e.g. dropout on
optional fields) without changing the public surface.
Parameters:
- schema_version (int, default: SCHEMA_VERSION) – Version stamp for emitted packets and the value decoders demand. Bumping it requires a new ADR (ADR-003 §"Schema evolution").
Source code in src/chamber/comm/fixed_format.py
__init__
__init__(*, schema_version: int = SCHEMA_VERSION) -> None
Build a fixed-format channel (ADR-003 §Decision).
Parameters:
- schema_version (int, default: SCHEMA_VERSION) – Wire-format version stamp; defaults to the project-wide :data:SCHEMA_VERSION.
Source code in src/chamber/comm/fixed_format.py
decode
decode(packet: CommPacket) -> dict[str, object]
Decode a :class:CommPacket back into the typed state subset (ADR-003 §Decision).
Round-trip guarantee: decode(encode(state)) reproduces every
schema-typed field of state (pose / task-state / learned overlay)
and adds the AoI snapshot the channel computed during encode.
Parameters:
- packet (CommPacket) – A packet previously emitted by :meth:encode (or by another channel honouring the same :data:SCHEMA_VERSION).
Returns:
- dict[str, object] – A dict with keys "pose" / "task_state" / "aoi" / "learned_overlay" (each a shallow copy where applicable).
Raises:
- ChamberCommError – If packet["schema_version"] differs from this channel's schema version (ADR-003 §"Schema evolution").
Source code in src/chamber/comm/fixed_format.py
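The decode path amounts to a version gate followed by shallow copies. This is a minimal sketch under stated assumptions: SchemaMismatchError stands in for ChamberCommError, decode_sketch is a hypothetical name, and SCHEMA_VERSION = 1 is a placeholder for the real constant. A missing schema_version is treated as a mismatch, in line with the ChamberCommError description.

```python
from __future__ import annotations

from typing import Any

SCHEMA_VERSION = 1  # hypothetical value; the real constant lives in chamber.comm.api


class SchemaMismatchError(RuntimeError):
    """Hypothetical stand-in for ChamberCommError."""


def decode_sketch(packet: dict[str, Any], expected_version: int = SCHEMA_VERSION) -> dict[str, object]:
    version = packet.get("schema_version")  # None when the key is missing
    if version != expected_version:
        raise SchemaMismatchError(
            f"packet schema_version {version!r} != channel schema_version {expected_version}"
        )
    return {
        "pose": dict(packet.get("pose", {})),  # shallow copies decouple the result
        "task_state": dict(packet.get("task_state", {})),
        "aoi": dict(packet.get("aoi", {})),
        "learned_overlay": packet.get("learned_overlay"),  # passed through as-is
    }
```

Failing loudly on a version mismatch, instead of decoding best-effort, is what keeps a silently drifted wire format from reaching the safety filter.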
encode
encode(state: object) -> CommPacket
Encode env state into a :class:CommPacket (ADR-003 §Decision).
Accepts any mapping with optional "pose" / "task_state" /
"learned_overlay" keys; non-mapping inputs are treated as empty
state (the channel simply ages every previously-registered uid).
Parameters:
- state (object) – Source state mapping. Recognised keys:
  - "pose": dict[str, Pose] – fresh pose per uid.
  - "task_state": dict[str, TaskStatePredicate] – optional fresh predicate per uid.
  - "learned_overlay": optional torch.Tensor for jointly-trained B6 partners; passed through unchanged.
Returns:
- CommPacket – A :class:CommPacket carrying the current schema version, a shallow copy of pose / task-state, the AoI snapshot, and the opt-in learned overlay.
Source code in src/chamber/comm/fixed_format.py
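The three encode steps listed in the class docstring (age every uid by one tick, zero the AoI of every uid present in the state, bundle with an AoI snapshot) can be traced with a small sketch. EncoderSketch is hypothetical, a plain dict replaces the internal AoIClock, SCHEMA_VERSION = 1 is a placeholder, and the string pose values stand in for real Pose dicts.

```python
from __future__ import annotations

from collections.abc import Mapping

SCHEMA_VERSION = 1  # hypothetical value


class EncoderSketch:
    """Hypothetical encoder following the documented three-step order."""

    def __init__(self) -> None:
        self._aoi: dict[str, float] = {}

    def encode(self, state: object) -> dict:
        src = state if isinstance(state, Mapping) else {}  # non-mapping: empty state
        pose = dict(src.get("pose") or {})
        task_state = dict(src.get("task_state") or {})
        for uid in self._aoi:  # step 1: age every registered uid by one tick
            self._aoi[uid] += 1.0
        for uid in (*pose, *task_state):  # step 2: uids present now are fresh
            self._aoi[uid] = 0.0
        return {  # step 3: bundle with a snapshot of every registered uid's AoI
            "schema_version": SCHEMA_VERSION,
            "pose": pose,
            "task_state": task_state,
            "aoi": dict(self._aoi),
            "learned_overlay": src.get("learned_overlay"),
        }


enc = EncoderSketch()
p1 = enc.encode({"pose": {"panda": "pose-A"}})
p2 = enc.encode(None)                       # nothing fresh: panda just ages
p3 = enc.encode({"task_state": {"ur5": {}}})
```

Ticking before marking fresh means a uid that reports every step always shows AoI 0, while a silent uid's AoI counts the steps since its last report.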
reset
reset(*, seed: int | None = None) -> None
Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).
Parameters:
- seed (int | None, default: None) – Optional new root seed for the channel substream (P6 determinism). When provided, it replaces the previously stored root seed; when omitted, the existing root seed is reused so episode resets remain reproducible without a fresh seed argument.
Source code in src/chamber/comm/fixed_format.py
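The reset(seed=...) rule (an explicit seed replaces the stored root seed; an omitted seed reuses it) can be shown in isolation. This sketch substitutes np.random.default_rng for concerto.training.seeding.derive_substream, whose signature is not documented here, and SeededChannelSketch is a hypothetical name.

```python
from __future__ import annotations

import numpy as np


class SeededChannelSketch:
    """Illustrates the documented reset(seed=...) contract only."""

    def __init__(self, root_seed: int = 0) -> None:
        self._root_seed = root_seed
        self._rng = np.random.default_rng(root_seed)

    def reset(self, *, seed: int | None = None) -> None:
        if seed is not None:
            self._root_seed = seed  # an explicit seed replaces the stored root seed
        # np.random.default_rng stands in for derive_substream (substitution).
        self._rng = np.random.default_rng(self._root_seed)

    def draw(self) -> float:
        return float(self._rng.random())


ch = SeededChannelSketch(root_seed=7)
ch.reset()
first = ch.draw()
ch.reset()          # no seed: stored root seed 7 is reused
second = ch.draw()
ch.reset(seed=8)    # explicit seed replaces the stored one
third = ch.draw()
ch.reset()          # now reuses 8
fourth = ch.draw()
```

Re-seeding on every reset, rather than continuing the previous stream, makes each episode's random draws depend only on the root seed and not on how many steps earlier episodes ran.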
LearnedOverlay
Holds an optional learned-message tensor (ADR-003 §Decision).
The class is a thin slot wrapper: encode returns whatever tensor a
jointly-trained partner has set (or None), and decode is
null-safe so black-box consumers can call it unconditionally.
Parameters:
- tensor (Tensor | None, default: None) – Initial overlay value. Defaults to None so the slot is interoperable with the black-box partner contract out of the box.
Source code in src/chamber/comm/learned_overlay.py
tensor
property
tensor: Tensor | None
Return the currently held tensor, or None (ADR-003 §Decision).
__init__
__init__(tensor: Tensor | None = None) -> None
Build an overlay slot, optionally pre-populated (ADR-003 §Decision).
Source code in src/chamber/comm/learned_overlay.py
decode
staticmethod
decode(
packet_overlay: Tensor | None,
) -> torch.Tensor | None
Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).
Black-box partners receive a packet whose learned_overlay is
None; jointly-trained partners receive the producer's tensor
unchanged. Either path returns without raising.
Source code in src/chamber/comm/learned_overlay.py
encode
encode() -> torch.Tensor | None
No-op encoder: returns the held tensor (ADR-003 §Decision).
Phase-1 will replace this with the message GNN forward pass; for
Phase-0 the slot semantics are sufficient for the
:class:~chamber.comm.api.CommPacket learned_overlay field.
Source code in src/chamber/comm/learned_overlay.py
reset
reset() -> None
Clear the overlay (ADR-003 §Decision; episode boundary).
Called by the channel on env reset so message tensors do not leak across episode boundaries.
Source code in src/chamber/comm/learned_overlay.py
set
set(tensor: Tensor | None) -> None
Replace the held tensor (ADR-003 §Decision).
Jointly-trained partners call this from their forward pass; black-box
partners never invoke it (or invoke it with None), preserving the
null-safety guarantee.
Source code in src/chamber/comm/learned_overlay.py
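The slot semantics above (no-op encode, null-safe static decode, set for jointly-trained partners, reset at episode boundaries) can be mirrored in a few lines. OverlaySlotSketch is hypothetical, and typing.Any replaces torch.Tensor so the sketch runs without PyTorch installed; a plain list stands in for a message tensor.

```python
from __future__ import annotations

from typing import Any


class OverlaySlotSketch:
    """Null-safe slot; Any stands in for torch.Tensor (no torch dependency)."""

    def __init__(self, tensor: Any | None = None) -> None:
        self._tensor = tensor

    @property
    def tensor(self) -> Any | None:
        return self._tensor

    def set(self, tensor: Any | None) -> None:
        self._tensor = tensor  # called by jointly-trained partners only

    def encode(self) -> Any | None:
        return self._tensor  # no-op: hand back whatever is held

    @staticmethod
    def decode(packet_overlay: Any | None) -> Any | None:
        return packet_overlay  # None and tensors both pass through without raising

    def reset(self) -> None:
        self._tensor = None  # episode boundary: nothing leaks into the next episode


black_box = OverlaySlotSketch()
joint = OverlaySlotSketch()
message = [0.1, 0.2, 0.3]  # stands in for a message tensor
joint.set(message)
```

Because decode accepts None, consumers never need to branch on whether the producing partner is black-box or jointly trained.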
Pose
Bases: TypedDict
SE(3) pose in the env's base frame (ADR-003 §Decision).
The fixed-format channel ships pose as the minimum every black-box partner can produce. Quaternion convention is wxyz (real part first), matching SAPIEN / ManiSkill v3.
Source code in src/chamber/comm/api.py
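Partners built on libraries that use scalar-last quaternions need a conversion before writing to the wxyz channel; for example, SciPy's Rotation.as_quat() returns (x, y, z, w) by default. The helpers below (hypothetical names) are a minimal sketch of that conversion using np.roll.

```python
import numpy as np


def xyzw_to_wxyz(q_xyzw) -> np.ndarray:
    """Move the real part w from last to first, as the channel expects."""
    q = np.asarray(q_xyzw, dtype=np.float64)
    if q.shape != (4,):
        raise ValueError(f"expected a 4-vector quaternion, got shape {q.shape}")
    return np.roll(q, 1)  # [x, y, z, w] -> [w, x, y, z]


def wxyz_to_xyzw(q_wxyz) -> np.ndarray:
    """Inverse conversion for consumers that want scalar-last order."""
    q = np.asarray(q_wxyz, dtype=np.float64)
    if q.shape != (4,):
        raise ValueError(f"expected a 4-vector quaternion, got shape {q.shape}")
    return np.roll(q, -1)  # [w, x, y, z] -> [x, y, z, w]


identity_wxyz = xyzw_to_wxyz([0.0, 0.0, 0.0, 1.0])
```

Getting the order wrong does not raise an error, it silently produces a different rotation, so converting at the channel boundary is safer than converting inside each consumer.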
TaskStatePredicate
Bases: TypedDict
Typed task-state predicates for richer coordination (ADR-003 §Decision).
The fixed-format extension field; all keys are optional so partners that do not produce them remain interoperable. Adding a new key is a schema bump.
Source code in src/chamber/comm/api.py
saturation_guard
saturation_guard(
profile: DegradationProfile,
qp_solve_fn: Callable[
..., tuple[float, float]
] = solve_qp_stub,
*,
qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None
Check the inner CBF QP solver's headroom under profile (ADR-006 §Risks R5).
Two conditions trigger :class:~chamber.comm.errors.ChamberCommQPSaturationWarning:
- The measured QP solve time exceeds qp_time_budget_ms (M3 enforces this on the real solver; the M2 stub returns immediately).
- The profile is in the saturation regime per ADR-006 R5 (drop rate >= 10 % or mean latency >= 100 ms), even when timing is fine. The regime check keeps the guard test meaningful from M2 onward, so M3 cannot regress it.
Parameters:
- profile (DegradationProfile) – The :class:DegradationProfile whose feasibility against the inner CBF QP is being tested.
- qp_solve_fn (Callable[..., tuple[float, float]], default: solve_qp_stub) – The QP solver under test. Defaults to :func:concerto.safety.solve_qp_stub; M3 swaps in the real solver. Must return a (decision, elapsed_seconds) pair.
- qp_time_budget_ms (float, default: _QP_TIME_BUDGET_MS) – Solve-time budget. Defaults to the OSCBF target from ADR-004 (1 ms).
Source code in src/chamber/comm/degradation.py
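The two trigger conditions can be sketched with the standard warnings module, and the same catch_warnings pattern is how a test can assert the warning fires. Everything here is a hypothetical stand-in: QPSaturationWarningSketch replaces ChamberCommQPSaturationWarning, ProfileStub replaces DegradationProfile, timed_noop_solve performs no QP at all, and calling qp_solve_fn with no arguments is an assumption (the real signature is Callable[..., ...]).

```python
from __future__ import annotations

import time
import warnings
from dataclasses import dataclass
from typing import Callable


class QPSaturationWarningSketch(UserWarning):
    """Hypothetical stand-in for ChamberCommQPSaturationWarning."""


@dataclass(frozen=True)
class ProfileStub:
    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float


def timed_noop_solve() -> tuple[float, float]:
    """Placeholder solver: returns (decision, elapsed_seconds) without solving anything."""
    start = time.perf_counter()
    decision = 0.0
    return decision, time.perf_counter() - start


def saturation_guard_sketch(
    profile: ProfileStub,
    qp_solve_fn: Callable[[], tuple[float, float]] = timed_noop_solve,
    *,
    qp_time_budget_ms: float = 1.0,
) -> None:
    _, elapsed_s = qp_solve_fn()  # assumption: the solver takes no arguments here
    if elapsed_s * 1e3 > qp_time_budget_ms:  # condition 1: timing budget exceeded
        warnings.warn(
            f"QP solve took {elapsed_s * 1e3:.3f} ms > budget {qp_time_budget_ms} ms",
            QPSaturationWarningSketch,
            stacklevel=2,
        )
    if profile.drop_rate >= 0.10 or profile.latency_mean_ms >= 100.0:  # condition 2: R5 regime
        warnings.warn(
            f"profile {profile} is in the ADR-006 R5 saturation regime",
            QPSaturationWarningSketch,
            stacklevel=2,
        )


def count_saturation_warnings(profile: ProfileStub, **kwargs) -> int:
    """Run the guard and count the saturation warnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        saturation_guard_sketch(profile, **kwargs)
    return sum(issubclass(w.category, QPSaturationWarningSketch) for w in caught)
```

Checking the regime independently of timing means the warning path is exercised even when the solver under test is fast enough to never trip the budget.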
Public API for the CHAMBER comm stack — Protocol + packet shape + schema.
ADR-003 §Decision: defines the canonical wire format of the mandatory
fixed-format channel (pose per uid, task-state predicate per uid, AoI per uid)
plus an opt-in learned-overlay tensor field, and the
:class:CommChannel Protocol that every concrete channel implements.
ADR-006 §Decision: schema versioning protects the URLLC-anchored numeric
bounds against silent drift; bumping :data:SCHEMA_VERSION requires a new ADR.
The Protocol lives on the benchmark side (this module) rather than under
concerto.api because its signatures reference :class:CommPacket, which
is the benchmark's wire format. The dependency-direction rule (plan/10 §2)
forbids concerto.* from importing chamber.*; placing the Protocol
here keeps the rule intact.
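The dependency-direction rule can be checked mechanically by parsing source files and flagging absolute imports of the chamber package. The helper below is a hypothetical sketch using the standard ast module, not the project's actual lint; a real check would run it over every file under src/concerto.

```python
import ast


def forbidden_imports(source: str, forbidden_root: str = "chamber") -> list[str]:
    """Return the module names in `source` that import from `forbidden_root`."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from X import Y" only
        else:
            continue
        # Match the package itself or a submodule, not a name that merely shares a prefix.
        hits += [n for n in names if n == forbidden_root or n.startswith(forbidden_root + ".")]
    return hits
```

Matching on the package name plus a dot avoids false positives from unrelated modules such as a hypothetical chamberlain package.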
Per-uid Age-of-Information clock (T2.2).
ADR-003 §Decision: AoI is the latency proxy carried in every
:class:~chamber.comm.api.CommPacket. ADR-006 §Decision: AoI is the
URLLC-anchored latency-axis instrumentation for the ADR-007 Stage-2 CM
spike.
Semantics follow notes/tier2/41 (Ballotta & Talak 2024): for each uid,
AoI is the elapsed simulation time since the last fresh sample arrived.
mark_fresh(uid) resets that uid to zero; tick(dt) advances every
registered uid by dt. The clock holds no env-side dependency — wall
time is supplied by the caller (typically the env tick) so the class
remains pure-Python and trivially deterministic (P6).
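For reference, these semantics correspond to the usual Age-of-Information definition. In the notation below (introduced here for illustration, not taken from the source), τ_u(t) is the simulation time of the most recent fresh sample for uid u:

```latex
\Delta_u(t) = t - \tau_u(t), \qquad
\Delta_u \leftarrow
\begin{cases}
0, & \text{on } \texttt{mark\_fresh}(u), \\
\Delta_u + dt, & \text{on } \texttt{tick}(dt),\ dt \ge 0 \text{ (every registered } u\text{)}.
\end{cases}
```

For example, mark_fresh("panda") followed by tick(0.01) and tick(0.02) gives Δ_panda = 0.03 s.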
Fixed-format communication channel — encoder + decoder (T2.3).
ADR-003 §Decision: this channel is the mandatory baseline every black-box
partner can speak. It encodes pose + task-state predicates + AoI per uid into
a schema-versioned :class:~chamber.comm.api.CommPacket, with an opt-in
learned_overlay slot that jointly-trained partners may fill.
ADR-006 §Decision: the channel itself ships no degradation. URLLC-anchored
latency / jitter / drop is composed externally via
:class:chamber.comm.degradation.CommDegradationWrapper (T2.5) so that the
encoding logic stays pure, deterministic, and trivially testable.
FixedFormatCommChannel
Mandatory fixed-format channel (ADR-003 §Decision).
On every :meth:encode call:
1. The internal :class:AoIClock advances by one env tick (+1.0).
2. Every uid present in state["pose"] / state["task_state"] is
:meth:AoIClock.mark_fresh-ed (its AoI drops to zero).
3. The result is bundled into a :class:CommPacket carrying
:data:~chamber.comm.api.SCHEMA_VERSION and a snapshot of every
registered uid's AoI.
:meth:decode is the inverse projection: it returns the schema-typed
subset of the state expressible via the packet (pose / task-state / AoI /
learned overlay). A schema-version mismatch raises
:class:~chamber.comm.errors.ChamberCommError.
Determinism: an internal numpy Generator is seeded via
:func:concerto.training.seeding.derive_substream and re-seeded on
:meth:reset. The encoder is currently fully deterministic from input;
the RNG is held for future randomised-encoding features (e.g. dropout on
optional fields) without changing the public surface.
Parameters:
-
schema_version(int, default:SCHEMA_VERSION) –Version stamp for emitted packets and the value decoders demand. Bumping it requires a new ADR (ADR-003 §"Schema evolution").
Source code in src/chamber/comm/fixed_format.py
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 | |
__init__
__init__(*, schema_version: int = SCHEMA_VERSION) -> None
Build a fixed-format channel (ADR-003 §Decision).
Parameters:
-
schema_version(int, default:SCHEMA_VERSION) –Wire-format version stamp; defaults to the project-wide :data:
SCHEMA_VERSION.
Source code in src/chamber/comm/fixed_format.py
57 58 59 60 61 62 63 64 65 66 67 | |
decode
decode(packet: CommPacket) -> dict[str, object]
Decode a :class:CommPacket back into the typed state subset (ADR-003 §Decision).
Round-trip guarantee: decode(encode(state)) reproduces every
schema-typed field of state (pose / task-state / learned overlay)
and adds the AoI snapshot the channel computed during encode.
Parameters:
-
packet(CommPacket) –A packet previously emitted by :meth:
encode(or another channel honouring the same :data:SCHEMA_VERSION).
Returns:
-
dict[str, object]–A dict with keys
"pose"/"task_state"/"aoi"/ -
dict[str, object]–"learned_overlay"(each a shallow copy where applicable).
Raises:
-
ChamberCommError–when
packet["schema_version"]differs from this channel's schema version (ADR-003 §"Schema evolution").
Source code in src/chamber/comm/fixed_format.py
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 | |
encode
encode(state: object) -> CommPacket
Encode env state into a :class:CommPacket (ADR-003 §Decision).
Accepts any mapping with optional "pose" / "task_state" /
"learned_overlay" keys; non-mapping inputs are treated as empty
state (the channel simply ages every previously-registered uid).
Parameters:
-
state(object) –Source state mapping. Recognised keys:
"pose":dict[str, Pose]— fresh pose per uid."task_state":dict[str, TaskStatePredicate]— optional fresh predicate per uid."learned_overlay": optionaltorch.Tensorfor jointly-trained B6 partners; passed through unchanged.
Returns:
-
A(CommPacket) –class:
CommPacketcarrying the current schema version, a -
CommPacket–shallow copy of pose / task-state, the AoI snapshot, and the
-
CommPacket–opt-in learned overlay.
Source code in src/chamber/comm/fixed_format.py
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 | |
reset
reset(*, seed: int | None = None) -> None
Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).
Parameters:
- seed (int | None, default: None) – Optional new root seed for the channel substream (P6 determinism). When provided, it replaces the previously stored root seed; when omitted, the existing root seed is reused so episode resets remain reproducible without a fresh seed argument.
Source code in src/chamber/comm/fixed_format.py
Opt-in learned-message overlay slot (T2.4).
ADR-003 §Decision (the second of the two channels — opt-in, null-safe);
ADR-003 §Risks risk-mitigation 1: black-box partners leave this slot as
None so the fixed-format channel alone is sufficient for ad-hoc
deployment. Jointly-trained partners (HetGPPO, CommFormer) populate it
with a message tensor.
Phase-0 ships only the slot itself plus a no-op encoder; the real learned-comm machinery (HetGPPO message GNN) is a Phase-1 concern.
LearnedOverlay
Holds an optional learned-message tensor (ADR-003 §Decision).
The class is a thin slot wrapper: encode returns whatever tensor a
jointly-trained partner has set (or None), and decode is
null-safe so black-box consumers can call it unconditionally.
Parameters:
- tensor (Tensor | None, default: None) – Initial overlay value. Defaults to None so the slot is interoperable with the black-box partner contract out of the box.
Source code in src/chamber/comm/learned_overlay.py
tensor
property
tensor: Tensor | None
Return the currently held tensor, or None (ADR-003 §Decision).
__init__
__init__(tensor: Tensor | None = None) -> None
Build an overlay slot, optionally pre-populated (ADR-003 §Decision).
Source code in src/chamber/comm/learned_overlay.py
decode
staticmethod
decode(
packet_overlay: Tensor | None,
) -> torch.Tensor | None
Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).
Black-box partners receive a packet whose learned_overlay is
None; jointly-trained partners receive the producer's tensor
unchanged. Either path returns without raising.
Source code in src/chamber/comm/learned_overlay.py
encode
encode() -> torch.Tensor | None
No-op encoder: returns the held tensor (ADR-003 §Decision).
Phase-1 will replace this with the message GNN forward pass; for
Phase-0 the slot semantics are sufficient for the
:class:~chamber.comm.api.CommPacket learned_overlay field.
Source code in src/chamber/comm/learned_overlay.py
reset
reset() -> None
Clear the overlay (ADR-003 §Decision; episode boundary).
Called by the channel on env reset so message tensors do not leak across episode boundaries.
Source code in src/chamber/comm/learned_overlay.py
set
set(tensor: Tensor | None) -> None
Replace the held tensor (ADR-003 §Decision).
Jointly-trained partners call this from their forward pass; black-box
partners never invoke it (or invoke it with None), preserving the
null-safety guarantee.
Source code in src/chamber/comm/learned_overlay.py
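Here is a minimal sketch of the null-safe slot semantics described above. ToyOverlaySlot is a hypothetical stand-in for LearnedOverlay. A plain Python list plays the role of torch.Tensor so the example runs without torch.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")  # stands in for torch.Tensor in this sketch


class ToyOverlaySlot(Generic[T]):
    """Hypothetical null-safe slot mirroring the LearnedOverlay contract."""

    def __init__(self, tensor: Optional[T] = None) -> None:
        self._tensor = tensor  # None by default: black-box compatible

    @property
    def tensor(self) -> Optional[T]:
        return self._tensor

    def set(self, tensor: Optional[T]) -> None:
        # Jointly-trained partners call this; black-box partners never do.
        self._tensor = tensor

    def reset(self) -> None:
        # Episode boundary: drop the tensor so nothing leaks across episodes.
        self._tensor = None

    def encode(self) -> Optional[T]:
        # Phase-0 "no-op encoder": hand back whatever is held.
        return self._tensor

    @staticmethod
    def decode(packet_overlay: Optional[T]) -> Optional[T]:
        # Null-safe: None passes through, and so does a real payload.
        return packet_overlay
```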
URLLC-anchored comm degradation wrapper (T2.5).
ADR-006 §Decision: URLLC-anchored numeric bounds — latency 1-100 ms,
jitter μs-10 ms, drop 10⁻⁶ to 10⁻² — are applied externally by this
wrapper rather than by the channel itself, so the encoding logic in
:class:chamber.comm.fixed_format.FixedFormatCommChannel stays pure and
testable. ADR-007 Stage-2 CM sweep ranges are exposed via
:data:chamber.comm.profiles.URLLC_3GPP_R17.
Behaviour (plan/02 §3.5):
- With probability profile.drop_rate, the freshly-encoded packet is dropped; the wrapper returns the previous visible packet with all per-uid AoI values incremented by one tick.
- Otherwise the packet is queued with a delay drawn from Normal(latency_mean_ms, latency_std_ms) clipped to [0, 2 * latency_mean_ms]. Packets due at or before the current tick are released; the most-recently-due packet becomes the visible one, with its per-uid AoI incremented by the time it spent in the queue.
Determinism (P6): every random draw is seeded via
concerto.training.seeding.derive_substream("comm.degrade", ...) so two
wrappers reset with the same seed produce byte-identical traces.
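The drop / delay / release rules can be sketched as a small single-step state machine. This illustrates the rules under stated assumptions and is not the CommDegradationWrapper code. The assumptions are:
- ToyDegrader and ToyProfile are hypothetical names.
- random.Random(seed) stands in for the derive_substream RNG.
- Delays are converted to ticks by rounding up.
- When nothing is released on a tick, the sketch returns the last visible packet unchanged. The behaviour spec does not say what happens in that case.

```python
import math
import random
from dataclasses import dataclass
from typing import Any, Optional

Packet = dict[str, Any]


@dataclass(frozen=True)
class ToyProfile:
    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float


def _age(packet: Packet, ticks: int) -> Packet:
    """Return a copy of packet with every per-uid AoI bumped by ticks."""
    return {**packet, "aoi": {uid: a + ticks for uid, a in packet["aoi"].items()}}


class ToyDegrader:
    """Hypothetical sketch of the drop / delay / release rules above."""

    def __init__(self, profile: ToyProfile, tick_period_ms: float = 1.0,
                 root_seed: int = 0) -> None:
        self.profile = profile
        self.tick_period_ms = tick_period_ms
        self._root_seed = root_seed
        self.reset()

    def reset(self, seed: Optional[int] = None) -> None:
        if seed is not None:  # otherwise reuse the stored root seed
            self._root_seed = seed
        self._rng = random.Random(self._root_seed)
        self._tick = 0
        self._queue: list[tuple[int, int, Packet]] = []  # (due, sent, packet)
        self._visible: Optional[Packet] = None

    def step(self, packet: Packet) -> Optional[Packet]:
        self._tick += 1
        p = self.profile
        if self._rng.random() < p.drop_rate:
            # Dropped: the previous visible packet ages by one tick.
            if self._visible is not None:
                self._visible = _age(self._visible, 1)
            return self._visible
        delay_ms = self._rng.gauss(p.latency_mean_ms, p.latency_std_ms)
        delay_ms = min(max(delay_ms, 0.0), 2.0 * p.latency_mean_ms)
        due = self._tick + math.ceil(delay_ms / self.tick_period_ms)
        self._queue.append((due, self._tick, packet))
        released = [e for e in self._queue if e[0] <= self._tick]
        self._queue = [e for e in self._queue if e[0] > self._tick]
        if released:
            # The most-recently-due packet wins; AoI grows by its queue dwell.
            _, sent, pkt = max(released, key=lambda e: e[0])
            self._visible = _age(pkt, self._tick - sent)
        return self._visible
```

Reseeding with the same root seed replays an identical trace, which is the P6 property the module docstring describes.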
CommDegradationStats
dataclass
Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).
Each delivered packet contributes one entry to latency_ticks
(the in-queue dwell time, in ticks). Each dropped packet increments
dropped without producing a latency entry.
Source code in src/chamber/comm/degradation.py
CommDegradationWrapper
Compose-time degradation around a fixed-format channel (ADR-006 §Decision).
Wraps any :class:~chamber.comm.api.CommChannel and injects
URLLC-anchored latency / jitter / drop semantics described above. The
inner channel is unaware of degradation; the wrapper is itself a
:class:CommChannel so it composes transparently with
:class:chamber.envs.comm_shaping.CommShapingWrapper.
Parameters:
- channel (CommChannel) – The inner channel whose packets will be degraded.
- profile (DegradationProfile) – The degradation envelope (see :class:DegradationProfile).
- tick_period_ms (float, default: 1.0) – How many milliseconds each encode call represents. Defaults to 1.0 so the ms-denominated URLLC profiles map one tick to one millisecond.
- root_seed (int, default: 0) – Root seed for the wrapper's RNG (P6 determinism).
Source code in src/chamber/comm/degradation.py
stats
property
stats: CommDegradationStats
Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM).
__init__
__init__(
channel: CommChannel,
profile: DegradationProfile,
*,
tick_period_ms: float = 1.0,
root_seed: int = 0,
) -> None
Build the wrapper (ADR-006 §Decision).
Source code in src/chamber/comm/degradation.py
decode
decode(packet: CommPacket) -> dict[str, object]
Delegate decoding to the inner channel (ADR-003 §Decision).
Source code in src/chamber/comm/degradation.py
encode
encode(state: object) -> CommPacket
Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).
See module docstring for the per-step behaviour spec.
Source code in src/chamber/comm/degradation.py
reset
reset(*, seed: int | None = None) -> None
Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).
Parameters:
- seed (int | None, default: None) – Optional new root seed; otherwise the previously stored root seed is reused so episode resets stay reproducible.
Source code in src/chamber/comm/degradation.py
DegradationProfile
dataclass
Latency / jitter / drop bounds for a single experimental condition.
ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial
trial corpus. The named profiles in :mod:chamber.comm.profiles cover the
Phase-0 Stage-2 CM sweep table.
Parameters:
- latency_mean_ms (float) – Mean of the Gaussian latency draw (ms).
- latency_std_ms (float) – Standard deviation of the Gaussian latency draw (ms); the draw is clipped to [0, 2 * latency_mean_ms].
- drop_rate (float) – Bernoulli drop probability per packet (per encode call).
Source code in src/chamber/comm/degradation.py
saturation_guard
saturation_guard(
profile: DegradationProfile,
qp_solve_fn: Callable[
..., tuple[float, float]
] = solve_qp_stub,
*,
qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None
Check the inner CBF QP solver's headroom under profile (ADR-006 §Risks R5).
Two conditions trigger :class:~chamber.comm.errors.ChamberCommQPSaturationWarning:
- The measured QP solve time exceeds qp_time_budget_ms (M3 enforces this on the real solver; the M2 stub returns immediately).
- The profile is in the saturation regime per ADR-006 R5 (drop rate >= 10 % or mean latency >= 100 ms), even when timing is fine. The regime check keeps the guard test in place from M2 onward, so M3 cannot regress it.
Parameters:
- profile (DegradationProfile) – The :class:DegradationProfile whose feasibility against the inner CBF QP is being tested.
- qp_solve_fn (Callable[..., tuple[float, float]], default: solve_qp_stub) – The QP solver under test. Defaults to :func:concerto.safety.solve_qp_stub; M3 swaps in the real solver. Must return a (decision, elapsed_seconds) pair.
- qp_time_budget_ms (float, default: _QP_TIME_BUDGET_MS) – Solve-time budget. Defaults to the OSCBF target from ADR-004 (1 ms).
Source code in src/chamber/comm/degradation.py
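The two trigger conditions can be expressed as one guard that collects reasons and emits a single warning. This sketch makes two assumptions: qp_solve_fn is called with no arguments, and the R5 thresholds quoted above are hard-coded. ToyQPSaturationWarning and toy_saturation_guard are hypothetical stand-ins, not the chamber.comm API.

```python
import warnings
from dataclasses import dataclass
from typing import Callable


class ToyQPSaturationWarning(UserWarning):
    """Local stand-in for ChamberCommQPSaturationWarning."""


@dataclass(frozen=True)
class ToyProfile:
    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float


def in_saturation_regime(profile: ToyProfile) -> bool:
    # ADR-006 R5 thresholds as quoted above: drop >= 10 % or mean latency >= 100 ms.
    return profile.drop_rate >= 0.10 or profile.latency_mean_ms >= 100.0


def toy_saturation_guard(
    profile: ToyProfile,
    qp_solve_fn: Callable[[], tuple[float, float]],
    *,
    qp_time_budget_ms: float = 1.0,
) -> None:
    _decision, elapsed_s = qp_solve_fn()  # assumed zero-argument call
    reasons = []
    if elapsed_s * 1000.0 > qp_time_budget_ms:
        reasons.append(f"QP solve took {elapsed_s * 1000.0:.3f} ms")
    if in_saturation_regime(profile):
        reasons.append("profile is in the ADR-006 R5 saturation regime")
    if reasons:
        warnings.warn("; ".join(reasons), ToyQPSaturationWarning, stacklevel=2)
```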
URLLC-anchored degradation sweep profiles (T2.6).
ADR-006 §Decision: latency 1-100 ms, jitter μs-10 ms, drop 10⁻⁶ to 10⁻²
(URLLC + 3GPP Release 17 + arXiv 2501.12792 industrial-trial anchors). The
six named profiles below cover the sweep range the ADR-007 Stage-2 CM spike
will reproduce; the saturation row is held aside so the QP-saturation
property test (T2.7) can prove or refute ADR-006 risk #5 (R5).
Profiles are evaluated in production by composition with
:class:chamber.comm.degradation.CommDegradationWrapper; the channel itself
ships no degradation (ADR-003 §Decision).
Comm-stack error and warning types.
ADR-003 §Decision (every concrete channel raises :class:ChamberCommError
on contract violations, never silently degrades). ADR-006 §Risks #5 / Risk
register R5 (the QP-saturation guard fires :class:ChamberCommQPSaturationWarning
when the URLLC sweep saturates the inner CBF QP solver).
ChamberCommError
Bases: RuntimeError
Raised when a comm-channel contract is violated (ADR-003 §Decision).
Examples include malformed packets (missing schema_version,
unrecognised packet keys), unsupported profile lookups, and channel
state that cannot be reconciled (e.g., decoding a packet whose
schema version differs from the channel's).
Source code in src/chamber/comm/errors.py
ChamberCommQPSaturationWarning
Bases: UserWarning
Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).
The QP-saturation guard test in
tests/property/test_qp_saturation.py sweeps the
:data:chamber.comm.profiles.URLLC_3GPP_R17 table and asserts that
the inner CBF QP solve time stays below 1 ms (OSCBF target from
ADR-004). If the most aggressive profile saturates, :func:saturation_guard
emits this warning rather than letting the saturation pass silently
(ADR-006 Risk register R5).
Source code in src/chamber/comm/errors.py
chamber.partners
CHAMBER partner zoo — concrete implementations of the FrozenPartner Protocol.
ADR-009 §Decision (zoo construction); ADR-010 §Decision (FM partner selection); plan/04 §3 (architecture). Phase-0 ships:
- The :class:FrozenPartner Protocol + :class:PartnerSpec dataclass (:mod:chamber.partners.api).
- The plug-in :func:register_partner decorator + :func:load_partner factory + :func:list_registered (:mod:chamber.partners.registry).
- :class:PartnerBase with the _FORBIDDEN_ATTRS shield enforcing the black-box AHT no-joint-training constraint at runtime (:mod:chamber.partners.interface).
- :class:ScriptedHeuristicPartner: Stage-3 draft-zoo entry #1 (:mod:chamber.partners.heuristic).
- :class:FrozenMAPPOPartner: Stage-3 draft-zoo entry #2 (T4.5; :mod:chamber.partners.frozen_mappo). Loads a torch state-dict via :func:concerto.training.checkpoints.load_checkpoint, freezes every parameter, and runs inference under :func:torch.no_grad.
- :class:FrozenHARLPartner: Stage-3 draft-zoo entry #3 (T4.6; :mod:chamber.partners.frozen_harl). Loads the HARL HAPPO actor checkpoint produced by :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer, rebuilds the :class:StochasticPolicy with the same args, freezes every parameter, and runs deterministic=True inference under :func:torch.no_grad.
- :class:OpenVLAPartner and :class:CrossFormerPartner Phase-1 stubs (:mod:chamber.partners.stubs), per ADR-010 §Decision Option B; both raise :class:NotImplementedError referencing the Phase-1 ticket.
- The Phase-0 :func:make_phase0_draft_zoo (real) and the Phase-1 :func:select_zoo stub (:mod:chamber.partners.selection).
The M4-gate integration test (T4.9) lands in a follow-up PR (plan/04 §1).
CrossFormerPartner
Bases: PartnerBase
CrossFormer frozen zero-shot generalist — Phase-1 stub (ADR-010 §Decision Option B).
Frozen across all 20 cross-embodiment classes; no per-task LoRA budget (ADR-010 §Decision: zero fine-tuning cost). Per ADR-010 §Risks the zero-shot deployment must pass a per-task ablation before use as a primary partner; the stub does not bypass that gate.
Source code in src/chamber/partners/stubs/crossformer.py
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Phase-1 stub — raises (ADR-010 §Decision Option B).
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation mapping.
- deterministic (bool, default: True) – Accepted for Protocol conformance only.
Raises:
- NotImplementedError – Always; the real action-chunk inference is Phase-1 work.
Source code in src/chamber/partners/stubs/crossformer.py
reset
reset(*, seed: int | None = None) -> NoReturn
Phase-1 stub — raises (ADR-010 §Decision Option B).
Parameters:
- seed (int | None, default: None) – Accepted for Protocol conformance only.
Raises:
- NotImplementedError – Always; the real zero-shot inference harness is Phase-1 work.
Source code in src/chamber/partners/stubs/crossformer.py
FrozenHARLPartner
Bases: PartnerBase
Frozen HARL HAPPO partner (ADR-009 §Decision; plan/04 §3.6).
Loads a HARL :class:StochasticPolicy checkpoint via
:func:concerto.training.checkpoints.load_checkpoint (which verifies
the sidecar SHA-256), freezes every parameter
(requires_grad=False), and runs inference under
:func:torch.no_grad with deterministic=True. The actor's layer
widths are inferred from the loaded state-dict tensor shapes.
Spec contract (plan/04 §3.6; ADR-009 §Decision):
- spec.class_name == "frozen_harl".
- spec.weights_uri is the local://artifacts/<name>.pt URI.
- spec.extra["uid"] is the env-side uid the partner acts on; it indexes obs["agent"][uid]["state"] at action time.
Statefulness (plan/04 §2): the partner is stateless across episodes;
:meth:reset is a no-op once the checkpoint is loaded. Loading is
deferred to the first :meth:reset / :meth:act call so the
registry lookup in :func:chamber.partners.registry.load_partner
stays cheap.
Device placement (Phase-0): inference runs on CPU. HAPPO actor forward passes are microseconds-scale and not the project's throughput bottleneck (the OpenVLA partner stratum is); partner-side GPU placement is deferred to a future partner-config plane and would need its own ADR amendment.
Source code in src/chamber/partners/frozen_harl.py
__init__
__init__(spec: PartnerSpec) -> None
Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).
Parameters:
- spec (PartnerSpec) – Identity-bearing handle. Reads spec.extra["uid"] for the env-side agent uid and spec.weights_uri for the checkpoint URI. Construction does not touch disk.
Raises:
- ValueError – If spec.weights_uri is None or spec.extra["uid"] is missing / empty.
Source code in src/chamber/partners/frozen_harl.py
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Greedy HARL-actor forward pass under no-grad (ADR-009 §Decision).
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation mapping. Reads the partner's own flat state from obs["agent"][uid]["state"] (plan/04 §3.4); other keys are ignored.
- deterministic (bool, default: True) – Accepted for Protocol conformance but ignored. Phase-0 pins the partner to the distribution mode so a frozen partner's actions stay decoupled from torch's global RNG (see module docstring).
Returns:
- NDArray[floating] – np.float32 action vector of length action_dim (the mean head's output dimension, inferred from the loaded checkpoint).
Raises:
- ValueError – If obs["agent"][uid]["state"] is missing or its length disagrees with the loaded actor's input dimension.
Source code in src/chamber/partners/frozen_harl.py
reset
reset(*, seed: int | None = None) -> None
Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).
Parameters:
- seed (int | None, default: None) – Accepted for :class:~chamber.partners.api.FrozenPartner Protocol conformance. The partner is fully deterministic given the loaded weights and obs, so seed is ignored.
Source code in src/chamber/partners/frozen_harl.py
FrozenMAPPOPartner
Bases: PartnerBase
Frozen shared-parameter MAPPO partner (ADR-009 §Decision; plan/04 §3.5).
Loads a :class:_MAPPOActor checkpoint via
:func:concerto.training.checkpoints.load_checkpoint (which verifies
the sidecar SHA-256), freezes every parameter
(requires_grad=False), and runs inference under
:func:torch.no_grad. The actor's layer widths are inferred from the
loaded state-dict tensor shapes — no architecture-metadata sidecar is
required (ADR-002 §Decisions: provenance lives on the sidecar; shape
lives in the tensors).
Spec contract (plan/04 §3.5; ADR-009 §Decision):
- spec.class_name == "frozen_mappo".
- spec.weights_uri is the local://artifacts/<name>.pt URI.
- spec.extra["uid"] is the env-side uid the partner acts on; it indexes obs["agent"][uid]["state"] at action time.
Statefulness (plan/04 §2): the partner is stateless across episodes;
:meth:reset is a no-op once the checkpoint is loaded. Loading is
deferred to the first :meth:reset / :meth:act call so the
registry lookup in :func:chamber.partners.registry.load_partner
stays cheap and so test fixtures can construct + monkey-patch the
env var before any I/O happens.
Source code in src/chamber/partners/frozen_mappo.py
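Both frozen partners validate the spec eagerly but defer checkpoint I/O until the first reset() or act(). The pattern can be sketched without torch. The following are assumptions for illustration only:
- the class name ToyLazyFrozenPartner;
- the injected loader callable;
- the list-of-rows weights, standing in for a frozen torch actor.

```python
from collections.abc import Mapping
from typing import Callable, Optional

Weights = list[list[float]]  # stand-in for a torch state-dict in this sketch


class ToyLazyFrozenPartner:
    """Hypothetical sketch: validate spec eagerly, load weights lazily."""

    def __init__(self, extra: Mapping[str, str], weights_uri: Optional[str],
                 loader: Callable[[str], Weights]) -> None:
        if weights_uri is None:
            raise ValueError("weights_uri is required")
        uid = extra.get("uid")
        if not uid:
            raise ValueError('extra["uid"] is missing or empty')
        self._uid, self._uri, self._loader = uid, weights_uri, loader
        self._weights: Optional[Weights] = None  # no disk I/O yet

    def _ensure_loaded(self) -> Weights:
        if self._weights is None:
            self._weights = self._loader(self._uri)  # first reset()/act() only
        return self._weights

    def reset(self, *, seed: Optional[int] = None) -> None:
        self._ensure_loaded()  # seed ignored: output depends on weights + obs

    def act(self, obs: Mapping[str, object], *,
            deterministic: bool = True) -> list[float]:
        w = self._ensure_loaded()
        try:
            state = obs["agent"][self._uid]["state"]  # type: ignore[index]
        except (KeyError, TypeError) as exc:
            raise ValueError("obs lacks agent state for this uid") from exc
        if len(state) != len(w[0]):
            raise ValueError("state length disagrees with actor input dim")
        # Linear "actor": one output per weight row (output dim from weights).
        return [sum(wi * si for wi, si in zip(row, state)) for row in w]
```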
__init__
__init__(spec: PartnerSpec) -> None
Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).
Parameters:
- spec (PartnerSpec) – Identity-bearing handle. Reads spec.extra["uid"] for the env-side agent uid and spec.weights_uri for the checkpoint URI. Construction does not touch disk; the checkpoint is loaded lazily on the first action / reset.
Raises:
- ValueError – If spec.weights_uri is None or spec.extra["uid"] is missing / empty.
Source code in src/chamber/partners/frozen_mappo.py
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Greedy MAPPO-actor forward pass under no-grad (ADR-009 §Decision).
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation mapping. Reads the partner's own flat state from obs["agent"][uid]["state"] (plan/04 §3.4); other keys are ignored.
- deterministic (bool, default: True) – Accepted for Protocol conformance. The Phase-0 actor head is a deterministic linear layer with no sampled noise, so this kwarg has no effect.
Returns:
- NDArray[floating] – np.float32 action vector of length action_dim (the head's output dimension, inferred from the loaded checkpoint).
Raises:
- ValueError – If obs["agent"][uid]["state"] is missing or its length disagrees with the loaded actor's input dimension.
Source code in src/chamber/partners/frozen_mappo.py
reset
reset(*, seed: int | None = None) -> None
Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).
Parameters:
- seed (int | None, default: None) – Accepted for :class:~chamber.partners.api.FrozenPartner Protocol conformance. The partner is fully deterministic given the loaded weights and obs, so seed is ignored.
Source code in src/chamber/partners/frozen_mappo.py
FrozenPartner
Bases: Protocol
A partner the ego agent treats as a black box (ADR-009 §Decision).
Implementations MUST NOT expose any update / train / learn methods at
runtime. The base class :class:chamber.partners.interface.PartnerBase
enforces this via a __getattr__ shield over _FORBIDDEN_ATTRS.
Type checkers reject Protocol-typed access to train / learn /
update_params because those names are not part of this Protocol.
Stateless across episodes: reset(*, seed) clears every internal
cache (plan/04 §2 "Partner state" — Phase-0 deliberately excludes
recurrent partner policies; if Phase 1 needs them, ADR amendment).
Source code in src/chamber/partners/api.py
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Map observation to action (ADR-009 §Decision; ADR-003 §Decision).
The observation mapping is the Gymnasium-conformant
{"agent": {uid: proprio}, "comm": CommPacket, "meta": {...}}
produced by :mod:chamber.envs wrappers. Partners read
obs["agent"][uid] for proprioception and obs["comm"] for the
ADR-003 fixed-format channel; they MUST NOT touch any other key.
The accepted type is :class:~collections.abc.Mapping rather than
:class:dict so that callers can pass concrete nested dicts without
running into Mapping-vs-dict invariance issues at the call site.
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation dict.
- deterministic (bool, default: True) – Whether to act greedily (the default, True); stochastic policies pass False to sample. Scripted partners ignore it.
Returns:
- NDArray[floating] – Action vector for the partner's own uid, with shape and dtype defined by the env's action-space slice for that uid.
Source code in src/chamber/partners/api.py
reset
reset(*, seed: int | None = None) -> None
Reset internal episode state (ADR-009 §Decision; P6 reproducibility).
Parameters:
- seed (int | None, default: None) – Optional root seed for the partner's deterministic RNG substream. Implementations route this through :func:concerto.training.seeding.derive_substream.
Source code in src/chamber/partners/api.py
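The two-method Protocol surface can be sketched with typing.Protocol. ToyFrozenPartner and HoldStillPartner are hypothetical. The @runtime_checkable decorator is an addition of this sketch, because the docs do not say whether the real Protocol is runtime-checkable.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Optional, Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class ToyFrozenPartner(Protocol):
    """Sketch of the two-method FrozenPartner surface."""

    def reset(self, *, seed: Optional[int] = None) -> None: ...

    def act(self, obs: Mapping[str, object], *,
            deterministic: bool = True) -> NDArray[np.floating]: ...


class HoldStillPartner:
    """Hypothetical partner: always outputs a zero action of fixed length."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def reset(self, *, seed: Optional[int] = None) -> None:
        pass  # stateless across episodes, so nothing to clear

    def act(self, obs: Mapping[str, object], *,
            deterministic: bool = True) -> NDArray[np.floating]:
        return np.zeros(self.action_dim, dtype=np.float32)
```

An isinstance check against a runtime-checkable Protocol only verifies that the methods exist, not their signatures. The stricter guarantee, such as rejecting access to train on a Protocol-typed value, comes from a static type checker.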
OpenVLAPartner
Bases: PartnerBase
OpenVLA LoRA-adapted specialist — Phase-1 stub (ADR-010 §Decision Option B).
LoRA fine-tune protocol (ADR-010 §Decision; ADR-009 §Decision):
10-150 demos, single GPU at ≤7.0 GB VRAM, 256-bin-per-DoF action
tokenisation. The 5-15 Hz throughput ceiling is enforced at the env
wrapper layer (chamber.envs.action_repeat), not here.
Source code in src/chamber/partners/stubs/openvla.py
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Phase-1 stub — raises (ADR-010 §Decision Option B).
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation mapping.
- deterministic (bool, default: True) – Accepted for Protocol conformance only.
Raises:
- NotImplementedError – Always; the real 256-bin action-token inference is Phase-1 work.
Source code in src/chamber/partners/stubs/openvla.py
reset
reset(*, seed: int | None = None) -> NoReturn
Phase-1 stub — raises (ADR-010 §Decision Option B).
Parameters:
- seed (int | None, default: None) – Accepted for Protocol conformance only.
Raises:
- NotImplementedError – Always; the real LoRA inference harness is Phase-1 work.
Source code in src/chamber/partners/stubs/openvla.py
PartnerBase
Common base class for every concrete partner (ADR-009 §Consequences).
Subclasses MUST implement reset(*, seed) and
act(obs, *, deterministic) per the
:class:~chamber.partners.api.FrozenPartner Protocol. The base class
blocks any attribute lookup in :data:_FORBIDDEN_ATTRS so the AHT
no-joint-training constraint is enforced even when a careless caller
imports the underlying torch module directly.
Attributes:
- spec (PartnerSpec) – The :class:~chamber.partners.api.PartnerSpec the partner was constructed from. Read-only by convention; the registry assumes Class(spec) is the only constructor signature.
Source code in src/chamber/partners/interface.py
__getattr__
__getattr__(name: str) -> NoReturn
Block forbidden attribute lookups (ADR-009 §Consequences).
__getattr__ is only invoked for names not found via the normal
attribute lookup path, so this method does not interfere with
legitimate methods defined on subclasses or with self.spec.
Parameters:
- name (str) – Attribute name being looked up.
Raises:
- AttributeError – Always: either with the ADR-009 message for forbidden names, or with Python's default message for anything else (so hasattr works as expected).
Source code in src/chamber/partners/interface.py
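The shield can be sketched as follows. ToyPartnerBase is a hypothetical name. The three-name _FORBIDDEN_ATTRS set is an assumption taken from the train / learn / update_params names mentioned under FrozenPartner; the real set may contain more names.

```python
from typing import NoReturn

# Hypothetical subset; the real _FORBIDDEN_ATTRS may list more names.
_FORBIDDEN_ATTRS = frozenset({"train", "learn", "update_params"})


class ToyPartnerBase:
    def __init__(self, spec: object) -> None:
        self.spec = spec  # found by normal lookup, so __getattr__ never sees it

    def __getattr__(self, name: str) -> NoReturn:
        # Only reached after normal attribute lookup has already failed.
        if name in _FORBIDDEN_ATTRS:
            raise AttributeError(
                f"{type(self).__name__}.{name} is blocked: frozen partners "
                "must not be trained (ADR-009 black-box AHT constraint)"
            )
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )
```

Because __getattr__ only runs after normal lookup fails, this sketch would not block a subclass that defines train itself.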
__init__
__init__(spec: PartnerSpec) -> None
Bind the partner spec (ADR-009 §Decision).
Parameters:
- spec (PartnerSpec) – Identity-bearing handle from :class:~chamber.partners.api.PartnerSpec. The constructor does no checkpoint I/O, so subclasses can defer heavy loads until :meth:reset.
Source code in src/chamber/partners/interface.py
PartnerSpec
dataclass
Identity-bearing spec used to build, load, and address a partner (ADR-009 §Decision).
The :attr:partner_id property is the stable hash that the M3 conformal
filter reads on every step (ADR-004 §risk-mitigation #2; ADR-006 risk #3).
Two specs that hash the same partner_id refer to the same logical partner;
a partner-set change between steps re-initialises the conformal lambda.
Attributes:
- class_name (str) – Registry key: the string passed to :func:chamber.partners.registry.register_partner.
- seed (int) – Training-time seed; use 0 for scripted partners.
- checkpoint_step (int | None) – Training step at which the checkpoint was taken; None for scripted partners or stubs.
- weights_uri (str | None) – local:// or remote URI pointing at the frozen checkpoint; None for scripted partners.
- extra (dict[str, str]) – Free-form string-to-string dict for task / embodiment / uid metadata. Read by the partner implementation, NOT by the hash.
Source code in src/chamber/partners/api.py
partner_id
property
partner_id: str
Stable 16-hex-char hash of (class_name, seed, checkpoint_step, weights_uri).
ADR-004 §risk-mitigation #2 / ADR-006 risk #3: surfaced on
obs["meta"]["partner_id"] so the conformal filter can re-initialise
lambda on partner identity change. The 16-char (64-bit) keyspace
is sufficient for the Phase-1 zoo target of 144 partners with vast
margin (plan/04 §8 Notes).
The hash material uses :func:repr of a tuple of the four identity
fields (in this fixed order) rather than f-string interpolation so
None cannot collide with the literal string "None" — a real
risk if a Phase-1 partner ever ships with weights_uri="None"
(ADR-006 risk #3 swap-detection contract). Reordering the fields
in :class:PartnerSpec would silently change every existing
partner_id; do not reorder without a new ADR.
Returns:
- str – Lowercase 16-character hex digest of the SHA-256 of repr((class_name, seed, checkpoint_step, weights_uri)). extra is intentionally excluded so per-task / per-embodiment metadata does not collide with the conformal filter's swap-detection contract (plan/04 §3.1).
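The hash recipe can be reproduced in a few lines. toy_partner_id is a hypothetical helper. Encoding the repr string as UTF-8 before hashing is an assumption, because the docs do not state the encoding.

```python
import hashlib
from typing import Optional


def toy_partner_id(class_name: str, seed: int,
                   checkpoint_step: Optional[int],
                   weights_uri: Optional[str]) -> str:
    # repr() of the fixed-order tuple keeps None distinct from the string "None".
    material = repr((class_name, seed, checkpoint_step, weights_uri))
    # UTF-8 encoding is an assumption of this sketch.
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```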
ScriptedHeuristicPartner
Bases: PartnerBase
Greedy planar-reach heuristic partner (ADR-009 §Consequences; plan/04 §3.4).
Reads the partner's own xy from obs["comm"]["pose"][uid]["xyz"] (ADR-003
§Decision) when available, else from obs["agent"][uid]["state"][:2].
Outputs a unit-clipped xy velocity toward the configured target xy.
The partner's own uid is read from spec.extra["uid"] and the target
from spec.extra["target_xy"] (default "0.0,0.0"). Action dim is
set via spec.extra["action_dim"] (default "2"); components beyond
the planar xy are zeroed.
Source code in src/chamber/partners/heuristic.py
__init__
__init__(spec: PartnerSpec) -> None
Bind the spec and parse the heuristic's tunables (ADR-009 §Decision).
Parameters:
- spec (PartnerSpec) – Partner identity. spec.extra controls the policy: uid selects the env-side agent, target_xy the goal, and action_dim the output vector length.
Raises:
- ValueError – If spec.extra["target_xy"] is malformed or action_dim is not an integer ≥ 2.
Source code in src/chamber/partners/heuristic.py
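The tunables above arrive as strings in spec.extra, so __init__ has to parse and validate them. A minimal sketch of that step follows. parse_heuristic_extra is a hypothetical helper, and letting a missing uid surface as a KeyError is an assumption rather than documented behavior.

```python
from __future__ import annotations

from collections.abc import Mapping


def parse_heuristic_extra(
    extra: Mapping[str, str],
) -> tuple[str, tuple[float, float], int]:
    """Hypothetical parser for the uid / target_xy / action_dim tunables."""
    uid = extra["uid"]  # env-side agent this partner controls
    raw_target = extra.get("target_xy", "0.0,0.0")
    parts = raw_target.split(",")
    if len(parts) != 2:
        raise ValueError(f"target_xy must be 'x,y', got {raw_target!r}")
    try:
        target = (float(parts[0]), float(parts[1]))
    except ValueError as exc:
        raise ValueError(f"target_xy must hold two floats, got {raw_target!r}") from exc
    raw_dim = extra.get("action_dim", "2")
    try:
        action_dim = int(raw_dim)
    except ValueError as exc:
        raise ValueError(f"action_dim must be an integer, got {raw_dim!r}") from exc
    if action_dim < 2:
        # The planar xy command needs at least two output components.
        raise ValueError(f"action_dim must be >= 2, got {action_dim}")
    return uid, target, action_dim
```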
act
act(
obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]
Greedy planar-reach action toward spec.extra["target_xy"] (ADR-009 §Consequences).
Parameters:
- obs (Mapping[str, object]) – Gymnasium observation mapping keyed by "agent" and optionally "comm". ADR-003 §Decision: when "comm" is present, pose is read from obs["comm"]["pose"][uid].
- deterministic (bool, default: True) – Ignored; the heuristic is deterministic regardless.
Returns:
- NDArray[floating] – Float32 action vector of length spec.extra["action_dim"]; components 0/1 hold the unit-clipped xy velocity toward the target, and the remaining components are zero.
Source code in src/chamber/partners/heuristic.py
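A sketch of one greedy planar-reach step as act() describes it. heuristic_action is hypothetical, and it assumes "unit-clipped" means the xy vector is rescaled only when its norm exceeds 1; the docstring does not say whether shorter vectors are normalised up.

```python
from __future__ import annotations

from collections.abc import Mapping

import numpy as np


def heuristic_action(
    obs: Mapping[str, object],
    uid: str,
    target_xy: tuple[float, float],
    action_dim: int = 2,
) -> np.ndarray:
    """Hypothetical greedy planar-reach step mirroring the docstring."""
    comm = obs.get("comm")
    if isinstance(comm, Mapping) and uid in comm.get("pose", {}):
        # ADR-003 path: own pose arrives over the comm channel.
        xy = np.asarray(comm["pose"][uid]["xyz"], dtype=np.float64)[:2]
    else:
        # Fallback: first two entries of the agent's state vector.
        xy = np.asarray(obs["agent"][uid]["state"], dtype=np.float64)[:2]
    direction = np.asarray(target_xy, dtype=np.float64) - xy
    norm = float(np.linalg.norm(direction))
    # Assumption: "unit-clipped" means rescale only when the norm exceeds 1.
    if norm > 1.0:
        direction = direction / norm
    action = np.zeros(action_dim, dtype=np.float32)
    action[:2] = direction  # components beyond planar xy stay zero
    return action
```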
reset
reset(*, seed: int | None = None) -> None
No-op reset (ADR-009 §Decision; plan/04 §2 stateless across episodes).
Phase-0 heuristic carries no episode state. The seed is ignored
deliberately; the policy is fully deterministic from obs.
Parameters:
- seed (int | None, default: None) – Accepted for Protocol conformance only.
Source code in src/chamber/partners/heuristic.py
list_registered
list_registered() -> list[str]
Return the sorted list of registered partner class names (ADR-009 §Decision).
Used by the eval CLI to enumerate the available zoo strata and by tests to assert the Phase-0 draft zoo classes are wired up.
Returns:
- list[str] – A new list, freshly sorted on every call.
Source code in src/chamber/partners/registry.py
load_partner
load_partner(spec: PartnerSpec) -> FrozenPartner
Build a partner instance from its spec via the registry (ADR-009 §Decision).
Parameters:
- spec (PartnerSpec) – The :class:~chamber.partners.api.PartnerSpec whose :attr:~chamber.partners.api.PartnerSpec.class_name selects the registered class.
Returns:
- FrozenPartner – A fresh :class:~chamber.partners.api.FrozenPartner instance built by Class(spec).
Raises:
- KeyError – If spec.class_name is not registered. The error message lists the currently-registered keys to make typos easy to fix.
Source code in src/chamber/partners/registry.py
make_phase0_draft_zoo
make_phase0_draft_zoo() -> list[PartnerSpec]
Return the 3-partner Stage-3 draft zoo (ADR-009 §Consequences "draft-zoo scoping").
The draft zoo is the minimum viable zoo that lets the ADR-007 rev-3
Stage-3 PF spike measure the ≥20pp trained-with vs frozen-novel gap and
instrument the partner-swap λ-reset transient (ADR-006 risk #3 / ADR-004
§risk-mitigation #2). All three specs target the Stage-0 smoke task so
they share an env shape; the per-partner uid matches the ADR-001
smoke robot tuple (panda_wristcam / fetch / allegro_hand_right).
The frozen-RL checkpoint URIs (local://artifacts/...) are produced
by M4b training runs. Both
:class:~chamber.partners.frozen_mappo.FrozenMAPPOPartner (T4.5,
registered as frozen_mappo) and
:class:~chamber.partners.frozen_harl.FrozenHARLPartner (T4.6,
registered as frozen_harl) load their specs via
:func:chamber.partners.registry.load_partner. The full M4 gate
(plan/04 §6 #1, T4.9 draft-zoo round-trip integration test) lands in
a follow-up PR.
Returns:
- list[PartnerSpec] – A fresh list of 3 :class:~chamber.partners.api.PartnerSpec instances. Callers may mutate the returned list freely; the function rebuilds it on every call, so the canonical zoo cannot be altered through a returned copy.
Source code in src/chamber/partners/selection.py
register_partner
register_partner(
class_name: str,
) -> Callable[[type[T]], type[T]]
Register a partner class by name (ADR-009 §Decision; plan/04 §3.2).
Used as a class decorator:
.. code-block:: python

    @register_partner("scripted_heuristic")
    class ScriptedHeuristicPartner(PartnerBase): ...
Parameters:
- class_name (str) – Registry key. Must be unique process-wide; re-registering the same key raises :class:ValueError so name collisions are caught at import time.
Returns:
- Callable[[type[T]], type[T]] – A decorator that records the class against class_name and returns the class unchanged.
Raises:
- ValueError – If class_name is already registered.
Source code in src/chamber/partners/registry.py
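The registry behind register_partner, load_partner, and list_registered can be sketched with a module-level dict. This is an illustrative reconstruction from the docstrings only: _REGISTRY and the exact error wording are hypothetical, and the real chamber.partners.registry may store or report things differently.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any, TypeVar

T = TypeVar("T")

# Hypothetical process-wide registry: class_name -> partner class.
_REGISTRY: dict[str, type] = {}


def register_partner(class_name: str) -> Callable[[type[T]], type[T]]:
    def decorator(cls: type[T]) -> type[T]:
        if class_name in _REGISTRY:
            # Fail at import time instead of silently shadowing a class.
            raise ValueError(f"partner {class_name!r} is already registered")
        _REGISTRY[class_name] = cls
        return cls  # returned unchanged, so the decorator is transparent
    return decorator


def list_registered() -> list[str]:
    return sorted(_REGISTRY)  # a new sorted list on every call


def load_partner(spec: Any) -> Any:
    try:
        cls = _REGISTRY[spec.class_name]
    except KeyError:
        # List the known keys so a typo is easy to spot.
        raise KeyError(
            f"unknown partner {spec.class_name!r}; registered: {list_registered()}"
        ) from None
    return cls(spec)  # fresh instance per call, built as Class(spec)
```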
select_zoo
select_zoo(*args: object, **kwargs: object) -> NoReturn
FCP/MEP-style zoo construction — Phase-1 stub (ADR-009 §Decision).
The Phase-1 implementation is the three-axis pipeline (policy class x seed x checkpoint) with MEP Population Entropy as the per-stratum admission filter (ADR-009 §Validation criteria "By Phase-1 end").
Parameters:
- *args (object, default: ()) – Reserved for the Phase-1 signature.
- **kwargs (object, default: {}) – Reserved for the Phase-1 signature.
Raises:
- NotImplementedError – Always. Phase-0 callers must use :func:make_phase0_draft_zoo instead.
Source code in src/chamber/partners/selection.py
chamber.evaluation
CHAMBER evaluation harness — HRS bundle, leaderboard, safety reports.
Implements the HRS bundle composition (Option D, axis-survival rule) from ADR-008 and consumes the three-table safety reporting format from ADR-014. Statistical tooling (cluster + paired-cluster bootstrap) and the pre-registration discipline (ADR-007 §Discipline) live in sub-modules of this package.
Public modules:
- :mod:chamber.evaluation.results – Pydantic schema for spike runs and leaderboard entries (ADR-008 §Decision).
- :mod:chamber.evaluation.prereg – YAML loader + git-tag SHA verification (ADR-007 §Discipline).
- :mod:chamber.evaluation.bootstrap – cluster + paired-cluster bootstrap with rliable-compatible aggregate metrics (ADR-008 §Decision; reviewer P1-9).
- :mod:chamber.evaluation.hrs – HRS vector + scalar (ADR-008 §Decision; reviewer P1-8).
- :mod:chamber.evaluation.render – three-table safety report + leaderboard renderers (ADR-014 §Decision, ADR-008 §Decision).
BootstrapCI
dataclass
Bootstrap point estimate + 95% CI (ADR-008 §Decision).
Attributes:
- iqm (float) – Interquartile mean of the resampled point estimates.
- mean (float) – Plain arithmetic mean of the resampled point estimates.
- ci_low (float) – 2.5th percentile of the resampled estimates (lower bound of the 95% CI).
- ci_high (float) – 97.5th percentile of the resampled estimates (upper bound of the 95% CI).
- n_resamples (int) – Number of bootstrap resamples that produced the interval.
Source code in src/chamber/evaluation/bootstrap.py
ConditionPair
Bases: BaseModel
Pre-registered homogeneous-vs-heterogeneous baseline pair (ADR-007 §Validation criteria).
Each Phase-0 spike runs at least one homogeneous baseline (the "control" side of the ≥20pp test) and at least one heterogeneous condition (the "treatment" side). Identifiers are free-form but must match the labels in the pre-registration YAML.
Attributes:
- homogeneous_id (str) – Identifier of the homogeneous baseline.
- heterogeneous_id (str) – Identifier of the heterogeneous condition.
Source code in src/chamber/evaluation/results.py
ConditionResult
Bases: BaseModel
Aggregated result for one ADR-007 axis (ADR-008 §Decision).
Carries the homogeneous-vs-heterogeneous success-rate gap and the
safety signals that feed the HRS-vector entry for this axis.
gap_pp is the ≥20pp gap statistic from ADR-007 §Validation
criteria; ci_low_pp / ci_high_pp are the cluster-bootstrap
95% bounds from :mod:chamber.evaluation.bootstrap.
Attributes:
- axis (str) – ADR-007 axis name.
- n_episodes (NonNegativeInt) – Total episodes contributing to this row.
- homogeneous_success (NonNegativeFloat) – IQM success rate on the homogeneous side.
- heterogeneous_success (NonNegativeFloat) – IQM success rate on the heterogeneous side.
- gap_pp (float) – (homogeneous_success - heterogeneous_success) * 100, the percentage-point gap (ADR-007 §Validation criteria).
- ci_low_pp (float) – Lower bound of the 95% cluster-bootstrap CI on gap_pp.
- ci_high_pp (float) – Upper bound of the 95% cluster-bootstrap CI on gap_pp.
- violation_rate (NonNegativeFloat) – Per-step CBF-constraint violation rate across episodes (ADR-014 §Decision Table 2).
- fallback_rate (NonNegativeFloat) – Fraction of episodes in which the braking fallback fired at least once (reported separately from violation_rate per reviewer P0).
Source code in src/chamber/evaluation/results.py
EpisodeResult
Bases: BaseModel
A single evaluation episode of a CHAMBER spike (ADR-007 §Validation criteria).
The fields cover the three observable quantities the project
reports per ADR-014 §Decision Table 2: episode success, peak
constraint-violation magnitude, and fallback-fire count. The
force_peak slot is the SA-axis specific signal used in
ADR-014's Table 1 / Table 2 vendor-compliance rows when ADR-007
Stage-3 SA decomposes the safety axis.
Attributes:
- seed (int) – Root seed for this episode (ADR-002 P6 determinism).
- episode_idx (NonNegativeInt) – Zero-based episode index within the (seed, condition) cell.
- initial_state_seed (int) – Sub-stream seed for the initial state reset, used to pair homogeneous-vs-heterogeneous episodes in :func:chamber.evaluation.bootstrap.pacluster_bootstrap (ADR-007 §Validation criteria; reviewer P1-9).
- success (bool) – True iff the task succeeded per the env's terminal reward.
- constraint_violation_peak (float) – Worst-case CBF-constraint gap over the episode (ADR-014 §Decision Table 2 "constraint violation" column).
- fallback_fired (NonNegativeInt) – Number of braking-fallback fires during the episode (ADR-014 §Decision Table 2 "fallback fired" column; separate from violations per reviewer P0).
- force_peak (float | None) – Peak contact force in newtons; None outside the SA spike (ADR-007 Stage 3).
- metadata (dict[str, Any]) – Free-form per-episode metadata.
Source code in src/chamber/evaluation/results.py
HRSVector
Bases: BaseModel
Per-axis HRS vector emitted alongside the scalar (ADR-008 §Decision).
Reviewer P1-8 requires the vector to be emitted unconditionally so consumers can recompute the scalar under a different weighting; the leaderboard renderer enforces this by refusing entries that carry only the scalar.
Attributes:
- entries (list[HRSVectorEntry]) – Per-axis entries in the ADR-008 default order (CM > PF > CR > SA > OM > AS), subject to axis survival.
Source code in src/chamber/evaluation/results.py
HRSVectorEntry
Bases: BaseModel
One axis row of the HRS vector (ADR-008 §Decision).
ADR-008 binds the HRS bundle to the surviving ADR-007 axes and
fixes the default ordering CM > PF > CR > SA > OM > AS. Each
entry carries the per-axis score plus the gap-PP that feeds
:func:chamber.evaluation.hrs.compute_hrs_scalar.
Attributes:
- axis (str) – ADR-007 axis name.
- score (NonNegativeFloat) – Normalised per-axis score in [0, 1] (success rate on the heterogeneous condition, by default).
- gap_pp (float) – Percentage-point gap from the matching :class:ConditionResult (kept for traceability).
- weight (NonNegativeFloat) – Weight assigned to this axis in the scalar aggregation (defaults to ADR-008 Option D ordering).
Source code in src/chamber/evaluation/results.py
LeaderboardEntry
Bases: BaseModel
A single method/result pair as it appears on the CHAMBER leaderboard (ADR-008 §Decision).
Ranks by hrs_scalar. The accompanying :class:HRSVector is
emitted unconditionally per reviewer P1-8; the renderer in
:mod:chamber.evaluation.render displays the vector as a sub-row
below the scalar.
Attributes:
- method_id (str) – Identifier of the method being evaluated (e.g. "concerto-ego-aht-v0.1").
- spike_runs (list[str]) – Spike-run IDs that contributed to this entry.
- hrs_vector (HRSVector) – Per-axis HRS values (always emitted).
- hrs_scalar (NonNegativeFloat) – Aggregated HRS scalar (ADR-008 §Decision; the ranking metric).
- violation_rate (NonNegativeFloat) – Aggregate per-step violation rate across the contributing spike runs.
- fallback_rate (NonNegativeFloat) – Aggregate fraction of episodes triggering the braking fallback (ADR-014 §Decision).
- schema_version (int) – Result-archive schema version (default :data:SCHEMA_VERSION).
Source code in src/chamber/evaluation/results.py
PairedEpisode
dataclass
One paired homogeneous-vs-heterogeneous episode (ADR-007 §Validation criteria).
Pairs are matched on (seed, episode_idx, initial_state_seed)
so the gap test in :func:pacluster_bootstrap is computed on
matched pairs rather than on pooled means (reviewer P1-9).
Attributes:
- seed (int) – Shared root seed.
- episode_idx (int) – Shared episode index within the seed.
- initial_state_seed (int) – Shared env-reset sub-stream seed.
- homogeneous (float) – Per-episode metric on the homogeneous side.
- heterogeneous (float) – Per-episode metric on the heterogeneous side.
Source code in src/chamber/evaluation/bootstrap.py
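Pairing on (seed, episode_idx, initial_state_seed) is essentially a join on that triple. The sketch below uses a local PairedEpisode stand-in with the documented fields and a hypothetical pair_episodes helper. Dropping episodes that exist on only one side is an assumption; the real pipeline may raise instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PairedEpisode:  # stand-in mirroring the documented fields
    seed: int
    episode_idx: int
    initial_state_seed: int
    homogeneous: float
    heterogeneous: float


def pair_episodes(
    homogeneous: dict[tuple[int, int, int], float],
    heterogeneous: dict[tuple[int, int, int], float],
) -> list[PairedEpisode]:
    """Hypothetical helper: join both sides on the shared match key."""
    # Keys are (seed, episode_idx, initial_state_seed); unmatched keys drop out.
    shared = sorted(homogeneous.keys() & heterogeneous.keys())
    return [
        PairedEpisode(s, e, i, homogeneous[(s, e, i)], heterogeneous[(s, e, i)])
        for (s, e, i) in shared
    ]
```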
PreregistrationError
Bases: RuntimeError
Raised when a pre-registration fails the ADR-007 §Discipline check.
Source code in src/chamber/evaluation/prereg.py
PreregistrationSpec
Bases: BaseModel
Schema for an ADR-007 pre-registration YAML (ADR-007 §Discipline).
Required fields mirror the operational contract in
docs/reference/evaluation.md §3: the homogeneous-vs-
heterogeneous pair, the seed list, the episodes-per-seed budget,
the estimator, and the bootstrap method. The default bootstrap is
cluster per reviewer P1-9 — pooled IID bootstrap on
seed-clustered data understates the CI width.
Attributes:
- axis (str) – One of the ADR-007 §3.4 axes ("CR" / "AS" / "OM" / "CM" / "PF" / "SA").
- condition_pair (ConditionPair) – Homogeneous-vs-heterogeneous baseline pair.
- seeds (list[int]) – List of root seeds for ADR-002 P6 determinism.
- episodes_per_seed (PositiveInt) – Episode budget per (seed, condition) cell.
- estimator (str) – Identifier of the headline estimator (e.g. "iqm_success_rate").
- bootstrap_method (BootstrapMethod) – Default "cluster" (reviewer P1-9); "hierarchical" is an alias accepted for back-compat; "iid" is allowed only in non-leaderboard runs such as power calculations (see run_purpose; reviewer P1-5, enforced by :meth:_check_bootstrap_policy).
- failure_policy (FailurePolicy) – "strict" (the spike fails if any seed errors) or "best_effort" (errored seeds are dropped and reported).
- git_tag (str) – Pre-registration git tag this YAML is locked to (ADR-007 §Discipline; verified by :func:verify_git_tag).
- notes (str) – Free-form notes carried into the leaderboard entry.
- run_purpose (RunPurpose) – "leaderboard" (default), "power", or "debug" (reviewer P1-5). Only "leaderboard" runs are admitted to the public ranking; the other two values relax the iid-bootstrap ban so power calculations and local debug runs can still use a pooled iid baseline.
Source code in src/chamber/evaluation/prereg.py
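The axis and bootstrap rules described for PreregistrationSpec can be sketched as a plain-dict check. check_prereg_policy is hypothetical and stands in for the Pydantic validators (including _check_bootstrap_policy); returning the effective method after resolving the "hierarchical" alias is an illustrative choice, and the example payload values are made up.

```python
from __future__ import annotations

from typing import Any

# ADR-007 §3.4 shortlist, as listed in the axis attribute above.
ADR007_AXES = frozenset({"CR", "AS", "OM", "CM", "PF", "SA"})


def check_prereg_policy(doc: dict[str, Any]) -> str:
    """Hypothetical stand-in for the schema's axis and bootstrap checks.

    Returns the effective bootstrap method; raises ValueError on a violation.
    """
    if doc["axis"] not in ADR007_AXES:
        raise ValueError(f"axis {doc['axis']!r} is not on the ADR-007 shortlist")
    method = doc.get("bootstrap_method", "cluster")
    if method == "hierarchical":
        method = "cluster"  # back-compat alias
    purpose = doc.get("run_purpose", "leaderboard")
    if method == "iid" and purpose == "leaderboard":
        # Pooled iid resampling understates CI width on seed-clustered data,
        # so it is banned from anything that reaches the public ranking.
        raise ValueError("iid bootstrap is only allowed for power/debug runs")
    return method


# Hypothetical example payload using the documented field names.
example = {
    "axis": "CM",
    "condition_pair": {"homogeneous_id": "homo_baseline", "heterogeneous_id": "hetero_cond"},
    "seeds": [0, 1, 2, 3, 4],
    "episodes_per_seed": 20,
    "estimator": "iqm_success_rate",
    "failure_policy": "strict",
    "git_tag": "prereg-example",
    "run_purpose": "leaderboard",
}
assert check_prereg_policy(example) == "cluster"
```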
normalised_axis
normalised_axis() -> str
Return the validated ADR-007 axis label (ADR-007 §Decision).
Raises:
- ValueError – When :attr:axis is not on the ADR-007 §3.4 Option D shortlist.
Source code in src/chamber/evaluation/prereg.py
SlackAggregate
dataclass
Per-condition OSCBF slack aggregate (ADR-014 §Revision history 2026-05-16).
Carries the two scalar columns ADR-014 Table 2 v2 adds to
:class:concerto.safety.reporting.ConditionRow. The values are
aggregated by :func:aggregate_oscbf_slack over the per-step
:class:concerto.safety.oscbf.OSCBFResult stream within a single
ADR-014 Table 2 condition.
Attributes:
- max_slack (float) – Maximum of OSCBFResult.max_slack over the steps k in the condition, i.e. the worst single-step OSCBF relaxation observed across the condition. Non-negative by construction (the source signal is non-negative).
- slack_l2 (float) – Mean of OSCBFResult.slack_l2 over the steps k in the condition, i.e. the mean L2 norm of the per-step slack vector across the condition. Non-negative by construction.
Source code in src/chamber/evaluation/oscbf_aggregate.py
SpikeRun
Bases: BaseModel
Full record of one ADR-007 staged spike (ADR-007 §Discipline; ADR-016 §Decision).
Carries the pre-registration provenance (prereg_sha /
git_tag) so the leaderboard renderer can refuse entries that
do not match the locked YAML. The episode_results list is the
raw observation feed for
:mod:chamber.evaluation.bootstrap and
:mod:chamber.evaluation.hrs.
Attributes:
- spike_id (str) – Free-form identifier (e.g. "stage2_cm_urllc").
- prereg_sha (str) – Git-blob SHA of the locked pre-registration YAML (ADR-007 §Discipline; verified by :func:chamber.evaluation.prereg.verify_git_tag).
- git_tag (str) – Git tag the pre-registration was locked to (ADR-007 §Discipline).
- axis (str) – Name of the ADR-007 heterogeneity axis under test (one of "CR", "AS", "OM", "CM", "PF", "SA").
- sub_stage (SubStage) – ADR-007 §Implementation-staging sub-stage label (one of "1a" / "1b" / "2" / "3"). Required, with no default; see ADR-016 §Decision and §Rationale for why the silent default-by-axis fallback the v1 schema relied on was retired.
- condition_pair (ConditionPair) – Homogeneous-vs-heterogeneous pair.
- seeds (list[int]) – List of root seeds used in this spike (ADR-002 P6).
- episode_results (list[EpisodeResult]) – Flat list of per-episode results across all seeds and both pair sides.
- schema_version (int) – Result-archive schema version (default :data:SCHEMA_VERSION; currently 2 per ADR-016 §Decision).
Source code in src/chamber/evaluation/results.py
aggregate_metrics
aggregate_metrics(
values: Mapping[int, Sequence[float]],
*,
metrics: Sequence[str] = (
"iqm",
"optimality_gap",
"performance_profile",
),
threshold: float = 1.0,
) -> dict[str, float | None]
rliable-compatible aggregate metrics (ADR-008 §Decision; ADR-014 §Decision).
If the optional rliable package is installed, delegates to
its library.get_interval_estimates / aggregate_func
helpers. Otherwise, IQM and optimality gap are computed natively
and a runtime warning explains that performance profiles require
the optional extra. The native path is sufficient for unit tests
and the Phase-0 golden-file integration test.
Parameters:
- values (Mapping[int, Sequence[float]]) – {seed: [per-episode metric]} mapping.
- metrics (Sequence[str], default: ('iqm', 'optimality_gap', 'performance_profile')) – Subset of {"iqm", "optimality_gap", "performance_profile"} to compute.
- threshold (float, default: 1.0) – Reference score for the optimality gap (default 1.0 for normalised success rates).
Returns:
- dict[str, float | None] – Dict keyed by metric name with float values; the "performance_profile" slot is None when the optional rliable extra is not installed.
Source code in src/chamber/evaluation/bootstrap.py
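A sketch of the native (no-rliable) path for IQM and optimality gap. Pooling all episodes across seeds before aggregating is an assumption, and native_metrics is a hypothetical name. The IQM cut count matches scipy.stats.trim_mean(x, 0.25), and the optimality gap follows the rliable definition (mean shortfall below the threshold, floored at zero).

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

import numpy as np


def iqm(x: np.ndarray) -> float:
    """Interquartile mean: drop the lowest and highest 25%, then average."""
    s = np.sort(np.asarray(x, dtype=np.float64))
    k = int(0.25 * s.size)  # same cut count as scipy.stats.trim_mean(x, 0.25)
    return float(s[k : s.size - k].mean())


def native_metrics(
    values: Mapping[int, Sequence[float]], threshold: float = 1.0
) -> dict[str, float | None]:
    # Assumption: per-episode scores are pooled across seeds.
    pooled = np.concatenate([np.asarray(v, dtype=np.float64) for v in values.values()])
    return {
        "iqm": iqm(pooled),
        # Mean amount by which scores fall short of the threshold.
        "optimality_gap": float(np.mean(np.maximum(threshold - pooled, 0.0))),
        "performance_profile": None,  # requires the optional rliable extra
    }
```

With {0: [0.0, 1.0], 1: [0.5, 1.0]}, the pooled sorted scores are [0, 0.5, 1, 1], so the IQM keeps [0.5, 1] (0.75) and the optimality gap is (1 + 0 + 0.5 + 0) / 4 = 0.375.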
aggregate_oscbf_slack
aggregate_oscbf_slack(
results: Iterable[OSCBFResult],
) -> SlackAggregate
Aggregate per-step :class:OSCBFResult values into the ADR-014 contract.
Implements the locked aggregation contract from ADR-014 §Revision history 2026-05-16:
- max_slack is the max over steps of OSCBFResult.max_slack.
- slack_l2 is the mean over steps of OSCBFResult.slack_l2.
An empty (or already-exhausted) iterable produces SlackAggregate(0.0, 0.0),
so the result composes with the
:class:concerto.safety.reporting.ConditionRow defaults (the
Stage-1a EGO_ONLY path that does not exercise the OSCBF inner
filter; issue #142).
Parameters:
- results (Iterable[OSCBFResult]) – Iterable of per-step :class:concerto.safety.oscbf.OSCBFResult values from the steps within a single ADR-014 Table 2 condition. Order is irrelevant; the aggregator consumes the iterable exactly once.
Returns:
- SlackAggregate – The :class:SlackAggregate carrying the two ADR-014 v2 ConditionRow columns.
Source code in src/chamber/evaluation/oscbf_aggregate.py
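The aggregation contract (max of max_slack, mean of slack_l2, zeros when there are no steps) reduces to a single pass over the stream. StepSlack and aggregate_slack below are hypothetical stand-ins for OSCBFResult and aggregate_oscbf_slack. The running max starts at 0.0, which is only valid because both signals are documented as non-negative.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSlack:  # hypothetical stand-in for OSCBFResult's slack fields
    max_slack: float
    slack_l2: float


@dataclass(frozen=True)
class SlackAggregate:  # stand-in mirroring the documented fields
    max_slack: float
    slack_l2: float


def aggregate_slack(results: Iterable[StepSlack]) -> SlackAggregate:
    worst = 0.0  # safe start value: slack signals are non-negative
    total_l2 = 0.0
    n = 0
    # Single pass, so generators and other one-shot iterables work.
    for r in results:
        worst = max(worst, r.max_slack)
        total_l2 += r.slack_l2
        n += 1
    if n == 0:
        return SlackAggregate(0.0, 0.0)  # EGO_ONLY path: no OSCBF steps
    return SlackAggregate(max_slack=worst, slack_l2=total_l2 / n)
```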
cluster_bootstrap
cluster_bootstrap(
values: Mapping[int, Sequence[float]],
*,
n_resamples: int,
rng: Generator,
) -> BootstrapCI
Cluster bootstrap over {seed: [per-episode values]} (reviewer P1-9; ADR-008 §Decision).
Resamples seeds with replacement (the cluster level), then
resamples episodes with replacement within each resampled seed.
Returns IQM, plain mean, and the 95% CI on the IQM estimator.
This is the headline aggregator for ADR-014 §Decision Table 2
when the bootstrap method is "cluster".
Parameters:
- values (Mapping[int, Sequence[float]]) – {seed: [per-episode metric]} mapping. Empty seeds are dropped from the resample, but a fully empty input yields a degenerate nan interval.
- n_resamples (int) – Number of bootstrap resamples (e.g. 2000).
- rng (Generator) – Deterministic np.random.Generator (ADR-002 P6).
Returns:
- BootstrapCI – :class:BootstrapCI with the IQM, plain mean, and 95% bounds.
Source code in src/chamber/evaluation/bootstrap.py
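A minimal two-level cluster bootstrap following the description above: resample seeds with replacement, then resample episodes within each resampled seed. The local BootstrapCI mirrors the documented fields. Using the IQM of each pooled resample as the point estimate, and filling iqm/mean from the distribution of those estimates, is one reading of the attribute docs, not the confirmed implementation; cluster_bootstrap_sketch is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BootstrapCI:  # stand-in mirroring the documented fields
    iqm: float
    mean: float
    ci_low: float
    ci_high: float
    n_resamples: int


def _iqm(x: np.ndarray) -> float:
    s = np.sort(x)
    k = int(0.25 * s.size)  # trim 25% from each end
    return float(s[k : s.size - k].mean())


def cluster_bootstrap_sketch(
    values: Mapping[int, Sequence[float]],
    *,
    n_resamples: int,
    rng: np.random.Generator,
) -> BootstrapCI:
    # Empty seeds are dropped; a fully empty input gives a nan interval.
    clusters = [np.asarray(v, dtype=np.float64) for v in values.values() if len(v) > 0]
    if not clusters:
        nan = float("nan")
        return BootstrapCI(nan, nan, nan, nan, n_resamples)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Level 1: resample seeds (the clusters) with replacement.
        picked = rng.integers(len(clusters), size=len(clusters))
        # Level 2: resample episodes with replacement inside each picked seed.
        draws = [rng.choice(clusters[i], size=clusters[i].size, replace=True) for i in picked]
        estimates[b] = _iqm(np.concatenate(draws))
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return BootstrapCI(
        iqm=_iqm(estimates),
        mean=float(estimates.mean()),
        ci_low=float(lo),
        ci_high=float(hi),
        n_resamples=n_resamples,
    )
```

Passing the same seeded np.random.default_rng(...) reproduces the interval exactly, which is the determinism property ADR-002 P6 relies on.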
compute_hrs_scalar
compute_hrs_scalar(
vector: HRSVector,
*,
weights: dict[str, float] | None = None,
) -> float
Aggregate the HRS vector into a single scalar (ADR-008 §Decision).
Defined as sum(w_i * s_i) / sum(w_i) over the entries in
vector — a weighted mean over the surviving axes, with the
ADR-008 Option D ordering as default. Returns 0.0 for an
empty vector (no surviving axes), which makes a method that
failed every ≥20pp test sort to the bottom of the leaderboard.
Parameters:
- vector (HRSVector) – Per-axis HRS values from :func:compute_hrs_vector.
- weights (dict[str, float] | None, default: None) – Override the per-axis weight map; if None, the weights embedded in vector.entries are used (so the scalar matches the vector that ships with it).
Returns:
- float – Scalar in [0, 1].
Source code in src/chamber/evaluation/hrs.py
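The weighted mean sum(w_i * s_i) / sum(w_i) is simple enough to show directly. hrs_scalar is a hypothetical helper over (axis, score, weight) tuples; it omits the weights override, and returning 0.0 for an all-zero weight map is an assumption (only the empty-vector case is documented).

```python
from __future__ import annotations

from collections.abc import Sequence


def hrs_scalar(entries: Sequence[tuple[str, float, float]]) -> float:
    """Weighted mean sum(w_i * s_i) / sum(w_i) over (axis, score, weight) rows."""
    total_w = sum(w for _, _, w in entries)
    if not entries or total_w == 0.0:
        # No surviving axes: 0.0 sorts the method to the bottom of the board.
        # Assumption: an all-zero weight map is treated the same way.
        return 0.0
    return sum(w * s for _, s, w in entries) / total_w
```

For example, CM scoring 0.8 with weight 2 and PF scoring 0.5 with weight 1 give (1.6 + 0.5) / 3 = 0.7.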
compute_hrs_vector
compute_hrs_vector(
condition_results: dict[str, ConditionResult],
*,
weights: dict[str, float] | None = None,
) -> HRSVector
Compute the per-axis HRS vector (ADR-008 §Decision; reviewer P1-8).
The per-axis score is the heterogeneous-side success rate
(normalised to [0, 1]). Axes missing from
condition_results are not emitted as zero-score entries —
ADR-007's axis-survival rule means missing axes are cut from the
headline, not silently scored as failure.
Parameters:
- condition_results (dict[str, ConditionResult]) – {axis: ConditionResult} covering the surviving ADR-007 axes.
- weights (dict[str, float] | None, default: None) – Override the per-axis weight map; defaults to :data:DEFAULT_AXIS_WEIGHTS (ADR-008 §Decision Option D).
Returns:
- HRSVector – :class:HRSVector with one entry per axis in the canonical ADR-008 ordering (CM > PF > CR > SA > OM > AS), filtered to the axes present in condition_results.
Source code in src/chamber/evaluation/hrs.py
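Axis survival plus canonical ordering amounts to filtering a fixed order. hrs_vector is hypothetical and takes the weights explicitly, because the numeric values of DEFAULT_AXIS_WEIGHTS are not given here; any weights you pass are placeholders.

```python
from __future__ import annotations

from collections.abc import Mapping

CANONICAL_ORDER = ("CM", "PF", "CR", "SA", "OM", "AS")  # ADR-008 default order


def hrs_vector(
    heterogeneous_success: Mapping[str, float],
    weights: Mapping[str, float],
) -> list[tuple[str, float, float]]:
    """Hypothetical sketch returning (axis, score, weight) rows."""
    return [
        (axis, heterogeneous_success[axis], weights[axis])
        for axis in CANONICAL_ORDER
        # Axes absent from the input were cut by the axis-survival rule;
        # they are omitted rather than scored as 0.
        if axis in heterogeneous_success
    ]
```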
load_prereg
load_prereg(path: Path) -> PreregistrationSpec
Load and validate a pre-registration YAML (ADR-007 §Discipline).
Parameters:
- path (Path) – Absolute path to the YAML file.
Returns:
- PreregistrationSpec – The validated :class:PreregistrationSpec.
Raises:
- FileNotFoundError – When path does not exist.
- ValidationError – When the YAML payload does not match the schema.
- ValueError – When :attr:PreregistrationSpec.axis is not on the ADR-007 §3.4 shortlist.
Source code in src/chamber/evaluation/prereg.py
make_condition_row
make_condition_row(
*,
predictor: str,
conformal_mode: str,
vendor_compliance: str | None,
n_episodes: int,
violations: int,
fallback_fires: int,
oscbf_results: Iterable[OSCBFResult] = (),
) -> ConditionRow
Construct a :class:ConditionRow with slack aggregates wired in (issue #142).
The natural call-site at three_tables.json emission time. Routes
the per-step :class:concerto.safety.oscbf.OSCBFResult stream through
:func:aggregate_oscbf_slack so the v2 slack columns are populated
per the locked ADR-014 §Revision history 2026-05-16 contract. Callers
on the EGO_ONLY outer-filter path (the Stage-1a default) pass the
default empty tuple and get the schema-v2 0.0 defaults — exactly
the behaviour documented on
:class:concerto.safety.reporting.ConditionRow.
Parameters:
- predictor (str) – "gt" or "pred" (Huriot & Sibai 2025 Table I labels).
- conformal_mode (str) – "noLearn" or "Learn".
- vendor_compliance (str | None) – ADR-007 rev 3 placeholder; None in Phase-0.
- n_episodes (int) – Number of episodes in this condition.
- violations (int) – Count of CBF-constraint violations.
- fallback_fires (int) – Count of braking-fallback fires.
- oscbf_results (Iterable[OSCBFResult], default: ()) – Per-step :class:OSCBFResult stream for the condition. Defaults to an empty tuple, matching the Stage-1a EGO_ONLY path that does not exercise the OSCBF inner filter (issue #142).
Returns:
- ConditionRow – A :class:ConditionRow carrying both the Phase-0 columns and the ADR-014 v2 slack aggregates.
Source code in src/chamber/evaluation/oscbf_aggregate.py
pacluster_bootstrap
pacluster_bootstrap(
pairs: Iterable[PairedEpisode],
*,
n_resamples: int,
rng: Generator,
) -> BootstrapCI
Paired-cluster bootstrap for the ≥20pp gap test (ADR-007 §Validation criteria).
Pairs are matched on (seed, episode_idx, initial_state_seed)
so the homogeneous - heterogeneous delta is computed on the
same initial state, removing one of the dominant sources of
cross-condition variance. The cluster level is the seed, exactly
as in :func:cluster_bootstrap.
Parameters:
- pairs (Iterable[PairedEpisode]) – Iterable of :class:PairedEpisode records.
- n_resamples (int) – Number of bootstrap resamples.
- rng (Generator) – Deterministic np.random.Generator (ADR-002 P6).
Returns:
- BootstrapCI – :class:BootstrapCI on the per-pair delta, in raw units rather than percentage points (the renderer multiplies by 100 when formatting).
Source code in src/chamber/evaluation/bootstrap.py
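A sketch of the paired-cluster variant: deltas are formed within each pair first, then seeds and pairs are resampled as in the cluster bootstrap. paired_cluster_ci is hypothetical, takes (seed, homogeneous, heterogeneous) tuples, uses the plain mean delta as the per-resample estimate (an assumption), and expects at least one pair.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable

import numpy as np


def paired_cluster_ci(
    pairs: Iterable[tuple[int, float, float]],
    *,
    n_resamples: int,
    rng: np.random.Generator,
) -> tuple[float, float, float]:
    """Hypothetical sketch returning (point, ci_low, ci_high) in raw units."""
    by_seed: dict[int, list[float]] = defaultdict(list)
    for seed, hom, het in pairs:
        # The delta is taken within a pair, so variance from the shared
        # initial state cancels before any resampling happens.
        by_seed[seed].append(hom - het)
    clusters = [np.asarray(d, dtype=np.float64) for d in by_seed.values()]
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        picked = rng.integers(len(clusters), size=len(clusters))  # seeds
        draws = [rng.choice(clusters[i], size=clusters[i].size) for i in picked]  # pairs
        estimates[b] = np.concatenate(draws).mean()
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    # Raw units; multiply by 100 for percentage points.
    return float(estimates.mean()), float(lo), float(hi)
```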
render_leaderboard
render_leaderboard(
entries: Sequence[LeaderboardEntry],
) -> str
Render the CHAMBER leaderboard (ADR-008 §Decision; reviewer P1-8).
Entries are sorted by hrs_scalar descending. Each method has
two rows: the headline row carrying the scalar + aggregate
violation / fallback rates, and a sub-row showing the HRS
vector as axis: score (weight) pairs. The vector sub-row is
unconditional per reviewer P1-8 — entries lacking an HRSVector
raise :class:ValueError.
Entries built from a single spike-run archive are tagged
[PARTIAL: <axis>] in front of the method name (reviewer P1-3)
so a one-axis row is never mistaken for a complete HRS-bundle row
(ADR-008 §Decision binds the bundle to the surviving-axis set).
Parameters:
- entries (Sequence[LeaderboardEntry]) – Sequence of :class:LeaderboardEntry records.
Returns:
- str – Markdown table suitable for embedding in README or leaderboard pages.
Raises:
- ValueError – When any entry's :class:HRSVector is empty (no surviving axes); per reviewer P1-8, the leaderboard refuses methods that did not produce an HRS vector.
Source code in src/chamber/evaluation/render.py
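The sort order, the [PARTIAL: <axis>] tag, and the unconditional vector sub-row can be illustrated with a trimmed-down renderer. Entry and render_sketch are hypothetical; the column layout is invented for illustration, the violation/fallback columns are omitted, and taking the partial tag's axis from the first vector entry is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:  # hypothetical, trimmed-down stand-in for LeaderboardEntry
    method_id: str
    spike_runs: list[str]
    hrs_scalar: float
    vector: list[tuple[str, float, float]]  # (axis, score, weight)


def render_sketch(entries: list[Entry]) -> str:
    lines = ["| Method | HRS |", "|---|---|"]
    # Highest HRS scalar first: the leaderboard ranks by hrs_scalar.
    for e in sorted(entries, key=lambda entry: entry.hrs_scalar, reverse=True):
        if not e.vector:
            raise ValueError(f"{e.method_id}: empty HRS vector")
        name = e.method_id
        if len(e.spike_runs) == 1:
            # One spike-run archive covers one axis: flag the row as partial.
            name = f"[PARTIAL: {e.vector[0][0]}] {name}"
        lines.append(f"| {name} | {e.hrs_scalar:.3f} |")
        # The vector sub-row is always emitted, as "axis: score (weight)".
        pairs = ", ".join(f"{axis}: {score:.2f} ({weight:g})" for axis, score, weight in e.vector)
        lines.append(f"| vector: {pairs} | |")
    return "\n".join(lines)
```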
render_three_table_safety_report
render_three_table_safety_report(
report: Mapping[str, Any], *, fmt: str = "markdown"
) -> str
Render the ADR-014 three-table safety report (ADR-014 §Decision).
Consumes the JSON-serialised payload produced by
:func:concerto.safety.reporting.emit_three_tables (or any
structurally equivalent ThreeTableReport.to_jsonable()
output) and emits Markdown or LaTeX. The fallback-fired column
in Table 2 is rendered separately from constraint-violation per
ADR-014 §Decision and the reviewer P0 review on safety reporting.
Parameters:
- report (Mapping[str, Any]) – A dict matching the :class:concerto.safety.reporting.ThreeTableReport to_jsonable() shape (table_1, table_2, table_3).
- fmt (str, default: 'markdown') – Output format, either "markdown" or "latex".
Returns:
- str – The rendered three-table report as a string.
Raises:
- ValueError – When fmt is not one of the supported formats, or the report dict is missing required keys.
Source code in src/chamber/evaluation/render.py
verify_git_tag
verify_git_tag(
spec: PreregistrationSpec,
prereg_path: Path,
*,
repo_path: Path,
) -> str
Verify the pre-registration matches its git tag (ADR-007 §Discipline).
Confirms (i) the tag exists in the repository and (ii) the blob
SHA of prereg_path on disk matches the blob SHA of the same
file as stored at the tag. The second check is the lock that
catches the "edit-the-YAML-after-launch" anti-pattern: any change
to the YAML's bytes after the tag was cut shifts the on-disk
blob SHA but not the tagged blob SHA.
Parameters:
- spec (PreregistrationSpec) – The loaded :class:PreregistrationSpec.
- prereg_path (Path) – Path to the YAML file on disk (absolute or repo-relative; resolved against repo_path).
- repo_path (Path) – Root of the git working tree.
Returns:
- str – The verified blob SHA (40-character hex SHA-1 digest).
Raises:
- PreregistrationError – When the tag is missing, the file is outside the repo, or the blob SHAs disagree.
Source code in src/chamber/evaluation/prereg.py
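The lock check compares two git blob SHAs. A blob's SHA-1 is computed over the header "blob <size>" plus a NUL byte, followed by the file bytes, and git rev-parse <tag>:<path> prints the blob SHA stored at a tag. The helpers below are a hypothetical sketch, not the module's code. Line-ending conversion or clean filters (for example core.autocrlf) can make on-disk bytes differ from the stored blob, which this sketch does not handle.

```python
from __future__ import annotations

import hashlib
import subprocess
from pathlib import Path


def git_blob_sha(data: bytes) -> str:
    """SHA-1 of a git blob: header b'blob <size>' plus NUL, then the bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def tagged_blob_sha(repo: Path, tag: str, rel_path: Path) -> str:
    """Blob SHA of rel_path as stored at tag, via `git rev-parse <tag>:<path>`."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", f"{tag}:{rel_path.as_posix()}"],
        capture_output=True,
        text=True,
        check=True,  # a missing tag or path raises CalledProcessError
    )
    return out.stdout.strip()


def yaml_is_locked(repo: Path, tag: str, rel_path: Path) -> bool:
    # Any post-tag edit changes the on-disk bytes, hence the on-disk SHA,
    # while the SHA recorded at the tag stays fixed.
    on_disk = git_blob_sha((repo / rel_path).read_bytes())
    return on_disk == tagged_blob_sha(repo, tag, rel_path)
```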
chamber.benchmarks
CHAMBER spike runners — Stage 0/1/2/3 benchmark experiments.
Implements the staged heterogeneity-axis spike protocol from ADR-007.
Each spike module reads a pre-registered YAML (spikes/preregistration/)
and refuses to run if the SHA does not match its git tag (project principle P4).
Chamber-side ego-AHT training-run launcher (T4b.11; ADR-002 §Decisions).
Bridges the method-side :func:concerto.training.ego_aht.train (which
intentionally does not import chamber.*) to the concrete env +
partner instances that live on the benchmark side. The dependency
direction stays clean: chamber.benchmarks.training_runner imports
both concerto.* (the algorithm-agnostic loop + config) and
chamber.* (the env + partner registry), and orchestrates them.
Public surface:
- :func:build_env: turns an :class:~concerto.training.config.EnvConfig into a concrete env (Phase-0 dispatch table: "mpe_cooperative_push" → :class:MPECooperativePushEnv; "stage0_smoke" is reserved for T4b.3 and raises :class:NotImplementedError until the HARL fork's concerto_env_adapter lands).
- :func:build_partner: turns a :class:~concerto.training.config.PartnerConfig into a frozen partner via :func:chamber.partners.registry.load_partner (ADR-009 §Decision).
- :func:run_training: the end-to-end entry point; builds the env and the partner, calls :func:concerto.training.ego_aht.train, and returns the :class:~concerto.training.ego_aht.TrainingResult (P1.04 NamedTuple wrapping the :class:RewardCurve and the trained :class:EgoTrainer).
ADR-002 §Decisions / plan/05 §3.5: this module is the natural home for
the env-task → env-class dispatch table; concrete env classes live
here. M4b-7 will extend the dispatch with the HARL fork's
concerto_env_adapter wrapper for stage0_smoke.
build_bounds_for_env
build_bounds_for_env(env: Env[Any, Any]) -> Bounds
Source per-task :class:Bounds from the env (P1.04.5; ADR-006 §Decision).
Delegates to env.build_bounds() when present
(:class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv adds this
in P1.04.5); falls back to a sensible default for envs that don't
expose the method (Tier-1 fake envs, MPE stand-in).
Parameters:
- env (Env[Any, Any]) – A Gymnasium env. The Stage-1b env's :meth:Stage1PickPlaceEnv.build_bounds returns the per-task envelope pinned at construction (cartesian_accel_capacity from PANDA_CARTESIAN_ACCEL_CAPACITY_MS2 = 10.0 m/s²).
Returns:
- Bounds – A :class:concerto.safety.api.Bounds instance.
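The delegate-or-default pattern described above can be sketched as follows. BoundsStandIn is a hypothetical one-field stand-in for concerto.safety.api.Bounds, and the fallback value 1.0 is a placeholder, not the project's actual default.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BoundsStandIn:
    # Hypothetical single-field stand-in for concerto.safety.api.Bounds.
    cartesian_accel_capacity: float


DEFAULT_BOUNDS = BoundsStandIn(cartesian_accel_capacity=1.0)  # placeholder value


def bounds_for_env_sketch(env: Any) -> BoundsStandIn:
    build = getattr(env, "build_bounds", None)
    if callable(build):
        return build()  # the env pins its own per-task envelope
    return DEFAULT_BOUNDS  # fake / MPE envs that lack the method
```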
Source code in src/chamber/benchmarks/training_runner.py
build_control_models
build_control_models(
env: Env[Any, Any],
) -> Mapping[str, AgentControlModel]
Build a per-uid :class:AgentControlModel map from an env (ADR-004 §Decision; spike_004A).
Two-tier dispatch (P1.03 / Task B):
- Env-provided. If the env exposes a callable build_control_models instance method (the contract :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv adds in P1.03), delegate to it. The Stage-1b env's method returns a :class:JacobianControlModel for the 7-DOF panda uid with a working Jacobian callable, plus a :class:DoubleIntegratorControlModel for the fetch uid. The per-embodiment dispatch lives there because the env knows its own URDF and articulation best (ADR-004 §Decision; ADR-007 §Stage 1b).
- Default fallback. Read each uid's action_space[uid] Box shape and return a :class:DoubleIntegratorControlModel per uid sized to shape[0]. Phase-0 envs in CHAMBER expose their actions as Cartesian velocities or accelerations (the safety filter's CBF assumes Cartesian acceleration), so the double-integrator identity model is the correct default for the MPE / Stage-0 envs.
Parameters:
- env (Env[Any, Any]) – A multi-agent Gymnasium-conformant env. If the env exposes build_control_models, the helper delegates; otherwise it falls back to the default-DI dispatch over env.action_space.spaces.
Returns:
- Mapping[str, AgentControlModel] – {uid: AgentControlModel} for each uid in env.action_space.spaces (or as returned by the env's own build_control_models() if delegated).
Raises:
- TypeError – If the fallback path is taken and env.action_space is not a :class:gymnasium.spaces.Dict, or any action_space[uid] is not a :class:gymnasium.spaces.Box.
- ValueError – If the fallback path is taken and any uid's action Box is not 1-D (multi-dimensional action shapes would need a custom control model rather than the double-integrator identity).
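A sketch of the two-tier dispatch, using duck-typed stand-ins for gymnasium.spaces.Box and gymnasium.spaces.Dict and a hypothetical DoubleIntegratorStandIn model so the example runs without Gymnasium installed. The error types mirror the Raises section above; the real control-model classes are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class BoxStandIn:  # stand-in for gymnasium.spaces.Box
    shape: tuple


@dataclass(frozen=True)
class DictStandIn:  # stand-in for gymnasium.spaces.Dict
    spaces: Mapping[str, Any]


@dataclass(frozen=True)
class DoubleIntegratorStandIn:  # hypothetical identity control model
    action_dim: int


def control_models_sketch(env: Any) -> Mapping[str, Any]:
    build = getattr(env, "build_control_models", None)
    if callable(build):
        return build()  # tier 1: the env knows its own embodiments
    space = env.action_space  # tier 2: default double-integrator fallback
    if not isinstance(space, DictStandIn):
        raise TypeError("action_space must be a Dict space")
    models = {}
    for uid, sub in space.spaces.items():
        if not isinstance(sub, BoxStandIn):
            raise TypeError(f"action_space[{uid!r}] must be a Box")
        if len(sub.shape) != 1:
            raise ValueError(f"action_space[{uid!r}] must be 1-D, got {sub.shape}")
        models[uid] = DoubleIntegratorStandIn(action_dim=sub.shape[0])
    return models
```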
Source code in src/chamber/benchmarks/training_runner.py
build_emergency_controllers
build_emergency_controllers(
env: Env[Any, Any],
) -> dict[str, EmergencyController]
Build per-uid emergency-controller dispatch for the cell's env (P1.04.6).
ADR-007 §Stage 1b Rev 8: the training-time
:func:concerto.safety.braking.maybe_brake call routes each
dangerous-pair uid's aggregate repulsion through its
:class:~concerto.safety.emergency.EmergencyController to translate
the Cartesian vector into a control-space override. Heterogeneous
embodiments need heterogeneous controllers:
- 7-DOF panda uids (action_dim != position_dim): dispatch to :class:~concerto.safety.emergency.JacobianEmergencyController with the env-built :class:JacobianControlModel (the same instance the outer CBF row uses, so the kinematic convention is configured once, at chamber-side wiring time, per ADR-004 §Revision history 2026-05-17).
- Double-integrator uids (action_dim == position_dim; fetch / MPE / Stage-0 default): dispatch to :class:~concerto.safety.emergency.CartesianAccelEmergencyController.
Dispatch is keyed on whether each uid's model is a :class:JacobianControlModel
instance, not on uid strings or action_dim heuristics: the model is
the source of truth for the embodiment / control-space mapping.
Parameters:
- env (Env[Any, Any]) – A multi-agent Gymnasium-conformant env. Must expose build_control_models() (the env-side dispatch) or match the default-DI fallback in :func:build_control_models.
Returns:
- dict[str, EmergencyController] – {uid: EmergencyController} for every uid in the cell.
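The type-keyed dispatch can be sketched with hypothetical stand-in classes (none of these names are the real concerto.safety classes). The point it illustrates is that the Jacobian branch reuses the same model instance the CBF row was built with, rather than constructing a second one.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass
class JacobianControlModelStandIn:  # hypothetical; wraps a Jacobian callable
    jacobian: Callable[[], Any]


@dataclass
class DoubleIntegratorModelStandIn:  # hypothetical identity model
    action_dim: int


@dataclass
class JacobianEmergencyStandIn:
    model: JacobianControlModelStandIn  # shared with the outer CBF row


@dataclass
class CartesianAccelEmergencyStandIn:
    pass


def emergency_controllers_sketch(models: Mapping[str, Any]) -> dict:
    controllers = {}
    for uid, model in models.items():
        # Dispatch on the model's type, not on uid strings or action_dim.
        if isinstance(model, JacobianControlModelStandIn):
            controllers[uid] = JacobianEmergencyStandIn(model=model)
        else:
            controllers[uid] = CartesianAccelEmergencyStandIn()
    return controllers
```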
Source code in src/chamber/benchmarks/training_runner.py
build_env
build_env(env_cfg: EnvConfig, *, root_seed: int) -> EnvLike
Construct the env from :class:EnvConfig (T4b.11; ADR-002 §Decisions; plan/05 §3.5).
Dispatch table:
- "mpe_cooperative_push" → :class:MPECooperativePushEnv (T4b.13's empirical-guarantee env; Phase-0 stand-in).
- "stage0_smoke" → :func:chamber.benchmarks.stage0_smoke_adapter.make_stage0_training_env (T4b.3 / plan/05 §3.4). Constructs the rig-validated ADR-001 3-robot env and adapts it to the :class:~concerto.training.ego_aht.EnvLike shape. Requires a Vulkan-capable GPU.
- "stage1_pickplace" → :func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env (P1.04 / ADR-007 §Stage 1b). The Stage-1b science-evaluation env (panda + fetch, or panda + panda_partner). EnvConfig.agent_uids drives the per-condition selection (:func:chamber.envs.stage1_pickplace.resolve_condition resolves it via the prereg-driven condition_id table). :class:TrainedPolicyFactory invokes this dispatch from its __call__ body through the chamber-side :func:run_training wrapper. Requires SAPIEN + Vulkan.
Parameters:
- env_cfg (EnvConfig) – The validated :class:~concerto.training.config.EnvConfig.
- root_seed (int) – Project root seed, routed to the env's :func:concerto.training.seeding.derive_substream substream for P6 reproducibility.
Returns:
- EnvLike – Concrete env instance satisfying the :class:~concerto.training.ego_aht.EnvLike Protocol.
Raises:
- ValueError – If env_cfg.task is not in the dispatch table.
- ChamberEnvCompatibilityError – If a SAPIEN-backed env (stage0_smoke / stage1_pickplace) is requested and SAPIEN / Vulkan is unavailable (see ADR-001 §Risks).
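A minimal sketch of a task-string dispatch table with the documented ValueError for unknown tasks. The factory functions are hypothetical placeholders that return tuples; the real entries construct the MPE, Stage-0, and Stage-1 envs and route root_seed through derive_substream rather than passing it straight in.

```python
from typing import Any, Callable, Dict

EnvFactory = Callable[[int], Any]


def _make_mpe(seed: int) -> Any:  # hypothetical placeholder factories
    return ("mpe_cooperative_push", seed)


def _make_stage0(seed: int) -> Any:
    return ("stage0_smoke", seed)


def _make_stage1(seed: int) -> Any:
    return ("stage1_pickplace", seed)


ENV_DISPATCH: Dict[str, EnvFactory] = {
    "mpe_cooperative_push": _make_mpe,
    "stage0_smoke": _make_stage0,
    "stage1_pickplace": _make_stage1,
}


def build_env_sketch(task: str, *, root_seed: int) -> Any:
    try:
        factory = ENV_DISPATCH[task]
    except KeyError:
        # Surface the known tasks so a typo in the config is easy to spot.
        raise ValueError(
            f"unknown env task {task!r}; known tasks: {sorted(ENV_DISPATCH)}"
        ) from None
    return factory(root_seed)
```

Keeping the table as data means adding a task is a one-line change and the error message stays in sync with what is actually supported.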
Source code in src/chamber/benchmarks/training_runner.py
build_partner
build_partner(partner_cfg: PartnerConfig) -> PartnerLike
Construct the frozen partner from :class:PartnerConfig (ADR-009 §Decision).
Builds a :class:~chamber.partners.api.PartnerSpec from the config
fields and routes through :func:chamber.partners.registry.load_partner
so the M4a registry's loud-fail contract (KeyError listing known
classes) applies uniformly.
Parameters:
- partner_cfg (PartnerConfig) – The validated :class:~concerto.training.config.PartnerConfig.
Returns:
- PartnerLike – Concrete frozen partner satisfying the :class:~concerto.training.ego_aht.PartnerLike Protocol.
Raises:
- KeyError – If partner_cfg.class_name is not registered in :data:chamber.partners.registry._REGISTRY.
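The loud-fail registry contract can be sketched as a dict of registered classes plus a lookup that names the known keys on a miss. PartnerSpecStandIn, register, load_partner_sketch, and ScriptedPartner are hypothetical; they only approximate the shape of chamber.partners.registry.load_partner.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class PartnerSpecStandIn:  # hypothetical stand-in for PartnerSpec
    class_name: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    def deco(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return deco


def load_partner_sketch(spec: PartnerSpecStandIn) -> Any:
    if spec.class_name not in _REGISTRY:
        # Loud failure: list the known classes so typos are easy to spot.
        raise KeyError(
            f"unknown partner class {spec.class_name!r}; known: {sorted(_REGISTRY)}"
        )
    return _REGISTRY[spec.class_name](**spec.kwargs)


@register("scripted_heuristic")
class ScriptedPartner:  # hypothetical example partner
    def __init__(self, speed: float = 1.0) -> None:
        self.speed = speed
```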
Source code in src/chamber/benchmarks/training_runner.py
build_safety_dt
build_safety_dt(
env: Env[Any, Any], *, fallback: float = 0.02
) -> float
Source the control-step dt from the env (P1.04.5; ADR-007 §Stage 1b).
Reads :attr:env.control_timestep (the ManiSkill BaseEnv default;
forwarded through the Stage-1b wrapper chain via the
:class:Stage1ASStateSynthesizer.__getattr__ forwarder). Falls
back to fallback (default 0.02 s — the ManiSkill v3 default)
on envs that don't expose control_timestep (Tier-1 fake envs
that aren't ManiSkill-backed).
Parameters:
- env (Env[Any, Any]) – The cell's env.
- fallback (float, default: 0.02) – Default dt in seconds when the env has no control_timestep. The training loop's safety wiring uses this for the conformal predictor's lookahead horizon and for the finite-difference velocity computation in :meth:Stage1PickPlaceEnv.build_agent_snapshots.
Returns:
- float – The control-step dt in seconds.
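A sketch of the attribute lookup with fallback, plus the kind of finite-difference velocity estimate the dt feeds into. The function names are hypothetical, and the backward-difference form is an assumption about how build_agent_snapshots differentiates positions.

```python
from typing import Any, List


def safety_dt_sketch(env: Any, *, fallback: float = 0.02) -> float:
    # Prefer the env's own control step; fall back for non-ManiSkill fakes.
    dt = getattr(env, "control_timestep", None)
    return float(dt) if dt is not None else fallback


def finite_difference_velocity(prev_pos: List[float], pos: List[float], dt: float) -> List[float]:
    # Backward difference per coordinate: v_t ≈ (p_t - p_{t-1}) / dt.
    return [(p - q) / dt for p, q in zip(pos, prev_pos)]
```

A wrong dt scales every velocity estimate by the same factor, which is why sourcing it from the env rather than hard-coding it matters.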
Source code in src/chamber/benchmarks/training_runner.py
build_safety_filter
build_safety_filter(
env: Env[Any, Any],
) -> EgoOnlySafetyFilter
Construct the CBF-QP outer filter for the cell's env (P1.04.5; ADR-007 §Stage 1b).
Wraps :meth:concerto.safety.cbf_qp.ExpCBFQP.ego_only with the
per-uid control-model map sourced from
:func:build_control_models (the env's
build_control_models() method when present; otherwise the
default-DI dispatch). The filter is constructed once per
(seed, condition) cell by
:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory
and handed to :func:concerto.training.ego_aht.train via the
safety_filter kwarg.
Construction is cheap (no scene-state assumed); the filter's
per-step :meth:filter method is what does the work.
Parameters:
- env (Env[Any, Any]) – A multi-agent Gymnasium-conformant env exposing a build_control_models() method (or matching the default-DI dispatch in :func:build_control_models).
Returns:
- EgoOnlySafetyFilter – A :class:concerto.safety.api.EgoOnlySafetyFilter instance.
Source code in src/chamber/benchmarks/training_runner.py
build_safety_snapshot_builder
build_safety_snapshot_builder(
env: Env[Any, Any],
) -> (
Callable[[gym.Env[Any, Any]], dict[str, AgentSnapshot]]
| None
)
Return the env's build_agent_snapshots method, or None (P1.04.5; ADR-007 §Stage 1b).
The training loop calls the returned callable once per step (when
the safety filter is wired) to build the per-uid Cartesian snapshot
map the conformal layer consumes. For envs that don't expose
build_agent_snapshots the helper returns None, and the training loop
then treats the safety stack as inert (the kwargs validation in
:func:concerto.training.ego_aht.train raises if the operator
passed a filter without a snapshot builder).
Parameters:
- env (Env[Any, Any]) – A Gymnasium env. The Stage-1b env's :meth:Stage1PickPlaceEnv.build_agent_snapshots returns the per-uid snapshot dict.
Returns:
- Callable[[Env[Any, Any]], dict[str, AgentSnapshot]] | None – The bound method or None. The returned callable's signature is (env) -> dict[str, AgentSnapshot] for symmetry with the other build_* helpers; implementations may ignore the env argument if they are bound methods on the env itself.
Source code in src/chamber/benchmarks/training_runner.py
run_training
run_training(
cfg: EgoAHTConfig,
*,
trainer_factory: TrainerFactory | None = None,
repo_root: Path | None = None,
safety_filter: EgoOnlySafetyFilter | None = None,
safety_state: SafetyState | None = None,
safety_bounds: Bounds | None = None,
safety_snapshot_builder: Callable[
[EnvLike], dict[str, AgentSnapshot]
]
| None = None,
safety_dt: float | None = None,
safety_emergency_controllers: Mapping[
str, EmergencyController
]
| None = None,
) -> TrainingResult
Build env + partner and run the ego-AHT training loop (T4b.11; ADR-002 §Decisions).
Drives :func:concerto.training.ego_aht.train with the chamber-side
env + partner instances. The default trainer_factory is
:meth:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config
(M4b-8a / plan/05 §3.5): the HARL-HAPPO-backed ego-PPO trainer that
the empirical-guarantee experiment (M4b-8b / T4b.13) measures.
Passing trainer_factory=None therefore falls back to this
learning trainer; callers that explicitly want
:class:~concerto.training.ego_aht.RandomEgoTrainer (test fixtures,
smoke runs) must construct it and pass it in.
Returns a :class:~concerto.training.ego_aht.TrainingResult NamedTuple
(P1.04 / ADR-007 §Stage 1b) carrying both the diagnostic
:class:~concerto.training.ego_aht.RewardCurve and the trained
:class:~concerto.training.ego_aht.EgoTrainer instance. The trainer
is exposed alongside the curve so Phase-1 callers
(:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory) can
wrap result.trainer.act in a per-cell closure without paying a
checkpoint round-trip cost. Pre-P1.04 the return type was the bare
:class:RewardCurve; the NamedTuple wrapper is backward-compatible
at the tuple-unpack level (curve, trainer = run_training(cfg)).
Parameters:
- cfg (EgoAHTConfig) – Validated :class:~concerto.training.config.EgoAHTConfig.
- trainer_factory (TrainerFactory | None, default: None) – Optional :class:~concerto.training.ego_aht.TrainerFactory. None (the default) selects the M4b-8a :class:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer via its from_config classmethod.
- repo_root (Path | None, default: None) – Working-tree root for run-metadata provenance.
- safety_filter (EgoOnlySafetyFilter | None, default: None) – Optional CBF-QP outer filter (P1.04.5; ADR-007 §Stage 1b). All safety_* kwargs are threaded through to :func:concerto.training.ego_aht.train. The chamber-side :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory constructs them per cell via :meth:_build_safety_for_cell.
- safety_state (SafetyState | None, default: None) – Optional :class:SafetyState (conformal slack vector; mutated in place each step).
- safety_bounds (Bounds | None, default: None) – Optional per-task :class:Bounds.
- safety_snapshot_builder (Callable[[EnvLike], dict[str, AgentSnapshot]] | None, default: None) – Optional callable that builds the per-uid :class:AgentSnapshot map each step.
- safety_dt (float | None, default: None) – Optional control-step dt in seconds for the predictor lookahead.
- safety_emergency_controllers (Mapping[str, EmergencyController] | None, default: None) – Optional per-uid :class:~concerto.safety.emergency.EmergencyController map (P1.04.6; ADR-007 §Stage 1b Rev 8). Threaded through to :func:concerto.training.ego_aht.train for the in-rollout :func:concerto.safety.braking.maybe_brake call. Built per cell by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory via :func:build_emergency_controllers.
Returns:
- TrainingResult – A :class:~concerto.training.ego_aht.TrainingResult; result.curve is the :class:RewardCurve and result.trainer is the trained :class:EgoTrainer instance.
Raises:
- ValueError – If the env task is unknown.
- NotImplementedError – If the env task is stage0_smoke (deferred).
- KeyError – If the partner class is not registered.
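Why the NamedTuple wrapper is backward-compatible only at the unpack level: a two-field NamedTuple supports both attribute access and positional unpacking. The class below is a hypothetical mirror with stand-in field types, not the real TrainingResult. Code that used the old bare RewardCurve return value directly, without unpacking, would still need updating.

```python
from typing import Any, List, NamedTuple


class TrainingResultSketch(NamedTuple):
    # Hypothetical mirror of the documented two-field shape.
    curve: List[float]  # stand-in for RewardCurve
    trainer: Any        # stand-in for the trained EgoTrainer


result = TrainingResultSketch(curve=[0.1, 0.4, 0.7], trainer=object())

# Attribute access (new style) and tuple unpacking (the documented
# compatible form) refer to the same objects.
curve, trainer = result
assert curve is result.curve and trainer is result.trainer
```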
Source code in src/chamber/benchmarks/training_runner.py
Ego-PPO trainer wrapping HARL's HAPPO actor (M4b-8a; ADR-002 §Decisions).
The :class:EgoPPOTrainer is the concrete :class:~concerto.training.ego_aht.EgoTrainer
that drives the Phase-0 ego-AHT empirical-guarantee experiment (T4b.13 /
plan/05 §3.5). It satisfies the Protocol with a thin layer over HARL's
:class:harl.algorithms.actors.happo.HAPPO actor + a hand-rolled MLP critic
+ a hand-rolled rollout buffer.
Architecture (plan/05 §3.5; ADR-002 §Decisions):
- The HARL fork's :class:harl.algorithms.actors.ego_aht_happo.EgoAHTHAPPO is a thin reference subclass for external researchers; its :meth:from_config body intentionally raises :class:NotImplementedError.
- CONCERTO's training run binds :meth:EgoPPOTrainer.from_config instead. The bridge from :class:~concerto.training.config.EgoAHTConfig (project-specific Pydantic) to HARL's 18-key args dict lives here, on the CONCERTO side, where project-specific surfaces belong.
- Rollout, GAE, advantage normalization, the PPO update, and the critic update all run in this module: not in the fork, and not via HARL's :class:OnPolicyHARunner. This satisfies the architecture rule in plan/05 §3.5 ("rollout/update logic on the CONCERTO side") and lets the test_advantage_decomposition.py property test compare against a hand-rolled GAE reference without tautology.
Public surface:
- :func:compute_gae: module-level GAE recurrence (ADR-002 §Decisions; Schulman et al. 2016 §6.3). Used both by :class:EgoPPOTrainer and by the property test (T4b.13 / plan/05 §5).
- :func:normalize_advantages: module-level mean/std normalization. Same formula HARL applies inside :meth:HAPPO.train for state_type "EP" (without nan-masking, since the ego is always active in our 2-agent task, as documented in the function's docstring).
- :class:EgoPPOTrainer: the trainer.
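A minimal numpy sketch of the two module-level helpers. It assumes episode_boundaries[t] is True whenever step t ended an episode (by termination or truncation) and that next_values already carries the per-step bootstrap (0 at termination, V(s_final) at truncation, V(s_{t+1}) otherwise). The eps of 1e-5 in the normalizer is an assumed constant, and gae_sketch / normalize_sketch are hypothetical names; check the actual functions before relying on either detail.

```python
import numpy as np
from numpy.typing import NDArray


def gae_sketch(
    rewards: NDArray[np.float32],
    values: NDArray[np.float32],
    next_values: NDArray[np.float32],
    episode_boundaries: NDArray[np.bool_],
    *,
    gamma: float,
    gae_lambda: float,
) -> NDArray[np.float32]:
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD error; next_values[t] already holds the right bootstrap.
        delta = rewards[t] + gamma * next_values[t] - values[t]
        # Do not let the lambda-weighted sum leak across an episode boundary.
        carry = 0.0 if episode_boundaries[t] else running
        running = delta + gamma * gae_lambda * carry
        advantages[t] = running
    return advantages


def normalize_sketch(adv: NDArray[np.float32], eps: float = 1e-5) -> NDArray[np.float32]:
    # Plain mean/std normalization, no nan-masking (the ego is always active).
    return ((adv - adv.mean()) / (adv.std() + eps)).astype(np.float32)
```

With gamma = gae_lambda = 1 the sketch reduces to Monte-Carlo return minus V(s_t) within each episode, which is a convenient sanity check for a property test.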
ADR-002 risk-mitigation #1: the empirical-guarantee assertion in T4b.13 / plan/05 §6 criterion 4 is a trip-wire for the framework choice. Do not change the GAE / normalization / PPO update math without re-running the property test and re-validating the reference implementation against the canonical PPO paper formulation.
EgoPPOTrainer
Ego-PPO trainer wrapping HARL's HAPPO actor (M4b-8a; ADR-002 §Decisions).
Implements the :class:~concerto.training.ego_aht.EgoTrainer Protocol
over a HARL :class:HAPPO actor + a hand-rolled :class:_EgoCritic.
The training loop in :func:concerto.training.ego_aht.train calls
:meth:act per step, :meth:observe per step (with the next obs +
reward + done), and :meth:update at every rollout_length
boundary. The trainer's internal rollout buffer accumulates
(obs, action, log_prob, value, reward, done) tuples between
update calls; :meth:update then runs GAE + normalization + a HARL
PPO update + a hand-rolled critic update, then clears the buffer.
Determinism caveat (P6 / ADR-002): HARL's :class:DiagGaussian samples
actions via :meth:torch.distributions.Normal.sample, which uses
torch's global RNG. The trainer seeds that global RNG once at
construction time via
:func:concerto.training.seeding.derive_substream; this is enough
for byte-identical CPU runs at the same seed but means concurrent
trainers in the same process would alias each other's draws.
Phase-0 runs are single-trainer per process, so this is acceptable.
See plan/05 §3.5 for the architecture rationale (rollout/update logic on the CONCERTO side, not in the fork) and plan/05 §5 for the empirical-guarantee verification contract (T4b.13).
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
__init__
__init__(
*,
cfg: EgoAHTConfig,
ego_uid: str,
ego_obs_space: Box,
ego_act_space: Box,
partner: PartnerLike,
device: device | None = None,
) -> None
Build the trainer (M4b-8a; ADR-002 §Decisions; plan/05 §3.5).
Prefer :meth:from_config over direct construction — it derives
ego_obs_space and ego_act_space from the env so the call
site cannot drift from the env's actual shapes.
The partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3)
runs first, before any tensor allocation: a partner with any
requires_grad=True parameter (and no
:class:~chamber.partners.interface.PartnerBase shield blocking
named_parameters access) aborts the construction with a
:class:ValueError that names the offending parameter path.
Parameters:
- cfg (EgoAHTConfig) – Validated :class:~concerto.training.config.EgoAHTConfig. The trainer reads cfg.seed and cfg.happo.*.
- ego_uid (str) – The env-side uid the ego acts on (matches cfg.env.agent_uids[0]).
- ego_obs_space (Box) – The Box space for the ego's flat state vector. Synthesized by :meth:from_config from the env's nested observation_space.
- ego_act_space (Box) – The Box space for the ego's action vector. Read directly from env.action_space[ego_uid].
- partner (PartnerLike) – The frozen partner instance the ego will be trained against. Validated via :func:_assert_partner_is_frozen before any tensor allocation. The trainer does not retain a reference; the partner is consumed by the training loop in :func:concerto.training.ego_aht.train, not by the trainer itself.
- device (device | None, default: None) – Torch device. Defaults to CPU when called directly; :meth:from_config resolves it from :attr:RuntimeConfig.device (auto / cpu / cuda / mps) via :func:_resolve_device and applies the determinism flag from :attr:RuntimeConfig.deterministic_torch on CPU before constructing the trainer (ADR-002 §Decisions; plan/08 §4).
Raises:
- ValueError – If partner exposes a torch parameter with requires_grad=True (ADR-009 §Consequences).
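The freeze gate can be sketched as a scan over named_parameters() that fails on the first trainable tensor. With torch, nn.Module.named_parameters() yields (name, Parameter) pairs and each Parameter carries a requires_grad flag; the sketch duck-types that surface with SimpleNamespace so it runs without torch, and treats a missing or shielded named_parameters as nothing to check (an assumption about how the PartnerBase shield behaves). The function name is hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def assert_partner_frozen_sketch(partner: Any) -> None:
    named_parameters = getattr(partner, "named_parameters", None)
    if not callable(named_parameters):
        return  # no torch surface exposed (e.g. scripted or shielded partner)
    for name, param in named_parameters():
        if getattr(param, "requires_grad", False):
            # Name the offending parameter path so the fix is obvious.
            raise ValueError(
                f"partner parameter {name!r} has requires_grad=True; "
                "freeze the partner before training the ego"
            )


# Usage: a frozen partner passes silently.
frozen = SimpleNamespace(named_parameters=lambda: [("w", SimpleNamespace(requires_grad=False))])
assert_partner_frozen_sketch(frozen)
```

Running this check before any tensor allocation keeps a misconfigured partner from costing anything beyond a cheap attribute scan.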
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
act
act(
obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]
Sample the ego action; cache log-prob + value for :meth:observe (ADR-002 §Decisions).
Caches (obs, action, log_prob, value) in private state so the
next call to :meth:observe can insert them into the rollout
buffer with the actual reward + done flag. The pairing relies on
the contract in :func:concerto.training.ego_aht.train (each
act(obs) is followed by exactly one
observe(obs_next, reward, done) before the next act).
Parameters:
- obs (Mapping[str, Any]) – Env obs dict. The ego's flat state vector is read from obs["agent"][ego_uid]["state"].
- deterministic (bool, default: False) – When True, returns the distribution mode instead of a sample. Used by evaluation paths (M4b-8b's per-checkpoint deterministic eval).
Returns:
- NDArray[floating] – The ego action as a length-act_dim float32 array, ready to be packed into the env's dict-action.
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
from_config
classmethod
from_config(
cfg: EgoAHTConfig,
*,
env: EnvLike,
partner: PartnerLike,
ego_uid: str,
) -> EgoPPOTrainer
Build from :class:EgoAHTConfig + a concrete env + the frozen partner.
ADR-002 §Decisions / ADR-009 §Consequences / plan/05 §3.5 + §6 #3.
This is the :class:~concerto.training.ego_aht.TrainerFactory
callable that :func:concerto.training.ego_aht.train plugs in via
dependency injection. The factory reads the ego's observation +
action shapes off the env directly so the trainer's network
widths cannot drift from what the env actually emits.
The partner-freeze gate (:func:_assert_partner_is_frozen) runs
first, before any tensor allocation, so a bad partner aborts
cheaply (plan/05 §6 #3).
Per plan/05 §3.5 the bridge from :class:EgoAHTConfig to HARL's
args dict lives in this CONCERTO-side module rather than in the
fork's :meth:EgoAHTHAPPO.from_config (which intentionally
stays :class:NotImplementedError — see
scripts/harl-fork-patches/v0.1.0-aht/ README for the
role-split rationale).
Parameters:
- cfg (EgoAHTConfig) – Validated :class:~concerto.training.config.EgoAHTConfig.
- env (EnvLike) – Concrete env satisfying :class:EnvLike. Must expose observation_space["agent"][ego_uid]["state"] as a :class:gym.spaces.Box and action_space[ego_uid] as a :class:gym.spaces.Box.
- partner (PartnerLike) – Frozen partner instance the ego will train against. Validated by :func:_assert_partner_is_frozen as the first construction step (ADR-009 §Consequences).
- ego_uid (str) – The env-side uid the ego acts on.
Returns:
- EgoPPOTrainer – A constructed :class:EgoPPOTrainer ready for :func:concerto.training.ego_aht.train.
Raises:
- ValueError – If partner exposes a torch parameter with requires_grad=True (ADR-009 §Consequences).
- TypeError – If the env's ego observation or action space is not a :class:gym.spaces.Box.
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
observe
observe(
obs: Mapping[str, Any],
reward: float,
done: bool,
*,
truncated: bool = False,
) -> None
Buffer (pending, reward, terminated, truncated) (ADR-002 §Decisions; Pardo 2017).
The trainer keeps a single (obs, action, log_prob, value)
cache — set by :meth:act — and pairs it with the
(reward, done, truncated) from :meth:observe. The post-step
obs is also remembered as the bootstrap target for
:meth:update.
done keeps the historical "this step ended the episode" meaning
(terminated OR truncated). truncated is the new keyword-only
flag that distinguishes time-limit truncation from true
termination. The trainer derives
terminated = done and not truncated internally; the GAE
bootstrap then treats truncation correctly per Pardo et al. 2017
(V(s_truncated_final) flows through, not 0) — see :func:compute_gae
+ project issue #62 for the root-cause writeup.
At truncation boundaries this method calls the critic on the
post-step obs once to capture V(s_truncated_final); that value
is later threaded into :func:compute_gae as the per-step
bootstrap target. At non-truncation steps the recorded value is
0.0 and unused.
Parameters:
- obs (Mapping[str, Any]) – Post-step env obs (the obs the env returned after the action passed to the most recent :meth:act call).
- reward (float) – Per-step ego reward (the env's shared scalar).
- done (bool) – True if the env terminated or truncated this step (i.e. the loop should reset the env before the next step).
- truncated (bool, default: False) – True iff this step ended the episode via time-limit truncation (the env returned truncated=True). The default, False, is the conservative legacy behavior: callers that don't yet distinguish truncation from termination see the historical "treat as terminal" semantics.
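The act/observe pairing and the terminated/truncated split can be sketched as below. PendingStep, BufferSketch, and the critic callable are hypothetical; the RuntimeError guard against an observe() with no preceding act() is an illustrative addition, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class PendingStep:  # what act() caches (hypothetical field set)
    obs: Any
    action: Any
    log_prob: float
    value: float


@dataclass
class BufferSketch:
    critic: Callable[[Any], float]  # hypothetical V(s) callable
    pending: Optional[PendingStep] = None
    rows: List[tuple] = field(default_factory=list)

    def observe(self, obs: Any, reward: float, done: bool, *, truncated: bool = False) -> None:
        if self.pending is None:
            raise RuntimeError("observe() called without a preceding act()")
        terminated = done and not truncated
        # Only truncation needs V(s_final); elsewhere the slot is 0.0 and unused.
        trunc_value = self.critic(obs) if truncated else 0.0
        self.rows.append((self.pending, reward, terminated, truncated, trunc_value))
        self.pending = None  # enforce exactly one observe() per act()
```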
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
state_dict
state_dict() -> dict[str, Any]
Return a flat dict with actor + critic + optimizer states (ADR-002 §Decisions; T4b.12).
The returned dict is what :func:concerto.training.checkpoints.save_checkpoint
writes to the .pt artefact via torch.save. Loading is the dual:
each component reads its own sub-key off the loaded dict via
:meth:load_state_dict.
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
update
update() -> None
Compute GAE; run PPO + critic updates; clear the buffer (ADR-002 §Decisions; Pardo 2017).
Order of operations (Schulman 2017, lifted to ego-only / frozen-partner per plan/05 §3.5):
- Build the per-step next_values bootstrap array (Pardo 2017 §4): V(s_truncated_final_t) at truncation boundaries (cached at observe time), 0.0 at terminations, and values[t + 1] otherwise; the rollout-tail bootstrap at t = T - 1 applies the same rule to self._last_next_obs.
- :func:compute_gae → raw advantages A_t.
- returns_t = A_t + V(s_t) (the critic regression target).
- :func:normalize_advantages → normalized A_t for the PPO importance-weighted surrogate loss.
- For each of n_epochs epochs, shuffle indices and walk actor_num_mini_batch minibatches. For each minibatch, build the 9-tuple :meth:HAPPO.update expects and call it directly (skipping :meth:HAPPO.train's buffer-machinery wrapper).
- Train the critic on the same minibatch with MSE on returns.
- Clear the buffer.
The factor_batch argument :meth:HAPPO.update requires for
HAPPO's sequential multi-agent update is set to all 1.0 — the
ego is the only updating agent (ADR-009 §Decision: black-box AHT,
partner is frozen), so previous-agent importance weights are
identically 1.
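The first step, building next_values, can be sketched as a standalone function. This is a hypothetical helper, not the trainer's code: the argument names terminated, truncated, truncated_final_values (standing in for the V(s_truncated_final_t) values cached at observe time) and tail_value (standing in for the critic's value of self._last_next_obs) are assumptions.

```python
import numpy as np
from numpy.typing import NDArray


def build_next_values(
    values: NDArray[np.float32],
    terminated: NDArray[np.bool_],
    truncated: NDArray[np.bool_],
    truncated_final_values: NDArray[np.float32],
    tail_value: float,
) -> NDArray[np.float32]:
    """Hypothetical helper: per-step bootstrap targets for GAE."""
    T = values.shape[0]
    next_values = np.empty(T, dtype=np.float32)
    for t in range(T):
        if terminated[t]:
            next_values[t] = 0.0  # true terminal state: nothing to bootstrap
        elif truncated[t]:
            next_values[t] = truncated_final_values[t]  # time limit: keep V(s')
        elif t + 1 < T:
            next_values[t] = values[t + 1]  # mid-rollout: next step's value
        else:
            next_values[t] = tail_value  # rollout tail, episode still running
    return next_values
```

The tail case falls out of the same rule: a terminated or truncated final step is handled by the first two branches, and only a mid-episode tail uses tail_value.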
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
compute_gae
compute_gae(
rewards: NDArray[float32],
values: NDArray[float32],
next_values: NDArray[float32],
episode_boundaries: NDArray[bool_],
*,
gamma: float,
gae_lambda: float,
) -> NDArray[np.float32]
GAE with per-step bootstrap (Schulman 2016 §6.3; Pardo 2017; ADR-002 §Decisions).
Pardo et al. 2017 ("Time Limits in Reinforcement Learning") motivates the per-step bootstrap target shape: time-limit truncation is not a terminal state — the policy should still receive the value of the state it would have transitioned to. Treating truncation as termination zeros that bootstrap and biases advantages systematically, slowing learning materially (see project issue #62).
The recurrence is::
delta_t = r_t + gamma * next_values[t] - V(s_t)
A_t = delta_t + gamma * lambda * (1 - boundary_t) * A_{t+1}
A_{T-1} = delta_{T-1}
where next_values[t] is the caller-supplied bootstrap target;
in production it is:
- values[t + 1] for non-boundary steps (typical mid-rollout).
- V(s_truncated_final_t) at truncation boundaries (the value of the actual post-step obs of the truncated episode, computed by the caller via the critic).
- 0.0 at termination boundaries (true terminal state; no value after).
- The rollout-tail bootstrap (V(s_T+1) if the rollout ends mid-episode; 0.0 if it ends on an episode boundary) at the final timestep t = T - 1.
episode_boundaries[t] is True iff step t ended an episode
(terminated or truncated). It zeros GAE propagation across the
boundary — advantages from a different episode (steps > t after
a boundary) must not contribute to A_t.
plan/05 §5 + the property test
tests/property/test_advantage_decomposition.py verify this formula
matches a freshly hand-rolled reference to within 1e-6 on synthetic
trajectories that include both termination and truncation boundaries.
Parameters:
- rewards (NDArray[float32]) – Per-step ego reward. Shape (T,).
- values (NDArray[float32]) – Per-step critic value estimate V(s_t). Shape (T,).
- next_values (NDArray[float32]) – Per-step bootstrap target. Shape (T,). Caller responsibility (see module docstring + issue #62 root-cause writeup for why this split matters).
- episode_boundaries (NDArray[bool_]) – True at steps that ended an episode (terminated OR truncated). Shape (T,). Zeros GAE propagation across the boundary.
- gamma (float) – Discount factor in [0, 1].
- gae_lambda (float) – GAE exponential smoothing in [0, 1].
Returns:
- NDArray[float32] – Advantages A_t, same shape and dtype (float32) as rewards.
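The recurrence above can be checked against a plain backward loop. This is a minimal sketch of the same formula, not the module's implementation; compute_gae_sketch is a hypothetical name.

```python
import numpy as np
from numpy.typing import NDArray


def compute_gae_sketch(
    rewards: NDArray[np.float32],
    values: NDArray[np.float32],
    next_values: NDArray[np.float32],
    episode_boundaries: NDArray[np.bool_],
    *,
    gamma: float,
    gae_lambda: float,
) -> NDArray[np.float32]:
    """Backward GAE pass with caller-supplied per-step bootstrap targets."""
    T = rewards.shape[0]
    advantages = np.zeros(T, dtype=np.float32)
    running = 0.0  # A_{t+1}; zero past the last step, so A_{T-1} = delta_{T-1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        # (1 - boundary_t): advantages from a later episode must not leak into A_t.
        keep = 0.0 if episode_boundaries[t] else 1.0
        running = delta + gamma * gae_lambda * keep * running
        advantages[t] = running
    return advantages
```

Note that a truncation boundary still bootstraps through next_values[t] in delta; only the propagation of A_{t+1} is cut.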
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
normalize_advantages
normalize_advantages(
advantages: NDArray[float32],
) -> NDArray[np.float32]
Mean/std-normalize advantages (ADR-002 §Decisions; matches HAPPO state_type='EP').
The formula::
A_normalized = (A - mean(A)) / (std(A) + 1e-5)
matches the EP-state branch in :meth:HAPPO.train
(lines 122-127 of harl/algorithms/actors/happo.py) without the
nan-masking it applies to inactive agents. The ego in our 2-agent
task is never inactive (use_policy_active_masks=False in
:data:_HARL_FIXED_DEFAULTS), so the nan-masking branch is dead
code in our setting and the formula collapses to a plain
mean/std normalization.
plan/05 §5: pulled out of HARL into this module so the property test (T4b.13) can verify the formula matches single-agent PPO without tautology.
Parameters:
- advantages (NDArray[float32]) – Raw advantages (typically from :func:compute_gae).
Returns:
- NDArray[float32] – Normalized advantages, same shape and dtype.
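With the nan-masking branch dead, the formula reduces to a few NumPy calls. A minimal sketch under that assumption (normalize_advantages_sketch is a hypothetical name, not the module's function):

```python
import numpy as np
from numpy.typing import NDArray


def normalize_advantages_sketch(advantages: NDArray[np.float32]) -> NDArray[np.float32]:
    """(A - mean(A)) / (std(A) + 1e-5), with no inactive-agent masking."""
    mean = advantages.mean()
    std = advantages.std()  # population std (ddof=0, NumPy's default)
    # The final cast keeps the float32 contract regardless of scalar promotion.
    return ((advantages - mean) / (std + 1e-5)).astype(np.float32)
```

The 1e-5 term only guards against a zero std; for typical batches the output has mean about 0 and std slightly below 1.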
Source code in src/chamber/benchmarks/ego_ppo_trainer.py
Stage-1 spike-adapter common surface (ADR-007 §Stage 1a / §Stage 1b; plan/07 §T5b.2).
Single source of truth for the seam that Stage-1 AS and OM adapters
(:mod:chamber.benchmarks.stage1_as, :mod:chamber.benchmarks.stage1_om)
share — the :class:EgoActionFactory Protocol the Phase-1 trained-ego
path plugs into, and the Stage-1a production-default
:func:_zero_ego_action_factory.
Why a Protocol (not a per-step Callable):
- The factory contract is called once per (seed, condition) pair. That matches the natural shape of a trained-policy injection point: Phase 1 (ADR-007 §Stage 1b) supplies a factory that calls :func:concerto.training.ego_aht.train for the homogeneous condition's env at the given seed, then returns a closure that wraps the trained policy's act (the closure is then re-used across the 20 evaluation episodes within that (seed, condition) cell).
- The Phase-0 per-step ego_action_fn could not express this contract: there was no place to hang a "this happens once per (seed, condition), not once per step" idea. The Protocol seam pins the lifecycle explicitly.
Phase-1 wiring contract (verbatim, do not rewrite without an ADR-007 amendment):
Phase-1 supplies a factory that calls
concerto.training.ego_aht.train for the homogeneous condition's
env at the given seed, returns lambda obs: trained_policy.act(obs).
The factory is called once per (seed, condition) pair so the
trained policy is reused across the 20 evaluation episodes within a
seed.
Stage-1a production default: :func:_zero_ego_action_factory returns a
closure that emits a zero action vector of the env's ego-action shape.
Stage-1a's purpose is to exercise the rig (env steps, partner acts,
episode terminations, success thresholding, SpikeRun aggregation,
chamber-eval pipeline) without confounding the rig signal with a
trainer signal; a learning ego is sequenced to Stage-1b
(ADR-007 §Stage 1b).
Stage-1b production default (P1.04): :class:TrainedPolicyFactory —
wires :func:chamber.benchmarks.training_runner.run_training for each
(seed, condition) cell and returns a closure that wraps the trained
:class:~concerto.training.ego_aht.EgoTrainer's act method
(deterministic mode). One ego-AHT training run per cell (5 GPU-h per
axis on A100 at the 100k-frame budget per ADR-007 §Stage 1b); the
trainer is recovered from the
:class:~concerto.training.ego_aht.TrainingResult NamedTuple rather
than reloaded from disk, so no checkpoint round-trip per cell. The
Phase-0 _zero_ego_action_factory is unchanged — Stage-1a runs keep
the rig-validation contract; the Stage-1b dispatch swap is sequenced
to P1.05 (separate slice).
EgoActionFactory
Bases: Protocol
Stage-1 ego-action factory contract (ADR-007 §Stage 1a / §Stage 1b).
Called once per (seed, condition) pair by the Stage-1 spike
adapter's :func:_run_axis_with_factories. The returned callable
is invoked once per env step for the 20 evaluation episodes within
that (seed, condition) cell.
Two production implementations satisfy this Protocol:
- :func:_zero_ego_action_factory – the Stage-1a default (rig validation). Returns a closure that emits a zero action vector of the env's ego-action shape. No randomness; no training.
- :class:TrainedPolicyFactory – the Stage-1b default (science evaluation; ADR-007 §Stage 1b). Wires :func:concerto.training.ego_aht.train via :func:chamber.benchmarks.training_runner.run_training for each (seed, condition) cell, then returns a closure that wraps the trained :class:~concerto.training.ego_aht.EgoTrainer's act method (deterministic mode for evaluation).
The Protocol is intentionally tiny — (env, seed) → callable —
so an external user can plug in their own algorithm without
changing :func:_run_axis_with_factories. Project test fixtures
can build a counting factory (see
tests.integration.test_stage1_*_fake) to assert the lifecycle
contract.
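The lifecycle can be illustrated with a counting zero-action factory. This is a hypothetical sketch, not the project's test fixture: EgoActionFactoryLike mirrors the documented (env, seed) → callable shape locally, the action dimension is passed in explicitly rather than read from the env, and run_cell is an illustrative loop, not :func:_run_axis_with_factories.

```python
from collections.abc import Callable, Mapping
from typing import Any, Protocol

import numpy as np
from numpy.typing import NDArray

EgoActionFn = Callable[[Mapping[str, Any]], NDArray[np.float32]]


class EgoActionFactoryLike(Protocol):
    """Local mirror of the (env, seed) -> per-step callable contract."""

    def __call__(self, env: Any, seed: int) -> EgoActionFn: ...


class CountingZeroFactory:
    """Hypothetical test double: records each call, emits zero actions."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim  # assumption: supplied, not read from env
        self.calls: list[int] = []

    def __call__(self, env: Any, seed: int) -> EgoActionFn:
        self.calls.append(seed)  # one entry per (seed, condition) cell
        zeros = np.zeros(self.action_dim, dtype=np.float32)

        def act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
            return zeros.copy()

        return act


def run_cell(factory: EgoActionFactoryLike, env: Any, seed: int,
             episodes: int = 20, steps: int = 3) -> int:
    """Illustrative loop: build the callable once, reuse it every step."""
    act = factory(env, seed)
    n_steps = 0
    for _ in range(episodes):
        for _ in range(steps):
            act({})  # placeholder obs
            n_steps += 1
    return n_steps
```

Five seeds times two conditions should produce exactly ten factory calls, however many steps the episodes take.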
Source code in src/chamber/benchmarks/stage1_common.py
__call__
__call__(
env: Env[Any, Any], seed: int
) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]
Build the per-step ego-action callable for one (seed, condition) cell.
Source code in src/chamber/benchmarks/stage1_common.py
TrainedPolicyFactory
Phase-1 :class:EgoActionFactory — wires the ego-AHT training loop per cell.
ADR-007 §Stage 1b production factory. Implements the
:class:EgoActionFactory Protocol (this module) by calling
:func:chamber.benchmarks.training_runner.run_training for each
(seed, condition) cell and returning a closure that wraps the
trained :class:~concerto.training.ego_aht.EgoTrainer's act
method (deterministic mode for evaluation).
Lifecycle (ADR-007 §Stage 1b lifecycle contract):
The factory is called once per (seed, condition) pair by the
Stage-1 spike adapter's :func:_run_axis_with_factories (5 seeds x
2 conditions = 10 calls per Stage-1 spike). Each call drives one
full training run (10 x 100k frames = 1M frames per axis
on A100; ~5 GPU-h per axis per the compute plan); the returned
closure is re-used across the 20 evaluation episodes within the
cell so the trained policy is not retrained per-episode.
Determinism (ADR-007 §Discipline + ADR-002 §Decisions; P6):
The trained-policy seed is set via Pydantic model_copy on
cfg.seed; the underlying training run derives all its
randomness from
:func:concerto.training.seeding.derive_substream. Two invocations
at the same (seed, condition) pair produce byte-identical
trained policies on CPU; on GPU, the cuDNN / cuBLAS non-bit-
determinism caveat (plan/08 §4) applies.
Caching policy (intentionally NO cache):
The factory does NOT cache trained policies across __call__
invocations — each call trains a fresh policy. Caching would
silently violate the per-(seed, condition) determinism contract
if the cache key were not exhaustive (env identity, partner
identity, prereg tag, code state). If the operator needs to resume
a partial run, the right tool is the
:class:~concerto.training.checkpoints.CheckpointMetadata sidecar
serialised under cfg.artifacts_root (the training loop emits
one per cfg.checkpoint_every steps); not an in-memory cache.
Partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3):
The partner-freeze gate is honoured by
:meth:EgoPPOTrainer.__init__'s :func:_assert_partner_is_frozen
call (run first, before any tensor allocation). The factory does
not bypass it.
Per-condition partner-uid routing (P1.04 / Q-C refinement):
The factory reads the env's :attr:partner_uid and :attr:ego_uid
properties at __call__ time and overrides
cfg.env.agent_uids accordingly — single source of truth at the
env layer; the factory carries no per-condition dispatch table.
For AS-homo (partner=panda_partner) vs AS-hetero / OM-*
(partner=fetch), the env's condition_id drives the routing
via the
:data:chamber.envs.stage1_pickplace._CONDITION_TABLE. Envs that
don't expose the properties (Tier-1 fake envs, MPE stand-in) fall
back to cfg.env.agent_uids[0] / [1] — the Phase-0 contract
that pre-dates the property.
Action-dim handoff to partner (P1.04 / Q-C refinement):
The :class:~chamber.partners.heuristic.ScriptedHeuristicPartner
reads its action dimension from spec.extra["action_dim"] and
zero-pads beyond the planar xy. The factory reads the actual
env.action_space.spaces[partner_uid].shape[0] at call time and
overrides cfg.partner.extra["action_dim"] so the partner's
action vector matches the env's expectation per-condition. This
avoids a side-table; the action_dim is sourced from the env that
owns the contract.
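The uid routing and action-dim handoff described in the two paragraphs above can be sketched with duck-typed envs. This is a hypothetical illustration, not the factory's code: the helper names, the uids "ego_arm" and "cfg_*", and the assumption that cfg.env.agent_uids is ordered (ego, partner) on the fallback path are all assumptions.

```python
from types import SimpleNamespace
from typing import Any


def resolve_uids(env: Any, agent_uids: list[str]) -> tuple[str, str]:
    """Return (ego_uid, partner_uid), preferring the env's properties."""
    ego = getattr(env, "ego_uid", None)
    partner = getattr(env, "partner_uid", None)
    if ego is None or partner is None:
        # Envs without the properties: fall back to config order
        # (assumed here to be [ego, partner]).
        return agent_uids[0], agent_uids[1]
    return ego, partner


def partner_action_dim(env: Any, partner_uid: str) -> int:
    """Read the partner's action dimension from a Dict-style action space."""
    return int(env.action_space.spaces[partner_uid].shape[0])


# Demo env with hypothetical uids; each space only needs a .shape here.
demo_env = SimpleNamespace(
    ego_uid="ego_arm",
    partner_uid="fetch",
    action_space=SimpleNamespace(
        spaces={"ego_arm": SimpleNamespace(shape=(7,)), "fetch": SimpleNamespace(shape=(9,))}
    ),
)
ego_uid, partner_uid = resolve_uids(demo_env, ["cfg_ego", "cfg_partner"])
partner_extra = {"action_dim": partner_action_dim(demo_env, partner_uid)}
```

Sourcing both values from the env at call time is what lets a single config serve every condition without a dispatch table.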
Parameters:
- cfg (EgoAHTConfig) – A fully-resolved :class:~concerto.training.config.EgoAHTConfig. The factory rewrites cfg.seed per call via cfg.model_copy(update={"seed": call_seed}) and applies the per-condition env and partner overrides described above; all other fields are honoured as-is.
Example wiring at the Stage-1b adapter (sequenced to P1.05):
.. code-block:: python

    from pathlib import Path

    from chamber.benchmarks.stage1_common import TrainedPolicyFactory
    from concerto.training.config import load_config

    # spec, prereg_sha, _run_axis_with_factories and _default_env_factory
    # come from the Stage-1 spike adapter.
    factory = TrainedPolicyFactory(
        cfg=load_config(
            config_path=Path("configs/training/ego_aht_happo/stage1_pickplace.yaml")
        )
    )
    spike_run = _run_axis_with_factories(
        prereg=spec,
        prereg_sha=prereg_sha,
        env_factory=_default_env_factory,
        ego_action_factory=factory,
    )
Source code in src/chamber/benchmarks/stage1_common.py
__call__
__call__(
env: Env[Any, Any], seed: int
) -> EgoActionCallable
Train a per-cell policy and return the per-step closure (ADR-007 §Stage 1b).
Lifecycle: invoked once per (seed, condition) cell by the
Stage-1 adapter's :func:_run_axis_with_factories. The returned
callable is reused across the 20 evaluation episodes within
the cell.
Parameters:
- env (Env[Any, Any]) – The per-condition evaluation env. env.ego_uid / env.partner_uid (P1.04 properties on :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv) drive the per-condition overrides applied to the base cfg. The env itself is NOT stepped by the factory; the training run inside :func:run_training builds its own env from the (overridden) cfg.env. The eval env is what the spike adapter's loop steps after this factory returns.
- seed (int) – Per-cell seed; the factory overrides cfg.seed via model_copy.
Returns:
- EgoActionCallable – (obs) -> action callable that wraps trained_policy.act(obs, deterministic=True) and casts the output to float32.
Raises:
- ValueError – Propagated from :meth:EgoPPOTrainer.__init__ if the partner is non-frozen (ADR-009 §Consequences). Any other exception raised by :func:chamber.benchmarks.training_runner.run_training also propagates unchanged (per the test_run_training_failure_propagates contract in tests/integration/test_trained_policy_factory_fake.py).
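The returned closure amounts to a deterministic call plus a dtype cast. A minimal sketch, assuming a hypothetical FakePolicy whose act(obs, deterministic=...) mirrors the call documented above; make_ego_action_fn is likewise an illustrative name.

```python
from collections.abc import Callable, Mapping
from typing import Any

import numpy as np
from numpy.typing import NDArray


class FakePolicy:
    """Hypothetical trained policy exposing act(obs, deterministic=...)."""

    def __init__(self) -> None:
        self.last_deterministic: bool | None = None

    def act(self, obs: Mapping[str, Any], deterministic: bool = False) -> NDArray[np.float64]:
        self.last_deterministic = deterministic
        return np.array([0.5, -0.5], dtype=np.float64)  # float64 so the cast is visible


def make_ego_action_fn(policy: Any) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]:
    """Wrap policy.act in deterministic mode and cast the output to float32."""

    def ego_action(obs: Mapping[str, Any]) -> NDArray[np.float32]:
        return np.asarray(policy.act(obs, deterministic=True), dtype=np.float32)

    return ego_action
```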
Source code in src/chamber/benchmarks/stage1_common.py
__init__
__init__(*, cfg: EgoAHTConfig) -> None
Build the factory; capture the base config.
Parameters:
- cfg (EgoAHTConfig) – Validated :class:~concerto.training.config.EgoAHTConfig, the Stage-1b config (typically loaded from configs/training/ego_aht_happo/stage1_pickplace.yaml). The original instance is never mutated; per-call overrides are applied via model_copy.
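The "capture once, copy per call" pattern can be shown with the standard library alone. This sketch swaps in a frozen dataclass and dataclasses.replace as a stand-in for Pydantic's cfg.model_copy(update={"seed": ...}). ToyConfig and ToyFactory are hypothetical and carry only two fields.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for EgoAHTConfig (two fields only)."""

    seed: int
    total_frames: int


class ToyFactory:
    """Captures a base config; derives a per-call copy without mutating it."""

    def __init__(self, *, cfg: ToyConfig) -> None:
        self._base_cfg = cfg

    def per_call_cfg(self, seed: int) -> ToyConfig:
        # dataclasses.replace plays the role of cfg.model_copy(update={"seed": seed}).
        return dataclasses.replace(self._base_cfg, seed=seed)
```

Because every call starts from the same base instance, one cell's override cannot leak into the next.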
Source code in src/chamber/benchmarks/stage1_common.py