
API Reference

Generated from docstrings via mkdocstrings.

CONCERTO — the method

concerto.api

Public Protocols for CONCERTO — the method-level contracts.

All concrete implementations (in chamber.* or concerto.safety, concerto.training) depend on these Protocols, not on each other. See ADR-INDEX.md for the full decision record; individual modules in this package cite the specific ADR they implement.

concerto.safety

CONCERTO safety stack — exp CBF-QP + conformal overlay + OSCBF + braking.

Implements the three-layer safety filter decided in ADR-004 (Wang-Ames-Egerstedt 2017 + Huriot & Sibai 2025 + Morton & Pavone 2025), the per-task numeric bounds plumbing from ADR-006 §Decision, and the three-table reporting format from ADR-014 §Decision.

Public surface (see concerto.safety.api for full docstrings):

  • Bounds: per-task numeric envelope (ADR-006).
  • SafetyState: mutable conformal-CBF state (ADR-004).
  • FilterInfo: telemetry payload (ADR-014).
  • EgoOnlySafetyFilter, JointSafetyFilter: typed Protocols for the two filter shapes (ADR-004 §Public API; spike_004A §Three-mode taxonomy). The legacy single SafetyFilter alias remains importable but emits a DeprecationWarning on first use; removal is targeted for 0.3.0.
  • ConcertoSafetyInfeasible: QP-infeasibility error (ADR-004).
  • EmergencyController: per-uid embodiment hook for the hard braking fallback (ADR-004 risk-mitigation #1).
  • CartesianAccelEmergencyController: default Cartesian sum-and-saturate override for double-integrator agents (ADR-004 risk-mitigation #1).
  • DEFAULT_EPSILON, DEFAULT_ETA, DEFAULT_WARMUP_STEPS: conformal defaults (ADR-004 §Decision).
  • solve_qp_stub: small Clarabel QP benchmark used by the M2 saturation guard test (T3.10 replaced the no-op M2 stub with a real solve behind the same entry-point name).

FilterInfo module-attribute

FilterInfo = TypedDict(
    "FilterInfo",
    {
        "lambda": LambdaDict,
        "constraint_violation": FloatArray,
        "prediction_gap_loss": LambdaDict | None,
        "fallback_fired": bool,
        "qp_solve_ms": float,
    },
)

Telemetry payload returned by the filter Protocols (ADR-014 §Decision).

Returned by EgoOnlySafetyFilter.filter and JointSafetyFilter.filter.

The fields populate the three-table renderer's row data:

  • lambda: current conformal slack as a LambdaDict (per-pair UID-tuple keys → float); feeds Table 3 (conservativeness gap λ mean/var vs. oracle gt/noLearn). Issue #144 promoted this field from a canonical-pair-order ndarray to the dict form so the per-pair indexing is part of the type system; see lambda_to_jsonable / lambda_from_jsonable for the ADR-014 v3 wire encoding.
  • constraint_violation: per-pair non-negative violation signal max(0, -h_ij) from the CBF backbone. Zero when the constraint is satisfied; positive when the barrier is in the violated half-space. Counted into Table 2's per-condition violation rates. Stays a numpy.ndarray indexed by the canonical pair-iteration order produced by make_pair_keys over the filter's UID set; issue #144's structural fix targets lambda only.
  • prediction_gap_loss: per-pair Huriot & Sibai 2025 §IV.A loss max(0, predicted_h - actual_h) against a partner-trajectory predictor, as a LambdaDict (same keying as lambda). This is the signal that drives the conformal update (update_lambda). It is None when no predictor is wired, because the CBF backbone alone cannot compute it; see concerto.safety.conformal.update_lambda_from_predictor.
  • fallback_fired: feeds Table 2's "fallback fired" column.
  • qp_solve_ms: feeds the OSCBF 1 kHz target check (ADR-004 §"OSCBF target").

The constraint-violation and prediction-gap loss are reported as separate cells of the ADR-014 three-table report because they answer different questions: constraint_violation is a per-step CBF gap (did the QP keep us in the safe set this step?), and prediction_gap_loss is the conformal calibration signal (was the predictor's forecast of the safe set wrong?).
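The sketch below shows how the two loss signals could be computed and packed into a FilterInfo-shaped dict. It is a minimal illustration under several assumptions. LambdaDict is approximated as dict[tuple[str, str], float]. The pair ordering and all values are made up. The 1 ms budget is our reading of the "1 kHz target". None of these helpers are part of concerto.safety.

```python
import numpy as np

# Assumed stand-in for concerto.safety's LambdaDict: UID pair -> float.
LambdaDict = dict[tuple[str, str], float]


def constraint_violation(h: np.ndarray) -> np.ndarray:
    """Per-pair max(0, -h_ij): zero inside the safe set, positive when violated."""
    return np.maximum(0.0, -np.asarray(h, dtype=np.float64))


def prediction_gap_loss(predicted_h: LambdaDict, actual_h: LambdaDict) -> LambdaDict:
    """Per-pair max(0, predicted_h - actual_h), keyed like ``lambda``."""
    return {pair: max(0.0, predicted_h[pair] - actual_h[pair]) for pair in predicted_h}


# Barrier values in an assumed canonical pair order: (a, b), (a, c), (b, c).
h = np.array([0.3, -0.05, 0.0])
violation = constraint_violation(h)

# Hypothetical predictor output vs. realised barrier values.
predicted = {("a", "b"): 0.4, ("a", "c"): 0.1, ("b", "c"): 0.2}
actual = {("a", "b"): 0.3, ("a", "c"): -0.05, ("b", "c"): 0.25}
gap = prediction_gap_loss(predicted, actual)

info = {
    "lambda": {pair: 0.0 for pair in predicted},
    "constraint_violation": violation,
    "prediction_gap_loss": gap,
    "fallback_fired": False,
    "qp_solve_ms": 0.42,
}
# Assumed reading of the OSCBF 1 kHz target: one solve must fit in 1 ms.
meets_1khz = info["qp_solve_ms"] <= 1.0
```

Note the asymmetry. The pair (b, c) sits exactly on the barrier (h = 0), so it contributes no violation, and its predictor was pessimistic, so it contributes no gap loss. The pair (a, c) shows up in both signals for different reasons.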

AgentControlModel

Bases: Protocol

Per-agent action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A).

The CBF row generator builds pairwise constraints in Cartesian / safety space, then projects to per-agent action-space coefficients via this Protocol. The Protocol mediates between each agent's action space (joint torques for a 7-DOF arm, wheel velocities for a 2-DOF base, Cartesian accelerations for a point mass) and the shared safety space (Cartesian acceleration of the agent's safety body).

Implementations MUST be deterministic on identical (state, action) inputs (P6; seeding contract) and MUST be consistent with each other: for any state and any Cartesian acceleration a in the image of action_to_cartesian_accel, action_to_cartesian_accel(state, cartesian_accel_to_action(state, a)) == a to within floating-point tolerance. For over-actuated agents (the Jacobian has more columns than rows) a right-inverse exists whenever the Jacobian has full row rank, and it is then one of many. For exactly-actuated agents the right-inverse is the unique inverse.

Attributes:

  • uid (str) –

    Unique identifier of the agent whose model this is.

  • action_dim (int) –

    Dimensionality of the agent's action vector (e.g. 7 for a 7-DOF arm; 2 for a planar double integrator).

  • position_dim (int) –

    Cartesian dimension of the agent's safety body (typically 2 for the §V toy crossing, 3 for manipulation).
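The round-trip contract above can be checked mechanically. The sketch below is illustrative only, and every name in it is hypothetical. ControlModelLike is a trimmed-down stand-in for AgentControlModel: it omits uid, action_dim, position_dim and max_cartesian_accel. FixedJacobianModel uses a constant Jacobian, whereas a real arm's Jacobian depends on state. The Moore-Penrose pseudo-inverse stands in for whichever right-inverse an implementation chooses.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class ControlModelLike(Protocol):
    """Hypothetical trimmed-down stand-in for AgentControlModel."""

    def action_to_cartesian_accel(self, state: object, action: np.ndarray) -> np.ndarray: ...

    def cartesian_accel_to_action(self, state: object, cartesian_accel: np.ndarray) -> np.ndarray: ...


class FixedJacobianModel:
    """Hypothetical over-actuated agent: 3-D action, 2-D Cartesian safety body.

    The Jacobian is constant for simplicity; ``state`` is accepted but unused.
    """

    def __init__(self, jacobian) -> None:
        self.J = np.asarray(jacobian, dtype=np.float64)
        self.J_pinv = np.linalg.pinv(self.J)  # Moore-Penrose pseudo-inverse

    def action_to_cartesian_accel(self, state, action):
        return self.J @ np.asarray(action, dtype=np.float64)

    def cartesian_accel_to_action(self, state, cartesian_accel):
        return self.J_pinv @ np.asarray(cartesian_accel, dtype=np.float64)


def satisfies_right_inverse(model, state, a, atol: float = 1e-9) -> bool:
    """Check action_to_cartesian_accel(state, cartesian_accel_to_action(state, a)) == a."""
    round_trip = model.action_to_cartesian_accel(state, model.cartesian_accel_to_action(state, a))
    return bool(np.allclose(round_trip, a, atol=atol))


model = FixedJacobianModel([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]])  # 2x3, full row rank
is_model = isinstance(model, ControlModelLike)  # structural: no inheritance required
ok = satisfies_right_inverse(model, state=None, a=np.array([0.7, -1.2]))
```

An isinstance check against a runtime_checkable Protocol only confirms that the named members exist. It does not verify signatures or the round-trip property, so an explicit check like satisfies_right_inverse is still needed.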

Source code in src/concerto/safety/api.py
@runtime_checkable
class AgentControlModel(Protocol):
    """Per-agent action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A).

    The CBF row generator builds pairwise constraints in Cartesian /
    safety space, then projects to per-agent action-space coefficients
    via this Protocol. The Protocol mediates between each agent's
    *action space* (joint torques for a 7-DOF arm, wheel velocities
    for a 2-DOF base, Cartesian accelerations for a point mass) and
    the shared *safety space* (Cartesian acceleration of the agent's
    safety body).

    Implementations MUST be deterministic on identical ``(state,
    action)`` inputs (P6; seeding contract) and MUST be consistent
    with each other: for any ``state`` and any Cartesian acceleration
    ``a`` in the image of :meth:`action_to_cartesian_accel`,
    ``action_to_cartesian_accel(state,
    cartesian_accel_to_action(state, a)) == a`` to within floating-
    point tolerance. The right-inverse exists by construction for
    over-actuated agents (Jacobian has more columns than rows); for
    exactly-actuated agents the right-inverse is the unique inverse.

    Attributes:
        uid: Unique identifier of the agent whose model this is.
        action_dim: Dimensionality of the agent's action vector
            (e.g. 7 for a 7-DOF arm; 2 for a planar double integrator).
        position_dim: Cartesian dimension of the agent's safety body
            (typically 2 for the §V toy crossing, 3 for manipulation).
    """

    @property
    def uid(self) -> str:
        """Unique identifier of the agent (ADR-004 §Decision)."""
        ...

    @property
    def action_dim(self) -> int:
        """Dimensionality of the action vector (ADR-004 §Decision)."""
        ...

    @property
    def position_dim(self) -> int:
        """Cartesian dimension of the safety body (ADR-004 §Decision)."""
        ...

    def action_to_cartesian_accel(self, state: AgentSnapshot, action: FloatArray) -> FloatArray:
        """Map a control-space action to a Cartesian acceleration (ADR-004 §Decision).

        Args:
            state: Current :class:`concerto.safety.cbf_qp.AgentSnapshot`.
                The mapping is state-dependent for any embodiment whose
                Jacobian depends on configuration (e.g. a 7-DOF arm).
            action: Action vector of shape ``(action_dim,)``, dtype
                ``float64``.

        Returns:
            Cartesian acceleration of shape ``(position_dim,)``, dtype
            ``float64``.
        """
        ...

    def cartesian_accel_to_action(
        self, state: AgentSnapshot, cartesian_accel: FloatArray
    ) -> FloatArray:
        """Right-inverse for QP projection back into action space (ADR-004 §Decision).

        Used by the CBF-QP to express the per-agent action-space slot
        coefficients of a Cartesian constraint row, and (via
        :class:`~concerto.safety.emergency.JacobianEmergencyController`)
        by the braking fallback to translate the Cartesian repulsion
        target into joint-space torques. For an over-actuated agent the
        right-inverse selects one of the many actions that realise the
        same Cartesian acceleration; the damped-least-squares
        pseudo-inverse is the canonical choice (see
        :class:`JacobianControlModel`) and is the single source of
        truth for the Cartesian-to-joint map across the CBF row
        projection and the emergency fallback (ADR-004 §Decision; P1.02).

        Args:
            state: Current :class:`concerto.safety.cbf_qp.AgentSnapshot`.
            cartesian_accel: Cartesian acceleration of shape
                ``(position_dim,)``, dtype ``float64``.

        Returns:
            Action vector of shape ``(action_dim,)``, dtype ``float64``.
        """
        ...

    def max_cartesian_accel(self, bounds: Bounds) -> float:
        """Per-agent Cartesian acceleration capacity (ADR-004 §Decision; spike_004A).

        Used by the QP-row generator to size the per-pair
        ``alpha_pair = alpha_i_cart + alpha_j_cart`` and to drive the
        proportional Wang-Ames-Egerstedt 2017 §IV budget split for
        heterogeneous embodiments. For a double-integrator agent
        whose action space *is* Cartesian acceleration this is
        ``bounds.cartesian_accel_capacity`` (the L2 magnitude cap).

        Args:
            bounds: Per-task :class:`Bounds` envelope.

        Returns:
            Strictly positive scalar acceleration capacity.
        """
        ...

action_dim property

action_dim: int

Dimensionality of the action vector (ADR-004 §Decision).

position_dim property

position_dim: int

Cartesian dimension of the safety body (ADR-004 §Decision).

uid property

uid: str

Unique identifier of the agent (ADR-004 §Decision).

action_to_cartesian_accel

action_to_cartesian_accel(
    state: AgentSnapshot, action: FloatArray
) -> FloatArray

Map a control-space action to a Cartesian acceleration (ADR-004 §Decision).

Parameters:

  • state (AgentSnapshot) –

    Current concerto.safety.cbf_qp.AgentSnapshot. The mapping is state-dependent for any embodiment whose Jacobian depends on configuration (e.g. a 7-DOF arm).

  • action (FloatArray) –

    Action vector of shape (action_dim,), dtype float64.

Returns:

  • FloatArray

    Cartesian acceleration of shape (position_dim,), dtype float64.

cartesian_accel_to_action

cartesian_accel_to_action(
    state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray

Right-inverse for QP projection back into action space (ADR-004 §Decision).

Used by the CBF-QP to express the per-agent action-space slot coefficients of a Cartesian constraint row. It is also used (via concerto.safety.emergency.JacobianEmergencyController) by the braking fallback to translate the Cartesian repulsion target into joint-space torques. For an over-actuated agent the right-inverse selects one of the many actions that realise the same Cartesian acceleration. The damped-least-squares pseudo-inverse is the canonical choice (see JacobianControlModel), and it is the single source of truth for the Cartesian-to-joint map across the CBF row projection and the emergency fallback (ADR-004 §Decision; P1.02).

Parameters:

  • state (AgentSnapshot) –

    Current concerto.safety.cbf_qp.AgentSnapshot.

  • cartesian_accel (FloatArray) –

    Cartesian acceleration of shape (position_dim,), dtype float64.

Returns:

  • FloatArray

    Action vector of shape (action_dim,), dtype float64.
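The damped-least-squares map named above has a standard textbook form, J^T (J J^T + λ² I)^{-1} a, sketched below. This is a generic formulation, not the code of JacobianControlModel. The Jacobian and damping values are hypothetical. With λ = 0 and a full-row-rank J it is an exact right-inverse. With λ > 0 it trades a small Cartesian residual for bounded actions near singular configurations, and the sketch makes that residual visible.

```python
import numpy as np


def dls_right_inverse(jacobian, cartesian_accel, damping: float = 1e-3) -> np.ndarray:
    """Damped-least-squares map: J^T (J J^T + damping^2 I)^{-1} a."""
    J = np.asarray(jacobian, dtype=np.float64)
    a = np.asarray(cartesian_accel, dtype=np.float64)
    m = J.shape[0]
    # Solve the small (m x m) system rather than forming an explicit inverse.
    y = np.linalg.solve(J @ J.T + damping**2 * np.eye(m), a)
    return J.T @ y


J = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]])  # hypothetical 2x3 Jacobian
a = np.array([0.7, -1.2])                            # hypothetical Cartesian target

action = dls_right_inverse(J, a, damping=0.0)
exact_residual = np.linalg.norm(J @ action - a)          # ~0: exact right-inverse

damped_action = dls_right_inverse(J, a, damping=0.1)
damped_residual = np.linalg.norm(J @ damped_action - a)  # small but non-zero
```

Damping also shrinks the action norm relative to the undamped solution. That shrinkage is why DLS is preferred near singularities, where the plain pseudo-inverse can demand arbitrarily large joint commands.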

max_cartesian_accel

max_cartesian_accel(bounds: Bounds) -> float

Per-agent Cartesian acceleration capacity (ADR-004 §Decision; spike_004A).

Used by the QP-row generator to size the per-pair alpha_pair = alpha_i_cart + alpha_j_cart and to drive the proportional Wang-Ames-Egerstedt 2017 §IV budget split for heterogeneous embodiments. For a double-integrator agent whose action space is Cartesian acceleration this is bounds.cartesian_accel_capacity (the L2 magnitude cap).

Parameters:

  • bounds (Bounds) –

    Per-task Bounds envelope.

Returns:

  • float

    Strictly positive scalar acceleration capacity.
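Here is one plausible reading of how max_cartesian_accel feeds the pair budget. The pair capacity is alpha_pair = alpha_i + alpha_j, and each agent takes a share of the pairwise constraint proportional to its own capacity. pair_alpha_and_split is a hypothetical helper, and the actual QP-row generator may organise this differently.

```python
def pair_alpha_and_split(alpha_i: float, alpha_j: float) -> tuple[float, float, float]:
    """Return (alpha_pair, share_i, share_j) for one pairwise barrier (hypothetical helper)."""
    if alpha_i <= 0.0 or alpha_j <= 0.0:
        # max_cartesian_accel is documented as strictly positive.
        raise ValueError("max_cartesian_accel must be strictly positive")
    alpha_pair = alpha_i + alpha_j
    return alpha_pair, alpha_i / alpha_pair, alpha_j / alpha_pair


# Symmetric double integrators: both report bounds.cartesian_accel_capacity.
sym = pair_alpha_and_split(2.0, 2.0)  # (4.0, 0.5, 0.5)
# Heterogeneous pair: the more capable agent absorbs more of the constraint.
het = pair_alpha_and_split(3.0, 1.0)  # (4.0, 0.75, 0.25)
```

In the symmetric case alpha_pair = 2 * cartesian_accel_capacity, which matches the relation stated for symmetric agents in the Bounds documentation.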

Bounds dataclass

Per-task numeric bounds for the safety stack (ADR-006 §Decision Option C).

The Bounds envelope is the explicit-numeric half of ADR-006's hybrid decision: bounds gate QP feasibility and reportability, while the conformal slack (SafetyState) governs online adaptation. The enumerated values track ADR-006 §Consequences. The comm-latency bound is anchored to the URLLC-3GPP-R17 sweep table from M2, while action_linf_component, cartesian_accel_capacity, and action_rate are task-specific (Phase-1 fills them; Phase-0 ships sane defaults). The force_limit field is a Phase-0 stub for ADR-007 Open Question #4 (per-vendor decomposition, ADR-004 Open Question #5); a strategy-interfaced per-vendor handler slots in once Stage-3 SA resolves the open question.

Field split (issue #146, closed by P1.02; 2026-05-17). The prior single action_norm field was consumed by two safety layers with mismatched semantics (L-infinity per-component in the CBF-QP outer filter; L2 magnitude cap in the Cartesian emergency controller). The field is now split into two explicit fields, each named for its semantics:

  • action_linf_component: per-component L-infinity bound on the action vector (|u[k]| <= action_linf_component for every component k). Enforced by the action-space bound rows of concerto.safety.cbf_qp.ExpCBFQP and (post-Jacobian) by the per-joint clip of concerto.safety.emergency.JacobianEmergencyController.
  • cartesian_accel_capacity: L2 magnitude cap on the Cartesian acceleration. It is the semantic source of the per-agent alpha in the Wang-Ames-Egerstedt 2017 §III barrier value (alpha_pair = 2 * cartesian_accel_capacity for symmetric agents) and of the saturation step in concerto.safety.emergency.CartesianAccelEmergencyController.

For a d-dimensional action the two are related but not equal: an action drawn from the L-infinity envelope has worst-case L2 magnitude sqrt(d) * action_linf_component, attained at a corner of the box. The conservative operator pattern is cartesian_accel_capacity = action_linf_component, which matches the pre-split semantics under the safe-operator workaround named in the 2026-05-16 docstring. A less conservative deployment sets cartesian_accel_capacity = sqrt(d) * action_linf_component to use the full envelope. Either pin is the operator's call; the safety stack treats the two fields as independent inputs.

Attributes:

  • action_linf_component (float) –

    Per-component L-infinity bound enforced by the CBF-QP outer filter (concerto.safety.cbf_qp.ExpCBFQP) and as the post-Jacobian per-joint clip in concerto.safety.emergency.JacobianEmergencyController.

  • cartesian_accel_capacity (float) –

    L2 magnitude cap on Cartesian acceleration. Source of the per-agent alpha consumed by the conformal-barrier alpha_pair and the saturation step in concerto.safety.emergency.CartesianAccelEmergencyController.

  • action_rate (float) –

    Maximum $\lVert u_k - u_{k-1} \rVert_2$ per agent, per task. Bounds the actuator-rate slack under which the conformal CBF must remain feasible (ADR-006 §Decision).

  • comm_latency_ms (float) –

    Default mean comm latency, in milliseconds. Set from the URLLC profile selected for the run (ADR-006 §Consequences; chamber.comm.URLLC_3GPP_R17).

  • force_limit (float) –

    Global force threshold for the per-pair contact-force constraint (ADR-004 Open Question #5; per-vendor handler slots in once Stage-3 SA resolves the decomposition).
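The field-split paragraph above describes two operator pins for cartesian_accel_capacity, which the sketch below makes concrete. BoundsSketch is a hypothetical local stand-in that mirrors the five Bounds fields. All numeric values are made up for illustration. dataclasses.replace derives the second pin because the dataclass is frozen.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BoundsSketch:
    """Hypothetical local stand-in mirroring the five fields of Bounds."""

    action_linf_component: float
    cartesian_accel_capacity: float
    action_rate: float
    comm_latency_ms: float
    force_limit: float


def worst_case_l2(action_linf_component: float, d: int) -> float:
    """Largest L2 norm of a d-vector with |u[k]| <= action_linf_component for every k."""
    return math.sqrt(d) * action_linf_component


d = 2  # planar double integrator
# Conservative pin: the L2 ball of radius action_linf_component is inscribed in the box.
conservative = BoundsSketch(
    action_linf_component=1.5,
    cartesian_accel_capacity=1.5,
    action_rate=0.5,
    comm_latency_ms=1.0,
    force_limit=50.0,
)
# Full-envelope pin: the L2 cap reaches the corners of the box.
full_envelope = replace(
    conservative,
    cartesian_accel_capacity=worst_case_l2(conservative.action_linf_component, d),
)
```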

Source code in src/concerto/safety/api.py
@dataclass(frozen=True)
class Bounds:
    r"""Per-task numeric bounds for the safety stack (ADR-006 §Decision Option C).

    The ``Bounds`` envelope is the explicit-numeric half of ADR-006's hybrid
    decision: bounds gate QP feasibility and reportability while the
    conformal slack (:class:`SafetyState`) governs online adaptation. The
    enumerated values track ADR-006 §Consequences — the comm-latency bound
    is anchored to the URLLC-3GPP-R17 sweep table from M2;
    ``action_linf_component``, ``cartesian_accel_capacity``, and
    ``action_rate`` are task-specific (Phase-1 fills them; Phase-0 ships
    sane defaults). The ``force_limit`` field is a Phase-0 stub for
    ADR-007 Open Question #4 (per-vendor decomposition, ADR-004 Open
    Question #5) — a strategy-interfaced per-vendor handler slots in once
    Stage-3 SA resolves the open question.

    **Field split (issue #146, closed by P1.02; 2026-05-17).** The prior
    single ``action_norm`` field was consumed by two safety layers with
    mismatched semantics (L-infinity per-component in the CBF-QP outer
    filter; L2 magnitude cap in the Cartesian emergency controller). The
    field is now split into two explicit fields, each named for its
    semantics:

    - ``action_linf_component`` — per-component L-infinity bound on the
      action vector (``|u[k]| <= action_linf_component`` for every
      component ``k``). Enforced by
      :class:`concerto.safety.cbf_qp.ExpCBFQP`'s action-space bound rows
      and (post-Jacobian) by :class:`~concerto.safety.emergency.JacobianEmergencyController`'s
      per-joint clip.
    - ``cartesian_accel_capacity`` — L2 magnitude cap on the Cartesian
      acceleration. The semantic source of the per-agent ``alpha`` in
      the Wang-Ames-Egerstedt 2017 §III barrier value (``alpha_pair =
      2 * cartesian_accel_capacity`` for symmetric agents) and of the
      :class:`~concerto.safety.emergency.CartesianAccelEmergencyController`'s
      saturation step.

    For a ``d``-dimensional action the two are related but not equal:
    an action drawn from the L-infinity envelope has worst-case L2
    magnitude ``sqrt(d) * action_linf_component``. The conservative
    operator pattern is ``cartesian_accel_capacity =
    action_linf_component`` (matches the pre-split semantics under the
    safe-operator workaround the 2026-05-16 docstring named); a less
    conservative deployment sets ``cartesian_accel_capacity =
    sqrt(d) * action_linf_component`` to use the full envelope. Either
    pin is the operator's call; the safety stack treats the two fields
    as independent inputs.

    Attributes:
        action_linf_component: Per-component L-infinity bound enforced
            by the CBF-QP outer filter
            (:class:`concerto.safety.cbf_qp.ExpCBFQP`) and as the
            post-Jacobian per-joint clip in
            :class:`~concerto.safety.emergency.JacobianEmergencyController`.
        cartesian_accel_capacity: L2 magnitude cap on Cartesian
            acceleration. Source of the per-agent ``alpha`` consumed
            by the conformal-barrier ``alpha_pair`` and the saturation
            step in
            :class:`~concerto.safety.emergency.CartesianAccelEmergencyController`.
        action_rate: Maximum :math:`\lVert u_k - u_{k-1} \rVert_2` per
            agent, per task. Bounds the actuator-rate slack the conformal
            CBF must remain feasible under (ADR-006 §Decision).
        comm_latency_ms: Default mean comm latency, in milliseconds. Set
            from the URLLC profile selected for the run (ADR-006
            §Consequences; ``chamber.comm.URLLC_3GPP_R17``).
        force_limit: Global force threshold for the per-pair contact-force
            constraint (ADR-004 Open Question #5; per-vendor handler slots
            in once Stage-3 SA resolves the decomposition).
    """

    action_linf_component: float
    cartesian_accel_capacity: float
    action_rate: float
    comm_latency_ms: float
    force_limit: float

CartesianAccelEmergencyController

Cartesian-acceleration emergency override (ADR-004 risk-mitigation #1).

Default EmergencyController for double-integrator agents whose control dimension equals the Cartesian dimension (2D and 3D point masses; the toy crossing example in Wang-Ames-Egerstedt 2017 §V). The aggregation rule is:

  1. Sum the per-pair Cartesian repulsion unit vectors.
  2. Saturate the result to bounds.cartesian_accel_capacity in the net push-apart direction: when the sum has non-trivial magnitude, rescale it to bounds.cartesian_accel_capacity (max emergency push); when it cancels out, return zero.
  3. Return the saturated vector — its shape matches agent_state.position.shape by construction.

Single-pair case (one dangerous partner): the sum is a single unit vector with norm 1; the output magnitude is bounds.cartesian_accel_capacity, matching the original toy-crossing contract. Two opposing pairs (uid squeezed in the middle): the sum cancels to ~zero and the output is a zero action — there is no preferred direction to push in, and a spurious push would be worse than a no-op. Two non-opposing pairs: the sum points in the net escape direction and the output is bounds.cartesian_accel_capacity along that direction.

Not correct for 7-DOF manipulators: see JacobianEmergencyController and the ADR-007 Stage-1 AS spike scope.
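The three cases above can be reproduced with a standalone restatement of the sum-and-saturate rule. The sketch mirrors the compute_override logic in src/concerto/safety/emergency.py. The function name sum_and_saturate and the floor value are hypothetical, since the real _AGGREGATE_NORM_FLOOR constant is not shown on this page.

```python
import numpy as np

# Hypothetical value; the real _AGGREGATE_NORM_FLOOR is defined elsewhere in emergency.py.
AGGREGATE_NORM_FLOOR = 1e-9


def sum_and_saturate(position: np.ndarray, repulsions: list[np.ndarray], capacity: float) -> np.ndarray:
    """Sum unit repulsion vectors, then rescale to ``capacity`` (or return zero)."""
    if not repulsions:
        return np.zeros_like(position, dtype=np.float64)
    aggregate = np.sum(np.stack(repulsions, axis=0).astype(np.float64), axis=0)
    norm = float(np.linalg.norm(aggregate))
    if norm < AGGREGATE_NORM_FLOOR:
        # Opposing pushes cancel: no preferred direction, so no push at all.
        return np.zeros_like(position, dtype=np.float64)
    return aggregate / norm * capacity


pos = np.zeros(2)
cap = 2.0  # plays the role of bounds.cartesian_accel_capacity
single = sum_and_saturate(pos, [np.array([1.0, 0.0])], cap)                           # one partner
squeezed = sum_and_saturate(pos, [np.array([1.0, 0.0]), np.array([-1.0, 0.0])], cap)  # opposing pair
corner = sum_and_saturate(pos, [np.array([1.0, 0.0]), np.array([0.0, 1.0])], cap)     # non-opposing pair
```

The non-opposing case lands on cap * [1, 1] / sqrt(2). The output has full capacity along the net escape direction, rather than the sqrt(2) * cap magnitude that an unsaturated sum of two scaled pushes would give.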

Source code in src/concerto/safety/emergency.py
class CartesianAccelEmergencyController:
    """Cartesian-acceleration emergency override (ADR-004 risk-mitigation #1).

    Default :class:`EmergencyController` for double-integrator agents
    whose control dimension equals the Cartesian dimension (2D and 3D
    point masses; the toy crossing example in Wang-Ames-Egerstedt 2017
    §V). The aggregation rule is:

    1. Sum the per-pair Cartesian repulsion unit vectors.
    2. Saturate the result to ``bounds.cartesian_accel_capacity`` in
       the *net* push-apart direction: when the sum has non-trivial
       magnitude, rescale it to ``bounds.cartesian_accel_capacity``
       (max emergency push); when it cancels out, return zero.
    3. Return the saturated vector — its shape matches
       ``agent_state.position.shape`` by construction.

    Single-pair case (one dangerous partner): the sum is a single unit
    vector with norm 1; the output magnitude is
    ``bounds.cartesian_accel_capacity``, matching the original
    toy-crossing contract. Two opposing pairs (uid squeezed in the
    middle): the sum cancels to ~zero and the output is a zero action
    — there is no preferred direction to push in, and a spurious push
    would be worse than a no-op. Two non-opposing pairs: the sum
    points in the net escape direction and the output is
    ``bounds.cartesian_accel_capacity`` along that direction.

    Not correct for 7-DOF manipulators: see
    :class:`JacobianEmergencyController` and the ADR-007 Stage-1 AS
    spike scope.
    """

    def compute_override(
        self,
        agent_state: AgentSnapshot,
        pairwise_repulsion_vectors: list[FloatArray],
        bounds: Bounds,
    ) -> FloatArray:
        """Sum-and-saturate Cartesian repulsion vectors (ADR-004 risk-mitigation #1).

        Args:
            agent_state: This uid's snapshot; only its ``position.shape``
                is used (the override's output shape).
            pairwise_repulsion_vectors: At least one Cartesian unit
                vector pointing away from each dangerous partner.
            bounds: Per-task :class:`Bounds`; the saturation magnitude
                is ``bounds.cartesian_accel_capacity`` (the L2 magnitude
                cap on Cartesian acceleration). Post-P1.02 the field
                split closes the prior L-infinity / L2 ambiguity that
                lived on the unified ``action_norm`` field — the
                per-component L-infinity envelope now lives on
                ``bounds.action_linf_component`` and is consumed only
                by the CBF-QP outer filter and (post-Jacobian) by
                :class:`JacobianEmergencyController`.

        Returns:
            Saturated Cartesian acceleration along the net push-apart
            direction with magnitude ``bounds.cartesian_accel_capacity``;
            or zero when the pairwise vectors cancel out. Shape matches
            ``agent_state.position.shape``, dtype ``float64``.
        """
        if not pairwise_repulsion_vectors:
            return np.zeros_like(agent_state.position, dtype=np.float64)

        aggregate = np.sum(
            np.stack(pairwise_repulsion_vectors, axis=0).astype(np.float64, copy=False),
            axis=0,
        )
        norm = float(np.linalg.norm(aggregate))
        if norm < _AGGREGATE_NORM_FLOOR:
            return np.zeros_like(agent_state.position, dtype=np.float64)
        return ((aggregate / norm) * bounds.cartesian_accel_capacity).astype(np.float64, copy=False)

compute_override

compute_override(
    agent_state: AgentSnapshot,
    pairwise_repulsion_vectors: list[FloatArray],
    bounds: Bounds,
) -> FloatArray

Sum-and-saturate Cartesian repulsion vectors (ADR-004 risk-mitigation #1).

Parameters:

  • agent_state (AgentSnapshot) –

    This uid's snapshot; only its position.shape is used (the override's output shape).

  • pairwise_repulsion_vectors (list[FloatArray]) –

    One Cartesian unit vector per dangerous partner, each pointing away from that partner. An empty list yields a zero override.

  • bounds (Bounds) –

    Per-task Bounds; the saturation magnitude is bounds.cartesian_accel_capacity (the L2 magnitude cap on Cartesian acceleration). Since P1.02, the field split closes the prior L-infinity / L2 ambiguity of the unified action_norm field: the per-component L-infinity envelope now lives on bounds.action_linf_component and is consumed only by the CBF-QP outer filter and (post-Jacobian) by JacobianEmergencyController.

Returns:

  • FloatArray

    Saturated Cartesian acceleration along the net push-apart direction with magnitude bounds.cartesian_accel_capacity, or zero when the pairwise vectors cancel out or the list is empty. Shape matches agent_state.position.shape, dtype float64.

Source code in src/concerto/safety/emergency.py
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
def compute_override(
    self,
    agent_state: AgentSnapshot,
    pairwise_repulsion_vectors: list[FloatArray],
    bounds: Bounds,
) -> FloatArray:
    """Sum-and-saturate Cartesian repulsion vectors (ADR-004 risk-mitigation #1).

    Args:
        agent_state: This uid's snapshot; only its ``position.shape``
            is used (the override's output shape).
        pairwise_repulsion_vectors: At least one Cartesian unit
            vector pointing away from each dangerous partner.
        bounds: Per-task :class:`Bounds`; the saturation magnitude
            is ``bounds.cartesian_accel_capacity`` (the L2 magnitude
            cap on Cartesian acceleration). Post-P1.02 the field
            split closes the prior L-infinity / L2 ambiguity that
            lived on the unified ``action_norm`` field — the
            per-component L-infinity envelope now lives on
            ``bounds.action_linf_component`` and is consumed only
            by the CBF-QP outer filter and (post-Jacobian) by
            :class:`JacobianEmergencyController`.

    Returns:
        Saturated Cartesian acceleration along the net push-apart
        direction with magnitude ``bounds.cartesian_accel_capacity``;
        or zero when the pairwise vectors cancel out. Shape matches
        ``agent_state.position.shape``, dtype ``float64``.
    """
    if not pairwise_repulsion_vectors:
        return np.zeros_like(agent_state.position, dtype=np.float64)

    aggregate = np.sum(
        np.stack(pairwise_repulsion_vectors, axis=0).astype(np.float64, copy=False),
        axis=0,
    )
    norm = float(np.linalg.norm(aggregate))
    if norm < _AGGREGATE_NORM_FLOOR:
        return np.zeros_like(agent_state.position, dtype=np.float64)
    return ((aggregate / norm) * bounds.cartesian_accel_capacity).astype(np.float64, copy=False)
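To check expected outputs without building Bounds or AgentSnapshot objects, the sum-and-saturate rule above can be sketched as a standalone NumPy function. This is an illustrative re-implementation, not the library entry point: the capacity is passed as a plain float, and the cancellation floor of 1e-12 is an assumed stand-in for the module-private _AGGREGATE_NORM_FLOOR, whose value is not shown on this page.

```python
import numpy as np

# Assumed stand-in for the module-private _AGGREGATE_NORM_FLOOR (actual value not shown).
NORM_FLOOR = 1e-12


def sum_and_saturate(
    position: np.ndarray, repulsions: list[np.ndarray], capacity: float
) -> np.ndarray:
    """Sum unit repulsion vectors, then rescale the sum to L2 magnitude `capacity`."""
    if not repulsions:
        return np.zeros_like(position, dtype=np.float64)
    aggregate = np.sum(np.stack(repulsions, axis=0).astype(np.float64), axis=0)
    norm = float(np.linalg.norm(aggregate))
    if norm < NORM_FLOOR:
        # Opposing pushes cancel: there is no net escape direction, so no override.
        return np.zeros_like(position, dtype=np.float64)
    # Keep only the direction of the net push, then apply the full capacity.
    return (aggregate / norm) * capacity


origin = np.zeros(2)
east, north, west = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 0.0])
print(sum_and_saturate(origin, [east, north], capacity=2.0))  # → [1.41421356 1.41421356]
print(sum_and_saturate(origin, [east, west], capacity=2.0))   # → [0. 0.]
```

The output magnitude is always the full capacity, however many partners contribute: additional dangerous pairs change the direction of the override, not its strength.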

ConcertoSafetyInfeasible

Bases: RuntimeError

Raised when the safety QP is infeasible (ADR-004 §Decision; ADR-006 §Risks).

The conformal CBF-QP can become infeasible when partner-induced constraints conflict with within-robot OSCBF constraints under aggressive actuator limits — exactly the open problem named in Garg et al. 2024 §7.3 and tracked as ADR-006 R5. Callers MUST route to the braking fallback (ADR-004 risk-mitigation #1) on this error; the conformal QP is not a valid recovery path because Theorem 3 (Huriot & Sibai 2025) only bounds the average loss.

Source code in src/concerto/safety/errors.py
class ConcertoSafetyInfeasible(RuntimeError):  # noqa: N818
    """Raised when the safety QP is infeasible (ADR-004 §Decision; ADR-006 §Risks).

    The conformal CBF-QP can become infeasible when partner-induced
    constraints conflict with within-robot OSCBF constraints under
    aggressive actuator limits — exactly the open problem named in
    Garg et al. 2024 §7.3 and tracked as ADR-006 R5. Callers MUST route
    to the braking fallback (ADR-004 risk-mitigation #1) on this error;
    the conformal QP is not a valid recovery path because Theorem 3
    (Huriot & Sibai 2025) only bounds the *average* loss.
    """

DoubleIntegratorControlModel dataclass

Identity action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A §Reduction).

Default :class:AgentControlModel for double-integrator agents whose action space coincides with Cartesian acceleration (the Wang-Ames-Egerstedt 2017 §V toy crossing and every Phase-0 homogeneous test). The Jacobian J_i = d cartesian_accel / d action is the identity, so the CBF row coefficients projected through this model are element-wise identical to the rows the pre-refactor _pair_constraint_row emitted. This is the migration shim: tests that construct homogeneous 2-D pairs pass a :class:DoubleIntegratorControlModel per uid and run unchanged through :class:ExpCBFQP in :attr:SafetyMode.CENTRALIZED mode.

Attributes:

  • uid (str) –

    Unique identifier of the agent.

  • action_dim (int) –

    Dimensionality of the agent's action vector (equal to position_dim by construction).
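The claim that rows projected through this model are element-wise identical to the pre-refactor rows follows from linearity: a Cartesian constraint a · accel ≥ b becomes (a J) · u ≥ b once accel = J u, and a J = a when J is the identity. The sketch below checks this with NumPy; project_cbf_row is a hypothetical helper, not a concerto function, and the random 3×7 Jacobian only stands in for a 7-DOF arm.

```python
import numpy as np


def project_cbf_row(a_cart: np.ndarray, jacobian: np.ndarray) -> np.ndarray:
    """Map a Cartesian CBF row a (constraint a @ accel >= b) into action space: a @ J."""
    return a_cart @ jacobian


a = np.array([0.6, -0.8])  # example Cartesian constraint row for a 2-D pair

# Double integrator: d cartesian_accel / d action is the identity, so the row is unchanged.
assert np.allclose(project_cbf_row(a, np.eye(2)), a)

# A Jacobian of shape (position_dim, action_dim) = (3, 7) turns a 3-vector row into a 7-vector row.
J_arm = np.random.default_rng(0).standard_normal((3, 7))
print(project_cbf_row(np.array([0.0, 0.0, 1.0]), J_arm).shape)  # → (7,)
```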

Source code in src/concerto/safety/api.py
@dataclass(frozen=True)
class DoubleIntegratorControlModel:
    """Identity action ↔ Cartesian-acceleration map (ADR-004 §Decision; spike_004A §Reduction).

    Default :class:`AgentControlModel` for double-integrator agents
    whose action space coincides with Cartesian acceleration (the
    Wang-Ames-Egerstedt 2017 §V toy crossing and every Phase-0
    homogeneous test). The Jacobian ``J_i = d cartesian_accel /
    d action`` is the identity, so the CBF row coefficients projected
    through this model are element-wise identical to the rows the
    pre-refactor ``_pair_constraint_row`` emitted. This is the
    migration shim: tests that construct homogeneous 2-D pairs pass
    a :class:`DoubleIntegratorControlModel` per uid and run unchanged
    through :class:`ExpCBFQP` in :attr:`SafetyMode.CENTRALIZED` mode.

    Attributes:
        uid: Unique identifier of the agent.
        action_dim: Dimensionality of the agent's action vector
            (equal to ``position_dim`` by construction).
    """

    uid: str
    action_dim: int

    @property
    def position_dim(self) -> int:
        """Cartesian dimension of the safety body (ADR-004 §Decision).

        For a double integrator the action and position spaces have
        the same dimension by definition.
        """
        return self.action_dim

    def action_to_cartesian_accel(self, state: AgentSnapshot, action: FloatArray) -> FloatArray:
        """Return ``action`` unchanged (ADR-004 §Decision; spike_004A §Reduction).

        The double-integrator identity map.
        """
        del state
        if action.shape != (self.action_dim,):
            msg = (
                f"action shape {action.shape} mismatches "
                f"DoubleIntegratorControlModel(uid={self.uid!r}).action_dim={self.action_dim}"
            )
            raise ValueError(msg)
        return action.astype(np.float64, copy=False)

    def cartesian_accel_to_action(
        self, state: AgentSnapshot, cartesian_accel: FloatArray
    ) -> FloatArray:
        """Return ``cartesian_accel`` unchanged (ADR-004 §Decision; spike_004A §Reduction).

        The double-integrator identity right-inverse.
        """
        del state
        if cartesian_accel.shape != (self.action_dim,):
            msg = (
                f"cartesian_accel shape {cartesian_accel.shape} mismatches "
                f"DoubleIntegratorControlModel(uid={self.uid!r}).position_dim={self.action_dim}"
            )
            raise ValueError(msg)
        return cartesian_accel.astype(np.float64, copy=False)

    def max_cartesian_accel(self, bounds: Bounds) -> float:
        """Return ``bounds.cartesian_accel_capacity`` (ADR-004 §Decision; spike_004A).

        For a double integrator the L2 magnitude cap *is* the Cartesian
        acceleration capacity by definition of the embodiment. The
        Wang-Ames-Egerstedt 2017 §III barrier formula's ``alpha`` is a
        Cartesian acceleration capacity (it appears inside
        ``sqrt(2 * alpha * ...)``); pre-split this read
        ``bounds.action_norm`` under the operator pattern
        ``action_norm = capacity / sqrt(d)``, but the post-split field
        names this semantic explicitly (ADR-004 §Revision history
        2026-05-17 + ADR-INDEX footnote (a)).
        """
        return float(bounds.cartesian_accel_capacity)

position_dim property

position_dim: int

Cartesian dimension of the safety body (ADR-004 §Decision).

For a double integrator the action and position spaces have the same dimension by definition.

action_to_cartesian_accel

action_to_cartesian_accel(
    state: AgentSnapshot, action: FloatArray
) -> FloatArray

Return action unchanged (ADR-004 §Decision; spike_004A §Reduction).

The double-integrator identity map.


cartesian_accel_to_action

cartesian_accel_to_action(
    state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray

Return cartesian_accel unchanged (ADR-004 §Decision; spike_004A §Reduction).

The double-integrator identity right-inverse.
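Because both maps are the identity, cartesian_accel_to_action is an exact right-inverse: mapping a Cartesian acceleration to an action and back returns it unchanged, and a wrongly shaped input is rejected with ValueError. The sketch below reproduces that contract in a hypothetical stand-alone class (IdentityModel is not part of concerto) so it runs without the package; the AgentSnapshot argument is replaced by an unused state parameter.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class IdentityModel:
    """Hypothetical stand-in for the double-integrator identity map."""

    uid: str
    action_dim: int

    def _checked(self, x: np.ndarray, name: str) -> np.ndarray:
        # Mirrors the documented shape check: only (action_dim,) vectors are accepted.
        if x.shape != (self.action_dim,):
            raise ValueError(f"{name} shape {x.shape} mismatches action_dim={self.action_dim}")
        return x.astype(np.float64, copy=False)

    def action_to_cartesian_accel(self, state: object, action: np.ndarray) -> np.ndarray:
        return self._checked(action, "action")

    def cartesian_accel_to_action(self, state: object, cartesian_accel: np.ndarray) -> np.ndarray:
        return self._checked(cartesian_accel, "cartesian_accel")


model = IdentityModel(uid="robot_0", action_dim=2)
accel = np.array([0.3, -1.2])
round_trip = model.action_to_cartesian_accel(None, model.cartesian_accel_to_action(None, accel))
assert np.array_equal(round_trip, accel)

try:
    model.action_to_cartesian_accel(None, np.zeros(3))
except ValueError as err:
    print(err)  # → action shape (3,) mismatches action_dim=2
```

Note that astype(np.float64, copy=False) returns the input array itself when it is already float64, so callers should not mutate the result in place expecting the original to stay untouched.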


max_cartesian_accel

max_cartesian_accel(bounds: Bounds) -> float

Return bounds.cartesian_accel_capacity (ADR-004 §Decision; spike_004A).

For a double integrator, the L2 magnitude cap is the Cartesian acceleration capacity by definition of the embodiment. The alpha in the Wang-Ames-Egerstedt 2017 §III barrier formula is a Cartesian acceleration capacity (it appears inside sqrt(2 * alpha * ...)). Before the field split, this method read bounds.action_norm under the operator convention action_norm = capacity / sqrt(d); the post-split field name, cartesian_accel_capacity, states this meaning explicitly (ADR-004 §Revision history 2026-05-17; ADR-INDEX footnote (a)).
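The pre-split convention action_norm = capacity / sqrt(d) is the per-component (L-infinity) bound that guarantees the L2 cap: if each of the d components satisfies |u_k| ≤ c / sqrt(d), then ||u||_2 ≤ sqrt(d · c²/d) = c, with equality at the corners of the box. A quick numeric check follows; the capacity and dimension are arbitrary example values.

```python
import numpy as np

capacity, d = 2.0, 3
linf_component = capacity / np.sqrt(d)  # the old action_norm under the operator convention

# Sample actions uniformly inside the per-component box and check the L2 cap holds.
rng = np.random.default_rng(0)
samples = rng.uniform(-linf_component, linf_component, size=(10_000, d))
assert np.all(np.linalg.norm(samples, axis=1) <= capacity + 1e-12)

# A box corner attains the cap (up to rounding), so the conversion is tight.
corner = np.full(d, linf_component)
assert np.isclose(np.linalg.norm(corner), capacity)
```

The converse does not hold: an L2 cap of c still allows a single component as large as c, which is why the split keeps cartesian_accel_capacity (L2) and action_linf_component (per-component) as separate fields.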


EgoOnlySafetyFilter

Bases: Protocol

Contract for an :attr:SafetyMode.EGO_ONLY safety-filter pipeline (ADR-004 §Public API).

The deployment / ad-hoc / black-box-partner mode (ADR-006 §Decision; reviewer P0-3). The QP decides only the ego agent's action; the partner's motion enters as a predicted disturbance on the constraint right-hand side. The structural difference from :class:JointSafetyFilter is wire-visible: proposed_action is a single :class:FloatArray (the ego's nominal control), and the return value's first element is likewise an array — never a dict.

Implementations compose the three layers from ADR-004 — exp CBF-QP backbone (Wang-Ames-Egerstedt 2017), conformal slack overlay (Huriot & Sibai 2025), OSCBF inner filter (Morton & Pavone 2025) — plus the hard braking fallback (ADR-004 risk-mitigation #1) into a single propose-check-replan boundary that wraps any nominal controller without accessing its internals (the property the black-box-partner setting requires per ADR-006 §Decision).

The canonical implementation is :meth:concerto.safety.cbf_qp.ExpCBFQP.ego_only; third-party plugin filters that target ego-only deployment SHOULD subclass or structurally conform to this Protocol so that static checkers can catch shape mismatches at construction time (spike_004A §Three-mode taxonomy; reviewer P0-2).
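The class is declared with @runtime_checkable in the source listing, so isinstance checks are possible, but they only confirm that the named methods exist; argument shapes and return types are enforced only by a static checker such as mypy or pyright. The sketch below shows this with a hypothetical, trimmed-down protocol (EgoFilterLike and its simplified filter signature are illustrative, not the documented one).

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class EgoFilterLike(Protocol):
    """Hypothetical, trimmed-down analogue of EgoOnlySafetyFilter."""

    def reset(self, *, seed: int | None = None) -> None: ...

    def filter(self, proposed_action: np.ndarray) -> tuple[np.ndarray, dict]: ...


class PassThroughFilter:
    """Conforms structurally: no inheritance from the Protocol is needed."""

    def reset(self, *, seed: int | None = None) -> None:
        pass

    def filter(self, proposed_action: np.ndarray) -> tuple[np.ndarray, dict]:
        return proposed_action, {}


class WrongShapeFilter:
    def reset(self, *, seed: int | None = None) -> None:
        pass

    def filter(self, proposed_action: dict) -> dict:  # joint-style dict: wrong shape
        return proposed_action


print(isinstance(PassThroughFilter(), EgoFilterLike))  # → True
print(isinstance(WrongShapeFilter(), EgoFilterLike))   # → True: the runtime check ignores signatures
```

Catching dict-versus-array mismatches is therefore the job of the type checker at construction sites, which is why the docstring emphasises static checkers rather than runtime isinstance checks.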

Source code in src/concerto/safety/api.py
@runtime_checkable
class EgoOnlySafetyFilter(Protocol):
    """Contract for an :attr:`SafetyMode.EGO_ONLY` safety-filter pipeline (ADR-004 §Public API).

    The deployment / ad-hoc / black-box-partner mode (ADR-006 §Decision;
    reviewer P0-3). The QP decides only the ego agent's action; the
    partner's motion enters as a predicted disturbance on the
    constraint right-hand side. The structural difference from
    :class:`JointSafetyFilter` is wire-visible: ``proposed_action`` is a
    single :class:`FloatArray` (the ego's nominal control), and the
    return value's first element is likewise an array — never a dict.

    Implementations compose the three layers from ADR-004 — exp CBF-QP
    backbone (Wang-Ames-Egerstedt 2017), conformal slack overlay
    (Huriot & Sibai 2025), OSCBF inner filter (Morton & Pavone 2025) —
    plus the hard braking fallback (ADR-004 risk-mitigation #1) into a
    single propose-check-replan boundary that wraps any nominal
    controller without accessing its internals (the property the
    black-box-partner setting requires per ADR-006 §Decision).

    The canonical implementation is
    :meth:`concerto.safety.cbf_qp.ExpCBFQP.ego_only`; third-party
    plugin filters that target ego-only deployment SHOULD subclass /
    structurally-conform to this Protocol so static checkers can
    catch shape mismatches at construction time (spike_004A
    §Three-mode taxonomy; reviewer P0-2).
    """

    def reset(self, *, seed: int | None = None) -> None:
        """Reset filter state at episode start (ADR-004 §Decision).

        Args:
            seed: Optional root seed for the filter's deterministic RNG
                substream (P6). Implementations route this through
                ``concerto.training.seeding.derive_substream`` so two
                CPU runs with the same seed produce byte-identical
                outputs.
        """
        ...

    def filter(
        self,
        proposed_action: FloatArray,
        obs: dict[str, object],
        state: SafetyState,
        bounds: Bounds,
        *,
        ego_uid: str,
        partner_predicted_states: dict[str, AgentSnapshot],
        dt: float,
    ) -> tuple[FloatArray, FilterInfo]:
        """Project the nominal ego action onto the safe set (ADR-004 §Public API).

        Args:
            proposed_action: Ego nominal control input of shape
                ``(control_models[ego_uid].action_dim,)``. The QP
                minimises ``||u - u_hat||^2`` with this as the reference
                (Wang-Ames-Egerstedt 2017 eq. 9).
            obs: Observation dict; the filter reads ``obs["comm"]`` (M2
                contract — see ``chamber.comm.api.CommPacket``) and
                ``obs["meta"]["partner_id"]`` (M4 contract; ``None`` ⇒
                single-partner mode, no hard fail; identity change ⇒
                reset ``lambda`` to ``lambda_safe`` and enter warmup
                per ADR-004 risk-mitigation #2).
            state: Mutable :class:`SafetyState`; updated in place by the
                conformal layer each control step.
            bounds: Per-task :class:`Bounds` envelope (ADR-006 §Decision).
            ego_uid: Identifies which uid in ``obs["agent_states"]`` is
                the ego.
            partner_predicted_states: Per-partner-uid predicted
                :class:`concerto.safety.cbf_qp.AgentSnapshot` at the
                *next* control step, used to evaluate the partner's
                predicted Cartesian acceleration; this enters the
                constraint RHS as a known drift term.
            dt: Predictor lookahead horizon in seconds (typically the
                env's control-step duration). Used to convert the
                predicted velocity delta into a Cartesian acceleration
                ``(v_pred - v_now) / dt`` on the constraint RHS
                (ADR-004 §Decisions, "Predicted-acceleration units";
                external-review P0-1, 2026-05-16). Must be strictly
                positive and consistent with the value passed to the
                partner-trajectory predictor (e.g.
                :func:`concerto.safety.conformal.constant_velocity_predict`).

        Returns:
            A pair ``(safe_action, info)`` where ``safe_action`` is the
            QP-projected ego action and ``info`` is a
            :class:`FilterInfo` telemetry payload feeding the ADR-014
            three-table renderer.

        Raises:
            ConcertoSafetyInfeasible: When the QP is infeasible even after
                slack relaxation (ADR-004 §Decision; ADR-006 §Risks). The
                caller MUST route to the braking fallback in this case.
        """
        ...

filter

filter(
    proposed_action: FloatArray,
    obs: dict[str, object],
    state: SafetyState,
    bounds: Bounds,
    *,
    ego_uid: str,
    partner_predicted_states: dict[str, AgentSnapshot],
    dt: float,
) -> tuple[FloatArray, FilterInfo]

Project the nominal ego action onto the safe set (ADR-004 §Public API).

Parameters:

  • proposed_action (FloatArray) –

    Ego nominal control input of shape (control_models[ego_uid].action_dim,). The QP minimises ||u - u_hat||^2 with this as the reference (Wang-Ames-Egerstedt 2017 eq. 9).

  • obs (dict[str, object]) –

    Observation dict; the filter reads obs["comm"] (M2 contract — see chamber.comm.api.CommPacket) and obs["meta"]["partner_id"] (M4 contract; None ⇒ single-partner mode, no hard fail; identity change ⇒ reset lambda to lambda_safe and enter warmup per ADR-004 risk-mitigation #2).

  • state (SafetyState) –

    Mutable :class:SafetyState; updated in place by the conformal layer each control step.

  • bounds (Bounds) –

    Per-task :class:Bounds envelope (ADR-006 §Decision).

  • ego_uid (str) –

    Identifies which uid in obs["agent_states"] is the ego.

  • partner_predicted_states (dict[str, AgentSnapshot]) –

    Per-partner-uid predicted :class:concerto.safety.cbf_qp.AgentSnapshot at the next control step, used to evaluate the partner's predicted Cartesian acceleration; this enters the constraint RHS as a known drift term.

  • dt (float) –

    Predictor lookahead horizon in seconds (typically the env's control-step duration). Used to convert the predicted velocity delta into a Cartesian acceleration (v_pred - v_now) / dt on the constraint RHS (ADR-004 §Decisions, "Predicted-acceleration units"; external-review P0-1, 2026-05-16). Must be strictly positive and consistent with the value passed to the partner-trajectory predictor (e.g. :func:concerto.safety.conformal.constant_velocity_predict).
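The dt argument converts a predicted velocity change into the Cartesian acceleration that enters the constraint right-hand side as (v_pred - v_now) / dt. A minimal sketch with a hypothetical helper follows; it also enforces the strictly-positive requirement. A constant-velocity prediction leaves the velocity unchanged by definition, so under that predictor the drift term is zero.

```python
import numpy as np


def predicted_partner_accel(v_now: np.ndarray, v_pred: np.ndarray, dt: float) -> np.ndarray:
    """Finite-difference partner acceleration over the predictor lookahead dt."""
    if not dt > 0.0:  # also rejects NaN
        raise ValueError(f"dt must be strictly positive, got {dt!r}")
    return (np.asarray(v_pred, dtype=np.float64) - np.asarray(v_now, dtype=np.float64)) / dt


v_now = np.array([1.0, 0.0])
print(predicted_partner_accel(v_now, np.array([1.5, 0.0]), dt=0.25))  # → [2. 0.]
print(predicted_partner_accel(v_now, v_now, dt=0.25))                 # → [0. 0.]
```

The consistency requirement matters because the division uses whatever dt is passed: if the predictor looked 0.5 s ahead but the filter receives dt=0.25, the partner's acceleration is overstated by a factor of two.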

Returns:

  • tuple[FloatArray, FilterInfo] –

    A pair (safe_action, info) where safe_action is the QP-projected ego action and info is a :class:FilterInfo telemetry payload feeding the ADR-014 three-table renderer.

Raises:

  • ConcertoSafetyInfeasible

    When the QP is infeasible even after slack relaxation (ADR-004 §Decision; ADR-006 §Risks). The caller MUST route to the braking fallback in this case.


reset

reset(*, seed: int | None = None) -> None

Reset filter state at episode start (ADR-004 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional root seed for the filter's deterministic RNG substream (P6). Implementations route this through concerto.training.seeding.derive_substream so two CPU runs with the same seed produce byte-identical outputs.
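The project's derive_substream helper is not documented on this page, so its signature is not shown here. To illustrate the determinism property only, the sketch below derives a named, reproducible substream with NumPy's SeedSequence; the helper name and the CRC32 keying are assumptions, not concerto's implementation.

```python
import zlib

import numpy as np


def hypothetical_substream(root_seed: int, name: str) -> np.random.Generator:
    """Derive an independent, reproducible generator for a named component."""
    # A stable integer key per name (Python's built-in hash() of str is salted per process).
    key = zlib.crc32(name.encode("utf-8"))
    return np.random.default_rng(np.random.SeedSequence(root_seed, spawn_key=(key,)))


a = hypothetical_substream(7, "safety_filter").random(4)
b = hypothetical_substream(7, "safety_filter").random(4)
c = hypothetical_substream(7, "partner_predictor").random(4)
print(a.tobytes() == b.tobytes())  # → True: same seed and name give byte-identical draws
print(a.tobytes() == c.tobytes())  # → False: different names give separate streams
```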


EmergencyController

Bases: Protocol

Per-uid emergency-override hook (ADR-004 risk-mitigation #1).

Translates the aggregate per-uid repulsion (summed across every dangerous pair the uid is involved in) into a control-space override action. The braking fallback consults one controller per uid so heterogeneous embodiments (double-integrator base, 7-DOF arm, manipulator end-effector) can supply their own Cartesian-to-control mapping without :func:concerto.safety.braking.maybe_brake needing to know the embodiment class.

Implementations MUST be deterministic on identical inputs (P6; seeding contract) and MUST return an array whose shape matches the control-action shape the env expects for this uid — not necessarily the Cartesian shape. A 7-DOF arm controller returns a 7-vector; a 2D double-integrator controller returns a 2-vector.

Source code in src/concerto/safety/emergency.py
@runtime_checkable
class EmergencyController(Protocol):
    """Per-uid emergency-override hook (ADR-004 risk-mitigation #1).

    Translates the aggregate per-uid repulsion (summed across every
    dangerous pair the uid is involved in) into a control-space override
    action. The braking fallback consults one controller per uid so
    heterogeneous embodiments (double-integrator base, 7-DOF arm,
    manipulator end-effector) can supply their own Cartesian-to-control
    mapping without :func:`concerto.safety.braking.maybe_brake` needing
    to know the embodiment class.

    Implementations MUST be deterministic on identical inputs (P6;
    seeding contract) and MUST return an array whose shape matches the
    control-action shape the env expects for this uid — *not*
    necessarily the Cartesian shape. A 7-DOF arm controller returns a
    7-vector; a 2D double-integrator controller returns a 2-vector.
    """

    def compute_override(
        self,
        agent_state: AgentSnapshot,
        pairwise_repulsion_vectors: list[FloatArray],
        bounds: Bounds,
    ) -> FloatArray:
        """Return the embodiment-specific emergency override (ADR-004 risk-mitigation #1).

        Args:
            agent_state: This uid's :class:`AgentSnapshot` at the current
                control step. Implementations may read position and
                velocity to build a Jacobian or any embodiment-specific
                mapping. The Cartesian default ignores it (the repulsion
                vectors already carry the direction information).
            pairwise_repulsion_vectors: List of unit-magnitude Cartesian
                repulsion vectors, one per dangerous pair this uid is
                involved in (sign convention: each vector points *away*
                from the partner in that pair). The list has at least
                one element — uids with zero dangerous pairs are not
                routed through this hook.
            bounds: Per-task :class:`Bounds`; how each field caps the
                override is up to the implementation. The Cartesian
                controller uses ``bounds.cartesian_accel_capacity`` to
                L2-cap the saturated direction; the Jacobian controller
                additionally uses ``bounds.action_linf_component`` for
                a post-transform per-joint clip.

        Returns:
            Override action in this uid's control space, ``float64``.
        """
        ...

compute_override

compute_override(
    agent_state: AgentSnapshot,
    pairwise_repulsion_vectors: list[FloatArray],
    bounds: Bounds,
) -> FloatArray

Return the embodiment-specific emergency override (ADR-004 risk-mitigation #1).

Parameters:

  • agent_state (AgentSnapshot) –

    This uid's :class:AgentSnapshot at the current control step. Implementations may read position and velocity to build a Jacobian or any embodiment-specific mapping. The Cartesian default ignores it (the repulsion vectors already carry the direction information).

  • pairwise_repulsion_vectors (list[FloatArray]) –

    List of unit-magnitude Cartesian repulsion vectors, one per dangerous pair this uid is involved in (sign convention: each vector points away from the partner in that pair). The list has at least one element — uids with zero dangerous pairs are not routed through this hook.

  • bounds (Bounds) –

    Per-task :class:Bounds; how each field caps the override is up to the implementation. The Cartesian controller uses bounds.cartesian_accel_capacity to L2-cap the saturated direction; the Jacobian controller additionally uses bounds.action_linf_component for a post-transform per-joint clip.

Returns:

  • FloatArray

    Override action in this uid's control space, float64.
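The bounds description above implies a two-step recipe for an arm: cap the Cartesian push in L2 with cartesian_accel_capacity, map it into joint space, then clip each joint with action_linf_component. The sketch below is a hypothetical illustration of that recipe, not the JacobianEmergencyController source; the damped-least-squares mapping, the damping value, the 1e-12 cancellation floor, and the random 3×7 Jacobian are all assumptions.

```python
import numpy as np


def joint_space_override(
    repulsions: list[np.ndarray],
    jacobian: np.ndarray,
    cartesian_accel_capacity: float,
    action_linf_component: float,
    damping: float = 1e-2,
) -> np.ndarray:
    """Hypothetical arm override: L2-capped Cartesian push, DLS joint map, per-joint clip."""
    aggregate = np.sum(np.stack(repulsions, axis=0), axis=0).astype(np.float64)
    norm = float(np.linalg.norm(aggregate))
    if norm < 1e-12:  # opposing pushes cancel
        return np.zeros(jacobian.shape[1])
    cart = aggregate / norm * cartesian_accel_capacity
    # Damped least squares: J^T (J J^T + damping^2 I)^-1 cart, robust near singularities.
    jjt = jacobian @ jacobian.T + damping**2 * np.eye(jacobian.shape[0])
    joint = jacobian.T @ np.linalg.solve(jjt, cart)
    # Post-transform per-joint (L-infinity) clip.
    return np.clip(joint, -action_linf_component, action_linf_component)


J = np.random.default_rng(0).standard_normal((3, 7))
u = joint_space_override([np.array([1.0, 0.0, 0.0])], J, 2.0, 0.5)
print(u.shape)  # → (7,): the override lives in joint space, not Cartesian space
```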


JacobianControlModel dataclass

Jacobian-aware action ↔ Cartesian map skeleton (ADR-004 §Decision; ADR-007 Stage 1 AS).

The 7-DOF arm case for the AS axis. Takes a Jacobian callable jacobian_fn(state) -> J of shape (position_dim, action_dim), which maps actions to Cartesian accelerations, and uses a damped-least-squares pseudo-inverse for the right-inverse (Nakamura & Hanafusa 1986; the singularity-robust variant the OSCBF inner filter already relies on per Morton & Pavone 2025 §IV).

The Stage-1 AS spike will exercise this on the 7-DOF arm. Until then, the skeleton raises :class:NotImplementedError on :meth:action_to_cartesian_accel unless a Jacobian callable is supplied via jacobian_fn. The same loud-fail discipline that ADR-004 §Risks established for :class:concerto.safety.emergency.JacobianEmergencyController applies here: 7-DOF uids must not silently receive a Cartesian-shaped action by accident.

Attributes:

  • uid (str) –

    Unique identifier of the agent.

  • action_dim (int) –

    Dimensionality of the action vector (e.g. 7 for a Franka arm).

  • position_dim (int) –

    Cartesian dimension of the safety body (typically 3 for manipulation).

  • jacobian_fn (Callable[[AgentSnapshot], FloatArray] | None) –

    Optional callable mapping :class:AgentSnapshot to the Jacobian matrix of shape (position_dim, action_dim). When :data:None, both mapping methods raise :class:NotImplementedError.

  • damping (float) –

    Damped-least-squares parameter; default :data:DEFAULT_DLS_DAMPING.

  • max_cartesian_accel_value (float) –

    Pre-computed scalar capacity (kg-free acceleration units). Defaults to :data:numpy.nan so the placeholder fails loudly on attempted use; supply the embodiment-specific value at construction time.
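The damped-least-squares right-inverse named above is commonly written J^+ = J^T (J J^T + λ² I)^{-1}. With λ = 0 and J of full row rank it is an exact right-inverse (J J^+ = I); λ > 0 trades a small tracking error for bounded joint commands near singularities. Whether the damping field is λ or λ², and the value of DEFAULT_DLS_DAMPING, are not stated on this page; the sketch below assumes λ and uses example values.

```python
import numpy as np


def dls_right_inverse_apply(
    jacobian: np.ndarray, cartesian_accel: np.ndarray, damping: float
) -> np.ndarray:
    """Return J^T (J J^T + damping^2 I)^-1 x, the damped-least-squares action for target x."""
    m = jacobian.shape[0]
    gram = jacobian @ jacobian.T + damping**2 * np.eye(m)
    return jacobian.T @ np.linalg.solve(gram, cartesian_accel)


rng = np.random.default_rng(0)
J = rng.standard_normal((3, 7))  # stand-in for a (position_dim, action_dim) arm Jacobian
x = np.array([0.2, -0.1, 0.4])

# Well-conditioned J with small damping: J @ u reproduces the target almost exactly.
u = dls_right_inverse_apply(J, x, damping=1e-3)
print(np.allclose(J @ u, x, atol=1e-4))

# Near a singularity (two nearly parallel rows), damping keeps the action bounded.
J_sing = np.vstack([J[0], J[0] + 1e-6 * J[1], J[2]])
u_undamped = dls_right_inverse_apply(J_sing, x, damping=0.0)
u_damped = dls_right_inverse_apply(J_sing, x, damping=1e-2)
print(np.linalg.norm(u_damped) < np.linalg.norm(u_undamped))
```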

Source code in src/concerto/safety/api.py
@dataclass
class JacobianControlModel:
    """Jacobian-aware action ↔ Cartesian map skeleton (ADR-004 §Decision; ADR-007 Stage 1 AS).

    The 7-DOF arm case for the AS axis. Takes a Jacobian callable
    ``jacobian_fn(state) -> J`` (Cartesian-to-action shape
    ``(position_dim, action_dim)``) and a damped-least-squares
    pseudo-inverse for the right-inverse (Nakamura & Hanafusa 1986;
    the singularity-robust variant the OSCBF inner filter already
    relies on per Morton & Pavone 2025 §IV).

    Stage-1 AS spike will exercise this on the 7-DOF arm; the
    skeleton raises :class:`NotImplementedError` on
    :meth:`action_to_cartesian_accel` unless a Jacobian callable is
    supplied via ``jacobian_fn``. The same loud-fail discipline
    ADR-004 §Risks established for
    :class:`concerto.safety.emergency.JacobianEmergencyController`
    applies here: 7-DOF uids must not silently receive a Cartesian-
    shaped action by accident.

    Attributes:
        uid: Unique identifier of the agent.
        action_dim: Dimensionality of the action vector (e.g. 7 for
            a Franka arm).
        position_dim: Cartesian dimension of the safety body
            (typically 3 for manipulation).
        jacobian_fn: Optional callable mapping
            :class:`AgentSnapshot` to the Jacobian matrix of shape
            ``(position_dim, action_dim)``. When :data:`None`, both
            mapping methods raise :class:`NotImplementedError`.
        damping: Damped-least-squares parameter; default
            :data:`DEFAULT_DLS_DAMPING`.
        max_cartesian_accel_value: Pre-computed scalar capacity
            (kg-free acceleration units). Defaults to
            :data:`numpy.nan` so the placeholder fails loudly on
            attempted use; supply the embodiment-specific value at
            construction time.
    """

    uid: str
    action_dim: int
    position_dim: int
    jacobian_fn: Callable[[AgentSnapshot], FloatArray] | None = None
    damping: float = DEFAULT_DLS_DAMPING
    max_cartesian_accel_value: float = float("nan")

    def action_to_cartesian_accel(self, state: AgentSnapshot, action: FloatArray) -> FloatArray:
        """Apply ``J(state) @ action`` (ADR-004 §Decision; ADR-007 Stage 1 AS).

        Raises:
            NotImplementedError: When ``jacobian_fn`` was not supplied
                at construction time. Stage-1 AS spike delivers the
                concrete kinematic model; until then 7-DOF uids fail
                loudly per ADR-004 §Risks.
        """
        if self.jacobian_fn is None:
            msg = (
                f"JacobianControlModel(uid={self.uid!r}) has no jacobian_fn; "
                "Stage-1 AS spike (ADR-007 §Stage 1) supplies the kinematic "
                "model. Until then 7-DOF uids must not be routed through the "
                "CBF-QP."
            )
            raise NotImplementedError(msg)
        jac = np.asarray(self.jacobian_fn(state), dtype=np.float64)
        return (jac @ action.astype(np.float64, copy=False)).astype(np.float64, copy=False)

    def cartesian_accel_to_action(
        self, state: AgentSnapshot, cartesian_accel: FloatArray
    ) -> FloatArray:
        """Apply the damped-least-squares pseudo-inverse (ADR-004 §Decision; ADR-007 Stage 1 AS).

        ``J^T (J J^T + damping^2 I)^{-1} a`` — Nakamura & Hanafusa
        1986 singularity-robust right-inverse, identical in form to
        the OSCBF inner filter's Jacobian handling.

        Raises:
            NotImplementedError: When ``jacobian_fn`` was not supplied
                at construction time.
        """
        if self.jacobian_fn is None:
            msg = (
                f"JacobianControlModel(uid={self.uid!r}) has no jacobian_fn; "
                "Stage-1 AS spike (ADR-007 §Stage 1) supplies the kinematic "
                "model. Until then 7-DOF uids must not be routed through the "
                "CBF-QP."
            )
            raise NotImplementedError(msg)
        jac = np.asarray(self.jacobian_fn(state), dtype=np.float64)
        gram = jac @ jac.T + (self.damping * self.damping) * np.eye(
            self.position_dim, dtype=np.float64
        )
        rhs = cartesian_accel.astype(np.float64, copy=False)
        return (jac.T @ np.linalg.solve(gram, rhs)).astype(np.float64, copy=False)

    def max_cartesian_accel(self, bounds: Bounds) -> float:
        """Return the pre-computed Cartesian acceleration capacity (ADR-004 §Decision).

        Raises:
            ValueError: When ``max_cartesian_accel_value`` was left at
                its placeholder :data:`numpy.nan`. Stage-1 AS spike
                supplies the embodiment-specific value.
        """
        del bounds
        if not np.isfinite(self.max_cartesian_accel_value):
            msg = (
                f"JacobianControlModel(uid={self.uid!r}).max_cartesian_accel_value "
                "was not configured. Stage-1 AS spike (ADR-007 §Stage 1) supplies "
                "the embodiment-specific value."
            )
            raise ValueError(msg)
        return float(self.max_cartesian_accel_value)

action_to_cartesian_accel

action_to_cartesian_accel(
    state: AgentSnapshot, action: FloatArray
) -> FloatArray

Apply J(state) @ action (ADR-004 §Decision; ADR-007 Stage 1 AS).

Raises:

  • NotImplementedError

    When jacobian_fn was not supplied at construction time. Stage-1 AS spike delivers the concrete kinematic model; until then 7-DOF uids fail loudly per ADR-004 §Risks.

Source code in src/concerto/safety/api.py
def action_to_cartesian_accel(self, state: AgentSnapshot, action: FloatArray) -> FloatArray:
    """Apply ``J(state) @ action`` (ADR-004 §Decision; ADR-007 Stage 1 AS).

    Raises:
        NotImplementedError: When ``jacobian_fn`` was not supplied
            at construction time. Stage-1 AS spike delivers the
            concrete kinematic model; until then 7-DOF uids fail
            loudly per ADR-004 §Risks.
    """
    if self.jacobian_fn is None:
        msg = (
            f"JacobianControlModel(uid={self.uid!r}) has no jacobian_fn; "
            "Stage-1 AS spike (ADR-007 §Stage 1) supplies the kinematic "
            "model. Until then 7-DOF uids must not be routed through the "
            "CBF-QP."
        )
        raise NotImplementedError(msg)
    jac = np.asarray(self.jacobian_fn(state), dtype=np.float64)
    return (jac @ action.astype(np.float64, copy=False)).astype(np.float64, copy=False)
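A minimal standalone sketch of the forward map ``J(state) @ action`` follows. It assumes a hypothetical planar two-link arm (position_dim = 2, action_dim = 2) in place of a real 7-DOF embodiment; `planar_two_link_jacobian` and the link lengths are illustrative assumptions, not part of CONCERTO. The explicit shape check is this sketch's own addition; the listed method relies on NumPy's matmul to reject mismatched shapes.

```python
import numpy as np


def planar_two_link_jacobian(q1: float, q2: float, l1: float = 0.4, l2: float = 0.3) -> np.ndarray:
    """Hypothetical 2-link planar arm Jacobian, shape (position_dim=2, action_dim=2)."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array(
        [
            [-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12],
        ],
        dtype=np.float64,
    )


def action_to_cartesian_accel(jac: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Forward map mirroring the method above: J @ action, computed in float64."""
    jac = np.asarray(jac, dtype=np.float64)
    # Sketch-only guard: fail loudly if the action width does not match J's columns.
    if jac.shape[1] != action.shape[0]:
        raise ValueError(f"Jacobian expects action_dim={jac.shape[1]}, got {action.shape[0]}")
    return jac @ action.astype(np.float64, copy=False)


# Elbow bent at 90 degrees; drive only the shoulder joint.
J = planar_two_link_jacobian(0.0, np.pi / 2)
accel = action_to_cartesian_accel(J, np.array([1.0, 0.0]))
```

In a real deployment the callable would be bound to the uid via ``jacobian_fn`` so that the model, not the caller, owns the kinematics.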

cartesian_accel_to_action

cartesian_accel_to_action(
    state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray

Apply the damped-least-squares pseudo-inverse (ADR-004 §Decision; ADR-007 Stage 1 AS).

J^T (J J^T + damping^2 I)^{-1} a — Nakamura & Hanafusa 1986 singularity-robust right-inverse, identical in form to the OSCBF inner filter's Jacobian handling.

Raises:

  • NotImplementedError

    When jacobian_fn was not supplied at construction time.

Source code in src/concerto/safety/api.py
def cartesian_accel_to_action(
    self, state: AgentSnapshot, cartesian_accel: FloatArray
) -> FloatArray:
    """Apply the damped-least-squares pseudo-inverse (ADR-004 §Decision; ADR-007 Stage 1 AS).

    ``J^T (J J^T + damping^2 I)^{-1} a`` — Nakamura & Hanafusa
    1986 singularity-robust right-inverse, identical in form to
    the OSCBF inner filter's Jacobian handling.

    Raises:
        NotImplementedError: When ``jacobian_fn`` was not supplied
            at construction time.
    """
    if self.jacobian_fn is None:
        msg = (
            f"JacobianControlModel(uid={self.uid!r}) has no jacobian_fn; "
            "Stage-1 AS spike (ADR-007 §Stage 1) supplies the kinematic "
            "model. Until then 7-DOF uids must not be routed through the "
            "CBF-QP."
        )
        raise NotImplementedError(msg)
    jac = np.asarray(self.jacobian_fn(state), dtype=np.float64)
    gram = jac @ jac.T + (self.damping * self.damping) * np.eye(
        self.position_dim, dtype=np.float64
    )
    rhs = cartesian_accel.astype(np.float64, copy=False)
    return (jac.T @ np.linalg.solve(gram, rhs)).astype(np.float64, copy=False)
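The damped-least-squares right-inverse ``J^T (J J^T + damping^2 I)^{-1} a`` can be checked in isolation with NumPy. The sketch below uses a seeded random 3x7 matrix as a hypothetical stand-in for a 7-DOF arm Jacobian. It illustrates three properties of the formula: near-zero damping reproduces the Moore-Penrose minimum-norm solution, heavier damping shrinks the action norm at the cost of tracking accuracy, and a rank-deficient Jacobian still yields a finite action. Like the listed method, it uses `np.linalg.solve` instead of forming an explicit inverse.

```python
import numpy as np


def dls_right_inverse(jac: np.ndarray, cartesian_accel: np.ndarray, damping: float) -> np.ndarray:
    """Compute J^T (J J^T + damping^2 I)^{-1} a without an explicit matrix inverse."""
    jac = np.asarray(jac, dtype=np.float64)
    position_dim = jac.shape[0]
    gram = jac @ jac.T + (damping * damping) * np.eye(position_dim, dtype=np.float64)
    return jac.T @ np.linalg.solve(gram, np.asarray(cartesian_accel, dtype=np.float64))


# Hypothetical 3x7 Jacobian standing in for a 7-DOF arm (seeded for reproducibility).
rng = np.random.default_rng(0)
J = rng.standard_normal((3, 7))
a = np.array([0.5, -0.2, 0.1])

u_small = dls_right_inverse(J, a, damping=1e-6)  # approximately the pseudo-inverse solution
u_large = dls_right_inverse(J, a, damping=1.0)   # heavier damping gives a smaller ||u||

# Rank-deficient Jacobian (two identical rows): the damping term keeps the Gram matrix invertible.
J_sing = np.vstack([J[0], J[0], J[2]])
u_sing = dls_right_inverse(J_sing, a, damping=0.05)
```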

max_cartesian_accel

max_cartesian_accel(bounds: Bounds) -> float

Return the pre-computed Cartesian acceleration capacity (ADR-004 §Decision).

Raises:

  • ValueError

    When max_cartesian_accel_value was left at its placeholder :data:numpy.nan. Stage-1 AS spike supplies the embodiment-specific value.

Source code in src/concerto/safety/api.py
def max_cartesian_accel(self, bounds: Bounds) -> float:
    """Return the pre-computed Cartesian acceleration capacity (ADR-004 §Decision).

    Raises:
        ValueError: When ``max_cartesian_accel_value`` was left at
            its placeholder :data:`numpy.nan`. Stage-1 AS spike
            supplies the embodiment-specific value.
    """
    del bounds
    if not np.isfinite(self.max_cartesian_accel_value):
        msg = (
            f"JacobianControlModel(uid={self.uid!r}).max_cartesian_accel_value "
            "was not configured. Stage-1 AS spike (ADR-007 §Stage 1) supplies "
            "the embodiment-specific value."
        )
        raise ValueError(msg)
    return float(self.max_cartesian_accel_value)
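The NaN-sentinel pattern behind ``max_cartesian_accel_value`` can be reproduced without the rest of the class. The sketch below uses `CapacityPlaceholder`, a hypothetical stand-in. It keeps the field typed as `float` and treats any non-finite value as "not configured". Like the `np.isfinite` check in the listing, `math.isfinite` rejects infinities as well as NaN.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlaceholder:
    """Hypothetical stand-in showing the NaN-sentinel pattern used above."""

    uid: str
    max_cartesian_accel_value: float = float("nan")  # sentinel: unset until configured

    def max_cartesian_accel(self) -> float:
        # math.isfinite is False for NaN (the sentinel) and for +/-inf.
        if not math.isfinite(self.max_cartesian_accel_value):
            raise ValueError(f"{self.uid!r}: max_cartesian_accel_value was not configured")
        return float(self.max_cartesian_accel_value)


configured = CapacityPlaceholder("panda", 12.5)
```

A NaN default, rather than `None`, keeps the attribute a plain float. A NaN that slipped past the check would also propagate through any arithmetic rather than silently acting as a real bound.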

JointSafetyFilter

Bases: Protocol

Joint-action filter contract for CENTRALIZED / SHARED_CONTROL (ADR-004 §Public API).

The oracle / ablation baseline mode plus the lab-only co-control mode (ADR-014 three-table report upper-bound row; reviewer P0-3 mode 3). The QP decides every uid's action; per-uid slot widths come from control_models[uid].action_dim (no implicit "first uid's shape" fallback per reviewer P0-2). The structural difference from :class:EgoOnlySafetyFilter is wire-visible: proposed_action is a dict[uid, FloatArray], and the return value's first element is likewise a dict.

The canonical implementations are :meth:concerto.safety.cbf_qp.ExpCBFQP.centralized and :meth:concerto.safety.cbf_qp.ExpCBFQP.shared_control.
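Per-uid slot widths in the stacked joint decision vector could be derived as in the sketch below. It assumes that each uid's width is its control model's ``action_dim`` and that uids are stacked in lexicographic order. This page does not state the stacking order. `slot_slices` and `split_solution` are hypothetical helpers. Each slot width is read per uid, with no uid's shape reused for another.

```python
from typing import Dict, List


def slot_slices(action_dims: Dict[str, int]) -> Dict[str, slice]:
    """Assign each uid a contiguous slice of the stacked decision vector."""
    slices: Dict[str, slice] = {}
    offset = 0
    for uid in sorted(action_dims):  # assumption: lexicographic uid stacking order
        width = action_dims[uid]
        if width <= 0:
            raise ValueError(f"{uid!r}: action_dim must be positive, got {width}")
        slices[uid] = slice(offset, offset + width)
        offset += width
    return slices


def split_solution(u: List[float], slices: Dict[str, slice]) -> Dict[str, List[float]]:
    """Recover the per-uid dict from a flat joint solution vector."""
    total = sum(s.stop - s.start for s in slices.values())
    if len(u) != total:
        raise ValueError(f"expected {total} entries, got {len(u)}")
    return {uid: u[s] for uid, s in slices.items()}


# A 7-DOF arm next to a 2-D mobile base (hypothetical uids).
slices = slot_slices({"panda": 7, "fetch": 2})
parts = split_solution(list(range(9)), slices)
```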

Source code in src/concerto/safety/api.py
@runtime_checkable
class JointSafetyFilter(Protocol):
    """Joint-action filter contract for CENTRALIZED / SHARED_CONTROL (ADR-004 §Public API).

    The oracle / ablation baseline mode plus the lab-only co-control
    mode (ADR-014 three-table report upper-bound row; reviewer P0-3
    mode 3). The QP decides every uid's action; per-uid slot widths
    come from ``control_models[uid].action_dim`` (no implicit "first
    uid's shape" fallback per reviewer P0-2). The structural
    difference from :class:`EgoOnlySafetyFilter` is wire-visible:
    ``proposed_action`` is a ``dict[uid, FloatArray]``, and the
    return value's first element is likewise a dict.

    The canonical implementations are
    :meth:`concerto.safety.cbf_qp.ExpCBFQP.centralized` and
    :meth:`concerto.safety.cbf_qp.ExpCBFQP.shared_control`.
    """

    def reset(self, *, seed: int | None = None) -> None:
        """Reset filter state at episode start (ADR-004 §Decision).

        See :meth:`EgoOnlySafetyFilter.reset` for the determinism
        contract (P6 seeding).
        """
        ...

    def filter(
        self,
        proposed_action: dict[str, FloatArray],
        obs: dict[str, object],
        state: SafetyState,
        bounds: Bounds,
        *,
        ego_uid: str | None = None,
        partner_action_bound: float | None = None,
    ) -> tuple[dict[str, FloatArray], FilterInfo]:
        """Project the per-uid nominal actions onto the safe set (ADR-004 §Public API).

        Args:
            proposed_action: Per-agent nominal control inputs, keyed by
                uid. The QP minimises ``||u - u_hat||^2`` with this as
                the reference (Wang-Ames-Egerstedt 2017 eq. 9).
            obs: Observation dict; same contract as
                :meth:`EgoOnlySafetyFilter.filter`.
            state: Mutable :class:`SafetyState`; updated in place by the
                conformal layer each control step.
            bounds: Per-task :class:`Bounds` envelope (ADR-006 §Decision).
            ego_uid: Required for :attr:`SafetyMode.SHARED_CONTROL`,
                ignored for :attr:`SafetyMode.CENTRALIZED`. Identifies
                which uid in ``proposed_action`` is the ego whose
                action is unconstrained by ``partner_action_bound``.
            partner_action_bound: Required for
                :attr:`SafetyMode.SHARED_CONTROL`, ignored for
                :attr:`SafetyMode.CENTRALIZED`. Strictly positive
                L-infinity ball radius around each non-ego uid's
                proposed action.

        Returns:
            A pair ``(safe_action, info)`` matching the input shape
            (one entry per uid in ``proposed_action``).

        Raises:
            ConcertoSafetyInfeasible: When the QP is infeasible even after
                slack relaxation (ADR-004 §Decision; ADR-006 §Risks).
        """
        ...
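Because the Protocol is decorated with `@runtime_checkable`, any object exposing `reset` and `filter` passes an `isinstance` check. The sketch below illustrates this with `JointFilterLike`, a hypothetical trimmed copy of the method surface, and with toy classes that do not come from CONCERTO. The `PassThroughFilter` performs no projection at all and only shows the dict-in, dict-out shape. The runtime check inspects member names only, not signatures or argument types, so a static type checker is still needed to catch a mis-shaped implementation.

```python
from typing import Any, Dict, Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class JointFilterLike(Protocol):
    """Hypothetical trimmed copy of the JointSafetyFilter method surface."""

    def reset(self, *, seed: Optional[int] = None) -> None: ...

    def filter(
        self,
        proposed_action: Dict[str, Any],
        obs: Dict[str, object],
        state: Any,
        bounds: Any,
        *,
        ego_uid: Optional[str] = None,
        partner_action_bound: Optional[float] = None,
    ) -> Tuple[Dict[str, Any], Dict[str, Any]]: ...


class PassThroughFilter:
    """Toy implementation: returns the nominal actions unchanged (NOT a safety filter)."""

    def reset(self, *, seed: Optional[int] = None) -> None:
        self.seed = seed

    def filter(self, proposed_action, obs, state, bounds, *, ego_uid=None, partner_action_bound=None):
        # Dict in, dict out: one entry per uid, mirroring the joint contract.
        safe = {uid: list(action) for uid, action in proposed_action.items()}
        return safe, {}


class ResetlessFilter:
    # Missing reset(), so it does not satisfy the Protocol.
    def filter(self, proposed_action, obs, state, bounds):
        return proposed_action, {}


class WrongShape:
    # Has both method names but incompatible signatures.
    def reset(self) -> None: ...

    def filter(self, x): ...
```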

filter

filter(
    proposed_action: dict[str, FloatArray],
    obs: dict[str, object],
    state: SafetyState,
    bounds: Bounds,
    *,
    ego_uid: str | None = None,
    partner_action_bound: float | None = None,
) -> tuple[dict[str, FloatArray], FilterInfo]

Project the per-uid nominal actions onto the safe set (ADR-004 §Public API).

Parameters:

  • proposed_action (dict[str, FloatArray]) –

    Per-agent nominal control inputs, keyed by uid. The QP minimises ||u - u_hat||^2 with this as the reference (Wang-Ames-Egerstedt 2017 eq. 9).

  • obs (dict[str, object]) –

    Observation dict; same contract as :meth:EgoOnlySafetyFilter.filter.

  • state (SafetyState) –

    Mutable :class:SafetyState; updated in place by the conformal layer each control step.

  • bounds (Bounds) –

    Per-task :class:Bounds envelope (ADR-006 §Decision).

  • ego_uid (str | None, default: None ) –

    Required for :attr:SafetyMode.SHARED_CONTROL, ignored for :attr:SafetyMode.CENTRALIZED. Identifies which uid in proposed_action is the ego whose action is unconstrained by partner_action_bound.

  • partner_action_bound (float | None, default: None ) –

    Required for :attr:SafetyMode.SHARED_CONTROL, ignored for :attr:SafetyMode.CENTRALIZED. Strictly positive L-infinity ball radius around each non-ego uid's proposed action.

Returns:

  • dict[str, FloatArray]

    safe_action: the filtered per-uid actions, matching the input shape (one entry per uid in proposed_action).

  • FilterInfo

    info: the telemetry payload (ADR-014).

Raises:

  • ConcertoSafetyInfeasible

    When the QP is infeasible even after slack relaxation (ADR-004 §Decision; ADR-006 §Risks).

Source code in src/concerto/safety/api.py
def filter(
    self,
    proposed_action: dict[str, FloatArray],
    obs: dict[str, object],
    state: SafetyState,
    bounds: Bounds,
    *,
    ego_uid: str | None = None,
    partner_action_bound: float | None = None,
) -> tuple[dict[str, FloatArray], FilterInfo]:
    """Project the per-uid nominal actions onto the safe set (ADR-004 §Public API).

    Args:
        proposed_action: Per-agent nominal control inputs, keyed by
            uid. The QP minimises ``||u - u_hat||^2`` with this as
            the reference (Wang-Ames-Egerstedt 2017 eq. 9).
        obs: Observation dict; same contract as
            :meth:`EgoOnlySafetyFilter.filter`.
        state: Mutable :class:`SafetyState`; updated in place by the
            conformal layer each control step.
        bounds: Per-task :class:`Bounds` envelope (ADR-006 §Decision).
        ego_uid: Required for :attr:`SafetyMode.SHARED_CONTROL`,
            ignored for :attr:`SafetyMode.CENTRALIZED`. Identifies
            which uid in ``proposed_action`` is the ego whose
            action is unconstrained by ``partner_action_bound``.
        partner_action_bound: Required for
            :attr:`SafetyMode.SHARED_CONTROL`, ignored for
            :attr:`SafetyMode.CENTRALIZED`. Strictly positive
            L-infinity ball radius around each non-ego uid's
            proposed action.

    Returns:
        A pair ``(safe_action, info)`` matching the input shape
        (one entry per uid in ``proposed_action``).

    Raises:
        ConcertoSafetyInfeasible: When the QP is infeasible even after
            slack relaxation (ADR-004 §Decision; ADR-006 §Risks).
    """
    ...
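The ``ego_uid`` / ``partner_action_bound`` contract implies a per-uid constraint set, which the sketch below writes out as box bounds. It assumes the L-infinity ball is applied elementwise around each non-ego uid's proposed action, as the docstring states, and that a QP would consume the result as lower/upper bounds. `partner_boxes` is hypothetical. The choice of `ValueError` for missing arguments belongs to this sketch; the Protocol does not document which exception a real filter raises.

```python
from enum import Enum
from typing import Dict, List, Optional, Sequence, Tuple


class SafetyMode(Enum):
    # Same members as the SafetyMode enum documented on this page.
    EGO_ONLY = "ego_only"
    CENTRALIZED = "centralized"
    SHARED_CONTROL = "shared_control"


# (lower, upper) per action component, or None for "no partner bound".
Box = Optional[Tuple[List[float], List[float]]]


def partner_boxes(
    mode: SafetyMode,
    proposed_action: Dict[str, Sequence[float]],
    ego_uid: Optional[str] = None,
    partner_action_bound: Optional[float] = None,
) -> Dict[str, Box]:
    """Hypothetical helper: per-uid L-infinity boxes implied by the filter arguments."""
    if mode is SafetyMode.CENTRALIZED:
        # ego_uid and partner_action_bound are ignored in CENTRALIZED mode.
        return {uid: None for uid in proposed_action}
    if mode is not SafetyMode.SHARED_CONTROL:
        raise ValueError(f"joint filters do not serve {mode}")
    if ego_uid not in proposed_action:
        raise ValueError("SHARED_CONTROL needs ego_uid naming a key of proposed_action")
    # `not r > 0` also rejects NaN.
    if partner_action_bound is None or not partner_action_bound > 0:
        raise ValueError("SHARED_CONTROL needs a strictly positive partner_action_bound")
    r = partner_action_bound
    boxes: Dict[str, Box] = {}
    for uid, u_hat in proposed_action.items():
        if uid == ego_uid:
            boxes[uid] = None  # the ego's action is not limited by the partner bound
        else:
            boxes[uid] = ([x - r for x in u_hat], [x + r for x in u_hat])
    return boxes


proposed = {"ego": [0.0, 1.0], "partner": [0.5, -0.5]}
boxes = partner_boxes(SafetyMode.SHARED_CONTROL, proposed, ego_uid="ego", partner_action_bound=0.25)
```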

reset

reset(*, seed: int | None = None) -> None

Reset filter state at episode start (ADR-004 §Decision).

See :meth:EgoOnlySafetyFilter.reset for the determinism contract (P6 seeding).

Source code in src/concerto/safety/api.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset filter state at episode start (ADR-004 §Decision).

    See :meth:`EgoOnlySafetyFilter.reset` for the determinism
    contract (P6 seeding).
    """
    ...

SafetyMode

Bases: Enum

Safety-filter operating mode (ADR-004 §Decision; spike_004A §Three-mode taxonomy).

Three modes make the filter's "who is being optimised" contract explicit at the constructor site, so callsites cannot accidentally request the oracle solve at deployment (reviewer P0-3):

  • :attr:EGO_ONLY — deployment default. QP decision variable is the ego agent's action only; the partner's motion enters as a predicted disturbance on the constraint RHS. This is the ad-hoc / black-box-partner setting ADR-006 §Decision pins.
  • :attr:CENTRALIZED — oracle / ablation only. QP decision variable is the concatenation of every uid's action, with per-uid slot widths set by control_models[uid].action_dim (no implicit "first uid's shape" fallback). Used as the upper-bound baseline in ADR-014's three-table report. Never used at evaluation against ad-hoc partners.
  • :attr:SHARED_CONTROL — lab-only baseline (reviewer P0-3 mode 3). Same shape as :attr:CENTRALIZED with an additional partner_action_bound parameter that restricts the partner's adjustment magnitude. Measures how much of the centralised headroom comes from co-control budget vs from partner co-operation.

Mode is a constructor argument, not a per-call flag: the conformal slack vector SafetyState.lambda_ is sized per pair, and the pair set differs between :attr:EGO_ONLY (one row per partner) and :attr:CENTRALIZED (full pairwise table). Flipping mode mid-episode would silently invalidate the conformal state. A mode change requires constructing a new filter with a fresh :class:SafetyState.

Source code in src/concerto/safety/api.py
class SafetyMode(Enum):
    """Safety-filter operating mode (ADR-004 §Decision; spike_004A §Three-mode taxonomy).

    Three modes make the filter's *who is being optimised* contract
    explicit at the constructor site, so callsites cannot accidentally
    request the oracle solve at deployment (reviewer P0-3):

    - :attr:`EGO_ONLY` — **deployment default.** QP decision variable is
      the ego agent's action only; the partner's motion enters as a
      predicted disturbance on the constraint RHS. This is the
      ad-hoc / black-box-partner setting ADR-006 §Decision pins.
    - :attr:`CENTRALIZED` — oracle / ablation only. QP decision variable
      is the concatenation of every uid's action, with per-uid slot
      widths set by ``control_models[uid].action_dim`` (no implicit
      "first uid's shape" fallback). Used as the upper-bound baseline
      in ADR-014's three-table report. Never used at evaluation
      against ad-hoc partners.
    - :attr:`SHARED_CONTROL` — lab-only baseline (reviewer P0-3 mode 3).
      Same shape as :attr:`CENTRALIZED` with an additional
      ``partner_action_bound`` parameter that restricts the partner's
      adjustment magnitude. Measures how much of the centralised
      headroom comes from co-control budget vs from partner
      co-operation.

    Mode is a **constructor argument**, not a per-call flag: the
    conformal slack vector ``SafetyState.lambda_`` is sized per pair,
    and the pair set differs between :attr:`EGO_ONLY` (one row per
    partner) and :attr:`CENTRALIZED` (full pairwise table). Flipping
    mode mid-episode would silently invalidate the conformal state.
    A mode change requires constructing a new filter with a fresh
    :class:`SafetyState`.
    """

    EGO_ONLY = "ego_only"
    CENTRALIZED = "centralized"
    SHARED_CONTROL = "shared_control"
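The pair set backing ``SafetyState.lambda_`` differs by mode, which is why the mode must be fixed at construction time. The sketch below uses `pair_keys_for_mode`, a hypothetical helper, to build the canonical pair keys for each mode. Under EGO_ONLY there is one key per partner; under CENTRALIZED the full upper-triangular table is used. SHARED_CONTROL is assumed to use the CENTRALIZED table because the docstring describes it as the same shape. A slack dict built for one mode lacks keys the other mode needs.

```python
from enum import Enum
from itertools import combinations
from typing import Iterable, List, Optional, Tuple


class SafetyMode(Enum):
    # Same members as the listing above.
    EGO_ONLY = "ego_only"
    CENTRALIZED = "centralized"
    SHARED_CONTROL = "shared_control"


def pair_keys_for_mode(
    mode: SafetyMode, uids: Iterable[str], ego_uid: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Hypothetical helper: the canonical pair keys lambda_ would need for a mode."""
    ordered = sorted(uids)
    if len(set(ordered)) != len(ordered):
        raise ValueError(f"duplicate uids: {ordered!r}")
    if mode is SafetyMode.EGO_ONLY:
        if ego_uid not in ordered:
            raise ValueError("EGO_ONLY needs an ego_uid drawn from uids")
        # One row per partner, keyed by the lexicographically ordered (ego, partner) pair.
        return [tuple(sorted((ego_uid, p))) for p in ordered if p != ego_uid]
    # CENTRALIZED (and, by assumption, SHARED_CONTROL): full upper-triangular table.
    return list(combinations(ordered, 2))


ego_keys = pair_keys_for_mode(SafetyMode.EGO_ONLY, ["c", "a", "b"], ego_uid="b")
full_keys = pair_keys_for_mode(SafetyMode.CENTRALIZED, ["c", "a", "b"])
```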

SafetyState dataclass

Mutable conformal CBF state (Huriot & Sibai 2025 §IV; ADR-004 §Decision).

The conformal slack dict lambda_ carries one entry per agent pair, keyed by a lexicographically-ordered UID tuple (uid_a, uid_b) with uid_a < uid_b. Each entry is updated each control step by :func:concerto.safety.conformal.update_lambda according to Theorem 3's rule lambda_{k+1} = lambda_k + eta * (eps - l_k). ADR-004 risk-mitigation #2 motivates the partner-swap warmup window: on partner identity change (detected via obs["meta"]["partner_id"]) the state is reset to lambda_safe (the value guaranteeing QP feasibility under the worst-case bounded prediction error per ADR-006 Assumption A2) and the next warmup_steps_remaining steps run with a tighter eps.

Issue #144 / ADR-014 v3 (2026-05-19) promoted lambda_ from a canonical-pair-order-indexed :class:numpy.ndarray to a :data:LambdaDict keyed by canonical UID-pair tuples. The structural fix subsumes the implicit pair-index alignment invariant that the prior Phase-0 amendment (canonical_pair_order) enforced via runtime asserts: a caller that rebuilds the snapshot dict in a different insertion order still keys into the same canonical tuple, so per-pair slack cannot silently misalign with the pairwise CBF constraint.

Attributes:

  • lambda_ (LambdaDict) –

    Per-pair slack as a :data:LambdaDict. Construct via :func:make_lambda_dict (or directly when the canonical pair set is already in hand). Mutated in place by :func:concerto.safety.conformal.update_lambda.

  • epsilon (float) –

    Target average loss. ADR-004 §Decision pins the default to -0.05 (conservative manipulation regime).

  • eta (float) –

    Conformal learning rate (default 0.01).

  • warmup_steps_remaining (int) –

    Decremented each step while in warmup; zero outside the warmup window (ADR-004 risk-mitigation #2).

Source code in src/concerto/safety/api.py
@dataclass
class SafetyState:
    """Mutable conformal CBF state (Huriot & Sibai 2025 §IV; ADR-004 §Decision).

    The conformal slack dict ``lambda_`` carries one entry per agent
    pair, keyed by a lexicographically-ordered UID tuple
    ``(uid_a, uid_b)`` with ``uid_a < uid_b``. Each entry is updated
    each control step by
    :func:`concerto.safety.conformal.update_lambda` according to
    Theorem 3's rule
    ``lambda_{k+1} = lambda_k + eta * (eps - l_k)``. ADR-004
    risk-mitigation #2 motivates the partner-swap warmup window: on
    partner identity change (detected via ``obs["meta"]["partner_id"]``)
    the state is reset to ``lambda_safe`` (the value guaranteeing QP
    feasibility under the worst-case bounded prediction error per ADR-006
    Assumption A2) and the next ``warmup_steps_remaining`` steps run with
    a tighter eps.

    Issue #144 / ADR-014 v3 (2026-05-19) promoted ``lambda_`` from a
    canonical-pair-order-indexed :class:`numpy.ndarray` to a
    :data:`LambdaDict` keyed by canonical UID-pair tuples. The
    structural fix subsumes the implicit pair-index alignment
    invariant the prior Phase-0 amendment (canonical_pair_order)
    enforced via runtime asserts: a caller that rebuilds the snapshot
    dict in a different insertion order still keys into the same
    canonical tuple, so per-pair slack cannot silently misalign with
    the pairwise CBF constraint.

    Attributes:
        lambda_: Per-pair slack as a :data:`LambdaDict`. Construct
            via :func:`make_lambda_dict` (or directly when the canonical
            pair set is already in hand). Mutated in place by
            :func:`concerto.safety.conformal.update_lambda`.
        epsilon: Target average loss. ADR-004 §Decision pins the default
            to ``-0.05`` (conservative manipulation regime).
        eta: Conformal learning rate (default ``0.01``).
        warmup_steps_remaining: Decremented each step while in warmup;
            zero outside the warmup window (ADR-004 risk-mitigation #2).
    """

    lambda_: LambdaDict
    epsilon: float = DEFAULT_EPSILON
    eta: float = DEFAULT_ETA
    warmup_steps_remaining: int = 0
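The update rule quoted in the docstring, ``lambda_{k+1} = lambda_k + eta * (eps - l_k)``, applies independently to each canonical pair key. The sketch below rests on several assumptions. `SafetyStateSketch` mirrors the dataclass fields. `update_lambda_sketch` is hypothetical; the signature of the real `concerto.safety.conformal.update_lambda` is not shown on this page. Slack values are assumed to be plain floats. The per-pair loss `l_k` is taken as given, and the partner-swap warmup (tighter eps) is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

LambdaDict = Dict[Tuple[str, str], float]  # assumption: one float slack per canonical pair

DEFAULT_EPSILON = -0.05  # documented default target average loss
DEFAULT_ETA = 0.01       # documented default conformal learning rate


@dataclass
class SafetyStateSketch:
    """Field-for-field stand-in for SafetyState (no concerto import needed)."""

    lambda_: LambdaDict
    epsilon: float = DEFAULT_EPSILON
    eta: float = DEFAULT_ETA
    warmup_steps_remaining: int = 0


def update_lambda_sketch(state: SafetyStateSketch, losses: LambdaDict) -> None:
    """Hypothetical in-place update: lambda_{k+1} = lambda_k + eta * (eps - l_k) per pair."""
    for key, loss in losses.items():
        if key not in state.lambda_:
            # Keying by canonical tuples turns a misaligned pair into a loud KeyError.
            raise KeyError(f"no slack entry for pair {key!r}")
        state.lambda_[key] += state.eta * (state.epsilon - loss)


state = SafetyStateSketch(lambda_={("a", "b"): 0.0, ("a", "c"): 0.2})
update_lambda_sketch(state, {("a", "b"): 0.1, ("a", "c"): -0.05})
```

A loss above the target eps pushes the pair's slack down; a loss exactly at eps leaves it unchanged.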

__getattr__

__getattr__(name: str) -> object

Re-export the deprecated :data:SafetyFilter alias from the api submodule.

Forwards the lookup so from concerto.safety import SafetyFilter surfaces the same :class:DeprecationWarning as from concerto.safety.api import SafetyFilter. ADR-004 §Public API; spike_004A §Three-mode taxonomy.

Source code in src/concerto/safety/__init__.py
def __getattr__(name: str) -> object:
    """Re-export the deprecated :data:`SafetyFilter` alias from the api submodule.

    Forwards the lookup so ``from concerto.safety import SafetyFilter``
    surfaces the same :class:`DeprecationWarning` as ``from
    concerto.safety.api import SafetyFilter``. ADR-004 §Public API;
    spike_004A §Three-mode taxonomy.
    """
    if name in _DEPRECATED_NAMES:
        return getattr(_api, name)
    msg = f"module {__name__!r} has no attribute {name!r}"
    raise AttributeError(msg)
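The forwarding above relies on PEP 562 module-level `__getattr__`, which Python calls only when normal attribute lookup on a module fails. The sketch below reproduces the mechanism with two `types.ModuleType` stand-ins: `demo_safety_api` plays the api submodule and `demo_safety` plays the package. The placeholder object, the warning text, and both module names are hypothetical. This sketch warns on every access and does not reproduce the documented "first use" bookkeeping.

```python
import types
import warnings

# Hypothetical stand-in for the api submodule that owns the deprecated alias.
api = types.ModuleType("demo_safety_api")
api._placeholder = object()  # stands in for whatever the alias really resolves to


def _api_getattr(name: str) -> object:
    # PEP 562: consulted only after normal module attribute lookup fails.
    if name == "SafetyFilter":
        warnings.warn("demo: SafetyFilter is deprecated", DeprecationWarning, stacklevel=2)
        return api._placeholder
    raise AttributeError(f"module {api.__name__!r} has no attribute {name!r}")


api.__getattr__ = _api_getattr

# Hypothetical stand-in for the package __init__ that forwards deprecated names.
pkg = types.ModuleType("demo_safety")
_DEPRECATED_NAMES = frozenset({"SafetyFilter"})


def _pkg_getattr(name: str) -> object:
    if name in _DEPRECATED_NAMES:
        # Delegating re-enters the api module's __getattr__, so the warning has one source.
        return getattr(api, name)
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


pkg.__getattr__ = _pkg_getattr
```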

canonical_pair_key

canonical_pair_key(a: str, b: str) -> tuple[str, str]

Return the lexicographically-ordered pair key for (a, b) (ADR-004 §Decision; issue #144).

The canonical key for any agent pair is the unique 2-tuple of UIDs sorted lexicographically. Callers that already know the canonical order (e.g. a nested loop with the outer uid lex-smaller than the inner) can build the tuple directly; this helper is for sites that take an unordered (uid_a, uid_b) pair (e.g. EGO_ONLY mode's (ego_uid, partner_uid) rows, where neither side is guaranteed to be lex-smaller) and need the canonical lookup key for :data:LambdaDict indexing.

Parameters:

  • a (str) –

    First UID.

  • b (str) –

    Second UID (must differ from a).

Returns:

  • tuple[str, str]

    (a, b) if a < b lexicographically, otherwise (b, a).

Raises:

  • ValueError

    If a == b (an agent pair always has two distinct UIDs).

Source code in src/concerto/safety/api.py
def canonical_pair_key(a: str, b: str) -> tuple[str, str]:
    """Return the lexicographically-ordered pair key for ``(a, b)`` (ADR-004 §Decision; issue #144).

    The canonical key for any agent pair is the unique 2-tuple of UIDs
    sorted lexicographically. Callers that already know the canonical
    order (e.g. a nested loop with the outer uid lex-smaller than the
    inner) can build the tuple directly; this helper is for sites that
    take an unordered ``(uid_a, uid_b)`` pair (e.g. EGO_ONLY mode's
    ``(ego_uid, partner_uid)`` rows, where neither side is guaranteed
    to be lex-smaller) and need the canonical lookup key for
    :data:`LambdaDict` indexing.

    Args:
        a: First UID.
        b: Second UID (must differ from ``a``).

    Returns:
        ``(a, b)`` if ``a < b`` lexicographically, otherwise ``(b, a)``.

    Raises:
        ValueError: If ``a == b`` (an agent pair always has two
            distinct UIDs).
    """
    if a == b:
        msg = f"canonical_pair_key requires distinct UIDs; got {a!r} == {b!r}"
        raise ValueError(msg)
    return (a, b) if a < b else (b, a)
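A typical EGO_ONLY lookup normalises an unordered ``(ego_uid, partner_uid)`` pair before indexing the slack dict. The function below is copied from the listing above. The uids and slack value are hypothetical. Python compares strings by code point, so "agent10" sorts before "agent2"; the canonical key follows string order, not numeric order.

```python
def canonical_pair_key(a: str, b: str) -> tuple[str, str]:
    # Copied from the listing above.
    if a == b:
        msg = f"canonical_pair_key requires distinct UIDs; got {a!r} == {b!r}"
        raise ValueError(msg)
    return (a, b) if a < b else (b, a)


# Hypothetical slack dict keyed by canonical pairs.
lambda_ = {("fetch", "panda"): 0.0}

ego_uid, partner_uid = "panda", "fetch"  # the ego happens to be lex-larger here
slack = lambda_[canonical_pair_key(ego_uid, partner_uid)]
```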

canonical_pair_order

canonical_pair_order(uids: Iterable[str]) -> list[str]

Canonical lexicographic UID order for pair iteration (ADR-004 §Decision).

Public utility used by the three pair-iteration entry points (:func:concerto.safety.conformal.compute_prediction_gap_for_pairs, :meth:concerto.safety.cbf_qp.ExpCBFQP._filter_ego_only, and :meth:concerto.safety.cbf_qp.ExpCBFQP._filter_joint); third-party consumers building their own pair-iteration code over the safety stack should call this helper to stay aligned with the canonical ordering that :class:SafetyState lambda_ is indexed against. Returns a list of the UIDs sorted lexicographically; downstream code iterates upper-triangular pairs (uids[i], uids[j]) for i < j so the pair index for any (uid_a, uid_b) (with uid_a < uid_b lexicographically) is stable across callers and across dict reconstruction (external-review P1, 2026-05-16).

Post-issue-#144 the structural fix (lambda_: LambdaDict) makes the pair-keying part of the type system: lookups go via the lexicographic tuple key produced by :func:canonical_pair_key, and callers no longer rely on the canonical ordering to index a flat array. The helper is retained as a primitive because the upper-triangular pair iteration in :func:make_pair_keys and at the cbf_qp call-sites still needs a deterministic uid ordering to build the canonical set of pair tuples.

Parameters:

  • uids (Iterable[str]) –

    An iterable of agent UIDs.

Returns:

  • list[str]

    A new list containing the UIDs sorted lexicographically.

Raises:

  • ValueError

    If the iterable contains duplicates (a partner-set invariant the upstream filter already enforces, surfaced here as a defensive check).

Source code in src/concerto/safety/api.py
def canonical_pair_order(uids: Iterable[str]) -> list[str]:
    """Canonical lexicographic UID order for pair iteration (ADR-004 §Decision).

    Public utility used by the three pair-iteration entry points
    (:func:`concerto.safety.conformal.compute_prediction_gap_for_pairs`,
    :meth:`concerto.safety.cbf_qp.ExpCBFQP._filter_ego_only`, and
    :meth:`concerto.safety.cbf_qp.ExpCBFQP._filter_joint`); third-party
    consumers building their own pair-iteration code over the safety
    stack should call this helper to stay aligned with the same
    canonical ordering :class:`SafetyState` ``lambda_`` is indexed
    against. Returns
    a list of the UIDs sorted lexicographically; downstream code
    iterates upper-triangular pairs ``(uids[i], uids[j]) for i < j``
    so the pair index for any ``(uid_a, uid_b)`` (with
    ``uid_a < uid_b`` lexicographically) is stable across callers and
    across dict reconstruction (external-review P1, 2026-05-16).

    Post-issue-#144 the structural fix (``lambda_: LambdaDict``) makes
    the pair-keying part of the type system: lookups go via the
    lexicographic tuple key produced by :func:`canonical_pair_key`, and
    callers no longer rely on the canonical ordering to index a flat
    array. The helper is retained as a primitive because the
    upper-triangular pair iteration in
    :func:`make_pair_keys` and the cbf_qp call-sites still needs a
    deterministic uid ordering to build the canonical set of pair
    tuples.

    Args:
        uids: An iterable of agent UIDs.

    Returns:
        A new list containing the UIDs sorted lexicographically.

    Raises:
        ValueError: If the iterable contains duplicates (a partner-set
            invariant the upstream filter already enforces, surfaced
            here as a defensive check).
    """
    out = sorted(uids)
    if len(set(out)) != len(out):
        msg = f"duplicate uids in pair iteration: {out!r}"
        raise ValueError(msg)
    return out
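The upper-triangular iteration described above can be sketched on top of `canonical_pair_order`, copied from the listing. `upper_triangular_pairs` is a hypothetical helper and should not be read as the signature of `make_pair_keys`. The resulting pair list is the same for any insertion order of the input uids, and it has n*(n-1)/2 entries.

```python
from typing import Iterable, List, Tuple


def canonical_pair_order(uids: Iterable[str]) -> List[str]:
    # Copied from the listing above.
    out = sorted(uids)
    if len(set(out)) != len(out):
        msg = f"duplicate uids in pair iteration: {out!r}"
        raise ValueError(msg)
    return out


def upper_triangular_pairs(uids: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: every (uids[i], uids[j]) with i < j in canonical order."""
    ordered = canonical_pair_order(uids)
    n = len(ordered)
    return [(ordered[i], ordered[j]) for i in range(n) for j in range(i + 1, n)]


# Two snapshot dicts with the same uids but different insertion order.
pairs_1 = upper_triangular_pairs({"ur5": 0, "fetch": 1, "panda": 2})
pairs_2 = upper_triangular_pairs({"panda": 2, "ur5": 0, "fetch": 1})
```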

lambda_from_jsonable

lambda_from_jsonable(
    payload: object, *, uids: Iterable[str] | None = None
) -> LambdaDict

Deserialise a :data:LambdaDict from a JSON-decoded payload (ADR-014 §Decision; issue #144).

Accepts either:

  • v3 (current) — a list of structured entries [{"a": uid_a, "b": uid_b, "value": λ}, ...] as produced by :func:lambda_to_jsonable. Each entry's pair is canonicalised via :func:canonical_pair_key so a misordered (b, a) entry on the wire is rebuilt under the canonical (a, b) key.
  • v2 (legacy) — a flat list / sequence of floats of length n*(n-1)/2 plus an explicit uids keyword. The values are assumed to be indexed in the canonical pair-iteration order (the implicit invariant that issue #144 closed by promoting to a dict). Reading this form emits a :class:DeprecationWarning and the legacy reader is targeted for removal in v0.6.0.

Parameters:

  • payload (object) –

    The JSON-decoded structure (list of dicts for v3, or a list of floats for v2).

  • uids (Iterable[str] | None, default: None ) –

    Required for v2; ignored for v3. The UID set whose canonical pair order indexed the legacy ndarray.

Returns:

  • LambdaDict

    A fresh :data:LambdaDict.

Raises:

  • TypeError

    When payload is neither a v3 structured list nor a v2 flat float list, or v2 was passed without uids.

  • ValueError

    When a v3 entry lists a == b (canonical pair keys always have distinct UIDs), or the v2 length does not match the canonical pair count for uids.

Source code in src/concerto/safety/api.py
def lambda_from_jsonable(
    payload: object,
    *,
    uids: Iterable[str] | None = None,
) -> LambdaDict:
    """Deserialise a :data:`LambdaDict` from a JSON-decoded payload (ADR-014 §Decision; issue #144).

    Accepts either:

    - **v3 (current)** — a list of structured entries
      ``[{"a": uid_a, "b": uid_b, "value": λ}, ...]`` as produced by
      :func:`lambda_to_jsonable`. Each entry's pair is canonicalised
      via :func:`canonical_pair_key` so a misordered ``(b, a)`` entry
      on the wire is rebuilt under the canonical ``(a, b)`` key.
    - **v2 (legacy)** — a flat list / sequence of floats of length
      ``n*(n-1)/2`` plus an explicit ``uids`` keyword. The values are
      assumed to be indexed in the canonical pair-iteration order
      (the implicit invariant that issue #144 closed by promoting to
      a dict). Reading this form emits a :class:`DeprecationWarning`
      and the legacy reader is targeted for removal in v0.6.0.

    Args:
        payload: The JSON-decoded structure (list of dicts for v3, or
            a list of floats for v2).
        uids: Required for v2; ignored for v3. The UID set whose
            canonical pair order indexed the legacy ndarray.

    Returns:
        A fresh :data:`LambdaDict`.

    Raises:
        TypeError: When ``payload`` is neither a v3 structured list
            nor a v2 flat float list, or v2 was passed without
            ``uids``.
        ValueError: When a v3 entry lists ``a == b`` (canonical pair
            keys always have distinct UIDs), or the v2 length does
            not match the canonical pair count for ``uids``.
    """
    if not isinstance(payload, list):
        msg = f"lambda_from_jsonable: expected list payload, got {type(payload).__name__}"
        raise TypeError(msg)
    payload_list: list[Any] = payload  # type: ignore[assignment]
    if payload_list and isinstance(payload_list[0], dict):
        # v3 structured-entry form.
        out: LambdaDict = {}
        for entry in payload_list:
            if not isinstance(entry, dict):
                msg = f"lambda_from_jsonable: v3 entries must be dicts; got {type(entry).__name__}"
                raise TypeError(msg)
            entry_dict: dict[str, Any] = entry  # type: ignore[assignment]
            a = str(entry_dict["a"])
            b = str(entry_dict["b"])
            key = canonical_pair_key(a, b)
            out[key] = float(entry_dict["value"])
        return out
    # v2 legacy ndarray / flat-float-list form.
    if uids is None:
        msg = (
            "lambda_from_jsonable: legacy v2 ndarray payload requires the "
            "explicit ``uids`` keyword (the canonical pair order that "
            "indexed the array). Issue #144 closed the implicit-order "
            "invariant; the v2 reader is deprecated and targeted for "
            "removal in v0.6.0."
        )
        raise TypeError(msg)
    warnings.warn(
        "lambda_from_jsonable: legacy v2 (ndarray) payload — pair-keying "
        "via implicit canonical-pair-order is deprecated in favour of the "
        "v3 structured-list form. Targeted for removal in v0.6.0; see "
        "issue #144 + ADR-014 §Revision history.",
        DeprecationWarning,
        stacklevel=2,
    )
    keys = make_pair_keys(uids)
    if len(payload_list) != len(keys):
        msg = (
            f"lambda_from_jsonable: v2 payload length {len(payload_list)} does "
            f"not match the canonical pair count {len(keys)} for the given "
            f"UID set."
        )
        raise ValueError(msg)
    return {key: float(value) for key, value in zip(keys, payload_list, strict=True)}

lambda_to_jsonable

lambda_to_jsonable(d: LambdaDict) -> list[dict[str, Any]]

Serialise a :data:LambdaDict for JSON round-trip (ADR-014 §Decision; issue #144).

JSON does not have tuple keys natively. The structured list form [{"a": uid_a, "b": uid_b, "value": λ}, ...] is the canonical wire encoding for v3 :data:LambdaDict payloads — pair-keys are explicit named fields rather than stringified tuples, so a reader cannot accidentally split or rejoin the pair on the wrong delimiter.

Parameters:

  • d (LambdaDict) –

    A :data:LambdaDict.

Returns:

  • list[dict[str, Any]]

    A list of plain-dict entries with "a", "b", and "value" fields, ordered by the canonical pair-key order from :func:make_pair_keys over the dict's UID set so the wire encoding is deterministic.

Source code in src/concerto/safety/api.py
def lambda_to_jsonable(d: LambdaDict) -> list[dict[str, Any]]:
    """Serialise a :data:`LambdaDict` for JSON round-trip (ADR-014 §Decision; issue #144).

    JSON does not have tuple keys natively. The structured list form
    ``[{"a": uid_a, "b": uid_b, "value": λ}, ...]`` is the canonical
    wire encoding for v3 :data:`LambdaDict` payloads — pair-keys are
    explicit named fields rather than stringified tuples, so a reader
    cannot accidentally split or rejoin the pair on the wrong
    delimiter.

    Args:
        d: A :data:`LambdaDict`.

    Returns:
        A list of plain-dict entries with ``"a"``, ``"b"``, and
        ``"value"`` fields, ordered by the canonical pair-key order
        from :func:`make_pair_keys` over the dict's UID set so the
        wire encoding is deterministic.
    """
    uids: set[str] = set()
    for a, b in d:
        uids.add(a)
        uids.add(b)
    payload: list[dict[str, Any]] = []
    for a, b in make_pair_keys(uids):
        if (a, b) not in d:
            continue
        payload.append({"a": a, "b": b, "value": float(d[(a, b)])})
    return payload
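The determinism claim can be checked with a local sketch of the same canonical walk. `to_jsonable` and `pair_keys` here are hypothetical re-implementations that mirror the listing above; they are not imported from concerto.

```python
import json

LambdaDict = dict[tuple[str, str], float]


def pair_keys(uids) -> list[tuple[str, str]]:
    # Canonical upper-triangular order over the sorted uids.
    s = sorted(uids)
    return [(a, b) for i, a in enumerate(s) for b in s[i + 1 :]]


def to_jsonable(d: LambdaDict) -> list[dict]:
    # Gather every uid that appears in a key, then walk the canonical pair order
    # so the wire order does not depend on the dict's insertion order.
    uids = {u for pair in d for u in pair}
    return [
        {"a": a, "b": b, "value": float(d[(a, b)])}
        for a, b in pair_keys(uids)
        if (a, b) in d
    ]


lam = {("b", "c"): 0.3, ("a", "b"): 0.1}  # insertion order deliberately scrambled
wire = json.dumps(to_jsonable(lam))
print(wire)  # [{"a": "a", "b": "b", "value": 0.1}, {"a": "b", "b": "c", "value": 0.3}]

decoded = {(e["a"], e["b"]): e["value"] for e in json.loads(wire)}
assert decoded == lam
```

One consequence of walking only canonical keys, visible in the source listing as well, is that a value stored under a non-canonical key such as `("c", "b")` is never visited and is silently left out of the payload. Keeping keys canonical at insertion time avoids that.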

make_lambda_dict

make_lambda_dict(
    uids: Iterable[str], *, fill: float = 0.0
) -> LambdaDict

Construct a :data:LambdaDict over the canonical pair set (ADR-004 §Decision; issue #144).

The convenience constructor for :class:SafetyState's lambda_ field. Each canonical pair-key produced by :func:make_pair_keys maps to fill; callers seed every pair to the same scalar (typically the warm-start lambda_safe from :func:concerto.safety.conformal.reset_on_partner_swap) and the conformal layer updates entries in place per step.

Parameters:

  • uids (Iterable[str]) –

    Iterable of agent UIDs.

  • fill (float, default: 0.0 ) –

    Initial per-pair value (default 0.0).

Returns:

  • LambdaDict

    A fresh :data:LambdaDict with one entry per canonical pair.

Source code in src/concerto/safety/api.py
def make_lambda_dict(uids: Iterable[str], *, fill: float = 0.0) -> LambdaDict:
    """Construct a :data:`LambdaDict` over the canonical pair set (ADR-004 §Decision; issue #144).

    The convenience constructor for :class:`SafetyState`'s ``lambda_``
    field. Each canonical pair-key produced by :func:`make_pair_keys`
    maps to ``fill``; callers seed every pair to the same scalar
    (typically the warm-start ``lambda_safe`` from
    :func:`concerto.safety.conformal.reset_on_partner_swap`) and the
    conformal layer updates entries in place per step.

    Args:
        uids: Iterable of agent UIDs.
        fill: Initial per-pair value (default ``0.0``).

    Returns:
        A fresh :data:`LambdaDict` with one entry per canonical pair.
    """
    return {key: float(fill) for key in make_pair_keys(uids)}
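A minimal local sketch of the seed-then-update pattern described above follows. `lambda_dict` and `pair_keys` are hypothetical stand-ins for `make_lambda_dict` and `make_pair_keys`. The UIDs and the `fill=0.25` warm-start value are illustrative, not the project's `lambda_safe`.

```python
def pair_keys(uids) -> list[tuple[str, str]]:
    s = sorted(uids)
    if len(set(s)) != len(s):
        raise ValueError(f"duplicate uids in pair iteration: {s!r}")
    return [(a, b) for i, a in enumerate(s) for b in s[i + 1 :]]


def lambda_dict(uids, *, fill: float = 0.0) -> dict[tuple[str, str], float]:
    # One entry per canonical pair, all seeded to the same scalar.
    return {key: float(fill) for key in pair_keys(uids)}


lam = lambda_dict(["p2", "ego", "p1"], fill=0.25)
lam[("ego", "p1")] = 0.4  # a per-step in-place update touches a single pair
print(lam)  # {('ego', 'p1'): 0.4, ('ego', 'p2'): 0.25, ('p1', 'p2'): 0.25}
```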

make_pair_keys

make_pair_keys(
    uids: Iterable[str],
) -> list[tuple[str, str]]

Build the canonical upper-triangular pair-key list (ADR-004 §Decision; issue #144).

Returns the list of (uid_i, uid_j) tuples with i < j over the lexicographically-sorted UID order produced by :func:canonical_pair_order. Used by :func:make_lambda_dict, the cbf_qp call-sites, and any other consumer that needs to enumerate the canonical pair set for a uid collection.

Parameters:

  • uids (Iterable[str]) –

    Iterable of agent UIDs (must not contain duplicates).

Returns:

  • list[tuple[str, str]]

    A new list of pair-key tuples in canonical iteration order (outer loop over lex-sorted uids, inner loop over the suffix). The list length is n*(n-1)/2 for n distinct uids.

Source code in src/concerto/safety/api.py
def make_pair_keys(uids: Iterable[str]) -> list[tuple[str, str]]:
    """Build the canonical upper-triangular pair-key list (ADR-004 §Decision; issue #144).

    Returns the list of ``(uid_i, uid_j)`` tuples with ``i < j`` over
    the lexicographically-sorted UID order produced by
    :func:`canonical_pair_order`. Used by :func:`make_lambda_dict`,
    the cbf_qp call-sites, and any other consumer that needs to
    enumerate the canonical pair set for a uid collection.

    Args:
        uids: Iterable of agent UIDs (must not contain duplicates).

    Returns:
        A new list of pair-key tuples in canonical iteration order
        (outer loop over lex-sorted uids, inner loop over the
        suffix). The list length is ``n*(n-1)/2`` for ``n`` distinct
        uids.
    """
    sorted_uids = canonical_pair_order(uids)
    pairs: list[tuple[str, str]] = []
    for i, a in enumerate(sorted_uids):
        pairs.extend((a, b) for b in sorted_uids[i + 1 :])
    return pairs
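The nested loop above produces the same sequence as `itertools.combinations` over the sorted UIDs, because `combinations` emits tuples in the order of its input positions. The sketch below re-implements the loop locally (hypothetical `pair_keys`, not imported from concerto) and checks both the ordering and the `n*(n-1)/2` length.

```python
from itertools import combinations
from math import comb


def pair_keys(uids) -> list[tuple[str, str]]:
    # Nested-loop construction mirroring the listing above.
    s = sorted(uids)
    if len(set(s)) != len(s):
        raise ValueError(f"duplicate uids in pair iteration: {s!r}")
    pairs: list[tuple[str, str]] = []
    for i, a in enumerate(s):
        pairs.extend((a, b) for b in s[i + 1 :])
    return pairs


uids = ["d", "b", "a", "c"]
keys = pair_keys(uids)
print(keys)  # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]

# combinations() over the sorted list yields the same tuples in the same order,
assert keys == list(combinations(sorted(uids), 2))
# and the length is n*(n-1)/2.
assert len(keys) == comb(len(uids), 2) == 6
```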

solve_qp_stub

solve_qp_stub(
    *args: object, **kwargs: object
) -> tuple[float, float]

Real CBF-QP benchmark solve (T3.10; ADR-004 §"OSCBF target").

T3.10 replaces M2's no-op stub with an actual Clarabel solve on a small representative QP. The function name is preserved for backward compatibility with tests/property/test_qp_saturation.py (the M2 acceptance gate) per plan/03 §8 + the M3 brief Hard Rule #4: the saturation guard's regime check (drop_rate >= 10 % or latency >= 100 ms; ADR-006 §Risks R5) is independent of the QP body, so after replacement it still fires only at the URLLC saturation profile, not at lossy.

The QP is pre-built at module load and the solver instance is module-level so the per-call hot path is just the Clarabel solve — no allocation or instantiation overhead. The *args / **kwargs signature is preserved for the saturation_guard contract (callers pass no args; the placeholder forwarding kept the signature open for future per-call inputs).

Parameters:

  • *args (object, default: () ) –

    Ignored (M2 backward-compat surface).

  • **kwargs (object, default: {} ) –

    Ignored.

Returns:

  • tuple[float, float]

    (decision_norm, elapsed_seconds) matching the M2 stub contract; the M2 saturation guard reads only the timing envelope, so the decision value is informational.

Source code in src/concerto/safety/__init__.py
def solve_qp_stub(*args: object, **kwargs: object) -> tuple[float, float]:
    """Real CBF-QP benchmark solve (T3.10; ADR-004 §"OSCBF target").

    T3.10 replaces M2's no-op stub with an actual Clarabel solve on a
    small representative QP. The function name is preserved for
    backward compatibility with ``tests/property/test_qp_saturation.py``
    (the M2 acceptance gate) per plan/03 §8 + the M3 brief Hard Rule
    #4: the saturation guard's regime check (drop_rate >= 10 % or
    latency >= 100 ms; ADR-006 §Risks R5) is independent of the QP
    body, so after replacement it still fires only at the URLLC
    ``saturation`` profile, not at ``lossy``.

    The QP is pre-built at module load and the solver instance is
    module-level so the per-call hot path is just the Clarabel solve
    — no allocation or instantiation overhead. The ``*args`` /
    ``**kwargs`` signature is preserved for the ``saturation_guard``
    contract (callers pass no args; the placeholder forwarding kept
    the signature open for future per-call inputs).

    Args:
        *args: Ignored (M2 backward-compat surface).
        **kwargs: Ignored.

    Returns:
        ``(decision_norm, elapsed_seconds)`` matching the M2 stub
        contract; the M2 saturation guard reads only the timing
        envelope so the decision value is informational.
    """
    del args, kwargs
    x, ms = _BENCH_SOLVER.solve(_BENCH_P, _BENCH_Q, _BENCH_A, _BENCH_B)
    return float(np.linalg.norm(x)), ms / 1000.0

concerto.training

CONCERTO training stack — ego-AHT HAPPO wrapper + seeding + logging.

Implements the frozen-partner ego-AHT training algorithm from ADR-002 using the HARL fork (ADR-002 §"HARL dependency"). The seeding module provides the determinism harness required by project principle P6.

CheckStatus

Bases: Enum

Outcome taxonomy for :class:SlopeReport (ADR-002 §Risks #1; review P1-2).

Replaces the pre-amendment binary passed: bool shape so callers can distinguish "trip-wire cleared" from "data too short" from "curve contained NaN / inf". The legacy passed boolean is preserved for v0.5.x backwards-compat but is derived from status; passed = status is CheckStatus.PASSED.

Values:

  • PASSED: slope > 0 and p_value < alpha.
  • FAILED: slope non-positive (flat or regressing) OR p-value at or above alpha.
  • INSUFFICIENT_DATA: curve has fewer than min_episodes entries. The pre-amendment code returned passed=True here (vacuous-pass-on-short-curves bug, external-review P1-2); the new branch returns this status with passed=False so consumers do not silently consume a too-short curve as a passing trip-wire.
  • INVALID: curve contains NaN or +/- inf. The pre-amendment code would crash inside scipy.stats.linregress (on NaN) or silently propagate inf through the slope; the new branch detects them up front and returns this status with passed=False.

Source code in src/concerto/training/learning_signal_check.py
class CheckStatus(Enum):
    """Outcome taxonomy for :class:`SlopeReport` (ADR-002 §Risks #1; review P1-2).

    Replaces the pre-amendment binary ``passed: bool`` shape so callers
    can distinguish "trip-wire cleared" from "data too short" from
    "curve contained NaN / inf". The legacy ``passed`` boolean is
    preserved for v0.5.x backwards-compat but is derived from
    ``status``; ``passed = status is CheckStatus.PASSED``.

    Values:
        PASSED: ``slope > 0`` and ``p_value < alpha``.
        FAILED: Slope non-positive (flat or regressing) OR p-value
            above ``alpha``.
        INSUFFICIENT_DATA: Curve has fewer than ``min_episodes``
            entries. The pre-amendment code returned ``passed=True``
            here (vacuous-pass-on-short-curves bug, external-review
            P1-2); the new branch returns this status with
            ``passed=False`` so consumers do not silently consume a
            too-short curve as a passing trip-wire.
        INVALID: Curve contains NaN or +/- inf. The pre-amendment code
            would crash inside ``scipy.stats.linregress`` (on NaN) or
            silently propagate inf through the slope; the new branch
            detects them up front and returns this status with
            ``passed=False``.
    """

    PASSED = "passed"
    FAILED = "failed"
    INSUFFICIENT_DATA = "insufficient_data"
    INVALID = "invalid"
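A sketch of how the four outcomes partition a curve is shown below, using a local mirror of the enum. The classifier `classify` and its `fit` callback are hypothetical, and the `alpha` and `min_episodes` values are illustrative rather than `DEFAULT_ALPHA` / `DEFAULT_MIN_EPISODES`. The order of the two pre-checks (non-finite before length) is an assumption, since the documentation does not state which wins when a curve is both short and contains NaN.

```python
import math
from enum import Enum
from typing import Callable, Sequence


class CheckStatus(Enum):
    # Local mirror of the four documented values.
    PASSED = "passed"
    FAILED = "failed"
    INSUFFICIENT_DATA = "insufficient_data"
    INVALID = "invalid"


def classify(
    rewards: Sequence[float],
    fit: Callable[[Sequence[float]], tuple[float, float]],
    *,
    alpha: float,
    min_episodes: int,
) -> CheckStatus:
    # Pre-checks run before any regression, so a NaN never reaches the fit.
    # Checking non-finite values before length is an assumption of this sketch.
    if not all(math.isfinite(r) for r in rewards):
        return CheckStatus.INVALID
    if len(rewards) < min_episodes:
        return CheckStatus.INSUFFICIENT_DATA
    slope, p_value = fit(rewards)  # hypothetical fit returning (slope, one-sided p)
    if slope > 0 and p_value < alpha:
        return CheckStatus.PASSED
    return CheckStatus.FAILED


def fake_fit(_rewards):
    return 0.8, 0.01  # illustrative numbers, not a real regression


status = classify([1.0, 2.0, 3.0], fake_fit, alpha=0.05, min_episodes=5)
passed = status is CheckStatus.PASSED  # how the legacy boolean is derived
print(status, passed)  # CheckStatus.INSUFFICIENT_DATA False
```

Branching on `status` lets a consumer tell "too short" apart from "regressing", which the single legacy boolean collapses into `False`.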

GuaranteeReport dataclass

Result of one moving-window non-decreasing-fraction check (T4b.13; ADR-002 §Risks #1).

The dataclass is frozen so a returned report cannot be mutated after the fact (the project's reproducibility contract: provenance bundles are immutable). Serialise to JSON via :func:dataclasses.asdict (used by :func:plot_reward_curve).

The "Guarantee" in the name is a legacy label held over from the pre-rename empirical_guarantee module — the value is a trip-wire, not a formal guarantee; see the module docstring + :class:CheckStatus for the renamed canonical taxonomy. The dataclass name is preserved for v0.5.x backwards-compat with consumers that import it directly.

Attributes:

  • window (int) –

    Moving-average window length the report was computed against (default :data:DEFAULT_WINDOW).

  • threshold (float) –

    Pass threshold the report was checked against (default :data:DEFAULT_THRESHOLD).

  • n_intervals (int) –

    Number of consecutive-window-pair intervals examined. Equals max(len(curve) - window, 0) for a non-degenerate run; zero when the curve is empty or shorter than window.

  • n_nondecreasing (int) –

    Count of intervals whose later-window mean is >= the earlier-window mean.

  • fraction (float) –

    n_nondecreasing / n_intervals when n_intervals > 0; 1.0 (vacuously) when n_intervals == 0 so callers can treat the degenerate case uniformly.

  • passed (bool) –

    fraction >= threshold. Vacuously True when n_intervals == 0 (the curve was too short to assert anything; the assertion is documented as vacuous in that regime).

Source code in src/concerto/training/learning_signal_check.py
@dataclass(frozen=True)
class GuaranteeReport:
    """Result of one moving-window non-decreasing-fraction check (T4b.13; ADR-002 §Risks #1).

    The dataclass is frozen so a returned report cannot be mutated after
    the fact (the project's reproducibility contract: provenance bundles
    are immutable). Serialise to JSON via :func:`dataclasses.asdict`
    (used by :func:`plot_reward_curve`).

    The "Guarantee" in the name is a legacy label held over from the
    pre-rename ``empirical_guarantee`` module — the value is a
    trip-wire, not a formal guarantee; see the module docstring +
    :class:`CheckStatus` for the renamed canonical taxonomy. The
    dataclass name is preserved for v0.5.x backwards-compat with
    consumers that import it directly.

    Attributes:
        window: Moving-average window length the report was computed
            against (default :data:`DEFAULT_WINDOW`).
        threshold: Pass threshold the report was checked against (default
            :data:`DEFAULT_THRESHOLD`).
        n_intervals: Number of consecutive-window-pair intervals examined.
            Equals ``max(len(curve) - window, 0)`` for a non-degenerate
            run; zero when the curve is empty or shorter than ``window``.
        n_nondecreasing: Count of intervals whose later-window mean is
            ``>=`` the earlier-window mean.
        fraction: ``n_nondecreasing / n_intervals`` when ``n_intervals > 0``;
            ``1.0`` (vacuously) when ``n_intervals == 0`` so callers can
            treat the degenerate case uniformly.
        passed: ``fraction >= threshold``. Vacuously ``True`` when
            ``n_intervals == 0`` (the curve was too short to assert
            anything; the assertion is documented as vacuous in that
            regime).
    """

    window: int
    threshold: float
    n_intervals: int
    n_nondecreasing: int
    fraction: float
    passed: bool
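The frozen-plus-`asdict` contract can be exercised with a local mirror of the dataclass. The field values are illustrative: 50 episodes with the canonical `window=10, threshold=0.8` setting give 40 intervals. The mirror class is not imported from concerto.

```python
import dataclasses
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GuaranteeReport:  # local mirror of the documented fields
    window: int
    threshold: float
    n_intervals: int
    n_nondecreasing: int
    fraction: float
    passed: bool


# Illustrative values: 50 episodes with window=10 give 40 intervals.
report = GuaranteeReport(
    window=10, threshold=0.8, n_intervals=40, n_nondecreasing=34,
    fraction=34 / 40, passed=34 / 40 >= 0.8,
)
wire = json.dumps(dataclasses.asdict(report))
print(wire)

# All fields are JSON scalars, so the bundle round-trips by keyword expansion.
assert GuaranteeReport(**json.loads(wire)) == report

try:
    report.passed = False  # frozen: assignment is rejected
except dataclasses.FrozenInstanceError:
    print("reports are immutable")
```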

SlopeReport dataclass

Result of one slope-test learning-signal check (T4b.13; ADR-002 §Risks #1).

Captures everything an external reader needs to interpret the trip-wire outcome without re-running the experiment: the least-squares slope itself, the coefficient of determination, the one-sided p-value, the alpha threshold the run was checked against, the number of episodes the regression saw, the :class:CheckStatus taxon, and the deprecated passed boolean (kept for v0.5.x backwards-compat).

The dataclass is frozen so a returned report cannot be mutated after the fact (the project's reproducibility contract: provenance bundles are immutable). Serialise to JSON via :func:dataclasses.asdict (used by :func:plot_slope_curve).

Attributes:

  • slope (float) –

    Least-squares slope of per-episode reward versus episode index, in reward-units-per-episode. Positive ⇒ learning; zero or negative ⇒ no learning or regressing.

  • intercept (float) –

    Least-squares intercept. Useful for the regression overlay in :func:plot_slope_curve and as a sanity check on the early-rollout reward scale.

  • r_squared (float) –

    Coefficient of determination R². Low R² plus status=PASSED is acceptable on noisy tasks (the slope is still statistically significant); the project does not gate on R² directly.

  • p_value (float) –

    One-sided p-value for the null "slope <= 0", computed as two_sided_p / 2 when the observed slope is positive and 1 - two_sided_p / 2 when non-positive. The two-sided p-value comes from :func:scipy.stats.linregress.

  • alpha (float) –

    Significance threshold the check was run against (default :data:DEFAULT_ALPHA).

  • n_episodes (int) –

    Number of episodes in the regressed curve. Below :data:DEFAULT_MIN_EPISODES the report carries :attr:CheckStatus.INSUFFICIENT_DATA.

  • status (CheckStatus) –

    Outcome taxon (:class:CheckStatus). Canonical signal; new code should branch on this, not on passed.

  • passed (bool) –

    Deprecated — derived from status (passed == status is CheckStatus.PASSED). Kept for v0.5.x consumers; removal target v0.6.0. Construction emits a :class:DeprecationWarning in __post_init__.

Source code in src/concerto/training/learning_signal_check.py
@dataclass(frozen=True)
class SlopeReport:
    """Result of one slope-test learning-signal check (T4b.13; ADR-002 §Risks #1).

    Captures everything an external reader needs to interpret the
    trip-wire outcome without re-running the experiment: the
    least-squares slope itself, the coefficient of determination, the
    one-sided p-value, the alpha threshold the run was checked against,
    the number of episodes the regression saw, the :class:`CheckStatus`
    taxon, and the deprecated ``passed`` boolean (kept for v0.5.x
    backwards-compat).

    The dataclass is frozen so a returned report cannot be mutated after
    the fact (the project's reproducibility contract: provenance bundles
    are immutable). Serialise to JSON via :func:`dataclasses.asdict`
    (used by :func:`plot_slope_curve`).

    Attributes:
        slope: Least-squares slope of per-episode reward versus episode
            index, in reward-units-per-episode. Positive ⇒ learning;
            zero or negative ⇒ no learning or regressing.
        intercept: Least-squares intercept. Useful for the regression
            overlay in :func:`plot_slope_curve` and as a sanity check
            on the early-rollout reward scale.
        r_squared: Coefficient of determination ``R²``. Low ``R²`` plus
            ``status=PASSED`` is acceptable on noisy tasks (the slope is
            still statistically significant); the project does not gate
            on ``R²`` directly.
        p_value: One-sided p-value for the null "slope <= 0", computed
            as ``two_sided_p / 2`` when the observed slope is positive
            and ``1 - two_sided_p / 2`` when non-positive. The
            two-sided p-value comes from
            :func:`scipy.stats.linregress`.
        alpha: Significance threshold the check was run against
            (default :data:`DEFAULT_ALPHA`).
        n_episodes: Number of episodes in the regressed curve. Below
            :data:`DEFAULT_MIN_EPISODES` the report carries
            :attr:`CheckStatus.INSUFFICIENT_DATA`.
        status: Outcome taxon (:class:`CheckStatus`). Canonical signal;
            new code should branch on this, not on ``passed``.
        passed: **Deprecated** — derived from ``status`` (``passed ==
            status is CheckStatus.PASSED``). Kept for v0.5.x consumers;
            removal target v0.6.0. Construction emits a
            :class:`DeprecationWarning` in ``__post_init__``.
    """

    slope: float
    intercept: float
    r_squared: float
    p_value: float
    alpha: float
    n_episodes: int
    status: CheckStatus
    passed: bool = False

    def __post_init__(self) -> None:
        """Derive ``passed`` from ``status`` + emit the field-deprecation warning.

        Two side effects, in order:

        1. ``passed`` is recomputed from ``status`` so the two cannot
           drift even when a caller supplies an inconsistent value
           (e.g. while round-tripping a legacy v1 JSON payload that
           has ``passed`` but no ``status``); the (rare) override uses
           ``object.__setattr__`` because the dataclass is frozen.
        2. A :class:`DeprecationWarning` is emitted **unconditionally,
           on every construction**, signalling that the ``passed`` field
           itself is deprecated. The warning fires whether or not the
           caller supplied ``passed`` explicitly — the goal is to flag
           the *field* to consumers, not just inconsistent inputs. New
           code should branch on ``report.status is CheckStatus.PASSED``
           and ignore ``passed`` entirely; ``passed`` is removed in
           v0.6.0 along with this ``__post_init__`` warning.
        """
        derived = self.status is CheckStatus.PASSED
        if self.passed != derived:
            object.__setattr__(self, "passed", derived)
        warnings.warn(
            "SlopeReport.passed is deprecated; use "
            "`report.status is CheckStatus.PASSED` instead "
            "(removal target: v0.6.0; external-review P1-1/P1-2, "
            "2026-05-16).",
            DeprecationWarning,
            stacklevel=3,
        )
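The one-sided conversion described for `p_value` fits in a few lines. The function name `one_sided_p` is hypothetical. The comment about SciPy describes where a two-sided value would typically come from, and is not a claim about the project's exact call.

```python
def one_sided_p(slope: float, two_sided_p: float) -> float:
    """One-sided p-value for the null hypothesis "slope <= 0"."""
    # A positive observed slope puts the evidence in the upper tail, so the
    # two-sided p-value is halved; otherwise the complementary tail applies.
    # With SciPy installed, two_sided_p would typically be
    # scipy.stats.linregress(x, y).pvalue (two-sided by default).
    return two_sided_p / 2 if slope > 0 else 1 - two_sided_p / 2


print(one_sided_p(0.3, 0.04))  # 0.02
print(one_sided_p(-0.3, 0.04))  # close to 0.98: a regressing curve cannot pass
```

A slope of exactly zero takes the non-positive branch, so a perfectly flat curve never produces a small one-sided p-value.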

__post_init__

__post_init__() -> None

Derive passed from status + emit the field-deprecation warning.

Two side effects, in order:

  1. passed is recomputed from status so the two cannot drift even when a caller supplies an inconsistent value (e.g. while round-tripping a legacy v1 JSON payload that has passed but no status); the (rare) override uses object.__setattr__ because the dataclass is frozen.
  2. A :class:DeprecationWarning is emitted unconditionally, on every construction, signalling that the passed field itself is deprecated. The warning fires whether or not the caller supplied passed explicitly — the goal is to flag the field to consumers, not just inconsistent inputs. New code should branch on report.status is CheckStatus.PASSED and ignore passed entirely; passed is removed in v0.6.0 along with this __post_init__ warning.
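The two side effects can be reproduced on a hypothetical miniature of the class (`MiniReport` and `Status` are illustrative names, not part of concerto). The miniature shows why `object.__setattr__` is needed on a frozen dataclass and what `stacklevel=3` points at.

```python
import warnings
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"


@dataclass(frozen=True)
class MiniReport:  # hypothetical miniature of SlopeReport
    status: Status
    passed: bool = False

    def __post_init__(self) -> None:
        derived = self.status is Status.PASSED
        if self.passed != derived:
            # frozen=True makes normal assignment raise, so write through object.
            object.__setattr__(self, "passed", derived)
        # stacklevel=3 skips __post_init__ and the generated __init__, so the
        # warning points at the line that constructed the report.
        warnings.warn("MiniReport.passed is deprecated", DeprecationWarning, stacklevel=3)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    r = MiniReport(status=Status.FAILED, passed=True)  # inconsistent legacy input

print(r.passed)  # False
print(len(caught))  # 1
```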

assert_non_decreasing_window

assert_non_decreasing_window(
    curve: RewardCurve,
    *,
    window: int = DEFAULT_WINDOW,
    threshold: float = DEFAULT_THRESHOLD,
) -> GuaranteeReport

Check the moving-window-of-K mean for non-decreasing intervals (ADR-002 §Risks #1).

Slides a window of length window over curve.per_episode_ego_rewards and counts the fraction of consecutive interval pairs whose later-window mean is >= the earlier-window mean. A run is considered to have cleared the trip-wire when that fraction is >= threshold.

Plan/05 §6 criterion 4 names the canonical setting: window=10, threshold=0.8. Callers MUST NOT silently widen the window or lower the threshold to coerce a passing report; if the assertion fires under the production config, hand control back to the maintainer and open a #scope-revision issue (ADR-002 §Risks #1 mitigation policy).

Degenerate inputs (empty curve, len(curve) < window) return a report with n_intervals=0 and passed=True. The pass is vacuous — there is not enough signal to assert anything — and is only useful as a sentinel for unit tests of the helper itself; do not rely on it to mask a too-short production run. The canonical slope test :func:check_positive_learning_slope resolves this vacuous-pass class of failure (external-review P1-2) by returning CheckStatus.INSUFFICIENT_DATA with passed=False on too-short curves.
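A minimal sketch of the moving-window count follows, assuming stride-1 windows. That assumption is the reading consistent with the documented `n_intervals = max(len(curve) - window, 0)`. The function name and return tuple are hypothetical, and the sketch operates on a plain list instead of a `RewardCurve`.

```python
def non_decreasing_fraction(rewards, window, threshold):
    """Return (n_intervals, n_nondecreasing, fraction, passed)."""
    n = len(rewards)
    if window <= 0 or n < window:
        return 0, 0, 1.0, True  # vacuous pass, as documented
    # Mean of each length-`window` slice; there are n - window + 1 of them.
    means = [sum(rewards[i : i + window]) / window for i in range(n - window + 1)]
    # Consecutive pairs of window means give n - window intervals.
    n_intervals = len(means) - 1
    n_nondecreasing = sum(later >= earlier for earlier, later in zip(means, means[1:]))
    fraction = n_nondecreasing / n_intervals if n_intervals else 1.0
    return n_intervals, n_nondecreasing, fraction, fraction >= threshold


print(non_decreasing_fraction([1, 2, 1, 0], window=1, threshold=0.8))
# (3, 1, 0.3333333333333333, False)
```

Equal consecutive means count as non-decreasing, so a curve that has plateaued still clears the trip-wire. Only an actual drop in the windowed mean counts against it.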

Parameters:

  • curve (RewardCurve) –

    The reward curve produced by :func:concerto.training.ego_aht.train. Only the per_episode_ego_rewards field is read; the per-step series is deliberately ignored because the assertion is episode-aggregated (Huriot-Sibai 2025 Theorem 7 is stated over the policy-improvement step, which lives at episode granularity).

  • window (int, default: DEFAULT_WINDOW ) –

    Window size (default :data:DEFAULT_WINDOW).

  • threshold (float, default: DEFAULT_THRESHOLD ) –

    Pass threshold on the non-decreasing fraction (default :data:DEFAULT_THRESHOLD).

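Returns:

  • GuaranteeReport

    Frozen :class:GuaranteeReport. Does NOT raise on failure; the caller decides whether to fail the test, exit the CLI non-zero, or just log the report.
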
Source code in src/concerto/training/learning_signal_check.py
def assert_non_decreasing_window(
    curve: RewardCurve,
    *,
    window: int = DEFAULT_WINDOW,
    threshold: float = DEFAULT_THRESHOLD,
) -> GuaranteeReport:
    """Check the moving-window-of-K mean for non-decreasing intervals (ADR-002 §Risks #1).

    Slides a window of length ``window`` over ``curve.per_episode_ego_rewards``
    and counts the fraction of consecutive interval pairs whose later-window
    mean is ``>=`` the earlier-window mean. A run is considered to have
    cleared the trip-wire when that fraction is ``>= threshold``.

    Plan/05 §6 criterion 4 names the canonical setting:
    ``window=10, threshold=0.8``. Callers MUST NOT silently widen the
    window or lower the threshold to coerce a passing report — if the
    assertion fires under the production config, hand control back to
    the maintainer and open a ``#scope-revision`` issue (ADR-002 §Risks
    #1 mitigation policy).

    Degenerate inputs (empty curve, ``len(curve) < window``) return a
    report with ``n_intervals=0`` and ``passed=True``. The pass is
    *vacuous* — there is not enough signal to assert anything — and is
    only useful as a sentinel for unit tests of the helper itself; do
    not rely on it to mask a too-short production run. The canonical
    slope test :func:`check_positive_learning_slope` resolves this
    vacuous-pass class of failure (external-review P1-2) by returning
    ``CheckStatus.INSUFFICIENT_DATA`` with ``passed=False`` on
    too-short curves.

    Args:
        curve: The reward curve produced by
            :func:`concerto.training.ego_aht.train`. Only the
            ``per_episode_ego_rewards`` field is read; the per-step
            series is deliberately ignored because the assertion is
            episode-aggregated (Huriot-Sibai 2025 Theorem 7 is stated
            over the policy-improvement step, which lives at episode
            granularity).
        window: Window size (default :data:`DEFAULT_WINDOW`).
        threshold: Pass threshold on the non-decreasing fraction
            (default :data:`DEFAULT_THRESHOLD`).

    Returns:
        Frozen :class:`GuaranteeReport`. Does NOT raise on failure — the
        caller decides whether to fail the test, exit the CLI non-zero,
        or just log the report.
    """
    rewards = list(curve.per_episode_ego_rewards)
    if len(rewards) < window or window <= 0:
        return GuaranteeReport(
            window=window,
            threshold=threshold,
            n_intervals=0,
            n_nondecreasing=0,
            fraction=1.0,
            passed=True,
        )
    means: list[float] = []
    running_sum = sum(rewards[:window])
    means.append(running_sum / window)
    for i in range(window, len(rewards)):
        running_sum += rewards[i] - rewards[i - window]
        means.append(running_sum / window)
    n_intervals = len(means) - 1
    n_nondecreasing = sum(1 for prev, nxt in pairwise(means) if nxt >= prev)
    fraction = n_nondecreasing / n_intervals if n_intervals > 0 else 1.0
    return GuaranteeReport(
        window=window,
        threshold=threshold,
        n_intervals=n_intervals,
        n_nondecreasing=n_nondecreasing,
        fraction=fraction,
        passed=fraction >= threshold,
    )
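The statistic can be examined without the RewardCurve and GuaranteeReport types. The sketch below is a pure-Python restatement of the running-sum loop above; the function name nondecreasing_fraction and its tuple return are illustrative assumptions, not part of concerto's API. The last call shows the vacuous fraction=1.0 that a curve of exactly window episodes also produces.

```python
def nondecreasing_fraction(rewards: list[float], window: int) -> tuple[int, int, float]:
    """Return (n_intervals, n_nondecreasing, fraction) for a moving-window mean.

    Illustrative restatement of assert_non_decreasing_window's statistic;
    not the concerto implementation.
    """
    if window <= 0 or len(rewards) < window:
        # Degenerate input: no intervals, vacuous fraction of 1.0.
        return 0, 0, 1.0
    # A running sum keeps each window update O(1).
    running_sum = sum(rewards[:window])
    means = [running_sum / window]
    for i in range(window, len(rewards)):
        running_sum += rewards[i] - rewards[i - window]
        means.append(running_sum / window)
    n_intervals = len(means) - 1
    # Count consecutive window pairs whose later mean is >= the earlier one.
    n_nondecreasing = sum(1 for prev, nxt in zip(means, means[1:]) if nxt >= prev)
    fraction = n_nondecreasing / n_intervals if n_intervals > 0 else 1.0
    return n_intervals, n_nondecreasing, fraction


print(nondecreasing_fraction([0, 1, 2, 3, 4, 5], window=2))  # (4, 4, 1.0)
print(nondecreasing_fraction([3, 1, 2, 0, 4], window=2))     # (3, 1, 0.3333333333333333)
print(nondecreasing_fraction([1, 2, 3], window=3))           # (0, 0, 1.0), vacuous
```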

check_positive_learning_slope

check_positive_learning_slope(
    curve: RewardCurve,
    *,
    alpha: float = DEFAULT_ALPHA,
    min_episodes: int = DEFAULT_MIN_EPISODES,
) -> SlopeReport

Least-squares slope check on per-episode reward (ADR-002 §Risks #1; canonical).

Renames the pre-amendment assert_positive_learning_slope (the "assert" verb overclaimed — the function returns a report rather than raising, and the slope check is a trip-wire for ADR-002 §Risks #1, not a formal monotonic-improvement guarantee). The old name remains available via a deprecation shim at concerto.training.empirical_guarantee.assert_positive_learning_slope; removal target v0.6.0.

Issue #62 root-cause writeup: the legacy :func:assert_non_decreasing_window statistic has poor signal-to-noise on per-episode rewards at the Phase-0 100k-frame budget. The stdev of the moving-window-of-10 mean (~6.3) is comparable to the total drift the trainer can produce (~10-20 reward units), so the non-decreasing fraction sits near 0.5 (what a random walk would give) even when the policy is materially improving. The slope check integrates over the whole curve and, when learning is real, rejects the null "slope <= 0" with p-values far below any reasonable alpha.

The check:

  1. Validate the curve. NaN / +/- inf ⇒ :attr:CheckStatus.INVALID immediately.
  2. If n < min_episodes ⇒ :attr:CheckStatus.INSUFFICIENT_DATA with passed=False. (Pre-amendment code returned passed=True here; external-review P1-2 flagged this as silently masking too-short curves.)
  3. Fit rewards ~ slope * episode_index + intercept via least squares (:func:scipy.stats.linregress).
  4. Take the one-sided p-value: two_sided / 2 when the observed slope is positive, otherwise 1 - two_sided / 2. (scipy reports two-sided by default; the one-sided form is the natural gate for "positive learning".)
  5. Pass when slope > 0 AND p_value < alpha ⇒ :attr:CheckStatus.PASSED. Else :attr:CheckStatus.FAILED.

Plan/05 §6 criterion 4 uses alpha=0.05 as the canonical threshold; the project's production config gives p_value < 1e-14 across seeds, so the gate has a very wide margin.
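Steps 3 to 5 can be illustrated without scipy. The sketch below is a pure-Python stand-in that assumes equally spaced episode indices 0..n-1, as step 3 does: ols_fit computes the same slope, intercept, and R² that scipy.stats.linregress reports, and one_sided_p applies the step-4 conversion to a two-sided p-value. ols_fit, one_sided_p, and passes are hypothetical helpers, not concerto API; in the real function the two-sided p-value still comes from scipy's t-test.

```python
def ols_fit(rewards: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of rewards ~ slope * index + intercept.

    Returns (slope, intercept, r_squared). Assumes at least two episodes and
    non-constant rewards (the real function handles those cases first).
    """
    n = len(rewards)
    x_mean = (n - 1) / 2.0  # mean of 0, 1, ..., n-1
    y_mean = sum(rewards) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    syy = sum((y - y_mean) ** 2 for y in rewards)
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(rewards))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


def one_sided_p(slope: float, two_sided_p: float) -> float:
    """Convert a two-sided slope p-value into the one-sided 'slope > 0' p-value."""
    # The t statistic is symmetric, so half the two-sided mass lies on the
    # side of the observed slope.
    return two_sided_p / 2.0 if slope > 0.0 else 1.0 - two_sided_p / 2.0


def passes(slope: float, two_sided_p: float, alpha: float = 0.05) -> bool:
    """Step 5: require a strictly positive slope AND one-sided p < alpha."""
    return slope > 0.0 and one_sided_p(slope, two_sided_p) < alpha


print(ols_fit([1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0, 1.0)
print(one_sided_p(0.5, 0.08))         # 0.04
print(passes(-0.5, 0.08))             # False (one-sided p is about 0.96)
```

Taking the one-sided tail in this way is why a negative slope can never pass, however small its two-sided p-value.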

Parameters:

  • curve (RewardCurve) –

    The reward curve produced by :func:concerto.training.ego_aht.train. Only per_episode_ego_rewards is read.

  • alpha (float, default: DEFAULT_ALPHA ) –

    One-sided significance threshold. Default :data:DEFAULT_ALPHA.

  • min_episodes (int, default: DEFAULT_MIN_EPISODES ) –

    Minimum episode count for a non-vacuous report. Below this the report carries :attr:CheckStatus.INSUFFICIENT_DATA. Default :data:DEFAULT_MIN_EPISODES.

Returns:

  • SlopeReport –

    Frozen :class:SlopeReport. Does NOT raise on failure; the caller decides whether to fail the test, exit the CLI non-zero, or just log the report.

Source code in src/concerto/training/learning_signal_check.py
def check_positive_learning_slope(
    curve: RewardCurve,
    *,
    alpha: float = DEFAULT_ALPHA,
    min_episodes: int = DEFAULT_MIN_EPISODES,
) -> SlopeReport:
    """Least-squares slope check on per-episode reward (ADR-002 §Risks #1; canonical).

    Renames the pre-amendment ``assert_positive_learning_slope`` (the
    "assert" verb overclaimed — the function returns a report rather
    than raising, and the slope check is a *trip-wire* for ADR-002
    §Risks #1, not a formal monotonic-improvement guarantee). The old
    name remains available via a deprecation shim at
    ``concerto.training.empirical_guarantee.assert_positive_learning_slope``;
    removal target v0.6.0.

    Issue #62 root-cause writeup: the legacy
    :func:`assert_non_decreasing_window` statistic has poor signal-to-
    noise on per-episode rewards at the Phase-0 100k-frame budget — the
    moving-window-of-10 mean stdev (~6.3) is comparable to the total
    drift (~10-20 reward units) the trainer can produce, and the
    non-decreasing-fraction sits near 0.5 (a random walk) even when the
    policy is materially improving. The slope check integrates over the
    whole curve and rejects the null "slope <= 0" with p-values
    astronomically below any reasonable alpha when learning is real.

    The check:

    1. Validate the curve. NaN / +/- inf ⇒
       :attr:`CheckStatus.INVALID` immediately.
    2. If ``n < min_episodes`` ⇒ :attr:`CheckStatus.INSUFFICIENT_DATA`
       with ``passed=False``. (Pre-amendment code returned
       ``passed=True`` here; external-review P1-2 flagged this as
       silently masking too-short curves.)
    3. Fit ``rewards ~ slope * episode_index + intercept`` via least
       squares (:func:`scipy.stats.linregress`).
    4. Take the one-sided p-value: ``two_sided / 2`` when the observed
       slope is positive, otherwise ``1 - two_sided / 2``. (scipy
       reports two-sided by default; the one-sided form is the natural
       gate for "positive learning".)
    5. Pass when ``slope > 0`` AND ``p_value < alpha`` ⇒
       :attr:`CheckStatus.PASSED`. Else :attr:`CheckStatus.FAILED`.

    Plan/05 §6 criterion 4 uses ``alpha=0.05`` as the canonical
    threshold; the project's production config gives ``p_value < 1e-14``
    across seeds, so the gate has very wide margin.

    Args:
        curve: The reward curve produced by
            :func:`concerto.training.ego_aht.train`. Only
            ``per_episode_ego_rewards`` is read.
        alpha: One-sided significance threshold. Default
            :data:`DEFAULT_ALPHA`.
        min_episodes: Minimum episode count for a non-vacuous report.
            Below this the report carries
            :attr:`CheckStatus.INSUFFICIENT_DATA`. Default
            :data:`DEFAULT_MIN_EPISODES`.

    Returns:
        Frozen :class:`SlopeReport`. Does NOT raise on failure — the
        caller decides whether to fail the test, exit the CLI non-zero,
        or just log the report.
    """
    rewards = np.asarray(list(curve.per_episode_ego_rewards), dtype=np.float64)
    n = int(rewards.shape[0])

    # 1. Validate. NaN / +/- inf surface as INVALID before any solver
    # path touches the array.
    if n > 0 and not np.all(np.isfinite(rewards)):
        return SlopeReport(
            slope=0.0,
            intercept=0.0,
            r_squared=0.0,
            p_value=1.0,
            alpha=alpha,
            n_episodes=n,
            status=CheckStatus.INVALID,
        )

    # 2. Insufficient data. Replaces the pre-amendment vacuous-pass
    # branch (external-review P1-2) that returned passed=True.
    if n < min_episodes:
        return SlopeReport(
            slope=0.0,
            intercept=0.0,
            r_squared=0.0,
            p_value=1.0,
            alpha=alpha,
            n_episodes=n,
            status=CheckStatus.INSUFFICIENT_DATA,
        )

    # 3-5. Slope regression. Zero-variance y is a degenerate case that
    # scipy.stats.linregress rejects, so handle it ahead of the call
    # and emit FAILED (the slope is not strictly positive).
    if float(rewards.std()) == 0.0:
        return SlopeReport(
            slope=0.0,
            intercept=float(rewards[0]) if n > 0 else 0.0,
            r_squared=0.0,
            p_value=1.0,
            alpha=alpha,
            n_episodes=n,
            status=CheckStatus.FAILED,
        )
    x = np.arange(n, dtype=np.float64)
    result = stats.linregress(x, rewards)
    slope = float(result.slope)
    intercept = float(result.intercept)
    r_squared = float(result.rvalue) ** 2
    two_sided_p = float(result.pvalue)
    p_value = two_sided_p / 2.0 if slope > 0.0 else 1.0 - two_sided_p / 2.0
    status = CheckStatus.PASSED if slope > 0.0 and p_value < alpha else CheckStatus.FAILED
    return SlopeReport(
        slope=slope,
        intercept=intercept,
        r_squared=r_squared,
        p_value=p_value,
        alpha=alpha,
        n_episodes=n,
        status=status,
    )

plot_reward_curve

plot_reward_curve(
    curve: RewardCurve,
    *,
    report: GuaranteeReport,
    out_dir: Path,
) -> tuple[Path, Path]

Emit <run_id>.png + <run_id>.json for the assertion plot (T4b.13; ADR-002 §Risks #1).

Writes two files under out_dir:

  • <run_id>.png: matplotlib line plot of the per-episode reward with the moving-window-of-report.window mean overlaid. The report's fraction / passed are stamped into the title so docs/explanation/why-aht.md can embed the figure unchanged.
  • <run_id>.json: machine-replayable sidecar carrying the raw per-episode curve + the report. A future reader regenerates the figure by replaying the JSON through this function.

matplotlib is imported lazily inside the function so the surrounding module stays importable in environments without the viz extra (Phase-0 CI, headless containers). Callers that need the plot must install the viz extra (uv sync --extra viz).

Parameters:

  • curve (RewardCurve) –

    The reward curve produced by :func:concerto.training.ego_aht.train. curve.run_id is used as the filename stem.

  • report (GuaranteeReport) –

    The :class:GuaranteeReport produced by :func:assert_non_decreasing_window. Stamped into the figure title and copied into the JSON sidecar.

  • out_dir (Path) –

    Output directory. Created if missing.

Returns:

  • tuple[Path, Path] –

    (png_path, json_path): the paths of the two emitted files, joined onto out_dir (absolute only when out_dir is absolute).

Source code in src/concerto/training/learning_signal_check.py
def plot_reward_curve(
    curve: RewardCurve,
    *,
    report: GuaranteeReport,
    out_dir: Path,
) -> tuple[Path, Path]:
    """Emit ``<run_id>.png`` + ``<run_id>.json`` for the assertion plot (T4b.13; ADR-002 §Risks #1).

    Writes two files under ``out_dir``:

    - ``<run_id>.png``: matplotlib line plot of the per-episode reward
      with the moving-window-of-``report.window`` mean overlaid. The
      report's ``fraction`` / ``passed`` are stamped into the title so
      ``docs/explanation/why-aht.md`` can embed the figure unchanged.
    - ``<run_id>.json``: machine-replayable sidecar carrying the raw
      per-episode curve + the report. A future reader regenerates the
      figure by replaying the JSON through this function.

    ``matplotlib`` is imported lazily inside the function so the
    surrounding module stays importable in environments without the
    ``viz`` extra (Phase-0 CI, headless containers). Callers that need
    the plot must install the ``viz`` extra (``uv sync --extra viz``).

    Args:
        curve: The reward curve produced by
            :func:`concerto.training.ego_aht.train`. ``curve.run_id``
            is used as the filename stem.
        report: The :class:`GuaranteeReport` produced by
            :func:`assert_non_decreasing_window`. Stamped into the figure
            title and copied into the JSON sidecar.
        out_dir: Output directory. Created if missing.

    Returns:
        ``(png_path, json_path)`` tuple: the paths of the two emitted
        files, joined onto ``out_dir`` (absolute only when ``out_dir`` is).
    """
    import matplotlib  # noqa: PLC0415

    matplotlib.use("Agg", force=True)
    import matplotlib.pyplot as plt  # noqa: PLC0415

    out_dir.mkdir(parents=True, exist_ok=True)
    rewards = list(curve.per_episode_ego_rewards)

    fig, ax = plt.subplots(figsize=(8, 4.5))
    if rewards:
        ax.plot(range(len(rewards)), rewards, label="per-episode ego reward", alpha=0.4)
        if len(rewards) >= report.window > 0:
            running_sum = sum(rewards[: report.window])
            means: list[float] = [running_sum / report.window]
            for i in range(report.window, len(rewards)):
                running_sum += rewards[i] - rewards[i - report.window]
                means.append(running_sum / report.window)
            ax.plot(
                range(report.window - 1, len(rewards)),
                means,
                label=f"moving mean (window={report.window})",
                linewidth=2,
            )
    ax.set_xlabel("episode index")
    ax.set_ylabel("ego reward (sum over episode)")
    status = "PASS" if report.passed else "FAIL"
    ax.set_title(
        f"Learning-signal check (moving window) — run {curve.run_id}: {status} "
        f"({report.n_nondecreasing}/{report.n_intervals} "
        f"non-decreasing, threshold={report.threshold:.2f})"
    )
    if rewards:
        ax.legend(loc="lower right")
    ax.grid(visible=True, alpha=0.3)
    fig.tight_layout()
    png_path = out_dir / f"{curve.run_id}.png"
    fig.savefig(png_path, dpi=120)
    plt.close(fig)

    json_path = out_dir / f"{curve.run_id}.json"
    json_path.write_text(
        json.dumps(
            {
                "run_id": curve.run_id,
                "per_episode_ego_rewards": rewards,
                "report": asdict(report),
            },
            sort_keys=True,
            indent=2,
        ),
        encoding="utf-8",
    )
    return png_path, json_path
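The JSON sidecar is what makes the figure replayable. The following is a minimal stdlib sketch of the reading side. It assumes only the three top-level keys the code above writes (run_id, per_episode_ego_rewards, report) and the window field of the serialised GuaranteeReport. replay_moving_mean is a hypothetical helper; it returns the moving means together with the x positions the plot uses, with each mean drawn at the index of the last episode in its window.

```python
import json
import tempfile
from pathlib import Path


def replay_moving_mean(json_path: Path) -> tuple[str, list[int], list[float]]:
    """Rebuild (run_id, x, means) for the moving-mean overlay from a sidecar."""
    payload = json.loads(json_path.read_text(encoding="utf-8"))
    rewards = payload["per_episode_ego_rewards"]
    window = payload["report"]["window"]
    if not (len(rewards) >= window > 0):
        # Same guard as the plot: no overlay when the curve is too short.
        return payload["run_id"], [], []
    running_sum = sum(rewards[:window])
    means = [running_sum / window]
    for i in range(window, len(rewards)):
        running_sum += rewards[i] - rewards[i - window]
        means.append(running_sum / window)
    # The plot anchors each mean at the last episode of its window.
    x = list(range(window - 1, len(rewards)))
    return payload["run_id"], x, means


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.json"
    path.write_text(
        json.dumps({"run_id": "demo", "per_episode_ego_rewards": [1.0, 2.0, 4.0], "report": {"window": 2}}),
        encoding="utf-8",
    )
    print(replay_moving_mean(path))  # ('demo', [1, 2], [1.5, 3.0])
```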

plot_slope_curve

plot_slope_curve(
    curve: RewardCurve,
    *,
    report: SlopeReport,
    out_dir: Path,
) -> tuple[Path, Path]

Emit <run_id>.png + <run_id>.json for the slope plot (T4b.13; ADR-002 §Risks #1).

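Mirrors :func:plot_reward_curve's contract but overlays the least-squares regression line instead of the moving-window mean. The figure title carries the run id, the report's status name, the one-sided p-value, alpha, and the episode count, while the legend label carries the fitted slope and R², so docs/explanation/why-aht.md can embed the figure unchanged. Both plot functions use curve.run_id as the filename stem, so emitting both plots for the same run into the same out_dir makes the second call overwrite the first; pass distinct directories.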

Parameters:

  • curve (RewardCurve) –

    The reward curve produced by :func:concerto.training.ego_aht.train. curve.run_id is the filename stem.

  • report (SlopeReport) –

    The :class:SlopeReport produced by :func:check_positive_learning_slope. Stamped into the figure title and copied into the JSON sidecar.

  • out_dir (Path) –

    Output directory. Created if missing.

Returns:

  • tuple[Path, Path] –

    (png_path, json_path): the paths of the two emitted files, joined onto out_dir (absolute only when out_dir is absolute).

Source code in src/concerto/training/learning_signal_check.py
def plot_slope_curve(
    curve: RewardCurve,
    *,
    report: SlopeReport,
    out_dir: Path,
) -> tuple[Path, Path]:
    """Emit ``<run_id>.png`` + ``<run_id>.json`` for the slope plot (T4b.13; ADR-002 §Risks #1).

    Mirrors :func:`plot_reward_curve`'s contract but overlays the
    least-squares regression line instead of the moving-window mean.
    The title carries the report's status name, the one-sided p-value,
    alpha, and the episode count, and the legend carries the fitted slope
    and R², so ``docs/explanation/why-aht.md`` can embed the figure
    unchanged.

    Args:
        curve: The reward curve produced by
            :func:`concerto.training.ego_aht.train`. ``curve.run_id``
            is the filename stem.
        report: The :class:`SlopeReport` produced by
            :func:`check_positive_learning_slope`. Stamped into the
            figure title and copied into the JSON sidecar.
        out_dir: Output directory. Created if missing.

    Returns:
        ``(png_path, json_path)`` tuple: the paths of the two emitted
        files, joined onto ``out_dir`` (absolute only when ``out_dir`` is).
    """
    import matplotlib  # noqa: PLC0415

    matplotlib.use("Agg", force=True)
    import matplotlib.pyplot as plt  # noqa: PLC0415

    out_dir.mkdir(parents=True, exist_ok=True)
    rewards = list(curve.per_episode_ego_rewards)

    fig, ax = plt.subplots(figsize=(8, 4.5))
    if rewards:
        x = list(range(len(rewards)))
        ax.plot(x, rewards, label="per-episode ego reward", alpha=0.35)
        ax.plot(
            x,
            [report.slope * xi + report.intercept for xi in x],
            label=(
                f"least-squares slope = {report.slope:+.4f}/episode (R²={report.r_squared:.3f})"
            ),
            linewidth=2,
        )
    ax.set_xlabel("episode index")
    ax.set_ylabel("ego reward (sum over episode)")
    status_label = report.status.name
    ax.set_title(
        f"Learning-signal check (slope) — run {curve.run_id}: {status_label} "
        f"(p={report.p_value:.2e}, alpha={report.alpha:.2g}, "
        f"n={report.n_episodes})"
    )
    if rewards:
        ax.legend(loc="lower right")
    ax.grid(visible=True, alpha=0.3)
    fig.tight_layout()
    png_path = out_dir / f"{curve.run_id}.png"
    fig.savefig(png_path, dpi=120)
    plt.close(fig)

    json_path = out_dir / f"{curve.run_id}.json"
    json_path.write_text(
        json.dumps(
            {
                "run_id": curve.run_id,
                "per_episode_ego_rewards": rewards,
                "report": asdict(report),
            },
            sort_keys=True,
            indent=2,
            default=lambda o: o.value if isinstance(o, Enum) else str(o),
        ),
        encoding="utf-8",
    )
    return png_path, json_path
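plot_slope_curve passes a default= hook to json.dumps because dataclasses.asdict leaves an Enum member such as the report's status in place, and the json module cannot encode a plain Enum on its own (a str- or int-mixin Enum would not need the hook). plot_reward_curve needs no hook because GuaranteeReport holds only primitives. Here is a self-contained illustration built on hypothetical stand-ins, DemoStatus and DemoReport. The real CheckStatus member values are not shown on this page.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class DemoStatus(Enum):  # hypothetical stand-in for CheckStatus
    PASSED = "passed"
    FAILED = "failed"


@dataclass(frozen=True)
class DemoReport:  # hypothetical stand-in for SlopeReport
    slope: float
    p_value: float
    status: DemoStatus


report = DemoReport(slope=0.12, p_value=1e-3, status=DemoStatus.PASSED)

try:
    json.dumps(asdict(report))
except TypeError as exc:
    # asdict() keeps the Enum member, and plain Enums are not JSON-serialisable.
    print("without hook:", type(exc).__name__)

# Same hook as plot_slope_curve: Enums serialise by value, anything else by str().
encoded = json.dumps(
    asdict(report),
    sort_keys=True,
    default=lambda o: o.value if isinstance(o, Enum) else str(o),
)
print(encoded)  # {"p_value": 0.001, "slope": 0.12, "status": "passed"}
```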

Deterministic seeding harness for CONCERTO (P6 reproducibility).

Every randomised piece of CONCERTO and CHAMBER (env wrappers, comm channel, partner zoo, training loop) draws from a numpy Generator seeded via :func:derive_substream. The function maps a (name, root_seed) pair to a fresh Substream whose RNG is independent of every other named substream under the same root seed; reusing the same pair always returns byte-identical draws.

Cited as ADR-002 §"deterministic seeding harness" (training stack) and ADR-003 §Decision (the comm channel uses derive_substream("comm.fixed", ...) to seed its substream).

Substream dataclass

Wraps a numpy SeedSequence with convenience constructors (P6).

ADR-002 §"deterministic seeding harness". The frozen dataclass makes the substream identity hashable; equality is structural over the underlying seed-sequence's entropy + spawn key, which is enough for caching.

Source code in src/concerto/training/seeding.py
@dataclass(frozen=True)
class Substream:
    """Wraps a numpy ``SeedSequence`` with convenience constructors (P6).

    ADR-002 §"deterministic seeding harness". The frozen dataclass makes the
    substream identity hashable; equality is structural over the underlying
    seed-sequence's entropy + spawn key, which is enough for caching.
    """

    seed_sequence: np.random.SeedSequence

    def default_rng(self) -> np.random.Generator:
        """Construct a fresh numpy ``Generator`` from this substream (P6).

        ADR-002 §"deterministic seeding harness". Two calls return independent
        generators that produce identical sequences when freshly built — i.e.,
        the substream is a *seed factory*, not an iterator.
        """
        return np.random.default_rng(self.seed_sequence)

    def spawn(self, n: int) -> list[Substream]:
        """Spawn ``n`` independent child substreams (P6).

        ADR-002 §"deterministic seeding harness". Use when a producer needs
        per-worker / per-episode RNG without colliding with sibling streams.
        """
        return [Substream(seed_sequence=child) for child in self.seed_sequence.spawn(n)]

default_rng

default_rng() -> np.random.Generator

Construct a fresh numpy Generator from this substream (P6).

ADR-002 §"deterministic seeding harness". Each call builds a new Generator, and any two freshly built generators produce identical sequences; i.e., the substream is a seed factory, not an iterator.

Source code in src/concerto/training/seeding.py
def default_rng(self) -> np.random.Generator:
    """Construct a fresh numpy ``Generator`` from this substream (P6).

    ADR-002 §"deterministic seeding harness". Two calls return independent
    generators that produce identical sequences when freshly built — i.e.,
    the substream is a *seed factory*, not an iterator.
    """
    return np.random.default_rng(self.seed_sequence)
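The seed-factory behaviour comes directly from numpy. np.random.default_rng accepts a SeedSequence and derives its initial state from it deterministically, so every call starts a fresh generator at the head of the same stream. The minimal check below uses a local copy of the Substream shape shown above, with an arbitrary SeedSequence(1234) standing in for a derived substream.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Substream:  # local copy of the dataclass above, for a standalone demo
    seed_sequence: np.random.SeedSequence

    def default_rng(self) -> np.random.Generator:
        return np.random.default_rng(self.seed_sequence)


sub = Substream(seed_sequence=np.random.SeedSequence(1234))
rng_a = sub.default_rng()
rng_b = sub.default_rng()

assert rng_a is not rng_b                                # two distinct Generator objects...
assert np.array_equal(rng_a.random(5), rng_b.random(5))  # ...replaying the same stream

first = sub.default_rng().random(5)
rng = sub.default_rng()
rng.random(100)  # advancing one generator...
assert np.array_equal(sub.default_rng().random(5), first)  # ...does not advance the substream
```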

spawn

spawn(n: int) -> list[Substream]

Spawn n independent child substreams (P6).

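ADR-002 §"deterministic seeding harness". Use when a producer needs per-worker / per-episode RNG without colliding with sibling streams. Note that numpy's SeedSequence.spawn advances the parent's n_children_spawned counter, so calling spawn twice on the same Substream returns different children; to reproduce a set of children, re-derive the parent with :func:derive_substream and spawn from it again.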

Source code in src/concerto/training/seeding.py
def spawn(self, n: int) -> list[Substream]:
    """Spawn ``n`` independent child substreams (P6).

    ADR-002 §"deterministic seeding harness". Use when a producer needs
    per-worker / per-episode RNG without colliding with sibling streams.
    """
    return [Substream(seed_sequence=child) for child in self.seed_sequence.spawn(n)]
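SeedSequence.spawn records how many children it has handed out in its n_children_spawned attribute. As a result, spawn is not idempotent even though Substream is frozen: a second call continues the numbering and returns new, independent children. Here is a standalone sketch with plain numpy SeedSequence objects; the seed 1234 is arbitrary.

```python
import numpy as np

parent = np.random.SeedSequence(1234)
first = parent.spawn(2)
second = parent.spawn(2)  # continues after the first two children

# Each child's spawn_key extends the parent's (empty) key with its index.
print([c.spawn_key for c in first + second])  # [(0,), (1,), (2,), (3,)]
print(parent.n_children_spawned)              # 4


def draw(ss: np.random.SeedSequence) -> np.ndarray:
    """Three uniform draws from a generator seeded by ``ss``."""
    return np.random.default_rng(ss).random(3)


# Re-creating the parent restarts the numbering, so children are reproducible...
again = np.random.SeedSequence(1234).spawn(2)
assert all(np.array_equal(draw(a), draw(b)) for a, b in zip(first, again))
# ...while the second batch is a genuinely different stream.
assert not np.array_equal(draw(first[0]), draw(second[0]))
```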

derive_substream

derive_substream(
    name: str, *, root_seed: int = 0
) -> Substream

Derive a deterministic named substream from a root seed (P6).

ADR-002 §"deterministic seeding harness"; ADR-003 §Decision (the comm channel uses derive_substream("comm.fixed", root_seed=...)).

Parameters:

  • name (str) –

    Stable label for the substream. Distinct names yield independent RNG streams under the same root_seed; reusing the pair (name, root_seed) always returns byte-identical draws.

  • root_seed (int, default: 0 ) –

    Project-wide root seed. Defaults to 0 so call sites can ship working defaults; production runs override per-experiment.

Returns:

  • Substream –

    A :class:Substream whose :meth:Substream.default_rng is the canonical entry-point for downstream randomness.

Source code in src/concerto/training/seeding.py
def derive_substream(name: str, *, root_seed: int = 0) -> Substream:
    """Derive a deterministic named substream from a root seed (P6).

    ADR-002 §"deterministic seeding harness"; ADR-003 §Decision (the comm
    channel uses ``derive_substream("comm.fixed", root_seed=...)``).

    Args:
        name: Stable label for the substream. Distinct names yield independent
            RNG streams under the same ``root_seed``; reusing the pair
            ``(name, root_seed)`` always returns byte-identical draws.
        root_seed: Project-wide root seed. Defaults to 0 so call sites can
            ship working defaults; production runs override per-experiment.

    Returns:
        A :class:`Substream` whose :meth:`Substream.default_rng` is the
        canonical entry-point for downstream randomness.
    """
    seed_seq = np.random.SeedSequence(
        entropy=root_seed,
        spawn_key=_name_to_spawn_key(name),
    )
    return Substream(seed_sequence=seed_seq)
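The private helper _name_to_spawn_key is not shown on this page. The following is a purely hypothetical sketch of one way to turn a name into a spawn key: hash the UTF-8 name with SHA-256 and split the digest into 32-bit words. Any collision-resistant mapping of this kind gives the two properties the docstring promises, namely byte-identical draws for a repeated (name, root_seed) pair and distinct streams for distinct names. The name "partner.zoo" is also an invented example.

```python
import hashlib

import numpy as np


def name_to_spawn_key(name: str) -> tuple[int, ...]:
    """Hypothetical stand-in for _name_to_spawn_key: SHA-256 digest as 32-bit words."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return tuple(int.from_bytes(digest[i : i + 4], "little") for i in range(0, 32, 4))


def derive_seed_sequence(name: str, *, root_seed: int = 0) -> np.random.SeedSequence:
    """Sketch of derive_substream's core, returning the bare SeedSequence."""
    return np.random.SeedSequence(entropy=root_seed, spawn_key=name_to_spawn_key(name))


def draws(name: str, root_seed: int = 0) -> np.ndarray:
    return np.random.default_rng(derive_seed_sequence(name, root_seed=root_seed)).random(4)


assert np.array_equal(draws("comm.fixed", 7), draws("comm.fixed", 7))       # same pair, same draws
assert not np.array_equal(draws("comm.fixed", 7), draws("partner.zoo", 7))  # other name
assert not np.array_equal(draws("comm.fixed", 7), draws("comm.fixed", 8))   # other root seed
```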

Structured logging for ego-AHT training runs (ADR-002 §Decisions; plan/05 §2).

Every line emitted during a training run carries the RunContext fields (run_id, seed, git_sha, pyproject_hash) plus a per-event step so any line in any artefact can be traced back to the exact code + config + RNG state that produced it. Phase-0 ships two sinks:

  • A local JSONL file (logs/<run_id>.jsonl) — the offline-replayable fallback that does not depend on W&B being reachable. Every JSONL line is a self-contained valid JSON object.
  • An opt-in :class:wandb.sdk.wandb_run.Run sink — the caller passes a pre-constructed run object as the wandb_sink kwarg of :func:bind_run_logger. Phase-0 uses W&B's offline mode in tests so the sink can be exercised without a network round-trip. The four :class:RunContext provenance fields are written to W&B's config once at bind time (via :meth:WandbSink.set_config) rather than to every metric event, so they appear as run-level metadata rather than scalar metrics.

Determinism: :func:compute_run_metadata derives run_id from (seed, git_sha, pyproject_hash, run_kind) so two reruns with identical code + config + seed produce the same run_id (P6 reproducibility). The pyproject_hash is the SHA-256 of the project's pyproject.toml so a silent dependency drift produces a different run_id.
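This page does not show the exact serialisation that compute_run_metadata hashes. The hypothetical sketch below illustrates only the idea. It computes pyproject_hash as the SHA-256 hex digest of pyproject.toml's bytes, then hashes a canonical JSON encoding of (seed, git_sha, pyproject_hash, run_kind) and keeps the first 16 hex characters as run_id. Using SHA-256 for run_id, the JSON encoding, and the helper names are all assumptions. Only the four inputs, SHA-256 for pyproject_hash, and the 16-hex-char width come from the text.

```python
import hashlib
import json


def pyproject_hash(pyproject_bytes: bytes) -> str:
    """SHA-256 hex digest of pyproject.toml's raw bytes."""
    return hashlib.sha256(pyproject_bytes).hexdigest()


def sketch_run_id(seed: int, git_sha: str, pyproject_digest: str, run_kind: str) -> str:
    """Hypothetical run_id: first 16 hex chars of SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {"seed": seed, "git_sha": git_sha, "pyproject_hash": pyproject_digest, "run_kind": run_kind},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


digest = pyproject_hash(b'[project]\nname = "demo"\n')
a = sketch_run_id(0, "abc123", digest, "empirical_guarantee")
b = sketch_run_id(0, "abc123", digest, "empirical_guarantee")
c = sketch_run_id(0, "abc123", digest, "zoo_seed")
d = sketch_run_id(0, "abc123", pyproject_hash(b'[project]\nname = "demo2"\n'), "empirical_guarantee")
assert a == b and len(a) == 16  # identical inputs reproduce the same id
assert a != c and a != d        # a different run_kind or dependency drift changes it
```

Sorting keys and fixing separators makes the encoding canonical, so dict ordering cannot leak into the id.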

RunContext dataclass

Per-run provenance bundle bound to every log line (ADR-002 §Decisions; plan/05 §2; P6).

Attributes:

  • run_id (str) –

    Stable 16-hex-char hash derived from seed, git_sha, pyproject_hash, and run_kind via :func:compute_run_metadata. Used as the JSONL filename and as the W&B run name so artefacts collated by run_id from different sinks describe the same run.

  • seed (int) –

    Project-wide root seed; ego-AHT runs derive every RNG substream from this via :func:concerto.training.seeding.derive_substream.

  • git_sha (str) –

    Full SHA of the currently-checked-out commit, or :data:GIT_SHA_UNKNOWN when the working tree is not a git repository or the git binary is missing.

  • pyproject_hash (str) –

    SHA-256 of pyproject.toml (full hex digest). A silent dependency drift produces a different hash and therefore a different run_id.

  • run_kind (str) –

    Free-form label describing what this run is for (e.g. "empirical_guarantee", "zoo_seed"). Folded into the run_id hash so two distinct kinds at the same seed do not collide.

  • extra (dict[str, str]) –

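    Optional free-form string-string metadata attached to every log line (e.g. task name, partner class). Read-only by convention; defaults to empty dict.

  • safety_telemetry (dict[str, object] | None) –

    Optional safety-stack telemetry summary (P1.04.5; ADR-007 §Stage 1b). None for pre-P1.04.5 callers; populated by :func:concerto.training.ego_aht.train at end-of-cell when safety_filter was wired, with the finalise() output of :class:concerto.training.safety_telemetry.SafetyAggregator. The audit-gate hook reads the safety_telemetry_final JSONL event instead; this field serves in-process callers that need the summary without re-parsing the JSONL.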

Source code in src/concerto/training/logging.py
@dataclass(frozen=True)
class RunContext:
    """Per-run provenance bundle bound to every log line (ADR-002 §Decisions; plan/05 §2; P6).

    Attributes:
        run_id: Stable 16-hex-char hash derived from the other four fields
            via :func:`compute_run_metadata`. Used as the JSONL filename
            and as the W&B run name so artefacts collated by ``run_id``
            from different sinks describe the same run.
        seed: Project-wide root seed; ego-AHT runs derive every RNG
            substream from this via :func:`concerto.training.seeding.derive_substream`.
        git_sha: Full SHA of the currently-checked-out commit, or
            :data:`GIT_SHA_UNKNOWN` when the working tree is not a git
            repository or the ``git`` binary is missing.
        pyproject_hash: SHA-256 of ``pyproject.toml`` (full hex digest).
            A silent dependency drift produces a different hash and
            therefore a different ``run_id``.
        run_kind: Free-form label describing what this run is for (e.g.
            ``"empirical_guarantee"``, ``"zoo_seed"``). Folded into the
            ``run_id`` hash so two distinct kinds at the same seed do not
            collide.
        extra: Optional free-form string-string metadata attached to every
            log line (e.g. task name, partner class). Read-only by
            convention; defaults to empty dict.
    """

    run_id: str
    seed: int
    git_sha: str
    pyproject_hash: str
    run_kind: str
    extra: dict[str, str] = field(default_factory=dict)
    #: Optional safety-stack telemetry summary (P1.04.5; ADR-007 §Stage 1b).
    #:
    #: ``None`` for pre-P1.04.5 callers; populated by
    #: :func:`concerto.training.ego_aht.train` at end-of-cell when
    #: ``safety_filter`` was wired (the
    #: :class:`concerto.training.safety_telemetry.SafetyAggregator`'s
    #: ``finalise()`` output). Forward-additive widening — existing
    #: JSONL parsers ignore the field; the new audit-gate hook reads
    #: the ``safety_telemetry_final`` JSONL event directly (more
    #: discoverable than reading the in-memory ``RunContext`` post-
    #: training). The ``RunContext`` carrying the field is the
    #: source-of-truth for in-process callers (e.g.
    #: :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`)
    #: that need the summary without re-parsing the JSONL.
    safety_telemetry: dict[str, object] | None = None

WandbSink

Bases: Protocol

Minimal subset of :class:wandb.sdk.wandb_run.Run we depend on (ADR-002 §Decisions).

Plan/05 §2: W&B is opt-in. The Protocol lets tests inject an in-memory fake without requiring the real wandb package to be importable.

Source code in src/concerto/training/logging.py
class WandbSink(Protocol):
    """Minimal subset of :class:`wandb.sdk.wandb_run.Run` we depend on (ADR-002 §Decisions).

    Plan/05 §2: W&B is opt-in. The Protocol lets tests inject an in-memory
    fake without requiring the real ``wandb`` package to be importable.
    """

    def log(self, data: Mapping[str, object], *, step: int | None = None) -> None:
        """Emit one structured event (W&B-side surface; ADR-002 §Decisions)."""
        ...  # pragma: no cover

    def set_config(self, config: Mapping[str, object]) -> None:
        """Store run-level metadata once at bind time (ADR-002 §Decisions; plan/05 §2).

        Called by :func:`bind_run_logger` to pin the four
        :class:`RunContext` provenance fields (``run_id``, ``seed``,
        ``git_sha``, ``pyproject_hash``) on the W&B run as metadata rather
        than as per-step metrics.
        """
        ...  # pragma: no cover
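Because WandbSink is a typing.Protocol, any object with matching log and set_config methods satisfies it structurally, which is what lets tests inject a fake. A minimal sketch of such a fake follows. WandbSinkLike, InMemoryWandbSink, and use_sink are hypothetical names, and the Protocol is re-declared locally so the snippet runs without concerto or wandb installed.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Protocol


class WandbSinkLike(Protocol):
    """Local stand-in mirroring the two methods of WandbSink shown above."""

    def log(self, data: Mapping[str, object], *, step: int | None = None) -> None: ...

    def set_config(self, config: Mapping[str, object]) -> None: ...


@dataclass
class InMemoryWandbSink:
    """Hypothetical fake: records every call instead of talking to W&B."""

    config: dict[str, object] = field(default_factory=dict)
    events: list[tuple[dict[str, object], int | None]] = field(default_factory=list)

    def log(self, data: Mapping[str, object], *, step: int | None = None) -> None:
        # Copy the mapping so later mutation by the caller cannot rewrite history.
        self.events.append((dict(data), step))

    def set_config(self, config: Mapping[str, object]) -> None:
        self.config.update(config)


def use_sink(sink: WandbSinkLike) -> None:
    # A static type checker accepts InMemoryWandbSink here because its
    # log/set_config signatures match; no inheritance is required.
    sink.set_config({"run_id": "0123456789abcdef", "seed": 0})
    sink.log({"ego_reward": 1.5}, step=1)


fake = InMemoryWandbSink()
use_sink(fake)
```

The Protocol as shown is not decorated with @runtime_checkable, so conformance is checked by a static type checker rather than by isinstance at runtime.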

log

log(
    data: Mapping[str, object], *, step: int | None = None
) -> None

Emit one structured event (W&B-side surface; ADR-002 §Decisions).


set_config

set_config(config: Mapping[str, object]) -> None

Store run-level metadata once at bind time (ADR-002 §Decisions; plan/05 §2).

Called by :func:bind_run_logger to pin the four :class:RunContext provenance fields (run_id, seed, git_sha, pyproject_hash) on the W&B run as metadata rather than as per-step metrics.


bind_run_logger

bind_run_logger(
    ctx: RunContext,
    *,
    jsonl_path: Path,
    wandb_sink: WandbSink | None = None,
) -> structlog.BoundLogger

Build a structlog logger that emits JSONL + optionally W&B (ADR-002 §Decisions; plan/05 §2).

The returned logger has every :class:RunContext field bound to it so every emitted line carries them automatically. Callers add per-event fields (e.g. step, ego_reward) via logger.info(...).

The function opens jsonl_path itself (mode "w", so any existing file at that path is truncated) and does NOT close it: the handle stays open for the lifetime of the returned logger, and no reference to it is returned to the caller.

Parameters:

  • ctx (RunContext) –

    Run-level provenance bundle.

  • jsonl_path (Path) –

    Filesystem path where the JSONL fallback is written. Parent directory is created if missing.

  • wandb_sink (WandbSink | None, default: None ) –

    Optional :class:WandbSink (production: a wandb.run; tests: an in-memory fake). When None, only JSONL is emitted.

Returns:

  • BoundLogger

    A :class:structlog.BoundLogger wrapping the JSONL (and optionally W&B) sinks. The logger is not module-global; callers explicitly pass it through their training loop so unit tests can substitute in-memory sinks freely.

Source code in src/concerto/training/logging.py
def bind_run_logger(
    ctx: RunContext,
    *,
    jsonl_path: Path,
    wandb_sink: WandbSink | None = None,
) -> structlog.BoundLogger:
    """Build a structlog logger that emits JSONL + optionally W&B (ADR-002 §Decisions; plan/05 §2).

    The returned logger has every :class:`RunContext` field bound to it so
    every emitted line carries them automatically. Callers add per-event
    fields (e.g. ``step``, ``ego_reward``) via ``logger.info(...)``.

    The function does NOT close the JSONL file; the caller owns its
    lifecycle (typically a ``with open(...) as fh:`` block held open for
    the duration of the run).

    Args:
        ctx: Run-level provenance bundle.
        jsonl_path: Filesystem path where the JSONL fallback is written.
            Parent directory is created if missing.
        wandb_sink: Optional :class:`WandbSink` (production: a ``wandb.run``;
            tests: an in-memory fake). When ``None``, only JSONL is emitted.

    Returns:
        A :class:`structlog.BoundLogger` wrapping the JSONL (and optionally
        W&B) sinks. The logger is not module-global; callers explicitly
        pass it through their training loop so unit tests can substitute
        in-memory sinks freely.
    """
    jsonl_path.parent.mkdir(parents=True, exist_ok=True)
    fh = jsonl_path.open("w", encoding="utf-8")
    if wandb_sink is not None:
        # Pin run-level provenance once on the W&B run's config so the four
        # context fields don't appear as per-step metrics in charts.
        wandb_sink.set_config(
            {
                "run_id": ctx.run_id,
                "seed": ctx.seed,
                "git_sha": ctx.git_sha,
                "pyproject_hash": ctx.pyproject_hash,
                "run_kind": ctx.run_kind,
                **ctx.extra,
            }
        )
    processors: list[Any] = [structlog.processors.add_log_level]
    if wandb_sink is not None:
        processors.append(_WandbProcessor(wandb_sink))
    processors.append(_JSONLSink(fh))
    base = structlog.wrap_logger(
        logger=None,  # type: ignore[arg-type]  # structlog uses a no-op base when None.
        processors=processors,
    )
    return base.bind(
        run_id=ctx.run_id,
        seed=ctx.seed,
        git_sha=ctx.git_sha,
        pyproject_hash=ctx.pyproject_hash,
        run_kind=ctx.run_kind,
        **ctx.extra,
    )
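Since every emitted line carries the bound RunContext fields, a run's JSONL file can be filtered offline, for example to pull out the safety_telemetry_final event that RunContext.safety_telemetry refers to. A minimal sketch, assuming each line is one JSON object using structlog's default "event" key; the exact _JSONLSink line format is not shown on this page, and iter_events is a hypothetical helper.

```python
from __future__ import annotations

import json
import tempfile
from collections.abc import Iterator
from pathlib import Path


def iter_events(jsonl_path: Path, *, event: str | None = None) -> Iterator[dict[str, object]]:
    """Yield decoded JSONL records, optionally keeping only one event name.

    Assumes one JSON object per line with structlog's default "event" key.
    """
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if event is None or record.get("event") == event:
                yield record


# Demo on a hand-written file that imitates the assumed line format.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "run.jsonl"
    lines = [
        {"event": "step", "level": "info", "run_id": "0123456789abcdef", "step": 1},
        {"event": "safety_telemetry_final", "level": "info", "run_id": "0123456789abcdef"},
    ]
    demo.write_text("".join(json.dumps(r) + "\n" for r in lines), encoding="utf-8")
    all_records = list(iter_events(demo))
    finals = list(iter_events(demo, event="safety_telemetry_final"))
```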

compute_run_metadata

compute_run_metadata(
    *,
    seed: int,
    run_kind: str,
    repo_root: Path,
    extra: Mapping[str, str] | None = None,
) -> RunContext

Build :class:RunContext with auto-detected git/pyproject hashes (ADR-002; plan/05 §2).

The run_id is the first 16 hex chars of the SHA-256 of (seed, git_sha, pyproject_hash, run_kind), matching the 16-hex-char convention used for PartnerSpec.partner_id (plan/04 §3.1) so artefact filenames are uniform across the project.

Parameters:

  • seed (int) –

    Root seed; passed to :func:concerto.training.seeding.derive_substream upstream.

  • run_kind (str) –

    Free-form label (e.g. "empirical_guarantee").

  • repo_root (Path) –

    Working-tree directory to query for git_sha and pyproject.toml.

  • extra (Mapping[str, str] | None, default: None ) –

    Optional string-string metadata attached to every line.

Returns:

  • RunContext

    A frozen :class:RunContext ready to bind via :func:bind_run_logger.

Source code in src/concerto/training/logging.py
def compute_run_metadata(
    *,
    seed: int,
    run_kind: str,
    repo_root: Path,
    extra: Mapping[str, str] | None = None,
) -> RunContext:
    """Build :class:`RunContext` with auto-detected git/pyproject hashes (ADR-002; plan/05 §2).

    The ``run_id`` is the first 16 hex chars of the SHA-256 of
    ``(seed, git_sha, pyproject_hash, run_kind)``, matching the
    16-hex-char convention used for ``PartnerSpec.partner_id`` (plan/04
    §3.1) so artefact filenames are uniform across the project.

    Args:
        seed: Root seed; passed to
            :func:`concerto.training.seeding.derive_substream` upstream.
        run_kind: Free-form label (e.g. ``"empirical_guarantee"``).
        repo_root: Working-tree directory to query for ``git_sha`` and
            ``pyproject.toml``.
        extra: Optional string-string metadata attached to every line.

    Returns:
        A frozen :class:`RunContext` ready to bind via :func:`bind_run_logger`.
    """
    git_sha = _detect_git_sha(repo_root)
    pyproject_hash = _compute_pyproject_hash(repo_root / "pyproject.toml")
    material = repr((seed, git_sha, pyproject_hash, run_kind))
    run_id = hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
    return RunContext(
        run_id=run_id,
        seed=seed,
        git_sha=git_sha,
        pyproject_hash=pyproject_hash,
        run_kind=run_kind,
        extra=dict(extra or {}),
    )
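The run_id derivation above uses only the standard library, so it can be recomputed by hand when auditing a JSONL line or a checkpoint sidecar. The sketch below repeats the same three steps (repr of the tuple, SHA-256, first 16 hex chars) under the hypothetical name derive_run_id.

```python
import hashlib


def derive_run_id(seed: int, git_sha: str, pyproject_hash: str, run_kind: str) -> str:
    """Recompute a run_id the same way compute_run_metadata does above."""
    # repr() of the tuple is the hashed material, so types matter:
    # seed=7 and seed="7" produce different digests.
    material = repr((seed, git_sha, pyproject_hash, run_kind))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


run_a = derive_run_id(7, "unknown", "ab" * 32, "empirical_guarantee")
run_b = derive_run_id(7, "unknown", "ab" * 32, "empirical_guarantee")
run_c = derive_run_id(8, "unknown", "ab" * 32, "empirical_guarantee")
```

Note that extra is not part of the hashed tuple, so two runs that differ only in extra share a run_id.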

Checkpoint save/load with SHA-256 integrity (T4b.12; ADR-002 §Decisions).

The training stack writes torch state-dicts as .pt files referenced from M4a's :class:~chamber.partners.api.PartnerSpec weights_uri field (plan/04 §3.8). This module owns the wire format:

  • A local://artifacts/<name>.pt URI resolves to <artifacts_root>/<name>.pt.
  • Each .pt payload is paired with a <name>.pt.json metadata sidecar carrying :class:CheckpointMetadata (run_id, seed, step, git_sha, pyproject_hash, sha256). The sidecar is the authoritative provenance record; the .pt file is the opaque tensor payload.
  • :func:load_checkpoint re-hashes the loaded .pt payload and refuses to return tensors when the digest disagrees with the sidecar (ADR-002 §Decisions; P6 reproducibility).

Deploy order: payload first, sidecar second. A half-finished deploy (payload landed but sidecar not yet) hits the "missing sidecar" failure path and refuses to load tensors. The reverse order (sidecar first, stale payload) would silently load tensors with the wrong provenance — do not invert.

ADR-009 §Consequences "draft-zoo scoping": the M4a Phase-0 draft-zoo specs reference local://artifacts/mappo_seed42_step100k.pt and local://artifacts/happo_seed7_step50k.pt, but the concrete frozen-RL adapters (T4.5 / T4.6) load them only in M4 Phase 3. Until then the SHA-verified loader fails loudly, by design, when the sidecar is missing.

CheckpointError

Bases: RuntimeError

Loud-fail signal for checkpoint I/O failures (ADR-002 §Decisions).

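Raised on:

  • URI scheme mismatch (only local:// is supported in Phase 0).
  • Missing .pt payload or missing sidecar JSON.
  • A sidecar with missing required fields, or with seed/step values that do not coerce to int (see :meth:CheckpointMetadata.from_dict).
  • SHA-256 mismatch between the stored sidecar digest and the re-computed payload digest on load.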

Subclasses :class:RuntimeError so a careless caller catching Exception still sees the failure but can disambiguate via isinstance(exc, CheckpointError).

Source code in src/concerto/training/checkpoints.py
class CheckpointError(RuntimeError):
    """Loud-fail signal for checkpoint I/O failures (ADR-002 §Decisions).

    Raised on:
    - URI scheme mismatch (only ``local://`` is supported in Phase 0).
    - Missing ``.pt`` payload or missing sidecar JSON.
    - SHA-256 mismatch between the stored sidecar digest and the
      re-computed payload digest on load.

    Subclasses :class:`RuntimeError` so a careless caller catching
    ``Exception`` still sees the failure but can disambiguate via
    ``isinstance(exc, CheckpointError)``.
    """

CheckpointMetadata dataclass

Provenance + integrity sidecar for one checkpoint (T4b.12; plan/05 §5).

Mirrors the :class:~concerto.training.logging.RunContext provenance bundle (run_id / seed / git_sha / pyproject_hash) plus the per-checkpoint step and the integrity-protecting sha256 of the .pt payload. All fields are required so a sidecar with missing keys raises :class:CheckpointError on load (loud-fail per ADR-002).

Attributes:

  • run_id (str) –

    16-hex hash of the run that produced this checkpoint. Cross-references the run's JSONL log file via :class:concerto.training.logging.RunContext.

  • seed (int) –

    Root seed of the producing run; same value passed to :func:concerto.training.seeding.derive_substream.

  • step (int) –

    Training step at which this checkpoint was taken (e.g. 50_000 for the canonical "50%-reward" tier in ADR-009 §Decision).

  • git_sha (str) –

    Full SHA of the commit the producing run was launched from. "unknown" is permitted (sentinel from :data:concerto.training.logging.GIT_SHA_UNKNOWN).

  • pyproject_hash (str) –

    SHA-256 of pyproject.toml at run launch. Detects silent dependency drift between save and load.

  • sha256 (str) –

    SHA-256 of the raw .pt payload bytes, computed at save time and verified on every load. Mismatch → :class:CheckpointError.

Source code in src/concerto/training/checkpoints.py
@dataclass(frozen=True)
class CheckpointMetadata:
    """Provenance + integrity sidecar for one checkpoint (T4b.12; plan/05 §5).

    Mirrors the :class:`~concerto.training.logging.RunContext` provenance
    bundle (run_id / seed / git_sha / pyproject_hash) plus the per-
    checkpoint ``step`` and the integrity-protecting ``sha256`` of the
    ``.pt`` payload. All fields are required so a sidecar with missing
    keys raises :class:`CheckpointError` on load (loud-fail per ADR-002).

    Attributes:
        run_id: 16-hex hash of the run that produced this checkpoint.
            Cross-references the run's JSONL log file via
            :class:`concerto.training.logging.RunContext`.
        seed: Root seed of the producing run; same value passed to
            :func:`concerto.training.seeding.derive_substream`.
        step: Training step at which this checkpoint was taken (e.g.
            50_000 for the canonical "50%-reward" tier in
            ADR-009 §Decision).
        git_sha: Full SHA of the commit the producing run was launched
            from. ``"unknown"`` is permitted (sentinel from
            :class:`concerto.training.logging.GIT_SHA_UNKNOWN`).
        pyproject_hash: SHA-256 of ``pyproject.toml`` at run launch.
            Detects silent dependency drift between save and load.
        sha256: SHA-256 of the raw ``.pt`` payload bytes, computed at
            save time and verified on every load. Mismatch →
            :class:`CheckpointError`.
    """

    run_id: str
    seed: int
    step: int
    git_sha: str
    pyproject_hash: str
    sha256: str

    @classmethod
    def from_dict(cls, raw: Mapping[str, object]) -> CheckpointMetadata:
        """Build a :class:`CheckpointMetadata` from a JSON-decoded mapping (ADR-002 §Decisions).

        Refuses partial sidecars: every field declared on the dataclass
        must be present, and types are coerced (``int``/``str``) without
        loss. Missing fields raise :class:`CheckpointError` so a half-
        written sidecar is caught at load time, not at first inference.

        Args:
            raw: A ``json.load(open(sidecar))``-style dict.

        Returns:
            A frozen :class:`CheckpointMetadata` instance.

        Raises:
            CheckpointError: If any required field is missing.
        """
        required = ("run_id", "seed", "step", "git_sha", "pyproject_hash", "sha256")
        missing = [k for k in required if k not in raw]
        if missing:
            raise CheckpointError(
                f"Checkpoint sidecar is missing required field(s): {missing}; "
                "expected the full provenance bundle (ADR-002 §Decisions)."
            )
        try:
            return cls(
                run_id=str(raw["run_id"]),
                seed=int(raw["seed"]),  # type: ignore[arg-type]
                step=int(raw["step"]),  # type: ignore[arg-type]
                git_sha=str(raw["git_sha"]),
                pyproject_hash=str(raw["pyproject_hash"]),
                sha256=str(raw["sha256"]),
            )
        except (TypeError, ValueError) as exc:
            # Hand-edited sidecar with malformed seed/step (e.g. "50000.0",
            # "abc") would otherwise raise an unwrapped ValueError; wrap so
            # callers catching CheckpointError see all sidecar-failure modes
            # uniformly (ADR-002 §Decisions).
            raise CheckpointError(
                f"Checkpoint sidecar contains malformed integer field(s): {exc}; "
                "expected ``seed`` and ``step`` to coerce to int (ADR-002 §Decisions)."
            ) from exc
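CheckpointMetadata.from_dict applies two checks in order: every required key must be present, then seed and step must survive int(). The hypothetical sidecar_problems helper below replays those checks on plain dicts (no concerto import), which can help when linting hand-edited sidecars before a load.

```python
from __future__ import annotations

import json

REQUIRED = ("run_id", "seed", "step", "git_sha", "pyproject_hash", "sha256")


def sidecar_problems(raw: dict[str, object]) -> list[str]:
    """Return the reasons from_dict would reject `raw` (empty list = accepted).

    Mirrors the checks shown above: missing keys first, then int coercion.
    """
    missing = [k for k in REQUIRED if k not in raw]
    if missing:
        return [f"missing: {missing}"]
    problems = []
    for key in ("seed", "step"):
        try:
            int(raw[key])  # the same coercion from_dict applies
        except (TypeError, ValueError):
            problems.append(f"malformed {key}: {raw[key]!r}")
    return problems


good = {
    "run_id": "0123456789abcdef",
    "seed": 7,
    "step": 50_000,
    "git_sha": "unknown",
    "pyproject_hash": "0" * 64,
    "sha256": "f" * 64,
}
# save_checkpoint writes sidecars with sort_keys=True, indent=2.
round_tripped = json.loads(json.dumps(good, sort_keys=True, indent=2))
half_written = {k: v for k, v in good.items() if k != "sha256"}
hand_edited = {**good, "step": "50000.0"}  # int("50000.0") raises ValueError
```

One consequence of coercing with int(): a JSON number such as 50000.7 is accepted and silently truncated to 50000, whereas the string "50000.0" is rejected.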

from_dict classmethod

from_dict(raw: Mapping[str, object]) -> CheckpointMetadata

Build a :class:CheckpointMetadata from a JSON-decoded mapping (ADR-002 §Decisions).

Refuses partial sidecars: every field declared on the dataclass must be present, and values are coerced to int (seed, step) or str (the rest). Missing fields, or seed/step values that fail int coercion, raise :class:CheckpointError so a half-written or hand-mangled sidecar is caught at load time, not at first inference.

Parameters:

  • raw (Mapping[str, object]) –

    A json.load(open(sidecar))-style dict.

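Returns:

  • CheckpointMetadata

    A frozen :class:CheckpointMetadata instance.

Raises:

  • CheckpointError –

    If any required field is missing, or if seed or step cannot be coerced to int.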


load_checkpoint

load_checkpoint(
    *, uri: str, artifacts_root: Path
) -> tuple[dict[str, torch.Tensor], CheckpointMetadata]

Load a torch state-dict + verify SHA-256 (T4b.12; ADR-002 §Decisions; P6).

Refuses to return tensors when:

  • The URI does not use the local:// scheme (raised by :func:resolve_uri).
  • The .pt payload is missing.
  • The sidecar JSON is missing.
  • The sidecar is missing required fields or has malformed seed/step values (per :meth:CheckpointMetadata.from_dict).
  • The sidecar's stored sha256 disagrees with the recomputed digest of the on-disk payload.

Every one of these failure modes raises :class:CheckpointError, so a partial deployment (e.g. payload deployed but sidecar not yet) cannot silently load tensors that the project has no provenance for.

Parameters:

  • uri (str) –

    Source URI (e.g. "local://artifacts/happo_seed7_step50k.pt").

  • artifacts_root (Path) –

    Filesystem directory the URI resolves against.

Returns:

  • tuple[dict[str, Tensor], CheckpointMetadata]

    A tuple (state_dict, metadata) where state_dict is the verbatim mapping loaded from the .pt payload and metadata is the verified :class:CheckpointMetadata.

Raises:

  • CheckpointError –

    On any of the loud-fail conditions above.

Source code in src/concerto/training/checkpoints.py
def load_checkpoint(
    *,
    uri: str,
    artifacts_root: Path,
) -> tuple[dict[str, torch.Tensor], CheckpointMetadata]:
    """Load a torch state-dict + verify SHA-256 (T4b.12; ADR-002 §Decisions; P6).

    Refuses to return tensors when:
    - The ``.pt`` payload is missing.
    - The sidecar JSON is missing.
    - The sidecar is missing required fields (per
      :meth:`CheckpointMetadata.from_dict`).
    - The sidecar's stored ``sha256`` disagrees with the recomputed
      digest of the on-disk payload.

    All four failure modes raise :class:`CheckpointError` so a partial
    deployment (e.g. payload deployed but sidecar not yet) cannot
    silently load tensors that the project has no provenance for.

    Args:
        uri: Source URI (e.g.
            ``"local://artifacts/happo_seed7_step50k.pt"``).
        artifacts_root: Filesystem directory the URI resolves against.

    Returns:
        A tuple ``(state_dict, metadata)`` where ``state_dict`` is the
        verbatim mapping loaded from the ``.pt`` payload and
        ``metadata`` is the verified :class:`CheckpointMetadata`.

    Raises:
        CheckpointError: On any of the four loud-fail conditions above.
    """
    payload_path = resolve_uri(uri, artifacts_root=artifacts_root)
    if not payload_path.exists():
        raise CheckpointError(
            f"Checkpoint payload not found at {payload_path} (resolved from {uri!r})."
        )
    sidecar = _sidecar_path(payload_path)
    if not sidecar.exists():
        raise CheckpointError(
            f"Checkpoint sidecar not found at {sidecar}; refusing to load tensors "
            "without a provenance record (ADR-002 §Decisions)."
        )
    metadata = CheckpointMetadata.from_dict(json.loads(sidecar.read_text(encoding="utf-8")))
    actual_digest = _sha256_of(payload_path)
    if actual_digest != metadata.sha256:
        raise CheckpointError(
            f"Checkpoint integrity check failed for {payload_path}: "
            f"sidecar sha256={metadata.sha256!r} but on-disk sha256={actual_digest!r}. "
            "The payload may have been corrupted, partially overwritten, or "
            "swapped without updating the sidecar (ADR-002 §Decisions; P6)."
        )
    # weights_only=True restricts the unpickler to safe tensor primitives,
    # so we do not execute arbitrary code from a tampered .pt file. Pinned
    # explicitly (rather than relying on the default) because the project
    # supports torch>=2.3, and the default flipped to True only in 2.6.
    state_dict = torch.load(
        payload_path,
        weights_only=True,
        map_location="cpu",
    )
    return state_dict, metadata

resolve_uri

resolve_uri(uri: str, *, artifacts_root: Path) -> Path

Resolve local://artifacts/<name>.pt URI to filesystem path (ADR-002; plan/04 §3.8).

Phase-0 supports the local:// scheme only. Non-local:// URIs raise :class:CheckpointError so Phase-1 distribution schemes (hf://, s3://) cannot silently fall through to the local filesystem before an ADR amendment is in place.

The path immediately under local:// is appended to artifacts_root; e.g. local://artifacts/foo.pt against artifacts_root=/data resolves to /data/artifacts/foo.pt. The leading artifacts/ segment in the URI is preserved by design — it makes the URI self-describing in logs and keeps the on-disk layout grep-able.

Parameters:

  • uri (str) –

    The weights_uri from a :class:~chamber.partners.api.PartnerSpec (plan/04 §3.8).

  • artifacts_root (Path) –

    Filesystem directory the URI is resolved against.

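Returns:

  • Path

    Filesystem path to the .pt payload (the file may not exist yet; :func:save_checkpoint creates it). The path is artifacts_root joined with the URI remainder, so it is absolute only when artifacts_root is absolute.

Raises:

  • CheckpointError –

    If uri does not start with :data:LOCAL_URI_SCHEME.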

Source code in src/concerto/training/checkpoints.py
def resolve_uri(uri: str, *, artifacts_root: Path) -> Path:
    """Resolve ``local://artifacts/<name>.pt`` URI to filesystem path (ADR-002; plan/04 §3.8).

    Phase-0 supports the ``local://`` scheme only. Non-``local://`` URIs
    raise :class:`CheckpointError` so Phase-1 distribution schemes
    (``hf://``, ``s3://``) cannot silently fall through to the local
    filesystem before an ADR amendment is in place.

    The path immediately under ``local://`` is appended to ``artifacts_root``;
    e.g. ``local://artifacts/foo.pt`` against ``artifacts_root=/data`` resolves
    to ``/data/artifacts/foo.pt``. The leading ``artifacts/`` segment in the
    URI is preserved by design — it makes the URI self-describing in logs
    and keeps the on-disk layout grep-able.

    Args:
        uri: The ``weights_uri`` from a
            :class:`~chamber.partners.api.PartnerSpec` (plan/04 §3.8).
        artifacts_root: Filesystem directory the URI is resolved against.

    Returns:
        Absolute filesystem path to the ``.pt`` payload (the file may
        not exist yet — :func:`save_checkpoint` creates it).

    Raises:
        CheckpointError: If ``uri`` does not start with
            :data:`LOCAL_URI_SCHEME`.
    """
    if not uri.startswith(LOCAL_URI_SCHEME):
        raise CheckpointError(
            f"Unsupported URI scheme: {uri!r}. Phase-0 supports only "
            f"{LOCAL_URI_SCHEME!r} (plan/04 §3.8)."
        )
    relative = uri[len(LOCAL_URI_SCHEME) :]
    return artifacts_root / relative
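The resolution rule above is plain string slicing plus a path join. A standalone sketch, assuming LOCAL_URI_SCHEME == "local://" (the constant's value is not shown on this page); resolve and UnsupportedScheme are hypothetical stand-ins for resolve_uri and CheckpointError.

```python
from pathlib import PurePosixPath

LOCAL_URI_SCHEME = "local://"  # assumed value of the module constant


class UnsupportedScheme(ValueError):
    """Stand-in for CheckpointError in this self-contained sketch."""


def resolve(uri: str, *, artifacts_root: PurePosixPath) -> PurePosixPath:
    if not uri.startswith(LOCAL_URI_SCHEME):
        raise UnsupportedScheme(uri)
    # Everything after the scheme, including the leading "artifacts/"
    # segment, is appended to the root unchanged; nothing is made absolute.
    return artifacts_root / uri[len(LOCAL_URI_SCHEME):]


resolved = resolve("local://artifacts/foo.pt", artifacts_root=PurePosixPath("/data"))
# Mirrors the EgoAHTConfig default artifacts_root=Path("./artifacts").
relative = resolve("local://artifacts/foo.pt", artifacts_root=PurePosixPath("./artifacts"))

try:
    resolve("hf://org/model.pt", artifacts_root=PurePosixPath("/data"))
except UnsupportedScheme as exc:
    rejected = str(exc)
```

Because the URI keeps its leading artifacts/ segment and no path is resolved, pairing this rule with the EgoAHTConfig default artifacts_root would place payloads under artifacts/artifacts/ relative to the working directory, as the second call shows.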

save_checkpoint

save_checkpoint(
    *,
    state_dict: Mapping[str, Tensor],
    uri: str,
    metadata: CheckpointMetadata,
    artifacts_root: Path,
) -> Path

Save a torch state-dict + sidecar (T4b.12; ADR-002 §Decisions).

The save is two-phase: write the .pt payload first, then compute its SHA-256 and write the sidecar JSON with that digest stitched in. The sha256 field on the input metadata is ignored and replaced with the digest computed at save time, so callers cannot accidentally ship a stale digest that would make :func:load_checkpoint fail.

Parameters:

  • state_dict (Mapping[str, Tensor]) –

    A Mapping[str, torch.Tensor] exactly as produced by model.state_dict(). Saved verbatim via :func:torch.save.

  • uri (str) –

    Destination URI (e.g. "local://artifacts/happo_seed7_step50k.pt").

  • metadata (CheckpointMetadata) –

    Provenance bundle. The sha256 field is recomputed at save time and overrides whatever the caller passed.

  • artifacts_root (Path) –

    Filesystem directory the URI resolves against.

Returns:

  • Path

    Filesystem path to the written .pt payload (absolute only when artifacts_root is absolute).

Raises:

  • CheckpointError –

    If uri is not a local:// URI.

Source code in src/concerto/training/checkpoints.py
def save_checkpoint(
    *,
    state_dict: Mapping[str, torch.Tensor],
    uri: str,
    metadata: CheckpointMetadata,
    artifacts_root: Path,
) -> Path:
    """Save a torch state-dict + sidecar (T4b.12; ADR-002 §Decisions).

    The save is two-phase: write the ``.pt`` payload first, then compute
    its SHA-256, write the sidecar JSON with the digest stitched in. The
    ``sha256`` field on the input ``metadata`` is overwritten with the
    digest computed at save time so callers cannot accidentally ship a
    stale digest that would crash :func:`load_checkpoint`.

    Args:
        state_dict: A ``Mapping[str, torch.Tensor]`` exactly as produced
            by ``model.state_dict()``. Saved verbatim via :func:`torch.save`.
        uri: Destination URI (e.g.
            ``"local://artifacts/happo_seed7_step50k.pt"``).
        metadata: Provenance bundle. The ``sha256`` field is recomputed
            at save time and overrides whatever the caller passed.
        artifacts_root: Filesystem directory the URI resolves against.

    Returns:
        Absolute filesystem path to the written ``.pt`` payload.

    Raises:
        CheckpointError: If ``uri`` is not a ``local://`` URI.
    """
    payload_path = resolve_uri(uri, artifacts_root=artifacts_root)
    payload_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(dict(state_dict), payload_path)
    digest = _sha256_of(payload_path)
    pinned = CheckpointMetadata(
        run_id=metadata.run_id,
        seed=metadata.seed,
        step=metadata.step,
        git_sha=metadata.git_sha,
        pyproject_hash=metadata.pyproject_hash,
        sha256=digest,
    )
    _sidecar_path(payload_path).write_text(
        json.dumps(asdict(pinned), sort_keys=True, indent=2),
        encoding="utf-8",
    )
    return payload_path
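The save/load contract (payload first, sidecar second, re-hash on every load) can be exercised without torch. In this sketch, save, load, sha256_of, and sidecar_for are hypothetical names; raw bytes stand in for torch.save/torch.load, RuntimeError stands in for CheckpointError, and the chunked hashing is an assumption since the body of _sha256_of is not shown.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large payloads are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_for(payload: Path) -> Path:
    # <name>.pt -> <name>.pt.json, the naming described for this module.
    return payload.with_name(payload.name + ".json")


def save(payload: Path, data: bytes, provenance: dict[str, object]) -> None:
    payload.parent.mkdir(parents=True, exist_ok=True)
    payload.write_bytes(data)  # phase 1: payload
    record = {**provenance, "sha256": sha256_of(payload)}  # digest always recomputed
    sidecar_for(payload).write_text(  # phase 2: sidecar
        json.dumps(record, sort_keys=True, indent=2), encoding="utf-8"
    )


def load(payload: Path) -> bytes:
    sidecar = sidecar_for(payload)
    if not payload.exists():
        raise RuntimeError(f"payload not found: {payload}")
    if not sidecar.exists():
        raise RuntimeError(f"sidecar not found: {sidecar}")
    expected = json.loads(sidecar.read_text(encoding="utf-8"))["sha256"]
    data = payload.read_bytes()
    # Hash the exact bytes that will be returned.
    if hashlib.sha256(data).hexdigest() != expected:
        raise RuntimeError("sha256 mismatch between sidecar and payload")
    return data


with tempfile.TemporaryDirectory() as tmp:
    pt = Path(tmp) / "artifacts" / "demo.pt"
    save(pt, b"weights-v1", {"run_id": "0123456789abcdef", "seed": 7, "step": 100})
    ok = load(pt)

    pt.write_bytes(b"weights-v2")  # swap payload without updating the sidecar
    try:
        load(pt)
        tamper_detected = False
    except RuntimeError:
        tamper_detected = True

    orphan = Path(tmp) / "artifacts" / "orphan.pt"
    orphan.write_bytes(b"no provenance")  # payload landed, sidecar not yet
    try:
        load(orphan)
        orphan_loaded = True
    except RuntimeError:
        orphan_loaded = False
```

One design difference: this load hashes the same bytes it returns, whereas load_checkpoint as shown hashes the file and then calls torch.load on the path, which reads it a second time.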

Hydra-driven config for ego-AHT training runs (T4b.11; ADR-002 §Decisions; plan/05 §2).

The training stack is configured via Hydra YAML files under configs/training/<algo>/<task>.yaml. This module ships the type-safe Pydantic v2 models that those YAMLs validate against: EgoAHTConfig is the root, with sub-models for the env / partner / W&B / hyperparameter blocks.

Why two layers (Hydra + Pydantic)?

  • Hydra owns the composition + CLI override surface (defaults, group selection, +key=value overrides).
  • Pydantic owns the validation + Python-side type-safety (Path coercion, range checks, frozen-by-default to keep call sites from mutating config mid-run).

:func:load_config is the canonical entry point: it locates the project's configs/ directory by walking up from the caller's CWD, composes the named config via Hydra, and validates it through :class:EgoAHTConfig.

ADR-002 §Decisions ("Hydra config root"): the YAML schema is part of the public reproducibility contract. Adding a required field is a breaking change to the config; deprecate-then-remove via a Pydantic :class:Field default and a DeprecationWarning rather than removing in place.
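:func:load_config is described as locating configs/ by walking up from the caller's CWD. A minimal sketch of that search with pathlib, assuming the nearest ancestor wins; find_configs_dir is a hypothetical name, and how load_config then hands the directory to Hydra is not shown here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_configs_dir(start: Path) -> Path:
    """Return the nearest `configs/` directory at or above `start`."""
    start = start.resolve()
    # Check start itself first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        configs = candidate / "configs"
        if configs.is_dir():
            return configs
    raise FileNotFoundError(f"no configs/ directory at or above {start}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "configs" / "training").mkdir(parents=True)
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found = find_configs_dir(nested)
    expected = root / "configs"
```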

EgoAHTConfig

Bases: _FrozenModel

Root config for an ego-AHT training run (T4b.11; ADR-002 §Decisions).

Validates the composed Hydra config and exposes a frozen Python-side handle for :func:concerto.training.ego_aht.train.

Attributes:

  • algo (str) –

    Trainer-class registry key. Phase-0: "ego_aht_happo"; "ego_aht_hatd3" is the Phase-1 stub.

  • seed (int) –

    Project root seed routed to :func:concerto.training.seeding.derive_substream. Two runs with the same seed produce byte-identical CPU reward curves (plan/05 §6 criterion 7).

  • total_frames (int) –

    Training budget in env frames. T4b.13 = 100_000.

  • checkpoint_every (int) –

    Save a .pt artefact every K frames.

  • artifacts_root (Path) –

    Directory the local://artifacts/... URIs resolve against (plan/04 §3.8).

  • log_dir (Path) –

    Directory where the JSONL logs are written.

  • wandb (WandbConfig) –

    :class:WandbConfig.

  • env (EnvConfig) –

    :class:EnvConfig.

  • partner (PartnerConfig) –

    :class:PartnerConfig.

  • happo (HAPPOHyperparams) –

    :class:HAPPOHyperparams.

  • runtime (RuntimeConfig) –

    :class:RuntimeConfig, the device + determinism knobs (ADR-002 §Decisions; plan/08 §4 GPU determinism caveat). Defaults to device="auto" + deterministic_torch=True so an unset YAML on a Mac CPU dev box just works; the production mpe_cooperative_push.yaml pins device: cpu and stage0_smoke.yaml pins device: cuda.

  • safety (SafetyConfig) –

    :class:SafetyConfig.

Source code in src/concerto/training/config.py
class EgoAHTConfig(_FrozenModel):
    """Root config for an ego-AHT training run (T4b.11; ADR-002 §Decisions).

    Validates the composed Hydra config and exposes a frozen
    Python-side handle for :func:`concerto.training.ego_aht.train`.

    Attributes:
        algo: Trainer-class registry key. Phase-0: ``"ego_aht_happo"``;
            ``"ego_aht_hatd3"`` is the Phase-1 stub.
        seed: Project root seed routed to
            :func:`concerto.training.seeding.derive_substream`. Two
            runs with the same ``seed`` produce byte-identical CPU
            reward curves (plan/05 §6 criterion 7).
        total_frames: Training budget in env frames. T4b.13 = 100_000.
        checkpoint_every: Save a ``.pt`` artefact every K frames.
        artifacts_root: Directory the ``local://artifacts/...`` URIs
            resolve against (plan/04 §3.8).
        log_dir: Directory where the JSONL logs are written.
        wandb: :class:`WandbConfig`.
        env: :class:`EnvConfig`.
        partner: :class:`PartnerConfig`.
        happo: :class:`HAPPOHyperparams`.
        runtime: :class:`RuntimeConfig` — device + determinism knobs
            (ADR-002 §Decisions; plan/08 §4 GPU determinism caveat).
            Defaults to ``device="auto"`` + ``deterministic_torch=True``
            so an unset YAML on a Mac CPU dev box just works; the
            production ``mpe_cooperative_push.yaml`` pins
            ``device: cpu`` and ``stage0_smoke.yaml`` pins
            ``device: cuda``.
    """

    algo: str = "ego_aht_happo"
    seed: int = Field(default=0, ge=0)
    total_frames: int = Field(default=100_000, gt=0)
    checkpoint_every: int = Field(default=10_000, gt=0)
    artifacts_root: Path = Path("./artifacts")
    log_dir: Path = Path("./logs")
    wandb: WandbConfig = Field(default_factory=WandbConfig)
    env: EnvConfig
    partner: PartnerConfig = Field(default_factory=PartnerConfig)
    happo: HAPPOHyperparams = Field(default_factory=HAPPOHyperparams)
    runtime: RuntimeConfig = Field(default_factory=RuntimeConfig)
    safety: SafetyConfig = Field(default_factory=SafetyConfig)
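How total_frames and checkpoint_every interact can be sketched in a few lines. This assumes a checkpoint is written each time the running frame count reaches a positive multiple of checkpoint_every; checkpoint_frames is a hypothetical helper, not part of concerto.training, and the real loop may count frames or treat the final partial interval differently.

```python
def checkpoint_frames(total_frames: int, checkpoint_every: int) -> list[int]:
    """Frame counts at which a .pt checkpoint would be written (hypothetical).

    Assumes a checkpoint fires whenever the running frame count hits a
    positive multiple of ``checkpoint_every``, never beyond ``total_frames``.
    """
    if total_frames <= 0 or checkpoint_every <= 0:
        # Mirrors the gt=0 constraints on both EgoAHTConfig fields.
        raise ValueError("total_frames and checkpoint_every must be > 0")
    return list(range(checkpoint_every, total_frames + 1, checkpoint_every))


# With the EgoAHTConfig defaults (100_000 frames, checkpoint every 10_000):
schedule = checkpoint_frames(100_000, 10_000)
print(len(schedule), schedule[0], schedule[-1])  # 10 10000 100000
```

Under this assumption a checkpoint_every that does not divide total_frames simply leaves the tail of the run without a checkpoint.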

EnvConfig

Bases: _FrozenModel

Env-side configuration (T4b.13; ADR-002 §Decisions; plan/05 §3.5).

Attributes:

  • task (str) –

    Registered task name. Phase-0 supports "mpe_cooperative_push" (T4b.13's empirical-guarantee env) and "stage0_smoke" (T4b.14's zoo-seed env, deferred to user-side GPU run). P1.04 adds "stage1_pickplace" (ADR-007 §Stage 1b real-env science-evaluation target).

  • episode_length (int) –

    Truncation horizon in env ticks. The empirical-guarantee experiment defaults to 50 (matching PettingZoo simple_spread).

  • agent_uids (tuple[str, str]) –

    Two-element tuple of the uids the env exposes (a YAML list is coerced to a tuple); the first is the ego, the second is the frozen partner. The two uids must be distinct.

  • condition_id (str | None) –

    Stage-1b pre-registered condition_id string (P1.04 / ADR-007 §Stage 1b). Required when task == "stage1_pickplace" because OM-homo and OM-hetero share the ("panda_wristcam", "fetch") agent_uids tuple, so condition_id is the explicit disambiguator. The YAML carries a default; :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory overrides it per (seed, condition) cell via model_copy. None for non-Stage-1b tasks (MPE, Stage-0), so pre-P1.04 configs work unchanged.

Source code in src/concerto/training/config.py
class EnvConfig(_FrozenModel):
    """Env-side configuration (T4b.13; ADR-002 §Decisions; plan/05 §3.5).

    Attributes:
        task: Registered task name. Phase-0 supports ``"mpe_cooperative_push"``
            (T4b.13's empirical-guarantee env) and ``"stage0_smoke"``
            (T4b.14's zoo-seed env, deferred to user-side GPU run).
            P1.04 adds ``"stage1_pickplace"`` (ADR-007 §Stage 1b real-env
            science-evaluation target).
        episode_length: Truncation horizon in env ticks. The empirical-
            guarantee experiment defaults to 50 (matches PettingZoo
            simple_spread).
        agent_uids: 2-element list of uids the env exposes; the first is
            the ego, the second is the frozen partner.
        condition_id: Stage-1b pre-registered condition_id string
            (P1.04 / ADR-007 §Stage 1b). Required when
            ``task == "stage1_pickplace"`` because OM-homo and
            OM-hetero share the ``("panda_wristcam", "fetch")``
            agent_uids tuple — the condition_id is the explicit
            disambiguator. The yaml carries a sensible default;
            :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            overrides per ``(seed, condition)`` cell via
            ``model_copy``. ``None`` for non-Stage-1b tasks (MPE,
            Stage-0) — pre-P1.04 cfgs work unchanged.
    """

    task: str
    episode_length: int = Field(default=50, gt=0)
    agent_uids: tuple[str, str] = ("ego", "partner")
    condition_id: str | None = None

    @field_validator("agent_uids", mode="before")
    @classmethod
    def _coerce_agent_uids(cls, v: object) -> object:
        # Hydra/OmegaConf parses YAML lists as ListConfig (subclass of list),
        # not tuple — coerce to keep the tuple-typed contract intact.
        if isinstance(v, list):
            return tuple(v)
        return v

    @field_validator("agent_uids")
    @classmethod
    def _check_distinct_uids(cls, v: tuple[str, str]) -> tuple[str, str]:
        if v[0] == v[1]:
            raise ValueError(f"agent_uids must be distinct; got {v!r}")
        return v
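The two agent_uids validators run in order: the mode="before" hook converts a list (which is what load_config hands over after OmegaConf.to_container) into a tuple, pydantic then checks the tuple[str, str] shape, and the after-validator rejects duplicate uids. A stdlib-only sketch of that pipeline, with a hand-written shape check standing in for pydantic; normalize_agent_uids is hypothetical and does not replace the model.

```python
def normalize_agent_uids(value: object) -> tuple[str, str]:
    """Hypothetical stand-in for EnvConfig's two agent_uids validators."""
    # Step 1, like the mode="before" validator: a YAML list becomes a tuple.
    if isinstance(value, list):
        value = tuple(value)
    # Step 2, done by pydantic in the real model: enforce tuple[str, str].
    if not (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(uid, str) for uid in value)
    ):
        raise ValueError(f"agent_uids must be a pair of strings; got {value!r}")
    # Step 3, like the after-validator: the ego and partner uids must differ.
    if value[0] == value[1]:
        raise ValueError(f"agent_uids must be distinct; got {value!r}")
    return value


print(normalize_agent_uids(["panda_wristcam", "fetch"]))  # ('panda_wristcam', 'fetch')
```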

HAPPOHyperparams

Bases: _FrozenModel

On-policy ego-AHT HAPPO hyperparameters (ADR-002 §Decisions; plan/05 §3.2).

Defaults match the published HARL Bi-DexHands-style values where they translate, and are tuned conservatively for the small MPE task so the empirical-guarantee experiment stays within its 30-minute CPU budget (plan/05 §8).

Attributes:

  • lr (float) –

    Adam learning rate for the actor + critic.

  • gamma (float) –

    Discount factor.

  • gae_lambda (float) –

    GAE-λ for advantage estimation.

  • clip_eps (float) –

    PPO clipping epsilon.

  • n_epochs (int) –

    Number of optimisation epochs per rollout.

  • rollout_length (int) –

    Frames collected per rollout before each update step.

  • batch_size (int) –

    SGD minibatch size; rollout_length must be an exact multiple of batch_size.

  • hidden_dim (int) –

    MLP hidden width for actor + critic.

Source code in src/concerto/training/config.py
class HAPPOHyperparams(_FrozenModel):
    """On-policy ego-AHT HAPPO hyperparameters (ADR-002 §Decisions; plan/05 §3.2).

    Defaults match the published HARL Bi-DexHands-style values where
    they translate; tuned conservatively for the small MPE task to keep
    the empirical-guarantee experiment within its 30-minute CPU budget
    (plan/05 §8).

    Attributes:
        lr: Adam learning rate for the actor + critic.
        gamma: Discount factor.
        gae_lambda: GAE-λ for advantage estimation.
        clip_eps: PPO clipping epsilon.
        n_epochs: Number of optimisation epochs per rollout.
        rollout_length: Frames collected per epoch before the update step.
        batch_size: SGD minibatch size; rollout_length must be a multiple.
        hidden_dim: MLP hidden width for actor + critic.
    """

    lr: float = Field(default=3.0e-4, gt=0.0)
    gamma: float = Field(default=0.99, ge=0.0, le=1.0)
    gae_lambda: float = Field(default=0.95, ge=0.0, le=1.0)
    clip_eps: float = Field(default=0.2, gt=0.0)
    n_epochs: int = Field(default=4, gt=0)
    rollout_length: int = Field(default=1024, gt=0)
    batch_size: int = Field(default=256, gt=0)
    hidden_dim: int = Field(default=64, gt=0)
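The batching fields combine into a simple update budget. The Field constraints in HAPPOHyperparams only require each value to be positive, so the multiple-of-batch_size requirement is checked explicitly here. The sketch assumes the standard PPO loop (collect rollout_length frames, then make n_epochs passes over them in minibatches of batch_size); happo_update_budget is hypothetical and the real trainer may schedule updates differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateBudget:
    minibatches_per_epoch: int
    grad_steps_per_update: int
    full_rollouts: int


def happo_update_budget(
    total_frames: int = 100_000,
    rollout_length: int = 1024,
    batch_size: int = 256,
    n_epochs: int = 4,
) -> UpdateBudget:
    """Back-of-envelope PPO-style update counts (hypothetical helper)."""
    if rollout_length % batch_size != 0:
        raise ValueError("rollout_length must be a multiple of batch_size")
    minibatches = rollout_length // batch_size
    return UpdateBudget(
        minibatches_per_epoch=minibatches,
        grad_steps_per_update=minibatches * n_epochs,
        full_rollouts=total_frames // rollout_length,
    )


print(happo_update_budget())
# UpdateBudget(minibatches_per_epoch=4, grad_steps_per_update=16, full_rollouts=97)
```

Under these assumptions, the defaults give 97 full rollouts of 16 gradient steps each; the last 672 frames of a 100_000-frame run do not fill a rollout.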

PartnerConfig

Bases: _FrozenModel

Frozen partner spec (ADR-009 §Decision; plan/05 §3.5).

Mirrors the M4a :class:chamber.partners.api.PartnerSpec fields so the training loop can build a PartnerSpec directly from the config without re-validating.

Attributes:

  • class_name (str) –

    Registry key passed to :func:chamber.partners.registry.load_partner. Phase-0 empirical-guarantee runs use "scripted_heuristic".

  • seed (int) –

    Per-partner training seed; for scripted partners use 0.

  • checkpoint_step (int | None) –

    Training step at which the checkpoint was taken; None for scripted partners.

  • weights_uri (str | None) –

    local://... URI for frozen-RL partners; None for scripted partners.

  • extra (dict[str, str]) –

    Free-form string-to-string metadata routed to PartnerSpec.extra.

Source code in src/concerto/training/config.py
class PartnerConfig(_FrozenModel):
    """Frozen partner spec (ADR-009 §Decision; plan/05 §3.5).

    Mirrors the M4a :class:`chamber.partners.api.PartnerSpec` fields so
    the training loop can build a ``PartnerSpec`` directly from the
    config without re-validating.

    Attributes:
        class_name: Registry key passed to
            :func:`chamber.partners.registry.load_partner`. Phase-0
            empirical-guarantee runs use ``"scripted_heuristic"``.
        seed: Per-partner training seed; for scripted partners use ``0``.
        checkpoint_step: Training step at which the checkpoint was
            taken; ``None`` for scripted partners.
        weights_uri: ``local://...`` URI for frozen-RL partners; ``None``
            for scripted partners.
        extra: Free-form string-string metadata routed to
            ``PartnerSpec.extra``.
    """

    class_name: str = "scripted_heuristic"
    seed: int = 0
    checkpoint_step: int | None = None
    weights_uri: str | None = None
    extra: dict[str, str] = Field(default_factory=dict)

RuntimeConfig

Bases: _FrozenModel

Device + perf knobs (ADR-002 §Decisions; plan/08 §4 GPU determinism caveat).

Phase-0 development runs on Mac CPU (Apple M-series). Production zoo-seed runs target Linux + CUDA via scripts/repro/zoo_seed.sh (M4b-9b). The two regimes have different determinism contracts:

  • CPU: byte-identical across runs at the same seed (plan/05 §6 criterion 7; pinned by tests/integration/test_cpu_determinism.py). deterministic_torch=True calls :func:torch.use_deterministic_algorithms once at trainer construction so any non-deterministic CUDA-style op falls back to a deterministic implementation if one exists, or warns loudly otherwise (warn_only=True).
  • CUDA / MPS: plan/08 §4 grants the GPU determinism caveat — cuDNN / cuBLAS kernels are not bit-deterministic across hardware generations even with deterministic flags. Set deterministic_torch=False on GPU configs to disable the strict-mode flag and avoid the warning spam.

Attributes:

  • device (Literal['auto', 'cpu', 'cuda', 'mps']) –

    "auto" resolves via :func:chamber.utils.device.torch_device (CUDA > MPS > CPU). An explicit "cpu" / "cuda" / "mps" pins the choice and raises if that device is unavailable; the trainer cites ADR-002 §Decisions in the error message so a user searching for the message lands on the rationale.

  • deterministic_torch (bool) –

    When True, the trainer calls :func:torch.use_deterministic_algorithms(True, warn_only=True) once at construction. Documented as a process-global side effect.

Source code in src/concerto/training/config.py
class RuntimeConfig(_FrozenModel):
    """Device + perf knobs (ADR-002 §Decisions; plan/08 §4 GPU determinism caveat).

    Phase-0 development runs on Mac CPU (Apple M-series). Production
    zoo-seed runs target Linux + CUDA via
    ``scripts/repro/zoo_seed.sh`` (M4b-9b). The two regimes have
    different determinism contracts:

    - **CPU**: byte-identical across runs at the same seed (plan/05 §6
      criterion 7; pinned by ``tests/integration/test_cpu_determinism.py``).
      ``deterministic_torch=True`` calls
      :func:`torch.use_deterministic_algorithms` once at trainer
      construction so any non-deterministic CUDA-style op falls back to
      a deterministic implementation if one exists, or warns loudly
      otherwise (``warn_only=True``).
    - **CUDA / MPS**: plan/08 §4 grants the GPU determinism caveat —
      cuDNN / cuBLAS kernels are not bit-deterministic across hardware
      generations even with deterministic flags. Set
      ``deterministic_torch=False`` on GPU configs to disable the
      strict-mode flag and avoid the warning spam.

    Attributes:
        device: ``"auto"`` resolves via
            :func:`chamber.utils.device.torch_device` (CUDA > MPS >
            CPU). Explicit ``"cpu"`` / ``"cuda"`` / ``"mps"`` pins the
            choice and raises if unavailable (the trainer reports the
            ADR cite in the error message so a confused user
            searching for the message lands on ADR-002 §Decisions).
        deterministic_torch: When ``True``, the trainer calls
            :func:`torch.use_deterministic_algorithms(True, warn_only=True)`
            once at construction. Documented as a process-global side
            effect.
    """

    device: Literal["auto", "cpu", "cuda", "mps"] = "auto"
    deterministic_torch: bool = True
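The device contract reduces to a small priority rule. The sketch below is hypothetical: availability is passed in as plain booleans (in practice they would come from torch.cuda.is_available() and torch.backends.mps.is_available()), and the RuntimeError type is an assumption, since the real resolution goes through chamber.utils.device.torch_device, whose signature is not shown here.

```python
from typing import Literal

Device = Literal["auto", "cpu", "cuda", "mps"]


def resolve_device(
    requested: Device, *, cuda_available: bool, mps_available: bool
) -> str:
    """Sketch of the RuntimeConfig.device contract (hypothetical helper).

    "auto" prefers CUDA, then MPS, then CPU; an explicit choice is honoured
    only if that backend is available.
    """
    if requested == "auto":
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    available = {"cpu": True, "cuda": cuda_available, "mps": mps_available}
    if not available[requested]:
        # The real trainer cites ADR-002 §Decisions in its error message.
        raise RuntimeError(
            f"device={requested!r} requested but unavailable (see ADR-002 §Decisions)"
        )
    return requested


print(resolve_device("auto", cuda_available=False, mps_available=True))  # mps
```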

SafetyConfig

Bases: _FrozenModel

Safety-stack runtime configuration for the training rollout (P1.04.5; ADR-007 §Stage 1b).

Phase-1 P1.04.5 wires the CBF-QP outer filter into :func:concerto.training.ego_aht.train per ADR-007 §Stage 1b implementation-details paragraph 2. The filter runs once per env step between the trainer's act and env.step; conformal slack state.lambda_ accumulates per cell and the :class:concerto.training.safety_telemetry.SafetyAggregator emits a safety_telemetry_final JSONL event at end-of-cell carrying the audit-gate predicate's inputs.

The block is read by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory's per-cell construction path; non-Stage-1b yaml configs default enabled=False so the filter doesn't engage outside its target use case (the Phase-0 MPE empirical-guarantee config + the Stage-0 smoke adapter both keep the un-filtered training rollout).

Kill-switch contract (D10 from the P1.04.5 Plan-subagent design): when enabled=False the training loop skips the filter entirely and emits safety_enabled=false in the JSONL summary. The audit-gate hook reads that flag and emits a non-failing "safety disabled by operator override; gate skipped" message — the operator's intent is preserved on the audit trail.
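From the audit-gate side, the kill-switch contract is a single branch on the summary record. A minimal sketch, assuming each JSONL line is a JSON object carrying a boolean safety_enabled key; audit_gate_message and the non-skip message text are hypothetical, and only the flag name and the skip message come from the contract described above.

```python
import json


def audit_gate_message(summary_line: str) -> str:
    """Hypothetical audit-gate hook reading one JSONL summary record."""
    record = json.loads(summary_line)
    if record.get("safety_enabled") is False:
        # Kill switch engaged: skip the gate without failing the run.
        return "safety disabled by operator override; gate skipped"
    # Flag true (or absent in this sketch): evaluate the gate predicates.
    return "safety enabled; evaluating gate predicates"


print(audit_gate_message('{"safety_enabled": false}'))
# safety disabled by operator override; gate skipped
```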

Attributes:

  • enabled (bool) –

    Master switch. True wires the filter + telemetry into the training loop. False (the default) skips both; the audit gate reads safety_enabled and treats the cell as non-gated.

  • saturation_threshold (float) –

    Fraction of cartesian_accel_capacity above which the cell is considered saturated. Default 0.9 leaves a 10% margin — operator-tunable for the audit-gate predicate A. ADR-007 §Stage 1b implementation details paragraph 2 names the predicate verbatim (λ_steady_state < cartesian_accel_capacity); the 0.9 factor is the engineering safety margin.

  • cbf_gamma (float) –

    Class-K function gain forwarded to :func:concerto.safety.conformal.update_lambda_from_predictor. Default 5.0 matches the existing :data:concerto.safety.cbf_qp.DEFAULT_CBF_GAMMA.

  • lambda_safe (float) –

    Per-pair slack reset value at cell start (Phase-0 placeholder 0.0 per ADR-004 §Open questions; the derived form is a Stage-2 prerequisite).

  • n_warmup_steps (int) –

    Length of the warmup window after the cell starts (matches :data:concerto.safety.api.DEFAULT_WARMUP_STEPS).

  • predictor_kind (Literal['constant_velocity']) –

    Partner-trajectory predictor (Phase-0/1 default "constant_velocity" per :func:concerto.safety.conformal.constant_velocity_predict; AoI-conditioned variants are Phase-2 per ADR-004 §Open questions).

  • tau_brake (float) –

    Per-step braking-fallback TTC threshold in seconds (P1.04.6; ADR-007 §Stage 1b training-vs-deployment parity). When the smallest pairwise time-to-collision drops below this, :func:concerto.safety.braking.maybe_brake overrides the ego action with the per-uid emergency-controller's push-apart response — the per-step backstop that does not depend on QP feasibility (Wang-Ames-Egerstedt 2017 eq. 17; ADR-004 risk-mitigation #1). Default matches :data:concerto.safety.braking.DEFAULT_TAU_BRAKE (100 ms).

Source code in src/concerto/training/config.py
class SafetyConfig(_FrozenModel):
    """Safety-stack runtime configuration for the training rollout (P1.04.5; ADR-007 §Stage 1b).

    Phase-1 P1.04.5 wires the CBF-QP outer filter into
    :func:`concerto.training.ego_aht.train` per ADR-007 §Stage 1b
    implementation-details paragraph 2. The filter runs once per env
    step between the trainer's ``act`` and ``env.step``; conformal
    slack ``state.lambda_`` accumulates per cell and the
    :class:`concerto.training.safety_telemetry.SafetyAggregator` emits
    a ``safety_telemetry_final`` JSONL event at end-of-cell carrying
    the audit-gate predicate's inputs.

    The block is read by
    :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`'s
    per-cell construction path; non-Stage-1b yaml configs default
    ``enabled=False`` so the filter doesn't engage outside its target
    use case (the Phase-0 MPE empirical-guarantee config + the
    Stage-0 smoke adapter both keep the un-filtered training rollout).

    Kill-switch contract (D10 from the P1.04.5 Plan-subagent design):
    when ``enabled=False`` the training loop skips the filter entirely
    and emits ``safety_enabled=false`` in the JSONL summary. The
    audit-gate hook reads that flag and emits a non-failing
    "safety disabled by operator override; gate skipped" message — the
    operator's intent is preserved on the audit trail.

    Attributes:
        enabled: Master switch. ``True`` wires the filter +
            telemetry into the training loop. ``False`` (default)
            skips both; the audit gate reads ``safety_enabled`` and
            treats the cell as non-gated.
        saturation_threshold: Fraction of ``cartesian_accel_capacity``
            above which the cell is considered saturated. Default
            ``0.9`` leaves a 10% margin — operator-tunable for the
            audit-gate predicate A. ADR-007 §Stage 1b implementation
            details paragraph 2 names the predicate verbatim
            (``λ_steady_state < cartesian_accel_capacity``); the
            ``0.9`` factor is the engineering safety margin.
        cbf_gamma: Class-K function gain forwarded to
            :func:`concerto.safety.conformal.update_lambda_from_predictor`.
            Default ``5.0`` matches the existing
            :data:`concerto.safety.cbf_qp.DEFAULT_CBF_GAMMA`.
        lambda_safe: Per-pair slack reset value at cell start
            (Phase-0 placeholder ``0.0`` per ADR-004 §Open questions;
            the derived form is a Stage-2 prerequisite).
        n_warmup_steps: Length of the warmup window after the cell
            starts (matches
            :data:`concerto.safety.api.DEFAULT_WARMUP_STEPS`).
        predictor_kind: Partner-trajectory predictor (Phase-0/1
            default ``"constant_velocity"`` per
            :func:`concerto.safety.conformal.constant_velocity_predict`;
            AoI-conditioned variants are Phase-2 per ADR-004 §Open
            questions).
        tau_brake: Per-step braking-fallback TTC threshold in seconds
            (P1.04.6; ADR-007 §Stage 1b training-vs-deployment parity).
            When the smallest pairwise time-to-collision drops below
            this, :func:`concerto.safety.braking.maybe_brake` overrides
            the ego action with the per-uid emergency-controller's
            push-apart response — the per-step backstop that does not
            depend on QP feasibility (Wang-Ames-Egerstedt 2017 eq. 17;
            ADR-004 risk-mitigation #1). Default matches
            :data:`concerto.safety.braking.DEFAULT_TAU_BRAKE` (100 ms).
    """

    enabled: bool = False
    saturation_threshold: float = Field(default=0.9, gt=0.0, le=1.0)
    cbf_gamma: float = Field(default=5.0, gt=0.0)
    lambda_safe: float = 0.0
    n_warmup_steps: int = Field(default=50, ge=0)
    predictor_kind: Literal["constant_velocity"] = "constant_velocity"
    tau_brake: float = Field(default=0.100, gt=0.0)
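Two of the SafetyConfig knobs reduce to simple comparisons. The sketch assumes saturation means λ_steady_state > saturation_threshold × cartesian_accel_capacity, and that time-to-collision is measured between two discs moving at constant relative velocity. concerto.safety.braking.maybe_brake may define TTC differently, and all three helpers are hypothetical.

```python
import math

Vec = tuple[float, float]


def is_saturated(lambda_steady_state: float, capacity: float, threshold: float = 0.9) -> bool:
    """Hypothetical check: saturated once lambda exceeds threshold * capacity."""
    return lambda_steady_state > threshold * capacity


def time_to_collision(rel_pos: Vec, rel_vel: Vec, radius: float) -> float:
    """Earliest t >= 0 with |rel_pos + rel_vel * t| == radius.

    Returns 0.0 if the discs already overlap and math.inf if they never touch.
    """
    px, py = rel_pos
    vx, vy = rel_vel
    c = px * px + py * py - radius * radius
    if c <= 0.0:
        return 0.0  # already in contact
    a = vx * vx + vy * vy
    b = px * vx + py * vy  # half of the usual linear coefficient
    if a == 0.0 or b >= 0.0:
        return math.inf  # no relative motion, or moving apart
    disc = b * b - a * c
    if disc < 0.0:
        return math.inf  # closest approach stays outside the radius
    return (-b - math.sqrt(disc)) / a


def should_brake(ttc: float, tau_brake: float = 0.100) -> bool:
    """Braking fires when the smallest pairwise TTC drops below tau_brake."""
    return ttc < tau_brake


ttc = time_to_collision((1.0, 0.0), (-10.0, 0.0), radius=0.5)
print(round(ttc, 3), should_brake(ttc))  # 0.05 True
```

Agents 1 m apart closing at 10 m/s with a 0.5 m contact radius touch after 0.05 s, which is under the default 100 ms tau_brake, so the braking fallback would engage.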

WandbConfig

Bases: _FrozenModel

W&B sink toggle + project name (ADR-002 §Decisions; plan/05 §2).

Attributes:

  • enabled (bool) –

    When False (default), :func:bind_run_logger is built without a W&B sink and only the JSONL fallback fires. CI runs and unit tests should leave this off.

  • project (str) –

    W&B project name. Used only when enabled is True.

Source code in src/concerto/training/config.py
class WandbConfig(_FrozenModel):
    """W&B sink toggle + project name (ADR-002 §Decisions; plan/05 §2).

    Attributes:
        enabled: When ``False`` (default), :func:`bind_run_logger` is
            built without a W&B sink and only the JSONL fallback fires.
            CI runs and unit tests should leave this off.
        project: W&B project name. Used only when ``enabled`` is ``True``.
    """

    enabled: bool = False
    project: str = "concerto-m4b"

load_config

load_config(
    *, config_path: Path, overrides: list[str] | None = None
) -> EgoAHTConfig

Compose a Hydra config + validate via :class:EgoAHTConfig (T4b.11; ADR-002 §Decisions).

The function does NOT use Hydra's @hydra.main decorator (which takes over the process's CWD and argv); it uses the lower-level :func:hydra.compose API inside :func:hydra.initialize_config_dir so the call is testable and re-entrant.

Parameters:

  • config_path (Path) –

    Absolute path to the YAML config file (e.g. configs/training/ego_aht_happo/mpe_cooperative_push.yaml). The directory becomes Hydra's config_dir and the stem becomes the config_name it loads. Splitting via the file path keeps the YAML's root-level keys at the root of the composed config (rather than Hydra's path-folding).

  • overrides (list[str] | None, default: None ) –

    Optional list of Hydra CLI-style overrides (e.g. ["seed=7", "happo.lr=1e-4"]).

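Returns:

  • EgoAHTConfig –

    A frozen :class:EgoAHTConfig ready for :func:concerto.training.ego_aht.train.

Raises:

  • ValidationError

    If the composed config violates the schema (missing required field, out-of-range hyperparam, duplicate agent uids, etc.).

  • TypeError

    If the composed Hydra config is not a mapping at the root.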

Source code in src/concerto/training/config.py
def load_config(
    *,
    config_path: Path,
    overrides: list[str] | None = None,
) -> EgoAHTConfig:
    """Compose a Hydra config + validate via :class:`EgoAHTConfig` (T4b.11; ADR-002 §Decisions).

    The function does NOT use Hydra's ``@hydra.main`` decorator (which
    takes over the process's CWD + argv); it uses the lower-level
    :func:`hydra.compose` API so the call is testable and re-entrant.

    Args:
        config_path: Absolute path to the YAML config file (e.g.
            ``configs/training/ego_aht_happo/mpe_cooperative_push.yaml``).
            The directory becomes Hydra's ``config_dir`` and the
            stem becomes the ``config_name`` it loads. Splitting via the
            file path keeps the YAML's root-level keys at the root of
            the composed config (rather than Hydra's path-folding).
        overrides: Optional list of Hydra CLI-style overrides
            (e.g. ``["seed=7", "happo.lr=1e-4"]``).

    Returns:
        A frozen :class:`EgoAHTConfig` ready for
        :func:`concerto.training.ego_aht.train`.

    Raises:
        pydantic.ValidationError: If the composed config violates the
            schema (missing required field, out-of-range hyperparam,
            duplicate agent uids, etc.).
    """
    config_dir = config_path.parent.resolve()
    config_name = config_path.stem
    # Re-entrancy: another caller (or a leftover pytest fixture) may have
    # left ``GlobalHydra`` initialised. Clear it so initialize_config_dir's
    # ``with`` block can re-bind to the new directory cleanly.
    if GlobalHydra.instance().is_initialized():
        GlobalHydra.instance().clear()
    with initialize_config_dir(config_dir=str(config_dir), version_base="1.3"):
        cfg = compose(config_name=config_name, overrides=overrides or [])
    raw = OmegaConf.to_container(cfg, resolve=True)
    if not isinstance(raw, dict):
        raise TypeError(
            f"Composed Hydra config must be a mapping at root; got {type(raw).__name__}"
        )
    return EgoAHTConfig.model_validate(raw)
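load_config's path handling and the shape of an override can be mimicked without Hydra. split_config_path reproduces the parent-directory / stem split from the source above; apply_overrides is a deliberately simplified stand-in for Hydra's override grammar (it handles only plain dotted key=value pairs, not the +key, ++key or ~key forms, and its int-then-float-then-string value parsing is an approximation). Both helpers are hypothetical.

```python
from pathlib import Path
from typing import Any


def split_config_path(config_path: Path) -> tuple[str, str]:
    """Same split as load_config: absolute parent directory plus file stem."""
    return str(config_path.parent.resolve()), config_path.stem


def _parse_scalar(raw: str) -> Any:
    # Simplified typing: try int, then float, else keep the string.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(cfg: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Apply dotted key=value overrides to a copy of a nested dict (hypothetical)."""
    out = dict(cfg)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = out
        for name in parents:
            node[name] = dict(node.get(name, {}))  # copy so cfg is not mutated
            node = node[name]
        node[leaf] = _parse_scalar(raw)
    return out


cfg_dir, cfg_name = split_config_path(
    Path("configs/training/ego_aht_happo/mpe_cooperative_push.yaml")
)
print(cfg_name)  # mpe_cooperative_push
base = {"seed": 0, "happo": {"lr": 3.0e-4, "gamma": 0.99}}
print(apply_overrides(base, ["seed=7", "happo.lr=1e-4"]))
# {'seed': 7, 'happo': {'lr': 0.0001, 'gamma': 0.99}}
```

In a real run the equivalent call is load_config(config_path=Path("configs/training/ego_aht_happo/mpe_cooperative_push.yaml"), overrides=["seed=7", "happo.lr=1e-4"]), which additionally validates the composed result against EgoAHTConfig.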

Ego-AHT training loop entry point (T4b.11; ADR-002 §Decisions; plan/05 §3.5).

The single user-facing API for an ego-AHT training run is :func:train. It wires up logging, seeding, per-step rollout collection, checkpoint emission, and the trainer-factory seam, and returns a :class:RewardCurve for T4b.13's empirical-guarantee assertion.

Dependency-direction discipline (project plan/10 §2 dependency-direction rule): concerto.* MUST NOT import from chamber.*. The training loop therefore takes the env and partner via dependency injection — the chamber-side runner (chamber.benchmarks.training_runner) builds the concrete env (:class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv) and partner (via :func:chamber.partners.registry.load_partner) and hands them to :func:train. That keeps every concerto.* arrow pointing only at lower layers.

Trainer-factory injection: this loop lives in concerto.* and so cannot import from chamber.* (project plan/10 §2). The real ego-PPO trainer (:class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer) wraps HARL's HAPPO and lives on the chamber side; the chamber-side :func:chamber.benchmarks.training_runner.run_training selects it as the default. :func:train therefore continues to accept trainer_factory and falls back to :class:RandomEgoTrainer when called directly without an injected factory — which keeps the loop unit-testable inside concerto.* without violating the dependency direction.

ADR-002 risk-mitigation #1: this loop is the substrate the empirical-guarantee assertion runs against. Do not silently drop steps, mutate the rollout collection mid-loop, or change the partner-frozen contract without a new ADR.

EgoTrainer

Bases: Protocol

Minimal contract the training loop expects of an ego trainer (T4b.11; ADR-002 §Decisions).

Two concrete implementations satisfy this Protocol:

  • :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer — the real HARL-HAPPO-backed ego-PPO trainer (M4b-8a / plan/05 §3.5), and the default selected by :func:chamber.benchmarks.training_runner.run_training.
  • :class:RandomEgoTrainer — the Phase-0 reference fallback, used by :func:train when no trainer_factory is injected.

The Protocol is intentionally tiny (:meth:act + :meth:observe + :meth:update + :meth:state_dict) so the seam stays small enough that an external user can plug in their own algorithm without changing :func:train.

Source code in src/concerto/training/ego_aht.py
class EgoTrainer(Protocol):
    """Minimal contract the training loop expects of an ego trainer (T4b.11; ADR-002 §Decisions).

    Two concrete implementations satisfy this Protocol:

    - :class:`chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer` — the
      real HARL-HAPPO-backed ego-PPO trainer (M4b-8a / plan/05 §3.5),
      and the default selected by
      :func:`chamber.benchmarks.training_runner.run_training`.
    - :class:`RandomEgoTrainer` — the Phase-0 reference fallback, used
      by :func:`train` when no ``trainer_factory`` is injected.

    The Protocol is intentionally tiny (:meth:`act` + :meth:`observe` +
    :meth:`update` + :meth:`state_dict`) so the seam stays small enough
    that an external user can plug in their own algorithm without
    changing :func:`train`.
    """

    def act(
        self,
        obs: Mapping[str, Any],
        *,
        deterministic: bool = False,
    ) -> NDArray[np.floating]:
        """Sample (or pick deterministically) the ego action (ADR-002 §Decisions)."""
        ...  # pragma: no cover

    def observe(
        self,
        obs: Mapping[str, Any],
        reward: float,
        done: bool,
        *,
        truncated: bool = False,
    ) -> None:
        """Record a (s, a, r, s', done, truncated) tuple (ADR-002 §Decisions; Pardo 2017).

        ``done`` keeps the historical "this step ended the episode"
        meaning (terminated OR truncated). ``truncated`` is a
        keyword-only flag that distinguishes time-limit truncation from
        true termination — needed by PPO-style trainers to bootstrap the
        GAE target correctly (see :func:`chamber.benchmarks.ego_ppo_trainer.compute_gae`
        and project issue #62 for the root-cause writeup). Default
        ``truncated=False`` keeps legacy callers' behavior unchanged.
        """
        ...  # pragma: no cover

    def update(self) -> None:
        """Run one optimisation epoch on the buffered rollout (ADR-002 §Decisions)."""
        ...  # pragma: no cover

    def state_dict(self) -> dict[str, Any]:
        """Return the trainer's torch state-dict for checkpointing (ADR-002 §Decisions)."""
        ...  # pragma: no cover
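Because EgoTrainer is a structural Protocol, any class with matching act / observe / update / state_dict methods is a candidate for the trainer_factory seam without subclassing (the factory's exact signature is not shown here). The sketch mirrors the Protocol locally so it runs without concerto installed, and marks the mirror runtime_checkable only so the demo can use isinstance; the real EgoTrainer is not runtime_checkable, so conformance there is a static-typing check. ZeroActionTrainer is hypothetical and learns nothing.

```python
from collections.abc import Mapping
from typing import Any, Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class EgoTrainerLike(Protocol):
    """Local mirror of the EgoTrainer method set (runtime_checkable for the demo)."""

    def act(self, obs: Mapping[str, Any], *, deterministic: bool = False) -> NDArray[np.floating]: ...
    def observe(self, obs: Mapping[str, Any], reward: float, done: bool, *, truncated: bool = False) -> None: ...
    def update(self) -> None: ...
    def state_dict(self) -> dict[str, Any]: ...


class ZeroActionTrainer:
    """Hypothetical do-nothing trainer: always emits a zero action."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim
        self.rewards: list[float] = []
        self.updates = 0

    def act(self, obs: Mapping[str, Any], *, deterministic: bool = False) -> NDArray[np.floating]:
        return np.zeros(self.action_dim, dtype=np.float32)

    def observe(self, obs: Mapping[str, Any], reward: float, done: bool, *, truncated: bool = False) -> None:
        self.rewards.append(reward)  # a real trainer would buffer the full transition

    def update(self) -> None:
        self.updates += 1
        self.rewards.clear()  # drop the buffered rollout after "learning" from it

    def state_dict(self) -> dict[str, Any]:
        return {"action_dim": self.action_dim, "updates": self.updates}


trainer = ZeroActionTrainer(action_dim=2)
print(isinstance(trainer, EgoTrainerLike))  # True
print(trainer.act({"ego": [0.0, 0.0]}))  # [0. 0.]
```

A runtime_checkable isinstance check only confirms the method names exist; it does not check signatures.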

act

act(
    obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]

Sample (or pick deterministically) the ego action (ADR-002 §Decisions).

Source code in src/concerto/training/ego_aht.py
def act(
    self,
    obs: Mapping[str, Any],
    *,
    deterministic: bool = False,
) -> NDArray[np.floating]:
    """Sample (or pick deterministically) the ego action (ADR-002 §Decisions)."""
    ...  # pragma: no cover

observe

observe(
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None

Record a (s, a, r, s', done, truncated) tuple (ADR-002 §Decisions; Pardo 2017).

done keeps the historical "this step ended the episode" meaning (terminated OR truncated). truncated is a keyword-only flag that distinguishes time-limit truncation from true termination — needed by PPO-style trainers to bootstrap the GAE target correctly (see :func:chamber.benchmarks.ego_ppo_trainer.compute_gae and project issue #62 for the root-cause writeup). Default truncated=False keeps legacy callers' behavior unchanged.

Source code in src/concerto/training/ego_aht.py
def observe(
    self,
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None:
    """Record a (s, a, r, s', done, truncated) tuple (ADR-002 §Decisions; Pardo 2017).

    ``done`` keeps the historical "this step ended the episode"
    meaning (terminated OR truncated). ``truncated`` is a
    keyword-only flag that distinguishes time-limit truncation from
    true termination — needed by PPO-style trainers to bootstrap the
    GAE target correctly (see :func:`chamber.benchmarks.ego_ppo_trainer.compute_gae`
    and project issue #62 for the root-cause writeup). Default
    ``truncated=False`` keeps legacy callers' behavior unchanged.
    """
    ...  # pragma: no cover
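The practical difference between done and truncated shows up in the GAE bootstrap. This sketch follows the common time-limit treatment (Pardo 2017): true termination zeroes the bootstrap value, truncation still bootstraps from V(s_{t+1}), and either kind of episode end stops the next episode's advantages from leaking into this one. gae_with_truncation is hypothetical and is not chamber.benchmarks.ego_ppo_trainer.compute_gae.

```python
def gae_with_truncation(
    rewards: list[float],
    values: list[float],
    next_values: list[float],
    terminated: list[bool],
    truncated: list[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> list[float]:
    """Generalized advantage estimation with time-limit bootstrapping.

    ``next_values[t]`` is V(s_{t+1}); for a truncated step it must be the
    value of the final (cut-off) observation, not of the next reset state.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # Only true termination zeroes the bootstrap term.
        not_terminal = 0.0 if terminated[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * not_terminal - values[t]
        # Any episode end (terminated or truncated) blocks accumulation from
        # step t + 1, which belongs to the next episode.
        episode_continues = 0.0 if (terminated[t] or truncated[t]) else 1.0
        gae = delta + gamma * lam * episode_continues * gae
        advantages[t] = gae
    return advantages
```

For a single step with reward 1, V(s_t) = 0 and V(s_{t+1}) = 10, the truncated advantage is 1 + 0.99 × 10 = 10.9, while the terminated advantage is just 1. Passing truncated=True to observe is what lets a trainer tell these two cases apart.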

state_dict

state_dict() -> dict[str, Any]

Return the trainer's torch state-dict for checkpointing (ADR-002 §Decisions).

Source code in src/concerto/training/ego_aht.py
def state_dict(self) -> dict[str, Any]:
    """Return the trainer's torch state-dict for checkpointing (ADR-002 §Decisions)."""
    ...  # pragma: no cover

update

update() -> None

Run one optimisation epoch on the buffered rollout (ADR-002 §Decisions).

Source code in src/concerto/training/ego_aht.py
def update(self) -> None:
    """Run one optimisation epoch on the buffered rollout (ADR-002 §Decisions)."""
    ...  # pragma: no cover

EnvLike

Bases: Protocol

Structural Gymnasium-multi-agent env contract the loop needs (T4b.11; ADR-002 §Decisions).

Defined locally so :func:train does not import a concrete class from chamber.* (project plan/10 §2 dependency-direction rule). The chamber-side :class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv satisfies this Protocol structurally.

Source code in src/concerto/training/ego_aht.py
@runtime_checkable
class EnvLike(Protocol):
    """Structural Gymnasium-multi-agent env contract the loop needs (T4b.11; ADR-002 §Decisions).

    Defined locally so :func:`train` does not import a concrete class
    from ``chamber.*`` (project plan/10 §2 dependency-direction rule). The
    chamber-side :class:`chamber.envs.mpe_cooperative_push.MPECooperativePushEnv`
    satisfies this Protocol structurally.
    """

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[Mapping[str, Any], Mapping[str, Any]]:
        """Reset to the start of an episode (T4b.11; ADR-002 §Decisions)."""
        ...  # pragma: no cover

    def step(
        self,
        action: dict[str, NDArray[np.floating]],
    ) -> tuple[Mapping[str, Any], float, bool, bool, Mapping[str, Any]]:
        """Advance one tick (T4b.11; ADR-002 §Decisions).

        Takes a concrete ``dict`` rather than a ``Mapping`` because the
        chamber-side :class:`MPECooperativePushEnv` and the future
        ManiSkill v3 wrappers all require dict-keyed access (mutation
        of the action dict is intentional in some wrappers' rate-limit
        / collision-injection paths).
        """
        ...  # pragma: no cover
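EnvLike is a runtime_checkable Protocol, so any object with matching reset and step methods passes an isinstance check without inheriting from it. The sketch below uses a local copy of the Protocol's shape so it runs without concerto installed. `StaticEnv` and `NoStep` are hypothetical classes.

```python
from __future__ import annotations

from typing import Any, Mapping, Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class EnvLike(Protocol):
    # Local copy of the Protocol above, so this snippet is standalone.
    def reset(
        self, *, seed: int | None = None, options: dict[str, Any] | None = None
    ) -> tuple[Mapping[str, Any], Mapping[str, Any]]: ...

    def step(
        self, action: dict[str, NDArray[np.floating]]
    ) -> tuple[Mapping[str, Any], float, bool, bool, Mapping[str, Any]]: ...


class StaticEnv:
    """Hypothetical env: never subclasses EnvLike, only matches its methods."""

    def reset(self, *, seed=None, options=None):
        return {"ego": np.zeros(2)}, {}

    def step(self, action):
        return {"ego": np.zeros(2)}, 0.0, False, False, {}


class NoStep:
    """Hypothetical object missing step(), so it does not satisfy EnvLike."""

    def reset(self, *, seed=None, options=None):
        return {}, {}
```

Runtime checks only confirm that the methods exist. Argument and return types are verified by a static type checker, not by isinstance.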

reset

reset(
    *,
    seed: int | None = None,
    options: dict[str, Any] | None = None,
) -> tuple[Mapping[str, Any], Mapping[str, Any]]

Reset to the start of an episode (T4b.11; ADR-002 §Decisions).

Source code in src/concerto/training/ego_aht.py
def reset(
    self,
    *,
    seed: int | None = None,
    options: dict[str, Any] | None = None,
) -> tuple[Mapping[str, Any], Mapping[str, Any]]:
    """Reset to the start of an episode (T4b.11; ADR-002 §Decisions)."""
    ...  # pragma: no cover

step

step(
    action: dict[str, NDArray[floating]],
) -> tuple[
    Mapping[str, Any], float, bool, bool, Mapping[str, Any]
]

Advance one tick (T4b.11; ADR-002 §Decisions).

Takes a concrete dict rather than a Mapping because the chamber-side :class:MPECooperativePushEnv and the future ManiSkill v3 wrappers all require dict-keyed access (mutation of the action dict is intentional in some wrappers' rate-limit / collision-injection paths).

Source code in src/concerto/training/ego_aht.py
def step(
    self,
    action: dict[str, NDArray[np.floating]],
) -> tuple[Mapping[str, Any], float, bool, bool, Mapping[str, Any]]:
    """Advance one tick (T4b.11; ADR-002 §Decisions).

    Takes a concrete ``dict`` rather than a ``Mapping`` because the
    chamber-side :class:`MPECooperativePushEnv` and the future
    ManiSkill v3 wrappers all require dict-keyed access (mutation
    of the action dict is intentional in some wrappers' rate-limit
    / collision-injection paths).
    """
    ...  # pragma: no cover
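The step docstring notes that some wrappers mutate the action dict on purpose. Here is a minimal sketch of a rate-limiting wrapper that does this. It assumes a per-uid cap, `max_delta`, on the change between consecutive actions. `RateLimitWrapper` and `NullEnv` are hypothetical and not part of chamber.

```python
import numpy as np


class RateLimitWrapper:
    """Hypothetical wrapper: caps each uid's per-step action change by max_delta."""

    def __init__(self, env, max_delta: float) -> None:
        self.env = env
        self.max_delta = max_delta
        self._prev: dict[str, np.ndarray] = {}

    def reset(self, *, seed=None, options=None):
        self._prev = {}
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        for uid, a in list(action.items()):
            prev = self._prev.get(uid, np.zeros_like(a))
            # Writes back into the caller's dict: this is the mutation that a
            # read-only Mapping parameter type would forbid.
            action[uid] = np.clip(a, prev - self.max_delta, prev + self.max_delta)
            self._prev[uid] = action[uid]
        return self.env.step(action)


class NullEnv:
    """Hypothetical inner env that ignores the action."""

    def reset(self, *, seed=None, options=None):
        return {}, {}

    def step(self, action):
        return {}, 0.0, False, False, {}


wrapped = RateLimitWrapper(NullEnv(), max_delta=0.1)
wrapped.reset()
first = {"ego": np.array([1.0, -1.0])}
wrapped.step(first)  # first["ego"] has been clipped in place to [0.1, -0.1]
```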

PartnerLike

Bases: Protocol

Structural FrozenPartner contract the loop needs (T4b.11; ADR-009 §Decision).

Defined locally to keep concerto.* free of chamber.* imports. The M4a :class:chamber.partners.api.FrozenPartner satisfies this Protocol structurally — both the heuristic + the future frozen-RL adapters drop in unchanged.

The spec attribute is declared as :class:Any so concrete implementations can attach any spec type (M4a's :class:chamber.partners.api.PartnerSpec); the loop only uses it for diagnostic logging, never structurally. Any is required here rather than object because Protocol attributes are invariant and concrete subtypes (PartnerSpec) are not assignable to a mutable object annotation.

Source code in src/concerto/training/ego_aht.py
@runtime_checkable
class PartnerLike(Protocol):
    """Structural ``FrozenPartner`` contract the loop needs (T4b.11; ADR-009 §Decision).

    Defined locally to keep ``concerto.*`` free of ``chamber.*`` imports.
    The M4a :class:`chamber.partners.api.FrozenPartner` satisfies this
    Protocol structurally — both the heuristic + the future frozen-RL
    adapters drop in unchanged.

    The ``spec`` attribute is declared as :class:`Any` so concrete
    implementations can attach any spec type (M4a's
    :class:`chamber.partners.api.PartnerSpec`); the loop only uses it
    for diagnostic logging, never structurally. ``Any`` is required
    here rather than ``object`` because Protocol attributes are
    invariant and concrete subtypes (PartnerSpec) are not assignable to
    a mutable ``object`` annotation.
    """

    spec: Any

    def reset(self, *, seed: int | None = None) -> None:
        """Reset internal episode state (ADR-009 §Decision)."""
        ...  # pragma: no cover

    def act(
        self,
        obs: Mapping[str, Any],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Pick an action for the partner's own uid (ADR-009 §Decision)."""
        ...  # pragma: no cover
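The sketch below shows a heuristic partner that satisfies PartnerLike structurally. `ToySpec` stands in for `chamber.partners.api.PartnerSpec`, and `HoldStillPartner` is hypothetical. Because `spec` is a data member of the Protocol, it is assigned on the instance. A runtime isinstance check looks for the attribute itself, not for a class-level annotation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping, Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class PartnerLike(Protocol):
    # Local copy of the Protocol above, so this snippet is standalone.
    spec: Any

    def reset(self, *, seed: int | None = None) -> None: ...

    def act(
        self, obs: Mapping[str, Any], *, deterministic: bool = True
    ) -> NDArray[np.floating]: ...


@dataclass(frozen=True)
class ToySpec:  # hypothetical stand-in for PartnerSpec
    class_name: str


class HoldStillPartner:
    """Hypothetical heuristic partner that always outputs a zero action."""

    def __init__(self, action_dim: int) -> None:
        self.spec = ToySpec(class_name="hold_still")  # set on the instance
        self._action_dim = action_dim

    def reset(self, *, seed: int | None = None) -> None:
        pass  # stateless: nothing to reset

    def act(self, obs, *, deterministic: bool = True):
        return np.zeros(self._action_dim, dtype=np.float32)


partner = HoldStillPartner(action_dim=3)
```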

act

act(
    obs: Mapping[str, Any], *, deterministic: bool = True
) -> NDArray[np.floating]

Pick an action for the partner's own uid (ADR-009 §Decision).

Source code in src/concerto/training/ego_aht.py
def act(
    self,
    obs: Mapping[str, Any],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Pick an action for the partner's own uid (ADR-009 §Decision)."""
    ...  # pragma: no cover

reset

reset(*, seed: int | None = None) -> None

Reset internal episode state (ADR-009 §Decision).

Source code in src/concerto/training/ego_aht.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset internal episode state (ADR-009 §Decision)."""
    ...  # pragma: no cover

RandomEgoTrainer

Reference :class:EgoTrainer that samples uniformly from the action space.

Exists so :func:train can be exercised end-to-end before the HARL fork lands (T4b.4-T4b.7). Not a real learner — :meth:update is a no-op, :meth:state_dict returns an empty dict — but the loop's plumbing (env stepping, partner act, checkpoint emission, JSONL logging) is fully exercised. T4b.13 will swap to EgoAHTHAPPO via the trainer-factory seam (ADR-002 §Decisions; plan/05 §3.5).

Source code in src/concerto/training/ego_aht.py
class RandomEgoTrainer:
    """Reference :class:`EgoTrainer` that samples uniformly from the action space.

    Exists so :func:`train` can be exercised end-to-end before the HARL
    fork lands (T4b.4-T4b.7). Not a real learner — :meth:`update` is a
    no-op, :meth:`state_dict` returns an empty dict — but the loop's
    plumbing (env stepping, partner act, checkpoint emission, JSONL
    logging) is fully exercised. T4b.13 will swap to ``EgoAHTHAPPO``
    via the trainer-factory seam (ADR-002 §Decisions; plan/05 §3.5).
    """

    def __init__(self, *, ego_uid: str, action_dim: int, root_seed: int) -> None:
        """Build a uniform-random reference trainer (T4b.11; ADR-002 §Decisions).

        Args:
            ego_uid: Stored for downstream identity propagation (the
                trainer doesn't read it at action time but the loop
                uses it as the dict-action key).
            action_dim: Length of the ego's action vector.
            root_seed: Project seed; the trainer's RNG is derived via
                :func:`concerto.training.seeding.derive_substream` for
                P6 reproducibility.
        """
        self._ego_uid = ego_uid
        self._action_dim = action_dim
        self._rng: np.random.Generator = derive_substream(
            "training.ego_random", root_seed=root_seed
        ).default_rng()

    def act(
        self,
        obs: Mapping[str, Any],
        *,
        deterministic: bool = False,
    ) -> NDArray[np.floating]:
        """Sample uniformly from [-1, 1]^action_dim (ADR-002 §Decisions; plan/05 §3.5)."""
        del obs
        if deterministic:
            return np.zeros(self._action_dim, dtype=np.float32)
        return self._rng.uniform(-1.0, 1.0, size=self._action_dim).astype(np.float32)

    def observe(
        self,
        obs: Mapping[str, Any],
        reward: float,
        done: bool,
        *,
        truncated: bool = False,
    ) -> None:
        """No-op (ADR-002 §Decisions): reference trainer has no rollout buffer."""
        del obs, reward, done, truncated

    def update(self) -> None:
        """No-op (ADR-002 §Decisions): reference trainer has no learnable params."""

    def state_dict(self) -> dict[str, Any]:
        """Return an empty dict (ADR-002 §Decisions: trainer is parameter-free)."""
        return {}

__init__

__init__(
    *, ego_uid: str, action_dim: int, root_seed: int
) -> None

Build a uniform-random reference trainer (T4b.11; ADR-002 §Decisions).

Parameters:

  • ego_uid (str) –

    Stored for downstream identity propagation (the trainer doesn't read it at action time but the loop uses it as the dict-action key).

  • action_dim (int) –

    Length of the ego's action vector.

  • root_seed (int) –

    Project seed; the trainer's RNG is derived via :func:concerto.training.seeding.derive_substream for P6 reproducibility.

Source code in src/concerto/training/ego_aht.py
def __init__(self, *, ego_uid: str, action_dim: int, root_seed: int) -> None:
    """Build a uniform-random reference trainer (T4b.11; ADR-002 §Decisions).

    Args:
        ego_uid: Stored for downstream identity propagation (the
            trainer doesn't read it at action time but the loop
            uses it as the dict-action key).
        action_dim: Length of the ego's action vector.
        root_seed: Project seed; the trainer's RNG is derived via
            :func:`concerto.training.seeding.derive_substream` for
            P6 reproducibility.
    """
    self._ego_uid = ego_uid
    self._action_dim = action_dim
    self._rng: np.random.Generator = derive_substream(
        "training.ego_random", root_seed=root_seed
    ).default_rng()
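`RandomEgoTrainer` derives its generator from a named substream of the project seed. The exact derivation inside `concerto.training.seeding.derive_substream` is not shown here, so the sketch below is an assumption. It mixes a stable CRC-32 of the substream name into a NumPy `SeedSequence`. `Substream` and `derive_substream_sketch` are hypothetical and mimic only the `.default_rng()` call shape used in `__init__`.

```python
import zlib
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Substream:
    """Hypothetical holder exposing the .default_rng() call shape."""

    seed_seq: np.random.SeedSequence

    def default_rng(self) -> np.random.Generator:
        return np.random.default_rng(self.seed_seq)


def derive_substream_sketch(name: str, *, root_seed: int) -> Substream:
    # zlib.crc32 is stable across interpreter runs, unlike hash() on str,
    # which is salted per process and would break reproducibility.
    key = zlib.crc32(name.encode("utf-8"))
    return Substream(np.random.SeedSequence(entropy=root_seed, spawn_key=(key,)))
```

The same (name, root_seed) pair always yields the same stream, and different names yield independent streams from one project seed.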

act

act(
    obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]

Sample uniformly from [-1, 1]^action_dim (ADR-002 §Decisions; plan/05 §3.5).

Source code in src/concerto/training/ego_aht.py
def act(
    self,
    obs: Mapping[str, Any],
    *,
    deterministic: bool = False,
) -> NDArray[np.floating]:
    """Sample uniformly from [-1, 1]^action_dim (ADR-002 §Decisions; plan/05 §3.5)."""
    del obs
    if deterministic:
        return np.zeros(self._action_dim, dtype=np.float32)
    return self._rng.uniform(-1.0, 1.0, size=self._action_dim).astype(np.float32)

observe

observe(
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None

No-op (ADR-002 §Decisions): reference trainer has no rollout buffer.

Source code in src/concerto/training/ego_aht.py
def observe(
    self,
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None:
    """No-op (ADR-002 §Decisions): reference trainer has no rollout buffer."""
    del obs, reward, done, truncated

state_dict

state_dict() -> dict[str, Any]

Return an empty dict (ADR-002 §Decisions: trainer is parameter-free).

Source code in src/concerto/training/ego_aht.py
def state_dict(self) -> dict[str, Any]:
    """Return an empty dict (ADR-002 §Decisions: trainer is parameter-free)."""
    return {}

update

update() -> None

No-op (ADR-002 §Decisions): reference trainer has no learnable params.

Source code in src/concerto/training/ego_aht.py
def update(self) -> None:
    """No-op (ADR-002 §Decisions): reference trainer has no learnable params."""

RewardCurve dataclass

Result of one ego-AHT training run (T4b.11; T4b.13 consumer; ADR-002 §Decisions).

Attributes:

  • run_id (str) –

    16-hex run identifier (from :class:concerto.training.logging.RunContext).

  • per_step_ego_rewards (list[float]) –

    One entry per env step. T4b.13's empirical-guarantee assertion runs over the moving-window-of-10 mean of this series.

  • per_episode_ego_rewards (list[float]) –

    Sum-of-step reward per episode. Coarser than per_step; the canonical "epoch reward" the empirical-guarantee plot embeds in docs/explanation/why-aht.md.

  • checkpoint_paths (list[Path]) –

    Absolute paths of every .pt artefact saved during the run. T4b.14 reads the canonical 50%-step file from this list.

Source code in src/concerto/training/ego_aht.py
@dataclass(frozen=True)
class RewardCurve:
    """Result of one ego-AHT training run (T4b.11; T4b.13 consumer; ADR-002 §Decisions).

    Attributes:
        run_id: 16-hex run identifier (from
            :class:`concerto.training.logging.RunContext`).
        per_step_ego_rewards: One entry per env step. T4b.13's empirical-
            guarantee assertion runs over the moving-window-of-10 mean
            of this series.
        per_episode_ego_rewards: Sum-of-step reward per episode. Coarser
            than ``per_step``; the canonical "epoch reward" the
            empirical-guarantee plot embeds in
            ``docs/explanation/why-aht.md``.
        checkpoint_paths: Absolute paths of every ``.pt`` artefact saved
            during the run. T4b.14 reads the canonical 50%-step file
            from this list.
    """

    run_id: str
    per_step_ego_rewards: list[float] = field(default_factory=list)
    per_episode_ego_rewards: list[float] = field(default_factory=list)
    checkpoint_paths: list[Path] = field(default_factory=list)
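T4b.13's check runs over a moving-window-of-10 mean of `per_step_ego_rewards`. One way to compute such a series with NumPy is a uniform-kernel convolution in "valid" mode. This is a sketch. Whether T4b.13 uses exactly this windowing, rather than, for example, a rolling mean that also emits partial windows, is an assumption.

```python
import numpy as np


def moving_window_mean(series, window: int = 10) -> np.ndarray:
    """Mean of each run of `window` consecutive entries (len(series) - window + 1 values)."""
    x = np.asarray(series, dtype=np.float64)
    if x.size < window:
        return np.empty(0)  # not enough steps for a single full window
    kernel = np.full(window, 1.0 / window)
    return np.convolve(x, kernel, mode="valid")


window_means = moving_window_mean(np.arange(20.0))
```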

TrainerFactory

Bases: Protocol

Constructor that builds an :class:EgoTrainer (T4b.11; ADR-002 §Decisions).

The seam through which :func:train plugs in the algorithm. :func:chamber.benchmarks.training_runner.run_training selects :meth:EgoPPOTrainer.from_config (chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config) by default (M4b-8a / plan/05 §3.5).

The factory receives the partner so it can refuse to construct against a non-frozen partner before any tensor allocation (ADR-009 §Consequences; plan/05 §6 #3). The default factory :meth:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config calls :func:~chamber.benchmarks.ego_ppo_trainer._assert_partner_is_frozen on the partner as its first construction step.

Source code in src/concerto/training/ego_aht.py
class TrainerFactory(Protocol):
    """Constructor that builds an :class:`EgoTrainer` (T4b.11; ADR-002 §Decisions).

    The seam through which :func:`train` plugs in the algorithm.
    :func:`chamber.benchmarks.training_runner.run_training` selects
    :meth:`EgoPPOTrainer.from_config <chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config>`
    by default (M4b-8a / plan/05 §3.5).

    The factory receives the partner so it can refuse to construct
    against a non-frozen partner before any tensor allocation
    (ADR-009 §Consequences; plan/05 §6 #3). The default factory
    :meth:`~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config`
    calls :func:`~chamber.benchmarks.ego_ppo_trainer._assert_partner_is_frozen`
    on the partner as its first construction step.
    """

    def __call__(
        self,
        cfg: EgoAHTConfig,
        *,
        env: EnvLike,
        partner: PartnerLike,
        ego_uid: str,
    ) -> EgoTrainer:
        """Build a trainer (ADR-002 §Decisions; plan/05 §3.5; ADR-009 §Consequences)."""
        ...  # pragma: no cover
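The TrainerFactory docstring describes refusing a non-frozen partner before any allocation. The sketch below implements that guard as a wrapper around any constructor with the TrainerFactory call shape. The `is_frozen` flag, `PartnerNotFrozenError` and the stub classes are hypothetical. The real check is `_assert_partner_is_frozen` in `chamber.benchmarks.ego_ppo_trainer`.

```python
from typing import Any, Callable


class PartnerNotFrozenError(ValueError):
    """Hypothetical error type used only in this sketch."""


def make_checked_factory(build_trainer: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a trainer constructor so a non-frozen partner is rejected up front."""

    def factory(cfg: Any, *, env: Any, partner: Any, ego_uid: str) -> Any:
        # Hypothetical frozen-ness flag, checked before build_trainer runs,
        # so no trainer state is allocated for a rejected partner.
        if not getattr(partner, "is_frozen", False):
            raise PartnerNotFrozenError(f"{type(partner).__name__} is not frozen")
        return build_trainer(cfg, env=env, partner=partner, ego_uid=ego_uid)

    return factory


class FrozenStub:
    is_frozen = True  # hypothetical marker


def build_stub_trainer(cfg: Any, *, env: Any, partner: Any, ego_uid: str) -> dict:
    return {"ego_uid": ego_uid}  # stands in for a real EgoTrainer


checked = make_checked_factory(build_stub_trainer)
trainer = checked(None, env=None, partner=FrozenStub(), ego_uid="ego")
```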

__call__

__call__(
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    ego_uid: str,
) -> EgoTrainer

Build a trainer (ADR-002 §Decisions; plan/05 §3.5; ADR-009 §Consequences).

Source code in src/concerto/training/ego_aht.py
def __call__(
    self,
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    ego_uid: str,
) -> EgoTrainer:
    """Build a trainer (ADR-002 §Decisions; plan/05 §3.5; ADR-009 §Consequences)."""
    ...  # pragma: no cover

TrainingResult

Bases: NamedTuple

Return value of :func:train and :func:chamber.benchmarks.training_runner.run_training.

Pairs the diagnostic reward stream (:class:RewardCurve) with the trained ego policy (:class:EgoTrainer) so Phase-1 callers like :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory (P1.04 / ADR-007 §Stage 1b) can wrap result.trainer.act in a per-step closure without paying a checkpoint round-trip cost. The trainer's internal state (HARL HAPPO actor + critic + optimisers) survives :func:train's scope through this return value rather than being reloaded from disk.

NamedTuple (not a bare tuple[RewardCurve, EgoTrainer]) so call sites use the field names — result.curve / result.trainer — instead of positional indexing. Tuple-unpack still works (curve, trainer = run_training(cfg)) for callers that prefer it.

Attributes:

  • curve (RewardCurve) –

    The diagnostic per-step + per-episode reward stream T4b.13's empirical-guarantee assertion runs over. Same shape as before P1.04 — this NamedTuple is a wrapper, not a replacement.

  • trainer (EgoTrainer) –

    The trained :class:EgoTrainer instance, with the HARL HAPPO actor weights at their post-training state. Use trainer.act(obs, deterministic=True) for evaluation rollouts. When :func:train is called without a trainer_factory (Phase-0 reference path), this is the :class:RandomEgoTrainer fallback (parameter-free).

Source code in src/concerto/training/ego_aht.py
class TrainingResult(NamedTuple):
    """Return value of :func:`train` and :func:`chamber.benchmarks.training_runner.run_training`.

    Pairs the diagnostic reward stream (:class:`RewardCurve`) with the
    trained ego policy (:class:`EgoTrainer`) so Phase-1 callers like
    :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
    (P1.04 / ADR-007 §Stage 1b) can wrap ``result.trainer.act`` in a
    per-step closure without paying a checkpoint round-trip cost. The
    trainer's internal state (HARL HAPPO actor + critic + optimisers)
    survives :func:`train`'s scope through this return value rather
    than being reloaded from disk.

    NamedTuple (not a bare ``tuple[RewardCurve, EgoTrainer]``) so call
    sites use the field names — ``result.curve`` /
    ``result.trainer`` — instead of positional indexing. Tuple-unpack
    still works (``curve, trainer = run_training(cfg)``) for callers
    that prefer it.

    Attributes:
        curve: The diagnostic per-step + per-episode reward stream
            T4b.13's empirical-guarantee assertion runs over. Same
            shape as before P1.04 — this NamedTuple is a wrapper, not
            a replacement.
        trainer: The trained :class:`EgoTrainer` instance, with the
            HARL HAPPO actor weights at their post-training state. Use
            ``trainer.act(obs, deterministic=True)`` for evaluation
            rollouts. When :func:`train` is called without a
            ``trainer_factory`` (Phase-0 reference path), this is the
            :class:`RandomEgoTrainer` fallback (parameter-free).
    """

    curve: RewardCurve
    trainer: EgoTrainer
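The sketch below shows both access styles on a TrainingResult. It uses trimmed local copies of the types so it runs standalone, and `StubTrainer` is a hypothetical EgoTrainer stand-in. The closure at the end follows the pattern the docstring describes: evaluation goes through `result.trainer.act` directly, with no checkpoint save and reload.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


@dataclass(frozen=True)
class RewardCurve:  # trimmed copy of the dataclass above
    run_id: str
    per_step_ego_rewards: list[float] = field(default_factory=list)


class StubTrainer:  # hypothetical EgoTrainer stand-in
    def act(self, obs, *, deterministic: bool = False):
        return [0.0, 0.0]


class TrainingResult(NamedTuple):
    curve: RewardCurve
    trainer: StubTrainer


result = TrainingResult(curve=RewardCurve(run_id="0123456789abcdef"), trainer=StubTrainer())

# Named access, preferred at call sites.
run_id = result.curve.run_id

# Tuple-unpacking still works because NamedTuple is a tuple subclass.
curve, trainer = result


def ego_policy(obs):
    # Per-step closure over the in-memory trainer: no checkpoint round-trip.
    return result.trainer.act(obs, deterministic=True)
```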

train

train(
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    trainer_factory: TrainerFactory | None = None,
    repo_root: Path | None = None,
    safety_filter: EgoOnlySafetyFilter | None = None,
    safety_state: SafetyState | None = None,
    safety_bounds: Bounds | None = None,
    safety_snapshot_builder: Callable[
        [EnvLike], dict[str, AgentSnapshot]
    ]
    | None = None,
    safety_dt: float | None = None,
    safety_emergency_controllers: Mapping[
        str, EmergencyController
    ]
    | None = None,
) -> TrainingResult

Run one ego-AHT training loop (T4b.11; ADR-002 §Decisions; plan/05 §3.5).

Wires up logging, seeding, per-step rollout, checkpoint emission, and the trainer-factory seam. Returns a :class:TrainingResult (P1.04 / ADR-007 §Stage 1b) carrying both the diagnostic :class:RewardCurve and the trained :class:EgoTrainer instance, so Phase-1 callers (notably :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory) can wrap result.trainer.act in a per-cell closure without paying a checkpoint round-trip.

The pre-P1.04 return type was the bare :class:RewardCurve; the NamedTuple wrapper is backward-compatible at the tuple-unpack level (curve, trainer = train(...)) but reads more cleanly at named call sites (result.curve / result.trainer).

Parameters:

  • cfg (EgoAHTConfig) –

    Validated :class:~concerto.training.config.EgoAHTConfig.

  • env (EnvLike) –

    Concrete env instance satisfying the :class:EnvLike structural Protocol. Built upstream by :func:chamber.benchmarks.training_runner.build_env.

  • partner (PartnerLike) –

    Concrete frozen partner satisfying :class:PartnerLike. Built upstream by :func:chamber.benchmarks.training_runner.build_partner.

  • trainer_factory (TrainerFactory | None, default: None ) –

    Constructor that builds the :class:EgoTrainer. When None (default), uses :class:RandomEgoTrainer — the Phase-0 in-concerto.* fallback. The chamber-side wrapper :func:chamber.benchmarks.training_runner.run_training injects the M4b-8a :class:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer (the real HARL-HAPPO trainer; plan/05 §3.5) here.

  • repo_root (Path | None, default: None ) –

    Working-tree root for the run-metadata bundle (defaults to :func:pathlib.Path.cwd).

  • safety_filter (EgoOnlySafetyFilter | None, default: None ) –

    Optional CBF-QP outer filter (P1.04.5; ADR-007 §Stage 1b). All five safety kwargs must be passed together when cfg.safety.enabled=True; passing any with the cfg disabled (or partial sets) raises :class:ValueError (intent-mismatch contract).

  • safety_state (SafetyState | None, default: None ) –

    Optional :class:SafetyState — the conformal slack vector (mutated in place per step by :func:update_lambda_from_predictor).

  • safety_bounds (Bounds | None, default: None ) –

    Optional per-task :class:Bounds — the cell's CBF/Cartesian envelope (audit-gate predicate A's RHS is sourced from cartesian_accel_capacity).

  • safety_snapshot_builder (Callable[[EnvLike], dict[str, AgentSnapshot]] | None, default: None ) –

    Optional (env) -> dict[str, AgentSnapshot] callable that the loop invokes per step to build the per-uid Cartesian snapshot map the filter consumes via partner_predicted_states.

  • safety_dt (float | None, default: None ) –

    Optional control-step dt in seconds for the constant-velocity predictor lookahead + numerical-difference velocity. Typically env.control_timestep.

  • safety_emergency_controllers (Mapping[str, EmergencyController] | None, default: None ) –

    Optional per-uid :class:concerto.safety.emergency.EmergencyController dispatch map (P1.04.6; ADR-007 §Stage 1b Rev 8). Consulted by :func:concerto.safety.braking.maybe_brake at each per-step braking call to translate the aggregate Cartesian repulsion into the per-uid control-space override. None falls back to a per-uid :class:concerto.safety.emergency.CartesianAccelEmergencyController — correct for double-integrator agents but a silent dimension-mismatch foot-gun for 7-DOF arms. Wired by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory via :func:chamber.benchmarks.training_runner.build_emergency_controllers. Honoured only when cfg.safety.enabled=True AND the rest of the safety stack is wired.
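The safety_filter entry above describes an all-or-nothing contract between the five safety kwargs and cfg.safety.enabled. The sketch below restates that contract as a standalone function. `resolve_safety_active` is a hypothetical name, and its error messages are illustrative rather than copied from train. safety_emergency_controllers is deliberately left out of the set, because it stays optional even when the stack is enabled.

```python
from typing import Any


def resolve_safety_active(enabled: bool, **safety_kwargs: Any) -> bool:
    """Hypothetical sketch of the intent-mismatch contract.

    Returns True when the safety stack should run. Raises ValueError on a
    partial set of kwargs, or when the kwargs disagree with the config toggle.
    """
    passed = [name for name, value in safety_kwargs.items() if value is not None]
    if passed and len(passed) != len(safety_kwargs):
        missing = sorted(set(safety_kwargs) - set(passed))
        raise ValueError(f"safety kwargs must be passed as a complete set; missing {missing}")
    all_passed = bool(passed)
    if enabled and not all_passed:
        raise ValueError("cfg.safety.enabled=True but the safety kwargs were not passed")
    if not enabled and all_passed:
        raise ValueError("safety kwargs were passed but cfg.safety.enabled=False")
    return all_passed


FIVE_NONE = dict(safety_filter=None, safety_state=None, safety_bounds=None,
                 safety_snapshot_builder=None, safety_dt=None)
```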

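Returns:

  • TrainingResult –

    :class:TrainingResult NamedTuple carrying curve (the :class:RewardCurve with per-step + per-episode ego rewards and saved checkpoint paths) and trainer (the trained :class:EgoTrainer instance whose act method evaluation rollouts wrap).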

Source code in src/concerto/training/ego_aht.py
def train(  # noqa: PLR0912, PLR0915 - P1.04.5 added safety-stack integration to the canonical training loop; the body's branch + statement counts exceed the default thresholds because the safety path is interleaved per step. Refactoring into a SafetyFilterDriver collaborator is the Phase-2 cleanup path (see Plan-subagent D1 alternative).
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    trainer_factory: TrainerFactory | None = None,
    repo_root: Path | None = None,
    # ----- P1.04.5: safety-stack wiring (ADR-007 §Stage 1b). -----
    # All five default to ``None``; absence means pre-P1.04.5 behaviour
    # (the loop runs unfiltered against the partner). Presence requires
    # all five together — validated below. ``cfg.safety.enabled`` is the
    # operator-intent toggle; cfg-says-enabled-but-kwargs-not-passed and
    # cfg-says-disabled-but-kwargs-passed both loud-fail (intent
    # mismatch).
    safety_filter: EgoOnlySafetyFilter | None = None,
    safety_state: SafetyState | None = None,
    safety_bounds: Bounds | None = None,
    safety_snapshot_builder: (Callable[[EnvLike], dict[str, AgentSnapshot]] | None) = None,
    safety_dt: float | None = None,
    # ----- P1.04.6: braking-fallback parity (ADR-007 §Stage 1b Rev 8). -----
    # ``safety_emergency_controllers`` is the per-uid emergency-controller
    # dispatch map :func:`concerto.safety.braking.maybe_brake` consults.
    # When ``None`` the braking layer falls back to a per-uid
    # :class:`CartesianAccelEmergencyController` — correct for
    # double-integrator agents but a silent dimension-mismatch foot-gun
    # for 7-DOF arms; the chamber-side caller
    # (:class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`)
    # populates it via
    # :func:`chamber.benchmarks.training_runner.build_emergency_controllers`.
    # Read only when ``cfg.safety.enabled=True`` AND the other safety
    # kwargs are passed (the validation block below ties presence to
    # ``_safety_active``).
    safety_emergency_controllers: Mapping[str, EmergencyController] | None = None,
) -> TrainingResult:
    """Run one ego-AHT training loop (T4b.11; ADR-002 §Decisions; plan/05 §3.5).

    Wires up logging, seeding, per-step rollout, checkpoint emission, and
    the trainer-factory seam. Returns a :class:`TrainingResult` (P1.04 /
    ADR-007 §Stage 1b) carrying both the diagnostic
    :class:`RewardCurve` and the trained :class:`EgoTrainer` instance,
    so Phase-1 callers (notably
    :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`) can
    wrap ``result.trainer.act`` in a per-cell closure without paying
    a checkpoint round-trip.

    The pre-P1.04 return type was the bare :class:`RewardCurve`; the
    NamedTuple wrapper is backward-compatible at the tuple-unpack level
    (``curve, trainer = train(...)``) but reads more cleanly at named
    call sites (``result.curve`` / ``result.trainer``).

    Args:
        cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
        env: Concrete env instance satisfying the :class:`EnvLike`
            structural Protocol. Built upstream by
            :func:`chamber.benchmarks.training_runner.build_env`.
        partner: Concrete frozen partner satisfying :class:`PartnerLike`.
            Built upstream by
            :func:`chamber.benchmarks.training_runner.build_partner`.
        trainer_factory: Constructor that builds the
            :class:`EgoTrainer`. When ``None`` (default), uses
            :class:`RandomEgoTrainer` — the Phase-0 in-``concerto.*``
            fallback. The chamber-side wrapper
            :func:`chamber.benchmarks.training_runner.run_training`
            injects the M4b-8a
            :class:`~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer`
            (the real HARL-HAPPO trainer; plan/05 §3.5) here.
        repo_root: Working-tree root for the run-metadata bundle
            (defaults to :func:`pathlib.Path.cwd`).
        safety_filter: Optional CBF-QP outer filter (P1.04.5;
            ADR-007 §Stage 1b). All five safety kwargs must be passed
            together when ``cfg.safety.enabled=True``; passing any
            with the cfg disabled (or partial sets) raises
            :class:`ValueError` (intent-mismatch contract).
        safety_state: Optional :class:`SafetyState` — the conformal
            slack vector (mutated in place per step by
            :func:`update_lambda_from_predictor`).
        safety_bounds: Optional per-task :class:`Bounds` — the
            cell's CBF/Cartesian envelope (audit-gate predicate A's
            RHS is sourced from ``cartesian_accel_capacity``).
        safety_snapshot_builder: Optional ``(env) -> dict[str,
            AgentSnapshot]`` callable that the loop invokes per step
            to build the per-uid Cartesian snapshot map the filter
            consumes via ``partner_predicted_states``.
        safety_dt: Optional control-step dt in seconds for the
            constant-velocity predictor lookahead + numerical-
            difference velocity. Typically ``env.control_timestep``.
        safety_emergency_controllers: Optional per-uid
            :class:`concerto.safety.emergency.EmergencyController`
            dispatch map (P1.04.6; ADR-007 §Stage 1b Rev 8). Consulted
            by :func:`concerto.safety.braking.maybe_brake` at each
            per-step braking call to translate the aggregate Cartesian
            repulsion into the per-uid control-space override.
            ``None`` falls back to a per-uid
            :class:`concerto.safety.emergency.CartesianAccelEmergencyController`
            — correct for double-integrator agents but a silent
            dimension-mismatch foot-gun for 7-DOF arms. Wired by
            :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            via
            :func:`chamber.benchmarks.training_runner.build_emergency_controllers`.
            Honoured only when ``cfg.safety.enabled=True`` AND the rest
            of the safety stack is wired.

    Returns:
        :class:`TrainingResult` NamedTuple carrying ``curve`` (the
        :class:`RewardCurve` with per-step + per-episode ego rewards
        and saved checkpoint paths) and ``trainer`` (the trained
        :class:`EgoTrainer` instance whose ``act`` method evaluation
        rollouts wrap).
    """
    repo_root = repo_root or Path.cwd()
    ctx = compute_run_metadata(
        seed=cfg.seed,
        run_kind=cfg.algo,
        repo_root=repo_root,
        extra={
            "task": cfg.env.task,
            "partner_class": cfg.partner.class_name,
        },
    )
    jsonl_path = cfg.log_dir / f"{ctx.run_id}.jsonl"
    logger = bind_run_logger(ctx, jsonl_path=jsonl_path)
    logger.info(
        "training_start",
        total_frames=cfg.total_frames,
        checkpoint_every=cfg.checkpoint_every,
    )

    ego_uid, partner_uid = cfg.env.agent_uids
    if trainer_factory is None:
        trainer: EgoTrainer = RandomEgoTrainer(
            ego_uid=ego_uid,
            action_dim=_RANDOM_TRAINER_FALLBACK_ACTION_DIM,
            root_seed=cfg.seed,
        )
    else:
        trainer = trainer_factory(cfg, env=env, partner=partner, ego_uid=ego_uid)

    # P1.04.5: validate the safety-stack kwargs against cfg.safety.enabled
    # (ADR-007 §Stage 1b lifecycle contract). Intent-mismatch loud-fails:
    # cfg-says-enabled-but-some-kwarg-None and cfg-says-disabled-but-some-
    # kwarg-not-None both raise. Both-aligned paths (all five None when
    # disabled; all five non-None when enabled) proceed.
    _safety_kwargs_passed = (
        safety_filter is not None
        and safety_state is not None
        and safety_bounds is not None
        and safety_snapshot_builder is not None
        and safety_dt is not None
    )
    _safety_kwargs_partial = (
        safety_filter is not None
        or safety_state is not None
        or safety_bounds is not None
        or safety_snapshot_builder is not None
        or safety_dt is not None
    ) and not _safety_kwargs_passed
    if _safety_kwargs_partial:
        msg = (
            "train(): safety-stack kwargs must be passed as a complete set "
            "(safety_filter + safety_state + safety_bounds + "
            "safety_snapshot_builder + safety_dt) or all None. Partial "
            "wiring is an operator-intent mismatch."
        )
        raise ValueError(msg)
    if cfg.safety.enabled and not _safety_kwargs_passed:
        msg = (
            "train(): cfg.safety.enabled is True but the safety-stack kwargs "
            "were not passed. The chamber-side caller "
            "(chamber.benchmarks.stage1_common.TrainedPolicyFactory) wires "
            "them; tests that invoke train() directly with cfg.safety.enabled "
            "must construct the SafetyState + filter + bounds + snapshot "
            "builder + dt themselves (see "
            "chamber.benchmarks.training_runner.build_safety_*)."
        )
        raise ValueError(msg)
    if not cfg.safety.enabled and _safety_kwargs_passed:
        msg = (
            "train(): cfg.safety.enabled is False but safety-stack kwargs "
            "were passed. Either set cfg.safety.enabled=True (and run with "
            "the filter) or drop the kwargs (and run unfiltered). Mixed "
            "intent is loud-fail by design (ADR-007 §Stage 1b discipline)."
        )
        raise ValueError(msg)
    _safety_active: bool = cfg.safety.enabled and _safety_kwargs_passed

    obs, _ = env.reset(seed=cfg.seed)
    partner.reset(seed=cfg.seed)
    curve = RewardCurve(run_id=ctx.run_id)
    episode_reward_acc = 0.0
    episode_in_progress = True
    # Episode index is incremented on each terminated/truncated boundary,
    # NOT inferred from step // episode_length — the latter would silently
    # misnumber for envs that early-terminate (P6 anti-pattern flagged in
    # local pre-PR review).
    episode_index = 0

    # P1.04.5: safety-stack runtime state.
    aggregator = None
    snaps_prev: dict[str, AgentSnapshot] | None = None
    # The validation block above guarantees all five safety kwargs are
    # non-None iff _safety_active is True. The casts below narrow for
    # pyright across the loop body (S101 avoided by not using ``assert``).
    _filter = cast("EgoOnlySafetyFilter", safety_filter)
    _state = cast("SafetyState", safety_state)
    _bounds = cast("Bounds", safety_bounds)
    _builder = cast("Callable[[EnvLike], dict[str, AgentSnapshot]]", safety_snapshot_builder)
    _dt = cast("float", safety_dt)
    # Lazy imports to keep concerto.training.ego_aht Tier-1-import-safe;
    # the safety_telemetry module only resolves when train() is called,
    # not at concerto.training import time. ADR-004 §Decision: the
    # safety stack is itself a concerto.* module, so this keeps train()
    # importable for consumers that don't pull in safety_telemetry.
    # Imports are unconditional (not inside `if _safety_active:`) so
    # the symbols are bound for pyright's flow analysis across the
    # loop body; their *use* is still guarded by `_safety_active`.
    from concerto.safety.braking import maybe_brake  # noqa: PLC0415
    from concerto.safety.conformal import (  # noqa: PLC0415 - lazy by design (see comment above)
        constant_velocity_predict,
        update_lambda_from_predictor,
    )
    from concerto.safety.errors import ConcertoSafetyInfeasible  # noqa: PLC0415
    from concerto.training.safety_telemetry import SafetyAggregator  # noqa: PLC0415

    if _safety_active:
        _filter.reset(seed=cfg.seed)
        aggregator = SafetyAggregator(
            n_pairs=len(_state.lambda_),
            cartesian_accel_capacity=_bounds.cartesian_accel_capacity,
            saturation_threshold=cfg.safety.saturation_threshold,
        )

    for step in range(cfg.total_frames):
        ego_action_nominal = trainer.act(obs)
        partner_action = partner.act(obs)
        if _safety_active and aggregator is not None:
            snaps_now = _builder(env)
            # Conformal update against the previous step's snapshots
            # (skipped on first step + at episode boundaries, where
            # snaps_prev is None — the predictor cannot meaningfully
            # forecast across a reset).
            if snaps_prev is not None:
                update_lambda_from_predictor(
                    _state,
                    snaps_now=snaps_now,
                    snaps_prev=snaps_prev,
                    alpha_pair=2.0 * _bounds.cartesian_accel_capacity,
                    gamma=cfg.safety.cbf_gamma,
                    dt=_dt,
                    in_warmup=(_state.warmup_steps_remaining > 0),
                )
            # Forecast partner state for the next control step
            # (constant-velocity stub per ADR-004 §Decision).
            partner_predicted = {
                uid: constant_velocity_predict(snaps_now[uid], _dt)
                for uid in snaps_now
                if uid != ego_uid
            }
            # P1.04.6 (ADR-007 §Stage 1b Rev 8): per-step braking
            # fallback in front of the CBF-QP — matches the deployment-
            # time composition order in
            # ``tests/integration/test_safety_in_loop.py``. Independent
            # of QP feasibility (Wang-Ames-Egerstedt 2017 eq. 17). We
            # build a proposed-action dict over BOTH uids so
            # :func:`maybe_brake` evaluates pairwise TTC over the cell,
            # but only adopt the override for ``ego_uid`` — the partner
            # is frozen per ADR-009 §Decision and its action stays
            # untouched. ``safe_ego`` is what propagates into the
            # trainer's learning signal.
            proposed_for_braking = {
                ego_uid: np.asarray(ego_action_nominal, dtype=np.float64),
                partner_uid: np.asarray(partner_action, dtype=np.float64),
            }
            braking_override, braking_fired = maybe_brake(
                proposed_for_braking,
                snaps_now,
                bounds=_bounds,
                tau_brake=cfg.safety.tau_brake,
                emergency_controllers=safety_emergency_controllers,
            )
            if braking_fired and braking_override is not None:
                # Per-step backstop fired: take the ego override and
                # skip the QP for this step. ``fallback_fired`` (the
                # CBF-QP's internal 1-D projection fallback flag) stays
                # ``False`` — those two flags are independent telemetry
                # channels on the audit-gate (one is "QP ran and used
                # its internal fallback"; the other is "QP did not run,
                # braking took over"). ``qp_infeasible`` is also False:
                # the QP did not raise, it simply was not called.
                ego_action = np.asarray(braking_override[ego_uid], dtype=ego_action_nominal.dtype)
                fallback_fired = False
                qp_infeasible = False
            else:
                try:
                    safe_ego, info = _filter.filter(
                        np.asarray(ego_action_nominal, dtype=np.float64),
                        obs={
                            "agent_states": snaps_now,
                            "meta": {"partner_id": cfg.partner.class_name},
                        },
                        state=_state,
                        bounds=_bounds,
                        ego_uid=ego_uid,
                        partner_predicted_states=partner_predicted,
                        dt=_dt,
                    )
                    ego_action = np.asarray(safe_ego, dtype=ego_action_nominal.dtype)
                    fallback_fired = bool(info["fallback_fired"])
                    qp_infeasible = False
                except ConcertoSafetyInfeasible:
                    # Pre-P1.04.6 training-time policy was to pass the
                    # nominal action through; P1.04.6's braking layer
                    # now runs upstream of this branch, so reaching
                    # here means the QP raised AFTER braking didn't
                    # fire (i.e. TTC >= tau_brake but the QP still
                    # found no feasible projection — a tighter QP
                    # constraint than the braking TTC predicts).
                    # Surface as the unfiltered-passthrough still, so
                    # the trainer can learn from the consequence; the
                    # ``qp_infeasible`` flag is what audits this.
                    ego_action = ego_action_nominal
                    fallback_fired = False
                    qp_infeasible = True
            snaps_prev = snaps_now
            aggregator.observe(
                _state.lambda_,
                fallback_fired=fallback_fired,
                qp_infeasible=qp_infeasible,
                braking_fired=braking_fired,
            )
        else:
            ego_action = ego_action_nominal
        action = {ego_uid: ego_action, partner_uid: partner_action}
        # The trainer.observe(s') contract: this loop passes the *next*
        # observation (the post-step obs), reward, and done flag. M4b-7's
        # HARL trainer is responsible for buffering pre-step obs internally
        # by tracking the last act() input. Phase-0 simplification per
        # plan/05 §3.5; revisit if a future on-policy trainer needs the
        # explicit (s, a, r, s', done) tuple.
        obs, reward, terminated, truncated, _ = env.step(action)
        curve.per_step_ego_rewards.append(reward)
        episode_reward_acc += reward
        # Pardo 2017 / issue #62: thread terminated AND truncated
        # separately. ``done`` keeps the historical episode-boundary
        # meaning (terminated OR truncated) so the loop's reset logic is
        # unchanged; ``truncated`` is the kwarg-only flag PPO-style
        # trainers use to bootstrap GAE correctly at time-limit
        # boundaries.
        trainer.observe(obs, reward, terminated or truncated, truncated=truncated)

        if (step + 1) % cfg.happo.rollout_length == 0:
            trainer.update()
            if aggregator is not None:
                window = aggregator.flush_window_stats()
                logger.info("safety_telemetry", step=step + 1, **window)
            logger.info(
                "rollout_update",
                step=step + 1,
                last_reward=reward,
            )

        if (step + 1) % cfg.checkpoint_every == 0:
            ckpt_path = _save_run_checkpoint(
                cfg=cfg,
                ctx_run_id=ctx.run_id,
                seed=cfg.seed,
                step=step + 1,
                git_sha=ctx.git_sha,
                pyproject_hash=ctx.pyproject_hash,
                state_dict=trainer.state_dict(),
            )
            curve.checkpoint_paths.append(ckpt_path)
            logger.info("checkpoint_saved", step=step + 1, path=str(ckpt_path))

        if terminated or truncated:
            curve.per_episode_ego_rewards.append(episode_reward_acc)
            episode_reward_acc = 0.0
            episode_in_progress = False
            episode_index += 1
            # P1.04.5: reset snaps_prev so the conformal update on the
            # first step of the new episode does not diff across the
            # reset boundary (the cross-episode position delta would
            # produce a spurious large prediction-gap loss). SafetyState
            # itself is NOT reset — D3 of the design pass mandates
            # per-cell accumulation per Huriot-Sibai §IV.A.
            if _safety_active:
                snaps_prev = None
            # P6 reproducibility: derive a per-episode seed via a stable
            # substream rather than additive int mixing (cfg.seed +
            # episode_index would collide across runs). The substream
            # name pins the episode index uniquely under cfg.seed.
            episode_seed_int = int(
                derive_substream(f"training.episode.{episode_index}", root_seed=cfg.seed)
                .default_rng()
                .integers(0, 2**31 - 1)
            )
            obs, _ = env.reset(seed=episode_seed_int)
            partner.reset(seed=episode_seed_int)
            episode_in_progress = True

    # Tail-end episode (final partial chunk) — capture so the curve has
    # full coverage for the empirical-guarantee assertion. Use the
    # episode_in_progress sentinel rather than `episode_reward_acc != 0.0`
    # so a legitimately-zero accumulated reward isn't dropped.
    if episode_in_progress:
        curve.per_episode_ego_rewards.append(episode_reward_acc)

    # P1.04.5: emit the per-cell safety_telemetry_final summary the
    # audit-gate hook reads (predicates A + B; ADR-007 §Stage 1b).
    if aggregator is not None:
        final_summary = aggregator.finalise(
            safety_enabled=cfg.safety.enabled,
            predictor_kind=cfg.safety.predictor_kind,
        )
        event_name = str(final_summary.pop("event", "safety_telemetry_final"))
        logger.info(event_name, **final_summary)
    logger.info(
        "training_end",
        n_episodes=len(curve.per_episode_ego_rewards),
        n_checkpoints=len(curve.checkpoint_paths),
    )
    return TrainingResult(curve=curve, trainer=trainer)
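The lifecycle check at the top of train() is an all-or-nothing rule over the five safety kwargs, combined with cfg.safety.enabled. The sketch below restates it as a standalone function so the truth table is easy to test. The name resolve_safety_active is hypothetical, and plain placeholder objects stand in for the real filter, SafetyState, Bounds, snapshot builder and dt. As in the original, safety_emergency_controllers is left out of the check because None is a valid value for it.

```python
from typing import Any, Optional


def resolve_safety_active(
    enabled: bool,
    safety_filter: Optional[Any],
    safety_state: Optional[Any],
    safety_bounds: Optional[Any],
    safety_snapshot_builder: Optional[Any],
    safety_dt: Optional[float],
) -> bool:
    """Hypothetical restatement of train()'s safety-kwargs lifecycle check."""
    kwargs = (safety_filter, safety_state, safety_bounds, safety_snapshot_builder, safety_dt)
    passed = all(k is not None for k in kwargs)
    partial = any(k is not None for k in kwargs) and not passed
    if partial:
        raise ValueError("partial wiring: pass all five safety kwargs or none")
    if enabled and not passed:
        raise ValueError("cfg.safety.enabled is True but the safety kwargs are missing")
    if not enabled and passed:
        raise ValueError("cfg.safety.enabled is False but the safety kwargs were passed")
    # Only the two aligned cases survive: (True, all set) and (False, all None).
    return enabled and passed


# Usage: placeholders stand in for the real filter, state, bounds and builder.
wired = (object(), object(), object(), lambda env: {}, 0.02)
assert resolve_safety_active(True, *wired)
assert not resolve_safety_active(False, None, None, None, None, None)
```

Raising on every mismatch, rather than silently running unfiltered, is what makes a mis-wired Stage-1b cell visible before any training frames are spent.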

CHAMBER — the benchmark

chamber.envs

CHAMBER environment wrappers above ManiSkill v3.

ADR-001 §Decision: extend ManiSkill v3 with three thin wrappers covering the four heterogeneity axes. ADR-005 §Decision: SAPIEN 3 + Warp-MPM is the physics substrate, inherited from ManiSkill v3.

ChamberEnvCompatibilityError

Bases: RuntimeError

Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.

ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate the gate; failing loudly forces ADR re-review.
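Because ChamberEnvCompatibilityError derives from RuntimeError, existing except RuntimeError handlers still catch it, while the dedicated type lets a Stage-0 harness tell a broken wrapper assumption apart from other runtime failures. A minimal sketch, assuming a hypothetical check function require_dict_obs_space and using a plain dict as a stand-in for gym.spaces.Dict:

```python
from typing import Optional


class ChamberEnvCompatibilityError(RuntimeError):
    """Local mirror of the documented class (same base type)."""


def require_dict_obs_space(obs_space: object) -> None:
    """Hypothetical check: fail loudly instead of silently falling back."""
    if not isinstance(obs_space, dict):  # plain dict stands in for gym.spaces.Dict
        raise ChamberEnvCompatibilityError(
            f"expected a Dict observation space; got {type(obs_space).__name__}"
        )


caught: Optional[RuntimeError] = None
try:
    require_dict_obs_space((0.0, 0.0))
except RuntimeError as err:  # the subclass is caught by a RuntimeError handler
    caught = err
```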

Source code in src/chamber/envs/errors.py
class ChamberEnvCompatibilityError(RuntimeError):
    """Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.

    ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate
    the gate; failing loudly forces ADR re-review.
    """

CommShapingWrapper

Bases: Wrapper

Inject comm-channel output into the observation dict as obs["comm"].

Wraps any env whose observation space is a gym.spaces.Dict. On each reset and step, calls channel.encode(obs) and stores the result under the "comm" key. The channel itself controls all degradation semantics (latency, drop, jitter); this wrapper is a pure carrier.

ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.

Parameters:

  • env (Env) –

    Env with a gym.spaces.Dict observation space.

  • channel (CommChannel) –

    A chamber.comm.api.CommChannel instance. In M1 this is a stub chamber.comm.FixedFormatCommChannel that emits an empty packet shell; M2 fills in the real fixed-format encoding.

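Raises:

  • ChamberEnvCompatibilityError –

    If the inner observation space is not a gym.spaces.Dict (ADR-001 §Risks).
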
Source code in src/chamber/envs/comm_shaping.py
class CommShapingWrapper(gym.Wrapper):  # type: ignore[type-arg]
    """Inject comm-channel output into the observation dict as ``obs["comm"]``.

    Wraps any env whose observation space is a ``gym.spaces.Dict``. On each
    reset and step, calls ``channel.encode(obs)`` and stores the result under
    the ``"comm"`` key. The channel itself controls all degradation semantics
    (latency, drop, jitter); this wrapper is a pure carrier.

    ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.

    Args:
        env: Env with a ``gym.spaces.Dict`` observation space.
        channel: A :class:`chamber.comm.api.CommChannel` instance. In M1 this
            is a stub :class:`chamber.comm.FixedFormatCommChannel` that emits
            an empty packet shell. M2 fills the real fixed-format encoding.

    Raises:
        ChamberEnvCompatibilityError: if the inner observation space is not a
            ``gym.spaces.Dict`` (ADR-001 §Risks).
    """

    def __init__(self, env: gym.Env, channel: CommChannel) -> None:  # type: ignore[type-arg]
        """Validate the obs space and extend it with a ``comm`` sub-space."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            raise ChamberEnvCompatibilityError(
                f"CommShapingWrapper requires a gym.spaces.Dict observation space; "
                f"got {type(env.observation_space).__name__}. See ADR-001 §Risks."
            )
        self._channel = channel
        self._extend_observation_space()

    def _extend_observation_space(self) -> None:
        self.observation_space = gym.spaces.Dict(
            {
                **self.env.observation_space.spaces,  # type: ignore[attr-defined]
                "comm": gym.spaces.Dict({}),  # M2 fills the real comm obs space.
            }
        )

    def reset(self, **kwargs: object) -> tuple[dict, dict]:  # type: ignore[override]
        """Reset the env and channel, injecting comm into the initial obs.

        ADR-001 §Decision item 3; ADR-003 §Decision.
        """
        obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
        self._channel.reset()
        obs = dict(obs)
        obs["comm"] = self._channel.encode(obs)
        return obs, info

    def step(self, action: object) -> tuple[dict, object, bool, bool, dict]:  # type: ignore[override]
        """Step the env and inject updated comm into obs.

        ADR-001 §Decision item 3; ADR-003 §Decision.
        """
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = dict(obs)
        obs["comm"] = self._channel.encode(obs)
        return obs, reward, terminated, truncated, info
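The carrier pattern can be exercised without Gymnasium or a simulator. A minimal sketch, assuming a hypothetical FakeEnv and EchoChannel and a stripped-down CommCarrier class in place of the gym.Wrapper subclass. It mirrors the ordering in reset (inner reset, then channel reset, then encode) and the shallow dict(obs) copy that leaves the inner env's observation unmodified:

```python
from typing import Any


class EchoChannel:
    """Hypothetical channel: the packet is the sorted list of obs keys."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1

    def encode(self, obs: dict[str, Any]) -> dict[str, Any]:
        return {"keys": sorted(obs)}


class FakeEnv:
    """Hypothetical inner env with a dict observation."""

    def __init__(self) -> None:
        self.last_obs: dict[str, Any] = {}

    def reset(self, **kwargs: Any) -> tuple[dict[str, Any], dict[str, Any]]:
        self.last_obs = {"state": [0.0, 0.0]}
        return self.last_obs, {}

    def step(self, action: Any) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        self.last_obs = {"state": [1.0, 1.0]}
        return self.last_obs, 0.0, False, False, {}


class CommCarrier:
    """Stripped-down carrier with the same reset/step shape as CommShapingWrapper."""

    def __init__(self, env: FakeEnv, channel: EchoChannel) -> None:
        self.env = env
        self._channel = channel

    def reset(self, **kwargs: Any) -> tuple[dict[str, Any], dict[str, Any]]:
        obs, info = self.env.reset(**kwargs)
        self._channel.reset()  # fresh channel state for the new episode
        obs = dict(obs)  # shallow copy: the inner env's dict is not mutated
        obs["comm"] = self._channel.encode(obs)
        return obs, info

    def step(self, action: Any) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = dict(obs)
        obs["comm"] = self._channel.encode(obs)
        return obs, reward, terminated, truncated, info


carrier = CommCarrier(FakeEnv(), EchoChannel())
obs, _ = carrier.reset(seed=0)
print(obs["comm"])  # {'keys': ['state']}
```

Note that encode sees the observation before the "comm" key is added, so the channel never encodes its own previous output.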

__init__

__init__(env: Env, channel: CommChannel) -> None

Validate the obs space and extend it with a comm sub-space.

reset

reset(**kwargs: object) -> tuple[dict, dict]

Reset the env and channel, injecting comm into the initial obs.

ADR-001 §Decision item 3; ADR-003 §Decision.
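Resetting the channel together with the env matters as soon as the channel holds state across steps, for example a latency buffer. A minimal sketch, assuming a hypothetical FixedLatencyChannel (not the project's FixedFormatCommChannel) that delivers each packet a fixed number of encode calls late. Without reset(), packets still in flight at the end of one episode would be delivered at the start of the next:

```python
from collections import deque
from typing import Any, Optional


class FixedLatencyChannel:
    """Hypothetical channel that delivers each packet `latency` encode calls late."""

    def __init__(self, latency: int) -> None:
        self.latency = latency
        self._in_flight: deque[dict[str, Any]] = deque()

    def reset(self) -> None:
        # Drop in-flight packets so none cross an episode boundary.
        self._in_flight.clear()

    def encode(self, obs: dict[str, Any]) -> Optional[dict[str, Any]]:
        self._in_flight.append({"state": obs["state"]})
        if len(self._in_flight) > self.latency:
            return self._in_flight.popleft()
        return None  # nothing has arrived yet


channel = FixedLatencyChannel(latency=1)
channel.encode({"state": 1})   # None: first packet still in flight
channel.encode({"state": 2})   # {'state': 1}
channel.reset()                # new episode
channel.encode({"state": 10})  # None, not the stale {'state': 2}
```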

step

step(
    action: object,
) -> tuple[dict, object, bool, bool, dict]

Step the env and inject updated comm into obs.

ADR-001 §Decision item 3; ADR-003 §Decision.

ConditionConfig

Bases: NamedTuple

Resolved per-condition configuration (ADR-007 §Stage 1b).

Returned by resolve_condition. A pure-Python NamedTuple, so Tier-1 tests can construct fakes without any SAPIEN dependency.

Attributes:

  • condition_id (str) –

    The verbatim pre-registration condition_id string (one of the four Stage-1 strings).

  • agent_uids (tuple[str, str]) –

    Ordered (ego_uid, partner_uid) tuple. The ego is the agent the trainer updates / the safety filter authorises; the partner is the frozen / heuristic partner.

  • obs_mode (str) –

    ManiSkill v3 env-level obs_mode superset string passed to BaseEnv.__init__. The OM channel-filter wrapper (chamber.envs.stage1_obs_filter.Stage1OMChannelFilter) slices this superset per-uid to the condition's actual channel keep-set.

  • is_om_condition (bool) –

    True for the two OM condition_id strings (vision_only / vision_plus_force_torque_plus_proprio). Drives whether the channel-filter wrapper is engaged downstream.

  • ft_synthesis_enabled (bool) –

    True for the OM-hetero condition. The env always injects the synthesised force-torque channel into obs["agent"]["panda_wristcam"]["force_torque"]; the channel filter then masks or keeps it per condition. The flag here is informational only (consumed by the smoke scripts and the docstring report).

Source code in src/chamber/envs/stage1_pickplace.py
class ConditionConfig(NamedTuple):
    """Resolved per-condition configuration (ADR-007 §Stage 1b).

    Returned by :func:`resolve_condition`. Pure-Python NamedTuple so
    Tier-1 tests can construct fakes without any SAPIEN dependency.

    Attributes:
        condition_id: The verbatim pre-registration ``condition_id``
            string (one of the four Stage-1 strings).
        agent_uids: Ordered ``(ego_uid, partner_uid)`` tuple. The ego
            is the agent the trainer updates / the safety filter
            authorises; the partner is the frozen / heuristic partner.
        obs_mode: ManiSkill v3 env-level ``obs_mode`` superset string
            passed to :class:`BaseEnv.__init__`. The OM channel-filter
            wrapper (:class:`chamber.envs.stage1_obs_filter.Stage1OMChannelFilter`)
            slices this superset per-uid to the condition's actual
            channel keep-set.
        is_om_condition: ``True`` for the two OM ``condition_id`` strings
            (``vision_only`` / ``vision_plus_force_torque_plus_proprio``).
            Drives whether the channel-filter wrapper is engaged
            downstream.
        ft_synthesis_enabled: ``True`` for the OM-hetero condition. The
            env always *injects* the synthesised force-torque channel
            into ``obs["agent"]["panda_wristcam"]["force_torque"]``; the
            channel filter then masks or keeps it per condition. The
            flag here is informational only (consumed by the smoke
            scripts and the docstring report).
    """

    condition_id: str
    agent_uids: tuple[str, str]
    obs_mode: str
    is_om_condition: bool
    ft_synthesis_enabled: bool
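Because ConditionConfig is a plain NamedTuple, a Tier-1 test can build a fake directly and derive variants with _replace. A minimal sketch using a local mirror of the documented fields. Apart from the two OM condition_id strings, every value (the uids, the obs_mode string and the boolean flags) is a hypothetical placeholder, not what resolve_condition returns:

```python
from typing import NamedTuple


class ConditionConfig(NamedTuple):
    """Local mirror of the documented fields (no SAPIEN import needed)."""

    condition_id: str
    agent_uids: tuple[str, str]
    obs_mode: str
    is_om_condition: bool
    ft_synthesis_enabled: bool


# Hypothetical fake: only the condition_id strings are taken from the docs above;
# the flag values are illustrative, not the resolver's output.
fake = ConditionConfig(
    condition_id="vision_only",
    agent_uids=("ego_arm", "partner_arm"),
    obs_mode="placeholder_obs_mode",
    is_om_condition=True,
    ft_synthesis_enabled=False,
)

ego_uid, partner_uid = fake.agent_uids  # ordered (ego, partner)
# NamedTuples are immutable; _replace returns a modified copy.
sibling = fake._replace(condition_id="vision_plus_force_torque_plus_proprio")
```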

MPECooperativePushEnv

Bases: Env

2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).

See module docstring for the task design. The env exposes the standard Gymnasium reset + step API with multi-agent dict actions / dict observations keyed by uid, matching tests.fakes.FakeMultiAgentEnv so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).
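The per-tick update in the step source listing (validate every uid, clip each velocity command, integrate with a fixed DT, clip positions back into the plane) and the 10-value per-agent state layout can be sketched with NumPy alone. A minimal sketch, assuming hypothetical values for VELOCITY_BOUND, POSITION_BOUND and DT, and two landmarks (which is what the 4-value landmark_rel slot implies). The helpers integrate and pack_state are illustrative names, not the env's API, and the concatenation order follows the state-channel comment in the constructor:

```python
import numpy as np
from numpy.typing import NDArray

# Hypothetical constants; the real values live in chamber.envs.mpe_cooperative_push.
VELOCITY_BOUND = 1.0
POSITION_BOUND = 1.0
DT = 0.1

Vec = NDArray[np.float64]


def integrate(
    positions: dict[str, Vec], action: dict[str, object], uids: tuple[str, str]
) -> tuple[dict[str, Vec], dict[str, Vec]]:
    """One tick: validate uids, clip each velocity, integrate, clip to the plane."""
    for uid in uids:  # validate every uid before touching any state
        if uid not in action:
            raise ValueError(f"action missing uid {uid!r}")
    new_pos: dict[str, Vec] = {}
    vel: dict[str, Vec] = {}
    for uid in uids:
        v = np.clip(np.asarray(action[uid], dtype=np.float64), -VELOCITY_BOUND, VELOCITY_BOUND)
        vel[uid] = v
        new_pos[uid] = np.clip(positions[uid] + v * DT, -POSITION_BOUND, POSITION_BOUND)
    return new_pos, vel


def pack_state(
    uid: str,
    uids: tuple[str, str],
    positions: dict[str, Vec],
    vel: dict[str, Vec],
    landmarks: Vec,
) -> NDArray[np.float32]:
    """10-value state: self_pos(2) + self_vel(2) + landmark_rel(4) + partner_rel(2)."""
    partner = uids[1] if uid == uids[0] else uids[0]
    self_pos = positions[uid]
    return np.concatenate(
        [self_pos, vel[uid], (landmarks - self_pos).flatten(), positions[partner] - self_pos]
    ).astype(np.float32)


uids = ("ego", "partner")
pos = {"ego": np.array([0.95, 0.0]), "partner": np.zeros(2)}
pos, vel = integrate(pos, {"ego": [5.0, 0.0], "partner": [0.0, -0.5]}, uids)
state = pack_state("ego", uids, pos, vel, np.array([[0.5, 0.5], [-0.5, -0.5]]))
```

In this usage the ego's command of 5.0 is clipped to 1.0, and its position, which would reach 1.05, saturates at the plane edge of 1.0.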

Source code in src/chamber/envs/mpe_cooperative_push.py
class MPECooperativePushEnv(gym.Env):  # type: ignore[type-arg]
    """2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).

    See module docstring for the task design. The env exposes the standard
    Gymnasium ``reset`` + ``step`` API with multi-agent dict actions / dict
    observations keyed by ``uid``, matching :class:`tests.fakes.FakeMultiAgentEnv`
    so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).
    """

    metadata: ClassVar[dict[str, list[str]]] = {"render_modes": []}  # type: ignore[misc]

    #: Number of agents the env supports — fixed at 2 by the cooperative-coverage task.
    _N_AGENTS: ClassVar[int] = 2

    def __init__(
        self,
        *,
        agent_uids: tuple[str, str] = ("ego", "partner"),
        episode_length: int = DEFAULT_EPISODE_LENGTH,
        root_seed: int = 0,
    ) -> None:
        """Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).

        Args:
            agent_uids: The two uids the env exposes. The default
                ``("ego", "partner")`` matches the ego-AHT training loop
                convention. Reordering swaps which agent the trainer
                updates and which is held frozen — the env is agnostic.
            episode_length: Truncation horizon in env ticks (default
                :data:`DEFAULT_EPISODE_LENGTH`).
            root_seed: Project-wide root seed; the env derives a
                deterministic numpy ``Generator`` from this via
                :func:`concerto.training.seeding.derive_substream`
                (P6 reproducibility).

        Raises:
            ValueError: If ``agent_uids`` does not have exactly two
                distinct entries.
        """
        super().__init__()
        if len(agent_uids) != self._N_AGENTS or agent_uids[0] == agent_uids[1]:
            raise ValueError(
                f"agent_uids must be a 2-tuple of distinct strings; got {agent_uids!r}"
            )
        self._uids: tuple[str, str] = agent_uids
        self._episode_length: int = int(episode_length)
        self._root_seed: int = int(root_seed)

        self.action_space = gym.spaces.Dict(
            {
                uid: gym.spaces.Box(
                    low=-VELOCITY_BOUND,
                    high=VELOCITY_BOUND,
                    shape=(2,),
                    dtype=np.float32,
                )
                for uid in self._uids
            }
        )
        # state channel: self_pos(2) + self_vel(2) + landmark_rel(4) + partner_rel(2) = 10.
        agent_state = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
        self.observation_space = gym.spaces.Dict(
            {
                "agent": gym.spaces.Dict(
                    {uid: gym.spaces.Dict({"state": agent_state}) for uid in self._uids}
                )
            }
        )

        self._rng: np.random.Generator = derive_substream(
            _SUBSTREAM_NAME, root_seed=self._root_seed
        ).default_rng()
        # Per-episode mutable state — set by reset.
        self._step_count: int = 0
        self._positions: dict[str, NDArray[np.float64]] = {}
        self._velocities: dict[str, NDArray[np.float64]] = {}
        self._landmark_positions: NDArray[np.float64] = np.zeros((N_LANDMARKS, 2))

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[dict[str, Any], dict[str, Any]]:
        """Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).

        .. note::

            This env's seeding semantics intentionally diverge from
            Gymnasium's convention (where ``reset(seed=K)`` is the
            *complete* seed source). Here, ``reset(seed=K)`` is folded
            with the constructor's ``root_seed`` via
            :func:`concerto.training.seeding.derive_substream` using the
            substream name ``"env.mpe_cooperative_push.episode.{K}"`` —
            so two training runs at distinct ``root_seed`` values do
            *not* collide on the same per-episode RNG even at the same
            ``K``. This matches the project-wide seeding philosophy
            (P6 + ADR-002 §Decisions): the ``root_seed`` is the
            run-level pin, and per-episode ``seed`` is an episode index
            scoped under it.

        Args:
            seed: Per-episode seed (episode index, NOT a complete seed).
                When provided, re-derives the env's RNG via
                ``derive_substream(f"env.mpe_cooperative_push.episode.{seed}",
                root_seed=self._root_seed)``. When ``None``, continues
                from the RNG's existing state (Gymnasium convention).
            options: Reserved for the Gymnasium API; ignored.

        Returns:
            ``(obs, info)`` where ``obs`` matches the
            :attr:`observation_space` shape and ``info`` is empty.
        """
        del options
        if seed is not None:
            self._rng = derive_substream(
                f"{_SUBSTREAM_NAME}.episode.{seed}",
                root_seed=self._root_seed,
            ).default_rng()
        self._step_count = 0
        self._landmark_positions = self._rng.uniform(
            -POSITION_BOUND, POSITION_BOUND, size=(N_LANDMARKS, 2)
        )
        for uid in self._uids:
            self._positions[uid] = self._rng.uniform(-POSITION_BOUND, POSITION_BOUND, size=(2,))
            self._velocities[uid] = np.zeros(2, dtype=np.float64)
        return self._build_obs(), {}

    def step(
        self,
        action: dict[str, NDArray[np.floating]],
    ) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        """Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).

        Args:
            action: Dict keyed by ``uid``, each value a 2-D velocity
                command. Components are clipped to
                ``[-VELOCITY_BOUND, +VELOCITY_BOUND]`` before integration
                so a misbehaving partner cannot escape the plane.

        Returns:
            ``(obs, reward, terminated, truncated, info)``. ``reward`` is
            the *shared* cooperative-coverage scalar (both agents see the
            same value); the trainer's ego/partner split happens upstream
            in :func:`concerto.training.ego_aht.train`. ``terminated`` is
            always ``False`` (no terminal state); ``truncated`` is ``True``
            on the last tick of the episode.

        Raises:
            ValueError: If ``action`` is missing a required uid.
        """
        for uid in self._uids:
            if uid not in action:
                raise ValueError(
                    f"step() action missing uid {uid!r}; got keys {list(action.keys())}"
                )
        for uid in self._uids:
            a = np.clip(
                np.asarray(action[uid], dtype=np.float64),
                -VELOCITY_BOUND,
                VELOCITY_BOUND,
            )
            self._velocities[uid] = a
            self._positions[uid] = np.clip(
                self._positions[uid] + a * DT, -POSITION_BOUND, POSITION_BOUND
            )
        self._step_count += 1
        truncated = self._step_count >= self._episode_length
        return self._build_obs(), self._compute_reward(), False, truncated, {}

    def _build_obs(self) -> dict[str, Any]:
        """Pack the Gymnasium-conformant obs dict (plan/01 §3; plan/04 §3.4)."""
        out_agents: dict[str, dict[str, NDArray[np.float32]]] = {}
        for uid in self._uids:
            self_pos = self._positions[uid]
            self_vel = self._velocities[uid]
            landmark_rel = (self._landmark_positions - self_pos).flatten()
            partner_uid = self._uids[1] if uid == self._uids[0] else self._uids[0]
            partner_rel = self._positions[partner_uid] - self_pos
            state = np.concatenate([self_pos, self_vel, landmark_rel, partner_rel]).astype(
                np.float32
            )
            out_agents[uid] = {"state": state}
        return {"agent": out_agents}

    def _compute_reward(self) -> float:
        """Shared cooperative-coverage scalar (plan/05 §3.5).

        ``r = -sum_landmark min_agent dist(agent, landmark)``. Both
        agents always see the same value so the trainer can route it to
        the ego's policy update without a credit-assignment step.

        Edge case: when an agent sits exactly on a landmark, that
        landmark contributes ``0`` to the sum. Perfect coverage (one
        agent per landmark, both at distance ``0``) yields ``r = 0`` —
        the natural ceiling. The empirical-guarantee assertion in
        T4b.13 is on *moving-window-of-10 non-decreasing*, not on
        approaching this ceiling.

        ``np.linalg.norm`` matches PettingZoo ``simple_spread_v3``'s
        Euclidean reward (squared distance would change the gradient
        shape and break the "matches simple_spread" claim should
        Phase 1 ever cross-validate against the real env).
        """
        # Phase 1 TODO: cross-validate trajectory against
        # mpe2.simple_spread_v3 once the dep-auditor has cleared mpe2.
        total = 0.0
        for landmark in self._landmark_positions:
            distances = [
                float(np.linalg.norm(self._positions[uid] - landmark)) for uid in self._uids
            ]
            total += min(distances)
        return -total
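The shared reward computed by _compute_reward can be reproduced outside the env in a few lines of standard-library Python. The sketch below is a stand-alone re-implementation for illustration only. It assumes 2-D positions given as (x, y) tuples, and coverage_reward is a hypothetical helper name that is not part of chamber.envs.

```python
import math
from collections.abc import Sequence

Point = tuple[float, float]


def coverage_reward(agents: Sequence[Point], landmarks: Sequence[Point]) -> float:
    """Hypothetical stand-in for MPECooperativePushEnv._compute_reward.

    r = -sum over landmarks of the Euclidean distance to the nearest agent.
    """
    total = 0.0
    for landmark in landmarks:
        # Distance from this landmark to whichever agent is closest to it.
        total += min(math.dist(agent, landmark) for agent in agents)
    return -total


# Perfect coverage: one agent sitting on each landmark reaches the ceiling r = 0.
print(coverage_reward([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]))
# Both agents at the origin: the landmark at (3, 4) costs its full distance of 5.
print(coverage_reward([(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (0.0, 0.0)]))
```

math.dist is the plain Euclidean norm, which matches the docstring's reason for preferring np.linalg.norm over a squared distance: it keeps the reward shape aligned with simple_spread_v3.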

__init__

__init__(
    *,
    agent_uids: tuple[str, str] = ("ego", "partner"),
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
) -> None

Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).

Parameters:

  • agent_uids (tuple[str, str], default: ('ego', 'partner') ) –

    The two uids the env exposes. The default ("ego", "partner") matches the ego-AHT training loop convention. Reordering swaps which agent the trainer updates and which is held frozen — the env is agnostic.

  • episode_length (int, default: DEFAULT_EPISODE_LENGTH ) –

    Truncation horizon in env ticks (default :data:DEFAULT_EPISODE_LENGTH).

  • root_seed (int, default: 0 ) –

    Project-wide root seed; the env derives a deterministic numpy Generator from this via :func:concerto.training.seeding.derive_substream (P6 reproducibility).

Raises:

  • ValueError

    If agent_uids does not have exactly two distinct entries.

Source code in src/chamber/envs/mpe_cooperative_push.py
def __init__(
    self,
    *,
    agent_uids: tuple[str, str] = ("ego", "partner"),
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
) -> None:
    """Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).

    Args:
        agent_uids: The two uids the env exposes. The default
            ``("ego", "partner")`` matches the ego-AHT training loop
            convention. Reordering swaps which agent the trainer
            updates and which is held frozen — the env is agnostic.
        episode_length: Truncation horizon in env ticks (default
            :data:`DEFAULT_EPISODE_LENGTH`).
        root_seed: Project-wide root seed; the env derives a
            deterministic numpy ``Generator`` from this via
            :func:`concerto.training.seeding.derive_substream`
            (P6 reproducibility).

    Raises:
        ValueError: If ``agent_uids`` does not have exactly two
            distinct entries.
    """
    super().__init__()
    if len(agent_uids) != self._N_AGENTS or agent_uids[0] == agent_uids[1]:
        raise ValueError(
            f"agent_uids must be a 2-tuple of distinct strings; got {agent_uids!r}"
        )
    self._uids: tuple[str, str] = agent_uids
    self._episode_length: int = int(episode_length)
    self._root_seed: int = int(root_seed)

    self.action_space = gym.spaces.Dict(
        {
            uid: gym.spaces.Box(
                low=-VELOCITY_BOUND,
                high=VELOCITY_BOUND,
                shape=(2,),
                dtype=np.float32,
            )
            for uid in self._uids
        }
    )
    # state channel: self_pos(2) + self_vel(2) + landmark_rel(4) + partner_rel(2) = 10.
    agent_state = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
    self.observation_space = gym.spaces.Dict(
        {
            "agent": gym.spaces.Dict(
                {uid: gym.spaces.Dict({"state": agent_state}) for uid in self._uids}
            )
        }
    )

    self._rng: np.random.Generator = derive_substream(
        _SUBSTREAM_NAME, root_seed=self._root_seed
    ).default_rng()
    # Per-episode mutable state — set by reset.
    self._step_count: int = 0
    self._positions: dict[str, NDArray[np.float64]] = {}
    self._velocities: dict[str, NDArray[np.float64]] = {}
    self._landmark_positions: NDArray[np.float64] = np.zeros((N_LANDMARKS, 2))
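The 10-dimensional state channel follows directly from the concatenation order in _build_obs. Below is a minimal standard-library sketch of that packing for a single agent. It assumes N_LANDMARKS = 2, which the landmark_rel(4) term in the source comment implies. pack_state is a hypothetical helper that mirrors _build_obs using plain lists instead of NumPy arrays.

```python
from collections.abc import Sequence

N_LANDMARKS = 2  # assumption: implied by landmark_rel(4) = N_LANDMARKS * 2


def pack_state(
    self_pos: Sequence[float],
    self_vel: Sequence[float],
    landmarks: Sequence[Sequence[float]],
    partner_pos: Sequence[float],
) -> list[float]:
    """Hypothetical mirror of _build_obs for one agent.

    Order: self_pos(2) + self_vel(2) + landmark_rel(2 * N_LANDMARKS) + partner_rel(2).
    """
    # Landmark positions relative to this agent, flattened row by row
    # (the same row-major order as ndarray.flatten on an (N_LANDMARKS, 2) array).
    landmark_rel = [lm[i] - self_pos[i] for lm in landmarks for i in range(2)]
    partner_rel = [partner_pos[i] - self_pos[i] for i in range(2)]
    return [*self_pos, *self_vel, *landmark_rel, *partner_rel]


state = pack_state([0.5, -0.5], [0.0, 0.0], [[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.5])
print(len(state))
```

All positions in the observation are ego-relative except self_pos, so a policy sees the same landmark_rel vector wherever the pair sits in the arena.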

reset

reset(
    *,
    seed: int | None = None,
    options: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], dict[str, Any]]

Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).

Note

This env's seeding semantics intentionally diverge from Gymnasium's convention (where reset(seed=K) is the complete seed source). Here, reset(seed=K) is folded with the constructor's root_seed via :func:concerto.training.seeding.derive_substream using the substream name "env.mpe_cooperative_push.episode.{K}", so two training runs at distinct root_seed values do not collide on the same per-episode RNG even at the same K. This matches the project-wide seeding philosophy (P6 + ADR-002 §Decisions): the root_seed is the run-level pin, and the per-episode seed is an episode index scoped under it.

Parameters:

  • seed (int | None, default: None ) –

    Per-episode seed (episode index, NOT a complete seed). When provided, re-derives the env's RNG via derive_substream(f"env.mpe_cooperative_push.episode.{seed}", root_seed=self._root_seed). When None, continues from the RNG's existing state (Gymnasium convention).

  • options (dict[str, Any] | None, default: None ) –

    Reserved for the Gymnasium API; ignored.

Returns:

  • tuple[dict[str, Any], dict[str, Any]] –

    (obs, info) where obs matches the :attr:observation_space shape and info is empty.

Source code in src/chamber/envs/mpe_cooperative_push.py
def reset(
    self,
    *,
    seed: int | None = None,
    options: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).

    .. note::

        This env's seeding semantics intentionally diverge from
        Gymnasium's convention (where ``reset(seed=K)`` is the
        *complete* seed source). Here, ``reset(seed=K)`` is folded
        with the constructor's ``root_seed`` via
        :func:`concerto.training.seeding.derive_substream` using the
        substream name ``"env.mpe_cooperative_push.episode.{K}"`` —
        so two training runs at distinct ``root_seed`` values do
        *not* collide on the same per-episode RNG even at the same
        ``K``. This matches the project-wide seeding philosophy
        (P6 + ADR-002 §Decisions): the ``root_seed`` is the
        run-level pin, and per-episode ``seed`` is an episode index
        scoped under it.

    Args:
        seed: Per-episode seed (episode index, NOT a complete seed).
            When provided, re-derives the env's RNG via
            ``derive_substream(f"env.mpe_cooperative_push.episode.{seed}",
            root_seed=self._root_seed)``. When ``None``, continues
            from the RNG's existing state (Gymnasium convention).
        options: Reserved for the Gymnasium API; ignored.

    Returns:
        ``(obs, info)`` where ``obs`` matches the
        :attr:`observation_space` shape and ``info`` is empty.
    """
    del options
    if seed is not None:
        self._rng = derive_substream(
            f"{_SUBSTREAM_NAME}.episode.{seed}",
            root_seed=self._root_seed,
        ).default_rng()
    self._step_count = 0
    self._landmark_positions = self._rng.uniform(
        -POSITION_BOUND, POSITION_BOUND, size=(N_LANDMARKS, 2)
    )
    for uid in self._uids:
        self._positions[uid] = self._rng.uniform(-POSITION_BOUND, POSITION_BOUND, size=(2,))
        self._velocities[uid] = np.zeros(2, dtype=np.float64)
    return self._build_obs(), {}
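The folding of root_seed with the episode index described in the note can be illustrated with a stand-in for derive_substream. The real derivation lives in concerto.training.seeding and is not reproduced here. For illustration only, the sketch assumes that a substream is obtained by hashing (root_seed, name) with SHA-256 and seeding random.Random from the digest. episode_rng and first_draw are hypothetical helpers, and the substream prefix is an assumed value of _SUBSTREAM_NAME inferred from the documented name.

```python
import hashlib
import random

SUBSTREAM_NAME = "env.mpe_cooperative_push"  # assumed value of _SUBSTREAM_NAME


def episode_rng(root_seed: int, episode: int) -> random.Random:
    """Hypothetical stand-in for derive_substream(...).default_rng().

    Folds the run-level root_seed with a per-episode substream name, so the
    same episode index K yields different streams under different root seeds.
    """
    name = f"{SUBSTREAM_NAME}.episode.{episode}"
    digest = hashlib.sha256(f"{root_seed}:{name}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def first_draw(root_seed: int, episode: int) -> float:
    # Mimics sampling one coordinate between -1 and 1 at the start of reset().
    return episode_rng(root_seed, episode).uniform(-1.0, 1.0)


same_run = first_draw(0, 3) == first_draw(0, 3)   # reproducible within a run
other_run = first_draw(0, 3) == first_draw(1, 3)  # same K, different root_seed
print(same_run, other_run)
```

The second comparison is the point of the design: two runs that both call reset(seed=3) still diverge when their root_seed differs, which plain Gymnasium seeding would not provide.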

step

step(
    action: dict[str, NDArray[floating]],
) -> tuple[
    dict[str, Any], float, bool, bool, dict[str, Any]
]

Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).

Parameters:

  • action (dict[str, NDArray[floating]]) –

    Dict keyed by uid, each value a 2-D velocity command. Components are clipped to [-VELOCITY_BOUND, +VELOCITY_BOUND] before integration so a misbehaving partner cannot escape the plane.

Returns:

  • tuple[dict[str, Any], float, bool, bool, dict[str, Any]] –

    (obs, reward, terminated, truncated, info). reward is the shared cooperative-coverage scalar (both agents see the same value); the trainer's ego/partner split happens upstream in :func:concerto.training.ego_aht.train. terminated is always False (no terminal state); truncated is True on the last tick of the episode.

Raises:

  • ValueError

    If action is missing a required uid.

Source code in src/chamber/envs/mpe_cooperative_push.py
def step(
    self,
    action: dict[str, NDArray[np.floating]],
) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
    """Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).

    Args:
        action: Dict keyed by ``uid``, each value a 2-D velocity
            command. Components are clipped to
            ``[-VELOCITY_BOUND, +VELOCITY_BOUND]`` before integration
            so a misbehaving partner cannot escape the plane.

    Returns:
        ``(obs, reward, terminated, truncated, info)``. ``reward`` is
        the *shared* cooperative-coverage scalar (both agents see the
        same value); the trainer's ego/partner split happens upstream
        in :func:`concerto.training.ego_aht.train`. ``terminated`` is
        always ``False`` (no terminal state); ``truncated`` is ``True``
        on the last tick of the episode.

    Raises:
        ValueError: If ``action`` is missing a required uid.
    """
    for uid in self._uids:
        if uid not in action:
            raise ValueError(
                f"step() action missing uid {uid!r}; got keys {list(action.keys())}"
            )
    for uid in self._uids:
        a = np.clip(
            np.asarray(action[uid], dtype=np.float64),
            -VELOCITY_BOUND,
            VELOCITY_BOUND,
        )
        self._velocities[uid] = a
        self._positions[uid] = np.clip(
            self._positions[uid] + a * DT, -POSITION_BOUND, POSITION_BOUND
        )
    self._step_count += 1
    truncated = self._step_count >= self._episode_length
    return self._build_obs(), self._compute_reward(), False, truncated, {}
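The per-agent update inside step() is a clipped single integrator: clip the commanded velocity, integrate over DT, then clip the position to the arena. The standard-library sketch below uses illustrative constants only. The real VELOCITY_BOUND, POSITION_BOUND and DT values are defined in mpe_cooperative_push.py and are not shown on this page, so the numbers here are placeholders. integrate is a hypothetical helper.

```python
VELOCITY_BOUND = 1.0  # placeholder, not the module's actual value
POSITION_BOUND = 1.0  # placeholder
DT = 0.1              # placeholder


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def integrate(pos: list[float], cmd: list[float]) -> tuple[list[float], list[float]]:
    """Hypothetical mirror of one agent's update inside step()."""
    # 1. Clip each velocity component to [-VELOCITY_BOUND, +VELOCITY_BOUND].
    vel = [clip(c, -VELOCITY_BOUND, VELOCITY_BOUND) for c in cmd]
    # 2. Euler-integrate over DT, 3. clip the result to the arena.
    new_pos = [clip(p + v * DT, -POSITION_BOUND, POSITION_BOUND) for p, v in zip(pos, vel)]
    return new_pos, vel


pos, vel = integrate([0.95, 0.0], [50.0, -0.5])
print(pos, vel)
```

Because the position clip happens after integration, an agent driven into a wall keeps reporting its velocity-clipped command in self_vel even though its position stops changing; the sketch reproduces this, since vel is returned before the wall clip is applied.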

PartnerIdAnnotationWrapper

Bases: Wrapper

Inject obs["meta"]["partner_id"] on every reset and step (ADR-006 risk #3).

Wraps any env whose observation space is a :class:gym.spaces.Dict. The wrapper extends the obs space with a "meta" sub-Dict (empty in Phase-0 — Gymnasium's :class:gym.spaces.Dict does not natively type string values, so the actual partner_id field is carried in the runtime observation dict rather than declared in the space).

Note

:meth:gym.spaces.Dict.contains returns False on the "meta" sub-dict because the declared inner space is empty; this matches :class:chamber.envs.CommShapingWrapper's precedent for its "comm" sub-space (no current consumer validates an obs via observation_space.contains). Adding a string-typed inner Space would require an upstream Gymnasium change.

Phase-0 binds partner_id at construction. For a Phase-1 Stage-3 PF spike with mid-episode swap, call :meth:set_partner_id before the next :meth:step; the wrapper will surface the new value on the subsequent obs.

Parameters:

  • env (Env) –

    Inner env with a :class:gym.spaces.Dict observation space.

  • partner_id (str) –

    Stable 16-hex-char hash from :attr:chamber.partners.api.PartnerSpec.partner_id. The wrapper does not validate the format — :class:PartnerSpec is the source of truth.

Raises:

  • ChamberEnvCompatibilityError

    When the inner observation space is not a :class:gym.spaces.Dict (ADR-001 §Risks; matches :class:chamber.envs.CommShapingWrapper's compatibility contract).

Source code in src/chamber/envs/partner_meta.py
class PartnerIdAnnotationWrapper(gym.Wrapper):  # type: ignore[type-arg]
    """Inject ``obs["meta"]["partner_id"]`` on every reset and step (ADR-006 risk #3).

    Wraps any env whose observation space is a :class:`gym.spaces.Dict`.
    The wrapper extends the obs space with a ``"meta"`` sub-Dict (empty
    in Phase-0 — Gymnasium's :class:`gym.spaces.Dict` does not natively
    type string values, so the actual ``partner_id`` field is carried
    in the runtime observation dict rather than declared in the space).

    Note:
        :meth:`gym.spaces.Dict.contains` returns ``False`` on the
        ``"meta"`` sub-dict because the declared inner space is empty;
        this matches :class:`chamber.envs.CommShapingWrapper`'s
        precedent for its ``"comm"`` sub-space (no current consumer
        validates an obs via ``observation_space.contains``). Adding a
        string-typed inner Space would require an upstream Gymnasium
        change.

    Phase-0 binds ``partner_id`` at construction. For a Phase-1
    Stage-3 PF spike with mid-episode swap, call :meth:`set_partner_id`
    before the next :meth:`step`; the wrapper will surface the new
    value on the subsequent obs.

    Args:
        env: Inner env with a :class:`gym.spaces.Dict` observation space.
        partner_id: Stable 16-hex-char hash from
            :attr:`chamber.partners.api.PartnerSpec.partner_id`. The
            wrapper does not validate the format — :class:`PartnerSpec`
            is the source of truth.

    Raises:
        ChamberEnvCompatibilityError: When the inner observation space
            is not a :class:`gym.spaces.Dict` (ADR-001 §Risks; matches
            :class:`chamber.envs.CommShapingWrapper`'s compatibility
            contract).
    """

    def __init__(self, env: gym.Env, *, partner_id: str) -> None:  # type: ignore[type-arg]
        """Validate the obs space and bind the partner identity (ADR-006 risk #3)."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            msg = (
                f"PartnerIdAnnotationWrapper requires a gym.spaces.Dict observation "
                f"space; got {type(env.observation_space).__name__}. See ADR-001 §Risks."
            )
            raise ChamberEnvCompatibilityError(msg)
        self._partner_id: str = partner_id
        self._extend_observation_space()

    def _extend_observation_space(self) -> None:
        """Add an empty ``"meta"`` sub-space (Gymnasium has no string-typed Space)."""
        existing = dict(self.env.observation_space.spaces)  # type: ignore[attr-defined]
        existing.setdefault(_META_KEY, gym.spaces.Dict({}))
        self.observation_space = gym.spaces.Dict(existing)

    @property
    def partner_id(self) -> str:
        """Return the partner identity the wrapper currently injects (ADR-006 risk #3).

        Returns:
            The stable 16-hex-char hash bound at construction or via
            :meth:`set_partner_id`.
        """
        return self._partner_id

    def set_partner_id(self, partner_id: str) -> None:
        """Rebind the partner identity for the next reset / step (ADR-004 §risk-mitigation #2).

        Used by Phase-1 mid-episode-swap experiments (ADR-007 Stage-3 PF):
        call this before the next :meth:`step` and the wrapper will
        surface the new value on the subsequent obs. The change does
        NOT trigger a new :meth:`reset` automatically — the conformal
        filter does that itself when it observes the new
        ``partner_id`` on ``obs["meta"]``.

        Args:
            partner_id: New stable hash from
                :attr:`chamber.partners.api.PartnerSpec.partner_id`.
        """
        self._partner_id = partner_id

    def reset(  # type: ignore[override]
        self, **kwargs: object
    ) -> tuple[dict, dict]:
        """Reset the env and inject ``partner_id`` into the initial obs (ADR-006 risk #3)."""
        obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
        return self._annotate(obs), info

    def step(  # type: ignore[override]
        self, action: object
    ) -> tuple[dict, object, bool, bool, dict]:
        """Step the env and inject the current ``partner_id`` (ADR-006 risk #3)."""
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._annotate(obs), reward, terminated, truncated, info

    def _annotate(self, obs: Mapping[str, object]) -> dict[str, object]:
        """Return a copy of ``obs`` with ``obs["meta"]["partner_id"]`` set."""
        out = dict(obs)
        existing_meta = out.get(_META_KEY)
        meta: dict[str, object] = dict(existing_meta) if isinstance(existing_meta, dict) else {}
        meta[_PARTNER_ID_KEY] = self._partner_id
        out[_META_KEY] = meta
        return out
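The copy-then-merge behaviour of _annotate can be exercised without Gymnasium. The sketch below is a plain-Python stand-in. It assumes the module constants _META_KEY and _PARTNER_ID_KEY are "meta" and "partner_id", as the obs["meta"]["partner_id"] path in the docstring suggests. annotate is a hypothetical free-function version of the method.

```python
from collections.abc import Mapping

META_KEY = "meta"              # assumed value of _META_KEY
PARTNER_ID_KEY = "partner_id"  # assumed value of _PARTNER_ID_KEY


def annotate(obs: Mapping[str, object], partner_id: str) -> dict[str, object]:
    """Hypothetical mirror of PartnerIdAnnotationWrapper._annotate."""
    out = dict(obs)  # shallow copy: the inner env's top-level dict is not mutated
    existing = out.get(META_KEY)
    # Keep any meta fields the inner env already set; ignore non-dict values.
    meta: dict[str, object] = dict(existing) if isinstance(existing, dict) else {}
    meta[PARTNER_ID_KEY] = partner_id
    out[META_KEY] = meta
    return out


inner = {"agent": {"ego": {"state": [0.0]}}, "meta": {"episode": 7}}
annotated = annotate(inner, "0123456789abcdef")
print(annotated["meta"], "partner_id" in inner["meta"])
```

Only the top level and the "meta" sub-dict are copied; nested values such as obs["agent"] are shared with the inner env's observation, which keeps the per-step cost constant.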

partner_id property

partner_id: str

Return the partner identity the wrapper currently injects (ADR-006 risk #3).

Returns:

  • str –

    The stable 16-hex-char hash bound at construction or via :meth:set_partner_id.

__init__

__init__(env: Env, *, partner_id: str) -> None

Validate the obs space and bind the partner identity (ADR-006 risk #3).

Source code in src/chamber/envs/partner_meta.py
def __init__(self, env: gym.Env, *, partner_id: str) -> None:  # type: ignore[type-arg]
    """Validate the obs space and bind the partner identity (ADR-006 risk #3)."""
    super().__init__(env)
    if not isinstance(env.observation_space, gym.spaces.Dict):
        msg = (
            f"PartnerIdAnnotationWrapper requires a gym.spaces.Dict observation "
            f"space; got {type(env.observation_space).__name__}. See ADR-001 §Risks."
        )
        raise ChamberEnvCompatibilityError(msg)
    self._partner_id: str = partner_id
    self._extend_observation_space()

reset

reset(**kwargs: object) -> tuple[dict, dict]

Reset the env and inject partner_id into the initial obs (ADR-006 risk #3).

Source code in src/chamber/envs/partner_meta.py
def reset(  # type: ignore[override]
    self, **kwargs: object
) -> tuple[dict, dict]:
    """Reset the env and inject ``partner_id`` into the initial obs (ADR-006 risk #3)."""
    obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
    return self._annotate(obs), info

set_partner_id

set_partner_id(partner_id: str) -> None

Rebind the partner identity for the next reset / step (ADR-004 §risk-mitigation #2).

Used by Phase-1 mid-episode-swap experiments (ADR-007 Stage-3 PF): call this before the next :meth:step and the wrapper will surface the new value on the subsequent obs. The change does NOT trigger a new :meth:reset automatically; the conformal filter resets its own state when it observes the new partner_id on obs["meta"].

Parameters:

  • partner_id (str) –

    New stable hash from :attr:chamber.partners.api.PartnerSpec.partner_id.

Source code in src/chamber/envs/partner_meta.py
def set_partner_id(self, partner_id: str) -> None:
    """Rebind the partner identity for the next reset / step (ADR-004 §risk-mitigation #2).

    Used by Phase-1 mid-episode-swap experiments (ADR-007 Stage-3 PF):
    call this before the next :meth:`step` and the wrapper will
    surface the new value on the subsequent obs. The change does
    NOT trigger a new :meth:`reset` automatically — the conformal
    filter does that itself when it observes the new
    ``partner_id`` on ``obs["meta"]``.

    Args:
        partner_id: New stable hash from
            :attr:`chamber.partners.api.PartnerSpec.partner_id`.
    """
    self._partner_id = partner_id

step

step(
    action: object,
) -> tuple[dict, object, bool, bool, dict]

Step the env and inject the current partner_id (ADR-006 risk #3).

Source code in src/chamber/envs/partner_meta.py
def step(  # type: ignore[override]
    self, action: object
) -> tuple[dict, object, bool, bool, dict]:
    """Step the env and inject the current ``partner_id`` (ADR-006 risk #3)."""
    obs, reward, terminated, truncated, info = self.env.step(action)
    return self._annotate(obs), reward, terminated, truncated, info

PerAgentActionRepeatWrapper

Bases: Wrapper

Apply different action-repeat counts per agent uid.

Effective control frequency for agent a is:

    base_freq / action_repeat[a]

At each env.step call, an agent whose repeat counter has not elapsed re-applies its most recently accepted action instead of the newly submitted one.

ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).

Parameters:

  • env (Env) –

    Env exposing a gym.spaces.Dict action space keyed by agent uid.

  • action_repeat (Mapping[str, int]) –

    Mapping from agent uid to a positive integer repeat count. Keys absent from this mapping default to repeat=1 (pass-through).

Raises:

  • ChamberEnvCompatibilityError

    if the action space is not a gym.spaces.Dict keyed by agent uid (ADR-001 §Risks).

  • ValueError

    if any repeat count is less than 1.

Source code in src/chamber/envs/action_repeat.py
class PerAgentActionRepeatWrapper(gym.Wrapper):  # type: ignore[type-arg]
    """Apply different action-repeat counts per agent uid.

    Effective control frequency for agent ``a`` is::

        base_freq / action_repeat[a]

    At each ``env.step`` call, an agent whose repeat counter has not elapsed
    re-applies its most recent action instead of the newly submitted one.

    ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).

    Args:
        env: Env exposing a ``gym.spaces.Dict`` action space keyed by agent uid.
        action_repeat: Mapping from agent uid to a positive integer repeat count.
            Keys absent from this mapping default to repeat=1 (pass-through).

    Raises:
        ChamberEnvCompatibilityError: if the action space is not a
            ``gym.spaces.Dict`` keyed by agent uid (ADR-001 §Risks).
        ValueError: if any repeat count is less than 1.
    """

    def __init__(self, env: gym.Env, action_repeat: Mapping[str, int]) -> None:  # type: ignore[type-arg]
        """Validate the action space and store per-agent repeat counts."""
        super().__init__(env)
        if not isinstance(env.action_space, gym.spaces.Dict):
            raise ChamberEnvCompatibilityError(
                f"PerAgentActionRepeatWrapper requires a gym.spaces.Dict action "
                f"space keyed by agent uid; got {type(env.action_space).__name__}. "
                f"See ADR-001 §Risks."
            )
        for uid, count in action_repeat.items():
            if count < 1:
                raise ValueError(f"action_repeat[{uid!r}] must be ≥ 1, got {count}")
        self._action_repeat: dict[str, int] = dict(action_repeat)
        self._counters: dict[str, int] = dict.fromkeys(action_repeat, 0)
        self._last_action: dict[str, object] = {}

    def step(self, action: dict) -> tuple:  # type: ignore[override]
        """Step with per-agent action repeat applied.

        ADR-001 §Validation criteria condition (c): the slow agent's effective
        action update interval matches 1/rate within one env tick.
        """
        resolved: dict[str, object] = {}
        for uid, act in action.items():
            repeat = self._action_repeat.get(uid, 1)
            counter = self._counters.get(uid, 0)
            if counter == 0:
                resolved[uid] = act
                self._last_action[uid] = act
                self._counters[uid] = repeat - 1
            else:
                resolved[uid] = self._last_action[uid]
                self._counters[uid] = counter - 1
        return self.env.step(resolved)

    def reset(self, **kwargs: object) -> tuple:  # type: ignore[override]
        """Reset the env and clear all per-agent repeat state.

        ADR-001 §Decision item 2: no state leaks across episodes.
        """
        obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
        self._counters = dict.fromkeys(self._action_repeat, 0)
        self._last_action = {}
        return obs, info
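The counter logic in step() can be traced without an env. The stand-in below copies the validation, resolution loop and reset into a hypothetical RepeatResolver class (not part of chamber.envs) so the held actions are visible. Plain integers stand in for per-agent action arrays.

```python
from collections.abc import Mapping


class RepeatResolver:
    """Hypothetical copy of PerAgentActionRepeatWrapper's step/reset bookkeeping."""

    def __init__(self, action_repeat: Mapping[str, int]) -> None:
        for uid, count in action_repeat.items():
            if count < 1:
                raise ValueError(f"action_repeat[{uid!r}] must be >= 1, got {count}")
        self._action_repeat = dict(action_repeat)
        self.reset()

    def reset(self) -> None:
        # Mirrors reset(): no repeat state survives across episodes.
        self._counters = dict.fromkeys(self._action_repeat, 0)
        self._last_action: dict[str, object] = {}

    def resolve(self, action: Mapping[str, object]) -> dict[str, object]:
        resolved: dict[str, object] = {}
        for uid, act in action.items():
            repeat = self._action_repeat.get(uid, 1)  # absent uid -> pass-through
            counter = self._counters.get(uid, 0)
            if counter == 0:
                # Accept the new command and hold it for repeat - 1 more ticks.
                resolved[uid] = act
                self._last_action[uid] = act
                self._counters[uid] = repeat - 1
            else:
                resolved[uid] = self._last_action[uid]
                self._counters[uid] = counter - 1
        return resolved


resolver = RepeatResolver({"slow": 3})
trace = [resolver.resolve({"slow": t, "fast": t}) for t in range(6)]
print([r["slow"] for r in trace], [r["fast"] for r in trace])
```

The "slow" agent accepts a new command only every third tick, so with an illustrative 60 Hz base rate it would update at 60 / 3 = 20 Hz, consistent with base_freq / action_repeat[a]; "fast" is absent from the mapping and passes through every tick.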

__init__

__init__(
    env: Env, action_repeat: Mapping[str, int]
) -> None

Validate the action space and store per-agent repeat counts.

Source code in src/chamber/envs/action_repeat.py
def __init__(self, env: gym.Env, action_repeat: Mapping[str, int]) -> None:  # type: ignore[type-arg]
    """Validate the action space and store per-agent repeat counts."""
    super().__init__(env)
    if not isinstance(env.action_space, gym.spaces.Dict):
        raise ChamberEnvCompatibilityError(
            f"PerAgentActionRepeatWrapper requires a gym.spaces.Dict action "
            f"space keyed by agent uid; got {type(env.action_space).__name__}. "
            f"See ADR-001 §Risks."
        )
    for uid, count in action_repeat.items():
        if count < 1:
            raise ValueError(f"action_repeat[{uid!r}] must be ≥ 1, got {count}")
    self._action_repeat: dict[str, int] = dict(action_repeat)
    self._counters: dict[str, int] = dict.fromkeys(action_repeat, 0)
    self._last_action: dict[str, object] = {}

reset

reset(**kwargs: object) -> tuple

Reset the env and clear all per-agent repeat state.

ADR-001 §Decision item 2: no state leaks across episodes.

Source code in src/chamber/envs/action_repeat.py
def reset(self, **kwargs: object) -> tuple:  # type: ignore[override]
    """Reset the env and clear all per-agent repeat state.

    ADR-001 §Decision item 2: no state leaks across episodes.
    """
    obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
    self._counters = dict.fromkeys(self._action_repeat, 0)
    self._last_action = {}
    return obs, info

step

step(action: dict) -> tuple

Step with per-agent action repeat applied.

ADR-001 §Validation criteria condition (c): the slow agent's effective action update interval matches 1/rate within one env tick.

Source code in src/chamber/envs/action_repeat.py
def step(self, action: dict) -> tuple:  # type: ignore[override]
    """Step with per-agent action repeat applied.

    ADR-001 §Validation criteria condition (c): the slow agent's effective
    action update interval matches 1/rate within one env tick.
    """
    resolved: dict[str, object] = {}
    for uid, act in action.items():
        repeat = self._action_repeat.get(uid, 1)
        counter = self._counters.get(uid, 0)
        if counter == 0:
            resolved[uid] = act
            self._last_action[uid] = act
            self._counters[uid] = repeat - 1
        else:
            resolved[uid] = self._last_action[uid]
            self._counters[uid] = counter - 1
    return self.env.step(resolved)

Stage1ASStateSynthesizer

Bases: ObservationWrapper

Synthesise obs["agent"][uid]["state"] for AS conditions (ADR-007 §Stage 1b).

ManiSkill v3.0.1 obs_mode="state_dict" (used by the Stage-1 AS conditions) emits per-agent Dict(qpos: Box, qvel: Box, ...) with no top-level "state" key. But :meth:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config reads env.observation_space["agent"][ego_uid]["state"] as a 1-D Box and derives obs_dim = ego_state_space.shape[0]. ManiSkill v3 obs_mode="state" flattens concat(qpos, qvel) into a single "state" key with that shape; this wrapper replicates that concat in obs-space so callers can keep obs_mode="state_dict" for the synthesised-channel injection (force_torque, tcp_pose, ...) the OM-hetero condition needs while still satisfying the trainer's 1-D state contract.

Pass-through for OM conditions (is_om_condition=True — those go through :class:Stage1OMChannelFilter instead) and for envs whose condition_id is missing or unrecognised (Tier-1 safety: keeps the wrapper a no-op for envs that don't satisfy the Stage-1b condition_id contract).

Parameters:

  • env (Env[Any, Any]) –

    Inner env exposing a condition_id attribute.

Raises:

  • TypeError

    If env.observation_space is not a gym.spaces.Dict.

Source code in src/chamber/envs/stage1_obs_filter.py
class Stage1ASStateSynthesizer(gym.ObservationWrapper):  # type: ignore[type-arg]
    """Synthesise ``obs["agent"][uid]["state"]`` for AS conditions (ADR-007 §Stage 1b).

    ManiSkill v3.0.1 ``obs_mode="state_dict"`` (used by the Stage-1 AS
    conditions) emits per-agent ``Dict(qpos: Box, qvel: Box, ...)`` with
    no top-level ``"state"`` key. But
    :meth:`chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config`
    reads ``env.observation_space["agent"][ego_uid]["state"]`` as a 1-D
    Box and derives ``obs_dim = ego_state_space.shape[0]``. ManiSkill v3
    ``obs_mode="state"`` flattens ``concat(qpos, qvel)`` into a single
    ``"state"`` key with that shape; this wrapper replicates that concat
    in obs-space so callers can keep ``obs_mode="state_dict"`` for the
    synthesised-channel injection (``force_torque``, ``tcp_pose``, ...)
    the OM-hetero condition needs while still satisfying the trainer's
    1-D ``state`` contract.

    Pass-through for OM conditions (``is_om_condition=True`` — those go
    through :class:`Stage1OMChannelFilter` instead) and for envs whose
    ``condition_id`` is missing or unrecognised (Tier-1 safety: keeps
    the wrapper a no-op for envs that don't satisfy the Stage-1b
    ``condition_id`` contract).

    Args:
        env: Inner env exposing a ``condition_id`` attribute.

    Raises:
        TypeError: If ``env.observation_space`` is not a
            :class:`gym.spaces.Dict`.
    """

    def __init__(self, env: gym.Env[Any, Any]) -> None:
        """Detect AS-condition envs at construction (ADR-007 §Stage 1b)."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            msg = (
                "Stage1ASStateSynthesizer requires a gym.spaces.Dict observation "
                f"space; got {type(env.observation_space).__name__}."
            )
            raise TypeError(msg)
        condition_id = getattr(env, "condition_id", None)
        if condition_id is None and hasattr(env, "get_wrapper_attr"):
            try:
                condition_id = env.get_wrapper_attr("condition_id")
            except AttributeError:
                condition_id = None
        # Local import to avoid the chamber.envs.stage1_pickplace ↔
        # chamber.envs.stage1_obs_filter circular import that arises now
        # that make_stage1_pickplace_env applies this wrapper.
        from chamber.envs.stage1_pickplace import resolve_condition

        config = None
        if isinstance(condition_id, str):
            try:
                config = resolve_condition(condition_id)
            except ValueError:
                config = None
        self._active: bool = config is not None and not config.is_om_condition
        if self._active:
            self.observation_space = self._build_observation_space(env.observation_space)

    @staticmethod
    def _build_observation_space(inner: gym.spaces.Dict) -> gym.spaces.Dict:
        """Inject a 1-D ``state`` Box per agent under ``obs["agent"][uid]``.

        The injected Box's shape is ``(qpos_dim + qvel_dim,)`` —
        flattened to 1-D so :attr:`gym.spaces.Box.shape[0]` matches the
        ``obs_dim`` HARL's MLPBase reads at construction.
        """
        agent = inner.spaces.get("agent")
        if not isinstance(agent, gym.spaces.Dict):
            return inner
        new_agent_spaces: dict[str, gym.spaces.Space[Any]] = {}
        for uid, sub in agent.spaces.items():
            if isinstance(sub, gym.spaces.Dict):
                qpos = sub.spaces.get("qpos")
                qvel = sub.spaces.get("qvel")
                if isinstance(qpos, gym.spaces.Box) and isinstance(qvel, gym.spaces.Box):
                    state_dim = int(np.prod(qpos.shape)) + int(np.prod(qvel.shape))
                    state_box = gym.spaces.Box(
                        low=-np.inf,
                        high=np.inf,
                        shape=(state_dim,),
                        dtype=np.float32,
                    )
                    augmented = dict(sub.spaces)
                    augmented["state"] = state_box
                    new_agent_spaces[uid] = gym.spaces.Dict(augmented)
                    continue
            new_agent_spaces[uid] = sub
        new_spaces = dict(inner.spaces)
        new_spaces["agent"] = gym.spaces.Dict(new_agent_spaces)
        return gym.spaces.Dict(new_spaces)

    def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
        """Inject synthesised ``state`` per agent (ADR-007 §Stage 1b)."""
        if not self._active:
            return observation
        agent = observation.get("agent")
        if not isinstance(agent, dict):
            return observation
        new_agent: dict[str, Any] = {}
        for uid, sub in agent.items():
            if isinstance(sub, dict) and "qpos" in sub and "qvel" in sub:
                augmented = dict(sub)
                augmented["state"] = _flat_state_concat(sub["qpos"], sub["qvel"])
                new_agent[uid] = augmented
            else:
                new_agent[uid] = sub
        out = dict(observation)
        out["agent"] = new_agent
        return out

    def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors gym.Wrapper's inherited __getattr__ signature
        """Forward attribute access to the inner env (Tier-2 caller contract).

        Gymnasium 1.3 removed :class:`gym.Wrapper`'s implicit
        ``__getattr__``; existing P1.03 callers of
        :func:`make_stage1_pickplace_env` read SAPIEN-env attributes
        directly (``env.agent``, ``env._jacobian_provider``,
        ``env.condition_config``...). Forwarding here keeps that
        contract without forcing every caller to switch to
        :meth:`gym.Wrapper.get_wrapper_attr`.

        ``__getattr__`` only fires when normal attribute lookup misses;
        ``self.env`` is set by :meth:`gym.Wrapper.__init__` before any
        downstream access, so the delegation is safe. The ``env``-name
        guard is a defensive backstop against recursive lookup on the
        env attr itself (only reachable in pathological half-init
        paths).
        """
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

__getattr__

__getattr__(name: str) -> Any

Forward attribute access to the inner env (Tier-2 caller contract).

Gymnasium 1.3 removed gym.Wrapper's implicit __getattr__. Existing P1.03 callers of make_stage1_pickplace_env read SAPIEN-env attributes directly (env.agent, env._jacobian_provider, env.condition_config, ...). Forwarding here keeps that contract without forcing every caller to switch to gym.Wrapper.get_wrapper_attr.

__getattr__ only fires when normal attribute lookup misses. gym.Wrapper.__init__ sets self.env before any downstream access, so the delegation is safe. The env-name guard is a defensive backstop against recursive lookup on the env attribute itself, which is reachable only on pathological half-initialised paths.
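The forwarding pattern can be reproduced with plain Python classes to show why the env-name guard matters. In this sketch, ForwardingWrapper and Inner are hypothetical stand-ins and not gym.Wrapper subclasses. Consider an instance created with __new__ so that __init__ never ran. Looking up self.env misses and re-enters __getattr__ with name "env". The guard turns what would otherwise be unbounded recursion (a RecursionError) into a clean AttributeError.

```python
from typing import Any


class Inner:
    def __init__(self) -> None:
        self.agent = "inner-agent"


class ForwardingWrapper:
    """Hypothetical stand-in for a wrapper that forwards missing attributes."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup on the wrapper itself fails.
        if name == "env":
            # Half-initialised instance: fail cleanly instead of recursing.
            raise AttributeError(name)
        return getattr(self.env, name)


w = ForwardingWrapper(Inner())
print(w.agent)  # inner-agent

half = ForwardingWrapper.__new__(ForwardingWrapper)  # __init__ never ran
try:
    getattr(half, "anything")
except AttributeError as exc:
    caught = repr(exc)
print(caught)  # AttributeError('env')
```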

Source code in src/chamber/envs/stage1_obs_filter.py
def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors gym.Wrapper's inherited __getattr__ signature
    """Forward attribute access to the inner env (Tier-2 caller contract).

    Gymnasium 1.3 removed :class:`gym.Wrapper`'s implicit
    ``__getattr__``; existing P1.03 callers of
    :func:`make_stage1_pickplace_env` read SAPIEN-env attributes
    directly (``env.agent``, ``env._jacobian_provider``,
    ``env.condition_config``...). Forwarding here keeps that
    contract without forcing every caller to switch to
    :meth:`gym.Wrapper.get_wrapper_attr`.

    ``__getattr__`` only fires when normal attribute lookup misses;
    ``self.env`` is set by :meth:`gym.Wrapper.__init__` before any
    downstream access, so the delegation is safe. The ``env``-name
    guard is a defensive backstop against recursive lookup on the
    env attr itself (only reachable in pathological half-init
    paths).
    """
    if name == "env":
        raise AttributeError(name)
    return getattr(self.env, name)

__init__

__init__(env: Env[Any, Any]) -> None

Detect AS-condition envs at construction (ADR-007 §Stage 1b).
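The activation rule in __init__ reduces to a small pure function. The sketch below assumes a toy condition table: ToyConditionConfig, _TOY_TABLE, and toy_resolve_condition are all hypothetical, and the real ids and resolve_condition live in chamber.envs.stage1_pickplace. It reproduces the decision rule. The synthesizer is active only for a known, non-OM condition. A missing, non-string, or unrecognised condition_id leaves the wrapper as a no-op.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyConditionConfig:
    """Hypothetical stand-in for the real condition config (only the field used here)."""

    is_om_condition: bool


# Hypothetical table; real condition ids are defined in chamber.envs.stage1_pickplace.
_TOY_TABLE = {
    "toy-as": ToyConditionConfig(is_om_condition=False),
    "toy-om": ToyConditionConfig(is_om_condition=True),
}


def toy_resolve_condition(condition_id: str) -> ToyConditionConfig:
    """Mimics the documented contract: unknown ids raise ValueError."""
    try:
        return _TOY_TABLE[condition_id]
    except KeyError:
        raise ValueError(f"unknown condition_id {condition_id!r}") from None


def synthesizer_active(condition_id: Any) -> bool:
    """Same decision rule as __init__: active only for a known, non-OM condition."""
    config = None
    if isinstance(condition_id, str):
        try:
            config = toy_resolve_condition(condition_id)
        except ValueError:
            config = None  # unknown id -> pass-through (no-op wrapper)
    return config is not None and not config.is_om_condition


for cid in ("toy-as", "toy-om", "not-a-condition", None):
    print(cid, synthesizer_active(cid))
```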

Source code in src/chamber/envs/stage1_obs_filter.py
def __init__(self, env: gym.Env[Any, Any]) -> None:
    """Detect AS-condition envs at construction (ADR-007 §Stage 1b)."""
    super().__init__(env)
    if not isinstance(env.observation_space, gym.spaces.Dict):
        msg = (
            "Stage1ASStateSynthesizer requires a gym.spaces.Dict observation "
            f"space; got {type(env.observation_space).__name__}."
        )
        raise TypeError(msg)
    condition_id = getattr(env, "condition_id", None)
    if condition_id is None and hasattr(env, "get_wrapper_attr"):
        try:
            condition_id = env.get_wrapper_attr("condition_id")
        except AttributeError:
            condition_id = None
    # Local import to avoid the chamber.envs.stage1_pickplace ↔
    # chamber.envs.stage1_obs_filter circular import that arises now
    # that make_stage1_pickplace_env applies this wrapper.
    from chamber.envs.stage1_pickplace import resolve_condition

    config = None
    if isinstance(condition_id, str):
        try:
            config = resolve_condition(condition_id)
        except ValueError:
            config = None
    self._active: bool = config is not None and not config.is_om_condition
    if self._active:
        self.observation_space = self._build_observation_space(env.observation_space)

observation

observation(observation: dict[str, Any]) -> dict[str, Any]

Inject synthesised state per agent (ADR-007 §Stage 1b).

Source code in src/chamber/envs/stage1_obs_filter.py
def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
    """Inject synthesised ``state`` per agent (ADR-007 §Stage 1b)."""
    if not self._active:
        return observation
    agent = observation.get("agent")
    if not isinstance(agent, dict):
        return observation
    new_agent: dict[str, Any] = {}
    for uid, sub in agent.items():
        if isinstance(sub, dict) and "qpos" in sub and "qvel" in sub:
            augmented = dict(sub)
            augmented["state"] = _flat_state_concat(sub["qpos"], sub["qvel"])
            new_agent[uid] = augmented
        else:
            new_agent[uid] = sub
    out = dict(observation)
    out["agent"] = new_agent
    return out

Stage1OMChannelFilter

Bases: ObservationWrapper

OM-axis per-condition channel filter (ADR-007 §Stage 1b).

Reads the inner env's condition_id at construction time and captures the resulting filter policy. The wrapper does not re-read condition_id on every step — Stage-1b envs are built once per (seed, condition) cell (see stage1_as.py:253-275) and the condition is fixed for the env's lifetime.

Parameters:

  • env (Env[Any, Any]) –

    Inner env exposing a condition_id attribute. The chamber.envs.stage1_pickplace.Stage1PickPlaceEnv class satisfies this; the wrapper falls back to a pass-through identity if condition_id is missing or not one of the two OM strings.

Raises:

  • TypeError

    If the inner env's observation space is not a gym.spaces.Dict (the wrapper relies on dict-shaped obs).

Source code in src/chamber/envs/stage1_obs_filter.py
class Stage1OMChannelFilter(gym.ObservationWrapper):  # type: ignore[type-arg]
    """OM-axis per-condition channel filter (ADR-007 §Stage 1b).

    Reads the inner env's ``condition_id`` at construction time and
    captures the resulting filter policy. The wrapper does not re-read
    ``condition_id`` on every step — Stage-1b envs are built once per
    ``(seed, condition)`` cell (see ``stage1_as.py:253-275``) and the
    condition is fixed for the env's lifetime.

    Args:
        env: Inner env exposing a ``condition_id`` attribute. The
            :class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv`
            class satisfies this; the wrapper falls back to a
            pass-through identity if ``condition_id`` is missing or
            not one of the two OM strings.

    Raises:
        TypeError: If the inner env's observation space is not a
            :class:`gym.spaces.Dict` (the wrapper relies on dict-shaped
            obs).
    """

    def __init__(self, env: gym.Env[Any, Any]) -> None:
        """Store the per-condition filter policy (ADR-007 §Stage 1b)."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            msg = (
                "Stage1OMChannelFilter requires a gym.spaces.Dict observation "
                f"space; got {type(env.observation_space).__name__}."
            )
            raise TypeError(msg)
        # Read condition_id once at construction; the env-per-cell
        # cadence makes this safe and saves a hot-path attr lookup.
        condition_id = getattr(env, "condition_id", None)
        if condition_id is None and hasattr(env, "get_wrapper_attr"):
            try:
                condition_id = env.get_wrapper_attr("condition_id")
            except AttributeError:
                condition_id = None
        self._condition_id: str | None = condition_id
        self._filter_active: bool = condition_id in _FILTERED_CONDITIONS

    @property
    def condition_id(self) -> str | None:
        """ADR-007 §Stage 1b OM axis: the condition_id resolved at construction."""
        return self._condition_id

    def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
        """Apply the per-condition channel mask (ADR-007 §Stage 1b).

        OM-vision-only zero-masks the per-uid agent-proprio sub-dicts
        and the ``obs["extra"]["force_torque"]`` channel; keeps
        ``obs["extra"]["tcp_pose"]`` + ``obs["extra"]["goal_pos"]``
        (the minimal task-relevant fields) plus ``obs["sensor_data"]``
        (RGB-D camera) untouched. Other conditions pass through.
        """
        if not self._filter_active:
            return observation
        out = dict(observation)
        # Zero-mask per-uid agent proprio (the "no proprioception" half
        # of the vision-only condition_id semantics).
        agent = observation.get("agent")
        if isinstance(agent, dict):
            zeroed_agent: dict[str, Any] = {}
            for uid, sub in agent.items():
                if isinstance(sub, dict):
                    zeroed_agent[uid] = {
                        ch: np.zeros_like(val) if hasattr(val, "shape") else val
                        for ch, val in sub.items()
                    }
                else:
                    zeroed_agent[uid] = sub
            out["agent"] = zeroed_agent
        # Zero-mask task extras NOT in the vision-only keep-set (the
        # "no force-torque" half of the vision-only semantics).
        extra = observation.get("extra")
        if isinstance(extra, dict):
            filtered_extra: dict[str, Any] = {}
            for ch, val in extra.items():
                if ch in _OM_VISION_ONLY_EXTRA_KEEP:
                    filtered_extra[ch] = val
                elif hasattr(val, "shape"):
                    filtered_extra[ch] = np.zeros_like(val)
                else:
                    filtered_extra[ch] = val
            out["extra"] = filtered_extra
        return out

    def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors Stage1ASStateSynthesizer's forwarding contract
        """Forward attribute access to the inner env (Tier-2 caller contract).

        Mirrors :meth:`Stage1ASStateSynthesizer.__getattr__`: Gymnasium 1.3
        removed :class:`gym.Wrapper`'s implicit forwarding, so callers of
        :func:`chamber.envs.stage1_pickplace.make_stage1_pickplace_env`
        — which now applies this wrapper on top of the AS synthesizer —
        would otherwise lose access to SAPIEN-env attributes such as
        ``env.agent``, ``env.obs_mode``, ``env.condition_config``.
        """
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

condition_id property

condition_id: str | None

ADR-007 §Stage 1b OM axis: the condition_id resolved at construction.

__getattr__

__getattr__(name: str) -> Any

Forward attribute access to the inner env (Tier-2 caller contract).

Mirrors Stage1ASStateSynthesizer.__getattr__. Gymnasium 1.3 removed gym.Wrapper's implicit forwarding. chamber.envs.stage1_pickplace.make_stage1_pickplace_env now applies this wrapper on top of the AS synthesizer, so without forwarding its callers would lose access to SAPIEN-env attributes such as env.agent, env.obs_mode, and env.condition_config.

Source code in src/chamber/envs/stage1_obs_filter.py
def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors Stage1ASStateSynthesizer's forwarding contract
    """Forward attribute access to the inner env (Tier-2 caller contract).

    Mirrors :meth:`Stage1ASStateSynthesizer.__getattr__`: Gymnasium 1.3
    removed :class:`gym.Wrapper`'s implicit forwarding, so callers of
    :func:`chamber.envs.stage1_pickplace.make_stage1_pickplace_env`
    — which now applies this wrapper on top of the AS synthesizer —
    would otherwise lose access to SAPIEN-env attributes such as
    ``env.agent``, ``env.obs_mode``, ``env.condition_config``.
    """
    if name == "env":
        raise AttributeError(name)
    return getattr(self.env, name)

__init__

__init__(env: Env[Any, Any]) -> None

Store the per-condition filter policy (ADR-007 §Stage 1b).

Source code in src/chamber/envs/stage1_obs_filter.py
def __init__(self, env: gym.Env[Any, Any]) -> None:
    """Store the per-condition filter policy (ADR-007 §Stage 1b)."""
    super().__init__(env)
    if not isinstance(env.observation_space, gym.spaces.Dict):
        msg = (
            "Stage1OMChannelFilter requires a gym.spaces.Dict observation "
            f"space; got {type(env.observation_space).__name__}."
        )
        raise TypeError(msg)
    # Read condition_id once at construction; the env-per-cell
    # cadence makes this safe and saves a hot-path attr lookup.
    condition_id = getattr(env, "condition_id", None)
    if condition_id is None and hasattr(env, "get_wrapper_attr"):
        try:
            condition_id = env.get_wrapper_attr("condition_id")
        except AttributeError:
            condition_id = None
    self._condition_id: str | None = condition_id
    self._filter_active: bool = condition_id in _FILTERED_CONDITIONS

observation

observation(observation: dict[str, Any]) -> dict[str, Any]

Apply the per-condition channel mask (ADR-007 §Stage 1b).

OM-vision-only zero-masks the per-uid agent-proprio sub-dicts and the obs["extra"]["force_torque"] channel; keeps obs["extra"]["tcp_pose"] + obs["extra"]["goal_pos"] (the minimal task-relevant fields) plus obs["sensor_data"] (RGB-D camera) untouched. Other conditions pass through.
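The vision-only mask can be sketched as a standalone function. It assumes the keep-set is exactly the two documented extras (tcp_pose and goal_pos); the real constant is _OM_VISION_ONLY_EXTRA_KEEP and its contents are not shown on this page. vision_only_mask and the channel sizes in the example are illustrative. Every array under "agent" is zeroed, extras outside the keep-set are zeroed, and "sensor_data" is passed through as the same object.

```python
from typing import Any

import numpy as np

# Assumption: mirrors the documented keep-set (tcp_pose + goal_pos).
EXTRA_KEEP = frozenset({"tcp_pose", "goal_pos"})


def vision_only_mask(obs: dict[str, Any]) -> dict[str, Any]:
    out = dict(obs)  # shallow copy; sensor_data and other keys pass through
    agent = obs.get("agent")
    if isinstance(agent, dict):
        # "No proprioception": zero every array-valued channel per uid.
        out["agent"] = {
            uid: (
                {ch: np.zeros_like(v) if hasattr(v, "shape") else v for ch, v in sub.items()}
                if isinstance(sub, dict)
                else sub
            )
            for uid, sub in agent.items()
        }
    extra = obs.get("extra")
    if isinstance(extra, dict):
        # "No force-torque": zero array extras outside the keep-set.
        out["extra"] = {
            ch: v if ch in EXTRA_KEEP or not hasattr(v, "shape") else np.zeros_like(v)
            for ch, v in extra.items()
        }
    return out


obs = {
    "agent": {"panda_wristcam": {"qpos": np.ones(9), "qvel": np.ones(9)}},
    "extra": {"force_torque": np.ones(6), "tcp_pose": np.ones(7), "goal_pos": np.ones(3)},
    "sensor_data": {"rgb": np.ones((4, 4, 3), dtype=np.uint8)},
}
masked = vision_only_mask(obs)
print(masked["extra"]["force_torque"].sum(), masked["extra"]["tcp_pose"].sum())  # 0.0 7.0
```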

Source code in src/chamber/envs/stage1_obs_filter.py
def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
    """Apply the per-condition channel mask (ADR-007 §Stage 1b).

    OM-vision-only zero-masks the per-uid agent-proprio sub-dicts
    and the ``obs["extra"]["force_torque"]`` channel; keeps
    ``obs["extra"]["tcp_pose"]`` + ``obs["extra"]["goal_pos"]``
    (the minimal task-relevant fields) plus ``obs["sensor_data"]``
    (RGB-D camera) untouched. Other conditions pass through.
    """
    if not self._filter_active:
        return observation
    out = dict(observation)
    # Zero-mask per-uid agent proprio (the "no proprioception" half
    # of the vision-only condition_id semantics).
    agent = observation.get("agent")
    if isinstance(agent, dict):
        zeroed_agent: dict[str, Any] = {}
        for uid, sub in agent.items():
            if isinstance(sub, dict):
                zeroed_agent[uid] = {
                    ch: np.zeros_like(val) if hasattr(val, "shape") else val
                    for ch, val in sub.items()
                }
            else:
                zeroed_agent[uid] = sub
        out["agent"] = zeroed_agent
    # Zero-mask task extras NOT in the vision-only keep-set (the
    # "no force-torque" half of the vision-only semantics).
    extra = observation.get("extra")
    if isinstance(extra, dict):
        filtered_extra: dict[str, Any] = {}
        for ch, val in extra.items():
            if ch in _OM_VISION_ONLY_EXTRA_KEEP:
                filtered_extra[ch] = val
            elif hasattr(val, "shape"):
                filtered_extra[ch] = np.zeros_like(val)
            else:
                filtered_extra[ch] = val
        out["extra"] = filtered_extra
    return out

TextureFilterObsWrapper

Bases: ObservationWrapper

Filter observation channels per agent uid, zero-masking unlisted channels.

Tensor shapes are preserved — masked channels become zero arrays of the same shape and dtype. This keeps downstream policy networks oblivious to which modalities a given partner actually receives.

ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.

Parameters:

  • env (Env) –

    Env whose observations are nested dicts of the form {"agent": {uid: {channel: array}}}.

  • keep_per_agent (Mapping[str, Iterable[str]]) –

    Mapping from agent uid to the iterable of channel keys to keep. Channels not listed are zero-masked. Uids absent from this mapping are passed through unchanged.
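The masking rule can be checked without an env by lifting the body of observation() into a standalone function. texture_filter and the uid and channel names in the example are illustrative. The example shows that masked channels keep their shape and dtype, and that a uid with no keep-set is returned as the same object.

```python
from collections.abc import Iterable, Mapping
from typing import Any

import numpy as np


def texture_filter(
    obs: dict[str, Any], keep_per_agent: Mapping[str, Iterable[str]]
) -> dict[str, Any]:
    keep = {uid: frozenset(keys) for uid, keys in keep_per_agent.items()}
    agent_obs = obs.get("agent")
    if not isinstance(agent_obs, dict):
        return obs
    filtered: dict[str, Any] = {}
    for uid, per_agent in agent_obs.items():
        if uid not in keep or not isinstance(per_agent, dict):
            filtered[uid] = per_agent  # uids without a keep-set pass through
            continue
        # Zero unlisted channels; zeros_like preserves shape and dtype.
        filtered[uid] = {
            ch: v if ch in keep[uid] else np.zeros_like(v) for ch, v in per_agent.items()
        }
    return {**obs, "agent": filtered}


obs = {
    "agent": {
        "ego": {"rgb": np.ones((2, 2, 3), np.uint8), "depth": np.ones((2, 2), np.float32)},
        "partner": {"rgb": np.ones((2, 2, 3), np.uint8), "depth": np.ones((2, 2), np.float32)},
    }
}
out = texture_filter(obs, {"partner": ["rgb"]})
depth = out["agent"]["partner"]["depth"]
print(depth.shape, depth.dtype, depth.any())  # (2, 2) float32 False
```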

Source code in src/chamber/envs/texture_filter.py
class TextureFilterObsWrapper(gym.ObservationWrapper):  # type: ignore[type-arg]
    """Filter observation channels per agent uid, zero-masking unlisted channels.

    Tensor shapes are preserved — masked channels become zero arrays of the
    same shape and dtype. This keeps downstream policy networks oblivious to
    which modalities a given partner actually receives.

    ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.

    Args:
        env: Env whose observation has shape ``{"agent": {uid: {channel: array}}}``.
        keep_per_agent: Mapping from agent uid to the iterable of channel keys
            to keep. Channels not listed are zero-masked. Uids absent from this
            mapping are passed through unchanged.
    """

    def __init__(self, env: gym.Env, keep_per_agent: Mapping[str, Iterable[str]]) -> None:  # type: ignore[type-arg]
        """Store per-agent channel keep-sets as frozensets for O(1) lookup."""
        super().__init__(env)
        self._keep: dict[str, frozenset[str]] = {
            uid: frozenset(keys) for uid, keys in keep_per_agent.items()
        }

    def observation(self, observation: dict) -> dict:  # type: ignore[override]
        """Apply per-agent channel mask, zero-filling unlisted channels.

        ADR-001 §Decision item 1: obs modality is a controllable heterogeneity
        axis; shapes are invariant so vector rollout is unaffected.
        """
        agent_obs = observation.get("agent")
        if not isinstance(agent_obs, dict):
            return observation
        filtered_agents: dict[str, dict] = {}
        for uid, per_agent in agent_obs.items():
            if uid not in self._keep or not isinstance(per_agent, dict):
                filtered_agents[uid] = per_agent
                continue
            keep = self._keep[uid]
            filtered_agents[uid] = {
                ch: val if ch in keep else np.zeros_like(val) for ch, val in per_agent.items()
            }
        return {**observation, "agent": filtered_agents}

__init__

__init__(
    env: Env, keep_per_agent: Mapping[str, Iterable[str]]
) -> None

Store per-agent channel keep-sets as frozensets for O(1) lookup.

Source code in src/chamber/envs/texture_filter.py
def __init__(self, env: gym.Env, keep_per_agent: Mapping[str, Iterable[str]]) -> None:  # type: ignore[type-arg]
    """Store per-agent channel keep-sets as frozensets for O(1) lookup."""
    super().__init__(env)
    self._keep: dict[str, frozenset[str]] = {
        uid: frozenset(keys) for uid, keys in keep_per_agent.items()
    }

observation

observation(observation: dict) -> dict

Apply per-agent channel mask, zero-filling unlisted channels.

ADR-001 §Decision item 1: obs modality is a controllable heterogeneity axis; shapes are invariant so vector rollout is unaffected.

Source code in src/chamber/envs/texture_filter.py
def observation(self, observation: dict) -> dict:  # type: ignore[override]
    """Apply per-agent channel mask, zero-filling unlisted channels.

    ADR-001 §Decision item 1: obs modality is a controllable heterogeneity
    axis; shapes are invariant so vector rollout is unaffected.
    """
    agent_obs = observation.get("agent")
    if not isinstance(agent_obs, dict):
        return observation
    filtered_agents: dict[str, dict] = {}
    for uid, per_agent in agent_obs.items():
        if uid not in self._keep or not isinstance(per_agent, dict):
            filtered_agents[uid] = per_agent
            continue
        keep = self._keep[uid]
        filtered_agents[uid] = {
            ch: val if ch in keep else np.zeros_like(val) for ch, val in per_agent.items()
        }
    return {**observation, "agent": filtered_agents}

build_control_models_for_condition

build_control_models_for_condition(
    condition_id: str,
    *,
    jacobian_fn: Callable[[AgentSnapshot], NDArray[float64]]
    | None = None,
    fetch_action_dim: int | None = None,
    panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]

Build per-uid AgentControlModel map for condition_id (ADR-004 §Decision).

Pure Tier-1 function with no SAPIEN dependency. The env plumbs in the Jacobian callable at construction time (see Stage1PickPlaceEnv.build_control_models). Tier-1 tests pass jacobian_fn=None to exercise the model-construction path. The panda models then keep a placeholder Jacobian that fails loudly on use, per the JacobianControlModel contract (concerto/safety/api.py:552).

Parameters:

  • condition_id (str) –

    Stage-1 condition_id (raises if unknown).

  • jacobian_fn (Callable[[AgentSnapshot], NDArray[float64]] | None, default: None ) –

    Optional Jacobian callable to embed in the panda uid's JacobianControlModel. None means the placeholder model is returned (Tier-1 path). Pass the env's closure-via-env-reference callable (lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])) to get a working model for Tier-2 / production.

  • fetch_action_dim (int | None, default: None ) –

    Action dimension of the fetch uid's Box. None (default) is the Tier-1 sentinel: a DoubleIntegratorControlModel with action_dim=2, covering just the differential-drive base axis. The arm, body, and gripper are folded into the same model under the canonical "treat the whole action as a 1-D Cartesian acceleration" approximation, which P1.05.5 will tighten. Tier-2 paths pass the actual env.action_space[uid].shape[0].

  • panda_action_dim (int, default: 8 ) –

    Action dimension of the panda uid's Box. Defaults to 8 (7 arm joint deltas + 1 mimic-gripper joint delta under pd_joint_delta_pos). Both panda_wristcam and panda_partner share this shape.

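Returns:

  • dict[str, AgentControlModel] –

    {uid: AgentControlModel} map keyed by the condition's ConditionConfig.agent_uids. The panda uid (always the ego, always present) gets a JacobianControlModel. The partner uid gets a JacobianControlModel for panda_partner (AS-homo) or a DoubleIntegratorControlModel for fetch (AS-hetero / OM-*).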

Raises:

  • ValueError

    If condition_id is not in the four-condition table (delegated to resolve_condition).
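The dispatch above can be sketched with toy model classes. ToyJacobianModel, ToyDoubleIntegratorModel, and toy_build_models are hypothetical. To stay self-contained, the sketch takes the (ego, partner) agent_uids tuple directly instead of resolving a condition_id. It shows three behaviours. A panda partner gets the same Jacobian-based model as the ego. A fetch partner falls back to a 2-D double integrator unless fetch_action_dim is given. A Jacobian model built with jacobian_fn=None fails loudly when used.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyJacobianModel:
    """Hypothetical stand-in for JacobianControlModel."""

    uid: str
    action_dim: int
    jacobian_fn: Optional[Callable[[Any], Any]] = None

    def jacobian(self, snapshot: Any) -> Any:
        # Placeholder path: fail loudly instead of returning a bogus Jacobian.
        if self.jacobian_fn is None:
            raise RuntimeError(f"{self.uid}: placeholder model, no jacobian_fn wired")
        return self.jacobian_fn(snapshot)


@dataclass
class ToyDoubleIntegratorModel:
    """Hypothetical stand-in for DoubleIntegratorControlModel."""

    uid: str
    action_dim: int


def toy_build_models(
    agent_uids: tuple[str, str],
    *,
    jacobian_fn: Optional[Callable[[Any], Any]] = None,
    fetch_action_dim: Optional[int] = None,
    panda_action_dim: int = 8,
) -> dict[str, Any]:
    ego_uid, partner_uid = agent_uids
    # The ego panda always gets a Jacobian-based model.
    models: dict[str, Any] = {ego_uid: ToyJacobianModel(ego_uid, panda_action_dim, jacobian_fn)}
    if partner_uid == "panda_partner":
        # Second panda: same kinematics, same Jacobian callable.
        models[partner_uid] = ToyJacobianModel(partner_uid, panda_action_dim, jacobian_fn)
    else:
        # Fetch partner: double integrator, 2-D base slice unless told otherwise.
        dim = 2 if fetch_action_dim is None else fetch_action_dim
        models[partner_uid] = ToyDoubleIntegratorModel(partner_uid, dim)
    return models


homo = toy_build_models(("panda_wristcam", "panda_partner"))
hetero = toy_build_models(("panda_wristcam", "fetch"))
try:
    homo["panda_wristcam"].jacobian(None)
except RuntimeError as exc:
    placeholder_error = str(exc)
wired = toy_build_models(("panda_wristcam", "fetch"), jacobian_fn=lambda snap: "J")
print(type(hetero["fetch"]).__name__, hetero["fetch"].action_dim)  # ToyDoubleIntegratorModel 2
print(wired["panda_wristcam"].jacobian(None))  # J
```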

Source code in src/chamber/envs/stage1_pickplace.py
def build_control_models_for_condition(
    condition_id: str,
    *,
    jacobian_fn: Callable[[AgentSnapshot], NDArray[np.float64]] | None = None,
    fetch_action_dim: int | None = None,
    panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]:
    """Build per-uid :class:`AgentControlModel` map for ``condition_id`` (ADR-004 §Decision).

    Pure Tier-1 function — no SAPIEN dependency. The Jacobian callable
    is plumbed in by the env at construction time (see
    :meth:`Stage1PickPlaceEnv.build_control_models`); Tier-1 tests pass
    ``jacobian_fn=None`` to exercise the model-construction path, which
    keeps the placeholder per :class:`JacobianControlModel`'s
    loud-fail-on-use contract (``concerto/safety/api.py:552``).

    Args:
        condition_id: Stage-1 ``condition_id`` (raises if unknown).
        jacobian_fn: Optional Jacobian callable to embed in the panda
            uid's :class:`JacobianControlModel`. ``None`` means the
            placeholder model is returned (Tier-1 path); pass the env's
            closure-via-env-reference callable
            (``lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])``)
            to get a working model for Tier-2 / production.
        fetch_action_dim: Action dimension of the fetch uid's Box.
            ``None`` (default) is the Tier-1 sentinel — :class:`DoubleIntegratorControlModel`
            with ``action_dim=2`` (just the differential-drive base axis;
            the arm + body + gripper are folded into the same model
            with the canonical "treat the whole action as a 1-D
            Cartesian acceleration" approximation P1.05.5 will tighten).
            Tier-2 paths pass the actual ``env.action_space[uid].shape[0]``.
        panda_action_dim: Action dimension of the panda uid's Box.
            Defaults to ``8`` (7 arm joint deltas + 1 mimic-gripper
            joint delta under ``pd_joint_delta_pos``). Both
            ``panda_wristcam`` and ``panda_partner`` share this shape.

    Returns:
        ``{uid: AgentControlModel}`` map keyed by the condition's
        :attr:`ConditionConfig.agent_uids`. The panda uid (always the
        ego, always present) gets a :class:`JacobianControlModel`; the
        partner uid gets a :class:`JacobianControlModel` for
        ``panda_partner`` (AS-homo) or
        :class:`DoubleIntegratorControlModel` for ``fetch`` (AS-hetero /
        OM-*).

    Raises:
        ValueError: If ``condition_id`` is not in the four-condition
            table (delegated to :func:`resolve_condition`).
    """
    config = resolve_condition(condition_id)
    ego_uid, partner_uid = config.agent_uids
    models: dict[str, AgentControlModel] = {}
    # Ego is always panda_wristcam — 7-DOF arm + mimic-gripper.
    # JacobianControlModel position_dim=3 (Cartesian) matches the
    # AgentSnapshot Cartesian-only contract (cbf_qp.py:103-110).
    models[ego_uid] = JacobianControlModel(
        uid=ego_uid,
        action_dim=panda_action_dim,
        position_dim=3,
        jacobian_fn=jacobian_fn,
        max_cartesian_accel_value=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
    )
    if partner_uid == "panda_partner":
        # AS-homo: second panda, identical kinematics, same Jacobian
        # callable wired by the env via the closure pattern.
        models[partner_uid] = JacobianControlModel(
            uid=partner_uid,
            action_dim=panda_action_dim,
            position_dim=3,
            jacobian_fn=jacobian_fn,
            max_cartesian_accel_value=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
        )
    else:
        # AS-hetero / OM-*: fetch base+arm uses DoubleIntegratorControlModel.
        # action_dim default 2 covers the 2-DOF differential-drive base
        # slice that ADR-007 §Stage 1 AS spike framing names; real
        # Tier-2 path passes the full action_space shape.
        effective_action_dim = 2 if fetch_action_dim is None else fetch_action_dim
        models[partner_uid] = DoubleIntegratorControlModel(
            uid=partner_uid,
            action_dim=effective_action_dim,
        )
    return models
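The ``jacobian_fn`` docstring above describes a closure-via-env-reference pattern: the control model is built once, and the closure reads the env's ``_latest_qpos`` cache lazily every time it is called. Below is a minimal sketch of that pattern under stated assumptions. ``ToyEnv``, ``refresh_latest_qpos`` and ``make_jacobian_fn`` are hypothetical stand-ins, the "Jacobian" is a placeholder rather than Panda kinematics, and the explicit guard only approximates the "qpos not yet cached" check the env refers to.

```python
from typing import Callable

import numpy as np


class ToyEnv:
    """Hypothetical stand-in for the env: owns the per-uid qpos cache."""

    def __init__(self) -> None:
        self._latest_qpos: dict[str, np.ndarray] = {}

    def _jacobian_provider(self, snap: object, qpos: np.ndarray) -> np.ndarray:
        # Hypothetical placeholder, NOT Panda kinematics: a (3, 7) matrix
        # whose rows equal qpos, so callers can see which qpos was used.
        del snap
        return np.tile(qpos, (3, 1))

    def refresh_latest_qpos(self, qpos: np.ndarray) -> None:
        # Mirrors the env's reset-time primer and per-physics-step refresh.
        self._latest_qpos["panda_wristcam"] = np.asarray(qpos, dtype=np.float64)


def make_jacobian_fn(env: ToyEnv) -> Callable[[object], np.ndarray]:
    """Build a closure that reads the env's qpos cache at call time."""

    def jacobian_fn(snap: object) -> np.ndarray:
        qpos = env._latest_qpos.get("panda_wristcam")
        if qpos is None:
            # Act-then-step case: the filter ran before any qpos was cached.
            raise RuntimeError("qpos not yet cached")
        return env._jacobian_provider(snap, qpos)

    return jacobian_fn
```

Because the closure holds a reference to the env rather than a copy of the qpos, refreshing the cache is enough for the next call to see the new configuration; the model never has to be rebuilt. This is also why the env primes the cache at reset: without the primer, the first filter call would hit the guard.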

make_stage1_pickplace_env

make_stage1_pickplace_env(
    *,
    condition_id: str,
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
    num_envs: int = 1,
    render_mode: str | None = None,
    render_backend: str | None = None,
    force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]

Build a :class:Stage1PickPlaceEnv instance (ADR-007 §Stage 1b).

Factory entry point. SAPIEN / ManiSkill imports are deferred to the body so python -c "import chamber.envs.stage1_pickplace" works on a Vulkan-less host (Tier-1 contract). The :class:Stage1PickPlaceEnv class itself is defined inside the factory body — same pattern as :func:chamber.benchmarks.stage0_smoke.make_stage0_env.
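The ordering this factory relies on (cheap ``condition_id`` validation first, heavy imports deferred into the body, ``ImportError`` re-raised as a project error, and the env class defined only after its dependencies import) can be sketched as follows. This is a simplified illustration, not the real factory: ``EnvCompatibilityError``, ``_KNOWN_CONDITIONS``, ``resolve_condition_sketch`` and ``make_env_sketch`` are hypothetical names, the condition IDs are placeholders, and ``importlib.import_module`` stands in for the real ManiSkill / SAPIEN import block.

```python
import importlib
from typing import Any


class EnvCompatibilityError(RuntimeError):
    """Hypothetical stand-in for ChamberEnvCompatibilityError."""


# Hypothetical placeholder IDs; the real table holds the four Stage-1 conditions.
_KNOWN_CONDITIONS = frozenset({"cond_a", "cond_b"})


def resolve_condition_sketch(condition_id: str) -> str:
    if condition_id not in _KNOWN_CONDITIONS:
        raise ValueError(f"unknown condition_id {condition_id!r}")
    return condition_id


def make_env_sketch(*, condition_id: str, heavy_module: str = "mani_skill") -> Any:
    # 1. Validate eagerly: a bogus id fails before paying any import cost.
    resolve_condition_sketch(condition_id)
    # 2. Heavy imports live inside the body, so importing the enclosing
    #    module works on hosts where the heavy dependency is unavailable.
    try:
        backend = importlib.import_module(heavy_module)
    except ImportError as exc:
        raise EnvCompatibilityError(
            f"{heavy_module} is not importable in the active venv"
        ) from exc

    # 3. The class is defined only once its dependencies are importable
    #    (in the real factory it subclasses ManiSkill's BaseEnv).
    class SketchEnv:
        def __init__(self) -> None:
            self.condition_id = condition_id
            self.backend = backend

    return SketchEnv()
```

With this ordering an unknown ``condition_id`` raises ``ValueError`` even on a host where the heavy import would fail, which keeps the Tier-1 error path fast and its traceback focused on the real mistake.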

Parameters:

  • condition_id (str) –

    One of the four Stage-1 condition_id strings; raises :class:ValueError on any other input (cf. :func:resolve_condition).

  • episode_length (int, default: DEFAULT_EPISODE_LENGTH ) –

    Truncation horizon, in env ticks. Default :data:DEFAULT_EPISODE_LENGTH (100). See module docstring for the budget rationale.

  • root_seed (int, default: 0 ) –

    Project root seed; routed through :func:concerto.training.seeding.derive_substream to seed the env's private RNG substream (P6 / ADR-002 determinism rule).

  • num_envs (int, default: 1 ) –

    ManiSkill vectorisation count; default 1 (Stage-1b paths are single-env per cell). Higher values are supported but untested.

  • render_mode (str | None, default: None ) –

    ManiSkill render_mode (None, "human", "rgb_array", etc.). None (default) disables rendering for the spike-runner path.

  • render_backend (str | None, default: None ) –

    ManiSkill render_backend ("gpu", "none", …). None (default) is passed to ManiSkill as "gpu", ManiSkill's own default. The Stage-0 smoke env uses a CHAMBER_RENDER_BACKEND env-var probe plus the visual-material strip patch; Stage-1b follows the same pattern but also exposes the backend choice as a constructor kwarg.

  • force_torque_noise_sigma (float, default: SYNTHESISED_FT_NOISE_SIGMA_N ) –

    Standard deviation (sigma) of the zero-mean Gaussian noise added to the synthesised force-torque channel, in Newtons. Default :data:SYNTHESISED_FT_NOISE_SIGMA_N. Override only with an ADR-007 §Revision history entry (the value is load-bearing for the OM gate; see module docstring).
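The ``root_seed`` and ``force_torque_noise_sigma`` parameters work together: the env flattens the two fingertip contact forces into a 6-D vector and adds Gaussian noise drawn from its seeded RNG substream, so identical seeds reproduce identical noisy readings. Below is a minimal sketch of that mechanism, assuming NumPy. ``derive_rng_sketch`` is a hypothetical stand-in for ``derive_substream(...).default_rng()`` (the real derivation may differ), ``synthesise_ft`` is a hypothetical helper, and the sigma used in any example call is illustrative, not the value of ``SYNTHESISED_FT_NOISE_SIGMA_N``.

```python
import hashlib

import numpy as np


def derive_rng_sketch(name: str, root_seed: int) -> np.random.Generator:
    """Hypothetical stand-in for derive_substream(name, root_seed=...).default_rng()."""
    # Hash the substream name to a stable 64-bit integer (Python's hash() is salted).
    name_key = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "little")
    # SeedSequence needs non-negative entropy, so root_seed must be >= 0 here.
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))


def synthesise_ft(forces: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Flatten (N_envs, 2 fingers, 3) contact forces to (N_envs, 6) and add noise."""
    f = np.asarray(forces, dtype=np.float32)
    flat = f.reshape(f.shape[0], -1)  # [fx_l, fy_l, fz_l, fx_r, fy_r, fz_r]
    if sigma > 0.0:
        noise = rng.normal(loc=0.0, scale=sigma, size=flat.shape).astype(np.float32)
        flat = flat + noise
    return flat
```

Two generators built from the same name and ``root_seed`` produce byte-identical noise, while ``sigma=0.0`` returns the raw flattened forces unchanged.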

Returns:

  • Env[Any, Any]

    :class:Stage1PickPlaceEnv instance, ready to call reset(seed=K) on. The action space is a :class:gym.spaces.Dict keyed by the resolved :attr:ConditionConfig.agent_uids.

Raises:

  • ValueError

    If condition_id is unknown (see :func:resolve_condition).

  • ChamberEnvCompatibilityError

    If the ManiSkill / SAPIEN dependencies cannot be imported in the active venv, or if ManiSkill / SAPIEN / Vulkan initialisation fails (CPU-only host without a fallback render_backend="none"; see ADR-001 §Risks).

Source code in src/chamber/envs/stage1_pickplace.py
def make_stage1_pickplace_env(
    *,
    condition_id: str,
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
    num_envs: int = 1,
    render_mode: str | None = None,
    render_backend: str | None = None,
    force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]:
    """Build a :class:`Stage1PickPlaceEnv` instance (ADR-007 §Stage 1b).

    Factory entry point. SAPIEN / ManiSkill imports are deferred to the
    body so ``python -c "import chamber.envs.stage1_pickplace"`` works
    on a Vulkan-less host (Tier-1 contract). The
    :class:`Stage1PickPlaceEnv` class itself is defined inside the
    factory body — same pattern as
    :func:`chamber.benchmarks.stage0_smoke.make_stage0_env`.

    Args:
        condition_id: One of the four Stage-1 ``condition_id`` strings;
            raises :class:`ValueError` on any other input
            (cf. :func:`resolve_condition`).
        episode_length: Truncation horizon, in env ticks. Default
            :data:`DEFAULT_EPISODE_LENGTH` (100). See module docstring
            for the budget rationale.
        root_seed: Project root seed routed to the env's
            :func:`concerto.training.seeding.derive_substream` substream
            for P6 reproducibility (P6 / ADR-002 determinism rule).
        num_envs: ManiSkill vectorisation count; default 1 (Stage-1b
            paths are single-env per cell). Higher values are supported
            but untested.
        render_mode: ManiSkill ``render_mode`` (``None``,
            ``"human"``, ``"rgb_array"``, etc.). ``None`` (default)
            disables rendering for the spike-runner path.
        render_backend: ManiSkill ``render_backend`` (``"gpu"``,
            ``"none"``, …). ``None`` (default) falls through to
            ManiSkill's own default. The Stage-0 smoke env uses
            ``CHAMBER_RENDER_BACKEND`` env-var probe + the
            visual-material strip patch; Stage-1b follows the same
            pattern but exposes the backend choice as a constructor
            kwarg too.
        force_torque_noise_sigma: sigma of the zero-mean Gaussian noise
            added to the synthesised force-torque channel, in Newtons.
            Default :data:`SYNTHESISED_FT_NOISE_SIGMA_N`. Override only
            with an ADR-007 §Revision history entry (the value is
            load-bearing for the OM gate; see module docstring).

    Returns:
        :class:`Stage1PickPlaceEnv` instance, ready to call
        ``reset(seed=K)`` on. The action space is a
        :class:`gym.spaces.Dict` keyed by the resolved
        :attr:`ConditionConfig.agent_uids`.

    Raises:
        ValueError: If ``condition_id`` is unknown (see
            :func:`resolve_condition`).
        ChamberEnvCompatibilityError: If ManiSkill / SAPIEN / Vulkan
            initialisation fails (CPU-only host without a fallback
            ``render_backend="none"``; see ADR-001 §Risks).
    """
    # Validate condition_id eagerly (the inner __init__ will resolve again,
    # but raising here gives the user a useful traceback in the Tier-1
    # bogus-id path without paying the SAPIEN-import cost).
    resolve_condition(condition_id)
    try:
        import mani_skill.envs  # noqa: F401 - registers ManiSkill env IDs
        import sapien
        import torch
        from mani_skill.envs.sapien_env import BaseEnv
        from mani_skill.utils.building import actors
        from mani_skill.utils.scene_builder.table import TableSceneBuilder
        from mani_skill.utils.structs.pose import Pose

        # Trigger PandaPartner registration before BaseEnv reaches the
        # multi-agent build loop. The chamber.agents package's
        # try/except ensures this import is the place where a Tier-1
        # context would have already failed if it was going to.
        import chamber.agents  # noqa: F401 - side-effect import for register_agent
        from chamber.agents.panda_jacobian import (
            CARTESIAN_POSITION_DIM,
            PANDA_ARM_DOF,
            PandaJacobianProvider,
        )
        from chamber.envs._sapien_compat import (
            load_agent_with_bare_uids,
            patch_sapien_urdf_no_visual_material,
        )
    except ImportError as exc:
        raise ChamberEnvCompatibilityError(
            "Stage1PickPlaceEnv requires mani_skill / sapien / pytorch_kinematics "
            "in the active venv. Install per pyproject.toml; see ADR-001 §Risks."
        ) from exc

    if render_backend == "none":
        # Headless host workaround — same idempotent SAPIEN URDF patch
        # the Stage-0 smoke env relies on (ADR-001 §Risks).
        patch_sapien_urdf_no_visual_material()

    class Stage1PickPlaceEnv(BaseEnv):  # type: ignore[misc, valid-type]
        """Two-robot pick-place env (ADR-007 §Stage 1b; ADR-001 §Decision).

        Subclasses :class:`mani_skill.envs.sapien_env.BaseEnv` directly
        (matching :class:`chamber.benchmarks.stage0_smoke._Stage0SmokeEnv`).
        See the module docstring for the wrapper-only-discipline /
        synthesised-FT / closure-via-env-Jacobian / cooperation-gradient
        rationale.

        Implements the four condition_id strings the AS / OM pre-
        registrations name; the obs-mode superset is set per condition
        at constructor time, the per-uid channel filtering happens in
        :class:`chamber.envs.stage1_obs_filter.Stage1OMChannelFilter`
        (kept separate so the AS path doesn't pay the filter cost).
        """

        SUPPORTED_ROBOTS: ClassVar[list[tuple[str, str]]] = [  # type: ignore[assignment]
            ("panda_wristcam", "panda_partner"),
            ("panda_wristcam", "fetch"),
        ]

        def __init__(
            self,
            *,
            condition_id: str,
            episode_length: int,
            root_seed: int,
            num_envs: int,
            render_mode: str | None,
            render_backend: str | None,
            force_torque_noise_sigma: float,
        ) -> None:
            """Build the env (ADR-007 §Stage 1b).

            Stores the condition-resolved metadata, sets up the
            determinism harness, then calls
            ``super().__init__(robot_uids=config.agent_uids, ...)``.
            The Jacobian provider is constructed *after* the super-init
            because it reads the panda's URDF directly (independent of
            SAPIEN scene state); the closure plumbed into
            :meth:`build_control_models` then reads ``self._latest_qpos``
            which the env updates each :meth:`_before_simulation_step`.
            """
            self._condition_config: ConditionConfig = resolve_condition(condition_id)
            self._episode_length: int = int(episode_length)
            self._root_seed: int = int(root_seed)
            self._force_torque_noise_sigma: float = float(force_torque_noise_sigma)
            self._rng: np.random.Generator = derive_substream(
                _SUBSTREAM_NAME, root_seed=self._root_seed
            ).default_rng()
            # Per-uid latest qpos cache for the Jacobian closure; populated
            # before each ``step``. See module docstring on the closure-
            # via-env-reference workaround (decision Q3 mod 3; ADR-004
            # §Open questions / slice P1.05.5).
            self._latest_qpos: dict[str, NDArray[np.float64]] = {}
            self._step_count: int = 0
            # PickCubeEnv-style task parameters; consistent with the
            # upstream defaults so the success / reward shaping path
            # behaves predictably under the multi-agent rig.
            self._cube_half_size: float = 0.02
            self._goal_thresh: float = 0.025
            self._cube_spawn_half_size: float = 0.05
            self._cube_spawn_center: tuple[float, float] = (0.0, 0.0)
            self._max_goal_height: float = 0.3
            # Attributes that ``_initialize_episode`` / ``_before_simulation_step``
            # touch MUST be assigned before ``super().__init__()``: ManiSkill's
            # ``BaseEnv.__init__`` calls ``self.reset(...)`` during super-init
            # (mani_skill/envs/sapien_env.py:327), which invokes
            # ``_initialize_episode`` *and* the post-reset qpos primer below
            # before the subclass body resumes. Subclass attributes written
            # after super-init do not yet exist when those hooks fire.
            self._panda_arm_dof: int = PANDA_ARM_DOF
            # Per-uid cache for the numerical-difference velocity in
            # build_agent_snapshots (D8 of the P1.04.5 design pass).
            # Cleared by _initialize_episode at every reset so the
            # cross-episode position delta doesn't produce a spurious
            # first-step velocity. The first-step velocity defaults to
            # zero on each fresh episode (the agents start at rest by
            # construction).
            self._prev_positions: dict[str, NDArray[np.float64]] = {}
            try:
                super().__init__(
                    robot_uids=self._condition_config.agent_uids,  # type: ignore[arg-type]
                    num_envs=num_envs,
                    obs_mode=self._condition_config.obs_mode,
                    control_mode="pd_joint_delta_pos",
                    render_mode=render_mode,
                    render_backend=render_backend if render_backend is not None else "gpu",
                )
            except RuntimeError as exc:
                raise ChamberEnvCompatibilityError(
                    "SAPIEN/Vulkan initialisation failed during "
                    f"Stage1PickPlaceEnv(condition_id={condition_id!r}) build: "
                    f"{exc}\nSet CHAMBER_RENDER_BACKEND=none (or "
                    'render_backend="none") on CUDA-only hosts; see ADR-001 §Risks.'
                ) from exc
            # Jacobian provider is wristcam-URDF based; the partner
            # (panda_partner) shares the kinematic chain so a single
            # provider serves both panda uids. ``panda_v2.urdf`` and
            # ``panda_v3.urdf`` differ only in the wristcam mount
            # (extrinsic frame); the 7 arm joints + TCP link are
            # identical, so reusing the v3-URDF chain for the partner
            # introduces no error in the linear Jacobian.
            self._jacobian_provider: PandaJacobianProvider = PandaJacobianProvider()
            # Per-uid safety radii (Q3 of the P1.04.5 design pass).
            # Asserted at construction against URDF-envelope minimums.
            self._safety_radii: dict[str, float] = {
                uid: PANDA_SAFETY_RADIUS_M
                if uid in ("panda_wristcam", "panda_partner")
                else FETCH_SAFETY_RADIUS_M
                for uid in self._condition_config.agent_uids
            }
            _validate_safety_radii(self._safety_radii)

        # ----- ManiSkill v3 BaseEnv hooks -----

        def _load_agent(  # type: ignore[override]
            self,
            options: dict[str, Any],
            initial_agent_poses: object = None,
            build_separate: bool = False,
        ) -> None:
            """Multi-robot ``_load_agent`` override with per-uid base poses.

            Delegates to
            :func:`chamber.envs._sapien_compat.load_agent_with_bare_uids`
            for the suffix-strip rebind. Passes an explicit per-agent
            pose list so the two robots don't collide at the table
            origin.
            """
            if initial_agent_poses is None:
                poses: list[object] = []
                for uid in self.robot_uids:  # type: ignore[union-attr]
                    xyz = _AGENT_BASE_POSE_XYZ.get(uid)
                    if xyz is None:
                        # Unknown uid (shouldn't happen given the
                        # condition table) — leave SAPIEN to use URDF
                        # default root pose.
                        poses.append(None)
                    else:
                        poses.append(sapien.Pose(p=list(xyz)))
                initial_agent_poses = poses
            load_agent_with_bare_uids(
                self,
                options,
                initial_agent_poses=initial_agent_poses,
                build_separate=build_separate,
            )

        def _load_scene(self, options: dict[str, Any]) -> None:
            """Build the table + cube + goal site (PickCubeEnv recipe; ADR-007 §Stage 1b)."""
            del options
            self.table_scene = TableSceneBuilder(self, robot_init_qpos_noise=0.02)
            self.table_scene.build()
            self.cube = actors.build_cube(
                self.scene,
                half_size=self._cube_half_size,
                color=[1, 0, 0, 1],
                name="cube",
                initial_pose=sapien.Pose(p=[0, 0, self._cube_half_size]),
            )
            self.goal_site = actors.build_sphere(
                self.scene,
                radius=self._goal_thresh,
                color=[0, 1, 0, 1],
                name="goal_site",
                body_type="kinematic",
                add_collision=False,
                initial_pose=sapien.Pose(),
            )
            self._hidden_objects.append(self.goal_site)  # type: ignore[attr-defined]

        def _initialize_episode(self, env_idx: torch.Tensor, options: dict[str, Any]) -> None:
            """Reset cube + goal poses per episode (P6 determinism via :attr:`_rng`)."""
            del options
            # Clear the per-agent position cache so build_agent_snapshots'
            # first-step velocity (numerical-difference) defaults to
            # zero on each fresh episode rather than diffing against
            # the previous episode's final position (R2 of the P1.04.5
            # risk register).
            self._prev_positions.clear()
            with torch.device(self.device):  # type: ignore[attr-defined]
                b = len(env_idx)
                self.table_scene.initialize(env_idx)
                # Cube xy uniform in the spawn box; z at half-size so it
                # rests on the table surface. Routed through the env's
                # P6-harnessed RNG instead of torch.rand so two
                # reset(seed=K) calls reproduce byte-identical state
                # (P6 / ADR-002 determinism rule).
                xy_cube_np = self._rng.uniform(
                    -self._cube_spawn_half_size, self._cube_spawn_half_size, size=(b, 2)
                )
                xy_cube = torch.from_numpy(np.asarray(xy_cube_np, dtype=np.float32))
                xyz = torch.zeros((b, 3))
                xyz[:, 0] = xy_cube[:, 0] + self._cube_spawn_center[0]
                xyz[:, 1] = xy_cube[:, 1] + self._cube_spawn_center[1]
                xyz[:, 2] = self._cube_half_size
                # No rotation randomisation in P1.03 — upstream PickCubeEnv
                # uses ``randomization.random_quaternions(b, lock_x=True,
                # lock_y=True)``; for the two-robot env the orientation
                # matters less than the position. Re-enable if pilot data
                # shows it changes the AS gap.
                self.cube.set_pose(Pose.create_from_pq(xyz))

                # Goal xy in the same envelope; z uniform in [0, max] above
                # the cube's resting height (offset by xyz[:, 2] below).
                xy_goal_np = self._rng.uniform(
                    -self._cube_spawn_half_size, self._cube_spawn_half_size, size=(b, 2)
                )
                z_goal_np = self._rng.uniform(0.0, self._max_goal_height, size=(b,))
                xy_goal = torch.from_numpy(np.asarray(xy_goal_np, dtype=np.float32))
                z_goal = torch.from_numpy(np.asarray(z_goal_np, dtype=np.float32))
                goal_xyz = torch.zeros((b, 3))
                goal_xyz[:, 0] = xy_goal[:, 0] + self._cube_spawn_center[0]
                goal_xyz[:, 1] = xy_goal[:, 1] + self._cube_spawn_center[1]
                goal_xyz[:, 2] = z_goal + xyz[:, 2]
                self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))
            # Prime the qpos cache so the Jacobian closure has a valid
            # value at the first safety-filter call. The training loop
            # in ``concerto.training.ego_aht.train`` queries the filter
            # *before* the first ``env.step()`` (act-then-step order),
            # so the per-physics-step ``_before_simulation_step`` hook
            # has not yet fired after a reset. Without this primer the
            # closure raises the "qpos not yet cached" guard.
            self._refresh_latest_qpos()

        # ----- Observation / reward / success (panda-routed; ADR-007 §Stage 1b) -----

        @property
        def _panda_agent(self) -> Any:  # noqa: ANN401 - ManiSkill Panda agent has no project type
            """Return the panda_wristcam :class:`Panda` agent instance (ego).

            Routes through ``self.agent.agents_dict`` instead of the
            single-agent ``self.agent`` attribute upstream PickCubeEnv
            relies on. Used by :meth:`evaluate` and
            :meth:`compute_normalized_dense_reward` so the success
            predicates and reward shaping respect the multi-agent rig.
            """
            return self.agent.agents_dict["panda_wristcam"]  # type: ignore[attr-defined, no-any-return]

        def _get_obs_extra(self, info: dict[str, Any]) -> dict[str, Any]:
            """Inject task obs + synthesised force-torque (ADR-007 §Stage 1b).

            Extras include:

            - ``tcp_pose`` — panda end-effector pose (mirrors upstream
              PickCubeEnv).
            - ``goal_pos`` — the goal site's xyz position.
            - ``cube_pose`` / ``cube_to_tcp_pos`` / ``cube_to_goal_pos``
              — present only when ``"state"`` is in the env's
              ``obs_mode`` (matches upstream PickCubeEnv's branch).
            - **``force_torque`` (this env's contribution)** — a 6-D
              Cartesian force vector per panda fingertip
              (``[fx_l, fy_l, fz_l, fx_r, fy_r, fz_r]``), synthesised
              from
              :meth:`PhysxArticulation.get_net_contact_forces([finger_links])`
              plus zero-mean sigma=:data:`SYNTHESISED_FT_NOISE_SIGMA_N`
              Gaussian noise routed through the env's P6-harnessed RNG.
              **Not** a real Panda wrist FT sensor reading; see module
              docstring and ADR-007 §Open questions for the wrist-mount
              ablation deferral.
            """
            del info
            panda = self._panda_agent
            obs: dict[str, Any] = {
                "tcp_pose": panda.tcp_pose.raw_pose,
                "goal_pos": self.goal_site.pose.p,
            }
            if "state" in self.obs_mode:  # type: ignore[operator]
                obs["cube_pose"] = self.cube.pose.raw_pose
                obs["cube_to_tcp_pos"] = self.cube.pose.p - panda.tcp_pose.p
                obs["cube_to_goal_pos"] = self.goal_site.pose.p - self.cube.pose.p
            # Synthesised FT: always injected so the OM channel-filter
            # wrapper has something to keep/zero per condition. The
            # AS conditions' state_dict obs_mode ignores it downstream.
            obs["force_torque"] = self._compute_synthesised_force_torque()
            return obs

        def _compute_synthesised_force_torque(self) -> NDArray[np.float32]:
            """Synthesise the 6-D fingertip-contact FT vector (ADR-007 §Stage 1b)."""
            panda = self._panda_agent
            fingers = ["panda_leftfinger", "panda_rightfinger"]
            try:
                # ``get_net_contact_forces`` returns shape ``(N_envs, n_links, 3)``
                # (Cartesian force per link) at SAPIEN's contact-manager
                # boundary (mani_skill/utils/structs/articulation.py:490).
                forces = panda.robot.get_net_contact_forces(fingers)
            except (RuntimeError, KeyError):
                # Finger links not yet registered (pre-reset state):
                # return zeros so the wrapper-chain shape stays stable.
                # The (1, 6) shape assumes num_envs == 1.
                return np.zeros((1, 6), dtype=np.float32)
            f_np = np.asarray(forces.detach().cpu(), dtype=np.float32)  # type: ignore[union-attr]
            # Flatten the per-link Cartesian force into a 6-D vector;
            # add zero-mean Gaussian noise from the P6-harnessed RNG so
            # the OM-hetero gate's signal-to-noise lands at the
            # documented Franka-spec lower bound.
            flat = f_np.reshape(f_np.shape[0], -1)  # (N_envs, 6)
            if self._force_torque_noise_sigma > 0.0:
                noise = self._rng.normal(
                    loc=0.0,
                    scale=self._force_torque_noise_sigma,
                    size=flat.shape,
                ).astype(np.float32)
                flat = flat + noise
            return flat

        def evaluate(self) -> dict[str, Any]:
            """ADR-007 §Stage 1b: multi-agent success predicate (panda-routed)."""
            panda = self._panda_agent
            is_obj_placed = (
                torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1)
                <= self._goal_thresh
            )
            is_grasped = panda.is_grasping(self.cube)
            is_robot_static = panda.is_static(0.2)
            return {
                "success": is_obj_placed & is_robot_static,
                "is_obj_placed": is_obj_placed,
                "is_robot_static": is_robot_static,
                "is_grasped": is_grasped,
            }

        def compute_normalized_dense_reward(
            self,
            obs: Any,  # noqa: ANN401 - ManiSkill obs is an unconstrained dict-of-tensors
            action: Any,  # noqa: ANN401 - ManiSkill action is an unconstrained dict-of-tensors
            info: dict[str, Any],
        ) -> Any:  # noqa: ANN401 - upstream returns torch tensor, no project type
            """ADR-007 §Stage 1b: multi-agent normalised dense reward (panda-routed).

            Matches the upstream ``PickCubeEnv.compute_dense_reward`` shape
            (reaching + grasp + place * grasp + static * placed, with the
            reward of successful envs overwritten to 5) divided by 5, so the
            result lies in [0, 1]. The reward signal is panda-centric (the panda
            does the picking); the fetch / partner is graded implicitly
            through whether it gets out of the way / cooperates on
            placement — explicit fetch-side reward shaping is a P1.04
            follow-up.
            """
            del obs, action
            panda = self._panda_agent
            tcp_to_obj_dist = torch.linalg.norm(self.cube.pose.p - panda.tcp_pose.p, axis=1)
            reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
            reward = reaching_reward
            is_grasped = info["is_grasped"]
            reward = reward + is_grasped
            obj_to_goal_dist = torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1)
            place_reward = 1 - torch.tanh(5 * obj_to_goal_dist)
            reward = reward + place_reward * is_grasped
            qvel = panda.robot.get_qvel()
            qvel = qvel[..., :-2]  # drop the two gripper-finger DOF
            static_reward = 1 - torch.tanh(5 * torch.linalg.norm(qvel, axis=1))
            reward = reward + static_reward * info["is_obj_placed"]
            reward[info["success"]] = 5
            return reward / 5

        # ----- Per-step qpos refresh (Jacobian closure feed) -----

        def _refresh_latest_qpos(self) -> None:
            """Snapshot every panda uid's arm qpos into :attr:`_latest_qpos`.

            P1.04.5; ADR-007 §Stage 1b. Called from both
            :meth:`_before_simulation_step` (each physics step) and
            :meth:`_initialize_episode` (post-reset primer). The latter
            primes the cache between ``env.reset()`` and the first
            ``env.step()`` so the safety filter — which the training
            loop runs *before* the first step — can read a valid qpos
            without raising the "qpos not yet cached" guard.
            """
            for uid in self.robot_uids:  # type: ignore[union-attr]
                if uid in ("panda_wristcam", "panda_partner"):
                    agent = self.agent.agents_dict[uid]  # type: ignore[attr-defined]
                    full_qpos = agent.robot.get_qpos()
                    # Slice off the 2 gripper-finger joints; the
                    # Jacobian chain to panda_hand_tcp covers the 7 arm
                    # joints only.
                    qpos_arm = np.asarray(
                        full_qpos.detach().cpu(),
                        dtype=np.float64,  # type: ignore[union-attr]
                    ).reshape(-1)[: self._panda_arm_dof]
                    self._latest_qpos[uid] = qpos_arm

        def _before_simulation_step(self) -> None:
            """Cache panda arm qpos before each physics step (Jacobian closure feed).

            The :class:`JacobianControlModel` callable wired by
            :meth:`build_control_models` reads
            ``self._latest_qpos["panda_wristcam"]`` to compute the
            Jacobian at the current configuration. ManiSkill's
            :class:`BaseEnv` calls this hook once per physics step
            before SAPIEN advances; updating the cache here gives the
            closure access to the most recent qpos at the next safety-
            filter invocation. See module docstring on the closure-
            via-env-reference workaround (P1.05.5 contingent
            follow-up).
            """
            super()._before_simulation_step()
            self._refresh_latest_qpos()

        # ----- Public API for the safety-stack consumer -----

        @property
        def condition_id(self) -> str:
            """ADR-007 §Stage 1b: the :attr:`ConditionConfig.condition_id` this env was built for."""  # noqa: E501
            return self._condition_config.condition_id

        @property
        def condition_config(self) -> ConditionConfig:
            """The resolved :class:`ConditionConfig` for read-only consumers (ADR-007 §Stage 1b)."""
            return self._condition_config

        @property
        def ego_uid(self) -> str:
            """ADR-007 §Stage 1b: the ego uid for the active condition.

            First element of :attr:`ConditionConfig.agent_uids` by the
            Phase-0 ``_CONDITION_UIDS`` convention. Consumed by the
            Phase-1 :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            (P1.04) so the factory can read the ego uid from the env
            rather than from a side-table — single source of truth at
            the env layer.
            """
            return self._condition_config.agent_uids[0]

        @property
        def partner_uid(self) -> str:
            """ADR-007 §Stage 1b: the partner uid for the active condition.

            Second element of :attr:`ConditionConfig.agent_uids` (the
            ego is the first; the partner is the second by the Phase-0
            ``_CONDITION_UIDS`` convention from
            :mod:`chamber.benchmarks.stage1_as` /
            :mod:`chamber.benchmarks.stage1_om`). Consumed by the
            Phase-1 :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            (P1.04) to route the per-condition partner-stack dispatch
            (``panda_partner`` for AS-homo; ``fetch`` for AS-hetero /
            OM-*). Single source of truth at the env layer — future
            Stage-2/3 envs (P1.06+) follow the same property contract
            without needing a factory-side dispatch table.
            """
            return self._condition_config.agent_uids[1]

        def build_control_models(self) -> Mapping[str, AgentControlModel]:
            """Return per-uid :class:`AgentControlModel` map (ADR-004 §Decision).

            Plumbed in by
            :func:`chamber.benchmarks.training_runner.build_control_models`
            via the ``getattr(env, "build_control_models", None)``
            delegation route (P1.03 / Task B).

            The Jacobian callable is the closure-via-env-reference
            workaround documented in the module docstring: it captures
            ``self`` and reads ``self._latest_qpos["panda_wristcam"]``
            at call time. The closure stays valid across the 20
            evaluation episodes within each ``(seed, condition)`` cell
            because
            :func:`chamber.benchmarks.stage1_as._run_axis_with_factories`
            builds the env once per cell and reuses it
            (``stage1_as.py:253-275``).

            For the AS-homo condition both panda uids share the same
            Jacobian provider (their kinematic chains are identical;
            see :class:`chamber.agents.panda_jacobian.PandaJacobianProvider`'s
            choice of URDF). The closure resolves to the correct uid's
            latest qpos at call time.
            """
            # Source the panda action dim from the action space (same
            # path the fetch dim is sourced from below). Falls back to
            # 8 — the ``pd_joint_delta_pos`` default (7 arm + 1 mimic-
            # gripper). The closure captures this value once at
            # build-time, which is correct: action-space shape is fixed
            # over an env's lifetime.
            panda_action_dim = 8
            action_space = self.action_space
            if isinstance(action_space, gym.spaces.Dict):
                sub_panda = action_space.spaces.get("panda_wristcam")
                if isinstance(sub_panda, gym.spaces.Box) and sub_panda.shape is not None:
                    panda_action_dim = int(sub_panda.shape[0])

            def _panda_jacobian(snap: AgentSnapshot) -> NDArray[np.float64]:
                # Pull the live qpos from the env's cache; the closure
                # captures self by reference so the latest pre-step
                # value is always used.
                qpos = self._latest_qpos.get("panda_wristcam")
                if qpos is None:
                    msg = (
                        "Stage1PickPlaceEnv: panda_wristcam qpos not yet cached. "
                        "The JacobianControlModel closure was invoked before "
                        "the first env.step() — reset the env first."
                    )
                    raise RuntimeError(msg)
                # PandaJacobianProvider returns the linear (3, 7) Jacobian
                # for the 7-joint chain to ``panda_hand_tcp``. The
                # ``pd_joint_delta_pos`` control mode's panda action is
                # 8-D (7 arm deltas + 1 mimic-gripper delta — see the
                # ``panda_action_dim=8`` default in
                # :func:`build_control_models_for_condition`). Pad with
                # a zero column so the JacobianControlModel matmul
                # ``jac @ action`` composes: the mimic-gripper joint is
                # a sibling of ``panda_hand``, not in the TCP kinematic
                # chain, so its contribution to TCP Cartesian position
                # is zero by construction.
                jac_3x7 = self._jacobian_provider(snap, qpos)
                jac_padded = np.zeros((CARTESIAN_POSITION_DIM, panda_action_dim), dtype=np.float64)
                jac_padded[:, : self._panda_arm_dof] = jac_3x7
                return jac_padded

            fetch_uid_dim: int | None = None
            partner_uid = self._condition_config.agent_uids[1]
            if partner_uid == "fetch" and isinstance(action_space, gym.spaces.Dict):
                sub = action_space.spaces.get(partner_uid)
                if isinstance(sub, gym.spaces.Box) and sub.shape is not None:
                    fetch_uid_dim = int(sub.shape[0])
            return build_control_models_for_condition(
                self._condition_config.condition_id,
                jacobian_fn=_panda_jacobian,
                fetch_action_dim=fetch_uid_dim,
                panda_action_dim=panda_action_dim,
            )

        # ----- P1.04.5: safety-stack snapshot + bounds wiring -----

        def build_agent_snapshots(self) -> dict[str, AgentSnapshot]:
            """Build per-uid Cartesian :class:`AgentSnapshot` map (P1.04.5; ADR-007 §Stage 1b).

            Reads each agent's Cartesian position (TCP pose for the
            pandas; base-link pose for the fetch) and computes velocity
            by numerical differencing against the cached
            :attr:`_prev_positions` from the previous step. First-step
            velocity defaults to zero on every fresh episode (the
            agents start at rest by construction; the cache is cleared
            in :meth:`_initialize_episode`).

            Used by the P1.04.5 safety-stack integration: the
            :func:`concerto.training.ego_aht.train` loop calls this
            once per step (when the safety filter is wired) to build
            the snapshot map the CBF-QP outer filter consumes via the
            ``partner_predicted_states`` kwarg + the conformal update's
            ``snaps_now`` / ``snaps_prev`` pair.

            Returns:
                ``{uid: AgentSnapshot(position, velocity, radius)}`` for
                every uid in :attr:`ConditionConfig.agent_uids`. Position
                is the 3-D Cartesian centre (panda TCP for the pandas;
                fetch base origin for the fetch); velocity is the
                numerical difference against the previous call (zero on
                first call after :meth:`_initialize_episode`); radius
                is from :data:`PANDA_SAFETY_RADIUS_M` /
                :data:`FETCH_SAFETY_RADIUS_M` (validated at env
                construction by :func:`_validate_safety_radii`).
            """
            snapshots: dict[str, AgentSnapshot] = {}
            for uid in self._condition_config.agent_uids:
                agent = self.agent.agents_dict[uid]  # type: ignore[attr-defined]
                if uid in ("panda_wristcam", "panda_partner"):
                    # Panda: TCP pose. Squeeze the (num_envs=1, 3)
                    # batch dim to a flat 3-vector for the AgentSnapshot.
                    pos_raw = agent.tcp_pose.p
                else:
                    # Fetch: base-link pose (the robot's root pose).
                    pos_raw = agent.robot.get_pose().p
                position = np.asarray(
                    pos_raw.detach().cpu(),
                    dtype=np.float64,  # type: ignore[union-attr]
                ).reshape(-1)
                # ManiSkill emits (num_envs, 3) for each agent pose; the
                # reshape(-1) above flattens (1, 3) -> (3,) for num_envs=1,
                # but multi-env runs would flatten (num_envs, 3) to
                # (3*num_envs,). The Stage-1b cell construction pins
                # num_envs=1 (factory contract); the slice below is a
                # defensive guard against future multi-env builds that
                # keeps only the first env's position.
                _cartesian_dim = 3  # ADR-004 §Decision: AgentSnapshot is 3-D Cartesian.
                if position.shape[0] > _cartesian_dim:
                    position = position[:_cartesian_dim]
                prev = self._prev_positions.get(uid)
                if prev is None:
                    velocity = np.zeros(3, dtype=np.float64)
                else:
                    # Numerical-difference velocity. The dt is the
                    # control_timestep; for the AgentSnapshot's contract
                    # this is the per-step Cartesian velocity. The
                    # safety filter's conformal update consumes this
                    # value alongside dt to compute the predicted
                    # pose; the consistency is the caller's
                    # responsibility (see
                    # chamber.benchmarks.training_runner.build_safety_dt).
                    velocity = (position - prev) / float(self.control_timestep)
                self._prev_positions[uid] = position
                snapshots[uid] = AgentSnapshot(
                    position=position,
                    velocity=velocity,
                    radius=self._safety_radii[uid],
                )
            return snapshots

        def build_bounds(self) -> Bounds:
            """Build the per-task :class:`Bounds` envelope (P1.04.5; ADR-006 §Decision).

            Sourced from the module-level constants pinned in P1.03
            (cartesian_accel_capacity) plus sensible Phase-1 defaults
            for the remaining fields. Consumed by the safety filter +
            the audit-gate predicate A's RHS (``cartesian_accel_capacity``).
            """
            return Bounds(
                action_linf_component=0.1,
                cartesian_accel_capacity=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
                action_rate=10.0,
                comm_latency_ms=0.0,
                force_limit=50.0,
            )

    inner = Stage1PickPlaceEnv(
        condition_id=condition_id,
        episode_length=episode_length,
        root_seed=root_seed,
        num_envs=num_envs,
        render_mode=render_mode,
        render_backend=render_backend,
        force_torque_noise_sigma=force_torque_noise_sigma,
    )
    # AS conditions: ManiSkill v3 obs_mode="state_dict" emits per-agent
    # Dict(qpos, qvel) but EgoPPOTrainer.from_config reads
    # obs["agent"][ego_uid]["state"]. The AS synthesizer adds that key;
    # pass-through for OM conditions. The OM channel filter then masks
    # per-condition for OM-vision-only; pass-through for AS and OM-hetero.
    # Wired in fixed order so OM callers of the factory don't have to
    # wrap manually (issue #165 §Proposed scope 4).
    from chamber.envs.stage1_obs_filter import (
        Stage1ASStateSynthesizer,
        Stage1OMChannelFilter,
    )

    return Stage1OMChannelFilter(Stage1ASStateSynthesizer(inner))
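The panda-routed dense reward in Stage1PickPlaceEnv above can be checked by hand. The NumPy sketch below mirrors its term structure (tanh-shaped reaching, place and static terms, gated by the grasp and placed flags, overridden to 5 on success, then divided by 5). The function name and its flat-array inputs are hypothetical stand-ins for the ManiSkill tensors, not part of the CHAMBER API.

```python
import numpy as np


def normalized_dense_reward(
    tcp_to_obj_dist: np.ndarray,
    obj_to_goal_dist: np.ndarray,
    arm_qvel: np.ndarray,
    is_grasped: np.ndarray,
    is_obj_placed: np.ndarray,
    success: np.ndarray,
) -> np.ndarray:
    """Hypothetical NumPy mirror of the panda-routed dense reward (batch of envs)."""
    reaching = 1.0 - np.tanh(5.0 * tcp_to_obj_dist)
    place = 1.0 - np.tanh(5.0 * obj_to_goal_dist)
    # Static term uses the arm joint velocities only (gripper DOF already dropped).
    static = 1.0 - np.tanh(5.0 * np.linalg.norm(arm_qvel, axis=1))
    reward = (
        reaching
        + is_grasped
        + place * is_grasped
        + static * is_obj_placed
    ).astype(np.float64)
    # Success overrides the shaped sum rather than adding to it.
    reward[success] = 5.0
    return reward / 5.0


r = normalized_dense_reward(
    tcp_to_obj_dist=np.array([0.0, 0.0, 10.0]),
    obj_to_goal_dist=np.array([0.0, 0.0, 10.0]),
    arm_qvel=np.zeros((3, 7)),
    is_grasped=np.array([True, True, False]),
    is_obj_placed=np.array([True, True, False]),
    success=np.array([False, True, False]),
)
```

Because each shaped term lies in [0, 1], the pre-success sum is at most 4, so the normalised value stays at or below 0.8 until success pins it to 1.0.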

resolve_condition

resolve_condition(condition_id: str) -> ConditionConfig

Resolve a pre-registered condition_id to its config (ADR-007 §Stage 1b).

Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the four-condition coverage and the ValueError raise on bogus IDs.

Parameters:

  • condition_id (str) –

    One of the four verbatim Stage-1 pre-registration strings (see spikes/preregistration/{AS,OM}.yaml).

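Returns:

  • ConditionConfig –

    Resolved ConditionConfig carrying the agent_uids tuple, the env-level obs_mode superset, and the boolean flags that downstream wrappers and scripts read.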

Raises:

  • ValueError

    If condition_id is not one of the four Stage-1 pre-registration strings. The message names the four valid options and cites ADR-007 §Discipline (the project anti-pattern for editing prereg YAMLs).

Source code in src/chamber/envs/stage1_pickplace.py
def resolve_condition(condition_id: str) -> ConditionConfig:
    """Resolve a pre-registered ``condition_id`` to its config (ADR-007 §Stage 1b).

    Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the
    four-condition coverage and the ``ValueError`` raise on bogus IDs.

    Args:
        condition_id: One of the four verbatim Stage-1 pre-registration
            strings (see ``spikes/preregistration/{AS,OM}.yaml``).

    Returns:
        Resolved :class:`ConditionConfig` carrying the
        ``agent_uids`` tuple, env-level ``obs_mode`` superset, and the
        boolean flags downstream wrappers / scripts read.

    Raises:
        ValueError: If ``condition_id`` is not one of the four
            Stage-1 pre-registration strings. The message names the
            four valid options and cites ADR-007 §Discipline (the
            project anti-pattern for editing prereg YAMLs).
    """
    try:
        return _CONDITION_TABLE[condition_id]
    except KeyError as exc:
        msg = (
            f"Stage1PickPlaceEnv: condition_id {condition_id!r} is not one of the "
            f"four Stage-1 pre-registration strings {sorted(_CONDITION_TABLE)!r}. "
            "If the prereg YAML has been edited, re-issue with a new git_tag "
            "per ADR-007 §Discipline; do not rename condition_ids in code."
        )
        raise ValueError(msg) from exc

chamber.envs.action_repeat

Per-agent action-repeat wrapper.

ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).

PerAgentActionRepeatWrapper

Bases: Wrapper

Apply different action-repeat counts per agent uid.

Effective control frequency for agent a is:

base_freq / action_repeat[a]

At each env.step call, an agent whose repeat counter has not elapsed re-applies its most recent action instead of the newly submitted one.

ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).

Parameters:

  • env (Env) –

    Env exposing a gym.spaces.Dict action space keyed by agent uid.

  • action_repeat (Mapping[str, int]) –

    Mapping from agent uid to a positive integer repeat count. Keys absent from this mapping default to repeat=1 (pass-through).

Raises:

  • ChamberEnvCompatibilityError

    if the action space is not a gym.spaces.Dict keyed by agent uid (ADR-001 §Risks).

  • ValueError

    if any repeat count is less than 1.
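The counter logic in step() can be replayed without an env to see which action each agent actually applies per tick. The helper below is a hypothetical standalone sketch that copies the wrapper's counter rules; the uids "fast" and "slow" are illustrative only.

```python
from collections.abc import Mapping, Sequence


def applied_actions(
    submitted: Sequence[Mapping[str, object]],
    action_repeat: Mapping[str, int],
) -> list[dict[str, object]]:
    """Replay PerAgentActionRepeatWrapper.step's counter logic (hypothetical helper)."""
    counters: dict[str, int] = dict.fromkeys(action_repeat, 0)
    last: dict[str, object] = {}
    applied: list[dict[str, object]] = []
    for action in submitted:
        resolved: dict[str, object] = {}
        for uid, act in action.items():
            repeat = action_repeat.get(uid, 1)  # absent uids pass through
            counter = counters.get(uid, 0)
            if counter == 0:
                # Counter elapsed: accept the new action and restart the count.
                resolved[uid] = act
                last[uid] = act
                counters[uid] = repeat - 1
            else:
                # Still repeating: re-apply the held action.
                resolved[uid] = last[uid]
                counters[uid] = counter - 1
        applied.append(resolved)
    return applied


out = applied_actions([{"fast": t, "slow": t} for t in range(6)], {"slow": 3})
```

Here "slow" applies actions 0, 0, 0, 3, 3, 3 while "fast" applies every submitted action. At a hypothetical 60 Hz base rate, "slow" would therefore update at 60 / 3 = 20 Hz, matching base_freq / action_repeat[a].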

Source code in src/chamber/envs/action_repeat.py
class PerAgentActionRepeatWrapper(gym.Wrapper):  # type: ignore[type-arg]
    """Apply different action-repeat counts per agent uid.

    Effective control frequency for agent ``a`` is::

        base_freq / action_repeat[a]

    At each ``env.step`` call, an agent whose repeat counter has not elapsed
    re-applies its most recent action instead of the newly submitted one.

    ADR-001 §Decision item 2; ADR-007 Stage 2 CR axis (rev 3 §Stage 2).

    Args:
        env: Env exposing a ``gym.spaces.Dict`` action space keyed by agent uid.
        action_repeat: Mapping from agent uid to a positive integer repeat count.
            Keys absent from this mapping default to repeat=1 (pass-through).

    Raises:
        ChamberEnvCompatibilityError: if the action space is not a
            ``gym.spaces.Dict`` keyed by agent uid (ADR-001 §Risks).
        ValueError: if any repeat count is less than 1.
    """

    def __init__(self, env: gym.Env, action_repeat: Mapping[str, int]) -> None:  # type: ignore[type-arg]
        """Validate the action space and store per-agent repeat counts."""
        super().__init__(env)
        if not isinstance(env.action_space, gym.spaces.Dict):
            raise ChamberEnvCompatibilityError(
                f"PerAgentActionRepeatWrapper requires a gym.spaces.Dict action "
                f"space keyed by agent uid; got {type(env.action_space).__name__}. "
                f"See ADR-001 §Risks."
            )
        for uid, count in action_repeat.items():
            if count < 1:
                raise ValueError(f"action_repeat[{uid!r}] must be ≥ 1, got {count}")
        self._action_repeat: dict[str, int] = dict(action_repeat)
        self._counters: dict[str, int] = dict.fromkeys(action_repeat, 0)
        self._last_action: dict[str, object] = {}

    def step(self, action: dict) -> tuple:  # type: ignore[override]
        """Step with per-agent action repeat applied.

        ADR-001 §Validation criteria condition (c): the slow agent's effective
        action update interval matches 1/rate within one env tick.
        """
        resolved: dict[str, object] = {}
        for uid, act in action.items():
            repeat = self._action_repeat.get(uid, 1)
            counter = self._counters.get(uid, 0)
            if counter == 0:
                resolved[uid] = act
                self._last_action[uid] = act
                self._counters[uid] = repeat - 1
            else:
                resolved[uid] = self._last_action[uid]
                self._counters[uid] = counter - 1
        return self.env.step(resolved)

    def reset(self, **kwargs: object) -> tuple:  # type: ignore[override]
        """Reset the env and clear all per-agent repeat state.

        ADR-001 §Decision item 2: no state leaks across episodes.
        """
        obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
        self._counters = dict.fromkeys(self._action_repeat, 0)
        self._last_action = {}
        return obs, info

__init__

__init__(
    env: Env, action_repeat: Mapping[str, int]
) -> None

Validate the action space and store per-agent repeat counts.

Source code in src/chamber/envs/action_repeat.py
def __init__(self, env: gym.Env, action_repeat: Mapping[str, int]) -> None:  # type: ignore[type-arg]
    """Validate the action space and store per-agent repeat counts."""
    super().__init__(env)
    if not isinstance(env.action_space, gym.spaces.Dict):
        raise ChamberEnvCompatibilityError(
            f"PerAgentActionRepeatWrapper requires a gym.spaces.Dict action "
            f"space keyed by agent uid; got {type(env.action_space).__name__}. "
            f"See ADR-001 §Risks."
        )
    for uid, count in action_repeat.items():
        if count < 1:
            raise ValueError(f"action_repeat[{uid!r}] must be ≥ 1, got {count}")
    self._action_repeat: dict[str, int] = dict(action_repeat)
    self._counters: dict[str, int] = dict.fromkeys(action_repeat, 0)
    self._last_action: dict[str, object] = {}

reset

reset(**kwargs: object) -> tuple

Reset the env and clear all per-agent repeat state.

ADR-001 §Decision item 2: no state leaks across episodes.

Source code in src/chamber/envs/action_repeat.py
def reset(self, **kwargs: object) -> tuple:  # type: ignore[override]
    """Reset the env and clear all per-agent repeat state.

    ADR-001 §Decision item 2: no state leaks across episodes.
    """
    obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
    self._counters = dict.fromkeys(self._action_repeat, 0)
    self._last_action = {}
    return obs, info

step

step(action: dict) -> tuple

Step with per-agent action repeat applied.

ADR-001 §Validation criteria condition (c): the slow agent's effective action update interval matches 1/rate within one env tick.

Source code in src/chamber/envs/action_repeat.py
def step(self, action: dict) -> tuple:  # type: ignore[override]
    """Step with per-agent action repeat applied.

    ADR-001 §Validation criteria condition (c): the slow agent's effective
    action update interval matches 1/rate within one env tick.
    """
    resolved: dict[str, object] = {}
    for uid, act in action.items():
        repeat = self._action_repeat.get(uid, 1)
        counter = self._counters.get(uid, 0)
        if counter == 0:
            resolved[uid] = act
            self._last_action[uid] = act
            self._counters[uid] = repeat - 1
        else:
            resolved[uid] = self._last_action[uid]
            self._counters[uid] = counter - 1
    return self.env.step(resolved)

chamber.envs.texture_filter

Per-agent observation-channel filter.

ADR-001 §Decision item 1 (obs modality heterogeneity axis); ADR-007 §Stage 1 OM axis (rev 3 §Stage 1).

TextureFilterObsWrapper

Bases: ObservationWrapper

Filter observation channels per agent uid, zero-masking unlisted channels.

Tensor shapes are preserved — masked channels become zero arrays of the same shape and dtype. This keeps downstream policy networks oblivious to which modalities a given partner actually receives.

ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.

Parameters:

  • env (Env) –

    Env whose observation has shape {"agent": {uid: {channel: array}}}.

  • keep_per_agent (Mapping[str, Iterable[str]]) –

    Mapping from agent uid to the iterable of channel keys to keep. Channels not listed are zero-masked. Uids absent from this mapping are passed through unchanged.
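The shape-preserving mask can be illustrated on the inner {uid: {channel: array}} mapping. This is a simplified, hypothetical mirror of observation(): it omits the wrapper's handling of non-dict per-agent entries, and the channel names "rgb" and "qpos" are assumptions chosen for illustration.

```python
from collections.abc import Iterable, Mapping

import numpy as np


def mask_channels(
    agent_obs: Mapping[str, Mapping[str, np.ndarray]],
    keep_per_agent: Mapping[str, Iterable[str]],
) -> dict[str, dict[str, np.ndarray]]:
    """Zero-mask unlisted channels per uid, preserving shape and dtype (sketch)."""
    keep = {uid: frozenset(keys) for uid, keys in keep_per_agent.items()}
    out: dict[str, dict[str, np.ndarray]] = {}
    for uid, per_agent in agent_obs.items():
        if uid not in keep:
            # Uids without a keep-set pass through unchanged.
            out[uid] = dict(per_agent)
            continue
        # Kept channels are returned as-is; others become zeros_like(val).
        out[uid] = {
            ch: val if ch in keep[uid] else np.zeros_like(val)
            for ch, val in per_agent.items()
        }
    return out


obs = {
    "fetch": {
        "rgb": np.full((2, 2, 3), 255, dtype=np.uint8),
        "qpos": np.ones(4, dtype=np.float32),
    },
    "panda_wristcam": {"qpos": np.ones(9, dtype=np.float32)},
}
masked = mask_channels(obs, {"fetch": ["qpos"]})
```

The masked "rgb" channel keeps its (2, 2, 3) uint8 layout but is all zeros, so a policy network built for the full observation still accepts it.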

Source code in src/chamber/envs/texture_filter.py
class TextureFilterObsWrapper(gym.ObservationWrapper):  # type: ignore[type-arg]
    """Filter observation channels per agent uid, zero-masking unlisted channels.

    Tensor shapes are preserved — masked channels become zero arrays of the
    same shape and dtype. This keeps downstream policy networks oblivious to
    which modalities a given partner actually receives.

    ADR-001 §Decision item 1; ADR-007 §Stage 1 OM axis.

    Args:
        env: Env whose observation has shape ``{"agent": {uid: {channel: array}}}``.
        keep_per_agent: Mapping from agent uid to the iterable of channel keys
            to keep. Channels not listed are zero-masked. Uids absent from this
            mapping are passed through unchanged.
    """

    def __init__(self, env: gym.Env, keep_per_agent: Mapping[str, Iterable[str]]) -> None:  # type: ignore[type-arg]
        """Store per-agent channel keep-sets as frozensets for O(1) lookup."""
        super().__init__(env)
        self._keep: dict[str, frozenset[str]] = {
            uid: frozenset(keys) for uid, keys in keep_per_agent.items()
        }

    def observation(self, observation: dict) -> dict:  # type: ignore[override]
        """Apply per-agent channel mask, zero-filling unlisted channels.

        ADR-001 §Decision item 1: obs modality is a controllable heterogeneity
        axis; shapes are invariant so vector rollout is unaffected.
        """
        agent_obs = observation.get("agent")
        if not isinstance(agent_obs, dict):
            return observation
        filtered_agents: dict[str, dict] = {}
        for uid, per_agent in agent_obs.items():
            if uid not in self._keep or not isinstance(per_agent, dict):
                filtered_agents[uid] = per_agent
                continue
            keep = self._keep[uid]
            filtered_agents[uid] = {
                ch: val if ch in keep else np.zeros_like(val) for ch, val in per_agent.items()
            }
        return {**observation, "agent": filtered_agents}

__init__

__init__(
    env: Env, keep_per_agent: Mapping[str, Iterable[str]]
) -> None

Store per-agent channel keep-sets as frozensets for O(1) lookup.

Source code in src/chamber/envs/texture_filter.py
def __init__(self, env: gym.Env, keep_per_agent: Mapping[str, Iterable[str]]) -> None:  # type: ignore[type-arg]
    """Store per-agent channel keep-sets as frozensets for O(1) lookup."""
    super().__init__(env)
    self._keep: dict[str, frozenset[str]] = {
        uid: frozenset(keys) for uid, keys in keep_per_agent.items()
    }

observation

observation(observation: dict) -> dict

Apply per-agent channel mask, zero-filling unlisted channels.

ADR-001 §Decision item 1: obs modality is a controllable heterogeneity axis; shapes are invariant so vector rollout is unaffected.

Source code in src/chamber/envs/texture_filter.py
def observation(self, observation: dict) -> dict:  # type: ignore[override]
    """Apply per-agent channel mask, zero-filling unlisted channels.

    ADR-001 §Decision item 1: obs modality is a controllable heterogeneity
    axis; shapes are invariant so vector rollout is unaffected.
    """
    agent_obs = observation.get("agent")
    if not isinstance(agent_obs, dict):
        return observation
    filtered_agents: dict[str, dict] = {}
    for uid, per_agent in agent_obs.items():
        if uid not in self._keep or not isinstance(per_agent, dict):
            filtered_agents[uid] = per_agent
            continue
        keep = self._keep[uid]
        filtered_agents[uid] = {
            ch: val if ch in keep else np.zeros_like(val) for ch, val in per_agent.items()
        }
    return {**observation, "agent": filtered_agents}

chamber.envs.comm_shaping

Communication-channel observation injector (M1 carrier; M2 fills in the logic).

ADR-001 §Decision item 3 (comm shaping is the outermost wrapper); ADR-003 §Decision (fixed-format + optional learned overlay via CommChannel); ADR-006 §Decision (URLLC-anchored numeric bounds enforced by the channel impl).

The actual comm logic (fixed-format encoding, AoI accounting, optional learned overlay, URLLC-anchored degradation) lives in chamber.comm (M2). This wrapper is a thin adapter that places the comm output into obs["comm"].

CommShapingWrapper

Bases: Wrapper

Inject comm-channel output into the observation dict as obs["comm"].

Wraps any env whose observation space is a gym.spaces.Dict. On each reset and step, calls channel.encode(obs) and stores the result under the "comm" key. The channel itself controls all degradation semantics (latency, drop, jitter); this wrapper is a pure carrier.

ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.

Parameters:

  • env (Env) –

    Env with a gym.spaces.Dict observation space.

  • channel (CommChannel) –

    A chamber.comm.api.CommChannel instance. In M1 this is a stub chamber.comm.FixedFormatCommChannel that emits an empty packet shell. M2 fills the real fixed-format encoding.

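Raises:

  • ChamberEnvCompatibilityError

    if the inner observation space is not a gym.spaces.Dict (ADR-001 §Risks).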

Source code in src/chamber/envs/comm_shaping.py
class CommShapingWrapper(gym.Wrapper):  # type: ignore[type-arg]
    """Inject comm-channel output into the observation dict as ``obs["comm"]``.

    Wraps any env whose observation space is a ``gym.spaces.Dict``. On each
    reset and step, calls ``channel.encode(obs)`` and stores the result under
    the ``"comm"`` key. The channel itself controls all degradation semantics
    (latency, drop, jitter); this wrapper is a pure carrier.

    ADR-001 §Decision item 3; ADR-003 §Decision; ADR-006 §Decision.

    Args:
        env: Env with a ``gym.spaces.Dict`` observation space.
        channel: A :class:`chamber.comm.api.CommChannel` instance. In M1 this
            is a stub :class:`chamber.comm.FixedFormatCommChannel` that emits
            an empty packet shell. M2 fills the real fixed-format encoding.

    Raises:
        ChamberEnvCompatibilityError: if the inner observation space is not a
            ``gym.spaces.Dict`` (ADR-001 §Risks).
    """

    def __init__(self, env: gym.Env, channel: CommChannel) -> None:  # type: ignore[type-arg]
        """Validate the obs space and extend it with a ``comm`` sub-space."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            raise ChamberEnvCompatibilityError(
                f"CommShapingWrapper requires a gym.spaces.Dict observation space; "
                f"got {type(env.observation_space).__name__}. See ADR-001 §Risks."
            )
        self._channel = channel
        self._extend_observation_space()

    def _extend_observation_space(self) -> None:
        self.observation_space = gym.spaces.Dict(
            {
                **self.env.observation_space.spaces,  # type: ignore[attr-defined]
                "comm": gym.spaces.Dict({}),  # M2 fills the real comm obs space.
            }
        )

    def reset(self, **kwargs: object) -> tuple[dict, dict]:  # type: ignore[override]
        """Reset the env and channel, injecting comm into the initial obs.

        ADR-001 §Decision item 3; ADR-003 §Decision.
        """
        obs, info = self.env.reset(**kwargs)  # type: ignore[arg-type]
        self._channel.reset()
        obs = dict(obs)
        obs["comm"] = self._channel.encode(obs)
        return obs, info

    def step(self, action: object) -> tuple[dict, object, bool, bool, dict]:  # type: ignore[override]
        """Step the env and inject updated comm into obs.

        ADR-001 §Decision item 3; ADR-003 §Decision.
        """
        obs, reward, terminated, truncated, info = self.env.step(action)
        obs = dict(obs)
        obs["comm"] = self._channel.encode(obs)
        return obs, reward, terminated, truncated, info
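The carrier contract reduces to "shallow-copy the obs, then attach channel.encode(obs) under "comm"". The sketch below assumes only what the docstring states (a channel exposing reset() and encode(obs)); EchoChannel and inject_comm are hypothetical, and the real chamber.comm.FixedFormatCommChannel encoding is not reproduced.

```python
from typing import Any


class EchoChannel:
    """Hypothetical stand-in for a CommChannel: counts resets, echoes obs keys."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1

    def encode(self, obs: dict[str, Any]) -> dict[str, Any]:
        # A real channel would apply latency, drop and jitter here.
        return {"keys": sorted(obs)}


def inject_comm(obs: dict[str, Any], channel: EchoChannel) -> dict[str, Any]:
    """Mirror of the wrapper's carrier step: copy obs, then add obs["comm"]."""
    out = dict(obs)  # shallow copy so the inner env's dict is not mutated
    out["comm"] = channel.encode(out)  # encode sees obs before "comm" is added
    return out


channel = EchoChannel()
channel.reset()
inner_obs = {"agent": {}, "extra": 0}
wrapped_obs = inject_comm(inner_obs, channel)
```

The shallow copy leaves the inner env's top-level dict untouched, although nested values such as obs["agent"] are still shared between the two dicts.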

__init__

__init__(env: Env, channel: CommChannel) -> None

Validate the obs space and extend it with a comm sub-space.


reset

reset(**kwargs: object) -> tuple[dict, dict]

Reset the env and channel, injecting comm into the initial obs.

ADR-001 §Decision item 3; ADR-003 §Decision.


step

step(
    action: object,
) -> tuple[dict, object, bool, bool, dict]

Step the env and inject updated comm into obs.

ADR-001 §Decision item 3; ADR-003 §Decision.


2-agent MPE Cooperative-Push env for the empirical-guarantee experiment (T4b.13).

Plan/05 §3.5 (decision row "Empirical guarantee Phase-0 verification") + ADR-002 risk-mitigation #1: the Phase-0 ego-AHT empirical-guarantee experiment runs on a 2-agent MPE-style cooperative continuous-control task. This module ships a deterministic hand-rolled equivalent of PettingZoo's simple_spread_v3 rather than depending on PettingZoo itself — Phase 0 keeps the dependency footprint minimal (PettingZoo deprecated MPE in 1.25 and split it into the younger mpe2 package; either lift would mean a dep-auditor pass for a single Phase-0 task). The task structure matches:

  • Two agents (default uids ego and partner) on the square plane [-1, 1]^2, with continuous 2-D velocity actions clipped to [-1, 1] per axis.
  • Two landmarks fixed per episode at random positions sampled from a determinism-harnessed numpy Generator (seed routed via :func:concerto.training.seeding.derive_substream).
  • Shared reward (both agents see the same scalar): r = -sum_landmark min_agent ||agent_pos - landmark_pos||. The cooperative-coverage gradient encourages the agents to pick distinct landmarks; a single agent cannot drive r to zero on its own.
  • Episode length fixed at :data:DEFAULT_EPISODE_LENGTH; truncation only (no terminal state).
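The shared reward above can also be written in vectorised NumPy form. This sketch is equivalent to the loop in _compute_reward but is not code from the module. coverage_reward is a hypothetical name, and it takes positions as plain arrays instead of reading env state.

```python
import numpy as np
from numpy.typing import NDArray


def coverage_reward(agent_pos: NDArray[np.float64], landmark_pos: NDArray[np.float64]) -> float:
    """r = -sum_landmark min_agent ||agent_pos - landmark_pos|| (Euclidean).

    agent_pos has shape (n_agents, 2); landmark_pos has shape (n_landmarks, 2).
    """
    # Pairwise distances: dists[i, j] = ||agent_i - landmark_j||, shape (n_agents, n_landmarks).
    dists = np.linalg.norm(agent_pos[:, None, :] - landmark_pos[None, :, :], axis=-1)
    # For each landmark keep the closest agent, then sum over landmarks and negate.
    return float(-dists.min(axis=0).sum())


landmarks = np.array([[0.0, 0.0], [0.0, 1.0]])
split = np.array([[0.0, 0.0], [0.0, 1.0]])    # one agent per landmark: reward is zero
stacked = np.array([[0.0, 0.0], [0.0, 0.0]])  # both agents on the same landmark
print(coverage_reward(stacked, landmarks))    # -1.0
```

The stacked case shows why the agents must split up. With both agents on one landmark, the uncovered landmark still contributes its full distance to the penalty.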

The env is intentionally Gymnasium-compatible with CONCERTO's other wrappers (M2 comm + M3 safety): observation is keyed by obs["agent"][uid] with the state channel that :class:chamber.partners.heuristic.ScriptedHeuristicPartner already reads in its three-tier fallback (plan/04 §3.4). Wrapping this env through :class:chamber.envs.action_repeat.PerAgentActionRepeatWrapper, :class:chamber.envs.comm_shaping.CommShapingWrapper, etc. is therefore a zero-config drop-in.

Determinism (P6): two reset(seed=K) calls on the same root_seed produce byte-identical landmark layouts and agent start poses; the same action stream then produces a byte-identical state trajectory and reward curve. test_mpe_cooperative_push.py::test_determinism_byte_identical_traj pins this contract.
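The determinism contract depends on folding the per-episode seed under the run-level root_seed. The sketch below uses a hypothetical stand-in for concerto.training.seeding.derive_substream: it combines a CRC32 of the substream name with root_seed in a NumPy SeedSequence. The project's actual derivation may differ, so only the properties asserted at the end carry over.

```python
import zlib

import numpy as np


def episode_rng(root_seed: int, episode: int, base: str = "env.mpe_cooperative_push") -> np.random.Generator:
    """Hypothetical stand-in for derive_substream(name, root_seed=...).default_rng().

    Folds the run-level root_seed and a stable hash of the substream name into
    one SeedSequence. The same (root_seed, episode) pair always yields the same
    Generator, while a different root_seed yields an unrelated stream.
    """
    name = f"{base}.episode.{episode}"
    name_key = zlib.crc32(name.encode("utf-8"))  # stable across processes, unlike hash()
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))


a = episode_rng(root_seed=0, episode=3).uniform(-1.0, 1.0, size=(2, 2))
b = episode_rng(root_seed=0, episode=3).uniform(-1.0, 1.0, size=(2, 2))
c = episode_rng(root_seed=1, episode=3).uniform(-1.0, 1.0, size=(2, 2))
assert a.tobytes() == b.tobytes()  # byte-identical layout for the same (root_seed, K)
assert a.tobytes() != c.tobytes()  # distinct runs do not collide at the same K
```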

MPECooperativePushEnv

Bases: Env

2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).

See module docstring for the task design. The env exposes the standard Gymnasium reset + step API with multi-agent dict actions / dict observations keyed by uid, matching :class:tests.fakes.FakeMultiAgentEnv so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).

Source code in src/chamber/envs/mpe_cooperative_push.py
class MPECooperativePushEnv(gym.Env):  # type: ignore[type-arg]
    """2-agent MPE-style cooperative continuous-control env (T4b.13; ADR-002 risk-mitigation #1).

    See module docstring for the task design. The env exposes the standard
    Gymnasium ``reset`` + ``step`` API with multi-agent dict actions / dict
    observations keyed by ``uid``, matching :class:`tests.fakes.FakeMultiAgentEnv`
    so the M2/M3 wrapper stack drops in unchanged (plan/01 §3).
    """

    metadata: ClassVar[dict[str, list[str]]] = {"render_modes": []}  # type: ignore[misc]

    #: Number of agents the env supports — fixed at 2 by the cooperative-coverage task.
    _N_AGENTS: ClassVar[int] = 2

    def __init__(
        self,
        *,
        agent_uids: tuple[str, str] = ("ego", "partner"),
        episode_length: int = DEFAULT_EPISODE_LENGTH,
        root_seed: int = 0,
    ) -> None:
        """Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).

        Args:
            agent_uids: The two uids the env exposes. The default
                ``("ego", "partner")`` matches the ego-AHT training loop
                convention. Reordering swaps which agent the trainer
                updates and which is held frozen — the env is agnostic.
            episode_length: Truncation horizon in env ticks (default
                :data:`DEFAULT_EPISODE_LENGTH`).
            root_seed: Project-wide root seed; the env derives a
                deterministic numpy ``Generator`` from this via
                :func:`concerto.training.seeding.derive_substream`
                (P6 reproducibility).

        Raises:
            ValueError: If ``agent_uids`` does not have exactly two
                distinct entries.
        """
        super().__init__()
        if len(agent_uids) != self._N_AGENTS or agent_uids[0] == agent_uids[1]:
            raise ValueError(
                f"agent_uids must be a 2-tuple of distinct strings; got {agent_uids!r}"
            )
        self._uids: tuple[str, str] = agent_uids
        self._episode_length: int = int(episode_length)
        self._root_seed: int = int(root_seed)

        self.action_space = gym.spaces.Dict(
            {
                uid: gym.spaces.Box(
                    low=-VELOCITY_BOUND,
                    high=VELOCITY_BOUND,
                    shape=(2,),
                    dtype=np.float32,
                )
                for uid in self._uids
            }
        )
        # state channel: self_pos(2) + self_vel(2) + landmark_rel(4) + partner_rel(2) = 10.
        agent_state = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
        self.observation_space = gym.spaces.Dict(
            {
                "agent": gym.spaces.Dict(
                    {uid: gym.spaces.Dict({"state": agent_state}) for uid in self._uids}
                )
            }
        )

        self._rng: np.random.Generator = derive_substream(
            _SUBSTREAM_NAME, root_seed=self._root_seed
        ).default_rng()
        # Per-episode mutable state — set by reset.
        self._step_count: int = 0
        self._positions: dict[str, NDArray[np.float64]] = {}
        self._velocities: dict[str, NDArray[np.float64]] = {}
        self._landmark_positions: NDArray[np.float64] = np.zeros((N_LANDMARKS, 2))

    def reset(
        self,
        *,
        seed: int | None = None,
        options: dict[str, Any] | None = None,
    ) -> tuple[dict[str, Any], dict[str, Any]]:
        """Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).

        .. note::

            This env's seeding semantics intentionally diverge from
            Gymnasium's convention (where ``reset(seed=K)`` is the
            *complete* seed source). Here, ``reset(seed=K)`` is folded
            with the constructor's ``root_seed`` via
            :func:`concerto.training.seeding.derive_substream` using the
            substream name ``"env.mpe_cooperative_push.episode.{K}"`` —
            so two training runs at distinct ``root_seed`` values do
            *not* collide on the same per-episode RNG even at the same
            ``K``. This matches the project-wide seeding philosophy
            (P6 + ADR-002 §Decisions): the ``root_seed`` is the
            run-level pin, and per-episode ``seed`` is an episode index
            scoped under it.

        Args:
            seed: Per-episode seed (episode index, NOT a complete seed).
                When provided, re-derives the env's RNG via
                ``derive_substream(f"env.mpe_cooperative_push.episode.{seed}",
                root_seed=self._root_seed)``. When ``None``, continues
                from the RNG's existing state (Gymnasium convention).
            options: Reserved for the Gymnasium API; ignored.

        Returns:
            ``(obs, info)`` where ``obs`` matches the
            :attr:`observation_space` shape and ``info`` is empty.
        """
        del options
        if seed is not None:
            self._rng = derive_substream(
                f"{_SUBSTREAM_NAME}.episode.{seed}",
                root_seed=self._root_seed,
            ).default_rng()
        self._step_count = 0
        self._landmark_positions = self._rng.uniform(
            -POSITION_BOUND, POSITION_BOUND, size=(N_LANDMARKS, 2)
        )
        for uid in self._uids:
            self._positions[uid] = self._rng.uniform(-POSITION_BOUND, POSITION_BOUND, size=(2,))
            self._velocities[uid] = np.zeros(2, dtype=np.float64)
        return self._build_obs(), {}

    def step(
        self,
        action: dict[str, NDArray[np.floating]],
    ) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
        """Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).

        Args:
            action: Dict keyed by ``uid``, each value a 2-D velocity
                command. Components are clipped to
                ``[-VELOCITY_BOUND, +VELOCITY_BOUND]`` before integration
                so a misbehaving partner cannot escape the plane.

        Returns:
            ``(obs, reward, terminated, truncated, info)``. ``reward`` is
            the *shared* cooperative-coverage scalar (both agents see the
            same value); the trainer's ego/partner split happens upstream
            in :func:`concerto.training.ego_aht.train`. ``terminated`` is
            always ``False`` (no terminal state); ``truncated`` is ``True``
            on the last tick of the episode.

        Raises:
            ValueError: If ``action`` is missing a required uid.
        """
        for uid in self._uids:
            if uid not in action:
                raise ValueError(
                    f"step() action missing uid {uid!r}; got keys {list(action.keys())}"
                )
        for uid in self._uids:
            a = np.clip(
                np.asarray(action[uid], dtype=np.float64),
                -VELOCITY_BOUND,
                VELOCITY_BOUND,
            )
            self._velocities[uid] = a
            self._positions[uid] = np.clip(
                self._positions[uid] + a * DT, -POSITION_BOUND, POSITION_BOUND
            )
        self._step_count += 1
        truncated = self._step_count >= self._episode_length
        return self._build_obs(), self._compute_reward(), False, truncated, {}

    def _build_obs(self) -> dict[str, Any]:
        """Pack the Gymnasium-conformant obs dict (plan/01 §3; plan/04 §3.4)."""
        out_agents: dict[str, dict[str, NDArray[np.float32]]] = {}
        for uid in self._uids:
            self_pos = self._positions[uid]
            self_vel = self._velocities[uid]
            landmark_rel = (self._landmark_positions - self_pos).flatten()
            partner_uid = self._uids[1] if uid == self._uids[0] else self._uids[0]
            partner_rel = self._positions[partner_uid] - self_pos
            state = np.concatenate([self_pos, self_vel, landmark_rel, partner_rel]).astype(
                np.float32
            )
            out_agents[uid] = {"state": state}
        return {"agent": out_agents}

    def _compute_reward(self) -> float:
        """Shared cooperative-coverage scalar (plan/05 §3.5).

        ``r = -sum_landmark min_agent dist(agent, landmark)``. Both
        agents always see the same value so the trainer can route it to
        the ego's policy update without a credit-assignment step.

        Edge case: when an agent sits exactly on a landmark, that
        landmark contributes ``0`` to the sum. Perfect coverage (one
        agent per landmark, both at distance ``0``) yields ``r = 0`` —
        the natural ceiling. The empirical-guarantee assertion in
        T4b.13 is on *moving-window-of-10 non-decreasing*, not on
        approaching this ceiling.

        ``np.linalg.norm`` matches PettingZoo ``simple_spread_v3``'s
        Euclidean reward (squared distance would change the gradient
        shape and break the "matches simple_spread" claim should
        Phase 1 ever cross-validate against the real env).
        """
        # Phase 1 TODO: cross-validate trajectory against
        # mpe2.simple_spread_v3 once the dep-auditor has cleared mpe2.
        total = 0.0
        for landmark in self._landmark_positions:
            distances = [
                float(np.linalg.norm(self._positions[uid] - landmark)) for uid in self._uids
            ]
            total += min(distances)
        return -total
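Code that consumes obs["agent"][uid]["state"] has to split the flat 10-D vector itself. The helper below is a minimal sketch that assumes the layout in the __init__ comment (self_pos, self_vel, landmark_rel, partner_rel) and two landmarks. unpack_state is hypothetical and not part of the module.

```python
import numpy as np
from numpy.typing import NDArray

# Slice layout of the 10-D per-agent "state" vector, taken from the comment in
# __init__: self_pos(2) + self_vel(2) + landmark_rel(4) + partner_rel(2).
_STATE_SLICES = {
    "self_pos": slice(0, 2),
    "self_vel": slice(2, 4),
    "landmark_rel": slice(4, 8),
    "partner_rel": slice(8, 10),
}


def unpack_state(state: NDArray[np.float32]) -> dict[str, NDArray[np.float32]]:
    """Split one agent's state vector into named parts (hypothetical helper)."""
    if state.shape != (10,):
        raise ValueError(f"expected shape (10,), got {state.shape}")
    parts = {name: state[s] for name, s in _STATE_SLICES.items()}
    # _build_obs flattens a (2, 2) array row-major, so reshape restores one
    # (dx, dy) row per landmark.
    parts["landmark_rel"] = parts["landmark_rel"].reshape(2, 2)
    return parts
```

To recover absolute positions, add self_pos back. For example, self_pos + landmark_rel[0] gives the first landmark's position.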

__init__

__init__(
    *,
    agent_uids: tuple[str, str] = ("ego", "partner"),
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
) -> None

Build a 2-agent Cooperative-Push env (T4b.13; plan/05 §3.5).

Parameters:

  • agent_uids (tuple[str, str], default: ('ego', 'partner') ) –

    The two uids the env exposes. The default ("ego", "partner") matches the ego-AHT training loop convention. Reordering swaps which agent the trainer updates and which is held frozen — the env is agnostic.

  • episode_length (int, default: DEFAULT_EPISODE_LENGTH ) –

    Truncation horizon in env ticks (default :data:DEFAULT_EPISODE_LENGTH).

  • root_seed (int, default: 0 ) –

    Project-wide root seed; the env derives a deterministic numpy Generator from this via :func:concerto.training.seeding.derive_substream (P6 reproducibility).

Raises:

  • ValueError

    If agent_uids does not have exactly two distinct entries.


reset

reset(
    *,
    seed: int | None = None,
    options: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], dict[str, Any]]

Reset to a determinism-harnessed initial state (T4b.13; ADR-002 risk-mitigation #1; P6).

Note:

This env's seeding semantics intentionally diverge from Gymnasium's convention, where reset(seed=K) is the complete seed source. Here, reset(seed=K) is folded with the constructor's root_seed via :func:concerto.training.seeding.derive_substream using the substream name "env.mpe_cooperative_push.episode.{K}". As a result, two training runs with distinct root_seed values do not collide on the same per-episode RNG, even at the same K. This matches the project-wide seeding philosophy (P6 + ADR-002 §Decisions): the root_seed is the run-level pin, and the per-episode seed is an episode index scoped under it.

Parameters:

  • seed (int | None, default: None ) –

    Per-episode seed (episode index, NOT a complete seed). When provided, re-derives the env's RNG via derive_substream(f"env.mpe_cooperative_push.episode.{seed}", root_seed=self._root_seed). When None, continues from the RNG's existing state (Gymnasium convention).

  • options (dict[str, Any] | None, default: None ) –

    Reserved for the Gymnasium API; ignored.

Returns:

  • tuple[dict[str, Any], dict[str, Any]]

    (obs, info), where obs matches the :attr:observation_space shape and info is empty.


step

step(
    action: dict[str, NDArray[floating]],
) -> tuple[
    dict[str, Any], float, bool, bool, dict[str, Any]
]

Advance one tick (T4b.13; ADR-002 risk-mitigation #1; plan/05 §3.5).

Parameters:

  • action (dict[str, NDArray[floating]]) –

    Dict keyed by uid, each value a 2-D velocity command. Components are clipped to [-VELOCITY_BOUND, +VELOCITY_BOUND] before integration so a misbehaving partner cannot escape the plane.

Returns:

  • tuple[dict[str, Any], float, bool, bool, dict[str, Any]]

    (obs, reward, terminated, truncated, info). reward is the shared cooperative-coverage scalar, so both agents see the same value. The trainer's ego/partner split happens upstream in :func:concerto.training.ego_aht.train. terminated is always False because the task has no terminal state. truncated is True on the last tick of the episode.

Raises:

  • ValueError

    If action is missing a required uid.
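step() applies a clipped single-integrator update. It clips the commanded velocity, then integrates the position and clips it too. The standalone sketch below assumes both bounds are 1.0, matching the [-1, 1] envelope in the module docstring. The dt=0.1 in the usage lines is an arbitrary illustrative value, not the module's DT, and integrate is a hypothetical name.

```python
import numpy as np
from numpy.typing import NDArray


def integrate(
    pos: NDArray[np.float64],
    action: NDArray[np.float64],
    dt: float,
    velocity_bound: float = 1.0,
    position_bound: float = 1.0,
) -> tuple[NDArray[np.float64], NDArray[np.float64]]:
    """One tick of the clipped single-integrator update used by step().

    Returns (new_pos, vel). The bounds default to the documented [-1, 1]
    envelope; dt has no default because DT's actual value is not shown here.
    """
    vel = np.clip(np.asarray(action, dtype=np.float64), -velocity_bound, velocity_bound)
    new_pos = np.clip(pos + vel * dt, -position_bound, position_bound)
    return new_pos, vel


pos, vel = integrate(np.array([0.95, 0.0]), np.array([5.0, -0.5]), dt=0.1)
print(vel.tolist())  # [1.0, -0.5]   (x-velocity clipped from 5.0)
print(pos.tolist())  # [1.0, -0.05]  (x-position clipped from 1.05)
```

The velocity clip only limits how fast an agent moves. The position clip is what keeps it on the plane.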


Stage-1b pick-place env (ADR-007 §Stage 1b; P1.03 / PR #).

Real ManiSkill v3 wrap of the canonical pick-place task with the panda + fetch / panda + panda_partner robot tuples. Supports the four Stage-1 condition_id strings the AS / OM pre-registrations name verbatim (spikes/preregistration/AS.yaml, spikes/preregistration/OM.yaml; git tags prereg-stage1-{AS,OM}-2026-05-15):

  • AS-homo: stage1_pickplace_panda_only_mappo_shared_param -> agents ("panda_wristcam", "panda_partner").
  • AS-hetero: stage1_pickplace_panda_plus_fetch_ego_aht_happo_per_agent -> agents ("panda_wristcam", "fetch").
  • OM-homo: stage1_pickplace_vision_only -> agents ("panda_wristcam", "fetch"); vision-only channel keep-set.
  • OM-hetero: stage1_pickplace_vision_plus_force_torque_plus_proprio -> same agents; vision + synthesised FT + proprio channel keep-set.

Per ADR-007 §Discipline the pre-registration YAMLs and their git tags carry forward unchanged from Phase 0 — Stage 1b uses the same condition IDs, seed list (5 seeds, 20 episodes per seed = 100 episodes per condition), and episode budget; only the env implementation behind the condition IDs changes. Editing a prereg YAML post-launch is a project anti-pattern (ADR-007 §Discipline).

Stage-1a's MPE stand-in (:class:chamber.envs.mpe_cooperative_push.MPECooperativePushEnv) sequenced rig validation; this env is the Stage-1b science evaluation target. Compute budget per ADR-007 §Stage 1b: 5 GPU-h per axis on A100; :data:DEFAULT_EPISODE_LENGTH = 100 chosen to fit the budget given ~0.8-1.2 s/step at multi-agent SAPIEN scale.
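The helper below gives a rough check on the budget arithmetic. It assumes the 5 GPU-h per axis covers only the pre-registered evaluation episodes (2 conditions x 5 seeds x 20 episodes), run serially at DEFAULT_EPISODE_LENGTH = 100 steps each. These are illustrative assumptions, not figures taken from ADR-007, and axis_gpu_hours is a hypothetical helper.

```python
def axis_gpu_hours(
    sec_per_step: float,
    episode_length: int = 100,
    episodes_per_condition: int = 5 * 20,  # 5 seeds x 20 episodes
    conditions_per_axis: int = 2,          # homo + hetero
) -> float:
    """Serial wall-clock hours for one axis's evaluation episodes (rough estimate)."""
    steps = episode_length * episodes_per_condition * conditions_per_axis
    return steps * sec_per_step / 3600.0


for s in (0.8, 1.0, 1.2):
    print(f"{s:.1f} s/step -> {axis_gpu_hours(s):.1f} GPU-h")
# 0.8 s/step -> 4.4 GPU-h
# 1.0 s/step -> 5.6 GPU-h
# 1.2 s/step -> 6.7 GPU-h
```

Under these assumptions, the 0.8-1.2 s/step range gives roughly 4.4-6.7 GPU-h per axis, which brackets the 5 GPU-h figure. Parallel environments or training-time rollouts would change the estimate.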

Wrapper-only discipline (ADR-001 §Decision (wrapper-only)): no patches to mani_skill/; no _* private imports. The env subclasses :class:mani_skill.envs.sapien_env.BaseEnv directly (matching the Stage-0 smoke env's _Stage0SmokeEnv pattern, but with the canonical pick-place reward and a two-robot tuple). The :func:chamber.envs._sapien_compat.load_agent_with_bare_uids shim handles the multi-agent uid-suffix strip-rebind that both Stage-0 and Stage-1 need.

Action space and the AS axis (decision Q3 / ADR-004 §Decision): All agents use control_mode="pd_joint_delta_pos" so the 7-DOF panda arm action stays joint-space, not Cartesian — preserving the AS axis's intended "7-DOF arm vs 2-DOF differential-drive base" heterogeneity (ADR-007 §Stage 1 AS spike framing). The CBF-QP outer filter's Cartesian-acceleration interpretation of the action is then an approximation absorbed by the conformal-slack overlay (Huriot & Sibai 2025 §VI); see the ADR-007 §Stage 1b note for the quantitative dimensional analysis ("off by 1/dt² (~2500x at dt=0.02s, small-Δp scale)") and ADR-004 §Open questions / slice P1.05.5 for the contingent long-term fix (extend :class:AgentSnapshot with qpos).
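The quoted factor can be read as follows. Treat a per-step position increment Δp as if it came from an acceleration a applied over one control step of length Δt. As an order-of-magnitude scaling, ignoring constant factors such as the ½ in ½aΔt², the two quantities then differ by 1/Δt²:

```latex
\Delta p \approx a\,\Delta t^{2}
\quad\Longrightarrow\quad
a \approx \frac{\Delta p}{\Delta t^{2}},
\qquad
\frac{1}{\Delta t^{2}} = \frac{1}{(0.02\,\mathrm{s})^{2}} = 2500\ \mathrm{s}^{-2}
```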

Force-torque synthesis (decision Q2 / ADR-007 §Revision history rev. 5): The OM-hetero "force_torque" channel is synthesised from the panda gripper's fingertip contact wrenches via :meth:Articulation.get_net_contact_forces(["panda_leftfinger", "panda_rightfinger"]) plus zero-mean Gaussian noise at :data:SYNTHESISED_FT_NOISE_SIGMA_N Newtons (default 0.5 N — at the lower bound of the real Franka wrist-FT noise floor of 0.5-1.5 N). This is not a real Panda wrist FT sensor reading. Wrist-mounted virtual FT is a Phase-1.5 ablation (ADR-007 §Open questions; Stage-1b+0.5 follow-up). The sigma value lives in this module + the ADR-007 §Revision history entry only — never in the prereg YAML (ADR-007 §Discipline: tag rotation forbidden).
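The synthesis step can be sketched in plain NumPy. Instead of querying SAPIEN, the sketch takes the fingertip contact forces as an input array. The (2, 3) input shape, the flattened (6,) output layout, and the name synthesise_ft are assumptions for illustration. Only the zero-mean Gaussian noise at a default sigma of 0.5 N comes from the text above.

```python
import numpy as np
from numpy.typing import NDArray

SIGMA_N = 0.5  # mirrors the documented default of SYNTHESISED_FT_NOISE_SIGMA_N


def synthesise_ft(
    finger_forces: NDArray[np.float64],
    rng: np.random.Generator,
    sigma_n: float = SIGMA_N,
) -> NDArray[np.float32]:
    """Hypothetical FT channel: per-finger net contact forces plus Gaussian noise.

    finger_forces is an assumed (2, 3) array, one (fx, fy, fz) row per
    fingertip link (left, right). The output flattens it to (6,) and adds
    i.i.d. zero-mean noise with standard deviation sigma_n Newtons.
    """
    forces = np.asarray(finger_forces, dtype=np.float64)
    if forces.shape != (2, 3):
        raise ValueError(f"expected (2, 3) finger forces, got {forces.shape}")
    noisy = forces.reshape(-1) + rng.normal(loc=0.0, scale=sigma_n, size=6)
    return noisy.astype(np.float32)
```

Passing the Generator in explicitly keeps the noise under the same seeding harness as the rest of the env.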

Cooperation gradient (decision Q in design conversation): Cube and goal spawn locations follow the upstream PickCubeEnv defaults for now (cube within ±5 cm of the table origin; goal within the same envelope, z ∈ [0, 0.3]). The cooperative gradient between AS-homo (two pandas) and AS-hetero (panda + fetch) is itself a Stage-1b empirical question; pose tuning is sequenced as a P1.04 follow-up once the first pilot's success-rate landscape is known (don't gold-plate at design time).

Tier-1 / Tier-2 split (decision 9): The module's top-level imports are intentionally Tier-1-safe (numpy, gymnasium, concerto.safety.api types, chamber.envs.errors). ManiSkill / SAPIEN / pytorch_kinematics imports happen inside :func:make_stage1_pickplace_env so python -c "import chamber.envs.stage1_pickplace" succeeds on a Vulkan-less host. The pure-function Tier-1 surface (:func:resolve_condition, :func:build_control_models_for_condition) is what the Tier-1 tests exercise; Tier-2 SAPIEN-gated tests cover real env construction.
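The deferred-import pattern behind the Tier-1/Tier-2 split can be sketched generically. The heavy import lives inside the function that needs it, so importing the module itself never touches it. HeavyDepUnavailable and load_tier2 are hypothetical names. The sketch catches only ImportError, while a Vulkan-less host might fail in other ways at import time.

```python
import importlib
from types import ModuleType


class HeavyDepUnavailable(RuntimeError):
    """Hypothetical error raised when a Tier-2 dependency cannot be imported."""


def load_tier2(module_name: str = "mani_skill") -> ModuleType:
    """Import a heavy dependency only when an env is actually built.

    Keeping this import inside the function, rather than at module top level,
    is what lets the enclosing module import cleanly on hosts that lack it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise HeavyDepUnavailable(
            f"{module_name} is required to construct the env but could not be imported"
        ) from exc
```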

References:

  • ADR-001 §Decision (wrapper-only); ADR-001 §Validation criteria (Stage-0 pattern this env mirrors).
  • ADR-004 §Decision (CBF-QP + JacobianControlModel contract); ADR-004 §Open questions (AgentSnapshot.qpos extension, slice P1.05.5).
  • ADR-005 §Decision (SAPIEN 3 + Warp-MPM substrate, inherited from ManiSkill v3).
  • ADR-007 §Stage 1b (this env's spec); ADR-007 §Discipline (tag rotation forbidden); ADR-007 §Open questions (FT wrist-mount ablation, P1.05.5 contingent slice).
  • spikes/preregistration/{AS,OM}.yaml (the condition_id strings this env resolves verbatim).
  • :class:concerto.safety.api.JacobianControlModel, :class:concerto.safety.api.DoubleIntegratorControlModel (the per-uid control models :meth:Stage1PickPlaceEnv.build_control_models returns).

ConditionConfig

Bases: NamedTuple

Resolved per-condition configuration (ADR-007 §Stage 1b).

Returned by :func:resolve_condition. Pure-Python NamedTuple so Tier-1 tests can construct fakes without any SAPIEN dependency.

Attributes:

  • condition_id (str) –

    The verbatim pre-registration condition_id string (one of the four Stage-1 strings).

  • agent_uids (tuple[str, str]) –

    Ordered (ego_uid, partner_uid) tuple. The ego is the agent the trainer updates / the safety filter authorises; the partner is the frozen / heuristic partner.

  • obs_mode (str) –

    ManiSkill v3 env-level obs_mode superset string passed to :class:BaseEnv.__init__. The OM channel-filter wrapper (:class:chamber.envs.stage1_obs_filter.Stage1OMChannelFilter) slices this superset per-uid to the condition's actual channel keep-set.

  • is_om_condition (bool) –

    True for the two OM condition_id strings (vision_only / vision_plus_force_torque_plus_proprio). Drives whether the channel-filter wrapper is engaged downstream.

  • ft_synthesis_enabled (bool) –

    True for the OM-hetero condition. The env always injects the synthesised force-torque channel into obs["agent"]["panda_wristcam"]["force_torque"]; the channel filter then masks or keeps it per condition. The flag here is informational only (consumed by the smoke scripts and the docstring report).

Source code in src/chamber/envs/stage1_pickplace.py
class ConditionConfig(NamedTuple):
    """Resolved per-condition configuration (ADR-007 §Stage 1b).

    Returned by :func:`resolve_condition`. Pure-Python NamedTuple so
    Tier-1 tests can construct fakes without any SAPIEN dependency.

    Attributes:
        condition_id: The verbatim pre-registration ``condition_id``
            string (one of the four Stage-1 strings).
        agent_uids: Ordered ``(ego_uid, partner_uid)`` tuple. The ego
            is the agent the trainer updates / the safety filter
            authorises; the partner is the frozen / heuristic partner.
        obs_mode: ManiSkill v3 env-level ``obs_mode`` superset string
            passed to :class:`BaseEnv.__init__`. The OM channel-filter
            wrapper (:class:`chamber.envs.stage1_obs_filter.Stage1OMChannelFilter`)
            slices this superset per-uid to the condition's actual
            channel keep-set.
        is_om_condition: ``True`` for the two OM ``condition_id`` strings
            (``vision_only`` / ``vision_plus_force_torque_plus_proprio``).
            Drives whether the channel-filter wrapper is engaged
            downstream.
        ft_synthesis_enabled: ``True`` for the OM-hetero condition. The
            env always *injects* the synthesised force-torque channel
            into ``obs["agent"]["panda_wristcam"]["force_torque"]``; the
            channel filter then masks or keeps it per condition. The
            flag here is informational only (consumed by the smoke
            scripts and the docstring report).
    """

    condition_id: str
    agent_uids: tuple[str, str]
    obs_mode: str
    is_om_condition: bool
    ft_synthesis_enabled: bool
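Because ConditionConfig is a plain NamedTuple, a Tier-1 test can build a fake without importing SAPIEN. The sketch below redefines a stand-in with the same five fields so it runs on its own. The obs_mode value is a hypothetical placeholder, and the condition_id strings are the shorthand used in the attribute docs above, which may differ from the full pre-registered strings.

```python
from typing import NamedTuple


# Stand-in mirroring the documented ConditionConfig fields; real tests would
# import chamber.envs.stage1_pickplace.ConditionConfig instead.
class ConditionConfig(NamedTuple):
    condition_id: str
    agent_uids: tuple[str, str]
    obs_mode: str
    is_om_condition: bool
    ft_synthesis_enabled: bool


# Fake for the OM condition that keeps the synthesised FT channel.
# "hypothetical_obs_mode_superset" is NOT the string the real resolver returns.
fake_with_ft = ConditionConfig(
    condition_id="vision_plus_force_torque_plus_proprio",
    agent_uids=("panda_wristcam", "fetch"),  # (ego_uid, partner_uid)
    obs_mode="hypothetical_obs_mode_superset",
    is_om_condition=True,
    ft_synthesis_enabled=True,
)

# NamedTuples are immutable; _replace derives a variant without mutation.
fake_vision_only = fake_with_ft._replace(
    condition_id="vision_only", ft_synthesis_enabled=False
)

# Tuple unpacking follows the documented (ego_uid, partner_uid) order.
ego_uid, partner_uid = fake_with_ft.agent_uids
```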

build_control_models_for_condition

build_control_models_for_condition(
    condition_id: str,
    *,
    jacobian_fn: Callable[[AgentSnapshot], NDArray[float64]]
    | None = None,
    fetch_action_dim: int | None = None,
    panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]

Build per-uid :class:AgentControlModel map for condition_id (ADR-004 §Decision).

Pure Tier-1 function with no SAPIEN dependency. The env plumbs in the Jacobian callable at construction time (see :meth:Stage1PickPlaceEnv.build_control_models). Tier-1 tests pass jacobian_fn=None to exercise the model-construction path; the panda models then keep their placeholder Jacobian, which fails loudly on use per the :class:JacobianControlModel contract (concerto/safety/api.py:552).
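The loud-fail-on-use contract means that constructing a model with jacobian_fn=None succeeds, while any attempt to evaluate its Jacobian raises. A minimal sketch of that behaviour follows. PlaceholderJacobianModel, its jacobian() method, and the RuntimeError it raises are hypothetical; the real concerto.safety.api.JacobianControlModel may use different method names and error types.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np


@dataclass
class PlaceholderJacobianModel:
    """Hypothetical stand-in, NOT concerto.safety.api.JacobianControlModel."""

    uid: str
    action_dim: int
    position_dim: int = 3
    jacobian_fn: Optional[Callable[[object], np.ndarray]] = None

    def jacobian(self, snapshot: object) -> np.ndarray:
        # Construction with jacobian_fn=None succeeds (Tier-1 path);
        # only *using* the placeholder fails, and it fails loudly.
        if self.jacobian_fn is None:
            raise RuntimeError(
                f"{self.uid}: placeholder Jacobian; wire jacobian_fn before use"
            )
        return np.asarray(self.jacobian_fn(snapshot), dtype=np.float64)


# Tier-1 style: build the model without any Jacobian callable.
placeholder = PlaceholderJacobianModel(uid="panda_wristcam", action_dim=8)
try:
    placeholder.jacobian(snapshot=None)
    placeholder_failed = False
except RuntimeError:
    placeholder_failed = True

# Wired model: a toy callable returning a 3 x 8 zero matrix (illustrative only).
wired = PlaceholderJacobianModel(
    uid="panda_wristcam", action_dim=8, jacobian_fn=lambda snap: np.zeros((3, 8))
)
wired_jacobian = wired.jacobian(snapshot=None)
```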

Parameters:

  • condition_id (str) –

    Stage-1 condition_id (raises if unknown).

  • jacobian_fn (Callable[[AgentSnapshot], NDArray[float64]] | None, default: None ) –

    Optional Jacobian callable to embed in the panda uid's :class:JacobianControlModel. None means the placeholder model is returned (Tier-1 path); pass the env's closure-via-env-reference callable (lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])) to get a working model for Tier-2 / production.

  • fetch_action_dim (int | None, default: None ) –

    Action dimension of the fetch uid's Box. None (default) is the Tier-1 sentinel: a :class:DoubleIntegratorControlModel with action_dim=2, covering only the 2-DOF differential-drive base. The arm, body, and gripper are folded into the same model under the canonical "treat the whole action as a 1-D Cartesian acceleration" approximation, which P1.05.5 will tighten. Tier-2 paths pass the actual env.action_space[uid].shape[0].

  • panda_action_dim (int, default: 8 ) –

    Action dimension of the panda uid's Box. Defaults to 8 (7 arm joint deltas + 1 mimic-gripper joint delta under pd_joint_delta_pos). Both panda_wristcam and panda_partner share this shape.

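Returns:

  • dict[str, AgentControlModel] –

    {uid: AgentControlModel} map keyed by the condition's :attr:ConditionConfig.agent_uids. The panda uid (always the ego, always present) gets a :class:JacobianControlModel; the partner uid gets a :class:JacobianControlModel for panda_partner (AS-homo) or a :class:DoubleIntegratorControlModel for fetch (AS-hetero / OM-*).

Raises: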

  • ValueError

    If condition_id is not in the four-condition table (delegated to :func:resolve_condition).
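The closure-via-env-reference callable captures the env object, not a qpos snapshot, so each call reads whatever the env most recently cached in _latest_qpos. The sketch below uses a hypothetical FakeEnv with a toy _jacobian_provider to show that late binding, and why the cache must be primed before the first call. In this sketch an unprimed cache surfaces as a KeyError; the real env raises its own "qpos not yet cached" guard instead.

```python
import numpy as np


class FakeEnv:
    """Hypothetical env stand-in; mirrors only the _latest_qpos cache idea."""

    def __init__(self) -> None:
        self._latest_qpos: dict[str, np.ndarray] = {}

    def _jacobian_provider(self, snap: object, qpos: np.ndarray) -> np.ndarray:
        # Toy provider: a 3 x N matrix built from qpos, so the result visibly
        # depends on the cache contents at call time. Not real kinematics.
        return np.tile(qpos, (3, 1))


env = FakeEnv()

# The lambda captures `env`; the dict lookup runs on every call, so the
# closure always sees the most recently cached qpos.
jacobian_fn = lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])

# Before the cache is primed, the lookup fails.
try:
    jacobian_fn(None)
    unprimed_failed = False
except KeyError:
    unprimed_failed = True

env._latest_qpos["panda_wristcam"] = np.zeros(8)  # e.g. primed at reset
first = jacobian_fn(None)
env._latest_qpos["panda_wristcam"] = np.ones(8)   # e.g. refreshed before a step
second = jacobian_fn(None)
```

The same jacobian_fn object returns different matrices across the two calls without being rebuilt, which is what lets the control model be constructed once while the env keeps the qpos current.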

Source code in src/chamber/envs/stage1_pickplace.py
def build_control_models_for_condition(
    condition_id: str,
    *,
    jacobian_fn: Callable[[AgentSnapshot], NDArray[np.float64]] | None = None,
    fetch_action_dim: int | None = None,
    panda_action_dim: int = 8,
) -> dict[str, AgentControlModel]:
    """Build per-uid :class:`AgentControlModel` map for ``condition_id`` (ADR-004 §Decision).

    Pure Tier-1 function — no SAPIEN dependency. The Jacobian callable
    is plumbed in by the env at construction time (see
    :meth:`Stage1PickPlaceEnv.build_control_models`); Tier-1 tests pass
    ``jacobian_fn=None`` to exercise the model-construction path, which
    keeps the placeholder per :class:`JacobianControlModel`'s
    loud-fail-on-use contract (``concerto/safety/api.py:552``).

    Args:
        condition_id: Stage-1 ``condition_id`` (raises if unknown).
        jacobian_fn: Optional Jacobian callable to embed in the panda
            uid's :class:`JacobianControlModel`. ``None`` means the
            placeholder model is returned (Tier-1 path); pass the env's
            closure-via-env-reference callable
            (``lambda snap: env._jacobian_provider(snap, env._latest_qpos["panda_wristcam"])``)
            to get a working model for Tier-2 / production.
        fetch_action_dim: Action dimension of the fetch uid's Box.
            ``None`` (default) is the Tier-1 sentinel — :class:`DoubleIntegratorControlModel`
            with ``action_dim=2`` (just the differential-drive base axis;
            the arm + body + gripper are folded into the same model
            with the canonical "treat the whole action as a 1-D
            Cartesian acceleration" approximation P1.05.5 will tighten).
            Tier-2 paths pass the actual ``env.action_space[uid].shape[0]``.
        panda_action_dim: Action dimension of the panda uid's Box.
            Defaults to ``8`` (7 arm joint deltas + 1 mimic-gripper
            joint delta under ``pd_joint_delta_pos``). Both
            ``panda_wristcam`` and ``panda_partner`` share this shape.

    Returns:
        ``{uid: AgentControlModel}`` map keyed by the condition's
        :attr:`ConditionConfig.agent_uids`. The panda uid (always the
        ego, always present) gets a :class:`JacobianControlModel`; the
        partner uid gets a :class:`JacobianControlModel` for
        ``panda_partner`` (AS-homo) or
        :class:`DoubleIntegratorControlModel` for ``fetch`` (AS-hetero /
        OM-*).

    Raises:
        ValueError: If ``condition_id`` is not in the four-condition
            table (delegated to :func:`resolve_condition`).
    """
    config = resolve_condition(condition_id)
    ego_uid, partner_uid = config.agent_uids
    models: dict[str, AgentControlModel] = {}
    # Ego is always panda_wristcam — 7-DOF arm + mimic-gripper.
    # JacobianControlModel position_dim=3 (Cartesian) matches the
    # AgentSnapshot Cartesian-only contract (cbf_qp.py:103-110).
    models[ego_uid] = JacobianControlModel(
        uid=ego_uid,
        action_dim=panda_action_dim,
        position_dim=3,
        jacobian_fn=jacobian_fn,
        max_cartesian_accel_value=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
    )
    if partner_uid == "panda_partner":
        # AS-homo: second panda, identical kinematics, same Jacobian
        # callable wired by the env via the closure pattern.
        models[partner_uid] = JacobianControlModel(
            uid=partner_uid,
            action_dim=panda_action_dim,
            position_dim=3,
            jacobian_fn=jacobian_fn,
            max_cartesian_accel_value=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
        )
    else:
        # AS-hetero / OM-*: fetch base+arm uses DoubleIntegratorControlModel.
        # action_dim default 2 covers the 2-DOF differential-drive base
        # slice that ADR-007 §Stage 1 AS spike framing names; real
        # Tier-2 path passes the full action_space shape.
        effective_action_dim = 2 if fetch_action_dim is None else fetch_action_dim
        models[partner_uid] = DoubleIntegratorControlModel(
            uid=partner_uid,
            action_dim=effective_action_dim,
        )
    return models

make_stage1_pickplace_env

make_stage1_pickplace_env(
    *,
    condition_id: str,
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
    num_envs: int = 1,
    render_mode: str | None = None,
    render_backend: str | None = None,
    force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]

Build a :class:Stage1PickPlaceEnv instance (ADR-007 §Stage 1b).

Factory entry point. SAPIEN / ManiSkill imports are deferred to the body so python -c "import chamber.envs.stage1_pickplace" works on a Vulkan-less host (Tier-1 contract). The :class:Stage1PickPlaceEnv class itself is defined inside the factory body — same pattern as :func:chamber.benchmarks.stage0_smoke.make_stage0_env.
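The import-deferral pattern can be sketched in isolation: validate the cheap argument first, import the heavy dependency inside the function body, translate ImportError into a domain-specific error, and only then define the class that depends on the import. Every name below (make_env_sketch, EnvCompatibilityError, the condition ids, and json as a stand-in backend) is hypothetical; the real factory imports mani_skill / sapien and raises ChamberEnvCompatibilityError.

```python
import importlib
from typing import Any

_KNOWN_CONDITIONS = frozenset({"cond_a", "cond_b"})  # hypothetical ids


class EnvCompatibilityError(RuntimeError):
    """Hypothetical analogue of ChamberEnvCompatibilityError."""


def make_env_sketch(condition_id: str, backend_module: str = "mani_skill") -> Any:
    # 1. Validate cheaply first, so a bad id fails without any import cost.
    if condition_id not in _KNOWN_CONDITIONS:
        raise ValueError(f"unknown condition_id {condition_id!r}")
    # 2. Heavy imports happen only at call time; importing the module that
    #    defines this factory never touches the simulator stack.
    try:
        backend = importlib.import_module(backend_module)
    except ImportError as exc:
        raise EnvCompatibilityError(
            f"{backend_module} is not installed in the active environment"
        ) from exc

    # 3. The class that depends on the import is defined inside the body.
    class _SketchEnv:
        def __init__(self, cid: str) -> None:
            self.condition_id = cid
            self.backend_name = backend.__name__

    return _SketchEnv(condition_id)


# Record which guard fires for each (condition_id, backend) pair.
outcomes: dict[tuple[str, str], str] = {}
for cid, mod in [("bogus", "json"), ("cond_a", "no_such_simulator_pkg"), ("cond_a", "json")]:
    try:
        make_env_sketch(cid, backend_module=mod)
        outcomes[(cid, mod)] = "ok"
    except Exception as exc:
        outcomes[(cid, mod)] = type(exc).__name__
```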

Parameters:

  • condition_id (str) –

    One of the four Stage-1 condition_id strings; raises :class:ValueError on any other input (cf. :func:resolve_condition).

  • episode_length (int, default: DEFAULT_EPISODE_LENGTH ) –

    Truncation horizon, in env ticks. Default :data:DEFAULT_EPISODE_LENGTH (100). See module docstring for the budget rationale.

  • root_seed (int, default: 0 ) –

    Project root seed, routed through :func:concerto.training.seeding.derive_substream to the env's RNG substream for reproducibility (P6 / ADR-002 determinism rule).

  • num_envs (int, default: 1 ) –

    ManiSkill vectorisation count; default 1 (Stage-1b paths are single-env per cell). Higher values are supported but untested.

  • render_mode (str | None, default: None ) –

    ManiSkill render_mode (None, "human", "rgb_array", etc.). None (default) disables rendering for the spike-runner path.

  • render_backend (str | None, default: None ) –

    ManiSkill render_backend ("gpu", "none", …). None (default) falls through to ManiSkill's own default; the constructor passes "gpu" explicitly in that case. The Stage-0 smoke env selects the backend via a CHAMBER_RENDER_BACKEND env-var probe plus the visual-material strip patch; Stage-1b applies the same patch when render_backend="none" and also exposes the backend choice as a constructor kwarg.

  • force_torque_noise_sigma (float, default: SYNTHESISED_FT_NOISE_SIGMA_N ) –

    Standard deviation (sigma) of the zero-mean Gaussian noise added to the synthesised force-torque channel, in newtons. Default :data:SYNTHESISED_FT_NOISE_SIGMA_N. Override only with an ADR-007 §Revision history entry (the value is load-bearing for the OM gate; see module docstring).
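The synthesised force-torque channel that force_torque_noise_sigma controls is described in _get_obs_extra as a 6-D fingertip contact-force vector ([fx_l, fy_l, fz_l, fx_r, fy_r, fz_r]) plus zero-mean Gaussian noise drawn from the env's seeded RNG. Below is a single-env (unbatched) sketch, assuming the two per-fingertip force vectors are already available as NumPy arrays. The function name synthesise_ft_sketch and the SIGMA_N value are hypothetical.

```python
import numpy as np


def synthesise_ft_sketch(
    left_force: np.ndarray,
    right_force: np.ndarray,
    rng: np.random.Generator,
    sigma_n: float,
) -> np.ndarray:
    """Stack two 3-D fingertip contact forces and add Gaussian noise.

    Sketch only: in the env the forces come from a physics-engine
    net-contact-force query; here they are plain arrays.
    """
    # Layout: [fx_l, fy_l, fz_l, fx_r, fy_r, fz_r]
    clean = np.concatenate([left_force, right_force]).astype(np.float64)
    # Zero-mean noise with standard deviation sigma_n (newtons).
    noise = rng.normal(loc=0.0, scale=sigma_n, size=clean.shape)
    return (clean + noise).astype(np.float32)


left = np.array([0.0, 0.0, 1.5])
right = np.array([0.0, 0.0, -1.5])
SIGMA_N = 0.1  # hypothetical; the env default is SYNTHESISED_FT_NOISE_SIGMA_N

# Same seed in, identical noisy reading out.
reading_a = synthesise_ft_sketch(left, right, np.random.default_rng(7), SIGMA_N)
reading_b = synthesise_ft_sketch(left, right, np.random.default_rng(7), SIGMA_N)
# sigma = 0 recovers the clean contact forces exactly.
noiseless = synthesise_ft_sketch(left, right, np.random.default_rng(7), 0.0)
```

Routing the noise through a seeded Generator rather than a global RNG is what keeps the noisy channel reproducible across runs with the same seed.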

Returns:

  • Env[Any, Any] –

    :class:Stage1PickPlaceEnv instance, ready to call reset(seed=K) on. The action space is a :class:gym.spaces.Dict keyed by the resolved :attr:ConditionConfig.agent_uids.

Raises:

  • ValueError

    If condition_id is unknown (see :func:resolve_condition).

  • ChamberEnvCompatibilityError

    If the ManiSkill / SAPIEN dependencies cannot be imported, or if SAPIEN / Vulkan initialisation fails (for example on a CPU-only host without the render_backend="none" fallback; see ADR-001 §Risks).
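Per the source below, _initialize_episode draws cube and goal positions from the env's seeded NumPy generator rather than torch.rand, so identical generator state yields identical poses. Here is a NumPy-only sketch of that sampling, with constants copied from the env's __init__ and the (0, 0) spawn centre folded away. It models reproducibility as "same seed in, same poses out" and does not show how the real env maps reset(seed=K) onto its substream; sample_episode_sketch is a hypothetical name.

```python
import numpy as np

# Values copied from Stage1PickPlaceEnv.__init__ (spawn centre is (0, 0)).
CUBE_HALF_SIZE = 0.02
SPAWN_HALF_SIZE = 0.05
MAX_GOAL_HEIGHT = 0.3


def sample_episode_sketch(
    rng: np.random.Generator, b: int = 1
) -> tuple[np.ndarray, np.ndarray]:
    """Return (cube_xyz, goal_xyz), each of shape (b, 3)."""
    # Cube: xy uniform in the spawn box, z at half-size so it rests on the table.
    cube = np.zeros((b, 3))
    cube[:, :2] = rng.uniform(-SPAWN_HALF_SIZE, SPAWN_HALF_SIZE, size=(b, 2))
    cube[:, 2] = CUBE_HALF_SIZE
    # Goal: xy in the same box; z uniform in [0, max) above the cube's rest height.
    # Draw order (cube xy, goal xy, goal z) matches the env's _initialize_episode.
    goal = np.zeros((b, 3))
    goal[:, :2] = rng.uniform(-SPAWN_HALF_SIZE, SPAWN_HALF_SIZE, size=(b, 2))
    goal[:, 2] = rng.uniform(0.0, MAX_GOAL_HEIGHT, size=(b,)) + cube[:, 2]
    return cube, goal


cube_a, goal_a = sample_episode_sketch(np.random.default_rng(123), b=4)
cube_b, goal_b = sample_episode_sketch(np.random.default_rng(123), b=4)
```

Because the draw order is part of the RNG contract, reordering the uniform() calls would silently change every sampled pose for a given seed.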

Source code in src/chamber/envs/stage1_pickplace.py
def make_stage1_pickplace_env(
    *,
    condition_id: str,
    episode_length: int = DEFAULT_EPISODE_LENGTH,
    root_seed: int = 0,
    num_envs: int = 1,
    render_mode: str | None = None,
    render_backend: str | None = None,
    force_torque_noise_sigma: float = SYNTHESISED_FT_NOISE_SIGMA_N,
) -> gym.Env[Any, Any]:
    """Build a :class:`Stage1PickPlaceEnv` instance (ADR-007 §Stage 1b).

    Factory entry point. SAPIEN / ManiSkill imports are deferred to the
    body so ``python -c "import chamber.envs.stage1_pickplace"`` works
    on a Vulkan-less host (Tier-1 contract). The
    :class:`Stage1PickPlaceEnv` class itself is defined inside the
    factory body — same pattern as
    :func:`chamber.benchmarks.stage0_smoke.make_stage0_env`.

    Args:
        condition_id: One of the four Stage-1 ``condition_id`` strings;
            raises :class:`ValueError` on any other input
            (cf. :func:`resolve_condition`).
        episode_length: Truncation horizon, in env ticks. Default
            :data:`DEFAULT_EPISODE_LENGTH` (100). See module docstring
            for the budget rationale.
        root_seed: Project root seed routed to the env's
            :func:`concerto.training.seeding.derive_substream` substream
            for P6 reproducibility (P6 / ADR-002 determinism rule).
        num_envs: ManiSkill vectorisation count; default 1 (Stage-1b
            paths are single-env per cell). Higher values are supported
            but untested.
        render_mode: ManiSkill ``render_mode`` (``None``,
            ``"human"``, ``"rgb_array"``, etc.). ``None`` (default)
            disables rendering for the spike-runner path.
        render_backend: ManiSkill ``render_backend`` (``"gpu"``,
            ``"none"``, …). ``None`` (default) falls through to
            ManiSkill's own default. The Stage-0 smoke env uses
            ``CHAMBER_RENDER_BACKEND`` env-var probe + the
            visual-material strip patch; Stage-1b follows the same
            pattern but exposes the backend choice as a constructor
            kwarg too.
        force_torque_noise_sigma: sigma of the zero-mean Gaussian noise
            added to the synthesised force-torque channel, in Newtons.
            Default :data:`SYNTHESISED_FT_NOISE_SIGMA_N`. Override only
            with an ADR-007 §Revision history entry (the value is
            load-bearing for the OM gate; see module docstring).

    Returns:
        :class:`Stage1PickPlaceEnv` instance, ready to call
        ``reset(seed=K)`` on. The action space is a
        :class:`gym.spaces.Dict` keyed by the resolved
        :attr:`ConditionConfig.agent_uids`.

    Raises:
        ValueError: If ``condition_id`` is unknown (see
            :func:`resolve_condition`).
        ChamberEnvCompatibilityError: If ManiSkill / SAPIEN / Vulkan
            initialisation fails (CPU-only host without a fallback
            ``render_backend="none"``; see ADR-001 §Risks).
    """
    # Validate condition_id eagerly (the inner __init__ will resolve again,
    # but raising here gives the user a useful traceback in the Tier-1
    # bogus-id path without paying the SAPIEN-import cost).
    resolve_condition(condition_id)
    try:
        import mani_skill.envs  # noqa: F401 - registers ManiSkill env IDs
        import sapien
        import torch
        from mani_skill.envs.sapien_env import BaseEnv
        from mani_skill.utils.building import actors
        from mani_skill.utils.scene_builder.table import TableSceneBuilder
        from mani_skill.utils.structs.pose import Pose

        # Trigger PandaPartner registration before BaseEnv reaches the
        # multi-agent build loop. The chamber.agents package's
        # try/except ensures this import is the place where a Tier-1
        # context would have already failed if it was going to.
        import chamber.agents  # noqa: F401 - side-effect import for register_agent
        from chamber.agents.panda_jacobian import (
            CARTESIAN_POSITION_DIM,
            PANDA_ARM_DOF,
            PandaJacobianProvider,
        )
        from chamber.envs._sapien_compat import (
            load_agent_with_bare_uids,
            patch_sapien_urdf_no_visual_material,
        )
    except ImportError as exc:
        raise ChamberEnvCompatibilityError(
            "Stage1PickPlaceEnv requires mani_skill / sapien / pytorch_kinematics "
            "in the active venv. Install per pyproject.toml; see ADR-001 §Risks."
        ) from exc

    if render_backend == "none":
        # Headless host workaround — same idempotent SAPIEN URDF patch
        # the Stage-0 smoke env relies on (ADR-001 §Risks).
        patch_sapien_urdf_no_visual_material()

    class Stage1PickPlaceEnv(BaseEnv):  # type: ignore[misc, valid-type]
        """Two-robot pick-place env (ADR-007 §Stage 1b; ADR-001 §Decision).

        Subclasses :class:`mani_skill.envs.sapien_env.BaseEnv` directly
        (matching :class:`chamber.benchmarks.stage0_smoke._Stage0SmokeEnv`).
        See the module docstring for the wrapper-only-discipline /
        synthesised-FT / closure-via-env-Jacobian / cooperation-gradient
        rationale.

        Implements the four condition_id strings the AS / OM pre-
        registrations name; the obs-mode superset is set per condition
        at constructor time, the per-uid channel filtering happens in
        :class:`chamber.envs.stage1_obs_filter.Stage1OMChannelFilter`
        (kept separate so the AS path doesn't pay the filter cost).
        """

        SUPPORTED_ROBOTS: ClassVar[list[tuple[str, str]]] = [  # type: ignore[assignment]
            ("panda_wristcam", "panda_partner"),
            ("panda_wristcam", "fetch"),
        ]

        def __init__(
            self,
            *,
            condition_id: str,
            episode_length: int,
            root_seed: int,
            num_envs: int,
            render_mode: str | None,
            render_backend: str | None,
            force_torque_noise_sigma: float,
        ) -> None:
            """Build the env (ADR-007 §Stage 1b).

            Stores the condition-resolved metadata, sets up the
            determinism harness, then calls
            ``super().__init__(robot_uids=config.agent_uids, ...)``.
            The Jacobian provider is constructed *after* the super-init
            because it reads the panda's URDF directly (independent of
            SAPIEN scene state); the closure plumbed into
            :meth:`build_control_models` then reads ``self._latest_qpos``
            which the env updates each :meth:`_before_simulation_step`.
            """
            self._condition_config: ConditionConfig = resolve_condition(condition_id)
            self._episode_length: int = int(episode_length)
            self._root_seed: int = int(root_seed)
            self._force_torque_noise_sigma: float = float(force_torque_noise_sigma)
            self._rng: np.random.Generator = derive_substream(
                _SUBSTREAM_NAME, root_seed=self._root_seed
            ).default_rng()
            # Per-uid latest qpos cache for the Jacobian closure; populated
            # before each ``step``. See module docstring on the closure-
            # via-env-reference workaround (decision Q3 mod 3; ADR-004
            # §Open questions / slice P1.05.5).
            self._latest_qpos: dict[str, NDArray[np.float64]] = {}
            self._step_count: int = 0
            # PickCubeEnv-style task parameters; consistent with the
            # upstream defaults so the success / reward shaping path
            # behaves predictably under the multi-agent rig.
            self._cube_half_size: float = 0.02
            self._goal_thresh: float = 0.025
            self._cube_spawn_half_size: float = 0.05
            self._cube_spawn_center: tuple[float, float] = (0.0, 0.0)
            self._max_goal_height: float = 0.3
            # Attributes that ``_initialize_episode`` / ``_before_simulation_step``
            # touch MUST be assigned before ``super().__init__()``: ManiSkill's
            # ``BaseEnv.__init__`` calls ``self.reset(...)`` during super-init
            # (mani_skill/envs/sapien_env.py:327), which invokes
            # ``_initialize_episode`` *and* the post-reset qpos primer below
            # before the subclass body resumes. Subclass attributes written
            # after super-init do not yet exist when those hooks fire.
            self._panda_arm_dof: int = PANDA_ARM_DOF
            # Per-uid cache for the numerical-difference velocity in
            # build_agent_snapshots (D8 of the P1.04.5 design pass).
            # Cleared by _initialize_episode at every reset so the
            # cross-episode position delta doesn't produce a spurious
            # first-step velocity. The first-step velocity defaults to
            # zero on each fresh episode (the agents start at rest by
            # construction).
            self._prev_positions: dict[str, NDArray[np.float64]] = {}
            try:
                super().__init__(
                    robot_uids=self._condition_config.agent_uids,  # type: ignore[arg-type]
                    num_envs=num_envs,
                    obs_mode=self._condition_config.obs_mode,
                    control_mode="pd_joint_delta_pos",
                    render_mode=render_mode,
                    render_backend=render_backend if render_backend is not None else "gpu",
                )
            except RuntimeError as exc:
                raise ChamberEnvCompatibilityError(
                    "SAPIEN/Vulkan initialisation failed during "
                    f"Stage1PickPlaceEnv(condition_id={condition_id!r}) build: "
                    f"{exc}\nSet CHAMBER_RENDER_BACKEND=none (or "
                    'render_backend="none") on CUDA-only hosts; see ADR-001 §Risks.'
                ) from exc
            # Jacobian provider is wristcam-URDF based; the partner
            # (panda_partner) shares the kinematic chain so a single
            # provider serves both panda uids. ``panda_v2.urdf`` and
            # ``panda_v3.urdf`` differ only in the wristcam mount
            # (extrinsic frame); the 7 arm joints + TCP link are
            # identical, so reusing the v3-URDF chain for the partner
            # introduces no error in the linear Jacobian.
            self._jacobian_provider: PandaJacobianProvider = PandaJacobianProvider()
            # Per-uid safety radii (Q3 of the P1.04.5 design pass).
            # Asserted at construction against URDF-envelope minimums.
            self._safety_radii: dict[str, float] = {
                uid: PANDA_SAFETY_RADIUS_M
                if uid in ("panda_wristcam", "panda_partner")
                else FETCH_SAFETY_RADIUS_M
                for uid in self._condition_config.agent_uids
            }
            _validate_safety_radii(self._safety_radii)

        # ----- ManiSkill v3 BaseEnv hooks -----

        def _load_agent(  # type: ignore[override]
            self,
            options: dict[str, Any],
            initial_agent_poses: object = None,
            build_separate: bool = False,
        ) -> None:
            """Multi-robot ``_load_agent`` override with per-uid base poses.

            Delegates to
            :func:`chamber.envs._sapien_compat.load_agent_with_bare_uids`
            for the suffix-strip rebind. Passes an explicit per-agent
            pose list so the two robots don't collide at the table
            origin.
            """
            if initial_agent_poses is None:
                poses: list[object] = []
                for uid in self.robot_uids:  # type: ignore[union-attr]
                    xyz = _AGENT_BASE_POSE_XYZ.get(uid)
                    if xyz is None:
                        # Unknown uid (shouldn't happen given the
                        # condition table) — leave SAPIEN to use URDF
                        # default root pose.
                        poses.append(None)
                    else:
                        poses.append(sapien.Pose(p=list(xyz)))
                initial_agent_poses = poses
            load_agent_with_bare_uids(
                self,
                options,
                initial_agent_poses=initial_agent_poses,
                build_separate=build_separate,
            )

        def _load_scene(self, options: dict[str, Any]) -> None:
            """Build the table + cube + goal site (PickCubeEnv recipe; ADR-007 §Stage 1b)."""
            del options
            self.table_scene = TableSceneBuilder(self, robot_init_qpos_noise=0.02)
            self.table_scene.build()
            self.cube = actors.build_cube(
                self.scene,
                half_size=self._cube_half_size,
                color=[1, 0, 0, 1],
                name="cube",
                initial_pose=sapien.Pose(p=[0, 0, self._cube_half_size]),
            )
            self.goal_site = actors.build_sphere(
                self.scene,
                radius=self._goal_thresh,
                color=[0, 1, 0, 1],
                name="goal_site",
                body_type="kinematic",
                add_collision=False,
                initial_pose=sapien.Pose(),
            )
            self._hidden_objects.append(self.goal_site)  # type: ignore[attr-defined]

        def _initialize_episode(self, env_idx: torch.Tensor, options: dict[str, Any]) -> None:
            """Reset cube + goal poses per episode (P6 determinism via :attr:`_rng`)."""
            del options
            # Clear the per-agent position cache so build_agent_snapshots'
            # first-step velocity (numerical-difference) defaults to
            # zero on each fresh episode rather than diffing against
            # the previous episode's final position (R2 of the P1.04.5
            # risk register).
            self._prev_positions.clear()
            with torch.device(self.device):  # type: ignore[attr-defined]
                b = len(env_idx)
                self.table_scene.initialize(env_idx)
                # Cube xy uniform in the spawn box; z at half-size so it
                # rests on the table surface. Routed through the env's
                # P6-harnessed RNG instead of torch.rand so two
                # reset(seed=K) calls reproduce byte-identical state
                # (P6 / ADR-002 determinism rule).
                xy_cube_np = self._rng.uniform(
                    -self._cube_spawn_half_size, self._cube_spawn_half_size, size=(b, 2)
                )
                xy_cube = torch.from_numpy(np.asarray(xy_cube_np, dtype=np.float32))
                xyz = torch.zeros((b, 3))
                xyz[:, 0] = xy_cube[:, 0] + self._cube_spawn_center[0]
                xyz[:, 1] = xy_cube[:, 1] + self._cube_spawn_center[1]
                xyz[:, 2] = self._cube_half_size
                # No rotation randomisation in P1.03 — upstream PickCubeEnv
                # uses ``randomization.random_quaternions(b, lock_x=True,
                # lock_y=True)``; for the two-robot env the orientation
                # matters less than the position. Re-enable if pilot data
                # shows it changes the AS gap.
                self.cube.set_pose(Pose.create_from_pq(xyz))

                # Goal xy in the same envelope; z uniform in [0, max].
                xy_goal_np = self._rng.uniform(
                    -self._cube_spawn_half_size, self._cube_spawn_half_size, size=(b, 2)
                )
                z_goal_np = self._rng.uniform(0.0, self._max_goal_height, size=(b,))
                xy_goal = torch.from_numpy(np.asarray(xy_goal_np, dtype=np.float32))
                z_goal = torch.from_numpy(np.asarray(z_goal_np, dtype=np.float32))
                goal_xyz = torch.zeros((b, 3))
                goal_xyz[:, 0] = xy_goal[:, 0] + self._cube_spawn_center[0]
                goal_xyz[:, 1] = xy_goal[:, 1] + self._cube_spawn_center[1]
                goal_xyz[:, 2] = z_goal + xyz[:, 2]
                self.goal_site.set_pose(Pose.create_from_pq(goal_xyz))
            # Prime the qpos cache so the Jacobian closure has a valid
            # value at the first safety-filter call. The training loop
            # in ``concerto.training.ego_aht.train`` queries the filter
            # *before* the first ``env.step()`` (act-then-step order),
            # so the per-physics-step ``_before_simulation_step`` hook
            # has not yet fired after a reset. Without this primer the
            # closure raises the "qpos not yet cached" guard.
            self._refresh_latest_qpos()

        # ----- Observation / reward / success (panda-routed; ADR-007 §Stage 1b) -----

        @property
        def _panda_agent(self) -> Any:  # noqa: ANN401 - ManiSkill Panda agent has no project type
            """Return the panda_wristcam :class:`Panda` agent instance (ego).

            Routes through ``self.agent.agents_dict`` instead of the
            single-agent ``self.agent`` attribute upstream PickCubeEnv
            relies on. Used by :meth:`evaluate` and
            :meth:`compute_normalized_dense_reward` so the success
            predicates and reward shaping respect the multi-agent rig.
            """
            return self.agent.agents_dict["panda_wristcam"]  # type: ignore[attr-defined, no-any-return]

        def _get_obs_extra(self, info: dict[str, Any]) -> dict[str, Any]:
            """Inject task obs + synthesised force-torque (ADR-007 §Stage 1b).

            Extras include:

            - ``tcp_pose`` — panda end-effector pose (mirrors upstream
              PickCubeEnv).
            - ``goal_pos`` — the goal site's xyz position.
            - ``cube_pose`` / ``cube_to_tcp_pos`` / ``cube_to_goal_pos``
              — present only when ``"state"`` is in the env's
              ``obs_mode`` (matches upstream PickCubeEnv's branch).
            - **``force_torque`` (this env's contribution)** — a 6-D
              vector of Cartesian contact forces, three components per
              panda fingertip (``[fx_l, fy_l, fz_l, fx_r, fy_r, fz_r]``;
              forces only, no torques), synthesised
              from
              :meth:`PhysxArticulation.get_net_contact_forces([finger_links])`
              plus zero-mean sigma=:data:`SYNTHESISED_FT_NOISE_SIGMA_N`
              Gaussian noise routed through the env's P6-harnessed RNG.
              **Not** a real Panda wrist FT sensor reading; see module
              docstring and ADR-007 §Open questions for the wrist-mount
              ablation deferral.
            """
            del info
            panda = self._panda_agent
            obs: dict[str, Any] = {
                "tcp_pose": panda.tcp_pose.raw_pose,
                "goal_pos": self.goal_site.pose.p,
            }
            if "state" in self.obs_mode:  # type: ignore[operator]
                obs["cube_pose"] = self.cube.pose.raw_pose
                obs["cube_to_tcp_pos"] = self.cube.pose.p - panda.tcp_pose.p
                obs["cube_to_goal_pos"] = self.goal_site.pose.p - self.cube.pose.p
            # Synthesised FT: always injected so the OM channel-filter
            # wrapper has something to keep/zero per condition. The
            # AS conditions' state_dict obs_mode ignores it downstream.
            obs["force_torque"] = self._compute_synthesised_force_torque()
            return obs

        def _compute_synthesised_force_torque(self) -> NDArray[np.float32]:
            """Synthesise the 6-D fingertip-contact FT vector (ADR-007 §Stage 1b)."""
            panda = self._panda_agent
            fingers = ["panda_leftfinger", "panda_rightfinger"]
            try:
                # ``get_net_contact_forces`` returns shape ``(N_envs, n_links, 3)``
                # (Cartesian force per link) at SAPIEN's contact-manager
                # boundary (mani_skill/utils/structs/articulation.py:490).
                forces = panda.robot.get_net_contact_forces(fingers)
            except (RuntimeError, KeyError):
                # Finger links not yet registered (pre-reset state) —
                # return zeros so the wrapper-chain shape stays stable.
                return np.zeros((1, 6), dtype=np.float32)
            f_np = np.asarray(forces.detach().cpu(), dtype=np.float32)  # type: ignore[union-attr]
            # Flatten the per-link Cartesian force into a 6-D vector;
            # add zero-mean Gaussian noise from the P6-harnessed RNG so
            # the OM-hetero gate's signal-to-noise lands at the
            # documented Franka-spec lower bound.
            flat = f_np.reshape(f_np.shape[0], -1)  # (N_envs, 6)
            if self._force_torque_noise_sigma > 0.0:
                noise = self._rng.normal(
                    loc=0.0,
                    scale=self._force_torque_noise_sigma,
                    size=flat.shape,
                ).astype(np.float32)
                flat = flat + noise
            return flat

        def evaluate(self) -> dict[str, Any]:
            """ADR-007 §Stage 1b: multi-agent success predicate (panda-routed)."""
            panda = self._panda_agent
            is_obj_placed = (
                torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1)
                <= self._goal_thresh
            )
            is_grasped = panda.is_grasping(self.cube)
            is_robot_static = panda.is_static(0.2)
            return {
                "success": is_obj_placed & is_robot_static,
                "is_obj_placed": is_obj_placed,
                "is_robot_static": is_robot_static,
                "is_grasped": is_grasped,
            }

        def compute_normalized_dense_reward(
            self,
            obs: Any,  # noqa: ANN401 - ManiSkill obs is an unconstrained dict-of-tensors
            action: Any,  # noqa: ANN401 - ManiSkill action is an unconstrained dict-of-tensors
            info: dict[str, Any],
        ) -> Any:  # noqa: ANN401 - upstream returns torch tensor, no project type
            """ADR-007 §Stage 1b: multi-agent normalised dense reward (panda-routed).

            Matches the upstream ``PickCubeEnv.compute_dense_reward`` shape
            (reaching + grasp + place * grasp + static * placed, overwritten
            with 5 on success) divided by 5, so the result lies in [0, 1].
            The reward signal is panda-centric (the panda
            does the picking); the fetch / partner is graded implicitly
            through whether it gets out of the way / cooperates on
            placement — explicit fetch-side reward shaping is a P1.04
            follow-up.
            """
            del obs, action
            panda = self._panda_agent
            tcp_to_obj_dist = torch.linalg.norm(self.cube.pose.p - panda.tcp_pose.p, axis=1)
            reaching_reward = 1 - torch.tanh(5 * tcp_to_obj_dist)
            reward = reaching_reward
            is_grasped = info["is_grasped"]
            reward = reward + is_grasped
            obj_to_goal_dist = torch.linalg.norm(self.goal_site.pose.p - self.cube.pose.p, axis=1)
            place_reward = 1 - torch.tanh(5 * obj_to_goal_dist)
            reward = reward + place_reward * is_grasped
            qvel = panda.robot.get_qvel()
            qvel = qvel[..., :-2]  # drop the two gripper-finger DOF
            static_reward = 1 - torch.tanh(5 * torch.linalg.norm(qvel, axis=1))
            reward = reward + static_reward * info["is_obj_placed"]
            reward[info["success"]] = 5
            return reward / 5

        # ----- Per-step qpos refresh (Jacobian closure feed) -----

        def _refresh_latest_qpos(self) -> None:
            """Snapshot every panda uid's arm qpos into :attr:`_latest_qpos`.

            P1.04.5; ADR-007 §Stage 1b. Called from both
            :meth:`_before_simulation_step` (each physics step) and
            :meth:`_initialize_episode` (post-reset primer). The latter
            primes the cache between ``env.reset()`` and the first
            ``env.step()`` so the safety filter — which the training
            loop runs *before* the first step — can read a valid qpos
            without raising the "qpos not yet cached" guard.
            """
            for uid in self.robot_uids:  # type: ignore[union-attr]
                if uid in ("panda_wristcam", "panda_partner"):
                    agent = self.agent.agents_dict[uid]  # type: ignore[attr-defined]
                    full_qpos = agent.robot.get_qpos()
                    # Slice off the 2 gripper-finger joints; the
                    # Jacobian chain to panda_hand_tcp covers the 7 arm
                    # joints only.
                    qpos_arm = np.asarray(
                        full_qpos.detach().cpu(),
                        dtype=np.float64,  # type: ignore[union-attr]
                    ).reshape(-1)[: self._panda_arm_dof]
                    self._latest_qpos[uid] = qpos_arm

        def _before_simulation_step(self) -> None:
            """Cache panda arm qpos before each physics step (Jacobian closure feed).

            The :class:`JacobianControlModel` callable wired by
            :meth:`build_control_models` reads
            ``self._latest_qpos["panda_wristcam"]`` to compute the
            Jacobian at the current configuration. ManiSkill's
            :class:`BaseEnv` calls this hook once per physics step
            before SAPIEN advances; updating the cache here gives the
            closure access to the most recent qpos at the next safety-
            filter invocation. See module docstring on the closure-
            via-env-reference workaround (P1.05.5 contingent
            follow-up).
            """
            super()._before_simulation_step()
            self._refresh_latest_qpos()

        # ----- Public API for the safety-stack consumer -----

        @property
        def condition_id(self) -> str:
            """ADR-007 §Stage 1b: the :attr:`ConditionConfig.condition_id` this env was built for."""  # noqa: E501
            return self._condition_config.condition_id

        @property
        def condition_config(self) -> ConditionConfig:
            """The resolved :class:`ConditionConfig` for read-only consumers (ADR-007 §Stage 1b)."""
            return self._condition_config

        @property
        def ego_uid(self) -> str:
            """ADR-007 §Stage 1b: the ego uid for the active condition.

            First element of :attr:`ConditionConfig.agent_uids` by the
            Phase-0 ``_CONDITION_UIDS`` convention. Consumed by the
            Phase-1 :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            (P1.04) so the factory can read the ego uid from the env
            rather than from a side-table — single source of truth at
            the env layer.
            """
            return self._condition_config.agent_uids[0]

        @property
        def partner_uid(self) -> str:
            """ADR-007 §Stage 1b: the partner uid for the active condition.

            Second element of :attr:`ConditionConfig.agent_uids` (the
            ego is the first; the partner is the second by the Phase-0
            ``_CONDITION_UIDS`` convention from
            :mod:`chamber.benchmarks.stage1_as` /
            :mod:`chamber.benchmarks.stage1_om`). Consumed by the
            Phase-1 :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            (P1.04) to route the per-condition partner-stack dispatch
            (``panda_partner`` for AS-homo; ``fetch`` for AS-hetero /
            OM-*). Single source of truth at the env layer — future
            Stage-2/3 envs (P1.06+) follow the same property contract
            without needing a factory-side dispatch table.
            """
            return self._condition_config.agent_uids[1]

        def build_control_models(self) -> Mapping[str, AgentControlModel]:
            """Return per-uid :class:`AgentControlModel` map (ADR-004 §Decision).

            Plumbed in by
            :func:`chamber.benchmarks.training_runner.build_control_models`
            via the ``getattr(env, "build_control_models", None)``
            delegation route (P1.03 / Task B).

            The Jacobian callable is the closure-via-env-reference
            workaround documented in the module docstring: it captures
            ``self`` and reads ``self._latest_qpos["panda_wristcam"]``
            at call time. The closure stays valid across the 20
            evaluation episodes within each ``(seed, condition)`` cell
            because
            :func:`chamber.benchmarks.stage1_as._run_axis_with_factories`
            builds the env once per cell and reuses it
            (``stage1_as.py:253-275``).

            For the AS-homo condition both panda uids share the same
            Jacobian provider (their kinematic chains are identical;
            see :class:`chamber.agents.panda_jacobian.PandaJacobianProvider`'s
            choice of URDF). The closure resolves to the correct uid's
            latest qpos at call time.
            """
            # Source the panda action dim from the action space (same
            # path the fetch dim is sourced from below). Falls back to
            # 8 — the ``pd_joint_delta_pos`` default (7 arm + 1 mimic-
            # gripper). The closure captures this value once at
            # build-time, which is correct: action-space shape is fixed
            # over an env's lifetime.
            panda_action_dim = 8
            action_space = self.action_space
            if isinstance(action_space, gym.spaces.Dict):
                sub_panda = action_space.spaces.get("panda_wristcam")
                if isinstance(sub_panda, gym.spaces.Box) and sub_panda.shape is not None:
                    panda_action_dim = int(sub_panda.shape[0])

            def _panda_jacobian(snap: AgentSnapshot) -> NDArray[np.float64]:
                # Pull the live qpos from the env's cache; the closure
                # captures self by reference so the latest pre-step
                # value is always used.
                qpos = self._latest_qpos.get("panda_wristcam")
                if qpos is None:
                    msg = (
                        "Stage1PickPlaceEnv: panda_wristcam qpos not yet cached. "
                        "The JacobianControlModel closure was invoked before "
                        "the first env.step() — reset the env first."
                    )
                    raise RuntimeError(msg)
                # PandaJacobianProvider returns the linear (3, 7) Jacobian
                # for the 7-joint chain to ``panda_hand_tcp``. The
                # ``pd_joint_delta_pos`` control mode's panda action is
                # 8-D (7 arm deltas + 1 mimic-gripper delta — see the
                # ``panda_action_dim=8`` default in
                # :func:`build_control_models_for_condition`). Pad with
                # a zero column so the JacobianControlModel matmul
                # ``jac @ action`` composes: the mimic-gripper joint is
                # a sibling of ``panda_hand``, not in the TCP kinematic
                # chain, so its contribution to TCP Cartesian position
                # is zero by construction.
                jac_3x7 = self._jacobian_provider(snap, qpos)
                jac_padded = np.zeros((CARTESIAN_POSITION_DIM, panda_action_dim), dtype=np.float64)
                jac_padded[:, : self._panda_arm_dof] = jac_3x7
                return jac_padded

            fetch_uid_dim: int | None = None
            partner_uid = self._condition_config.agent_uids[1]
            if partner_uid == "fetch" and isinstance(action_space, gym.spaces.Dict):
                sub = action_space.spaces.get(partner_uid)
                if isinstance(sub, gym.spaces.Box) and sub.shape is not None:
                    fetch_uid_dim = int(sub.shape[0])
            return build_control_models_for_condition(
                self._condition_config.condition_id,
                jacobian_fn=_panda_jacobian,
                fetch_action_dim=fetch_uid_dim,
                panda_action_dim=panda_action_dim,
            )

        # ----- P1.04.5: safety-stack snapshot + bounds wiring -----

        def build_agent_snapshots(self) -> dict[str, AgentSnapshot]:
            """Build per-uid Cartesian :class:`AgentSnapshot` map (P1.04.5; ADR-007 §Stage 1b).

            Reads each agent's Cartesian position (TCP pose for the
            pandas; base-link pose for the fetch) and computes velocity
            via numerical-difference against the cached
            :attr:`_prev_positions` from the previous step. First-step
            velocity defaults to zero on every fresh episode (the
            agents start at rest by construction; the cache is cleared
            in :meth:`_initialize_episode`).

            Used by the P1.04.5 safety-stack integration: the
            :func:`concerto.training.ego_aht.train` loop calls this
            once per step (when the safety filter is wired) to build
            the snapshot map the CBF-QP outer filter consumes via the
            ``partner_predicted_states`` kwarg + the conformal update's
            ``snaps_now`` / ``snaps_prev`` pair.

            Returns:
                ``{uid: AgentSnapshot(position, velocity, radius)}`` for
                every uid in :attr:`ConditionConfig.agent_uids`. Position
                is the 3-D Cartesian centre (panda TCP for the pandas;
                fetch base origin for the fetch); velocity is the
                numerical-difference against the previous call (zero on
                first call after :meth:`_initialize_episode`); radius
                is from :data:`PANDA_SAFETY_RADIUS_M` /
                :data:`FETCH_SAFETY_RADIUS_M` (validated at env
                construction by :func:`_validate_safety_radii`).
            """
            snapshots: dict[str, AgentSnapshot] = {}
            for uid in self._condition_config.agent_uids:
                agent = self.agent.agents_dict[uid]  # type: ignore[attr-defined]
                if uid in ("panda_wristcam", "panda_partner"):
                    # Panda: TCP pose. Squeeze the (num_envs=1, 3)
                    # batch dim to a flat 3-vector for the AgentSnapshot.
                    pos_raw = agent.tcp_pose.p
                else:
                    # Fetch: base-link pose (the robot's root pose).
                    pos_raw = agent.robot.get_pose().p
                position = np.asarray(
                    pos_raw.detach().cpu(),
                    dtype=np.float64,  # type: ignore[union-attr]
                ).reshape(-1)
                # ManiSkill emits (num_envs, 3) for each agent pose; the
                # reshape(-1) above flattens (1, 3) -> (3,) for num_envs=1,
                # but multi-env runs would emit (num_envs, 3), which flattens
                # to (3*num_envs,). The Stage-1b cell construction pins
                # num_envs=1 (factory contract); the slice below is a
                # defensive guard against future multi-env builds and keeps
                # only the first env's position.
                _cartesian_dim = 3  # ADR-004 §Decision: AgentSnapshot is 3-D Cartesian.
                if position.shape[0] > _cartesian_dim:
                    position = position[:_cartesian_dim]
                prev = self._prev_positions.get(uid)
                if prev is None:
                    velocity = np.zeros(3, dtype=np.float64)
                else:
                    # Numerical-difference velocity. The dt is the
                    # control_timestep; for the AgentSnapshot's contract
                    # this is the per-step Cartesian velocity. The
                    # safety filter's conformal update consumes this
                    # value alongside dt to compute the predicted
                    # pose; the consistency is the caller's
                    # responsibility (see
                    # chamber.benchmarks.training_runner.build_safety_dt).
                    velocity = (position - prev) / float(self.control_timestep)
                self._prev_positions[uid] = position
                snapshots[uid] = AgentSnapshot(
                    position=position,
                    velocity=velocity,
                    radius=self._safety_radii[uid],
                )
            return snapshots

        def build_bounds(self) -> Bounds:
            """Build the per-task :class:`Bounds` envelope (P1.04.5; ADR-006 §Decision).

            Sourced from the module-level constants pinned in P1.03
            (cartesian_accel_capacity) plus sensible Phase-1 defaults
            for the remaining fields. Consumed by the safety filter +
            the audit-gate predicate A's RHS (``cartesian_accel_capacity``).
            """
            return Bounds(
                action_linf_component=0.1,
                cartesian_accel_capacity=PANDA_CARTESIAN_ACCEL_CAPACITY_MS2,
                action_rate=10.0,
                comm_latency_ms=0.0,
                force_limit=50.0,
            )

    inner = Stage1PickPlaceEnv(
        condition_id=condition_id,
        episode_length=episode_length,
        root_seed=root_seed,
        num_envs=num_envs,
        render_mode=render_mode,
        render_backend=render_backend,
        force_torque_noise_sigma=force_torque_noise_sigma,
    )
    # AS conditions: ManiSkill v3 obs_mode="state_dict" emits per-agent
    # Dict(qpos, qvel) but EgoPPOTrainer.from_config reads
    # obs["agent"][ego_uid]["state"]. The AS synthesizer adds that key;
    # pass-through for OM conditions. The OM channel filter then masks
    # per-condition for OM-vision-only; pass-through for AS and OM-hetero.
    # Wired in fixed order so OM callers of the factory don't have to
    # wrap manually (issue #165 §Proposed scope 4).
    from chamber.envs.stage1_obs_filter import (
        Stage1ASStateSynthesizer,
        Stage1OMChannelFilter,
    )

    return Stage1OMChannelFilter(Stage1ASStateSynthesizer(inner))
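The zero-column padding in `_panda_jacobian` is easy to check in isolation. The sketch below is illustrative only: it uses plain Python lists in place of NumPy, an arbitrary 3×7 matrix in place of a real `PandaJacobianProvider` result, and two hypothetical helpers (`pad_jacobian`, `matvec`) that are not part of the package. It checks the claim in the closure's comment: once the mimic-gripper column is zero, the predicted TCP displacement `jac @ action` for the 8-D `pd_joint_delta_pos` action does not depend on the gripper delta.

```python
def pad_jacobian(jac, action_dim):
    """Right-pad each row of a (3, n_arm) Jacobian with zeros up to action_dim columns."""
    n_arm = len(jac[0])
    if action_dim < n_arm:
        raise ValueError("action_dim must be >= the number of arm joints")
    return [list(row) + [0.0] * (action_dim - n_arm) for row in jac]


def matvec(mat, vec):
    """Plain-Python matrix-vector product (stands in for ``jac @ action``)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


# Illustrative 3x7 linear Jacobian (arbitrary values, not a real Panda configuration).
jac_3x7 = [[0.1 * (r + 1) * (c + 1) for c in range(7)] for r in range(3)]
jac_3x8 = pad_jacobian(jac_3x7, action_dim=8)

arm_delta = [0.01, -0.02, 0.0, 0.03, 0.0, -0.01, 0.02]
for gripper_delta in (0.0, 0.5, -1.0):
    action = arm_delta + [gripper_delta]
    # The padded gripper column is zero, so the TCP prediction ignores the gripper delta.
    assert matvec(jac_3x8, action) == matvec(jac_3x7, arm_delta)
```

One reading of the design choice: padding the matrix, rather than slicing the action, keeps the `JacobianControlModel` input uniform (always the full action vector) and encodes the kinematic fact that the gripper does not move the TCP in the matrix itself.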

resolve_condition

resolve_condition(condition_id: str) -> ConditionConfig

Resolve a pre-registered condition_id to its config (ADR-007 §Stage 1b).

Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the four-condition coverage and the ValueError raised on bogus IDs.

Parameters:

  • condition_id (str) –

    One of the four verbatim Stage-1 pre-registration strings (see spikes/preregistration/{AS,OM}.yaml).

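Returns:

  • ConditionConfig –

    Resolved :class:ConditionConfig carrying the agent_uids tuple, env-level obs_mode superset, and the boolean flags downstream wrappers / scripts read.
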
Raises:

  • ValueError

    If condition_id is not one of the four Stage-1 pre-registration strings. The message names the four valid options and cites ADR-007 §Discipline (the project rule governing edits to prereg YAMLs).

Source code in src/chamber/envs/stage1_pickplace.py
def resolve_condition(condition_id: str) -> ConditionConfig:
    """Resolve a pre-registered ``condition_id`` to its config (ADR-007 §Stage 1b).

    Pure Tier-1 function — no SAPIEN dependency. Tier-1 tests pin the
    four-condition coverage and the ``ValueError`` raised on bogus IDs.

    Args:
        condition_id: One of the four verbatim Stage-1 pre-registration
            strings (see ``spikes/preregistration/{AS,OM}.yaml``).

    Returns:
        Resolved :class:`ConditionConfig` carrying the
        ``agent_uids`` tuple, env-level ``obs_mode`` superset, and the
        boolean flags downstream wrappers / scripts read.

    Raises:
        ValueError: If ``condition_id`` is not one of the four
            Stage-1 pre-registration strings. The message names the
            four valid options and cites ADR-007 §Discipline (the
            project rule governing edits to prereg YAMLs).
    """
    try:
        return _CONDITION_TABLE[condition_id]
    except KeyError as exc:
        msg = (
            f"Stage1PickPlaceEnv: condition_id {condition_id!r} is not one of the "
            f"four Stage-1 pre-registration strings {sorted(_CONDITION_TABLE)!r}. "
            "If the prereg YAML has been edited, re-issue with a new git_tag "
            "per ADR-007 §Discipline; do not rename condition_ids in code."
        )
        raise ValueError(msg) from exc
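The lookup in `resolve_condition` is a dictionary access that converts `KeyError` into a `ValueError` with `raise ... from exc`, so the failed key lookup survives as the new exception's `__cause__`. A minimal sketch of the same pattern follows. `ConditionConfig` here is a hypothetical three-field stand-in for the real class, and the table holds only the two OM condition strings that appear on this page (the AS strings are not shown here, so they are omitted). The `agent_uids` tuples follow the ego-first convention with `fetch` as the OM partner, as described for `partner_uid`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionConfig:
    """Hypothetical stand-in; the real class carries more fields."""

    condition_id: str
    agent_uids: tuple[str, ...]
    is_om_condition: bool


# Hypothetical partial table: only the two OM condition strings named on this page.
_TABLE = {
    cid: ConditionConfig(cid, ("panda_wristcam", "fetch"), True)
    for cid in (
        "stage1_pickplace_vision_only",
        "stage1_pickplace_vision_plus_force_torque_plus_proprio",
    )
}


def resolve(condition_id: str) -> ConditionConfig:
    try:
        return _TABLE[condition_id]
    except KeyError as exc:
        # Chain the KeyError so the traceback still shows the failed lookup.
        raise ValueError(
            f"{condition_id!r} is not one of {sorted(_TABLE)!r}"
        ) from exc


print(resolve("stage1_pickplace_vision_only").agent_uids)
```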

Stage-1 observation wrappers (ADR-007 §Stage 1b).

Two wrappers, applied in fixed order by :func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env:

  1. :class:Stage1ASStateSynthesizer — synthesises obs["agent"][uid]["state"] as concat(qpos, qvel) for AS conditions, bridging ManiSkill v3 obs_mode="state_dict" (which emits per-agent Dict(qpos, qvel, ...)) to :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer's obs["agent"][ego_uid]["state"] contract. Pass-through for OM conditions and for envs without a recognised condition_id.
  2. :class:Stage1OMChannelFilter — per-condition channel keep-set for the OM axis. The Stage-1 AS conditions don't filter at all (they use obs_mode="state_dict" which already excludes the camera data the OM axis differentiates on); the wrapper passes those conditions through untouched.

For the two OM conditions:

  • stage1_pickplace_vision_only — keep obs["sensor_data"] (the RGB-D camera channels) plus the task-extra fields the policy needs to be functional at the Stage-1b episode budget (tcp_pose, goal_pos). Zero-mask the agent-proprio sub-dicts under obs["agent"][uid] and the synthesised obs["extra"]["force_torque"] channel.
  • stage1_pickplace_vision_plus_force_torque_plus_proprio — pass-through (no filtering); the policy sees everything.

ManiSkill v3 obs layout reminder (mani_skill/envs/sapien_env.py:546-630): obs["agent"][uid] carries proprio (state, joint_pos, joint_vel); obs["extra"] carries task-level fields from :meth:_get_obs_extra; obs["sensor_data"] carries camera channels when the env's obs_mode is image-bearing.

Shape preservation: zero-masked channels become zero arrays of the same shape and dtype (mirrors :class:chamber.envs.texture_filter.TextureFilterObsWrapper's pattern). This keeps downstream policy networks oblivious to which modalities a given uid actually receives at runtime — only the values change across conditions, not the shapes.
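The shape-preservation rule can be illustrated with a small sketch of the vision-only keep-set. It is an illustration under stated assumptions, not the wrapper's code: it uses NumPy arrays where ManiSkill emits torch tensors, the helper names (`zeros_like_tree`, `mask_vision_only`) and the camera name `hand_camera` are hypothetical, and the array shapes are arbitrary. Agent proprio and `force_torque` become zeros of the same shape and dtype, while `sensor_data`, `tcp_pose`, and `goal_pos` pass through unchanged.

```python
import numpy as np


def zeros_like_tree(node):
    """Replace every array in a (possibly nested) dict with zeros of the same shape and dtype."""
    if isinstance(node, dict):
        return {key: zeros_like_tree(value) for key, value in node.items()}
    return np.zeros_like(node)


def mask_vision_only(obs: dict) -> dict:
    """Hypothetical vision-only keep-set: zero agent proprio and force_torque, keep the rest."""
    out = dict(obs)  # shallow copy; kept channels are shared with the input
    out["agent"] = zeros_like_tree(obs["agent"])
    extra = dict(obs["extra"])  # copy so the caller's "extra" dict is not mutated
    extra["force_torque"] = np.zeros_like(extra["force_torque"])
    out["extra"] = extra
    return out


obs = {
    "agent": {
        "panda_wristcam": {
            "qpos": np.ones((1, 9), dtype=np.float32),
            "qvel": np.ones((1, 9), dtype=np.float32),
        }
    },
    "extra": {
        "tcp_pose": np.ones((1, 7), dtype=np.float32),
        "goal_pos": np.ones((1, 3), dtype=np.float32),
        "force_torque": np.ones((1, 6), dtype=np.float32),
    },
    "sensor_data": {"hand_camera": {"rgb": np.ones((1, 4, 4, 3), dtype=np.uint8)}},
}
masked = mask_vision_only(obs)
print(masked["extra"]["force_torque"].shape, masked["extra"]["force_torque"].any())
```

Because only values change, a policy network built against the unmasked observation space can consume the masked observation without any shape changes.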

References:

  • ADR-007 §Stage 1b (OM axis spec); spikes/preregistration/OM.yaml (the condition_id strings this wrapper resolves).
  • :class:chamber.envs.texture_filter.TextureFilterObsWrapper (the parallel pattern for the Stage-0 smoke env's per-uid keep-set).
  • :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv (the inner env that supplies the synthesised FT channel and the condition_id attribute).

Stage1ASStateSynthesizer

Bases: ObservationWrapper

Synthesise obs["agent"][uid]["state"] for AS conditions (ADR-007 §Stage 1b).

ManiSkill v3.0.1 obs_mode="state_dict" (used by the Stage-1 AS conditions) emits per-agent Dict(qpos: Box, qvel: Box, ...) with no top-level "state" key. But :meth:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config reads env.observation_space["agent"][ego_uid]["state"] as a 1-D Box and derives obs_dim = ego_state_space.shape[0]. ManiSkill v3 obs_mode="state" flattens concat(qpos, qvel) into a single "state" key with that shape; this wrapper replicates that concat in obs-space so callers can keep obs_mode="state_dict" for the synthesised-channel injection (force_torque, tcp_pose, ...) the OM-hetero condition needs while still satisfying the trainer's 1-D state contract.

Pass-through for OM conditions (is_om_condition=True — those go through :class:Stage1OMChannelFilter instead) and for envs whose condition_id is missing or unrecognised (Tier-1 safety: keeps the wrapper a no-op for envs that don't satisfy the Stage-1b condition_id contract).

Parameters:

  • env (Env[Any, Any]) –

    Inner env exposing a condition_id attribute.

Raises:

  • TypeError

    If env.observation_space is not a :class:gym.spaces.Dict.
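The `state` synthesis can be sketched with NumPy. The helper `synthesise_state` below is hypothetical: it stands in for the wrapper's `observation` method together with the private `_flat_state_concat` helper, whose body is not shown on this page. Flattening with `np.ravel` before concatenating is an assumption, chosen because it matches the `np.prod(shape)` sizing that `_build_observation_space` uses for the injected Box. Sub-dicts missing `qpos` or `qvel` pass through, and the input dicts are copied rather than mutated. Array sizes are arbitrary.

```python
import numpy as np


def synthesise_state(agent_obs: dict) -> dict:
    """Add ``state = concat(ravel(qpos), ravel(qvel))`` to each agent sub-dict that has both."""
    out = {}
    for uid, sub in agent_obs.items():
        if isinstance(sub, dict) and "qpos" in sub and "qvel" in sub:
            augmented = dict(sub)  # copy so the inner env's dict is not mutated
            augmented["state"] = np.concatenate(
                [np.ravel(sub["qpos"]), np.ravel(sub["qvel"])]
            ).astype(np.float32)
            out[uid] = augmented
        else:
            out[uid] = sub  # pass-through, as in the wrapper
    return out


agent = {
    "panda_wristcam": {"qpos": np.zeros(9), "qvel": np.ones(9)},
    "fetch": {"qpos": np.zeros(15)},  # illustrative: no qvel, so left untouched
}
result = synthesise_state(agent)
print(result["panda_wristcam"]["state"].shape)
```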

Source code in src/chamber/envs/stage1_obs_filter.py
class Stage1ASStateSynthesizer(gym.ObservationWrapper):  # type: ignore[type-arg]
    """Synthesise ``obs["agent"][uid]["state"]`` for AS conditions (ADR-007 §Stage 1b).

    ManiSkill v3.0.1 ``obs_mode="state_dict"`` (used by the Stage-1 AS
    conditions) emits per-agent ``Dict(qpos: Box, qvel: Box, ...)`` with
    no top-level ``"state"`` key. But
    :meth:`chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config`
    reads ``env.observation_space["agent"][ego_uid]["state"]`` as a 1-D
    Box and derives ``obs_dim = ego_state_space.shape[0]``. ManiSkill v3
    ``obs_mode="state"`` flattens ``concat(qpos, qvel)`` into a single
    ``"state"`` key with that shape; this wrapper replicates that concat
    in obs-space so callers can keep ``obs_mode="state_dict"`` for the
    synthesised-channel injection (``force_torque``, ``tcp_pose``, ...)
    the OM-hetero condition needs while still satisfying the trainer's
    1-D ``state`` contract.

    Pass-through for OM conditions (``is_om_condition=True`` — those go
    through :class:`Stage1OMChannelFilter` instead) and for envs whose
    ``condition_id`` is missing or unrecognised (Tier-1 safety: keeps
    the wrapper a no-op for envs that don't satisfy the Stage-1b
    ``condition_id`` contract).

    Args:
        env: Inner env exposing a ``condition_id`` attribute.

    Raises:
        TypeError: If ``env.observation_space`` is not a
            :class:`gym.spaces.Dict`.
    """

    def __init__(self, env: gym.Env[Any, Any]) -> None:
        """Detect AS-condition envs at construction (ADR-007 §Stage 1b)."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            msg = (
                "Stage1ASStateSynthesizer requires a gym.spaces.Dict observation "
                f"space; got {type(env.observation_space).__name__}."
            )
            raise TypeError(msg)
        condition_id = getattr(env, "condition_id", None)
        if condition_id is None and hasattr(env, "get_wrapper_attr"):
            try:
                condition_id = env.get_wrapper_attr("condition_id")
            except AttributeError:
                condition_id = None
        # Local import to avoid the chamber.envs.stage1_pickplace ↔
        # chamber.envs.stage1_obs_filter circular import that arises now
        # that make_stage1_pickplace_env applies this wrapper.
        from chamber.envs.stage1_pickplace import resolve_condition

        config = None
        if isinstance(condition_id, str):
            try:
                config = resolve_condition(condition_id)
            except ValueError:
                config = None
        self._active: bool = config is not None and not config.is_om_condition
        if self._active:
            self.observation_space = self._build_observation_space(env.observation_space)

    @staticmethod
    def _build_observation_space(inner: gym.spaces.Dict) -> gym.spaces.Dict:
        """Inject a 1-D ``state`` Box per agent under ``obs["agent"][uid]``.

        The injected Box's shape is ``(qpos_dim + qvel_dim,)`` —
        flattened to 1-D so :attr:`gym.spaces.Box.shape[0]` matches the
        ``obs_dim`` HARL's MLPBase reads at construction.
        """
        agent = inner.spaces.get("agent")
        if not isinstance(agent, gym.spaces.Dict):
            return inner
        new_agent_spaces: dict[str, gym.spaces.Space[Any]] = {}
        for uid, sub in agent.spaces.items():
            if isinstance(sub, gym.spaces.Dict):
                qpos = sub.spaces.get("qpos")
                qvel = sub.spaces.get("qvel")
                if isinstance(qpos, gym.spaces.Box) and isinstance(qvel, gym.spaces.Box):
                    state_dim = int(np.prod(qpos.shape)) + int(np.prod(qvel.shape))
                    state_box = gym.spaces.Box(
                        low=-np.inf,
                        high=np.inf,
                        shape=(state_dim,),
                        dtype=np.float32,
                    )
                    augmented = dict(sub.spaces)
                    augmented["state"] = state_box
                    new_agent_spaces[uid] = gym.spaces.Dict(augmented)
                    continue
            new_agent_spaces[uid] = sub
        new_spaces = dict(inner.spaces)
        new_spaces["agent"] = gym.spaces.Dict(new_agent_spaces)
        return gym.spaces.Dict(new_spaces)

    def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
        """Inject synthesised ``state`` per agent (ADR-007 §Stage 1b)."""
        if not self._active:
            return observation
        agent = observation.get("agent")
        if not isinstance(agent, dict):
            return observation
        new_agent: dict[str, Any] = {}
        for uid, sub in agent.items():
            if isinstance(sub, dict) and "qpos" in sub and "qvel" in sub:
                augmented = dict(sub)
                augmented["state"] = _flat_state_concat(sub["qpos"], sub["qvel"])
                new_agent[uid] = augmented
            else:
                new_agent[uid] = sub
        out = dict(observation)
        out["agent"] = new_agent
        return out

    def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors gym.Wrapper's inherited __getattr__ signature
        """Forward attribute access to the inner env (Tier-2 caller contract).

        Gymnasium 1.3 removed :class:`gym.Wrapper`'s implicit
        ``__getattr__``; existing P1.03 callers of
        :func:`make_stage1_pickplace_env` read SAPIEN-env attributes
        directly (``env.agent``, ``env._jacobian_provider``,
        ``env.condition_config``...). Forwarding here keeps that
        contract without forcing every caller to switch to
        :meth:`gym.Wrapper.get_wrapper_attr`.

        ``__getattr__`` only fires when normal attribute lookup misses;
        ``self.env`` is set by :meth:`gym.Wrapper.__init__` before any
        downstream access, so the delegation is safe. The ``env``-name
        guard is a defensive backstop against recursive lookup on the
        env attr itself (only reachable in pathological half-init
        paths).
        """
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

__getattr__

__getattr__(name: str) -> Any

Forward attribute access to the inner env (Tier-2 caller contract).

Gymnasium 1.3 removed :class:gym.Wrapper's implicit __getattr__; existing P1.03 callers of :func:make_stage1_pickplace_env read SAPIEN-env attributes directly (env.agent, env._jacobian_provider, env.condition_config...). Forwarding here keeps that contract without forcing every caller to switch to :meth:gym.Wrapper.get_wrapper_attr.

__getattr__ only fires when normal attribute lookup misses; self.env is set by :meth:gym.Wrapper.__init__ before any downstream access, so the delegation is safe. The env-name guard is a defensive backstop against recursive lookup on the env attr itself (only reachable in pathological half-init paths).
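The `env`-name guard matters because `__getattr__` is consulted for `self.env` too. Below is a minimal, stdlib-only sketch of the same forwarding rule. `InnerEnv` and `ForwardingWrapper` are hypothetical stand-ins, not CHAMBER classes. It shows normal forwarding, and shows that a half-initialised instance fails with `AttributeError` rather than recursing.

```python
from typing import Any


class InnerEnv:
    """Stand-in for a SAPIEN env exposing plain attributes (hypothetical)."""

    def __init__(self) -> None:
        self.agent = "panda_wristcam"
        self.condition_config = {"id": "demo"}


class ForwardingWrapper:
    """Minimal wrapper using the same forwarding rule (hypothetical name)."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Python calls this only after normal attribute lookup has failed.
        if name == "env":
            # ``env`` itself is missing (half-initialised instance): raise
            # AttributeError instead of recursing via self.env -> __getattr__.
            raise AttributeError(name)
        return getattr(self.env, name)


wrapped = ForwardingWrapper(InnerEnv())
print(wrapped.agent)  # forwarded to InnerEnv

# Bypass __init__ to simulate a half-initialised wrapper.
half_init = object.__new__(ForwardingWrapper)
try:
    _ = half_init.agent
    half_init_error = None
except AttributeError as exc:
    half_init_error = str(exc)
print(half_init_error)
```

Without the guard, the half-initialised lookup would call `__getattr__("env")` again and again until Python raised `RecursionError`.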


__init__

__init__(env: Env[Any, Any]) -> None

Detect AS-condition envs at construction (ADR-007 §Stage 1b).

Source code in src/chamber/envs/stage1_obs_filter.py
def __init__(self, env: gym.Env[Any, Any]) -> None:
    """Detect AS-condition envs at construction (ADR-007 §Stage 1b)."""
    super().__init__(env)
    if not isinstance(env.observation_space, gym.spaces.Dict):
        msg = (
            "Stage1ASStateSynthesizer requires a gym.spaces.Dict observation "
            f"space; got {type(env.observation_space).__name__}."
        )
        raise TypeError(msg)
    condition_id = getattr(env, "condition_id", None)
    if condition_id is None and hasattr(env, "get_wrapper_attr"):
        try:
            condition_id = env.get_wrapper_attr("condition_id")
        except AttributeError:
            condition_id = None
    # Local import to avoid the chamber.envs.stage1_pickplace ↔
    # chamber.envs.stage1_obs_filter circular import that arises now
    # that make_stage1_pickplace_env applies this wrapper.
    from chamber.envs.stage1_pickplace import resolve_condition

    config = None
    if isinstance(condition_id, str):
        try:
            config = resolve_condition(condition_id)
        except ValueError:
            config = None
    self._active: bool = config is not None and not config.is_om_condition
    if self._active:
        self.observation_space = self._build_observation_space(env.observation_space)
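`__init__` looks up the condition in two steps: first a direct attribute, then Gymnasium's `get_wrapper_attr`. It activates the synthesizer only when the condition resolves and is not an OM condition. The stdlib-only sketch below reproduces that decision. The condition strings, `ConditionConfig`, and the registry behind `resolve_condition` are hypothetical stand-ins; the real ones live in `chamber.envs.stage1_pickplace` and are not shown on this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConditionConfig:
    """Hypothetical stand-in for what resolve_condition returns."""

    is_om_condition: bool


# Hypothetical condition strings; the real registry is not shown here.
_REGISTRY = {
    "as_homo": ConditionConfig(is_om_condition=False),
    "om_vision_only": ConditionConfig(is_om_condition=True),
}


def resolve_condition(condition_id: str) -> ConditionConfig:
    """Raise ValueError for unknown ids, as the wrapper expects."""
    if condition_id not in _REGISTRY:
        raise ValueError(f"unknown condition {condition_id!r}")
    return _REGISTRY[condition_id]


def lookup_condition_id(env: Any) -> Any:
    """Direct attribute first, then the wrapper-stack lookup, else None."""
    condition_id = getattr(env, "condition_id", None)
    if condition_id is None and hasattr(env, "get_wrapper_attr"):
        try:
            condition_id = env.get_wrapper_attr("condition_id")
        except AttributeError:
            condition_id = None
    return condition_id


def synthesizer_active(env: Any) -> bool:
    """Mirror of the ``_active`` flag: resolvable and not an OM condition."""
    condition_id = lookup_condition_id(env)
    config = None
    if isinstance(condition_id, str):
        try:
            config = resolve_condition(condition_id)
        except ValueError:
            config = None
    return config is not None and not config.is_om_condition


class FakeEnv:
    def __init__(self, **attrs: Any) -> None:
        self.__dict__.update(attrs)


class FakeWrapper:
    """Exposes condition_id only through get_wrapper_attr."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner

    def get_wrapper_attr(self, name: str) -> Any:
        return getattr(self._inner, name)


print(synthesizer_active(FakeEnv(condition_id="as_homo")))  # True
print(synthesizer_active(FakeWrapper(FakeEnv(condition_id="om_vision_only"))))  # False
```

Unknown or missing condition ids deactivate the wrapper rather than raising. That keeps it a pass-through identity on envs it does not recognise.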

observation

observation(observation: dict[str, Any]) -> dict[str, Any]

Inject synthesised state per agent (ADR-007 §Stage 1b).
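The per-agent injection can be sketched with plain lists in place of arrays. The private `_flat_state_concat` helper is not shown on this page, so the stand-in below assumes it flattens `qpos` and `qvel` and concatenates them, `qpos` first. That assumption is consistent with the `state_dim = prod(qpos.shape) + prod(qvel.shape)` sizing in `_build_observation_space`. The 9-entry vectors are illustrative (7 arm joints plus 2 finger joints).

```python
from typing import Any


def flat_state_concat(qpos: list[float], qvel: list[float]) -> list[float]:
    """Hypothetical stand-in for the private _flat_state_concat (not shown).

    Assumed behaviour: flatten qpos and qvel and concatenate, qpos first.
    """
    return [float(x) for x in qpos] + [float(x) for x in qvel]


def inject_state(observation: dict[str, Any]) -> dict[str, Any]:
    agent = observation.get("agent")
    if not isinstance(agent, dict):
        return observation
    new_agent: dict[str, Any] = {}
    for uid, sub in agent.items():
        if isinstance(sub, dict) and "qpos" in sub and "qvel" in sub:
            augmented = dict(sub)  # shallow copy: the caller's dict is untouched
            augmented["state"] = flat_state_concat(sub["qpos"], sub["qvel"])
            new_agent[uid] = augmented
        else:
            new_agent[uid] = sub  # no qpos/qvel pair: passed through unchanged
    out = dict(observation)
    out["agent"] = new_agent
    return out


obs = {
    "agent": {
        "panda_wristcam": {"qpos": [0.1] * 9, "qvel": [0.0] * 9},
        "partner_without_qvel": {"qpos": [0.2] * 9},
    },
    "extra": {"goal_pos": [0.0, 0.0, 0.1]},
}
new_obs = inject_state(obs)
print(len(new_obs["agent"]["panda_wristcam"]["state"]))  # 18
```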


Stage1OMChannelFilter

Bases: ObservationWrapper

OM-axis per-condition channel filter (ADR-007 §Stage 1b).

Reads the inner env's condition_id at construction time and captures the resulting filter policy. The wrapper does not re-read condition_id on every step — Stage-1b envs are built once per (seed, condition) cell (see stage1_as.py:253-275) and the condition is fixed for the env's lifetime.

Parameters:

  • env (Env[Any, Any]) –

    Inner env exposing a condition_id attribute. The :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv class satisfies this; the wrapper falls back to a pass-through identity if condition_id is missing or not one of the two OM strings.

Raises:

  • TypeError

    If the inner env's observation space is not a :class:gym.spaces.Dict (the wrapper relies on dict-shaped obs).

Source code in src/chamber/envs/stage1_obs_filter.py
class Stage1OMChannelFilter(gym.ObservationWrapper):  # type: ignore[type-arg]
    """OM-axis per-condition channel filter (ADR-007 §Stage 1b).

    Reads the inner env's ``condition_id`` at construction time and
    captures the resulting filter policy. The wrapper does not re-read
    ``condition_id`` on every step — Stage-1b envs are built once per
    ``(seed, condition)`` cell (see ``stage1_as.py:253-275``) and the
    condition is fixed for the env's lifetime.

    Args:
        env: Inner env exposing a ``condition_id`` attribute. The
            :class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv`
            class satisfies this; the wrapper falls back to a
            pass-through identity if ``condition_id`` is missing or
            not one of the two OM strings.

    Raises:
        TypeError: If the inner env's observation space is not a
            :class:`gym.spaces.Dict` (the wrapper relies on dict-shaped
            obs).
    """

    def __init__(self, env: gym.Env[Any, Any]) -> None:
        """Store the per-condition filter policy (ADR-007 §Stage 1b)."""
        super().__init__(env)
        if not isinstance(env.observation_space, gym.spaces.Dict):
            msg = (
                "Stage1OMChannelFilter requires a gym.spaces.Dict observation "
                f"space; got {type(env.observation_space).__name__}."
            )
            raise TypeError(msg)
        # Read condition_id once at construction; the env-per-cell
        # cadence makes this safe and saves a hot-path attr lookup.
        condition_id = getattr(env, "condition_id", None)
        if condition_id is None and hasattr(env, "get_wrapper_attr"):
            try:
                condition_id = env.get_wrapper_attr("condition_id")
            except AttributeError:
                condition_id = None
        self._condition_id: str | None = condition_id
        self._filter_active: bool = condition_id in _FILTERED_CONDITIONS

    @property
    def condition_id(self) -> str | None:
        """ADR-007 §Stage 1b OM axis: the condition_id resolved at construction."""
        return self._condition_id

    def observation(self, observation: dict[str, Any]) -> dict[str, Any]:  # type: ignore[override]
        """Apply the per-condition channel mask (ADR-007 §Stage 1b).

        OM-vision-only zero-masks the per-uid agent-proprio sub-dicts
        and the ``obs["extra"]["force_torque"]`` channel; keeps
        ``obs["extra"]["tcp_pose"]`` + ``obs["extra"]["goal_pos"]``
        (the minimal task-relevant fields) plus ``obs["sensor_data"]``
        (RGB-D camera) untouched. Other conditions pass through.
        """
        if not self._filter_active:
            return observation
        out = dict(observation)
        # Zero-mask per-uid agent proprio (the "no proprioception" half
        # of the vision-only condition_id semantics).
        agent = observation.get("agent")
        if isinstance(agent, dict):
            zeroed_agent: dict[str, Any] = {}
            for uid, sub in agent.items():
                if isinstance(sub, dict):
                    zeroed_agent[uid] = {
                        ch: np.zeros_like(val) if hasattr(val, "shape") else val
                        for ch, val in sub.items()
                    }
                else:
                    zeroed_agent[uid] = sub
            out["agent"] = zeroed_agent
        # Zero-mask task extras NOT in the vision-only keep-set (the
        # "no force-torque" half of the vision-only semantics).
        extra = observation.get("extra")
        if isinstance(extra, dict):
            filtered_extra: dict[str, Any] = {}
            for ch, val in extra.items():
                if ch in _OM_VISION_ONLY_EXTRA_KEEP:
                    filtered_extra[ch] = val
                elif hasattr(val, "shape"):
                    filtered_extra[ch] = np.zeros_like(val)
                else:
                    filtered_extra[ch] = val
            out["extra"] = filtered_extra
        return out

    def __getattr__(self, name: str) -> Any:  # noqa: ANN401 - mirrors Stage1ASStateSynthesizer's forwarding contract
        """Forward attribute access to the inner env (Tier-2 caller contract).

        Mirrors :meth:`Stage1ASStateSynthesizer.__getattr__`: Gymnasium 1.3
        removed :class:`gym.Wrapper`'s implicit forwarding, so callers of
        :func:`chamber.envs.stage1_pickplace.make_stage1_pickplace_env`
        — which now applies this wrapper on top of the AS synthesizer —
        would otherwise lose access to SAPIEN-env attributes such as
        ``env.agent``, ``env.obs_mode``, ``env.condition_config``.
        """
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

condition_id property

condition_id: str | None

ADR-007 §Stage 1b OM axis: the condition_id resolved at construction.
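Because `condition_id` is read once in `__init__`, later changes on the inner env are not observed. The sketch below illustrates that capture-at-construction behaviour. `CapturedPolicy` and the contents of `FILTERED_CONDITIONS` are hypothetical; the real `_FILTERED_CONDITIONS` set is not shown on this page.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical membership set; the real _FILTERED_CONDITIONS is not shown.
FILTERED_CONDITIONS = frozenset({"om_vision_only"})


class CapturedPolicy:
    """Reads condition_id once, as Stage1OMChannelFilter.__init__ does."""

    def __init__(self, env: Any) -> None:
        self._condition_id = getattr(env, "condition_id", None)
        self._filter_active = self._condition_id in FILTERED_CONDITIONS

    @property
    def condition_id(self) -> Any:
        return self._condition_id

    @property
    def filter_active(self) -> bool:
        return self._filter_active


env = SimpleNamespace(condition_id="om_vision_only")
policy = CapturedPolicy(env)
env.condition_id = "something_else"  # later mutation is not observed
print(policy.condition_id, policy.filter_active)  # om_vision_only True
```

This is safe only under the one-env-per-`(seed, condition)`-cell cadence described in the class docstring.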

__getattr__

__getattr__(name: str) -> Any

Forward attribute access to the inner env (Tier-2 caller contract).

Mirrors :meth:Stage1ASStateSynthesizer.__getattr__: Gymnasium 1.3 removed :class:gym.Wrapper's implicit forwarding, so callers of :func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env — which now applies this wrapper on top of the AS synthesizer — would otherwise lose access to SAPIEN-env attributes such as env.agent, env.obs_mode, env.condition_config.


__init__

__init__(env: Env[Any, Any]) -> None

Store the per-condition filter policy (ADR-007 §Stage 1b).


observation

observation(observation: dict[str, Any]) -> dict[str, Any]

Apply the per-condition channel mask (ADR-007 §Stage 1b).

OM-vision-only zero-masks the per-uid agent-proprio sub-dicts and the obs["extra"]["force_torque"] channel; keeps obs["extra"]["tcp_pose"] + obs["extra"]["goal_pos"] (the minimal task-relevant fields) plus obs["sensor_data"] (RGB-D camera) untouched. Other conditions pass through.
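The mask rules can be checked on a toy observation. In the sketch below, nested lists stand in for arrays, and a list test replaces the wrapper's `hasattr(val, "shape")` check. The keep-set `{"tcp_pose", "goal_pos"}` is taken from the docstring above; the real `_OM_VISION_ONLY_EXTRA_KEEP` constant is not shown and may differ. As in the source, every array-valued extra outside the keep-set is zeroed, not only `force_torque`.

```python
from typing import Any

# Keep-set from the docstring above; the real constant may differ.
EXTRA_KEEP = frozenset({"tcp_pose", "goal_pos"})


def is_array_like(val: Any) -> bool:
    # The wrapper tests hasattr(val, "shape"); lists stand in for arrays here.
    return isinstance(val, list)


def zeros_like(val: Any) -> Any:
    # Stand-in for np.zeros_like on (possibly nested) lists of numbers.
    if isinstance(val, list):
        return [zeros_like(v) for v in val]
    return 0.0


def vision_only_mask(observation: dict[str, Any]) -> dict[str, Any]:
    out = dict(observation)  # sensor_data and other top-level keys pass through
    agent = observation.get("agent")
    if isinstance(agent, dict):
        out["agent"] = {
            uid: (
                {ch: zeros_like(v) if is_array_like(v) else v for ch, v in sub.items()}
                if isinstance(sub, dict)
                else sub
            )
            for uid, sub in agent.items()
        }
    extra = observation.get("extra")
    if isinstance(extra, dict):
        out["extra"] = {
            ch: v if ch in EXTRA_KEEP or not is_array_like(v) else zeros_like(v)
            for ch, v in extra.items()
        }
    return out


obs = {
    "agent": {"panda_wristcam": {"qpos": [0.3, 0.4], "qvel": [1.0, -1.0]}},
    "extra": {
        "tcp_pose": [1.0, 2.0, 3.0],
        "goal_pos": [4.0, 5.0, 6.0],
        "force_torque": [7.0, 8.0, 9.0],
    },
    "sensor_data": {"rgb": [[10, 20]]},
}
masked = vision_only_mask(obs)
print(masked["extra"]["force_torque"])  # [0.0, 0.0, 0.0]
```

Zeroing rather than deleting channels keeps the observation shapes identical across conditions, so the same policy network can consume either.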


chamber.envs.errors

Custom errors for CHAMBER environment wrappers.

ADR-001 §Risks — wrappers raise this error rather than silently falling back when a wrapped env does not satisfy the wrapper's structural assumptions.

ChamberEnvCompatibilityError

Bases: RuntimeError

Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.

ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate the gate; failing loudly forces ADR re-review.
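The fail-loud pattern looks roughly like the sketch below. `require_attr` is a hypothetical helper, not part of CHAMBER. The error class is redefined locally so the snippet runs on its own.

```python
from typing import Any


class ChamberEnvCompatibilityError(RuntimeError):
    """Local copy of the class above so the sketch runs on its own."""


def require_attr(env: Any, name: str) -> Any:
    """Hypothetical helper: fail loudly when a structural assumption breaks."""
    try:
        return getattr(env, name)
    except AttributeError as exc:
        raise ChamberEnvCompatibilityError(
            f"{type(env).__name__} lacks {name!r}; wrapper assumptions violated"
        ) from exc


class EnvWithAgent:
    agent = "panda_wristcam"


print(require_attr(EnvWithAgent(), "agent"))  # panda_wristcam
try:
    require_attr(EnvWithAgent(), "condition_id")
    error_message = None
except ChamberEnvCompatibilityError as exc:
    error_message = str(exc)
print(error_message)
```

Subclassing `RuntimeError` lets generic handlers still catch the failure, while tests and gate scripts can target the specific type.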

Source code in src/chamber/envs/errors.py
class ChamberEnvCompatibilityError(RuntimeError):
    """Raised when a wrapped env does not satisfy CHAMBER wrapper assumptions.

    ADR-001 §Risks: a silent fallback during a Stage-0 spike would invalidate
    the gate; failing loudly forces ADR re-review.
    """

chamber.agents

CHAMBER-side agent extensions (ADR-001 §Decision; ADR-007 §Stage 1b).

Hosts robot-agent classes the project needs in addition to ManiSkill v3's built-in registry. Currently:

  • :class:PandaPartner — a 7-DOF Franka with uid="panda_partner", used by the Stage-1 AS-homogeneous condition pair (panda_wristcam, panda_partner) so the two-panda case resolves to distinct uid strings under the :func:chamber.envs._sapien_compat.load_agent_with_bare_uids strip-suffix path (otherwise ("panda_wristcam", "panda_wristcam") would collapse to a single dict entry post-strip).
  • :class:PandaJacobianProvider — wraps :mod:pytorch_kinematics to expose the 3x7 linear end-effector Jacobian $J(q) \in \mathbb{R}^{3 \times 7}$ that :class:concerto.safety.api.JacobianControlModel needs (ADR-004 §Decision; spike_004A §Per-agent control model).

Wrapper-only discipline (ADR-001 §Decision (wrapper-only)): :class:PandaPartner is registered via the public mani_skill.agents.registration.register_agent decorator. No mani_skill/ source patching; no _* private imports.

Tier-1 safety: this module's top-level imports tolerate missing ManiSkill/SAPIEN at module-load time. The :class:PandaPartner registration side-effect is deferred behind a try/except so python -c "import chamber.agents" succeeds on a Vulkan-less host — :class:chamber.envs.errors.ChamberEnvCompatibilityError only fires at env-construction time, never at module-import time (mirrors the stage0_smoke pattern; ADR-001 §Risks).

PandaJacobianProvider

Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).

Parses the panda URDF once at construction (via :func:pytorch_kinematics.build_serial_chain_from_urdf) and caches the resulting :class:pytorch_kinematics.SerialChain. Each :meth:__call__ evaluates :math:J(q) on the cached chain.
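For orientation, the quantity this provider returns is the position block of the standard 6x7 geometric Jacobian. The row ordering below follows the `__call__` source comment: linear rows first, angular rows after. Here $p_{\mathrm{tcp}}(q)$ denotes the position of `ee_link`.

```latex
J(q) = \begin{bmatrix} J_v(q) \\ J_\omega(q) \end{bmatrix} \in \mathbb{R}^{6 \times 7},
\qquad
J_v(q) = \frac{\partial p_{\mathrm{tcp}}(q)}{\partial q} \in \mathbb{R}^{3 \times 7},
\qquad
\dot{p}_{\mathrm{tcp}} = J_v(q)\,\dot{q}.
```

`__call__` returns $J_v(q)$ (rows 0:3). $J_\omega(q)$ is dropped because, per the source comments, the pairwise distance barrier does not constrain orientation.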

Parameters:

  • urdf_path (str | None, default: None ) –

    Absolute path to the panda URDF on disk. The default (None) selects the wristcam URDF ({PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf) since the Stage-1b panda is by convention the wristcam variant; pass panda_v2.urdf to target the no-camera URDF used by :class:PandaPartner.

  • ee_link (str, default: 'panda_hand_tcp' ) –

    Name of the end-effector link the chain terminates at. Defaults to "panda_hand_tcp" — the tool-center-point link the ManiSkill PickCube task uses for self.agent.tcp_pose (see mani_skill/envs/tasks/tabletop/pick_cube.py:136). The serial chain extracted with this end-link has exactly 7 joints (the 7 arm joints; the two gripper-finger joints are siblings of panda_hand, not in the chain).

Raises:

  • FileNotFoundError

    If urdf_path does not exist on disk.

  • ValueError

    If :func:pytorch_kinematics.build_serial_chain_from_urdf fails to extract a chain ending at ee_link.
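The 7-joint check in `__init__` holds because the finger joints branch off `panda_hand` and never lie on the root-to-`panda_hand_tcp` path. The sketch below walks that path over a simplified, hypothetical joint table modeled on the usual Panda URDF layout. It illustrates the idea and is not pytorch_kinematics' implementation. It assumes the chain's joint count covers only movable (non-fixed) joints.

```python
from typing import NamedTuple


class Joint(NamedTuple):
    name: str
    parent: str
    child: str
    kind: str  # "revolute", "prismatic" or "fixed"


# Simplified, hypothetical joint table modeled on the usual Franka Panda
# URDF layout; check the actual panda_v2/panda_v3 URDFs for exact names.
PANDA_JOINTS = [
    *(
        Joint(f"panda_joint{i}", f"panda_link{i - 1}", f"panda_link{i}", "revolute")
        for i in range(1, 8)
    ),
    Joint("panda_joint8", "panda_link7", "panda_link8", "fixed"),
    Joint("panda_hand_joint", "panda_link8", "panda_hand", "fixed"),
    Joint("panda_hand_tcp_joint", "panda_hand", "panda_hand_tcp", "fixed"),
    Joint("panda_finger_joint1", "panda_hand", "panda_leftfinger", "prismatic"),
    Joint("panda_finger_joint2", "panda_hand", "panda_rightfinger", "prismatic"),
]


def movable_chain(joints: list[Joint], ee_link: str) -> list[str]:
    """Walk child -> parent from ee_link to the root, keeping non-fixed joints."""
    by_child = {j.child: j for j in joints}
    chain: list[str] = []
    link = ee_link
    while link in by_child:
        joint = by_child[link]
        if joint.kind != "fixed":
            chain.append(joint.name)
        link = joint.parent
    return chain[::-1]  # root-to-tip order


print(len(movable_chain(PANDA_JOINTS, "panda_hand_tcp")))  # 7
print(len(movable_chain(PANDA_JOINTS, "panda_leftfinger")))  # 8
```

Under that assumption, pointing `ee_link` at a finger link would add a prismatic joint and trip the `ValueError`.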

Source code in src/chamber/agents/panda_jacobian.py
class PandaJacobianProvider:
    """Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).

    Parses the panda URDF once at construction (via
    :func:`pytorch_kinematics.build_serial_chain_from_urdf`) and caches
    the resulting :class:`pytorch_kinematics.SerialChain`. Each
    :meth:`__call__` evaluates :math:`J(q)` on the cached chain.

    Args:
        urdf_path: Absolute path to the panda URDF on disk. The default
            (``None``) selects the wristcam URDF
            (``{PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf``) since
            the Stage-1b panda is by convention the wristcam variant;
            pass ``panda_v2.urdf`` to target the no-camera URDF used by
            :class:`PandaPartner`.
        ee_link: Name of the end-effector link the chain terminates at.
            Defaults to ``"panda_hand_tcp"`` — the tool-center-point
            link the ManiSkill PickCube task uses for
            ``self.agent.tcp_pose`` (see
            ``mani_skill/envs/tasks/tabletop/pick_cube.py:136``). The
            serial chain extracted with this end-link has exactly 7
            joints (the 7 arm joints; the two gripper-finger joints are
            siblings of ``panda_hand``, not in the chain).

    Raises:
        FileNotFoundError: If ``urdf_path`` does not exist on disk.
        ValueError: If :func:`pytorch_kinematics.build_serial_chain_from_urdf`
            fails to extract a chain ending at ``ee_link``.
    """

    def __init__(
        self,
        urdf_path: str | None = None,
        *,
        ee_link: str = "panda_hand_tcp",
    ) -> None:
        """Parse the URDF and cache the serial chain (ADR-007 §Stage 1b)."""
        import pytorch_kinematics as pk

        if urdf_path is None:
            from mani_skill import PACKAGE_ASSET_DIR

            urdf_path = f"{PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf"
        with open(urdf_path, "rb") as fh:
            urdf_blob = fh.read()
        self._chain = pk.build_serial_chain_from_urdf(urdf_blob, end_link_name=ee_link)
        self._ee_link: str = ee_link
        if self._chain.n_joints != PANDA_ARM_DOF:
            msg = (
                f"PandaJacobianProvider: expected {PANDA_ARM_DOF}-joint serial chain "
                f"to {ee_link!r}; pytorch_kinematics returned {self._chain.n_joints} "
                "joints. Check URDF integrity or pick a different end-effector link."
            )
            raise ValueError(msg)

    @property
    def ee_link(self) -> str:
        """The end-effector link the chain terminates at (read-only)."""
        return self._ee_link

    def __call__(
        self,
        snap: AgentSnapshot,
        qpos_arm: NDArray[np.floating],
    ) -> NDArray[np.float64]:
        """Compute the 3x7 linear end-effector Jacobian at ``qpos_arm`` (ADR-004 §Decision).

        Args:
            snap: Current :class:`AgentSnapshot` — unused at the
                provider level (the Jacobian is a function of joint
                state, not Cartesian state); accepted for Protocol
                conformance with
                :class:`concerto.safety.api.JacobianControlModel.jacobian_fn`.
                See module docstring for the closure-via-env-reference
                rationale and the contingent P1.05.5 follow-up that
                folds qpos into :class:`AgentSnapshot` directly.
            qpos_arm: Joint-angle vector for the 7 panda arm joints, in
                the same order as ``mani_skill.agents.robots.panda.Panda.arm_joint_names``.
                Shape ``(7,)``; dtype float-coercible to ``float32``.

        Returns:
            Linear Jacobian rows of the 6x7 SE(3) Jacobian at
            ``qpos_arm``, shape ``(3, 7)``, dtype ``float64`` (matches
            the :class:`JacobianControlModel` arithmetic contract at
            ``concerto/safety/api.py:560-561``).

        Raises:
            ValueError: If ``qpos_arm`` shape mismatches ``(7,)``.
        """
        del snap  # closure consumer reads qpos from env; see module docstring.
        import torch

        q = np.asarray(qpos_arm, dtype=np.float32)
        if q.shape != (PANDA_ARM_DOF,):
            msg = (
                f"PandaJacobianProvider: qpos_arm shape {q.shape} mismatches expected "
                f"({PANDA_ARM_DOF},) for the 7-DOF Panda chain."
            )
            raise ValueError(msg)
        q_torch = torch.from_numpy(q).unsqueeze(0)
        # pk.SerialChain.jacobian returns (N_batch, 6, n_dof); rows 0:3
        # are the linear part (d_pos / d_q), rows 3:6 are angular.
        # The CBF backbone consumes the linear part only — orientation
        # is out of scope for the pairwise distance barrier
        # (ADR-004 §Decision). See module docstring.
        jac_6x7 = self._chain.jacobian(q_torch)[0]
        jac_3x7 = jac_6x7[:CARTESIAN_POSITION_DIM].detach().cpu().numpy()
        return np.ascontiguousarray(jac_3x7, dtype=np.float64)

ee_link property

ee_link: str

The end-effector link the chain terminates at (read-only).

__call__

__call__(
    snap: AgentSnapshot, qpos_arm: NDArray[floating]
) -> NDArray[np.float64]

Compute the 3x7 linear end-effector Jacobian at qpos_arm (ADR-004 §Decision).

Parameters:

  • snap (AgentSnapshot) –

    Current :class:AgentSnapshot — unused at the provider level (the Jacobian is a function of joint state, not Cartesian state); accepted for Protocol conformance with :class:concerto.safety.api.JacobianControlModel.jacobian_fn. See module docstring for the closure-via-env-reference rationale and the contingent P1.05.5 follow-up that folds qpos into :class:AgentSnapshot directly.

  • qpos_arm (NDArray[floating]) –

    Joint-angle vector for the 7 panda arm joints, in the same order as mani_skill.agents.robots.panda.Panda.arm_joint_names. Shape (7,); dtype float-coercible to float32.

Returns:

  • NDArray[float64] –

    Linear Jacobian rows of the 6x7 SE(3) Jacobian at qpos_arm, shape (3, 7), dtype float64 (matches the :class:JacobianControlModel arithmetic contract at concerto/safety/api.py:560-561).

Raises:

  • ValueError

    If qpos_arm shape mismatches (7,).
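A common sanity check for a linear Jacobian like the one `__call__` returns is to compare it column by column against central finite differences of the end-effector position. Doing this for the Panda itself needs the URDF and pytorch_kinematics, so the sketch below applies the same check to a hypothetical 2-link planar arm with a hand-derived Jacobian.

```python
import math


def fk_planar_2link(q: list[float], l1: float = 1.0, l2: float = 0.5) -> list[float]:
    """Hypothetical 2-link planar arm: end-effector (x, y) for joint angles q."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return [x, y]


def analytic_jacobian(q: list[float], l1: float = 1.0, l2: float = 0.5) -> list[list[float]]:
    """d(x, y)/d(q1, q2), derived by hand from fk_planar_2link."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def fd_jacobian(fk, q: list[float], eps: float = 1e-6) -> list[list[float]]:
    """Central differences: column j approximates d fk / d q_j."""
    n_out = len(fk(q))
    jac = [[0.0] * len(q) for _ in range(n_out)]
    for j in range(len(q)):
        q_plus = list(q)
        q_plus[j] += eps
        q_minus = list(q)
        q_minus[j] -= eps
        f_plus, f_minus = fk(q_plus), fk(q_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


q = [0.3, -0.7]
j_analytic = analytic_jacobian(q)
j_fd = fd_jacobian(fk_planar_2link, q)
max_err = max(
    abs(a - b) for row_a, row_b in zip(j_analytic, j_fd) for a, b in zip(row_a, row_b)
)
print(max_err < 1e-6)  # True
```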


__init__

__init__(
    urdf_path: str | None = None,
    *,
    ee_link: str = "panda_hand_tcp",
) -> None

Parse the URDF and cache the serial chain (ADR-007 §Stage 1b).


PandaPartner

Bases: Panda

Second-panda agent for the AS-homogeneous condition (ADR-007 §Stage 1b).

Inherits every Panda hook (arm joints, gripper joints, TCP pose, grasp / static predicates, controller configs) from :class:mani_skill.agents.robots.panda.panda.Panda. The only difference is the URDF: :class:PandaPartner uses panda_v2.urdf (no on-gripper camera mount), since the AS-homo partner is driven by the scripted-heuristic / trained policy rather than by camera-fed perception (the ego owns the camera via panda_wristcam).

Source code in src/chamber/agents/panda_partner.py
@register_agent()
class PandaPartner(Panda):
    """Second-panda agent for the AS-homogeneous condition (ADR-007 §Stage 1b).

    Inherits every Panda hook (arm joints, gripper joints, TCP pose,
    grasp / static predicates, controller configs) from
    :class:`mani_skill.agents.robots.panda.panda.Panda`. The only
    difference is the URDF: :class:`PandaPartner` uses ``panda_v2.urdf``
    (no on-gripper camera mount), since the AS-homo partner is driven by
    the scripted-heuristic / trained policy rather than by camera-fed
    perception (the ego owns the camera via ``panda_wristcam``).
    """

    uid = "panda_partner"
    urdf_path = f"{PACKAGE_ASSET_DIR}/robots/panda/panda_v2.urdf"
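The subclass-and-register pattern can be sketched with a hypothetical mini-registry keyed by the class-level `uid`. ManiSkill's `register_agent` has its own implementation, and the classes below are simplified stand-ins. This sketch also rejects duplicate uids, which is a choice made here; the real decorator may behave differently.

```python
from typing import Any, Callable

# Hypothetical mini-registry; not ManiSkill's REGISTERED_AGENTS.
REGISTRY: dict[str, Any] = {}


def register() -> Callable[[Any], Any]:
    """Return a class decorator that records ``cls.uid`` (hypothetical)."""

    def decorator(cls: Any) -> Any:
        uid = cls.uid
        if uid in REGISTRY:
            raise KeyError(f"duplicate agent uid {uid!r}")
        REGISTRY[uid] = cls
        return cls

    return decorator


@register()
class Panda:
    uid = "panda"
    arm_joint_names = [f"panda_joint{i}" for i in range(1, 8)]


@register()
class PandaWristCam(Panda):
    uid = "panda_wristcam"
    urdf_path = "robots/panda/panda_v3.urdf"  # camera-mount URDF, per the docs above


@register()
class PandaPartner(Panda):
    # Only uid and URDF differ; joints and other hooks are inherited.
    uid = "panda_partner"
    urdf_path = "robots/panda/panda_v2.urdf"


try:

    @register()
    class Duplicate(Panda):
        uid = "panda_partner"

    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

print(sorted(REGISTRY))  # ['panda', 'panda_partner', 'panda_wristcam']
```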

chamber.agents.panda_partner

PandaPartner — second-panda agent for the AS-homogeneous condition (ADR-007 §Stage 1b).

The Stage-1 AS axis pre-registration (spikes/preregistration/AS.yaml, git tag prereg-stage1-AS-2026-05-15) names the homogeneous condition's agent tuple as ("panda_wristcam", "panda_partner"). ManiSkill v3.0.1's built-in agent registry covers panda, panda_wristcam, panda_stick, fetch, and the Allegro hands (see mani_skill.agents.registration.REGISTERED_AGENTS) — there is no panda_partner. The strip-suffix path in :func:chamber.envs._sapien_compat.load_agent_with_bare_uids would collapse two panda_wristcam entries to a single agents_dict key, so the AS-homo "two pandas" case needs a second, distinct uid.
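The collapse can be reproduced with a plain dict comprehension. This sketch assumes two things not confirmed on this page: multi-agent keys have the form `<uid>-<index>`, and the compat helper strips that index. `strip_index_suffix` and `bare_uid_dict` are hypothetical names, not the real `load_agent_with_bare_uids`.

```python
import re
from typing import Any


def strip_index_suffix(key: str) -> str:
    """Hypothetical: drop a trailing "-<digits>" index from a multi-agent key."""
    return re.sub(r"-\d+$", "", key)


def bare_uid_dict(agents: dict[str, Any]) -> dict[str, Any]:
    # Later entries silently overwrite earlier ones when bare uids collide.
    return {strip_index_suffix(k): v for k, v in agents.items()}


two_wristcams = {"panda_wristcam-0": "ego", "panda_wristcam-1": "partner"}
wristcam_and_partner = {"panda_wristcam-0": "ego", "panda_partner-1": "partner"}
print(bare_uid_dict(two_wristcams))  # {'panda_wristcam': 'partner'}
print(bare_uid_dict(wristcam_and_partner))  # {'panda_wristcam': 'ego', 'panda_partner': 'partner'}
```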

This module defines that uid: a thin :class:Panda subclass mounted on the no-wristcam panda_v2.urdf (the AS-homo partner doesn't need its own perception — the ego owns the camera mount via panda_wristcam).
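The uid collision is easy to reproduce with a plain dict. The sketch below is not the ManiSkill or chamber loader (`build_agents_dict` is a hypothetical name). It only shows that keying agents by bare uid silently keeps a single entry when both arms share a uid, which is why the second Panda needs a uid of its own.

```python
def build_agents_dict(uids):
    """Hypothetical loader step: key one placeholder record per agent by uid.

    A later entry with the same uid overwrites the earlier one.
    """
    agents = {}
    for slot, uid in enumerate(uids):
        agents[uid] = {"uid": uid, "slot": slot}
    return agents


collapsed = build_agents_dict(("panda_wristcam", "panda_wristcam"))
distinct = build_agents_dict(("panda_wristcam", "panda_partner"))

print(len(collapsed))  # 1: the second arm silently replaced the first
print(len(distinct))   # 2: one entry per arm
```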

Wrapper-only discipline (ADR-001 §Decision (wrapper-only)): :class:PandaPartner is registered via the public @register_agent() decorator from mani_skill.agents.registration. No mani_skill/ source patching; no _* private imports. The base :class:Panda class's controller_configs, arm_joint_names, gripper_joint_names, tcp_pose property, is_grasping, and is_static are inherited unchanged — the only difference from panda_wristcam is the URDF choice (no on-gripper camera mount).

References:

  • ADR-007 §Stage 1b (P1.03 introduces the partner agent for the AS axis).
  • spikes/preregistration/AS.yaml (the pre-registration that names the ("panda_wristcam", "panda_partner") tuple).
  • mani_skill.agents.robots.panda.panda_wristcam.PandaWristCam (the parallel uid; the pattern this class mirrors).

Panda 7-DOF end-effector Jacobian provider (ADR-004 §Decision; ADR-007 §Stage 1b).

Wraps :mod:pytorch_kinematics to compute the analytical 3x7 linear end-effector Jacobian $J(q) \in \mathbb{R}^{3 \times 7}$ that :class:concerto.safety.api.JacobianControlModel needs. pytorch_kinematics is already a transitive dependency of mani-skill==3.0.1 (:mod:mani_skill.agents.controllers.utils.kinematics uses it for built-in IK); P1.03 promotes it to a load-bearing direct dependency in pyproject.toml so that :class:PandaJacobianProvider has a stable surface under wrapper-only discipline.

Why the 3x7 linear slice rather than the full 6x7 SE(3) Jacobian: :class:concerto.safety.cbf_qp.AgentSnapshot carries only Cartesian position (shape (d,), typically d == 3) and velocity, with no orientation channel. The CBF backbone's pairwise barrier $h_{ij}$ is a Euclidean-distance constraint, so angular Jacobian rows have nothing to act on in it. The damped pseudoinverse in :meth:JacobianControlModel.cartesian_accel_to_action (Nakamura & Hanafusa 1986) operates on the 3x7 slice; the design constraint is consistency with the safety-stack contract.
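For reference, the textbook damped-least-squares step maps a Cartesian command a to joint space as dq = J^T (J J^T + λ² I)^(-1) a. The NumPy sketch below applies it to a random 3x7 matrix standing in for the linear slice. It is not the code of JacobianControlModel.cartesian_accel_to_action; the function name and damping values are illustrative assumptions.

```python
import numpy as np


def damped_pinv_step(jac, cart_accel, damping=1e-2):
    """Damped least squares: J^T (J J^T + damping^2 I)^{-1} a.

    The damping term keeps the 3x3 Gram matrix invertible near
    singularities, at the cost of a slightly smaller, biased step.
    """
    jac = np.asarray(jac, dtype=np.float64)          # shape (3, n)
    a = np.asarray(cart_accel, dtype=np.float64)     # shape (3,)
    m = jac.shape[0]
    gram = jac @ jac.T + (damping ** 2) * np.eye(m)  # SPD for damping > 0
    return jac.T @ np.linalg.solve(gram, a)          # shape (n,)


rng = np.random.default_rng(0)
J = rng.standard_normal((3, 7))      # stand-in for a 3x7 linear Jacobian
a = np.array([0.1, -0.2, 0.05])
dq = damped_pinv_step(J, a, damping=1e-6)
print(np.allclose(J @ dq, a, atol=1e-6))  # near-zero damping: J dq reproduces a
```

Larger damping shrinks the joint-space step, which is the usual trade-off between tracking accuracy and robustness near singular configurations.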

Note on the closure-via-env-reference workaround (Q3 mod 3 in the P1.03 design conversation): :class:AgentSnapshot does not carry qpos. Adding it is sequenced to slice P1.05.5 per ADR-004 §Open questions, with Stage-1b λ-telemetry saturation as the contingent trigger. The :class:PandaJacobianProvider.__call__ signature therefore takes (snap, qpos_arm), and the env's :meth:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv.build_control_models constructs the closure lambda snap: provider(snap, env._latest_qpos["panda_wristcam"]), which captures a fresh env reference per (seed, condition) cell. This is safe because :func:chamber.benchmarks.stage1_as._run_axis_with_factories builds the env once per cell and reuses it across the cell's 20 evaluation episodes, so the closure stays valid for the cell's lifetime (cited at stage1_as.py:253-275).
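The late-binding behaviour this closure relies on can be sketched with hypothetical stand-ins (`FakeEnv` and `fake_provider` are not chamber code). Because the lambda captures the env object rather than a qpos value, every call reads whatever qpos the env stored most recently.

```python
import numpy as np


class FakeEnv:
    """Hypothetical env stand-in that refreshes _latest_qpos on every step."""

    def __init__(self):
        self._latest_qpos = {"panda_wristcam": np.zeros(7)}

    def step(self, qpos):
        self._latest_qpos["panda_wristcam"] = np.asarray(qpos, dtype=np.float64)


def fake_provider(snap, qpos_arm):
    """Hypothetical provider: ignores snap, returns a value derived from qpos."""
    del snap
    return float(np.sum(qpos_arm))


env = FakeEnv()
# The closure holds a reference to env, not a copy of its qpos.
jacobian_fn = lambda snap: fake_provider(snap, env._latest_qpos["panda_wristcam"])

before = jacobian_fn(None)
env.step(np.ones(7))
after = jacobian_fn(None)
print(before, after)  # 0.0 7.0
```

The same property is what makes the closure unsafe if the env were rebuilt mid-cell: a stale env object would keep being read.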

References:

  • ADR-004 §Decision (the JacobianControlModel contract).
  • ADR-004 §Open questions (the AgentSnapshot.qpos extension as the long-term fix; slice P1.05.5).
  • ADR-007 §Stage 1b (the slice introducing this provider).
  • :class:concerto.safety.api.JacobianControlModel (the consumer).
  • mani_skill.agents.controllers.utils.kinematics.Kinematics (the precedent for using pytorch_kinematics in ManiSkill's own IK).

PandaJacobianProvider

Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).

Parses the panda URDF once at construction (via :func:pytorch_kinematics.build_serial_chain_from_urdf) and caches the resulting :class:pytorch_kinematics.SerialChain. Each :meth:__call__ evaluates :math:J(q) on the cached chain.

Parameters:

  • urdf_path (str | None, default: None ) –

    Absolute path to the panda URDF on disk. The default (None) selects the wristcam URDF ({PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf) since the Stage-1b panda is by convention the wristcam variant; pass panda_v2.urdf to target the no-camera URDF used by :class:PandaPartner.

  • ee_link (str, default: 'panda_hand_tcp' ) –

    Name of the end-effector link the chain terminates at. Defaults to "panda_hand_tcp" — the tool-center-point link the ManiSkill PickCube task uses for self.agent.tcp_pose (see mani_skill/envs/tasks/tabletop/pick_cube.py:136). The serial chain extracted with this end-link has exactly 7 joints (the 7 arm joints; the two gripper-finger joints are siblings of panda_hand, not in the chain).

Raises:

  • FileNotFoundError

    If urdf_path does not exist on disk.

  • ValueError

    If :func:pytorch_kinematics.build_serial_chain_from_urdf fails to extract a chain ending at ee_link. The constructor raises it explicitly when the extracted chain does not have exactly 7 joints.
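A common sanity check for any linear-Jacobian provider is to compare it against central finite differences of forward kinematics. The sketch below does this for a hypothetical planar 2-link arm (link lengths and all function names are made up for illustration). Pointing `finite_difference_jacobian` at the Panda's own forward kinematics would be the analogous check for this class, but that wiring is not shown.

```python
import numpy as np

LINK_LENGTHS = (0.4, 0.3)  # hypothetical planar 2-link arm, metres


def fk(q):
    """Toy forward kinematics: end-effector position (x, y, z=0)."""
    l1, l2 = LINK_LENGTHS
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y, 0.0])


def analytic_linear_jacobian(q):
    """Closed-form d(position)/dq for the toy arm, shape (3, 2)."""
    l1, l2 = LINK_LENGTHS
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
        [0.0, 0.0],
    ])


def finite_difference_jacobian(f, q, eps=1e-6):
    """Central differences: column j is (f(q + eps e_j) - f(q - eps e_j)) / (2 eps)."""
    q = np.asarray(q, dtype=np.float64)
    cols = []
    for j in range(q.size):
        step = np.zeros_like(q)
        step[j] = eps
        cols.append((f(q + step) - f(q - step)) / (2 * eps))
    return np.stack(cols, axis=1)


q = np.array([0.3, -0.7])
print(np.allclose(analytic_linear_jacobian(q), finite_difference_jacobian(fk, q), atol=1e-6))
```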

Source code in src/chamber/agents/panda_jacobian.py
class PandaJacobianProvider:
    """Frozen URDF chain -> linear Jacobian callable (ADR-004 §Decision; §Stage 1b).

    Parses the panda URDF once at construction (via
    :func:`pytorch_kinematics.build_serial_chain_from_urdf`) and caches
    the resulting :class:`pytorch_kinematics.SerialChain`. Each
    :meth:`__call__` evaluates :math:`J(q)` on the cached chain.

    Args:
        urdf_path: Absolute path to the panda URDF on disk. The default
            (``None``) selects the wristcam URDF
            (``{PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf``) since
            the Stage-1b panda is by convention the wristcam variant;
            pass ``panda_v2.urdf`` to target the no-camera URDF used by
            :class:`PandaPartner`.
        ee_link: Name of the end-effector link the chain terminates at.
            Defaults to ``"panda_hand_tcp"`` — the tool-center-point
            link the ManiSkill PickCube task uses for
            ``self.agent.tcp_pose`` (see
            ``mani_skill/envs/tasks/tabletop/pick_cube.py:136``). The
            serial chain extracted with this end-link has exactly 7
            joints (the 7 arm joints; the two gripper-finger joints are
            siblings of ``panda_hand``, not in the chain).

    Raises:
        FileNotFoundError: If ``urdf_path`` does not exist on disk.
        ValueError: If :func:`pytorch_kinematics.build_serial_chain_from_urdf`
            fails to extract a chain ending at ``ee_link``.
    """

    def __init__(
        self,
        urdf_path: str | None = None,
        *,
        ee_link: str = "panda_hand_tcp",
    ) -> None:
        """Parse the URDF and cache the serial chain (ADR-007 §Stage 1b)."""
        import pytorch_kinematics as pk

        if urdf_path is None:
            from mani_skill import PACKAGE_ASSET_DIR

            urdf_path = f"{PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf"
        with open(urdf_path, "rb") as fh:
            urdf_blob = fh.read()
        self._chain = pk.build_serial_chain_from_urdf(urdf_blob, end_link_name=ee_link)
        self._ee_link: str = ee_link
        if self._chain.n_joints != PANDA_ARM_DOF:
            msg = (
                f"PandaJacobianProvider: expected {PANDA_ARM_DOF}-joint serial chain "
                f"to {ee_link!r}; pytorch_kinematics returned {self._chain.n_joints} "
                "joints. Check URDF integrity or pick a different end-effector link."
            )
            raise ValueError(msg)

    @property
    def ee_link(self) -> str:
        """The end-effector link the chain terminates at (read-only)."""
        return self._ee_link

    def __call__(
        self,
        snap: AgentSnapshot,
        qpos_arm: NDArray[np.floating],
    ) -> NDArray[np.float64]:
        """Compute the 3x7 linear end-effector Jacobian at ``qpos_arm`` (ADR-004 §Decision).

        Args:
            snap: Current :class:`AgentSnapshot` — unused at the
                provider level (the Jacobian is a function of joint
                state, not Cartesian state); accepted for Protocol
                conformance with
                :class:`concerto.safety.api.JacobianControlModel.jacobian_fn`.
                See module docstring for the closure-via-env-reference
                rationale and the contingent P1.05.5 follow-up that
                folds qpos into :class:`AgentSnapshot` directly.
            qpos_arm: Joint-angle vector for the 7 panda arm joints, in
                the same order as ``mani_skill.agents.robots.panda.Panda.arm_joint_names``.
                Shape ``(7,)``; dtype float-coercible to ``float32``.

        Returns:
            Linear Jacobian rows of the 6x7 SE(3) Jacobian at
            ``qpos_arm``, shape ``(3, 7)``, dtype ``float64`` (matches
            the :class:`JacobianControlModel` arithmetic contract at
            ``concerto/safety/api.py:560-561``).

        Raises:
            ValueError: If ``qpos_arm`` shape mismatches ``(7,)``.
        """
        del snap  # closure consumer reads qpos from env; see module docstring.
        import torch

        q = np.asarray(qpos_arm, dtype=np.float32)
        if q.shape != (PANDA_ARM_DOF,):
            msg = (
                f"PandaJacobianProvider: qpos_arm shape {q.shape} mismatches expected "
                f"({PANDA_ARM_DOF},) for the 7-DOF Panda chain."
            )
            raise ValueError(msg)
        q_torch = torch.from_numpy(q).unsqueeze(0)
        # pk.SerialChain.jacobian returns (N_batch, 6, n_dof); rows 0:3
        # are the linear part (d_pos / d_q), rows 3:6 are angular.
        # The CBF backbone consumes the linear part only — orientation
        # is out of scope for the pairwise distance barrier
        # (ADR-004 §Decision). See module docstring.
        jac_6x7 = self._chain.jacobian(q_torch)[0]
        jac_3x7 = jac_6x7[:CARTESIAN_POSITION_DIM].detach().cpu().numpy()
        return np.ascontiguousarray(jac_3x7, dtype=np.float64)
ee_link: str

The end-effector link the chain terminates at (read-only).

__call__

__call__(
    snap: AgentSnapshot, qpos_arm: NDArray[floating]
) -> NDArray[np.float64]

Compute the 3x7 linear end-effector Jacobian at qpos_arm (ADR-004 §Decision).

Parameters:

  • snap (AgentSnapshot) –

    Current :class:AgentSnapshot — unused at the provider level (the Jacobian is a function of joint state, not Cartesian state); accepted for Protocol conformance with :class:concerto.safety.api.JacobianControlModel.jacobian_fn. See module docstring for the closure-via-env-reference rationale and the contingent P1.05.5 follow-up that folds qpos into :class:AgentSnapshot directly.

  • qpos_arm (NDArray[floating]) –

    Joint-angle vector for the 7 panda arm joints, in the same order as mani_skill.agents.robots.panda.Panda.arm_joint_names. Shape (7,); dtype float-coercible to float32.

Returns:

  • NDArray[float64]

    Linear Jacobian rows of the 6x7 SE(3) Jacobian at qpos_arm, shape (3, 7), dtype float64 (matches the :class:JacobianControlModel arithmetic contract at concerto/safety/api.py:560-561).

Raises:

  • ValueError

    If qpos_arm shape mismatches (7,).
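The shape handling described in the Returns and Raises entries (validate a (7,) input, take batch element 0 of the (N, 6, 7) result, keep rows 0:3, cast to contiguous float64) can be sketched in plain NumPy. A NumPy array stands in for the torch tensor the real code gets from the cached chain; `validate_qpos` and `linear_rows` are hypothetical names, and `CARTESIAN_POSITION_DIM = 3` is an assumption inferred from the (3, 7) return shape.

```python
import numpy as np

PANDA_ARM_DOF = 7            # 7-DOF Panda arm, as documented above
CARTESIAN_POSITION_DIM = 3   # assumption: implied by the (3, 7) return shape


def validate_qpos(qpos_arm):
    """Coerce to float32 and reject anything that is not shape (7,)."""
    q = np.asarray(qpos_arm, dtype=np.float32)
    if q.shape != (PANDA_ARM_DOF,):
        raise ValueError(f"qpos_arm shape {q.shape} mismatches ({PANDA_ARM_DOF},)")
    return q


def linear_rows(batched_jac):
    """Take batch element 0 of an (N, 6, 7) Jacobian; keep rows 0:3 as float64."""
    jac_6x7 = np.asarray(batched_jac)[0]
    return np.ascontiguousarray(jac_6x7[:CARTESIAN_POSITION_DIM], dtype=np.float64)


q = validate_qpos([0.0] * 7)
fake_batch = np.arange(42, dtype=np.float32).reshape(1, 6, 7)  # stand-in for chain.jacobian(...)
out = linear_rows(fake_batch)
print(out.shape, out.dtype, out.flags["C_CONTIGUOUS"])  # (3, 7) float64 True
```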

Source code in src/chamber/agents/panda_jacobian.py
def __call__(
    self,
    snap: AgentSnapshot,
    qpos_arm: NDArray[np.floating],
) -> NDArray[np.float64]:
    """Compute the 3x7 linear end-effector Jacobian at ``qpos_arm`` (ADR-004 §Decision).

    Args:
        snap: Current :class:`AgentSnapshot` — unused at the
            provider level (the Jacobian is a function of joint
            state, not Cartesian state); accepted for Protocol
            conformance with
            :class:`concerto.safety.api.JacobianControlModel.jacobian_fn`.
            See module docstring for the closure-via-env-reference
            rationale and the contingent P1.05.5 follow-up that
            folds qpos into :class:`AgentSnapshot` directly.
        qpos_arm: Joint-angle vector for the 7 panda arm joints, in
            the same order as ``mani_skill.agents.robots.panda.Panda.arm_joint_names``.
            Shape ``(7,)``; dtype float-coercible to ``float32``.

    Returns:
        Linear Jacobian rows of the 6x7 SE(3) Jacobian at
        ``qpos_arm``, shape ``(3, 7)``, dtype ``float64`` (matches
        the :class:`JacobianControlModel` arithmetic contract at
        ``concerto/safety/api.py:560-561``).

    Raises:
        ValueError: If ``qpos_arm`` shape mismatches ``(7,)``.
    """
    del snap  # closure consumer reads qpos from env; see module docstring.
    import torch

    q = np.asarray(qpos_arm, dtype=np.float32)
    if q.shape != (PANDA_ARM_DOF,):
        msg = (
            f"PandaJacobianProvider: qpos_arm shape {q.shape} mismatches expected "
            f"({PANDA_ARM_DOF},) for the 7-DOF Panda chain."
        )
        raise ValueError(msg)
    q_torch = torch.from_numpy(q).unsqueeze(0)
    # pk.SerialChain.jacobian returns (N_batch, 6, n_dof); rows 0:3
    # are the linear part (d_pos / d_q), rows 3:6 are angular.
    # The CBF backbone consumes the linear part only — orientation
    # is out of scope for the pairwise distance barrier
    # (ADR-004 §Decision). See module docstring.
    jac_6x7 = self._chain.jacobian(q_torch)[0]
    jac_3x7 = jac_6x7[:CARTESIAN_POSITION_DIM].detach().cpu().numpy()
    return np.ascontiguousarray(jac_3x7, dtype=np.float64)

__init__

__init__(
    urdf_path: str | None = None,
    *,
    ee_link: str = "panda_hand_tcp",
) -> None

Parse the URDF and cache the serial chain (ADR-007 §Stage 1b).

Source code in src/chamber/agents/panda_jacobian.py
def __init__(
    self,
    urdf_path: str | None = None,
    *,
    ee_link: str = "panda_hand_tcp",
) -> None:
    """Parse the URDF and cache the serial chain (ADR-007 §Stage 1b)."""
    import pytorch_kinematics as pk

    if urdf_path is None:
        from mani_skill import PACKAGE_ASSET_DIR

        urdf_path = f"{PACKAGE_ASSET_DIR}/robots/panda/panda_v3.urdf"
    with open(urdf_path, "rb") as fh:
        urdf_blob = fh.read()
    self._chain = pk.build_serial_chain_from_urdf(urdf_blob, end_link_name=ee_link)
    self._ee_link: str = ee_link
    if self._chain.n_joints != PANDA_ARM_DOF:
        msg = (
            f"PandaJacobianProvider: expected {PANDA_ARM_DOF}-joint serial chain "
            f"to {ee_link!r}; pytorch_kinematics returned {self._chain.n_joints} "
            "joints. Check URDF integrity or pick a different end-effector link."
        )
        raise ValueError(msg)

chamber.comm

CHAMBER communication stack — fixed-format channel + URLLC degradation.

ADR-003 §Decision (mandatory fixed-format channel + opt-in learned overlay); ADR-006 §Decision (URLLC-anchored numeric bounds for latency, jitter, drop).

Public surface:

  • :class:CommChannel — Protocol (re-exported from :mod:chamber.comm.api).
  • :class:CommPacket, :class:Pose, :class:TaskStatePredicate, :data:SCHEMA_VERSION — wire format.
  • :class:FixedFormatCommChannel — the mandatory fixed-format encoder/decoder.
  • :class:AoIClock — per-uid Age-of-Information accounting.
  • :class:ChamberCommError, :class:ChamberCommQPSaturationWarning — error and warning types.

Subsequent PRs (T2.4-T2.6) add the learned-overlay slot, the :class:CommDegradationWrapper, and the URLLC profile table.

AoIClock

Per-uid Age-of-Information accounting (ADR-003 §Decision; ADR-006 §Decision).

Usage::

clock = AoIClock()
clock.mark_fresh("panda")  # registers "panda" with AoI=0
clock.tick(0.01)  # +10 ms for every registered uid
clock.aoi("panda")  # -> 0.01

Uids are auto-registered on first :meth:mark_fresh. :meth:reset clears all registrations so a fresh episode begins from a clean slate.
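To see the per-uid sawtooth in action without installing chamber, here is a condensed restatement of the methods listed below (`MiniAoIClock` is a local stand-in, not the real import). Each uid ages independently from the moment it is marked fresh, and reset unregisters everything.

```python
class MiniAoIClock:
    """Condensed restatement of AoIClock's documented behaviour (local stand-in)."""

    def __init__(self) -> None:
        self._aoi: dict[str, float] = {}

    def reset(self) -> None:
        self._aoi.clear()

    def mark_fresh(self, uid: str) -> None:
        self._aoi[uid] = 0.0  # fresh sample: AoI drops to zero, auto-registers

    def tick(self, dt: float) -> None:
        if dt < 0.0:
            raise ValueError(f"tick dt must be non-negative; got {dt!r}")
        for uid in self._aoi:
            self._aoi[uid] += dt  # every registered uid ages by dt

    def aoi(self, uid: str) -> float:
        return self._aoi[uid]  # KeyError for unregistered uids


clock = MiniAoIClock()
clock.mark_fresh("panda")
clock.tick(0.01)
clock.mark_fresh("fetch")  # a second uid registered mid-episode
clock.tick(0.01)
print(round(clock.aoi("panda"), 6), round(clock.aoi("fetch"), 6))  # 0.02 0.01
clock.reset()
try:
    clock.aoi("panda")
except KeyError:
    print("panda unregistered after reset")
```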

Source code in src/chamber/comm/aoi.py
class AoIClock:
    """Per-uid Age-of-Information accounting (ADR-003 §Decision; ADR-006 §Decision).

    Usage::

        clock = AoIClock()
        clock.mark_fresh("panda")  # registers "panda" with AoI=0
        clock.tick(0.01)  # +10 ms for every registered uid
        clock.aoi("panda")  # -> 0.01

    Uids are auto-registered on first :meth:`mark_fresh`. :meth:`reset`
    clears all registrations so a fresh episode begins from a clean slate.
    """

    def __init__(self) -> None:
        """Create an empty clock with no uids registered (ADR-003 §Decision)."""
        self._aoi: dict[str, float] = {}

    def reset(self) -> None:
        """Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

        After ``reset``, :meth:`aoi` raises ``KeyError`` for every uid until
        the next :meth:`mark_fresh`. This matches the channel's
        ``CommChannel.reset`` semantics: a new episode starts with no state.
        """
        self._aoi.clear()

    def mark_fresh(self, uid: str) -> None:
        """Mark ``uid`` as having received a fresh sample (AoI -> 0).

        ADR-003 §Decision: every fresh transmission zeroes that uid's AoI.
        Auto-registers ``uid`` on first call.
        """
        self._aoi[uid] = 0.0

    def tick(self, dt: float) -> None:
        """Advance time by ``dt`` for every registered uid (ADR-003 §Decision).

        Args:
            dt: Elapsed simulation time in seconds. Must be non-negative;
                AoI cannot run backwards (notes/tier2/41 §V definition).

        Raises:
            ValueError: when ``dt`` is negative.
        """
        if dt < 0.0:
            msg = f"tick dt must be non-negative; got {dt!r}"
            raise ValueError(msg)
        for uid in self._aoi:
            self._aoi[uid] += dt

    def aoi(self, uid: str) -> float:
        """Return the current AoI for ``uid`` (ADR-003 §Decision).

        Raises:
            KeyError: if ``uid`` is not registered (loud-fail per ADR-003).
        """
        return self._aoi[uid]

    def snapshot(self) -> dict[str, float]:
        """Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

        The returned dict is decoupled from the clock's internal state, so
        callers may freely mutate it (typical use: pack into the next
        ``CommPacket["aoi"]`` field).
        """
        return dict(self._aoi)

__init__

__init__() -> None

Create an empty clock with no uids registered (ADR-003 §Decision).

Source code in src/chamber/comm/aoi.py
def __init__(self) -> None:
    """Create an empty clock with no uids registered (ADR-003 §Decision)."""
    self._aoi: dict[str, float] = {}

aoi

aoi(uid: str) -> float

Return the current AoI for uid (ADR-003 §Decision).

Raises:

  • KeyError

    if uid is not registered (loud-fail per ADR-003).

Source code in src/chamber/comm/aoi.py
def aoi(self, uid: str) -> float:
    """Return the current AoI for ``uid`` (ADR-003 §Decision).

    Raises:
        KeyError: if ``uid`` is not registered (loud-fail per ADR-003).
    """
    return self._aoi[uid]

mark_fresh

mark_fresh(uid: str) -> None

Mark uid as having received a fresh sample (AoI -> 0).

ADR-003 §Decision: every fresh transmission zeroes that uid's AoI. Auto-registers uid on first call.

Source code in src/chamber/comm/aoi.py
def mark_fresh(self, uid: str) -> None:
    """Mark ``uid`` as having received a fresh sample (AoI -> 0).

    ADR-003 §Decision: every fresh transmission zeroes that uid's AoI.
    Auto-registers ``uid`` on first call.
    """
    self._aoi[uid] = 0.0

reset

reset() -> None

Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

After reset, :meth:aoi raises KeyError for every uid until the next :meth:mark_fresh. This matches the channel's CommChannel.reset semantics: a new episode starts with no state.

Source code in src/chamber/comm/aoi.py
def reset(self) -> None:
    """Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

    After ``reset``, :meth:`aoi` raises ``KeyError`` for every uid until
    the next :meth:`mark_fresh`. This matches the channel's
    ``CommChannel.reset`` semantics: a new episode starts with no state.
    """
    self._aoi.clear()

snapshot

snapshot() -> dict[str, float]

Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

The returned dict is decoupled from the clock's internal state, so callers may freely mutate it (typical use: pack into the next CommPacket["aoi"] field).
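Because the values are immutable floats, the shallow dict(...) copy that snapshot returns is already fully independent of the clock. A minimal sketch, with a plain dict standing in for the clock's internal map and another plain dict standing in for a CommPacket (both hypothetical):

```python
internal = {"panda": 0.02, "fetch": 0.01}  # stand-in for AoIClock._aoi

snap = dict(internal)   # what snapshot() returns: a shallow copy
snap["panda"] = 999.0   # the caller may mutate its copy freely...
packet = {"aoi": snap}  # ...e.g. when packing it (plain-dict stand-in for CommPacket)

print(internal["panda"])  # 0.02: the clock's own map is untouched
```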

Source code in src/chamber/comm/aoi.py
def snapshot(self) -> dict[str, float]:
    """Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

    The returned dict is decoupled from the clock's internal state, so
    callers may freely mutate it (typical use: pack into the next
    ``CommPacket["aoi"]`` field).
    """
    return dict(self._aoi)

tick

tick(dt: float) -> None

Advance time by dt for every registered uid (ADR-003 §Decision).

Parameters:

  • dt (float) –

    Elapsed simulation time in seconds. Must be non-negative; AoI cannot run backwards (notes/tier2/41 §V definition).

Raises:

  • ValueError

    when dt is negative.

Source code in src/chamber/comm/aoi.py
def tick(self, dt: float) -> None:
    """Advance time by ``dt`` for every registered uid (ADR-003 §Decision).

    Args:
        dt: Elapsed simulation time in seconds. Must be non-negative;
            AoI cannot run backwards (notes/tier2/41 §V definition).

    Raises:
        ValueError: when ``dt`` is negative.
    """
    if dt < 0.0:
        msg = f"tick dt must be non-negative; got {dt!r}"
        raise ValueError(msg)
    for uid in self._aoi:
        self._aoi[uid] += dt

ChamberCommError

Bases: RuntimeError

Raised when a comm-channel contract is violated (ADR-003 §Decision).

Examples include malformed packets (missing schema_version, unrecognised packet keys), unsupported profile lookups, and channel state that cannot be reconciled (e.g., decoding a packet whose schema version differs from the channel's).
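A sketch of the kind of check that would raise this error during decoding. `check_schema` and `EXPECTED_SCHEMA_VERSION` are hypothetical (the real :data:SCHEMA_VERSION value is not shown in this excerpt), and the exception class is a local stand-in mirroring the documented one.

```python
class ChamberCommError(RuntimeError):
    """Local stand-in mirroring the documented exception."""


EXPECTED_SCHEMA_VERSION = 1  # hypothetical value, not the real SCHEMA_VERSION


def check_schema(packet: dict) -> None:
    """Hypothetical validation a decoder could run before unpacking a packet."""
    if "schema_version" not in packet:
        raise ChamberCommError("packet is missing 'schema_version'")
    if packet["schema_version"] != EXPECTED_SCHEMA_VERSION:
        raise ChamberCommError(
            f"schema_version {packet['schema_version']!r} differs from "
            f"the channel's {EXPECTED_SCHEMA_VERSION!r}"
        )


try:
    check_schema({"schema_version": 2})
except ChamberCommError as err:
    print(f"rejected: {err}")
```

Since the class subclasses RuntimeError, callers that only catch RuntimeError still see it.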

Source code in src/chamber/comm/errors.py
class ChamberCommError(RuntimeError):
    """Raised when a comm-channel contract is violated (ADR-003 §Decision).

    Examples include malformed packets (missing ``schema_version``,
    unrecognised packet keys), unsupported profile lookups, and channel
    state that cannot be reconciled (e.g., decoding a packet whose
    schema version differs from the channel's).
    """

ChamberCommQPSaturationWarning

Bases: UserWarning

Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).

The QP-saturation guard test in tests/property/test_qp_saturation.py sweeps the :data:chamber.comm.profiles.URLLC_3GPP_R17 table and asserts that the inner CBF QP solve time stays below 1 ms (the OSCBF target from ADR-004). If the most aggressive profile saturates the QP, the wrapper emits this warning instead of letting the saturation pass silently (ADR-006 risk register R5).
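A minimal sketch of emitting and capturing a warning of this kind with the standard warnings module. The warning class is a local stand-in, and `check_qp_budget` is a hypothetical guard; only the 1 ms budget comes from the text above.

```python
import warnings


class ChamberCommQPSaturationWarning(UserWarning):
    """Local stand-in mirroring the documented warning class."""


QP_BUDGET_MS = 1.0  # the 1 ms solve-time target quoted above


def check_qp_budget(profile_name: str, solve_ms: float) -> bool:
    """Hypothetical guard: warn (rather than raise) when a solve exceeds budget."""
    if solve_ms >= QP_BUDGET_MS:
        warnings.warn(
            f"profile {profile_name!r}: CBF QP solve took {solve_ms:.3f} ms "
            f"(budget {QP_BUDGET_MS} ms)",
            ChamberCommQPSaturationWarning,
            stacklevel=2,
        )
        return False
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_qp_budget("hypothetical_profile", 1.7)

print(ok, len(caught), caught[0].category.__name__)
```

A test suite that wants saturation to fail hard can escalate it with warnings.simplefilter("error", ChamberCommQPSaturationWarning).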

Source code in src/chamber/comm/errors.py
class ChamberCommQPSaturationWarning(UserWarning):
    """Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).

    The QP-saturation guard test in
    ``tests/property/test_qp_saturation.py`` sweeps the
    :data:`chamber.comm.profiles.URLLC_3GPP_R17` table and asserts that
    the inner CBF QP solve time stays below 1 ms (OSCBF target from
    ADR-004). If the most aggressive profile saturates, the wrapper
    raises this warning rather than silently allowing it (ADR-006
    Risk register R5).
    """

CommChannel

Bases: Protocol

Contract for an inter-agent communication channel (ADR-003 §Decision).

Implementations encode the env-side state into a :class:CommPacket and decode it back into the dict subset that downstream consumers (safety filter M3, partner stack M4, evaluation harness M5) expect under obs["comm"]. ADR-006 §Decision: implementations honour the URLLC-anchored latency / jitter / drop bounds when composed with :class:chamber.comm.degradation.CommDegradationWrapper.
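Because the Protocol is runtime_checkable, any class with matching method names passes an isinstance check; no inheritance is needed. A sketch with a local mirror of the method set and a hypothetical loopback channel (the packet layout and `CommPacket = dict` alias are assumptions, not the real wire format):

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable

CommPacket = dict  # assumption: plain dict stands in for the real CommPacket


@runtime_checkable
class CommChannelLike(Protocol):
    """Local mirror of the CommChannel method set documented here."""

    def reset(self, *, seed: int | None = None) -> None: ...
    def encode(self, state: object) -> CommPacket: ...
    def decode(self, packet: CommPacket) -> dict[str, object]: ...


class LoopbackChannel:
    """Hypothetical toy channel: encode wraps the state, decode unwraps it."""

    def reset(self, *, seed: int | None = None) -> None:
        self._seed = seed

    def encode(self, state: object) -> CommPacket:
        return {"schema_version": 0, "payload": state}  # hypothetical layout

    def decode(self, packet: CommPacket) -> dict[str, object]:
        return dict(packet["payload"])


chan = LoopbackChannel()
chan.reset(seed=0)
print(isinstance(chan, CommChannelLike))  # True: the check is structural
print(chan.decode(chan.encode({"panda": (0.0, 0.0, 0.0)})))
```

Note that the runtime check only verifies that the methods exist, not their signatures or return types; static type checkers enforce the rest.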

Source code in src/chamber/comm/api.py
@runtime_checkable
class CommChannel(Protocol):
    """Contract for an inter-agent communication channel (ADR-003 §Decision).

    Implementations encode the env-side state into a :class:`CommPacket` and
    decode it back into the dict subset that downstream consumers (safety
    filter M3, partner stack M4, evaluation harness M5) expect under
    ``obs["comm"]``. ADR-006 §Decision: implementations honour the
    URLLC-anchored latency / jitter / drop bounds when composed with
    :class:`chamber.comm.degradation.CommDegradationWrapper`.
    """

    def reset(self, *, seed: int | None = None) -> None:
        """Reset channel state at episode start (ADR-003 §Decision).

        Args:
            seed: Optional root seed for the channel's deterministic RNG
                substream (P6 reproducibility). Implementations route this
                through ``concerto.training.seeding.derive_substream``.
        """
        ...

    def encode(self, state: object) -> CommPacket:
        """Encode env state into a comm-channel packet (ADR-003 §Decision).

        Args:
            state: Env-side observation or raw physics state (impl-defined).

        Returns:
            A :class:`CommPacket` honouring :data:`SCHEMA_VERSION`.
        """
        ...

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

        Args:
            packet: A :class:`CommPacket` previously produced by :meth:`encode`.

        Returns:
            The schema-typed subset of the env state expressible via the
            packet (pose / task-state predicate / AoI), keyed by uid.
        """
        ...

decode

decode(packet: CommPacket) -> dict[str, object]

Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

Parameters:

  • packet (CommPacket) –

    A :class:CommPacket previously produced by :meth:encode.

Returns:

  • dict[str, object]

    The schema-typed subset of the env state expressible via the packet (pose / task-state predicate / AoI), keyed by uid.

Source code in src/chamber/comm/api.py
def decode(self, packet: CommPacket) -> dict[str, object]:
    """Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

    Args:
        packet: A :class:`CommPacket` previously produced by :meth:`encode`.

    Returns:
        The schema-typed subset of the env state expressible via the
        packet (pose / task-state predicate / AoI), keyed by uid.
    """
    ...

encode

encode(state: object) -> CommPacket

Encode env state into a comm-channel packet (ADR-003 §Decision).

Parameters:

  • state (object) –

    Env-side observation or raw physics state (impl-defined).

Returns:

  • CommPacket

    A :class:CommPacket honouring :data:SCHEMA_VERSION.

Source code in src/chamber/comm/api.py
def encode(self, state: object) -> CommPacket:
    """Encode env state into a comm-channel packet (ADR-003 §Decision).

    Args:
        state: Env-side observation or raw physics state (impl-defined).

    Returns:
        A :class:`CommPacket` honouring :data:`SCHEMA_VERSION`.
    """
    ...

reset

reset(*, seed: int | None = None) -> None

Reset channel state at episode start (ADR-003 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional root seed for the channel's deterministic RNG substream (P6 reproducibility). Implementations route this through concerto.training.seeding.derive_substream.
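The named-substream idea can be sketched with NumPy without assuming anything about how concerto.training.seeding.derive_substream is implemented. Here `derive_rng` and the component names are hypothetical: the function mixes a stable SHA-256 digest of a name with the root seed through numpy.random.SeedSequence. Python's built-in hash() is avoided because string hashing is salted per process.

```python
import hashlib

import numpy as np


def derive_rng(name: str, root_seed: int) -> np.random.Generator:
    """Hypothetical named substream: same (name, root_seed) -> same stream."""
    name_key = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "little")
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))


a1 = derive_rng("comm.degradation", 0).random(3)
a2 = derive_rng("comm.degradation", 0).random(3)
b = derive_rng("comm.channel", 0).random(3)
print(np.array_equal(a1, a2), np.array_equal(a1, b))  # True False
```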

Source code in src/chamber/comm/api.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset channel state at episode start (ADR-003 §Decision).

    Args:
        seed: Optional root seed for the channel's deterministic RNG
            substream (P6 reproducibility). Implementations route this
            through ``concerto.training.seeding.derive_substream``.
    """
    ...

CommDegradationStats dataclass

Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).

Each delivered packet contributes one entry to latency_ticks (the in-queue dwell time, in ticks). Each dropped packet increments dropped without producing a latency entry.
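Since the dataclass is frozen, counters are updated by building new instances. A sketch using dataclasses.replace with a local copy of the documented fields; `record_delivery` and `record_drop` are hypothetical helpers. The invariant delivered == len(latency_ticks) follows from the rule above.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class CommDegradationStats:
    """Local copy of the documented fields."""

    dropped: int = 0
    delivered: int = 0
    latency_ticks: tuple[int, ...] = field(default_factory=tuple)


def record_delivery(stats: CommDegradationStats, dwell_ticks: int) -> CommDegradationStats:
    """Hypothetical helper: a delivery adds one latency entry."""
    return replace(
        stats,
        delivered=stats.delivered + 1,
        latency_ticks=stats.latency_ticks + (dwell_ticks,),
    )


def record_drop(stats: CommDegradationStats) -> CommDegradationStats:
    """Hypothetical helper: a drop bumps the counter without a latency entry."""
    return replace(stats, dropped=stats.dropped + 1)


s = CommDegradationStats()
s = record_delivery(s, 3)
s = record_drop(s)
s = record_delivery(s, 5)
mean_latency = sum(s.latency_ticks) / len(s.latency_ticks)
print(s.delivered, s.dropped, s.latency_ticks, mean_latency)  # 2 1 (3, 5) 4.0
```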

Source code in src/chamber/comm/degradation.py
@dataclass(frozen=True)
class CommDegradationStats:
    """Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).

    Each delivered packet contributes one entry to ``latency_ticks``
    (the in-queue dwell time, in ticks). Each dropped packet increments
    ``dropped`` without producing a latency entry.
    """

    dropped: int = 0
    delivered: int = 0
    latency_ticks: tuple[int, ...] = field(default_factory=tuple)

CommDegradationWrapper

Compose-time degradation around a fixed-format channel (ADR-006 §Decision).

Wraps any :class:~chamber.comm.api.CommChannel and injects the URLLC-anchored latency / jitter / drop semantics described in the :mod:chamber.comm.degradation module docstring. The inner channel is unaware of degradation; the wrapper is itself a :class:CommChannel, so it composes transparently with :class:chamber.envs.comm_shaping.CommShapingWrapper.

Parameters:

  • channel (CommChannel) –

    The inner channel whose packets will be degraded.

  • profile (DegradationProfile) –

    The degradation envelope (see :class:DegradationProfile).

  • tick_period_ms (float, default: 1.0 ) –

    How many milliseconds each encode call represents. Defaults to 1.0 so the URLLC ms-denominated profiles map tick-for-millisecond.

  • root_seed (int, default: 0 ) –

    Root seed for the wrapper's RNG (P6 determinism).
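The wrapper's exact queueing rules are not reproduced in this excerpt. As a generic illustration only, here is a toy fixed-delay line with Bernoulli drops in which tick_period_ms converts a millisecond latency into whole ticks. `ToyDelayLine` is hypothetical, rounding up with math.ceil is an assumption, and jitter is omitted entirely.

```python
import math
from collections import deque

import numpy as np


class ToyDelayLine:
    """Hypothetical toy: fixed latency in ms, Bernoulli drops, one tick per push."""

    def __init__(self, latency_ms, drop_prob, *, tick_period_ms=1.0, seed=0):
        if tick_period_ms <= 0.0:
            raise ValueError(f"tick_period_ms must be positive; got {tick_period_ms!r}")
        self.delay_ticks = math.ceil(latency_ms / tick_period_ms)  # assumption: round up
        self.drop_prob = drop_prob
        self.rng = np.random.default_rng(seed)
        self.queue = deque()  # (release_tick, payload), release ticks non-decreasing
        self.now = 0

    def push(self, payload):
        """Enqueue payload unless dropped, return everything due now, advance one tick."""
        if self.rng.random() >= self.drop_prob:
            self.queue.append((self.now + self.delay_ticks, payload))
        due = []
        while self.queue and self.queue[0][0] <= self.now:
            due.append(self.queue.popleft()[1])
        self.now += 1
        return due


line = ToyDelayLine(latency_ms=3.0, drop_prob=0.0, tick_period_ms=1.0)
outputs = [line.push(i) for i in range(6)]
print(outputs)  # [[], [], [], [0], [1], [2]]
```

With the default tick_period_ms of 1.0, a 3 ms latency becomes exactly 3 ticks, mirroring the tick-for-millisecond mapping described above.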

Source code in src/chamber/comm/degradation.py
class CommDegradationWrapper:
    """Compose-time degradation around a fixed-format channel (ADR-006 §Decision).

    Wraps any :class:`~chamber.comm.api.CommChannel` and injects
    URLLC-anchored latency / jitter / drop semantics described above. The
    inner channel is unaware of degradation; the wrapper is itself a
    :class:`CommChannel` so it composes transparently with
    :class:`chamber.envs.comm_shaping.CommShapingWrapper`.

    Args:
        channel: The inner channel whose packets will be degraded.
        profile: The degradation envelope (see :class:`DegradationProfile`).
        tick_period_ms: How many milliseconds each ``encode`` call represents.
            Defaults to ``1.0`` so the URLLC ms-denominated profiles map
            tick-for-millisecond.
        root_seed: Root seed for the wrapper's RNG (P6 determinism).
    """

    def __init__(
        self,
        channel: CommChannel,
        profile: DegradationProfile,
        *,
        tick_period_ms: float = 1.0,
        root_seed: int = 0,
    ) -> None:
        """Build the wrapper (ADR-006 §Decision)."""
        if tick_period_ms <= 0.0:
            msg = f"tick_period_ms must be positive; got {tick_period_ms!r}"
            raise ValueError(msg)
        self._channel = channel
        self._profile = profile
        self._tick_period_ms = tick_period_ms
        self._root_seed: int = root_seed
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
        self._tick: int = 0
        self._queue: list[_Queued] = []
        self._visible: CommPacket = _empty_packet()
        self._dropped: int = 0
        self._delivered: int = 0
        self._latency_ticks: list[int] = []

    @property
    def stats(self) -> CommDegradationStats:
        """Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM)."""
        return CommDegradationStats(
            dropped=self._dropped,
            delivered=self._delivered,
            latency_ticks=tuple(self._latency_ticks),
        )

    def reset(self, *, seed: int | None = None) -> None:
        """Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).

        Args:
            seed: Optional new root seed; otherwise the previously stored
                root seed is reused so episode resets stay reproducible.
        """
        if seed is not None:
            self._root_seed = seed
        self._channel.reset(seed=self._root_seed)
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
        self._tick = 0
        self._queue = []
        self._visible = _empty_packet()
        self._dropped = 0
        self._delivered = 0
        self._latency_ticks = []

    def encode(self, state: object) -> CommPacket:
        """Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).

        See module docstring for the per-step behaviour spec.
        """
        self._tick += 1
        fresh = self._channel.encode(state)
        if self._draw_drop():
            self._dropped += 1
            self._visible = self._age_packet(self._visible, by_ticks=1)
            return _copy_packet(self._visible)
        delay_ticks = self._draw_delay_ticks()
        self._queue.append(
            _Queued(release_tick=self._tick + delay_ticks, send_tick=self._tick, packet=fresh)
        )
        self._release_due()
        return _copy_packet(self._visible)

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Delegate decoding to the inner channel (ADR-003 §Decision)."""
        return self._channel.decode(packet)

    def _draw_drop(self) -> bool:
        """True iff the per-tick Bernoulli drop trial fires (P6 deterministic)."""
        if self._profile.drop_rate <= 0.0:
            return False
        return bool(self._rng.random() < self._profile.drop_rate)

    def _draw_delay_ticks(self) -> int:
        """Sample a clipped-Normal latency and convert to whole ticks (P6)."""
        mean = self._profile.latency_mean_ms
        std = self._profile.latency_std_ms
        upper = 2.0 * mean
        if std == 0.0:
            ms = max(0.0, min(mean, upper))
        else:
            rng_sample = float(self._rng.normal(loc=mean, scale=std))
            ms = max(0.0, min(rng_sample, upper))
        return round(ms / self._tick_period_ms)

    def _release_due(self) -> None:
        """Pop every queued packet whose release tick is <= now (ADR-006)."""
        if not self._queue:
            return
        due: list[_Queued] = [q for q in self._queue if q.release_tick <= self._tick]
        if not due:
            return
        self._queue = [q for q in self._queue if q.release_tick > self._tick]
        # The latest send_tick wins as the visible packet (most recent fresh sample).
        winner = max(due, key=lambda q: q.send_tick)
        for q in due:
            self._delivered += 1
            self._latency_ticks.append(q.release_tick - q.send_tick)
        dwell = self._tick - winner.send_tick
        self._visible = self._age_packet(winner.packet, by_ticks=dwell)

    @staticmethod
    def _age_packet(packet: CommPacket, *, by_ticks: int) -> CommPacket:
        """Return a copy of ``packet`` with all per-uid AoI advanced by ``by_ticks`` (ADR-003)."""
        if not packet["aoi"]:
            return packet
        aged_aoi = {uid: value + float(by_ticks) for uid, value in packet["aoi"].items()}
        aged: CommPacket = CommPacket(
            schema_version=packet["schema_version"],
            pose=dict(packet["pose"]),
            task_state=dict(packet["task_state"]),
            aoi=aged_aoi,
            learned_overlay=packet["learned_overlay"],
        )
        return aged

stats property

stats: CommDegradationStats

Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM).

__init__

__init__(
    channel: CommChannel,
    profile: DegradationProfile,
    *,
    tick_period_ms: float = 1.0,
    root_seed: int = 0,
) -> None

Build the wrapper (ADR-006 §Decision).

Source code in src/chamber/comm/degradation.py
def __init__(
    self,
    channel: CommChannel,
    profile: DegradationProfile,
    *,
    tick_period_ms: float = 1.0,
    root_seed: int = 0,
) -> None:
    """Build the wrapper (ADR-006 §Decision)."""
    if tick_period_ms <= 0.0:
        msg = f"tick_period_ms must be positive; got {tick_period_ms!r}"
        raise ValueError(msg)
    self._channel = channel
    self._profile = profile
    self._tick_period_ms = tick_period_ms
    self._root_seed: int = root_seed
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
    self._tick: int = 0
    self._queue: list[_Queued] = []
    self._visible: CommPacket = _empty_packet()
    self._dropped: int = 0
    self._delivered: int = 0
    self._latency_ticks: list[int] = []

decode

decode(packet: CommPacket) -> dict[str, object]

Delegate decoding to the inner channel (ADR-003 §Decision).

Source code in src/chamber/comm/degradation.py
def decode(self, packet: CommPacket) -> dict[str, object]:
    """Delegate decoding to the inner channel (ADR-003 §Decision)."""
    return self._channel.decode(packet)

encode

encode(state: object) -> CommPacket

Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).

See the chamber.comm.degradation module docstring for the per-step behaviour spec.
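The per-step queue behaviour can be read off the source listing. The sketch below replays it with scripted delays and drops instead of RNG draws, and strings stand in for packets. AoI aging is omitted, so this is a simplification and not the wrapper itself. Each non-dropped packet is enqueued with a release tick. Every due packet is then released, and the one with the latest send tick becomes visible. A dropped tick enqueues and releases nothing, so the stale packet stays visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queued:
    release_tick: int
    send_tick: int
    payload: str


def simulate(
    payloads: list[str], delays: list[int], drops: list[bool]
) -> list[Optional[str]]:
    """Replay the encode() queue logic with scripted delays and drops (no RNG)."""
    tick = 0
    queue: list[Queued] = []
    visible: Optional[str] = None  # stands in for the initially empty packet
    seen: list[Optional[str]] = []
    for payload, delay, dropped in zip(payloads, delays, drops):
        tick += 1
        if not dropped:
            queue.append(Queued(tick + delay, tick, payload))
            due = [q for q in queue if q.release_tick <= tick]
            queue = [q for q in queue if q.release_tick > tick]
            if due:
                # Among the packets released this tick, the latest send_tick wins.
                visible = max(due, key=lambda q: q.send_tick).payload
        # On a drop, the previous packet stays visible (the real wrapper also ages its AoI by 1).
        seen.append(visible)
    return seen


print(simulate(["a", "b", "c", "d"], [2, 0, 0, 0], [False] * 4))
# [None, 'b', 'c', 'd']
```

In this run, "a" is delivered at tick 3 and counted in the stats. It never becomes visible, however, because "c", released on the same tick, has a later send tick.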

Source code in src/chamber/comm/degradation.py
def encode(self, state: object) -> CommPacket:
    """Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).

    See module docstring for the per-step behaviour spec.
    """
    self._tick += 1
    fresh = self._channel.encode(state)
    if self._draw_drop():
        self._dropped += 1
        self._visible = self._age_packet(self._visible, by_ticks=1)
        return _copy_packet(self._visible)
    delay_ticks = self._draw_delay_ticks()
    self._queue.append(
        _Queued(release_tick=self._tick + delay_ticks, send_tick=self._tick, packet=fresh)
    )
    self._release_due()
    return _copy_packet(self._visible)

reset

reset(*, seed: int | None = None) -> None

Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional new root seed; otherwise the previously stored root seed is reused so episode resets stay reproducible.
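Reproducibility here rests on rebuilding the RNG from the stored root seed. The sketch below illustrates the idea. `substream_rng` is a hypothetical stand-in for `derive_substream(name, root_seed=...).default_rng()`, and the substream name "comm.degradation" is invented, since the real `_SUBSTREAM_NAME` is not shown here. Two RNGs built from the same name and root seed replay the same Bernoulli drop pattern.

```python
import zlib

import numpy as np


def substream_rng(name: str, root_seed: int) -> np.random.Generator:
    """Hypothetical stand-in for derive_substream(name, root_seed=...).default_rng()."""
    # crc32 gives a stable per-name integer (unlike hash(), which is salted per process).
    return np.random.default_rng([root_seed, zlib.crc32(name.encode())])


def drop_pattern(rng: np.random.Generator, drop_rate: float, n: int) -> list[bool]:
    """Per-tick Bernoulli drop trials, following _draw_drop."""
    if drop_rate <= 0.0:
        return [False] * n  # like _draw_drop, skip the RNG entirely when drops are disabled
    return [bool(rng.random() < drop_rate) for _ in range(n)]


first = drop_pattern(substream_rng("comm.degradation", 7), 0.3, 20)
# reset() without a new seed rebuilds the RNG from the stored root seed, so the pattern repeats.
again = drop_pattern(substream_rng("comm.degradation", 7), 0.3, 20)
print(first == again)  # True
```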

Source code in src/chamber/comm/degradation.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).

    Args:
        seed: Optional new root seed; otherwise the previously stored
            root seed is reused so episode resets stay reproducible.
    """
    if seed is not None:
        self._root_seed = seed
    self._channel.reset(seed=self._root_seed)
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
    self._tick = 0
    self._queue = []
    self._visible = _empty_packet()
    self._dropped = 0
    self._delivered = 0
    self._latency_ticks = []

CommPacket

Bases: TypedDict

The mandatory fixed-format wire packet (ADR-003 §Decision).

Every concrete :class:CommChannel produces and consumes packets of exactly this shape. Downstream:

  • The safety filter (M3) reads packet["pose"][uid] and packet["aoi"][uid] deterministically.
  • The partner stack (M4) writes packet["pose"][uid] = self.last_pose.
  • The evaluation harness (M5) logs from the same surface.

ADR-006 §Decision: AoI is the URLLC-anchored latency proxy.
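The following hypothetical packet for two uids makes the wire shape concrete. The TypedDicts are mirrored locally so the snippet runs without chamber, and TaskStatePredicate is simplified to a plain dict. The uids "ego" and "partner", the schema version 1, and the coordinates are all illustrative. The AoI values assume the packet was aged by two ticks of queue dwell, as CommDegradationWrapper does when it delivers a delayed packet.

```python
from typing import Optional, TypedDict


class Pose(TypedDict):
    xyz: tuple[float, float, float]
    quat_wxyz: tuple[float, float, float, float]


class CommPacket(TypedDict):
    """Local mirror of chamber.comm.api.CommPacket (task_state simplified to dict)."""

    schema_version: int
    pose: dict[str, Pose]
    task_state: dict[str, dict]
    aoi: dict[str, float]
    learned_overlay: Optional[object]  # torch.Tensor | None in the real packet


packet = CommPacket(
    schema_version=1,  # illustrative; real code stamps chamber.comm.api.SCHEMA_VERSION
    pose={
        "ego": Pose(xyz=(0.0, 0.0, 0.5), quat_wxyz=(1.0, 0.0, 0.0, 0.0)),
        "partner": Pose(xyz=(0.4, 0.1, 0.5), quat_wxyz=(1.0, 0.0, 0.0, 0.0)),
    },
    task_state={},
    aoi={"ego": 2.0, "partner": 2.0},  # both aged by two ticks of queue dwell
    learned_overlay=None,  # black-box partners leave the overlay empty
)

# A consumer such as the safety filter reads pose and AoI per uid.
uid = "partner"
print(packet["pose"][uid]["xyz"], packet["aoi"][uid])  # (0.4, 0.1, 0.5) 2.0
```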

Source code in src/chamber/comm/api.py
class CommPacket(TypedDict):
    """The mandatory fixed-format wire packet (ADR-003 §Decision).

    Every concrete :class:`CommChannel` produces and consumes packets of
    exactly this shape. Downstream:

    - The safety filter (M3) reads ``packet["pose"][uid]`` and
      ``packet["aoi"][uid]`` deterministically.
    - The partner stack (M4) writes ``packet["pose"][uid] = self.last_pose``.
    - The evaluation harness (M5) logs from the same surface.

    ADR-006 §Decision: AoI is the URLLC-anchored latency proxy.
    """

    schema_version: int
    pose: dict[str, Pose]
    task_state: dict[str, TaskStatePredicate]
    aoi: dict[str, float]
    learned_overlay: torch.Tensor | None

DegradationProfile dataclass

Latency / jitter / drop bounds for a single experimental condition.

ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial trial corpus. The named profiles in :mod:chamber.comm.profiles cover the Phase-0 Stage-2 CM sweep table.

Parameters:

  • latency_mean_ms (float) –

    Mean of the Gaussian latency draw (ms).

  • latency_std_ms (float) –

    Std of the Gaussian latency draw (ms); the draw is clipped to [0, 2 * latency_mean_ms].

  • drop_rate (float) –

    Bernoulli drop probability per packet (per encode call).
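The class shown below declares only the three fields and performs no validation. The following sketch mirrors it and adds the sanity checks a caller might reasonably want. Those checks are an assumption, not part of the documented API. The sketch also exposes the clipping range that the wrapper applies to each latency draw. The numbers in the usage line are illustrative and are not taken from chamber.comm.profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Local mirror of DegradationProfile with hypothetical sanity checks added."""

    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float

    def __post_init__(self) -> None:
        # Assumed checks; the documented dataclass does not perform them.
        if self.latency_mean_ms < 0.0 or self.latency_std_ms < 0.0:
            raise ValueError("latency mean/std must be non-negative")
        if not 0.0 <= self.drop_rate <= 1.0:
            raise ValueError("drop_rate must lie in [0, 1]")

    @property
    def latency_bounds_ms(self) -> tuple[float, float]:
        """Range the wrapper clips each latency draw to: [0, 2 * latency_mean_ms]."""
        return (0.0, 2.0 * self.latency_mean_ms)


p = Profile(latency_mean_ms=10.0, latency_std_ms=2.0, drop_rate=0.01)  # illustrative numbers
print(p.latency_bounds_ms)  # (0.0, 20.0)
```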

Source code in src/chamber/comm/degradation.py
@dataclass(frozen=True)
class DegradationProfile:
    """Latency / jitter / drop bounds for a single experimental condition.

    ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial
    trial corpus. The named profiles in :mod:`chamber.comm.profiles` cover the
    Phase-0 Stage-2 CM sweep table.

    Args:
        latency_mean_ms: Mean of the Gaussian latency draw (ms).
        latency_std_ms: Std of the Gaussian latency draw (ms); the draw is
            clipped to ``[0, 2 * latency_mean_ms]``.
        drop_rate: Bernoulli drop probability per packet (per encode call).
    """

    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float

FixedFormatCommChannel

Mandatory fixed-format channel (ADR-003 §Decision).

On every :meth:encode call:

  1. The internal :class:AoIClock advances by one env tick (+1.0).
  2. Every uid present in state["pose"] / state["task_state"] is marked fresh via :meth:AoIClock.mark_fresh (its AoI drops to zero).
  3. The result is bundled into a :class:CommPacket carrying :data:~chamber.comm.api.SCHEMA_VERSION and a snapshot of every registered uid's AoI.
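The AoI steps of encode can be sketched with a minimal clock. `AoIClockSketch` is hypothetical. It implements only the `tick` / `mark_fresh` / `snapshot` / `reset` calls the channel source uses, and it assumes a uid is registered the first time it is marked fresh. Pose payloads are elided as `None`.

```python
class AoIClockSketch:
    """Hypothetical minimal AoI clock exposing the calls FixedFormatCommChannel makes."""

    def __init__(self) -> None:
        self._aoi: dict[str, float] = {}

    def tick(self, dt: float) -> None:
        for uid in self._aoi:  # only values change, so iterating the keys is safe
            self._aoi[uid] += dt

    def mark_fresh(self, uid: str) -> None:
        self._aoi[uid] = 0.0  # also registers the uid on first sight (assumption)

    def snapshot(self) -> dict[str, float]:
        return dict(self._aoi)

    def reset(self) -> None:
        self._aoi.clear()


def encode_aoi(clock: AoIClockSketch, state: dict) -> dict[str, float]:
    """Steps 1-2 of encode: advance one tick, then refresh every uid present in state."""
    clock.tick(1.0)
    for uid in state.get("pose") or {}:
        clock.mark_fresh(uid)
    for uid in state.get("task_state") or {}:
        clock.mark_fresh(uid)
    return clock.snapshot()


clock = AoIClockSketch()
print(encode_aoi(clock, {"pose": {"a": None, "b": None}}))  # {'a': 0.0, 'b': 0.0}
print(encode_aoi(clock, {"pose": {"a": None}}))             # {'a': 0.0, 'b': 1.0}
print(encode_aoi(clock, {}))                                # {'a': 1.0, 'b': 2.0}
```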

:meth:decode is the inverse projection: it returns the schema-typed subset of the state expressible via the packet (pose / task-state / AoI / learned overlay). A schema-version mismatch raises :class:~chamber.comm.errors.ChamberCommError.

Determinism: an internal numpy Generator is seeded via :func:concerto.training.seeding.derive_substream and re-seeded on :meth:reset. The encoder is currently fully deterministic from input; the RNG is held for future randomised-encoding features (e.g. dropout on optional fields) without changing the public surface.

Parameters:

  • schema_version (int, default: SCHEMA_VERSION ) –

Version stamp written into emitted packets and required by :meth:decode on incoming packets. Bumping it requires a new ADR (ADR-003 §"Schema evolution").

Source code in src/chamber/comm/fixed_format.py
class FixedFormatCommChannel:
    """Mandatory fixed-format channel (ADR-003 §Decision).

    On every :meth:`encode` call:
      1. The internal :class:`AoIClock` advances by one env tick (``+1.0``).
      2. Every uid present in ``state["pose"]`` / ``state["task_state"]`` is
         :meth:`AoIClock.mark_fresh`-ed (its AoI drops to zero).
      3. The result is bundled into a :class:`CommPacket` carrying
         :data:`~chamber.comm.api.SCHEMA_VERSION` and a snapshot of every
         registered uid's AoI.

    :meth:`decode` is the inverse projection: it returns the schema-typed
    subset of the state expressible via the packet (pose / task-state / AoI /
    learned overlay). A schema-version mismatch raises
    :class:`~chamber.comm.errors.ChamberCommError`.

    Determinism: an internal numpy ``Generator`` is seeded via
    :func:`concerto.training.seeding.derive_substream` and re-seeded on
    :meth:`reset`. The encoder is currently fully deterministic from input;
    the RNG is held for future randomised-encoding features (e.g. dropout on
    optional fields) without changing the public surface.

    Args:
        schema_version: Version stamp for emitted packets and the value
            decoders demand. Bumping it requires a new ADR
            (ADR-003 §"Schema evolution").
    """

    def __init__(self, *, schema_version: int = SCHEMA_VERSION) -> None:
        """Build a fixed-format channel (ADR-003 §Decision).

        Args:
            schema_version: Wire-format version stamp; defaults to the
                project-wide :data:`SCHEMA_VERSION`.
        """
        self._schema_version = schema_version
        self._aoi = AoIClock()
        self._root_seed: int = 0
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

    def reset(self, *, seed: int | None = None) -> None:
        """Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

        Args:
            seed: Optional new root seed for the channel substream (P6
                determinism). When provided, replaces the previously stored
                root seed; when omitted, the existing root seed is reused so
                episode resets remain reproducible without a fresh seed
                argument.
        """
        self._aoi.reset()
        if seed is not None:
            self._root_seed = seed
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

    def encode(self, state: object) -> CommPacket:
        """Encode env state into a :class:`CommPacket` (ADR-003 §Decision).

        Accepts any mapping with optional ``"pose"`` / ``"task_state"`` /
        ``"learned_overlay"`` keys; non-mapping inputs are treated as empty
        state (the channel simply ages every previously-registered uid).

        Args:
            state: Source state mapping. Recognised keys:

                - ``"pose"``: ``dict[str, Pose]`` — fresh pose per uid.
                - ``"task_state"``: ``dict[str, TaskStatePredicate]`` —
                  optional fresh predicate per uid.
                - ``"learned_overlay"``: optional ``torch.Tensor`` for
                  jointly-trained B6 partners; passed through unchanged.

        Returns:
            A :class:`CommPacket` carrying the current schema version, a
            shallow copy of pose / task-state, the AoI snapshot, and the
            opt-in learned overlay.
        """
        pose_in, task_state_in, learned = self._extract(state)
        self._aoi.tick(1.0)
        for uid in pose_in:
            self._aoi.mark_fresh(uid)
        for uid in task_state_in:
            self._aoi.mark_fresh(uid)
        return CommPacket(
            schema_version=self._schema_version,
            pose=dict(pose_in),
            task_state=dict(task_state_in),
            aoi=self._aoi.snapshot(),
            learned_overlay=learned,
        )

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Decode a :class:`CommPacket` back into the typed state subset (ADR-003 §Decision).

        Round-trip guarantee: ``decode(encode(state))`` reproduces every
        schema-typed field of ``state`` (pose / task-state / learned overlay)
        and adds the AoI snapshot the channel computed during encode.

        Args:
            packet: A packet previously emitted by :meth:`encode` (or another
                channel honouring the same :data:`SCHEMA_VERSION`).

        Returns:
            A dict with keys ``"pose"`` / ``"task_state"`` / ``"aoi"`` /
            ``"learned_overlay"`` (each a shallow copy where applicable).

        Raises:
            ChamberCommError: when ``packet["schema_version"]`` differs from
                this channel's schema version (ADR-003 §"Schema evolution").
        """
        if packet["schema_version"] != self._schema_version:
            msg = (
                f"comm packet schema_version mismatch: "
                f"channel expects {self._schema_version}, packet carries "
                f"{packet['schema_version']}"
            )
            raise ChamberCommError(msg)
        return {
            "pose": dict(packet["pose"]),
            "task_state": dict(packet["task_state"]),
            "aoi": dict(packet["aoi"]),
            "learned_overlay": packet["learned_overlay"],
        }

    @staticmethod
    def _extract(
        state: object,
    ) -> tuple[Mapping[str, Pose], Mapping[str, TaskStatePredicate], Any]:
        """Pluck the recognised sub-keys from a state-shaped mapping (ADR-003 §Decision)."""
        if not isinstance(state, Mapping):
            return {}, {}, None
        state_map = cast("Mapping[str, Any]", state)
        pose: Mapping[str, Pose] = state_map.get("pose") or {}
        task_state: Mapping[str, TaskStatePredicate] = state_map.get("task_state") or {}
        learned: Any = state_map.get("learned_overlay")
        return pose, task_state, learned

__init__

__init__(*, schema_version: int = SCHEMA_VERSION) -> None

Build a fixed-format channel (ADR-003 §Decision).

Parameters:

  • schema_version (int, default: SCHEMA_VERSION ) –

    Wire-format version stamp; defaults to the project-wide :data:SCHEMA_VERSION.

Source code in src/chamber/comm/fixed_format.py
def __init__(self, *, schema_version: int = SCHEMA_VERSION) -> None:
    """Build a fixed-format channel (ADR-003 §Decision).

    Args:
        schema_version: Wire-format version stamp; defaults to the
            project-wide :data:`SCHEMA_VERSION`.
    """
    self._schema_version = schema_version
    self._aoi = AoIClock()
    self._root_seed: int = 0
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

decode

decode(packet: CommPacket) -> dict[str, object]

Decode a :class:CommPacket back into the typed state subset (ADR-003 §Decision).

Round-trip guarantee: decode(encode(state)) reproduces every schema-typed field of state (pose / task-state / learned overlay) and adds the AoI snapshot the channel computed during encode.

Parameters:

  • packet (CommPacket) –

    A packet previously emitted by :meth:encode (or another channel honouring the same :data:SCHEMA_VERSION).

Returns:

  • dict[str, object] –

    A dict with keys "pose" / "task_state" / "aoi" / "learned_overlay" (each a shallow copy where applicable).

Raises:

  • ChamberCommError

    when packet["schema_version"] differs from this channel's schema version (ADR-003 §"Schema evolution").
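The decode contract can be exercised without chamber installed. The function below mirrors the body shown in the source listing, and `SchemaMismatch` is a local stand-in for ChamberCommError. The version numbers are illustrative. The copies are shallow, so mutating the decoded top-level dicts leaves the packet alone. Nested Pose dicts, however, are shared between the packet and the decoded result.

```python
class SchemaMismatch(Exception):
    """Local stand-in for chamber.comm.errors.ChamberCommError."""


def decode(packet: dict, expected_version: int) -> dict[str, object]:
    """Mirror of FixedFormatCommChannel.decode: version check, then shallow copies."""
    if packet["schema_version"] != expected_version:
        raise SchemaMismatch(
            f"channel expects {expected_version}, packet carries {packet['schema_version']}"
        )
    return {
        "pose": dict(packet["pose"]),
        "task_state": dict(packet["task_state"]),
        "aoi": dict(packet["aoi"]),
        "learned_overlay": packet["learned_overlay"],  # passed through, not copied
    }


pkt = {"schema_version": 2, "pose": {}, "task_state": {}, "aoi": {"a": 1.0}, "learned_overlay": None}
out = decode(pkt, expected_version=2)
out["aoi"]["a"] = 99.0   # mutating the decoded copy leaves the packet untouched
print(pkt["aoi"]["a"])   # 1.0
try:
    decode(pkt, expected_version=3)
except SchemaMismatch as err:
    print(err)           # channel expects 3, packet carries 2
```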

Source code in src/chamber/comm/fixed_format.py
def decode(self, packet: CommPacket) -> dict[str, object]:
    """Decode a :class:`CommPacket` back into the typed state subset (ADR-003 §Decision).

    Round-trip guarantee: ``decode(encode(state))`` reproduces every
    schema-typed field of ``state`` (pose / task-state / learned overlay)
    and adds the AoI snapshot the channel computed during encode.

    Args:
        packet: A packet previously emitted by :meth:`encode` (or another
            channel honouring the same :data:`SCHEMA_VERSION`).

    Returns:
        A dict with keys ``"pose"`` / ``"task_state"`` / ``"aoi"`` /
        ``"learned_overlay"`` (each a shallow copy where applicable).

    Raises:
        ChamberCommError: when ``packet["schema_version"]`` differs from
            this channel's schema version (ADR-003 §"Schema evolution").
    """
    if packet["schema_version"] != self._schema_version:
        msg = (
            f"comm packet schema_version mismatch: "
            f"channel expects {self._schema_version}, packet carries "
            f"{packet['schema_version']}"
        )
        raise ChamberCommError(msg)
    return {
        "pose": dict(packet["pose"]),
        "task_state": dict(packet["task_state"]),
        "aoi": dict(packet["aoi"]),
        "learned_overlay": packet["learned_overlay"],
    }

encode

encode(state: object) -> CommPacket

Encode env state into a :class:CommPacket (ADR-003 §Decision).

Accepts any mapping with optional "pose" / "task_state" / "learned_overlay" keys; non-mapping inputs are treated as empty state (the channel simply ages every previously-registered uid).

Parameters:

  • state (object) –

    Source state mapping. Recognised keys:

    • "pose": dict[str, Pose] — fresh pose per uid.
    • "task_state": dict[str, TaskStatePredicate] — optional fresh predicate per uid.
    • "learned_overlay": optional torch.Tensor for jointly-trained B6 partners; passed through unchanged.

Returns:

  • CommPacket –

    A :class:CommPacket carrying the current schema version, a shallow copy of pose / task-state, the AoI snapshot, and the opt-in learned overlay.
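The input handling above is implemented by the private `_extract` helper shown in the source listing. The standalone mirror below makes its edge cases visible. A non-mapping input and an explicit `None` value both collapse to empty mappings, and unrecognised keys are ignored. Even with an empty state, encode still ticks the AoI clock, so every previously registered uid ages.

```python
from collections.abc import Mapping
from typing import Any


def extract(state: object) -> tuple[Mapping[str, Any], Mapping[str, Any], Any]:
    """Mirror of FixedFormatCommChannel._extract."""
    if not isinstance(state, Mapping):
        return {}, {}, None  # non-mapping input: treated as empty state
    pose = state.get("pose") or {}  # a missing key and an explicit None both become {}
    task_state = state.get("task_state") or {}
    learned = state.get("learned_overlay")
    return pose, task_state, learned


print(extract(None))                                # ({}, {}, None)
print(extract({"pose": None, "extra": 1}))          # ({}, {}, None)
print(extract({"pose": {"a": "pose-a"}}))           # ({'a': 'pose-a'}, {}, None)
```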

Source code in src/chamber/comm/fixed_format.py
def encode(self, state: object) -> CommPacket:
    """Encode env state into a :class:`CommPacket` (ADR-003 §Decision).

    Accepts any mapping with optional ``"pose"`` / ``"task_state"`` /
    ``"learned_overlay"`` keys; non-mapping inputs are treated as empty
    state (the channel simply ages every previously-registered uid).

    Args:
        state: Source state mapping. Recognised keys:

            - ``"pose"``: ``dict[str, Pose]`` — fresh pose per uid.
            - ``"task_state"``: ``dict[str, TaskStatePredicate]`` —
              optional fresh predicate per uid.
            - ``"learned_overlay"``: optional ``torch.Tensor`` for
              jointly-trained B6 partners; passed through unchanged.

    Returns:
        A :class:`CommPacket` carrying the current schema version, a
        shallow copy of pose / task-state, the AoI snapshot, and the
        opt-in learned overlay.
    """
    pose_in, task_state_in, learned = self._extract(state)
    self._aoi.tick(1.0)
    for uid in pose_in:
        self._aoi.mark_fresh(uid)
    for uid in task_state_in:
        self._aoi.mark_fresh(uid)
    return CommPacket(
        schema_version=self._schema_version,
        pose=dict(pose_in),
        task_state=dict(task_state_in),
        aoi=self._aoi.snapshot(),
        learned_overlay=learned,
    )

reset

reset(*, seed: int | None = None) -> None

Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional new root seed for the channel substream (P6 determinism). When provided, replaces the previously stored root seed; when omitted, the existing root seed is reused so episode resets remain reproducible without a fresh seed argument.

Source code in src/chamber/comm/fixed_format.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

    Args:
        seed: Optional new root seed for the channel substream (P6
            determinism). When provided, replaces the previously stored
            root seed; when omitted, the existing root seed is reused so
            episode resets remain reproducible without a fresh seed
            argument.
    """
    self._aoi.reset()
    if seed is not None:
        self._root_seed = seed
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

LearnedOverlay

Holds an optional learned-message tensor (ADR-003 §Decision).

The class is a thin slot wrapper: encode returns whatever tensor a jointly-trained partner has set (or None), and decode is null-safe so black-box consumers can call it unconditionally.

Parameters:

  • tensor (Tensor | None, default: None ) –

    Initial overlay value. Defaults to None so the slot is interoperable with the black-box partner contract out of the box.
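The slot's lifecycle is short enough to trace end to end. The sketch below is a dependency-free mirror of the class shown in the source listing. A plain list stands in for `torch.Tensor`, so the snippet runs without PyTorch. It walks through the black-box path (overlay stays `None`), the jointly-trained path (a partner sets a message), and the reset at an episode boundary.

```python
from typing import Any, Optional


class OverlaySlot:
    """Dependency-free mirror of LearnedOverlay; any object stands in for torch.Tensor."""

    def __init__(self, tensor: Optional[Any] = None) -> None:
        self._tensor = tensor

    def set(self, tensor: Optional[Any]) -> None:
        self._tensor = tensor

    def reset(self) -> None:
        self._tensor = None  # episode boundary: no message leaks into the next episode

    def encode(self) -> Optional[Any]:
        return self._tensor

    @staticmethod
    def decode(packet_overlay: Optional[Any]) -> Optional[Any]:
        return packet_overlay  # null-safe: None passes straight through


slot = OverlaySlot()
print(OverlaySlot.decode(slot.encode()))  # None (black-box partner path)
slot.set([0.1, 0.2])                      # a jointly-trained partner writes a message
print(OverlaySlot.decode(slot.encode()))  # [0.1, 0.2]
slot.reset()
print(slot.encode())                      # None again after the episode ends
```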

Source code in src/chamber/comm/learned_overlay.py
class LearnedOverlay:
    """Holds an optional learned-message tensor (ADR-003 §Decision).

    The class is a thin slot wrapper: ``encode`` returns whatever tensor a
    jointly-trained partner has set (or ``None``), and ``decode`` is
    null-safe so black-box consumers can call it unconditionally.

    Args:
        tensor: Initial overlay value. Defaults to ``None`` so the slot is
            interoperable with the black-box partner contract out of the box.
    """

    def __init__(self, tensor: torch.Tensor | None = None) -> None:
        """Build an overlay slot, optionally pre-populated (ADR-003 §Decision)."""
        self._tensor: torch.Tensor | None = tensor

    @property
    def tensor(self) -> torch.Tensor | None:
        """Return the currently held tensor, or ``None`` (ADR-003 §Decision)."""
        return self._tensor

    def set(self, tensor: torch.Tensor | None) -> None:
        """Replace the held tensor (ADR-003 §Decision).

        Jointly-trained partners call this from their forward pass; black-box
        partners never invoke it (or invoke it with ``None``), preserving the
        null-safety guarantee.
        """
        self._tensor = tensor

    def reset(self) -> None:
        """Clear the overlay (ADR-003 §Decision; episode boundary).

        Called by the channel on env reset so cross-episode message tensors
        do not leak across episode boundaries.
        """
        self._tensor = None

    def encode(self) -> torch.Tensor | None:
        """No-op encoder: returns the held tensor (ADR-003 §Decision).

        Phase-1 will replace this with the message GNN forward pass; for
        Phase-0 the slot semantics are sufficient for the
        :class:`~chamber.comm.api.CommPacket` ``learned_overlay`` field.
        """
        return self._tensor

    @staticmethod
    def decode(packet_overlay: torch.Tensor | None) -> torch.Tensor | None:
        """Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).

        Black-box partners receive a packet whose ``learned_overlay`` is
        ``None``; jointly-trained partners receive the producer's tensor
        unchanged. Either path returns without raising.
        """
        return packet_overlay

tensor property

tensor: Tensor | None

Return the currently held tensor, or None (ADR-003 §Decision).

__init__

__init__(tensor: Tensor | None = None) -> None

Build an overlay slot, optionally pre-populated (ADR-003 §Decision).

Source code in src/chamber/comm/learned_overlay.py
def __init__(self, tensor: torch.Tensor | None = None) -> None:
    """Build an overlay slot, optionally pre-populated (ADR-003 §Decision)."""
    self._tensor: torch.Tensor | None = tensor

decode staticmethod

decode(
    packet_overlay: Tensor | None,
) -> torch.Tensor | None

Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).

Black-box partners receive a packet whose learned_overlay is None; jointly-trained partners receive the producer's tensor unchanged. Either path returns without raising.

Source code in src/chamber/comm/learned_overlay.py
@staticmethod
def decode(packet_overlay: torch.Tensor | None) -> torch.Tensor | None:
    """Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).

    Black-box partners receive a packet whose ``learned_overlay`` is
    ``None``; jointly-trained partners receive the producer's tensor
    unchanged. Either path returns without raising.
    """
    return packet_overlay

encode

encode() -> torch.Tensor | None

No-op encoder: returns the held tensor (ADR-003 §Decision).

Phase-1 will replace this with the message GNN forward pass; for Phase-0 the slot semantics are sufficient for the :class:~chamber.comm.api.CommPacket learned_overlay field.

Source code in src/chamber/comm/learned_overlay.py
def encode(self) -> torch.Tensor | None:
    """No-op encoder: returns the held tensor (ADR-003 §Decision).

    Phase-1 will replace this with the message GNN forward pass; for
    Phase-0 the slot semantics are sufficient for the
    :class:`~chamber.comm.api.CommPacket` ``learned_overlay`` field.
    """
    return self._tensor

reset

reset() -> None

Clear the overlay (ADR-003 §Decision; episode boundary).

Called by the channel on env reset so cross-episode message tensors do not leak across episode boundaries.

Source code in src/chamber/comm/learned_overlay.py
def reset(self) -> None:
    """Clear the overlay (ADR-003 §Decision; episode boundary).

    Called by the channel on env reset so cross-episode message tensors
    do not leak across episode boundaries.
    """
    self._tensor = None

set

set(tensor: Tensor | None) -> None

Replace the held tensor (ADR-003 §Decision).

Jointly-trained partners call this from their forward pass; black-box partners never invoke it (or invoke it with None), preserving the null-safety guarantee.

Source code in src/chamber/comm/learned_overlay.py
def set(self, tensor: torch.Tensor | None) -> None:
    """Replace the held tensor (ADR-003 §Decision).

    Jointly-trained partners call this from their forward pass; black-box
    partners never invoke it (or invoke it with ``None``), preserving the
    null-safety guarantee.
    """
    self._tensor = tensor
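A small stand-in makes the slot lifecycle described by set, encode, reset and decode easy to trace. This is a minimal sketch, not the chamber implementation. OverlaySlot is a hypothetical name, because the real class name is not shown in this excerpt. A plain list replaces torch.Tensor so the example runs without PyTorch.

```python
from __future__ import annotations

from typing import Generic, TypeVar

T = TypeVar("T")


class OverlaySlot(Generic[T]):
    """Hypothetical stand-in mirroring the set/encode/reset/decode slot semantics."""

    def __init__(self, tensor: T | None = None) -> None:
        self._tensor: T | None = tensor

    def set(self, tensor: T | None) -> None:
        # Jointly-trained partners write here; black-box partners never do.
        self._tensor = tensor

    def encode(self) -> T | None:
        # Phase-0 no-op: hand back whatever is held (possibly None).
        return self._tensor

    def reset(self) -> None:
        # Episode boundary: drop the held value so nothing leaks across episodes.
        self._tensor = None

    @staticmethod
    def decode(packet_overlay: T | None) -> T | None:
        # Null-safe: None passes straight through without raising.
        return packet_overlay


slot: OverlaySlot[list[float]] = OverlaySlot()
assert slot.encode() is None            # black-box partner: nothing was set
slot.set([0.1, 0.2])                    # jointly-trained partner fills the slot
assert OverlaySlot.decode(slot.encode()) == [0.1, 0.2]
slot.reset()                            # env reset clears the slot
assert slot.encode() is None
assert OverlaySlot.decode(None) is None
```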

Pose

Bases: TypedDict

SE(3) pose in the env's base frame (ADR-003 §Decision).

The fixed-format channel ships pose as the minimum every black-box partner can produce. Quaternion convention is wxyz (real part first), matching SAPIEN / ManiSkill v3.

Source code in src/chamber/comm/api.py
class Pose(TypedDict):
    """SE(3) pose in the env's base frame (ADR-003 §Decision).

    The fixed-format channel ships pose as the minimum every black-box
    partner can produce. Quaternion convention is wxyz (real part first),
    matching SAPIEN / ManiSkill v3.
    """

    xyz: tuple[float, float, float]
    quat_wxyz: tuple[float, float, float, float]
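Some pose sources report quaternions scalar-last as (x, y, z, w). SciPy's Rotation.as_quat() does this by default. Partners using such a source need to reorder the components before filling a Pose. The sketch below assumes such a source. pose_from_xyzw is a hypothetical helper, and Pose is re-declared locally so the snippet runs on its own.

```python
from typing import TypedDict


class Pose(TypedDict):
    # Local replica of the Pose TypedDict: position plus a wxyz (scalar-first) quaternion.
    xyz: tuple[float, float, float]
    quat_wxyz: tuple[float, float, float, float]


def pose_from_xyzw(
    xyz: tuple[float, float, float],
    quat_xyzw: tuple[float, float, float, float],
) -> Pose:
    """Build a Pose from a scalar-last (x, y, z, w) quaternion by moving w to the front."""
    x, y, z, w = quat_xyzw
    return Pose(xyz=xyz, quat_wxyz=(w, x, y, z))


identity = pose_from_xyzw((0.5, 0.0, 0.2), (0.0, 0.0, 0.0, 1.0))
print(identity["quat_wxyz"])  # (1.0, 0.0, 0.0, 0.0)
```

Getting the order wrong does not raise an error. It silently produces a different rotation, which is why the convention is pinned in the type's docstring.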

TaskStatePredicate

Bases: TypedDict

Typed task-state predicates for richer coordination (ADR-003 §Decision).

The fixed-format extension field; all keys are optional so partners that do not produce them remain interoperable. Adding a new key is a schema bump.

Source code in src/chamber/comm/api.py
class TaskStatePredicate(TypedDict, total=False):
    """Typed task-state predicates for richer coordination (ADR-003 §Decision).

    The fixed-format extension field; all keys are optional so partners that
    do not produce them remain interoperable. Adding a new key is a
    schema bump.
    """

    grasp_side: str
    object_class: str
    contact_force: float
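TaskStatePredicate is declared with total=False, so both an empty dict and a dict carrying a single key are valid predicates. Consumers should therefore read keys with .get(). In the sketch below, contact_force_or_default and its 0.0 default are hypothetical choices for illustration, not part of chamber.

```python
from typing import TypedDict


class TaskStatePredicate(TypedDict, total=False):
    # Local replica of the TypedDict above: every key is optional.
    grasp_side: str
    object_class: str
    contact_force: float


minimal: TaskStatePredicate = {}                      # partner produces no predicates
partial: TaskStatePredicate = {"grasp_side": "left"}  # partner produces one key


def contact_force_or_default(pred: TaskStatePredicate, default: float = 0.0) -> float:
    # Never assume an optional key is present; fall back to a caller-chosen default.
    return pred.get("contact_force", default)


print(contact_force_or_default(minimal))                 # 0.0
print(contact_force_or_default(partial))                 # 0.0
print(contact_force_or_default({"contact_force": 3.5}))  # 3.5
```

TypedDict does no runtime validation, so nothing at runtime rejects an unlisted key. A static type checker will flag it.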

saturation_guard

saturation_guard(
    profile: DegradationProfile,
    qp_solve_fn: Callable[
        ..., tuple[float, float]
    ] = solve_qp_stub,
    *,
    qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None

Check the inner CBF QP solver's headroom under profile (ADR-006 §Risks R5).

Two conditions trigger ChamberCommQPSaturationWarning:

  1. The measured QP solve time exceeds qp_time_budget_ms (M3 enforces this on the real solver; the M2 stub returns immediately).
  2. The profile is in the saturation regime per ADR-006 R5 (drop rate >= 10 % or latency mean >= 100 ms), even when timing is fine. The regime check is in place from M2 onward so that M3 cannot regress it.

Parameters:

  • profile (DegradationProfile) –

    The DegradationProfile whose feasibility against the inner CBF QP we are testing.

  • qp_solve_fn (Callable[..., tuple[float, float]], default: solve_qp_stub ) –

    The QP solver under test. Defaults to concerto.safety.solve_qp_stub; M3 swaps in the real solver. Must return a (decision, elapsed_seconds) pair. Note that the guard times the call itself with time.perf_counter and does not read the returned value.

  • qp_time_budget_ms (float, default: _QP_TIME_BUDGET_MS ) –

    Solve-time budget. Defaults to the OSCBF target from ADR-004 (1 ms).

Source code in src/chamber/comm/degradation.py
def saturation_guard(
    profile: DegradationProfile,
    qp_solve_fn: Callable[..., tuple[float, float]] = solve_qp_stub,
    *,
    qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None:
    """Check the inner CBF QP solver's headroom under ``profile`` (ADR-006 §Risks R5).

    Two conditions trigger :class:`~chamber.comm.errors.ChamberCommQPSaturationWarning`:

      1. The measured QP solve time exceeds ``qp_time_budget_ms`` (M3 enforces
         this on the real solver; the M2 stub returns immediately).
      2. The profile is in the saturation regime per ADR-006 R5 — drop rate
         >= 10 % or latency mean >= 100 ms — even when timing is fine. The
         regime check exists so the test exists from M2 and M3 cannot regress
         it.

    Args:
        profile: The :class:`DegradationProfile` whose feasibility against the
            inner CBF QP we are testing.
        qp_solve_fn: The QP solver under test. Defaults to
            :func:`concerto.safety.solve_qp_stub`; M3 swaps in the real
            solver. Must return a ``(decision, elapsed_seconds)`` pair.
        qp_time_budget_ms: Solve-time budget. Defaults to the OSCBF target
            from ADR-004 (1 ms).
    """
    start = time.perf_counter()
    qp_solve_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    timing_saturated = elapsed_ms > qp_time_budget_ms
    regime_saturated = _profile_in_saturation_regime(profile)
    if not (timing_saturated or regime_saturated):
        return
    reasons: list[str] = []
    if timing_saturated:
        reasons.append(f"QP solve {elapsed_ms:.4f} ms exceeds {qp_time_budget_ms:.4f} ms budget")
    if regime_saturated:
        reasons.append(
            "profile in ADR-006 R5 saturation regime "
            f"(drop_rate={profile.drop_rate}, latency_mean_ms={profile.latency_mean_ms})"
        )
    warnings.warn(
        "ChamberCommQPSaturationWarning: " + "; ".join(reasons),
        ChamberCommQPSaturationWarning,
        stacklevel=2,
    )
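The two-condition check in saturation_guard can be exercised with stand-ins, without the chamber package. This sketch assumes that _profile_in_saturation_regime (not shown here) applies exactly the thresholds quoted in the docstring, with inclusive (>=) boundaries. Profile, QPSaturationWarning and guard are hypothetical names. Because the guard times the solver call itself, a stub that returns instantly should trip only on the regime check.

```python
import time
import warnings
from collections.abc import Callable
from dataclasses import dataclass


class QPSaturationWarning(UserWarning):
    """Hypothetical stand-in for ChamberCommQPSaturationWarning."""


@dataclass(frozen=True)
class Profile:
    # Hypothetical stand-in carrying only the two fields the guard reads.
    drop_rate: float
    latency_mean_ms: float


def in_saturation_regime(p: Profile) -> bool:
    # Assumed thresholds from the docstring: drop rate >= 10 % or mean latency >= 100 ms.
    return p.drop_rate >= 0.10 or p.latency_mean_ms >= 100.0


def guard(p: Profile, solve: Callable[[], object], budget_ms: float = 1.0) -> None:
    start = time.perf_counter()
    solve()  # return value ignored; wall time is measured around the call
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    reasons: list[str] = []
    if elapsed_ms > budget_ms:
        reasons.append(f"QP solve {elapsed_ms:.4f} ms exceeds {budget_ms:.4f} ms budget")
    if in_saturation_regime(p):
        reasons.append(f"saturation regime (drop_rate={p.drop_rate}, latency_mean_ms={p.latency_mean_ms})")
    if reasons:
        warnings.warn("; ".join(reasons), QPSaturationWarning, stacklevel=2)


def fast_solver() -> tuple[float, float]:
    return 0.0, 0.0  # returns immediately, like a stub


# Escalate the warning to an exception, e.g. to fail a test run on saturation.
with warnings.catch_warnings():
    warnings.simplefilter("error", QPSaturationWarning)
    try:
        guard(Profile(drop_rate=0.20, latency_mean_ms=5.0), fast_solver)
    except QPSaturationWarning as w:
        print("guard tripped:", w)
```

Emitting a warning rather than raising leaves the choice between logging and failing to the caller. The simplefilter escalation shown is one standard way to make it fatal.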

Public API for the CHAMBER comm stack — Protocol + packet shape + schema.

ADR-003 §Decision: defines the canonical wire format of the mandatory fixed-format channel (pose per uid, task-state predicate per uid, AoI per uid) plus an opt-in learned-overlay tensor field, and the CommChannel Protocol that every concrete channel implements.

ADR-006 §Decision: schema versioning protects the URLLC-anchored numeric bounds against silent drift; bumping SCHEMA_VERSION requires a new ADR.

The Protocol lives on the benchmark side (this module) rather than under concerto.api because its signatures reference CommPacket, which is the benchmark's wire format. The dependency-direction rule (plan/10 §2) forbids concerto.* from importing chamber.*; placing the Protocol here keeps the rule intact.

CommChannel

Bases: Protocol

Contract for an inter-agent communication channel (ADR-003 §Decision).

Implementations encode the env-side state into a CommPacket and decode it back into the dict subset that downstream consumers (safety filter M3, partner stack M4, evaluation harness M5) expect under obs["comm"]. ADR-006 §Decision: implementations honour the URLLC-anchored latency / jitter / drop bounds when composed with chamber.comm.degradation.CommDegradationWrapper.

Source code in src/chamber/comm/api.py
@runtime_checkable
class CommChannel(Protocol):
    """Contract for an inter-agent communication channel (ADR-003 §Decision).

    Implementations encode the env-side state into a :class:`CommPacket` and
    decode it back into the dict subset that downstream consumers (safety
    filter M3, partner stack M4, evaluation harness M5) expect under
    ``obs["comm"]``. ADR-006 §Decision: implementations honour the
    URLLC-anchored latency / jitter / drop bounds when composed with
    :class:`chamber.comm.degradation.CommDegradationWrapper`.
    """

    def reset(self, *, seed: int | None = None) -> None:
        """Reset channel state at episode start (ADR-003 §Decision).

        Args:
            seed: Optional root seed for the channel's deterministic RNG
                substream (P6 reproducibility). Implementations route this
                through ``concerto.training.seeding.derive_substream``.
        """
        ...

    def encode(self, state: object) -> CommPacket:
        """Encode env state into a comm-channel packet (ADR-003 §Decision).

        Args:
            state: Env-side observation or raw physics state (impl-defined).

        Returns:
            A :class:`CommPacket` honouring :data:`SCHEMA_VERSION`.
        """
        ...

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

        Args:
            packet: A :class:`CommPacket` previously produced by :meth:`encode`.

        Returns:
            The schema-typed subset of the env state expressible via the
            packet (pose / task-state predicate / AoI), keyed by uid.
        """
        ...
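CommChannel is a runtime_checkable Protocol, so any class with matching methods satisfies it; no inheritance is required. The sketch below shows what isinstance actually checks, using a local replica named ChannelLike. ChannelLike is hypothetical, and its packets are typed as plain dicts to avoid importing chamber.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class ChannelLike(Protocol):
    # Hypothetical replica of the CommChannel method set.
    def reset(self, *, seed: int | None = None) -> None: ...
    def encode(self, state: object) -> dict: ...
    def decode(self, packet: dict) -> dict[str, object]: ...


class EchoChannel:
    """Toy channel: does not inherit from the Protocol, only matches its methods."""

    def __init__(self) -> None:
        self._seed: int | None = None

    def reset(self, *, seed: int | None = None) -> None:
        self._seed = seed

    def encode(self, state: object) -> dict:
        return {"schema_version": 1, "payload": state}

    def decode(self, packet: dict) -> dict[str, object]:
        return {"payload": packet["payload"]}


class EncodeOnly:
    def encode(self, state: object) -> dict:
        return {}


print(isinstance(EchoChannel(), ChannelLike))  # True
print(isinstance(EncodeOnly(), ChannelLike))   # False: reset and decode are missing
```

An isinstance check against a runtime_checkable Protocol confirms only that the named methods exist. It does not compare signatures or return types, so a static type checker is still needed to catch, for example, a decode that returns the wrong shape.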

decode

decode(packet: CommPacket) -> dict[str, object]

Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

Parameters:

  • packet (CommPacket) –

    A CommPacket previously produced by encode.

Returns:

  • dict[str, object]

    The schema-typed subset of the env state expressible via the packet (pose / task-state predicate / AoI), keyed by uid.

Source code in src/chamber/comm/api.py
def decode(self, packet: CommPacket) -> dict[str, object]:
    """Decode a comm-channel packet back into the typed state subset (ADR-003 §Decision).

    Args:
        packet: A :class:`CommPacket` previously produced by :meth:`encode`.

    Returns:
        The schema-typed subset of the env state expressible via the
        packet (pose / task-state predicate / AoI), keyed by uid.
    """
    ...

encode

encode(state: object) -> CommPacket

Encode env state into a comm-channel packet (ADR-003 §Decision).

Parameters:

  • state (object) –

    Env-side observation or raw physics state (impl-defined).

Returns:

  • CommPacket

    A CommPacket honouring SCHEMA_VERSION.

Source code in src/chamber/comm/api.py
def encode(self, state: object) -> CommPacket:
    """Encode env state into a comm-channel packet (ADR-003 §Decision).

    Args:
        state: Env-side observation or raw physics state (impl-defined).

    Returns:
        A :class:`CommPacket` honouring :data:`SCHEMA_VERSION`.
    """
    ...

reset

reset(*, seed: int | None = None) -> None

Reset channel state at episode start (ADR-003 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional root seed for the channel's deterministic RNG substream (P6 reproducibility). Implementations route this through concerto.training.seeding.derive_substream.

Source code in src/chamber/comm/api.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset channel state at episode start (ADR-003 §Decision).

    Args:
        seed: Optional root seed for the channel's deterministic RNG
            substream (P6 reproducibility). Implementations route this
            through ``concerto.training.seeding.derive_substream``.
    """
    ...

CommPacket

Bases: TypedDict

The mandatory fixed-format wire packet (ADR-003 §Decision).

Every concrete CommChannel produces and consumes packets of exactly this shape. Downstream:

  • The safety filter (M3) reads packet["pose"][uid] and packet["aoi"][uid] deterministically.
  • The partner stack (M4) writes packet["pose"][uid] = self.last_pose.
  • The evaluation harness (M5) logs from the same surface.

ADR-006 §Decision: AoI is the URLLC-anchored latency proxy.

Source code in src/chamber/comm/api.py
class CommPacket(TypedDict):
    """The mandatory fixed-format wire packet (ADR-003 §Decision).

    Every concrete :class:`CommChannel` produces and consumes packets of
    exactly this shape. Downstream:

    - The safety filter (M3) reads ``packet["pose"][uid]`` and
      ``packet["aoi"][uid]`` deterministically.
    - The partner stack (M4) writes ``packet["pose"][uid] = self.last_pose``.
    - The evaluation harness (M5) logs from the same surface.

    ADR-006 §Decision: AoI is the URLLC-anchored latency proxy.
    """

    schema_version: int
    pose: dict[str, Pose]
    task_state: dict[str, TaskStatePredicate]
    aoi: dict[str, float]
    learned_overlay: torch.Tensor | None
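A concrete packet makes the field layout tangible. The sketch below re-declares the TypedDicts locally, typing learned_overlay as Optional[object] to avoid a torch import. The uids "panda" and "fetch" and the schema version of 1 are hypothetical. In FixedFormatCommChannel.encode, the aoi map is a snapshot of every registered uid, while pose holds only the uids that sent fresh data this tick. A consumer may therefore want to look poses up with .get().

```python
from typing import Optional, TypedDict


class Pose(TypedDict):
    xyz: tuple[float, float, float]
    quat_wxyz: tuple[float, float, float, float]


class TaskStatePredicate(TypedDict, total=False):
    grasp_side: str
    object_class: str
    contact_force: float


class CommPacket(TypedDict):
    # Local replica of the five wire fields.
    schema_version: int
    pose: dict[str, Pose]
    task_state: dict[str, TaskStatePredicate]
    aoi: dict[str, float]
    learned_overlay: Optional[object]


packet = CommPacket(
    schema_version=1,  # hypothetical; the real value is SCHEMA_VERSION
    pose={"panda": Pose(xyz=(0.4, 0.0, 0.3), quat_wxyz=(1.0, 0.0, 0.0, 0.0))},
    task_state={"panda": TaskStatePredicate(grasp_side="left")},
    aoi={"panda": 0.0, "fetch": 3.0},  # "fetch" is registered but sent nothing fresh
    learned_overlay=None,              # black-box partners leave the overlay empty
)

print(packet["pose"].get("fetch"), packet["aoi"]["fetch"])  # None 3.0
```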

Pose

Bases: TypedDict

SE(3) pose in the env's base frame (ADR-003 §Decision).

The fixed-format channel ships pose as the minimum every black-box partner can produce. Quaternion convention is wxyz (real part first), matching SAPIEN / ManiSkill v3.

Source code in src/chamber/comm/api.py
class Pose(TypedDict):
    """SE(3) pose in the env's base frame (ADR-003 §Decision).

    The fixed-format channel ships pose as the minimum every black-box
    partner can produce. Quaternion convention is wxyz (real part first),
    matching SAPIEN / ManiSkill v3.
    """

    xyz: tuple[float, float, float]
    quat_wxyz: tuple[float, float, float, float]

TaskStatePredicate

Bases: TypedDict

Typed task-state predicates for richer coordination (ADR-003 §Decision).

The fixed-format extension field; all keys are optional so partners that do not produce them remain interoperable. Adding a new key is a schema bump.

Source code in src/chamber/comm/api.py
class TaskStatePredicate(TypedDict, total=False):
    """Typed task-state predicates for richer coordination (ADR-003 §Decision).

    The fixed-format extension field; all keys are optional so partners that
    do not produce them remain interoperable. Adding a new key is a
    schema bump.
    """

    grasp_side: str
    object_class: str
    contact_force: float

Per-uid Age-of-Information clock (T2.2).

ADR-003 §Decision: AoI is the latency proxy carried in every CommPacket. ADR-006 §Decision: AoI is the URLLC-anchored latency-axis instrumentation for the ADR-007 Stage-2 CM spike.

Semantics follow notes/tier2/41 (Ballotta & Talak 2024): for each uid, AoI is the elapsed simulation time since the last fresh sample arrived. mark_fresh(uid) resets that uid to zero; tick(dt) advances every registered uid by dt. The clock holds no env-side dependency: elapsed time is supplied by the caller (typically the env tick), so the class remains pure-Python and trivially deterministic (P6).
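Under these semantics a single uid's AoI follows a sawtooth: it grows by dt each step and drops to zero on each fresh sample. The sketch below assumes the tick-then-mark_fresh order that FixedFormatCommChannel.encode uses; AoIClock itself leaves the order to the caller. aoi_trace is a hypothetical helper, and None stands for "not yet registered".

```python
from __future__ import annotations


def aoi_trace(fresh: list[bool], dt: float) -> list[float | None]:
    """AoI of one uid after each step, applying tick(dt) and then mark_fresh()."""
    aoi: float | None = None  # unregistered until the first fresh sample
    trace: list[float | None] = []
    for arrived in fresh:
        if aoi is not None:
            aoi += dt         # tick: only registered uids age
        if arrived:
            aoi = 0.0         # mark_fresh: a new sample zeroes the age
        trace.append(aoi)
    return trace


print(aoi_trace([False, True, False, False, True, False], dt=0.5))
# [None, 0.0, 0.5, 1.0, 0.0, 0.5]
```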

AoIClock

Per-uid Age-of-Information accounting (ADR-003 §Decision; ADR-006 §Decision).

Usage:

    clock = AoIClock()
    clock.mark_fresh("panda")  # registers "panda" with AoI=0
    clock.tick(0.01)  # +10 ms for every registered uid
    clock.aoi("panda")  # -> 0.01

Uids are auto-registered on the first mark_fresh call. reset clears all registrations so a fresh episode begins from a clean slate.

Source code in src/chamber/comm/aoi.py
class AoIClock:
    """Per-uid Age-of-Information accounting (ADR-003 §Decision; ADR-006 §Decision).

    Usage::

        clock = AoIClock()
        clock.mark_fresh("panda")  # registers "panda" with AoI=0
        clock.tick(0.01)  # +10 ms for every registered uid
        clock.aoi("panda")  # -> 0.01

    Uids are auto-registered on first :meth:`mark_fresh`. :meth:`reset`
    clears all registrations so a fresh episode begins from a clean slate.
    """

    def __init__(self) -> None:
        """Create an empty clock with no uids registered (ADR-003 §Decision)."""
        self._aoi: dict[str, float] = {}

    def reset(self) -> None:
        """Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

        After ``reset``, :meth:`aoi` raises ``KeyError`` for every uid until
        the next :meth:`mark_fresh`. This matches the channel's
        ``CommChannel.reset`` semantics: a new episode starts with no state.
        """
        self._aoi.clear()

    def mark_fresh(self, uid: str) -> None:
        """Mark ``uid`` as having received a fresh sample (AoI -> 0).

        ADR-003 §Decision: every fresh transmission zeroes that uid's AoI.
        Auto-registers ``uid`` on first call.
        """
        self._aoi[uid] = 0.0

    def tick(self, dt: float) -> None:
        """Advance time by ``dt`` for every registered uid (ADR-003 §Decision).

        Args:
            dt: Elapsed simulation time in seconds. Must be non-negative;
                AoI cannot run backwards (notes/tier2/41 §V definition).

        Raises:
            ValueError: when ``dt`` is negative.
        """
        if dt < 0.0:
            msg = f"tick dt must be non-negative; got {dt!r}"
            raise ValueError(msg)
        for uid in self._aoi:
            self._aoi[uid] += dt

    def aoi(self, uid: str) -> float:
        """Return the current AoI for ``uid`` (ADR-003 §Decision).

        Raises:
            KeyError: if ``uid`` is not registered (loud-fail per ADR-003).
        """
        return self._aoi[uid]

    def snapshot(self) -> dict[str, float]:
        """Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

        The returned dict is decoupled from the clock's internal state, so
        callers may freely mutate it (typical use: pack into the next
        ``CommPacket["aoi"]`` field).
        """
        return dict(self._aoi)
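The condensed copy below (docstrings trimmed, logic unchanged) exercises the behaviours the docstrings describe: late registration, snapshot independence, and KeyError after reset. The uid "fetch" is a hypothetical second agent.

```python
class AoIClock:
    """Condensed copy of the AoIClock shown above (docstrings trimmed)."""

    def __init__(self) -> None:
        self._aoi: dict[str, float] = {}

    def reset(self) -> None:
        self._aoi.clear()

    def mark_fresh(self, uid: str) -> None:
        self._aoi[uid] = 0.0

    def tick(self, dt: float) -> None:
        if dt < 0.0:
            raise ValueError(f"tick dt must be non-negative; got {dt!r}")
        for uid in self._aoi:
            self._aoi[uid] += dt

    def aoi(self, uid: str) -> float:
        return self._aoi[uid]

    def snapshot(self) -> dict[str, float]:
        return dict(self._aoi)


clock = AoIClock()
clock.mark_fresh("panda")
clock.tick(0.25)
clock.mark_fresh("fetch")  # registered late, so it starts at 0
clock.tick(0.25)

snap = clock.snapshot()
snap["panda"] = -1.0       # mutating the copy leaves the clock untouched
print(clock.aoi("panda"), clock.aoi("fetch"))  # 0.5 0.25

clock.reset()
try:
    clock.aoi("panda")
except KeyError:
    print("panda is unregistered after reset")
```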

__init__

__init__() -> None

Create an empty clock with no uids registered (ADR-003 §Decision).

Source code in src/chamber/comm/aoi.py
def __init__(self) -> None:
    """Create an empty clock with no uids registered (ADR-003 §Decision)."""
    self._aoi: dict[str, float] = {}

aoi

aoi(uid: str) -> float

Return the current AoI for uid (ADR-003 §Decision).

Raises:

  • KeyError

    if uid is not registered (loud-fail per ADR-003).

Source code in src/chamber/comm/aoi.py
def aoi(self, uid: str) -> float:
    """Return the current AoI for ``uid`` (ADR-003 §Decision).

    Raises:
        KeyError: if ``uid`` is not registered (loud-fail per ADR-003).
    """
    return self._aoi[uid]

mark_fresh

mark_fresh(uid: str) -> None

Mark uid as having received a fresh sample (AoI -> 0).

ADR-003 §Decision: every fresh transmission zeroes that uid's AoI. Auto-registers uid on first call.

Source code in src/chamber/comm/aoi.py
def mark_fresh(self, uid: str) -> None:
    """Mark ``uid`` as having received a fresh sample (AoI -> 0).

    ADR-003 §Decision: every fresh transmission zeroes that uid's AoI.
    Auto-registers ``uid`` on first call.
    """
    self._aoi[uid] = 0.0

reset

reset() -> None

Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

After reset, aoi raises KeyError for every uid until the next mark_fresh. This matches the channel's CommChannel.reset semantics: a new episode starts with no state.

Source code in src/chamber/comm/aoi.py
def reset(self) -> None:
    """Clear every registered uid (ADR-003 §Decision; T2.2 acceptance criterion).

    After ``reset``, :meth:`aoi` raises ``KeyError`` for every uid until
    the next :meth:`mark_fresh`. This matches the channel's
    ``CommChannel.reset`` semantics: a new episode starts with no state.
    """
    self._aoi.clear()

snapshot

snapshot() -> dict[str, float]

Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

The returned dict is decoupled from the clock's internal state, so callers may freely mutate it (typical use: pack into the next CommPacket["aoi"] field).

Source code in src/chamber/comm/aoi.py
def snapshot(self) -> dict[str, float]:
    """Return an independent copy of the per-uid AoI map (ADR-003 §Decision).

    The returned dict is decoupled from the clock's internal state, so
    callers may freely mutate it (typical use: pack into the next
    ``CommPacket["aoi"]`` field).
    """
    return dict(self._aoi)

tick

tick(dt: float) -> None

Advance time by dt for every registered uid (ADR-003 §Decision).

Parameters:

  • dt (float) –

    Elapsed simulation time in seconds. Must be non-negative; AoI cannot run backwards (notes/tier2/41 §V definition).

Raises:

  • ValueError

    when dt is negative.

Source code in src/chamber/comm/aoi.py
def tick(self, dt: float) -> None:
    """Advance time by ``dt`` for every registered uid (ADR-003 §Decision).

    Args:
        dt: Elapsed simulation time in seconds. Must be non-negative;
            AoI cannot run backwards (notes/tier2/41 §V definition).

    Raises:
        ValueError: when ``dt`` is negative.
    """
    if dt < 0.0:
        msg = f"tick dt must be non-negative; got {dt!r}"
        raise ValueError(msg)
    for uid in self._aoi:
        self._aoi[uid] += dt

Fixed-format communication channel — encoder + decoder (T2.3).

ADR-003 §Decision: this channel is the mandatory baseline every black-box partner can speak. It encodes pose + task-state predicates + AoI per uid into a schema-versioned :class:~chamber.comm.api.CommPacket, with an opt-in learned_overlay slot that jointly-trained partners may fill.

ADR-006 §Decision: the channel itself ships no degradation. URLLC-anchored latency / jitter / drop is composed externally via :class:chamber.comm.degradation.CommDegradationWrapper (T2.5) so that the encoding logic stays pure, deterministic, and trivially testable.

FixedFormatCommChannel

Mandatory fixed-format channel (ADR-003 §Decision).

On every encode call:

  1. The internal AoIClock advances by one env tick (+1.0). AoI values in packets emitted by this channel are therefore measured in encode calls (one unit per call) rather than seconds.
  2. Every uid present in state["pose"] / state["task_state"] is marked fresh via AoIClock.mark_fresh (its AoI drops to zero).
  3. The result is bundled into a CommPacket carrying SCHEMA_VERSION and a snapshot of every registered uid's AoI.

decode is the inverse projection: it returns the schema-typed subset of the state expressible via the packet (pose / task-state / AoI / learned overlay). A schema-version mismatch raises ChamberCommError.

Determinism: an internal numpy Generator is seeded via concerto.training.seeding.derive_substream and re-seeded on reset. The encoder is currently fully deterministic from input; the RNG is held for future randomised-encoding features (e.g. dropout on optional fields) without changing the public surface.

Parameters:

  • schema_version (int, default: SCHEMA_VERSION ) –

    Version stamp for emitted packets and the value decoders demand. Bumping it requires a new ADR (ADR-003 §"Schema evolution").

Source code in src/chamber/comm/fixed_format.py
class FixedFormatCommChannel:
    """Mandatory fixed-format channel (ADR-003 §Decision).

    On every :meth:`encode` call:
      1. The internal :class:`AoIClock` advances by one env tick (``+1.0``).
      2. Every uid present in ``state["pose"]`` / ``state["task_state"]`` is
         :meth:`AoIClock.mark_fresh`-ed (its AoI drops to zero).
      3. The result is bundled into a :class:`CommPacket` carrying
         :data:`~chamber.comm.api.SCHEMA_VERSION` and a snapshot of every
         registered uid's AoI.

    :meth:`decode` is the inverse projection: it returns the schema-typed
    subset of the state expressible via the packet (pose / task-state / AoI /
    learned overlay). A schema-version mismatch raises
    :class:`~chamber.comm.errors.ChamberCommError`.

    Determinism: an internal numpy ``Generator`` is seeded via
    :func:`concerto.training.seeding.derive_substream` and re-seeded on
    :meth:`reset`. The encoder is currently fully deterministic from input;
    the RNG is held for future randomised-encoding features (e.g. dropout on
    optional fields) without changing the public surface.

    Args:
        schema_version: Version stamp for emitted packets and the value
            decoders demand. Bumping it requires a new ADR
            (ADR-003 §"Schema evolution").
    """

    def __init__(self, *, schema_version: int = SCHEMA_VERSION) -> None:
        """Build a fixed-format channel (ADR-003 §Decision).

        Args:
            schema_version: Wire-format version stamp; defaults to the
                project-wide :data:`SCHEMA_VERSION`.
        """
        self._schema_version = schema_version
        self._aoi = AoIClock()
        self._root_seed: int = 0
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

    def reset(self, *, seed: int | None = None) -> None:
        """Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

        Args:
            seed: Optional new root seed for the channel substream (P6
                determinism). When provided, replaces the previously stored
                root seed; when omitted, the existing root seed is reused so
                episode resets remain reproducible without a fresh seed
                argument.
        """
        self._aoi.reset()
        if seed is not None:
            self._root_seed = seed
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

    def encode(self, state: object) -> CommPacket:
        """Encode env state into a :class:`CommPacket` (ADR-003 §Decision).

        Accepts any mapping with optional ``"pose"`` / ``"task_state"`` /
        ``"learned_overlay"`` keys; non-mapping inputs are treated as empty
        state (the channel simply ages every previously-registered uid).

        Args:
            state: Source state mapping. Recognised keys:

                - ``"pose"``: ``dict[str, Pose]`` — fresh pose per uid.
                - ``"task_state"``: ``dict[str, TaskStatePredicate]`` —
                  optional fresh predicate per uid.
                - ``"learned_overlay"``: optional ``torch.Tensor`` for
                  jointly-trained B6 partners; passed through unchanged.

        Returns:
            A :class:`CommPacket` carrying the current schema version, a
            shallow copy of pose / task-state, the AoI snapshot, and the
            opt-in learned overlay.
        """
        pose_in, task_state_in, learned = self._extract(state)
        self._aoi.tick(1.0)
        for uid in pose_in:
            self._aoi.mark_fresh(uid)
        for uid in task_state_in:
            self._aoi.mark_fresh(uid)
        return CommPacket(
            schema_version=self._schema_version,
            pose=dict(pose_in),
            task_state=dict(task_state_in),
            aoi=self._aoi.snapshot(),
            learned_overlay=learned,
        )

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Decode a :class:`CommPacket` back into the typed state subset (ADR-003 §Decision).

        Round-trip guarantee: ``decode(encode(state))`` reproduces every
        schema-typed field of ``state`` (pose / task-state / learned overlay)
        and adds the AoI snapshot the channel computed during encode.

        Args:
            packet: A packet previously emitted by :meth:`encode` (or another
                channel honouring the same :data:`SCHEMA_VERSION`).

        Returns:
            A dict with keys ``"pose"`` / ``"task_state"`` / ``"aoi"`` /
            ``"learned_overlay"`` (each a shallow copy where applicable).

        Raises:
            ChamberCommError: when ``packet["schema_version"]`` differs from
                this channel's schema version (ADR-003 §"Schema evolution").
        """
        if packet["schema_version"] != self._schema_version:
            msg = (
                f"comm packet schema_version mismatch: "
                f"channel expects {self._schema_version}, packet carries "
                f"{packet['schema_version']}"
            )
            raise ChamberCommError(msg)
        return {
            "pose": dict(packet["pose"]),
            "task_state": dict(packet["task_state"]),
            "aoi": dict(packet["aoi"]),
            "learned_overlay": packet["learned_overlay"],
        }

    @staticmethod
    def _extract(
        state: object,
    ) -> tuple[Mapping[str, Pose], Mapping[str, TaskStatePredicate], Any]:
        """Pluck the recognised sub-keys from a state-shaped mapping (ADR-003 §Decision)."""
        if not isinstance(state, Mapping):
            return {}, {}, None
        state_map = cast("Mapping[str, Any]", state)
        pose: Mapping[str, Pose] = state_map.get("pose") or {}
        task_state: Mapping[str, TaskStatePredicate] = state_map.get("task_state") or {}
        learned: Any = state_map.get("learned_overlay")
        return pose, task_state, learned
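A stripped-down replica makes the three encode steps easy to follow. MiniEncoder is hypothetical: it drops the RNG, the AoIClock object and torch, and keeps a plain dict of ages. It does follow the same tick-then-mark_fresh order as the source above, and it handles non-mapping or falsy inputs the same way. The uid "fetch" is illustrative.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


class MiniEncoder:
    """Hypothetical replica of FixedFormatCommChannel.encode (no RNG, no torch)."""

    def __init__(self, schema_version: int = 1) -> None:
        self._schema_version = schema_version
        self._aoi: dict[str, float] = {}  # stands in for the internal AoIClock

    def encode(self, state: object) -> dict[str, Any]:
        # Non-mapping input counts as empty state; missing or falsy keys become {}.
        m: Mapping[str, Any] = state if isinstance(state, Mapping) else {}
        pose: Mapping[str, Any] = m.get("pose") or {}
        task_state: Mapping[str, Any] = m.get("task_state") or {}
        # 1. Advance every already-registered uid by one env tick.
        for uid in self._aoi:
            self._aoi[uid] += 1.0
        # 2. Any uid carrying fresh pose or task-state data drops to zero.
        for uid in (*pose, *task_state):
            self._aoi[uid] = 0.0
        # 3. Bundle with the version stamp and a snapshot of every registered uid.
        return {
            "schema_version": self._schema_version,
            "pose": dict(pose),
            "task_state": dict(task_state),
            "aoi": dict(self._aoi),
            "learned_overlay": m.get("learned_overlay"),
        }


enc = MiniEncoder()
p = {"xyz": (0.0, 0.0, 0.0), "quat_wxyz": (1.0, 0.0, 0.0, 0.0)}
enc.encode({"pose": {"panda": p, "fetch": p}})
enc.encode({"pose": {"panda": p}})  # fetch sends nothing this tick
pkt = enc.encode(None)              # non-mapping input: nobody is fresh
print(pkt["aoi"])   # {'panda': 1.0, 'fetch': 2.0}
print(pkt["pose"])  # {}
```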

__init__

__init__(*, schema_version: int = SCHEMA_VERSION) -> None

Build a fixed-format channel (ADR-003 §Decision).

Parameters:

  • schema_version (int, default: SCHEMA_VERSION ) –

    Wire-format version stamp; defaults to the project-wide SCHEMA_VERSION.

Source code in src/chamber/comm/fixed_format.py
def __init__(self, *, schema_version: int = SCHEMA_VERSION) -> None:
    """Build a fixed-format channel (ADR-003 §Decision).

    Args:
        schema_version: Wire-format version stamp; defaults to the
            project-wide :data:`SCHEMA_VERSION`.
    """
    self._schema_version = schema_version
    self._aoi = AoIClock()
    self._root_seed: int = 0
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()

decode

decode(packet: CommPacket) -> dict[str, object]

Decode a CommPacket back into the typed state subset (ADR-003 §Decision).

Round-trip guarantee: decode(encode(state)) reproduces every schema-typed field of state (pose / task-state / learned overlay) and adds the AoI snapshot the channel computed during encode.

Parameters:

  • packet (CommPacket) –

    A packet previously emitted by encode (or by another channel honouring the same SCHEMA_VERSION).

Returns:

  • dict[str, object]

    A dict with keys "pose" / "task_state" / "aoi" / "learned_overlay" (each a shallow copy where applicable).

Raises:

  • ChamberCommError

    when packet["schema_version"] differs from this channel's schema version (ADR-003 §"Schema evolution").

Source code in src/chamber/comm/fixed_format.py
def decode(self, packet: CommPacket) -> dict[str, object]:
    """Decode a :class:`CommPacket` back into the typed state subset (ADR-003 §Decision).

    Round-trip guarantee: ``decode(encode(state))`` reproduces every
    schema-typed field of ``state`` (pose / task-state / learned overlay)
    and adds the AoI snapshot the channel computed during encode.

    Args:
        packet: A packet previously emitted by :meth:`encode` (or another
            channel honouring the same :data:`SCHEMA_VERSION`).

    Returns:
        A dict with keys ``"pose"`` / ``"task_state"`` / ``"aoi"`` /
        ``"learned_overlay"`` (each a shallow copy where applicable).

    Raises:
        ChamberCommError: when ``packet["schema_version"]`` differs from
            this channel's schema version (ADR-003 §"Schema evolution").
    """
    if packet["schema_version"] != self._schema_version:
        msg = (
            f"comm packet schema_version mismatch: "
            f"channel expects {self._schema_version}, packet carries "
            f"{packet['schema_version']}"
        )
        raise ChamberCommError(msg)
    return {
        "pose": dict(packet["pose"]),
        "task_state": dict(packet["task_state"]),
        "aoi": dict(packet["aoi"]),
        "learned_overlay": packet["learned_overlay"],
    }
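The schema check and copy semantics of decode can be exercised without the library. The sketch below re-implements them on plain dicts. SCHEMA_VERSION = 1, the Packet TypedDict, and the local ChamberCommError class are stand-ins assumed for illustration, not chamber.comm's own definitions.

```python
from typing import Any, TypedDict

SCHEMA_VERSION = 1  # hypothetical value; the real constant lives in chamber.comm


class ChamberCommError(RuntimeError):
    """Local stand-in for chamber.comm.errors.ChamberCommError."""


class Packet(TypedDict):
    # Hypothetical mirror of the CommPacket fields that decode reads.
    schema_version: int
    pose: dict[str, Any]
    task_state: dict[str, Any]
    aoi: dict[str, float]
    learned_overlay: Any


def decode(packet: Packet, expected_version: int = SCHEMA_VERSION) -> dict[str, object]:
    # Refuse packets produced under a different schema version.
    if packet["schema_version"] != expected_version:
        raise ChamberCommError(
            f"comm packet schema_version mismatch: channel expects "
            f"{expected_version}, packet carries {packet['schema_version']}"
        )
    # Shallow-copy the mapping fields; the overlay is passed through unchanged.
    return {
        "pose": dict(packet["pose"]),
        "task_state": dict(packet["task_state"]),
        "aoi": dict(packet["aoi"]),
        "learned_overlay": packet["learned_overlay"],
    }


pkt = Packet(schema_version=1, pose={"ego": (0.0, 1.0)}, task_state={}, aoi={"ego": 0.0}, learned_overlay=None)
out = decode(pkt)
print(out["pose"] == pkt["pose"], out["pose"] is pkt["pose"])  # True False
```

Because only the outer dicts are copied, a caller may mutate the returned mapping without touching the packet, but the per-uid values themselves are still shared.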

encode

encode(state: object) -> CommPacket

Encode env state into a :class:CommPacket (ADR-003 §Decision).

Accepts any mapping with optional "pose" / "task_state" / "learned_overlay" keys; non-mapping inputs are treated as empty state (the channel simply ages every previously-registered uid).

Parameters:

  • state (object) –

    Source state mapping. Recognised keys:

    • "pose": dict[str, Pose] — fresh pose per uid.
    • "task_state": dict[str, TaskStatePredicate] — optional fresh predicate per uid.
    • "learned_overlay": optional torch.Tensor for jointly-trained B6 partners; passed through unchanged.

Returns:

  • CommPacket

    A :class:CommPacket carrying the current schema version, a shallow copy of pose / task-state, the AoI snapshot, and the opt-in learned overlay.

Source code in src/chamber/comm/fixed_format.py
def encode(self, state: object) -> CommPacket:
    """Encode env state into a :class:`CommPacket` (ADR-003 §Decision).

    Accepts any mapping with optional ``"pose"`` / ``"task_state"`` /
    ``"learned_overlay"`` keys; non-mapping inputs are treated as empty
    state (the channel simply ages every previously-registered uid).

    Args:
        state: Source state mapping. Recognised keys:

            - ``"pose"``: ``dict[str, Pose]`` — fresh pose per uid.
            - ``"task_state"``: ``dict[str, TaskStatePredicate]`` —
              optional fresh predicate per uid.
            - ``"learned_overlay"``: optional ``torch.Tensor`` for
              jointly-trained B6 partners; passed through unchanged.

    Returns:
        A :class:`CommPacket` carrying the current schema version, a
        shallow copy of pose / task-state, the AoI snapshot, and the
        opt-in learned overlay.
    """
    pose_in, task_state_in, learned = self._extract(state)
    self._aoi.tick(1.0)
    for uid in pose_in:
        self._aoi.mark_fresh(uid)
    for uid in task_state_in:
        self._aoi.mark_fresh(uid)
    return CommPacket(
        schema_version=self._schema_version,
        pose=dict(pose_in),
        task_state=dict(task_state_in),
        aoi=self._aoi.snapshot(),
        learned_overlay=learned,
    )
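The AoI bookkeeping in encode (tick every known uid, then mark the uids present in this step as fresh) can be sketched with a toy clock. ToyAoIClock and encode_aoi are hypothetical; the real AoIClock's internals are not shown on this page, so the "age every registered uid by dt, reset fresh uids to 0" semantics are an assumption consistent with the docstring above.

```python
class ToyAoIClock:
    """Hypothetical AoI clock: per-uid age in ticks since the last fresh sample."""

    def __init__(self) -> None:
        self._age: dict[str, float] = {}

    def tick(self, dt: float) -> None:
        # Age every uid seen so far; uids never marked fresh are not tracked.
        for uid in self._age:
            self._age[uid] += dt

    def mark_fresh(self, uid: str) -> None:
        self._age[uid] = 0.0

    def snapshot(self) -> dict[str, float]:
        return dict(self._age)


def encode_aoi(clock: ToyAoIClock, state: object) -> dict[str, float]:
    # Mirror encode(): non-mapping state counts as empty, so known uids only age.
    pose = state.get("pose", {}) if isinstance(state, dict) else {}
    task_state = state.get("task_state", {}) if isinstance(state, dict) else {}
    clock.tick(1.0)
    for uid in list(pose) + list(task_state):
        clock.mark_fresh(uid)
    return clock.snapshot()


clock = ToyAoIClock()
print(encode_aoi(clock, {"pose": {"a": 0, "b": 0}}))  # {'a': 0.0, 'b': 0.0}
print(encode_aoi(clock, {"pose": {"a": 0}}))          # {'a': 0.0, 'b': 1.0}
print(encode_aoi(clock, None))                         # {'a': 1.0, 'b': 2.0}
```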

reset

reset(*, seed: int | None = None) -> None

Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional new root seed for the channel substream (P6 determinism). When provided, replaces the previously stored root seed; when omitted, the existing root seed is reused so episode resets remain reproducible without a fresh seed argument.

Source code in src/chamber/comm/fixed_format.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset the AoI clock and re-seed the channel RNG (ADR-003 §Decision).

    Args:
        seed: Optional new root seed for the channel substream (P6
            determinism). When provided, replaces the previously stored
            root seed; when omitted, the existing root seed is reused so
            episode resets remain reproducible without a fresh seed
            argument.
    """
    self._aoi.reset()
    if seed is not None:
        self._root_seed = seed
    self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
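The reset contract (a new seed replaces the stored root seed; no seed reuses it) is what makes episode resets reproducible. The sketch below assumes a hash-based substream derivation in place of concerto.training.seeding.derive_substream, whose internals are not shown here; derive_seed, SeededChannel, and the "comm.fixed_format" name are hypothetical.

```python
import hashlib
import random
from typing import Optional


def derive_seed(name: str, root_seed: int) -> int:
    # Hash (name, root_seed) into a 64-bit seed: stable across runs and processes.
    digest = hashlib.sha256(f"{name}:{root_seed}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class SeededChannel:
    """Toy channel showing the reset(seed=...) contract; names are hypothetical."""

    def __init__(self, name: str = "comm.fixed_format", root_seed: int = 0) -> None:
        self._name = name
        self._root_seed = root_seed
        self._rng = random.Random(derive_seed(name, root_seed))

    def reset(self, *, seed: Optional[int] = None) -> None:
        if seed is not None:
            self._root_seed = seed  # a new root seed replaces the stored one
        self._rng = random.Random(derive_seed(self._name, self._root_seed))

    def draw(self) -> float:
        return self._rng.random()


ch = SeededChannel()
ch.reset(seed=7)
first = [ch.draw() for _ in range(3)]
ch.reset()  # no seed: reuse root seed 7, so the stream restarts identically
print(first == [ch.draw() for _ in range(3)])  # True
```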

Opt-in learned-message overlay slot (T2.4).

ADR-003 §Decision (the second of the two channels — opt-in, null-safe); ADR-003 §Risks risk-mitigation 1: black-box partners leave this slot as None so the fixed-format channel alone is sufficient for ad-hoc deployment. Jointly-trained partners (HetGPPO, CommFormer) populate it with a message tensor.

Phase-0 ships only the slot itself plus a no-op encoder; the real learned-comm machinery (HetGPPO message GNN) is a Phase-1 concern.

LearnedOverlay

Holds an optional learned-message tensor (ADR-003 §Decision).

The class is a thin slot wrapper: encode returns whatever tensor a jointly-trained partner has set (or None), and decode is null-safe so black-box consumers can call it unconditionally.

Parameters:

  • tensor (Tensor | None, default: None ) –

    Initial overlay value. Defaults to None so the slot is interoperable with the black-box partner contract out of the box.

Source code in src/chamber/comm/learned_overlay.py
class LearnedOverlay:
    """Holds an optional learned-message tensor (ADR-003 §Decision).

    The class is a thin slot wrapper: ``encode`` returns whatever tensor a
    jointly-trained partner has set (or ``None``), and ``decode`` is
    null-safe so black-box consumers can call it unconditionally.

    Args:
        tensor: Initial overlay value. Defaults to ``None`` so the slot is
            interoperable with the black-box partner contract out of the box.
    """

    def __init__(self, tensor: torch.Tensor | None = None) -> None:
        """Build an overlay slot, optionally pre-populated (ADR-003 §Decision)."""
        self._tensor: torch.Tensor | None = tensor

    @property
    def tensor(self) -> torch.Tensor | None:
        """Return the currently held tensor, or ``None`` (ADR-003 §Decision)."""
        return self._tensor

    def set(self, tensor: torch.Tensor | None) -> None:
        """Replace the held tensor (ADR-003 §Decision).

        Jointly-trained partners call this from their forward pass; black-box
        partners never invoke it (or invoke it with ``None``), preserving the
        null-safety guarantee.
        """
        self._tensor = tensor

    def reset(self) -> None:
        """Clear the overlay (ADR-003 §Decision; episode boundary).

        Called by the channel on env reset so cross-episode message tensors
        do not leak across episode boundaries.
        """
        self._tensor = None

    def encode(self) -> torch.Tensor | None:
        """No-op encoder: returns the held tensor (ADR-003 §Decision).

        Phase-1 will replace this with the message GNN forward pass; for
        Phase-0 the slot semantics are sufficient for the
        :class:`~chamber.comm.api.CommPacket` ``learned_overlay`` field.
        """
        return self._tensor

    @staticmethod
    def decode(packet_overlay: torch.Tensor | None) -> torch.Tensor | None:
        """Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).

        Black-box partners receive a packet whose ``learned_overlay`` is
        ``None``; jointly-trained partners receive the producer's tensor
        unchanged. Either path returns without raising.
        """
        return packet_overlay
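A short usage sketch of the slot semantics: a black-box partner never sets the overlay, a jointly-trained partner does, and consumers call decode unconditionally. OverlaySlot mirrors the class above but substitutes a list of floats for torch.Tensor so it runs without torch; message_width is a hypothetical consumer.

```python
from typing import Optional, Sequence

Tensor = Sequence[float]  # stand-in for torch.Tensor so the sketch runs without torch


class OverlaySlot:
    """Minimal mirror of LearnedOverlay's slot semantics (illustrative only)."""

    def __init__(self, tensor: Optional[Tensor] = None) -> None:
        self._tensor = tensor

    def set(self, tensor: Optional[Tensor]) -> None:
        self._tensor = tensor

    def reset(self) -> None:
        self._tensor = None  # episode boundary: drop any stale message

    def encode(self) -> Optional[Tensor]:
        return self._tensor

    @staticmethod
    def decode(packet_overlay: Optional[Tensor]) -> Optional[Tensor]:
        return packet_overlay


def message_width(packet_overlay: Optional[Tensor]) -> int:
    # Consumers call decode unconditionally; None simply means "no learned message".
    msg = OverlaySlot.decode(packet_overlay)
    return 0 if msg is None else len(msg)


black_box = OverlaySlot()
joint = OverlaySlot()
joint.set([0.1, 0.2, 0.3])  # a jointly-trained partner publishes a 3-dim message
print(message_width(black_box.encode()), message_width(joint.encode()))  # 0 3
```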

tensor property

tensor: Tensor | None

Return the currently held tensor, or None (ADR-003 §Decision).

__init__

__init__(tensor: Tensor | None = None) -> None

Build an overlay slot, optionally pre-populated (ADR-003 §Decision).


decode staticmethod

decode(
    packet_overlay: Tensor | None,
) -> torch.Tensor | None

Null-safe decoder for the packet's learned overlay (ADR-003 §Decision).

Black-box partners receive a packet whose learned_overlay is None; jointly-trained partners receive the producer's tensor unchanged. Either path returns without raising.


encode

encode() -> torch.Tensor | None

No-op encoder: returns the held tensor (ADR-003 §Decision).

Phase-1 will replace this with the message GNN forward pass; for Phase-0 the slot semantics are sufficient for the :class:~chamber.comm.api.CommPacket learned_overlay field.


reset

reset() -> None

Clear the overlay (ADR-003 §Decision; episode boundary).

Called by the channel on env reset so cross-episode message tensors do not leak across episode boundaries.


set

set(tensor: Tensor | None) -> None

Replace the held tensor (ADR-003 §Decision).

Jointly-trained partners call this from their forward pass; black-box partners never invoke it (or invoke it with None), preserving the null-safety guarantee.


URLLC-anchored comm degradation wrapper (T2.5).

ADR-006 §Decision: URLLC-anchored numeric bounds — latency 1-100 ms, jitter μs-10 ms, drop 10⁻⁶ to 10⁻² — are applied externally by this wrapper rather than by the channel itself, so the encoding logic in :class:chamber.comm.fixed_format.FixedFormatCommChannel stays pure and testable. ADR-007 Stage-2 CM sweep ranges are exposed via :data:chamber.comm.profiles.URLLC_3GPP_R17.

Behaviour (plan/02 §3.5):

  1. With probability profile.drop_rate, the freshly-encoded packet is dropped; the wrapper returns the previous visible packet with all per-uid AoI values incremented by one tick.
  2. Otherwise the packet is queued with a delay drawn from Normal(latency_mean_ms, latency_std_ms), clipped to [0, 2 * latency_mean_ms] and rounded to whole ticks. Packets due at or before the current tick are released; among them, the one with the latest send tick (the freshest sample) becomes the visible packet, with its per-uid AoI incremented by the number of ticks elapsed since it was sent.
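The delay rule in step 2 can be sketched as a standalone function. It follows the _draw_delay_ticks source shown further down, except that it swaps numpy's Generator.normal (reached via derive_substream) for the standard library's random.Random.gauss; draw_delay_ticks is an illustrative name.

```python
import random


def draw_delay_ticks(
    rng: random.Random, mean_ms: float, std_ms: float, tick_period_ms: float = 1.0
) -> int:
    """Clipped-Normal latency converted to whole ticks."""
    upper = 2.0 * mean_ms
    if std_ms == 0.0:
        ms = max(0.0, min(mean_ms, upper))  # deterministic latency
    else:
        ms = max(0.0, min(rng.gauss(mean_ms, std_ms), upper))
    # Python's round() rounds halves to even: 2.5 ticks -> 2, 3.5 ticks -> 4.
    return round(ms / tick_period_ms)


rng = random.Random(0)
print(draw_delay_ticks(rng, 10.0, 0.0))                      # 10
print(draw_delay_ticks(rng, 10.0, 0.0, tick_period_ms=4.0))  # 2 (10 / 4 = 2.5)
```

The clip at 2 * latency_mean_ms keeps the distribution symmetric around the mean only when the std is small; with a large std, more probability mass piles up at the two clip points.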

Determinism (P6): every random draw is seeded via concerto.training.seeding.derive_substream("comm.degrade", ...) so two wrappers reset with the same seed produce byte-identical traces.

CommDegradationStats dataclass

Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).

Each delivered packet contributes one entry to latency_ticks (the in-queue dwell time, in ticks). Each dropped packet increments dropped without producing a latency entry.

Source code in src/chamber/comm/degradation.py
@dataclass(frozen=True)
class CommDegradationStats:
    """Telemetry counters surfaced for the Stage-2 CM spike (ADR-006 §Decision).

    Each delivered packet contributes one entry to ``latency_ticks``
    (the in-queue dwell time, in ticks). Each dropped packet increments
    ``dropped`` without producing a latency entry.
    """

    dropped: int = 0
    delivered: int = 0
    latency_ticks: tuple[int, ...] = field(default_factory=tuple)
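One way to turn these counters into sweep metrics is sketched below. The summarize helper is hypothetical and not part of chamber.comm; it assumes, as the wrapper's bookkeeping implies, that packets still waiting in the queue count as neither dropped nor delivered, so they are excluded from the drop fraction.

```python
import math
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Stats:
    """Mirror of CommDegradationStats' fields."""

    dropped: int = 0
    delivered: int = 0
    latency_ticks: tuple[int, ...] = field(default_factory=tuple)


def summarize(stats: Stats) -> dict[str, float]:
    # Packets still queued are neither dropped nor delivered, so they are excluded.
    resolved = stats.dropped + stats.delivered
    drop_fraction = stats.dropped / resolved if resolved else math.nan
    mean_latency = float(mean(stats.latency_ticks)) if stats.latency_ticks else math.nan
    return {"drop_fraction": drop_fraction, "mean_latency_ticks": mean_latency}


print(summarize(Stats(dropped=1, delivered=3, latency_ticks=(0, 2, 4))))
# {'drop_fraction': 0.25, 'mean_latency_ticks': 2.0}
```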

CommDegradationWrapper

Compose-time degradation around a fixed-format channel (ADR-006 §Decision).

Wraps any :class:~chamber.comm.api.CommChannel and injects URLLC-anchored latency / jitter / drop semantics described above. The inner channel is unaware of degradation; the wrapper is itself a :class:CommChannel so it composes transparently with :class:chamber.envs.comm_shaping.CommShapingWrapper.

Parameters:

  • channel (CommChannel) –

    The inner channel whose packets will be degraded.

  • profile (DegradationProfile) –

    The degradation envelope (see :class:DegradationProfile).

  • tick_period_ms (float, default: 1.0 ) –

    How many milliseconds each encode call represents. Defaults to 1.0 so the URLLC ms-denominated profiles map tick-for-millisecond.

  • root_seed (int, default: 0 ) –

    Root seed for the wrapper's RNG (P6 determinism).
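Because the wrapper exposes the same reset / encode / decode surface as the channel it wraps, wrappers can be stacked. The sketch below illustrates that composition with a structural typing.Protocol. ChannelLike, EchoChannel, and CountingWrapper are hypothetical toys standing in for chamber.comm.api.CommChannel and the real wrappers.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ChannelLike(Protocol):
    """Hypothetical subset of chamber.comm.api.CommChannel."""

    def reset(self, *, seed: Optional[int] = None) -> None: ...
    def encode(self, state: object) -> dict: ...
    def decode(self, packet: dict) -> dict: ...


class EchoChannel:
    """Toy inner channel: wraps state in a packet dict."""

    def reset(self, *, seed: Optional[int] = None) -> None:
        pass

    def encode(self, state: object) -> dict:
        return {"payload": state}

    def decode(self, packet: dict) -> dict:
        return dict(packet)


class CountingWrapper:
    """Toy wrapper: is itself a channel and delegates to the inner one."""

    def __init__(self, channel: ChannelLike) -> None:
        self._channel = channel
        self.encodes = 0

    def reset(self, *, seed: Optional[int] = None) -> None:
        self._channel.reset(seed=seed)

    def encode(self, state: object) -> dict:
        self.encodes += 1
        return self._channel.encode(state)

    def decode(self, packet: dict) -> dict:
        return self._channel.decode(packet)


stack = CountingWrapper(CountingWrapper(EchoChannel()))  # wrappers nest freely
print(isinstance(stack, ChannelLike), stack.decode(stack.encode(42)))
# True {'payload': 42}
```

Note that isinstance against a runtime_checkable Protocol only checks that the methods exist, not their signatures.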

Source code in src/chamber/comm/degradation.py
class CommDegradationWrapper:
    """Compose-time degradation around a fixed-format channel (ADR-006 §Decision).

    Wraps any :class:`~chamber.comm.api.CommChannel` and injects
    URLLC-anchored latency / jitter / drop semantics described above. The
    inner channel is unaware of degradation; the wrapper is itself a
    :class:`CommChannel` so it composes transparently with
    :class:`chamber.envs.comm_shaping.CommShapingWrapper`.

    Args:
        channel: The inner channel whose packets will be degraded.
        profile: The degradation envelope (see :class:`DegradationProfile`).
        tick_period_ms: How many milliseconds each ``encode`` call represents.
            Defaults to ``1.0`` so the URLLC ms-denominated profiles map
            tick-for-millisecond.
        root_seed: Root seed for the wrapper's RNG (P6 determinism).
    """

    def __init__(
        self,
        channel: CommChannel,
        profile: DegradationProfile,
        *,
        tick_period_ms: float = 1.0,
        root_seed: int = 0,
    ) -> None:
        """Build the wrapper (ADR-006 §Decision)."""
        if tick_period_ms <= 0.0:
            msg = f"tick_period_ms must be positive; got {tick_period_ms!r}"
            raise ValueError(msg)
        self._channel = channel
        self._profile = profile
        self._tick_period_ms = tick_period_ms
        self._root_seed: int = root_seed
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
        self._tick: int = 0
        self._queue: list[_Queued] = []
        self._visible: CommPacket = _empty_packet()
        self._dropped: int = 0
        self._delivered: int = 0
        self._latency_ticks: list[int] = []

    @property
    def stats(self) -> CommDegradationStats:
        """Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM)."""
        return CommDegradationStats(
            dropped=self._dropped,
            delivered=self._delivered,
            latency_ticks=tuple(self._latency_ticks),
        )

    def reset(self, *, seed: int | None = None) -> None:
        """Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).

        Args:
            seed: Optional new root seed; otherwise the previously stored
                root seed is reused so episode resets stay reproducible.
        """
        if seed is not None:
            self._root_seed = seed
        self._channel.reset(seed=self._root_seed)
        self._rng = derive_substream(_SUBSTREAM_NAME, root_seed=self._root_seed).default_rng()
        self._tick = 0
        self._queue = []
        self._visible = _empty_packet()
        self._dropped = 0
        self._delivered = 0
        self._latency_ticks = []

    def encode(self, state: object) -> CommPacket:
        """Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).

        See module docstring for the per-step behaviour spec.
        """
        self._tick += 1
        fresh = self._channel.encode(state)
        if self._draw_drop():
            self._dropped += 1
            self._visible = self._age_packet(self._visible, by_ticks=1)
            return _copy_packet(self._visible)
        delay_ticks = self._draw_delay_ticks()
        self._queue.append(
            _Queued(release_tick=self._tick + delay_ticks, send_tick=self._tick, packet=fresh)
        )
        self._release_due()
        return _copy_packet(self._visible)

    def decode(self, packet: CommPacket) -> dict[str, object]:
        """Delegate decoding to the inner channel (ADR-003 §Decision)."""
        return self._channel.decode(packet)

    def _draw_drop(self) -> bool:
        """True iff the per-tick Bernoulli drop trial fires (P6 deterministic)."""
        if self._profile.drop_rate <= 0.0:
            return False
        return bool(self._rng.random() < self._profile.drop_rate)

    def _draw_delay_ticks(self) -> int:
        """Sample a clipped-Normal latency and convert to whole ticks (P6)."""
        mean = self._profile.latency_mean_ms
        std = self._profile.latency_std_ms
        upper = 2.0 * mean
        if std == 0.0:
            ms = max(0.0, min(mean, upper))
        else:
            rng_sample = float(self._rng.normal(loc=mean, scale=std))
            ms = max(0.0, min(rng_sample, upper))
        return round(ms / self._tick_period_ms)

    def _release_due(self) -> None:
        """Pop every queued packet whose release tick is <= now (ADR-006)."""
        if not self._queue:
            return
        due: list[_Queued] = [q for q in self._queue if q.release_tick <= self._tick]
        if not due:
            return
        self._queue = [q for q in self._queue if q.release_tick > self._tick]
        # The latest send_tick wins as the visible packet (most recent fresh sample).
        winner = max(due, key=lambda q: q.send_tick)
        for q in due:
            self._delivered += 1
            self._latency_ticks.append(q.release_tick - q.send_tick)
        dwell = self._tick - winner.send_tick
        self._visible = self._age_packet(winner.packet, by_ticks=dwell)

    @staticmethod
    def _age_packet(packet: CommPacket, *, by_ticks: int) -> CommPacket:
        """Return a copy of ``packet`` with all per-uid AoI advanced by ``by_ticks`` (ADR-003)."""
        if not packet["aoi"]:
            return packet
        aged_aoi = {uid: value + float(by_ticks) for uid, value in packet["aoi"].items()}
        aged: CommPacket = CommPacket(
            schema_version=packet["schema_version"],
            pose=dict(packet["pose"]),
            task_state=dict(packet["task_state"]),
            aoi=aged_aoi,
            learned_overlay=packet["learned_overlay"],
        )
        return aged
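The release step in _release_due can be isolated as a pure function to make the selection rule explicit: the freshest sample (latest send tick) wins, not the packet that became due last. Queued and release_due below are simplified stand-ins that carry only an AoI mapping instead of a full CommPacket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queued:
    release_tick: int
    send_tick: int
    aoi: dict  # per-uid AoI carried by the packet when it was sent


def release_due(queue: list[Queued], now: int):
    """Return (still_queued, visible_aoi_or_None, latencies) for tick `now`."""
    due = [q for q in queue if q.release_tick <= now]
    still_queued = [q for q in queue if q.release_tick > now]
    if not due:
        return still_queued, None, []
    # Freshest sample wins: latest send tick, not latest release tick.
    winner = max(due, key=lambda q: q.send_tick)
    dwell = now - winner.send_tick
    visible = {uid: age + float(dwell) for uid, age in winner.aoi.items()}
    latencies = [q.release_tick - q.send_tick for q in due]
    return still_queued, visible, latencies


queue = [Queued(5, 1, {"a": 0.0}), Queued(4, 3, {"a": 0.0}), Queued(9, 4, {"a": 0.0})]
rest, visible, lat = release_due(queue, now=5)
print(len(rest), visible, lat)  # 1 {'a': 2.0} [4, 1]
```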

stats property

stats: CommDegradationStats

Return a snapshot of telemetry counters (ADR-006 §Decision; ADR-007 Stage 2 CM).

__init__

__init__(
    channel: CommChannel,
    profile: DegradationProfile,
    *,
    tick_period_ms: float = 1.0,
    root_seed: int = 0,
) -> None

Build the wrapper (ADR-006 §Decision).


decode

decode(packet: CommPacket) -> dict[str, object]

Delegate decoding to the inner channel (ADR-003 §Decision).


encode

encode(state: object) -> CommPacket

Apply URLLC-anchored degradation to the inner channel's packet (ADR-006 §Decision).

See module docstring for the per-step behaviour spec.


reset

reset(*, seed: int | None = None) -> None

Reset the wrapper, the inner channel, and the RNG (ADR-006 §Decision).

Parameters:

  • seed (int | None, default: None ) –

    Optional new root seed; otherwise the previously stored root seed is reused so episode resets stay reproducible.


DegradationProfile dataclass

Latency / jitter / drop bounds for a single experimental condition.

ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial trial corpus. The named profiles in :mod:chamber.comm.profiles cover the Phase-0 Stage-2 CM sweep table.

Parameters:

  • latency_mean_ms (float) –

    Mean of the Gaussian latency draw (ms).

  • latency_std_ms (float) –

    Standard deviation of the Gaussian latency draw (ms); the draw is clipped to [0, 2 * latency_mean_ms].

  • drop_rate (float) –

    Bernoulli drop probability per packet (per encode call).

Source code in src/chamber/comm/degradation.py
@dataclass(frozen=True)
class DegradationProfile:
    """Latency / jitter / drop bounds for a single experimental condition.

    ADR-006 §Decision: anchor values come from the URLLC + 3GPP R17 industrial
    trial corpus. The named profiles in :mod:`chamber.comm.profiles` cover the
    Phase-0 Stage-2 CM sweep table.

    Args:
        latency_mean_ms: Mean of the Gaussian latency draw (ms).
        latency_std_ms: Std of the Gaussian latency draw (ms); the draw is
            clipped to ``[0, 2 * latency_mean_ms]``.
        drop_rate: Bernoulli drop probability per packet (per encode call).
    """

    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float
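DegradationProfile as shown performs no validation of its fields. If range checks were wanted, a __post_init__ hook on the frozen dataclass is one option; the sketch below is a hypothetical variant, not the library's class, and the example values are illustrative rather than taken from chamber.comm.profiles.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CheckedProfile:
    """Same fields as DegradationProfile plus hypothetical range checks."""

    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float

    def __post_init__(self) -> None:
        if self.latency_mean_ms < 0.0 or self.latency_std_ms < 0.0:
            raise ValueError("latency mean and std must be non-negative")
        if not 0.0 <= self.drop_rate <= 1.0:
            raise ValueError(f"drop_rate must be a probability; got {self.drop_rate!r}")


base = CheckedProfile(latency_mean_ms=10.0, latency_std_ms=2.0, drop_rate=1e-4)  # illustrative values
lossier = replace(base, drop_rate=1e-2)  # replace() re-runs __post_init__
print(lossier.drop_rate)  # 0.01
```

Since dataclasses.replace constructs a new instance, derived sweep variants get the same checks as hand-written ones.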

saturation_guard

saturation_guard(
    profile: DegradationProfile,
    qp_solve_fn: Callable[
        ..., tuple[float, float]
    ] = solve_qp_stub,
    *,
    qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None

Check the inner CBF QP solver's headroom under profile (ADR-006 §Risks R5).

Two conditions trigger :class:~chamber.comm.errors.ChamberCommQPSaturationWarning:

  1. The measured QP solve time exceeds qp_time_budget_ms (M3 enforces this on the real solver; the M2 stub returns immediately).
  2. The profile is in the saturation regime per ADR-006 R5 (drop rate >= 10 % or latency mean >= 100 ms), even when timing is fine. The regime check ensures the guard test is meaningful from M2 onward, so M3 cannot regress it.

Parameters:

  • profile (DegradationProfile) –

    The :class:DegradationProfile whose feasibility against the inner CBF QP we are testing.

  • qp_solve_fn (Callable[..., tuple[float, float]], default: solve_qp_stub ) –

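    The QP solver under test. Defaults to :func:concerto.safety.solve_qp_stub; M3 swaps in the real solver. Must return a (decision, elapsed_seconds) pair, although saturation_guard itself discards the return value and times the call with time.perf_counter.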

  • qp_time_budget_ms (float, default: _QP_TIME_BUDGET_MS ) –

    Solve-time budget. Defaults to the OSCBF target from ADR-004 (1 ms).

Source code in src/chamber/comm/degradation.py
def saturation_guard(
    profile: DegradationProfile,
    qp_solve_fn: Callable[..., tuple[float, float]] = solve_qp_stub,
    *,
    qp_time_budget_ms: float = _QP_TIME_BUDGET_MS,
) -> None:
    """Check the inner CBF QP solver's headroom under ``profile`` (ADR-006 §Risks R5).

    Two conditions trigger :class:`~chamber.comm.errors.ChamberCommQPSaturationWarning`:

      1. The measured QP solve time exceeds ``qp_time_budget_ms`` (M3 enforces
         this on the real solver; the M2 stub returns immediately).
      2. The profile is in the saturation regime per ADR-006 R5 — drop rate
         >= 10 % or latency mean >= 100 ms — even when timing is fine. The
         regime check exists so the test exists from M2 and M3 cannot regress
         it.

    Args:
        profile: The :class:`DegradationProfile` whose feasibility against the
            inner CBF QP we are testing.
        qp_solve_fn: The QP solver under test. Defaults to
            :func:`concerto.safety.solve_qp_stub`; M3 swaps in the real
            solver. Must return a ``(decision, elapsed_seconds)`` pair.
        qp_time_budget_ms: Solve-time budget. Defaults to the OSCBF target
            from ADR-004 (1 ms).
    """
    start = time.perf_counter()
    qp_solve_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    timing_saturated = elapsed_ms > qp_time_budget_ms
    regime_saturated = _profile_in_saturation_regime(profile)
    if not (timing_saturated or regime_saturated):
        return
    reasons: list[str] = []
    if timing_saturated:
        reasons.append(f"QP solve {elapsed_ms:.4f} ms exceeds {qp_time_budget_ms:.4f} ms budget")
    if regime_saturated:
        reasons.append(
            "profile in ADR-006 R5 saturation regime "
            f"(drop_rate={profile.drop_rate}, latency_mean_ms={profile.latency_mean_ms})"
        )
    warnings.warn(
        "ChamberCommQPSaturationWarning: " + "; ".join(reasons),
        ChamberCommQPSaturationWarning,
        stacklevel=2,
    )
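The two-condition guard can be reproduced in a few lines to see when the warning fires. In this sketch the regime thresholds are taken from the docstring (drop rate >= 10 % or latency mean >= 100 ms) because _profile_in_saturation_regime's body is not shown; QPSaturationWarning, Profile, and guard are local stand-ins, and unlike the original, guard returns its reasons so they can be inspected.

```python
import time
import warnings
from dataclasses import dataclass
from typing import Callable


class QPSaturationWarning(UserWarning):
    """Local stand-in for ChamberCommQPSaturationWarning."""


@dataclass(frozen=True)
class Profile:
    latency_mean_ms: float
    latency_std_ms: float
    drop_rate: float


def in_saturation_regime(p: Profile) -> bool:
    # Thresholds as stated in the saturation_guard docstring (ADR-006 R5).
    return p.drop_rate >= 0.10 or p.latency_mean_ms >= 100.0


def guard(p: Profile, solve: Callable[[], object], budget_ms: float = 1.0) -> list[str]:
    start = time.perf_counter()
    solve()  # return value ignored; wall-clock time is what matters
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    reasons: list[str] = []
    if elapsed_ms > budget_ms:
        reasons.append(f"QP solve {elapsed_ms:.4f} ms exceeds {budget_ms:.4f} ms budget")
    if in_saturation_regime(p):
        reasons.append(f"profile in saturation regime (drop_rate={p.drop_rate})")
    if reasons:
        warnings.warn("; ".join(reasons), QPSaturationWarning, stacklevel=2)
    return reasons


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    guard(Profile(10.0, 2.0, 0.2), lambda: None, budget_ms=50.0)
print(len(caught), caught[0].category.__name__)  # 1 QPSaturationWarning
```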

URLLC-anchored degradation sweep profiles (T2.6).

ADR-006 §Decision: latency 1-100 ms, jitter μs-10 ms, drop 10⁻⁶ to 10⁻² (URLLC + 3GPP Release 17 + arXiv 2501.12792 industrial-trial anchors). The six named profiles below cover the sweep range the ADR-007 Stage-2 CM spike will reproduce; the saturation row is held aside so the QP-saturation property test (T2.7) can prove or refute ADR-006 risk #5 (R5).

Profiles are evaluated in production by composition with :class:chamber.comm.degradation.CommDegradationWrapper; the channel itself ships no degradation (ADR-003 §Decision).

Comm-stack error and warning types.

ADR-003 §Decision (every concrete channel raises :class:ChamberCommError on contract violations, never silently degrades). ADR-006 §Risks #5 / Risk register R5 (the QP-saturation guard fires :class:ChamberCommQPSaturationWarning when the URLLC sweep saturates the inner CBF QP solver).

ChamberCommError

Bases: RuntimeError

Raised when a comm-channel contract is violated (ADR-003 §Decision).

Examples include malformed packets (missing schema_version, unrecognised packet keys), unsupported profile lookups, and channel state that cannot be reconciled (e.g., decoding a packet whose schema version differs from the channel's).

Source code in src/chamber/comm/errors.py
class ChamberCommError(RuntimeError):
    """Raised when a comm-channel contract is violated (ADR-003 §Decision).

    Examples include malformed packets (missing ``schema_version``,
    unrecognised packet keys), unsupported profile lookups, and channel
    state that cannot be reconciled (e.g., decoding a packet whose
    schema version differs from the channel's).
    """

ChamberCommQPSaturationWarning

Bases: UserWarning

Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).

The QP-saturation guard test in tests/property/test_qp_saturation.py sweeps the :data:chamber.comm.profiles.URLLC_3GPP_R17 table and asserts that the inner CBF QP solve time stays below 1 ms (OSCBF target from ADR-004). If the most aggressive profile saturates, :func:chamber.comm.degradation.saturation_guard emits this warning instead of letting the saturation pass silently (ADR-006 Risk register R5).

Source code in src/chamber/comm/errors.py
class ChamberCommQPSaturationWarning(UserWarning):
    """Emitted when a degradation profile saturates the inner CBF QP (ADR-006 R5).

    The QP-saturation guard test in
    ``tests/property/test_qp_saturation.py`` sweeps the
    :data:`chamber.comm.profiles.URLLC_3GPP_R17` table and asserts that
    the inner CBF QP solve time stays below 1 ms (OSCBF target from
    ADR-004). If the most aggressive profile saturates, the wrapper
    raises this warning rather than silently allowing it (ADR-006
    Risk register R5).
    """

chamber.partners

CHAMBER partner zoo — concrete implementations of the FrozenPartner Protocol.

ADR-009 §Decision (zoo construction); ADR-010 §Decision (FM partner selection); plan/04 §3 (architecture). Phase-0 ships:

  • The :class:FrozenPartner Protocol + :class:PartnerSpec dataclass (:mod:chamber.partners.api).
  • The plug-in :func:register_partner decorator + :func:load_partner factory + :func:list_registered (:mod:chamber.partners.registry).
  • :class:PartnerBase with the _FORBIDDEN_ATTRS shield enforcing the black-box AHT no-joint-training constraint at runtime (:mod:chamber.partners.interface).
  • :class:ScriptedHeuristicPartner — Stage-3 draft-zoo entry #1 (:mod:chamber.partners.heuristic).
  • :class:FrozenMAPPOPartner — Stage-3 draft-zoo entry #2 (T4.5; :mod:chamber.partners.frozen_mappo). Loads a torch state-dict via :func:concerto.training.checkpoints.load_checkpoint, freezes every parameter, runs inference under :func:torch.no_grad.
  • :class:FrozenHARLPartner — Stage-3 draft-zoo entry #3 (T4.6; :mod:chamber.partners.frozen_harl). Loads the HARL HAPPO actor checkpoint produced by :class:chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer, rebuilds the :class:StochasticPolicy with the same args, freezes every parameter, runs deterministic=True inference under :func:torch.no_grad.
  • :class:OpenVLAPartner and :class:CrossFormerPartner Phase-1 stubs (:mod:chamber.partners.stubs) — ADR-010 §Decision Option B; both raise :class:NotImplementedError referencing the Phase-1 ticket.
  • The Phase-0 :func:make_phase0_draft_zoo (real) and the Phase-1 :func:select_zoo stub (:mod:chamber.partners.selection).

The M4-gate integration test (T4.9) lands in a follow-up PR (plan/04 §1).
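The plug-in registry listed above (`register_partner` decorator, `load_partner` factory, `list_registered`) can be sketched roughly as follows. This is an illustrative re-implementation under assumptions, not the module's code: the `PartnerSpec` fields are limited to the three this page mentions, rejecting duplicate names is assumed behaviour, and `ToyConstantPartner` is hypothetical. The lookup key is `spec.class_name`, which matches the names the real classes register under (for example `"frozen_harl"`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PartnerSpec:
    """Stand-in for chamber.partners.api.PartnerSpec (only the fields used on this page)."""

    class_name: str
    weights_uri: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)


_REGISTRY: Dict[str, type] = {}


def register_partner(name: str) -> Callable[[type], type]:
    """Class decorator: record the class under `name` (duplicates rejected, an assumption)."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"partner {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


def load_partner(spec: PartnerSpec):
    """Factory: look up spec.class_name and construct the partner from the spec."""
    try:
        cls = _REGISTRY[spec.class_name]
    except KeyError:
        raise KeyError(f"unknown partner {spec.class_name!r}; registered: {sorted(_REGISTRY)}") from None
    return cls(spec)


def list_registered() -> List[str]:
    return sorted(_REGISTRY)


@register_partner("toy_constant")
class ToyConstantPartner:
    """Hypothetical partner used only to exercise the registry."""

    def __init__(self, spec: PartnerSpec) -> None:
        self.spec = spec  # construction stays cheap: no I/O here
```

Keeping construction free of I/O is what lets `load_partner` stay a cheap dictionary lookup; the frozen partners below defer checkpoint loading for the same reason.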

CrossFormerPartner

Bases: PartnerBase

CrossFormer frozen zero-shot generalist — Phase-1 stub (ADR-010 §Decision Option B).

Frozen across all 20 cross-embodiment classes; no per-task LoRA budget (ADR-010 §Decision: zero fine-tuning cost). Per ADR-010 §Risks the zero-shot deployment must pass a per-task ablation before use as a primary partner; the stub does not bypass that gate.

Source code in src/chamber/partners/stubs/crossformer.py
@register_partner("crossformer_zero_shot")
class CrossFormerPartner(PartnerBase):
    """CrossFormer frozen zero-shot generalist — Phase-1 stub (ADR-010 §Decision Option B).

    Frozen across all 20 cross-embodiment classes; no per-task LoRA budget
    (ADR-010 §Decision: zero fine-tuning cost). Per ADR-010 §Risks the
    zero-shot deployment must pass a per-task ablation before use as a
    primary partner; the stub does not bypass that gate.
    """

    def reset(self, *, seed: int | None = None) -> NoReturn:
        """Phase-1 stub — raises (ADR-010 §Decision Option B).

        Args:
            seed: Accepted for Protocol conformance only.

        Raises:
            NotImplementedError: Always; the real zero-shot inference
                harness is Phase-1 work.
        """
        del seed
        raise NotImplementedError(
            f"CrossFormerPartner.reset is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
        )

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Phase-1 stub — raises (ADR-010 §Decision Option B).

        Args:
            obs: Gymnasium observation mapping.
            deterministic: Accepted for Protocol conformance only.

        Raises:
            NotImplementedError: Always; the real action-chunk inference
                is Phase-1 work.
        """
        del obs, deterministic
        raise NotImplementedError(
            f"CrossFormerPartner.act is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
        )
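Because both stub methods raise unconditionally, a caller that assembles a partner pool can probe a candidate and skip Phase-1 stubs rather than crash mid-episode. A small sketch, assuming a stand-in class that mirrors the stub's behaviour; `is_phase0_usable` is a hypothetical helper, and the ticket string is a placeholder because the real `_PHASE1_TICKET` value is not shown on this page.

```python
from typing import Mapping, NoReturn, Optional

_PHASE1_TICKET = "<phase-1 ticket>"  # placeholder; the real ticket id is not shown here


class CrossFormerStub:
    """Stand-in mirroring CrossFormerPartner: every entry point raises."""

    def reset(self, *, seed: Optional[int] = None) -> NoReturn:
        del seed
        raise NotImplementedError(f"reset is Phase-1 work; see {_PHASE1_TICKET}.")

    def act(self, obs: Mapping[str, object], *, deterministic: bool = True) -> NoReturn:
        del obs, deterministic
        raise NotImplementedError(f"act is Phase-1 work; see {_PHASE1_TICKET}.")


def is_phase0_usable(partner) -> bool:
    """Hypothetical helper: call reset() once and report whether the partner is usable."""
    try:
        partner.reset(seed=0)
    except NotImplementedError:
        return False
    return True
```

Note that probing a real frozen partner this way triggers its lazy checkpoint load, so the probe is not free for those classes.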

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Phase-1 stub — raises (ADR-010 §Decision Option B).

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation mapping.

  • deterministic (bool, default: True ) –

    Accepted for Protocol conformance only.

Raises:

  • NotImplementedError

    Always; the real action-chunk inference is Phase-1 work.

Source code in src/chamber/partners/stubs/crossformer.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Phase-1 stub — raises (ADR-010 §Decision Option B).

    Args:
        obs: Gymnasium observation mapping.
        deterministic: Accepted for Protocol conformance only.

    Raises:
        NotImplementedError: Always; the real action-chunk inference
            is Phase-1 work.
    """
    del obs, deterministic
    raise NotImplementedError(
        f"CrossFormerPartner.act is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
    )

reset

reset(*, seed: int | None = None) -> NoReturn

Phase-1 stub — raises (ADR-010 §Decision Option B).

Parameters:

  • seed (int | None, default: None ) –

    Accepted for Protocol conformance only.

Raises:

  • NotImplementedError

    Always; the real zero-shot inference harness is Phase-1 work.

Source code in src/chamber/partners/stubs/crossformer.py
def reset(self, *, seed: int | None = None) -> NoReturn:
    """Phase-1 stub — raises (ADR-010 §Decision Option B).

    Args:
        seed: Accepted for Protocol conformance only.

    Raises:
        NotImplementedError: Always; the real zero-shot inference
            harness is Phase-1 work.
    """
    del seed
    raise NotImplementedError(
        f"CrossFormerPartner.reset is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
    )

FrozenHARLPartner

Bases: PartnerBase

Frozen HARL HAPPO partner (ADR-009 §Decision; plan/04 §3.6).

Loads a HARL :class:StochasticPolicy checkpoint via :func:concerto.training.checkpoints.load_checkpoint (which verifies the sidecar SHA-256), freezes every parameter (requires_grad=False), and runs inference under :func:torch.no_grad with deterministic=True. The actor's layer widths are inferred from the loaded state-dict tensor shapes.
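The sidecar SHA-256 check that `load_checkpoint` performs can be illustrated with the standard library alone. This sketch assumes a hypothetical sidecar layout (`<name>.pt.json` holding `{"sha256": <hex digest>}`) and a local `CheckpointError` stand-in; the real sidecar format and error type may differ. It covers the three failure modes the `_ensure_loaded` docstrings list: missing sidecar, malformed sidecar, and digest mismatch.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class CheckpointError(Exception):
    """Stand-in for the error load_checkpoint raises on a bad sidecar."""


def _sidecar_path(payload_path: Path) -> Path:
    # Hypothetical layout: the sidecar sits next to the payload as <name>.pt.json.
    return payload_path.with_name(payload_path.name + ".json")


def write_sidecar(payload_path: Path) -> Path:
    digest = hashlib.sha256(payload_path.read_bytes()).hexdigest()
    sidecar = _sidecar_path(payload_path)
    sidecar.write_text(json.dumps({"sha256": digest}))
    return sidecar


def verify_sidecar(payload_path: Path) -> str:
    """Return the payload digest, or raise CheckpointError on any integrity problem."""
    sidecar = _sidecar_path(payload_path)
    if not sidecar.exists():
        raise CheckpointError(f"missing sidecar {sidecar}")
    try:
        expected = json.loads(sidecar.read_text())["sha256"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise CheckpointError(f"malformed sidecar {sidecar}") from exc
    actual = hashlib.sha256(payload_path.read_bytes()).hexdigest()
    if actual != expected:
        raise CheckpointError(f"SHA-256 mismatch for {payload_path.name}")
    return actual


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "partner.pt"
    ckpt.write_bytes(b"weights-v1")
    write_sidecar(ckpt)
    good_digest = verify_sidecar(ckpt)
    ckpt.write_bytes(b"weights-v2")  # tamper with the payload after the sidecar was written
    try:
        verify_sidecar(ckpt)
        tamper_detected = False
    except CheckpointError:
        tamper_detected = True
```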

Spec contract (plan/04 §3.6; ADR-009 §Decision):

  • spec.class_name == "frozen_harl".
  • spec.weights_uri is the local://artifacts/<name>.pt URI.
  • spec.extra["uid"] is the env-side uid the partner acts on; it indexes obs["agent"][uid]["state"] at action time.

Statefulness (plan/04 §2): the partner is stateless across episodes; :meth:reset is a no-op once the checkpoint is loaded. Loading is deferred to the first :meth:reset / :meth:act call so the registry lookup in :func:chamber.partners.registry.load_partner stays cheap.
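The deferred, idempotent loading described above follows a simple cache-on-first-use pattern, the same shape as `_ensure_loaded` (check the cached actor, load once, store it). A generic sketch, where `LazyLoader` and `fake_load` are hypothetical and the counter stands in for checkpoint I/O:

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyLoader(Generic[T]):
    """Defer an expensive load until first use, then serve the cached value."""

    def __init__(self, load_fn: Callable[[], T]) -> None:
        self._load_fn = load_fn  # construction does no I/O
        self._value: Optional[T] = None

    def ensure_loaded(self) -> T:
        if self._value is None:  # first reset()/act() triggers the load
            self._value = self._load_fn()
        return self._value  # later calls return the cached object


load_calls: List[str] = []


def fake_load() -> dict:
    load_calls.append("load")  # stands in for reading and verifying a checkpoint
    return {"fc1.weight": "..."}


loader = LazyLoader(fake_load)
calls_after_construction = len(load_calls)
first = loader.ensure_loaded()
second = loader.ensure_loaded()
```

This is why `reset` can be called every episode at no cost: after the first call it only returns the cached actor.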

Device placement (Phase-0): inference runs on CPU. HAPPO actor forward passes are microseconds-scale and not the project's throughput bottleneck (the OpenVLA partner stratum is); partner-side GPU placement is deferred to a future partner-config plane and would need its own ADR amendment.

Source code in src/chamber/partners/frozen_harl.py
@register_partner("frozen_harl")
class FrozenHARLPartner(PartnerBase):
    """Frozen HARL HAPPO partner (ADR-009 §Decision; plan/04 §3.6).

    Loads a HARL :class:`StochasticPolicy` checkpoint via
    :func:`concerto.training.checkpoints.load_checkpoint` (which verifies
    the sidecar SHA-256), freezes every parameter
    (``requires_grad=False``), and runs inference under
    :func:`torch.no_grad` with ``deterministic=True``. The actor's layer
    widths are inferred from the loaded state-dict tensor shapes.

    Spec contract (plan/04 §3.6; ADR-009 §Decision):

    - ``spec.class_name == "frozen_harl"``.
    - ``spec.weights_uri`` is the ``local://artifacts/<name>.pt`` URI.
    - ``spec.extra["uid"]`` is the env-side uid the partner acts on;
      it indexes ``obs["agent"][uid]["state"]`` at action time.

    Statefulness (plan/04 §2): the partner is stateless across episodes;
    :meth:`reset` is a no-op once the checkpoint is loaded. Loading is
    deferred to the first :meth:`reset` / :meth:`act` call so the
    registry lookup in :func:`chamber.partners.registry.load_partner`
    stays cheap.

    Device placement (Phase-0): inference runs on CPU. HAPPO actor
    forward passes are microseconds-scale and not the project's
    throughput bottleneck (the OpenVLA partner stratum is); partner-side
    GPU placement is deferred to a future partner-config plane and would
    need its own ADR amendment.
    """

    def __init__(self, spec: PartnerSpec) -> None:
        """Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

        Args:
            spec: Identity-bearing handle. Reads ``spec.extra["uid"]``
                for the env-side agent uid and ``spec.weights_uri`` for
                the checkpoint URI. Construction does not touch disk.

        Raises:
            ValueError: If ``spec.weights_uri`` is ``None`` or
                ``spec.extra["uid"]`` is missing / empty.
        """
        super().__init__(spec)
        if spec.weights_uri is None:
            msg = (
                f"FrozenHARLPartner requires spec.weights_uri (ADR-009 §Decision); "
                f"got None on spec {spec.class_name!r}."
            )
            raise ValueError(msg)
        uid = spec.extra.get("uid", "")
        if not uid:
            msg = (
                "FrozenHARLPartner requires spec.extra['uid'] (the env-side agent uid; "
                "plan/04 §3.6)."
            )
            raise ValueError(msg)
        self._uid: str = uid
        self._weights_uri: str = spec.weights_uri
        self._actor: StochasticPolicy | None = None
        self._obs_dim: int = 0
        self._action_dim: int = 0
        self._hidden_dim: int = 0

    def reset(self, *, seed: int | None = None) -> None:
        """Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

        Args:
            seed: Accepted for
                :class:`~chamber.partners.api.FrozenPartner` Protocol
                conformance. The partner is fully deterministic given
                the loaded weights and ``obs``; seed is ignored.
        """
        del seed
        self._ensure_loaded()

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Greedy HARL-actor forward pass under no-grad (ADR-009 §Decision).

        Args:
            obs: Gymnasium observation mapping. Reads the ego's flat state
                from ``obs["agent"][uid]["state"]`` (plan/04 §3.4); other
                keys are ignored.
            deterministic: Accepted for Protocol conformance but ignored.
                Phase-0 pins the partner to the distribution mode so a
                frozen partner's actions stay decoupled from torch's
                global RNG (see module docstring).

        Returns:
            ``np.float32`` action vector of length ``action_dim`` (the
            mean head's output dimension, inferred from the loaded
            checkpoint).

        Raises:
            ValueError: If ``obs["agent"][uid]["state"]`` is missing or
                its length disagrees with the loaded actor's input
                dimension.
        """
        del deterministic
        actor = self._ensure_loaded()
        state = _read_flat_state(obs, self._uid)
        if state.shape != (self._obs_dim,):
            msg = (
                f"FrozenHARLPartner: obs['agent'][{self._uid!r}]['state'] has shape "
                f"{tuple(state.shape)}, expected ({self._obs_dim},) matching the loaded "
                f"actor's input dim."
            )
            raise ValueError(msg)
        # HARL's StochasticPolicy.forward signature is
        # ``(obs, rnn_states, masks, available_actions, deterministic)``.
        # rnn_states + masks are placeholders since use_recurrent_policy=False;
        # we pass shape-correct zeros / ones so HARL's internal ``check`` call
        # accepts them. Mirrors EgoPPOTrainer.act's call site.
        obs_t = state.reshape(1, -1)
        rnn_states = np.zeros((1, 1, self._hidden_dim), dtype=np.float32)
        masks = np.ones((1, 1), dtype=np.float32)
        with torch.no_grad():
            action_t, _, _ = actor(obs_t, rnn_states, masks, None, True)
        return action_t.detach().cpu().numpy().squeeze(0).astype(np.float32)

    def _ensure_loaded(self) -> StochasticPolicy:
        """Load + freeze the HARL actor on first use; idempotent (plan/04 §3.3).

        Returns:
            The loaded :class:`StochasticPolicy` (cached after first call).

        Raises:
            CheckpointError: Propagated from
                :func:`concerto.training.checkpoints.load_checkpoint`
                when the sidecar is missing, malformed, or the SHA-256
                disagrees with the on-disk payload.
            ValueError: When the loaded state-dict does not match the
                expected HARL HAPPO actor layout.
        """
        if self._actor is not None:
            return self._actor
        artifacts_root = _resolve_artifacts_root()
        loaded, _meta = load_checkpoint(uri=self._weights_uri, artifacts_root=artifacts_root)
        actor_sd = _extract_actor_state_dict(loaded)
        obs_dim, hidden_dim, action_dim = _infer_shape(actor_sd)
        args = dict(_HARL_INFERENCE_ARGS)
        args["hidden_sizes"] = [hidden_dim, hidden_dim]
        obs_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(obs_dim,), dtype=np.float32)
        act_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(action_dim,), dtype=np.float32)
        actor = StochasticPolicy(args, obs_space, act_space, torch.device("cpu"))
        try:
            actor.load_state_dict(dict(actor_sd))
        except RuntimeError as exc:
            msg = f"FrozenHARLPartner: load_state_dict failed for {self._weights_uri!r}: {exc}"
            raise ValueError(msg) from exc
        for param in actor.parameters():
            param.requires_grad = False
        actor.eval()
        self._actor = actor
        self._obs_dim = obs_dim
        self._action_dim = action_dim
        self._hidden_dim = hidden_dim
        return actor
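The comment block in `act` above explains that HARL's `StochasticPolicy.forward` expects recurrent inputs even when `use_recurrent_policy=False`. A NumPy sketch of just the array shapes passed in and taken out, with illustrative dimensions; the middle axis of `rnn_states` is presumably the recurrent-layer count (1 here), and the actor call itself is replaced by a stand-in output array because HARL is not imported.

```python
import numpy as np

obs_dim, hidden_dim, action_dim = 6, 64, 3  # illustrative; the partner infers these from the checkpoint

state = np.zeros(obs_dim, dtype=np.float32)  # flat ego state, shape (obs_dim,)
obs_t = state.reshape(1, -1)                 # add a batch axis: (1, obs_dim)

# Recurrent placeholders: ignored by a non-recurrent policy, but still shape-checked.
rnn_states = np.zeros((1, 1, hidden_dim), dtype=np.float32)  # (batch, 1, hidden_dim)
masks = np.ones((1, 1), dtype=np.float32)                    # (batch, 1)

# Stand-in for the actor's batched action output; act() drops the batch axis again.
fake_action_batch = np.full((1, action_dim), 0.5, dtype=np.float64)
action = fake_action_batch.squeeze(0).astype(np.float32)
```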

__init__

__init__(spec: PartnerSpec) -> None

Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

Parameters:

  • spec (PartnerSpec) –

    Identity-bearing handle. Reads spec.extra["uid"] for the env-side agent uid and spec.weights_uri for the checkpoint URI. Construction does not touch disk.

Raises:

  • ValueError

    If spec.weights_uri is None or spec.extra["uid"] is missing / empty.

Source code in src/chamber/partners/frozen_harl.py
def __init__(self, spec: PartnerSpec) -> None:
    """Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

    Args:
        spec: Identity-bearing handle. Reads ``spec.extra["uid"]``
            for the env-side agent uid and ``spec.weights_uri`` for
            the checkpoint URI. Construction does not touch disk.

    Raises:
        ValueError: If ``spec.weights_uri`` is ``None`` or
            ``spec.extra["uid"]`` is missing / empty.
    """
    super().__init__(spec)
    if spec.weights_uri is None:
        msg = (
            f"FrozenHARLPartner requires spec.weights_uri (ADR-009 §Decision); "
            f"got None on spec {spec.class_name!r}."
        )
        raise ValueError(msg)
    uid = spec.extra.get("uid", "")
    if not uid:
        msg = (
            "FrozenHARLPartner requires spec.extra['uid'] (the env-side agent uid; "
            "plan/04 §3.6)."
        )
        raise ValueError(msg)
    self._uid: str = uid
    self._weights_uri: str = spec.weights_uri
    self._actor: StochasticPolicy | None = None
    self._obs_dim: int = 0
    self._action_dim: int = 0
    self._hidden_dim: int = 0

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Greedy HARL-actor forward pass under no-grad (ADR-009 §Decision).

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation mapping. Reads the ego's flat state from obs["agent"][uid]["state"] (plan/04 §3.4); other keys are ignored.

  • deterministic (bool, default: True ) –

    Accepted for Protocol conformance but ignored. Phase-0 pins the partner to the distribution mode so a frozen partner's actions stay decoupled from torch's global RNG (see module docstring).

Returns:

  • NDArray[floating]

    np.float32 action vector of length action_dim (the mean head's output dimension, inferred from the loaded checkpoint).

Raises:

  • ValueError

    If obs["agent"][uid]["state"] is missing or its length disagrees with the loaded actor's input dimension.

Source code in src/chamber/partners/frozen_harl.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Greedy HARL-actor forward pass under no-grad (ADR-009 §Decision).

    Args:
        obs: Gymnasium observation mapping. Reads the ego's flat state
            from ``obs["agent"][uid]["state"]`` (plan/04 §3.4); other
            keys are ignored.
        deterministic: Accepted for Protocol conformance but ignored.
            Phase-0 pins the partner to the distribution mode so a
            frozen partner's actions stay decoupled from torch's
            global RNG (see module docstring).

    Returns:
        ``np.float32`` action vector of length ``action_dim`` (the
        mean head's output dimension, inferred from the loaded
        checkpoint).

    Raises:
        ValueError: If ``obs["agent"][uid]["state"]`` is missing or
            its length disagrees with the loaded actor's input
            dimension.
    """
    del deterministic
    actor = self._ensure_loaded()
    state = _read_flat_state(obs, self._uid)
    if state.shape != (self._obs_dim,):
        msg = (
            f"FrozenHARLPartner: obs['agent'][{self._uid!r}]['state'] has shape "
            f"{tuple(state.shape)}, expected ({self._obs_dim},) matching the loaded "
            f"actor's input dim."
        )
        raise ValueError(msg)
    # HARL's StochasticPolicy.forward signature is
    # ``(obs, rnn_states, masks, available_actions, deterministic)``.
    # rnn_states + masks are placeholders since use_recurrent_policy=False;
    # we pass shape-correct zeros / ones so HARL's internal ``check`` call
    # accepts them. Mirrors EgoPPOTrainer.act's call site.
    obs_t = state.reshape(1, -1)
    rnn_states = np.zeros((1, 1, self._hidden_dim), dtype=np.float32)
    masks = np.ones((1, 1), dtype=np.float32)
    with torch.no_grad():
        action_t, _, _ = actor(obs_t, rnn_states, masks, None, True)
    return action_t.detach().cpu().numpy().squeeze(0).astype(np.float32)

reset

reset(*, seed: int | None = None) -> None

Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

Parameters:

  • seed (int | None, default: None ) –

    Accepted for :class:~chamber.partners.api.FrozenPartner Protocol conformance. The partner is fully deterministic given the loaded weights and obs; seed is ignored.

Source code in src/chamber/partners/frozen_harl.py
def reset(self, *, seed: int | None = None) -> None:
    """Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

    Args:
        seed: Accepted for
            :class:`~chamber.partners.api.FrozenPartner` Protocol
            conformance. The partner is fully deterministic given
            the loaded weights and ``obs``; seed is ignored.
    """
    del seed
    self._ensure_loaded()

FrozenMAPPOPartner

Bases: PartnerBase

Frozen shared-parameter MAPPO partner (ADR-009 §Decision; plan/04 §3.5).

Loads a :class:_MAPPOActor checkpoint via :func:concerto.training.checkpoints.load_checkpoint (which verifies the sidecar SHA-256), freezes every parameter (requires_grad=False), and runs inference under :func:torch.no_grad. The actor's layer widths are inferred from the loaded state-dict tensor shapes — no architecture-metadata sidecar is required (ADR-002 §Decisions: provenance lives on the sidecar; shape lives in the tensors).
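Inferring layer widths from tensor shapes works because a linear layer's weight is stored as `(out_features, in_features)`. The sketch below mirrors the consistency checks in `_build_actor_from_state_dict`, using NumPy arrays in place of tensors; the key strings are assumptions, since the module's `_FC1_WEIGHT_KEY` / `_FC2_WEIGHT_KEY` / `_HEAD_WEIGHT_KEY` values are not shown, and `infer_mlp_dims` is hypothetical.

```python
import numpy as np

# Assumed key names; the module's _FC1/_FC2/_HEAD_WEIGHT_KEY constants are not shown here.
FC1, FC2, HEAD = "fc1.weight", "fc2.weight", "head.weight"


def infer_mlp_dims(state_dict):
    """Return (obs_dim, hidden_dim, action_dim) read from the weight shapes.

    Weights are (out, in): fc1 is (hidden, obs), fc2 is (hidden, hidden),
    head is (action, hidden).
    """
    fc1, fc2, head = state_dict[FC1], state_dict[FC2], state_dict[HEAD]
    hidden_dim, obs_dim = fc1.shape
    action_dim = head.shape[0]
    if fc2.shape != (hidden_dim, hidden_dim):
        raise ValueError(f"fc2 shape {fc2.shape} disagrees with hidden_dim {hidden_dim}")
    if head.shape[1] != hidden_dim:
        raise ValueError(f"head input dim {head.shape[1]} disagrees with hidden_dim {hidden_dim}")
    return int(obs_dim), int(hidden_dim), int(action_dim)


sd = {FC1: np.zeros((64, 10)), FC2: np.zeros((64, 64)), HEAD: np.zeros((2, 64))}
dims = infer_mlp_dims(sd)  # (10, 64, 2)
```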

Spec contract (plan/04 §3.5; ADR-009 §Decision):

  • spec.class_name == "frozen_mappo".
  • spec.weights_uri is the local://artifacts/<name>.pt URI.
  • spec.extra["uid"] is the env-side uid the partner acts on; it indexes obs["agent"][uid]["state"] at action time.

Statefulness (plan/04 §2): the partner is stateless across episodes; :meth:reset is a no-op once the checkpoint is loaded. Loading is deferred to the first :meth:reset / :meth:act call so the registry lookup in :func:chamber.partners.registry.load_partner stays cheap and so test fixtures can construct + monkey-patch the env var before any I/O happens.
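Reading the artifacts root from the environment at load time, rather than at import or construction time, is what allows a fixture to redirect it before any I/O. A sketch of resolving a `local://artifacts/<name>.pt` URI under that idea; the environment-variable name, the default root, and the exact path mapping are all assumptions, since `_resolve_artifacts_root` is not shown on this page.

```python
import os
from pathlib import Path

ARTIFACTS_ENV = "CHAMBER_ARTIFACTS_ROOT"  # hypothetical env var name
SCHEME = "local://artifacts/"


def resolve_local_uri(uri: str, default_root: str = "artifacts") -> Path:
    """Map local://artifacts/<name>.pt onto <root>/<name>.pt (assumed layout).

    The root is read from the environment on every call, so a test fixture
    can set it after constructing the partner but before the first load.
    """
    if not uri.startswith(SCHEME):
        raise ValueError(f"unsupported weights URI {uri!r}")
    rel = uri[len(SCHEME):]
    if not rel or ".." in Path(rel).parts:  # refuse empty or escaping paths
        raise ValueError(f"bad artifact path in {uri!r}")
    root = Path(os.environ.get(ARTIFACTS_ENV, default_root))
    return root / rel


os.environ[ARTIFACTS_ENV] = "/tmp/run42"  # what pytest's monkeypatch.setenv would do in a fixture
resolved = resolve_local_uri("local://artifacts/mappo_seed0.pt")
```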

Source code in src/chamber/partners/frozen_mappo.py
@register_partner("frozen_mappo")
class FrozenMAPPOPartner(PartnerBase):
    """Frozen shared-parameter MAPPO partner (ADR-009 §Decision; plan/04 §3.5).

    Loads a :class:`_MAPPOActor` checkpoint via
    :func:`concerto.training.checkpoints.load_checkpoint` (which verifies
    the sidecar SHA-256), freezes every parameter
    (``requires_grad=False``), and runs inference under
    :func:`torch.no_grad`. The actor's layer widths are inferred from the
    loaded state-dict tensor shapes — no architecture-metadata sidecar is
    required (ADR-002 §Decisions: provenance lives on the sidecar; shape
    lives in the tensors).

    Spec contract (plan/04 §3.5; ADR-009 §Decision):

    - ``spec.class_name == "frozen_mappo"``.
    - ``spec.weights_uri`` is the ``local://artifacts/<name>.pt`` URI.
    - ``spec.extra["uid"]`` is the env-side uid the partner acts on;
      it indexes ``obs["agent"][uid]["state"]`` at action time.

    Statefulness (plan/04 §2): the partner is stateless across episodes;
    :meth:`reset` is a no-op once the checkpoint is loaded. Loading is
    deferred to the first :meth:`reset` / :meth:`act` call so the
    registry lookup in :func:`chamber.partners.registry.load_partner`
    stays cheap and so test fixtures can construct + monkey-patch the
    env var before any I/O happens.
    """

    def __init__(self, spec: PartnerSpec) -> None:
        """Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

        Args:
            spec: Identity-bearing handle. Reads ``spec.extra["uid"]``
                for the env-side agent uid and ``spec.weights_uri`` for
                the checkpoint URI. Construction does not touch disk —
                the checkpoint is loaded lazily on first action / reset.

        Raises:
            ValueError: If ``spec.weights_uri`` is ``None`` or
                ``spec.extra["uid"]`` is missing / empty.
        """
        super().__init__(spec)
        if spec.weights_uri is None:
            msg = (
                f"FrozenMAPPOPartner requires spec.weights_uri (ADR-009 §Decision); "
                f"got None on spec {spec.class_name!r}."
            )
            raise ValueError(msg)
        uid = spec.extra.get("uid", "")
        if not uid:
            msg = (
                "FrozenMAPPOPartner requires spec.extra['uid'] (the env-side agent uid; "
                "plan/04 §3.5)."
            )
            raise ValueError(msg)
        self._uid: str = uid
        self._weights_uri: str = spec.weights_uri
        self._actor: _MAPPOActor | None = None

    def reset(self, *, seed: int | None = None) -> None:
        """Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

        Args:
            seed: Accepted for :class:`~chamber.partners.api.FrozenPartner`
                Protocol conformance. The partner is fully deterministic
                given the loaded weights and ``obs``; seed is ignored.
        """
        del seed
        self._ensure_loaded()

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Greedy MAPPO-actor forward pass under no-grad (ADR-009 §Decision).

        Args:
            obs: Gymnasium observation mapping. Reads the ego's flat state
                from ``obs["agent"][uid]["state"]`` (plan/04 §3.4); other
                keys are ignored.
            deterministic: Accepted for Protocol conformance. The
                Phase-0 actor head is a deterministic linear layer with
                no sampled noise, so this kwarg has no effect.

        Returns:
            ``np.float32`` action vector of length ``action_dim`` (the
            head's output dimension, inferred from the loaded
            checkpoint).

        Raises:
            ValueError: If ``obs["agent"][uid]["state"]`` is missing or
                its length disagrees with the loaded actor's input
                dimension.
        """
        del deterministic
        actor = self._ensure_loaded()
        state = _read_flat_state(obs, self._uid)
        expected = actor.fc1.in_features
        if state.shape != (expected,):
            msg = (
                f"FrozenMAPPOPartner: obs['agent'][{self._uid!r}]['state'] has shape "
                f"{tuple(state.shape)}, expected ({expected},) matching the loaded "
                f"actor's fc1 input dim."
            )
            raise ValueError(msg)
        with torch.no_grad():
            input_t = torch.from_numpy(state).unsqueeze(0)
            action_t = actor(input_t)
        return action_t.squeeze(0).detach().cpu().numpy().astype(np.float32)

    def _ensure_loaded(self) -> _MAPPOActor:
        """Load + freeze the actor on first use; idempotent (plan/04 §3.3).

        Returns:
            The loaded :class:`_MAPPOActor` (cached after first call).

        Raises:
            CheckpointError: Propagated from
                :func:`concerto.training.checkpoints.load_checkpoint`
                when the sidecar is missing, malformed, or the SHA-256
                disagrees with the on-disk payload.
            ValueError: When the loaded state-dict does not match the
                expected MAPPO actor layout.
        """
        if self._actor is not None:
            return self._actor
        artifacts_root = _resolve_artifacts_root()
        loaded, _meta = load_checkpoint(uri=self._weights_uri, artifacts_root=artifacts_root)
        actor = self._build_actor_from_state_dict(loaded)
        for param in actor.parameters():
            param.requires_grad = False
        actor.eval()
        self._actor = actor
        return actor

    def _build_actor_from_state_dict(self, loaded: Mapping[str, object]) -> _MAPPOActor:
        """Reconstruct the MLP actor from a loaded checkpoint dict (plan/04 §3.5).

        Accepts both nested (``{"actor": <flat_state_dict>}``) and flat
        layouts. The nested layout is the canonical Phase-0 format
        produced by :meth:`chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.state_dict`
        (the ``"actor"`` sub-key); the flat layout is the convenience
        format the unit tests + manual reproduction scripts use.

        Args:
            loaded: The dict :func:`load_checkpoint` returned.

        Returns:
            A fresh :class:`_MAPPOActor` with weights copied from
            ``loaded``.

        Raises:
            ValueError: When required layer keys are missing or shapes
                disagree.
        """
        state_dict = _extract_actor_state_dict(loaded)
        try:
            fc1_weight = state_dict[_FC1_WEIGHT_KEY]
            fc2_weight = state_dict[_FC2_WEIGHT_KEY]
            head_weight = state_dict[_HEAD_WEIGHT_KEY]
        except KeyError as exc:
            msg = (
                f"FrozenMAPPOPartner: loaded actor state-dict is missing required "
                f"layer key {exc.args[0]!r}; expected the fc1/fc2/head MLP layout "
                f"(plan/04 §3.5). Got keys: {sorted(state_dict.keys())}."
            )
            raise ValueError(msg) from exc
        if not (
            isinstance(fc1_weight, torch.Tensor)
            and isinstance(fc2_weight, torch.Tensor)
            and isinstance(head_weight, torch.Tensor)
        ):
            msg = (
                "FrozenMAPPOPartner: actor state-dict layer values must be "
                "torch.Tensor; got non-tensor entries."
            )
            raise ValueError(msg)
        hidden_dim, obs_dim = int(fc1_weight.shape[0]), int(fc1_weight.shape[1])
        action_dim = int(head_weight.shape[0])
        if fc2_weight.shape != (hidden_dim, hidden_dim):
            msg = (
                f"FrozenMAPPOPartner: fc2 shape {tuple(fc2_weight.shape)} disagrees "
                f"with inferred hidden_dim {hidden_dim} (plan/04 §3.5)."
            )
            raise ValueError(msg)
        if head_weight.shape[1] != hidden_dim:
            msg = (
                f"FrozenMAPPOPartner: head input dim {head_weight.shape[1]} disagrees "
                f"with inferred hidden_dim {hidden_dim} (plan/04 §3.5)."
            )
            raise ValueError(msg)
        actor = _MAPPOActor(obs_dim=obs_dim, hidden_dim=hidden_dim, action_dim=action_dim)
        try:
            actor.load_state_dict(dict(state_dict))
        except RuntimeError as exc:
            msg = f"FrozenMAPPOPartner: load_state_dict failed for {self._weights_uri!r}: {exc}"
            raise ValueError(msg) from exc
        return actor
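`_build_actor_from_state_dict` accepts both the nested trainer layout (`{"actor": <flat_state_dict>}`) and a flat state-dict. The unwrapping step, delegated to `_extract_actor_state_dict`, could look roughly like this; the function below is a hypothetical sketch, and the real helper may validate more.

```python
from collections.abc import Mapping


def extract_actor_state_dict(loaded: Mapping) -> Mapping:
    """Return loaded["actor"] when it is a mapping (nested layout), else loaded itself (flat)."""
    nested = loaded.get("actor")
    if isinstance(nested, Mapping):
        return nested
    return loaded


inner = {"fc1.weight": 1, "fc2.weight": 2, "head.weight": 3}
from_nested = extract_actor_state_dict({"actor": inner, "critic": {}})  # trainer output
from_flat = extract_actor_state_dict(inner)                             # test / script layout
```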

__init__

__init__(spec: PartnerSpec) -> None

Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

Parameters:

  • spec (PartnerSpec) –

    Identity-bearing handle. Reads spec.extra["uid"] for the env-side agent uid and spec.weights_uri for the checkpoint URI. Construction does not touch disk — the checkpoint is loaded lazily on first action / reset.

Raises:

  • ValueError

    If spec.weights_uri is None or spec.extra["uid"] is missing / empty.

Source code in src/chamber/partners/frozen_mappo.py
def __init__(self, spec: PartnerSpec) -> None:
    """Bind the spec; defer checkpoint I/O (ADR-009 §Decision; plan/04 §3.3).

    Args:
        spec: Identity-bearing handle. Reads ``spec.extra["uid"]``
            for the env-side agent uid and ``spec.weights_uri`` for
            the checkpoint URI. Construction does not touch disk —
            the checkpoint is loaded lazily on first action / reset.

    Raises:
        ValueError: If ``spec.weights_uri`` is ``None`` or
            ``spec.extra["uid"]`` is missing / empty.
    """
    super().__init__(spec)
    if spec.weights_uri is None:
        msg = (
            f"FrozenMAPPOPartner requires spec.weights_uri (ADR-009 §Decision); "
            f"got None on spec {spec.class_name!r}."
        )
        raise ValueError(msg)
    uid = spec.extra.get("uid", "")
    if not uid:
        msg = (
            "FrozenMAPPOPartner requires spec.extra['uid'] (the env-side agent uid; "
            "plan/04 §3.5)."
        )
        raise ValueError(msg)
    self._uid: str = uid
    self._weights_uri: str = spec.weights_uri
    self._actor: _MAPPOActor | None = None

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Greedy MAPPO-actor forward pass under no-grad (ADR-009 §Decision).

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation mapping. Reads this partner's own flat state from obs["agent"][uid]["state"], where uid is spec.extra["uid"] (plan/04 §3.4); other keys are ignored.

  • deterministic (bool, default: True ) –

    Accepted for Protocol conformance. The Phase-0 actor head is a deterministic linear layer with no sampled noise, so this kwarg has no effect.

Returns:

  • NDArray[floating]

    np.float32 action vector of length action_dim (the head's output dimension, inferred from the loaded checkpoint).

Raises:

  • ValueError

    If obs["agent"][uid]["state"] is missing or its length disagrees with the loaded actor's input dimension.

Source code in src/chamber/partners/frozen_mappo.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Greedy MAPPO-actor forward pass under no-grad (ADR-009 §Decision).

    Args:
        obs: Gymnasium observation mapping. Reads the ego's flat state
            from ``obs["agent"][uid]["state"]`` (plan/04 §3.4); other
            keys are ignored.
        deterministic: Accepted for Protocol conformance. The
            Phase-0 actor head is a deterministic linear layer with
            no sampled noise, so this kwarg has no effect.

    Returns:
        ``np.float32`` action vector of length ``action_dim`` (the
        head's output dimension, inferred from the loaded
        checkpoint).

    Raises:
        ValueError: If ``obs["agent"][uid]["state"]`` is missing or
            its length disagrees with the loaded actor's input
            dimension.
    """
    del deterministic
    actor = self._ensure_loaded()
    state = _read_flat_state(obs, self._uid)
    expected = actor.fc1.in_features
    if state.shape != (expected,):
        msg = (
            f"FrozenMAPPOPartner: obs['agent'][{self._uid!r}]['state'] has shape "
            f"{tuple(state.shape)}, expected ({expected},) matching the loaded "
            f"actor's fc1 input dim."
        )
        raise ValueError(msg)
    with torch.no_grad():
        input_t = torch.from_numpy(state).unsqueeze(0)
        action_t = actor(input_t)
    return action_t.squeeze(0).detach().cpu().numpy().astype(np.float32)
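The shape guard in act() can be illustrated without torch. The following is a sketch under stated assumptions: a NumPy linear layer stands in for the MAPPO actor, and `LinearActorStub` and `act_sketch` are hypothetical names. The only detail taken from the source is that the expected input width comes from the actor's first layer (`fc1.in_features` there, `in_features` here).

```python
from __future__ import annotations

from collections.abc import Mapping

import numpy as np
from numpy.typing import NDArray


class LinearActorStub:
    """Hypothetical single-layer actor: action = weight @ state + bias."""

    def __init__(self, weight: NDArray[np.float32], bias: NDArray[np.float32]) -> None:
        self.weight = weight  # shape (action_dim, in_features)
        self.bias = bias  # shape (action_dim,)

    @property
    def in_features(self) -> int:
        return int(self.weight.shape[1])

    def __call__(self, state: NDArray[np.float32]) -> NDArray[np.float32]:
        return self.weight @ state + self.bias


def act_sketch(actor: LinearActorStub, obs: Mapping[str, object], uid: str) -> NDArray[np.float32]:
    """Read obs["agent"][uid]["state"], check its shape, return a float32 action."""
    agent = obs.get("agent")
    entry = agent.get(uid) if isinstance(agent, Mapping) else None
    state = entry.get("state") if isinstance(entry, Mapping) else None
    if not isinstance(state, np.ndarray):
        raise ValueError(f"obs['agent'][{uid!r}]['state'] is missing")
    expected = actor.in_features
    if state.shape != (expected,):
        raise ValueError(f"state has shape {state.shape}, expected ({expected},)")
    return np.asarray(actor(state.astype(np.float32)), dtype=np.float32)
```

Checking the shape before the forward pass turns a silent broadcasting error or an opaque matmul failure into a ValueError that names the offending uid.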

reset

reset(*, seed: int | None = None) -> None

Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

Parameters:

  • seed (int | None, default: None ) –

    Accepted for FrozenPartner Protocol conformance. The partner is fully deterministic given the loaded weights and obs; seed is ignored.

Source code in src/chamber/partners/frozen_mappo.py
def reset(self, *, seed: int | None = None) -> None:
    """Ensure the checkpoint is loaded; no per-episode state (ADR-009 §Decision; plan/04 §2).

    Args:
        seed: Accepted for :class:`~chamber.partners.api.FrozenPartner`
            Protocol conformance. The partner is fully deterministic
            given the loaded weights and ``obs``; seed is ignored.
    """
    del seed
    self._ensure_loaded()

FrozenPartner

Bases: Protocol

A partner the ego agent treats as a black box (ADR-009 §Decision).

Implementations MUST NOT expose any update / train / learn methods at runtime. The base class chamber.partners.interface.PartnerBase enforces this via a __getattr__ shield over _FORBIDDEN_ATTRS. Type checkers reject Protocol-typed access to train / learn / update_params because those names are not part of this Protocol.

Stateless across episodes: reset(*, seed) clears every internal cache (plan/04 §2 "Partner state"). Phase 0 deliberately excludes recurrent partner policies; supporting them in Phase 1 would require an ADR amendment.

Source code in src/chamber/partners/api.py
@runtime_checkable
class FrozenPartner(Protocol):
    """A partner the ego agent treats as a black box (ADR-009 §Decision).

    Implementations MUST NOT expose any update / train / learn methods at
    runtime. The base class :class:`chamber.partners.interface.PartnerBase`
    enforces this via a ``__getattr__`` shield over ``_FORBIDDEN_ATTRS``.
    Type checkers reject Protocol-typed access to ``train`` / ``learn`` /
    ``update_params`` because those names are not part of this Protocol.

    Stateless across episodes: ``reset(*, seed)`` clears every internal
    cache (plan/04 §2 "Partner state" — Phase-0 deliberately excludes
    recurrent partner policies; if Phase 1 needs them, ADR amendment).
    """

    spec: PartnerSpec

    def reset(self, *, seed: int | None = None) -> None:
        """Reset internal episode state (ADR-009 §Decision; P6 reproducibility).

        Args:
            seed: Optional root seed for the partner's deterministic RNG
                substream. Implementations route this through
                :func:`concerto.training.seeding.derive_substream`.
        """
        ...  # pragma: no cover

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Map observation to action (ADR-009 §Decision; ADR-003 §Decision).

        The observation mapping is the Gymnasium-conformant
        ``{"agent": {uid: proprio}, "comm": CommPacket, "meta": {...}}``
        produced by :mod:`chamber.envs` wrappers. Partners read
        ``obs["agent"][uid]`` for proprioception and ``obs["comm"]`` for the
        ADR-003 fixed-format channel; they MUST NOT touch any other key.

        The accepted type is :class:`~collections.abc.Mapping` rather than
        :class:`dict` so that callers can pass concrete nested dicts without
        running into Mapping-vs-dict invariance issues at the call site.

        Args:
            obs: Gymnasium observation dict.
            deterministic: Whether to sample greedily (default ``True``);
                stochastic policies pass ``False``. Scripted partners ignore.

        Returns:
            Action vector for the partner's own uid, shape and dtype defined
            by the env's action space slice for that uid.
        """
        ...  # pragma: no cover
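Because FrozenPartner is decorated with @runtime_checkable, isinstance() can be used on it at runtime. The sketch below mirrors its surface in a local Protocol so that it runs without chamber installed. `PartnerLike`, `ZeroPartner` and `MissingReset` are hypothetical names. A runtime isinstance check on a Protocol only verifies that the members exist; it does not check signatures or return types.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Protocol, runtime_checkable

import numpy as np
from numpy.typing import NDArray


@runtime_checkable
class PartnerLike(Protocol):
    """Hypothetical local mirror of the FrozenPartner surface (spec, reset, act)."""

    spec: object

    def reset(self, *, seed: int | None = None) -> None: ...

    def act(
        self, obs: Mapping[str, object], *, deterministic: bool = True
    ) -> NDArray[np.floating]: ...


class ZeroPartner:
    """Always returns a zero action of fixed length; keeps no episode state."""

    def __init__(self, spec: object, action_dim: int = 2) -> None:
        self.spec = spec
        self._action_dim = action_dim

    def reset(self, *, seed: int | None = None) -> None:
        del seed

    def act(self, obs: Mapping[str, object], *, deterministic: bool = True) -> NDArray[np.floating]:
        del obs, deterministic
        return np.zeros(self._action_dim, dtype=np.float32)


class MissingReset:
    """Has spec and act but no reset, so it does not satisfy the Protocol."""

    spec = None

    def act(self, obs, *, deterministic=True):
        return np.zeros(2, dtype=np.float32)
```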

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Map observation to action (ADR-009 §Decision; ADR-003 §Decision).

The observation mapping is the Gymnasium-conformant {"agent": {uid: proprio}, "comm": CommPacket, "meta": {...}} produced by chamber.envs wrappers. Partners read obs["agent"][uid] for proprioception and obs["comm"] for the ADR-003 fixed-format channel; they MUST NOT touch any other key.

The accepted type is collections.abc.Mapping rather than dict so that callers can pass concrete nested dicts without running into Mapping-vs-dict invariance issues at the call site.

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation dict.

  • deterministic (bool, default: True ) –

    Whether to sample greedily (default True); stochastic policies pass False. Scripted partners ignore it.

Returns:

  • NDArray[floating]

    Action vector for the partner's own uid, with shape and dtype defined by the env's action space slice for that uid.

Source code in src/chamber/partners/api.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Map observation to action (ADR-009 §Decision; ADR-003 §Decision).

    The observation mapping is the Gymnasium-conformant
    ``{"agent": {uid: proprio}, "comm": CommPacket, "meta": {...}}``
    produced by :mod:`chamber.envs` wrappers. Partners read
    ``obs["agent"][uid]`` for proprioception and ``obs["comm"]`` for the
    ADR-003 fixed-format channel; they MUST NOT touch any other key.

    The accepted type is :class:`~collections.abc.Mapping` rather than
    :class:`dict` so that callers can pass concrete nested dicts without
    running into Mapping-vs-dict invariance issues at the call site.

    Args:
        obs: Gymnasium observation dict.
        deterministic: Whether to sample greedily (default ``True``);
            stochastic policies pass ``False``. Scripted partners ignore.

    Returns:
        Action vector for the partner's own uid, shape and dtype defined
        by the env's action space slice for that uid.
    """
    ...  # pragma: no cover
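The key discipline described above ("agent" and "comm" only, never "meta") can be made concrete with a sample observation in the documented layout. This is an illustrative sketch. The contents of the "comm" entry are a plain dict with a "pose" table, which is an assumption modelled on the ScriptedHeuristicPartner docs; the real CommPacket type is defined elsewhere. `partner_view` is a hypothetical helper, and the partner_id value is a placeholder.

```python
from __future__ import annotations

from collections.abc import Mapping

import numpy as np

# Hypothetical observation following {"agent": {uid: proprio}, "comm": ..., "meta": {...}}.
obs: dict[str, object] = {
    "agent": {"panda": {"state": np.array([0.2, -0.1, 0.0], dtype=np.float32)}},
    "comm": {"pose": {"panda": {"xyz": (0.2, -0.1, 0.0)}}},
    "meta": {"partner_id": "0123456789abcdef"},  # placeholder; read by the filter, not partners
}


def partner_view(obs: Mapping[str, object], uid: str) -> tuple[object, object]:
    """Return only what a partner may read: its own proprio entry and the comm channel."""
    agent = obs["agent"]
    if not isinstance(agent, Mapping):
        raise TypeError("obs['agent'] must be a mapping keyed by uid")
    return agent[uid], obs.get("comm")
```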

reset

reset(*, seed: int | None = None) -> None

Reset internal episode state (ADR-009 §Decision; P6 reproducibility).

Parameters:

  • seed (int | None, default: None ) –

    Optional root seed for the partner's deterministic RNG substream. Implementations route this through concerto.training.seeding.derive_substream.

Source code in src/chamber/partners/api.py
def reset(self, *, seed: int | None = None) -> None:
    """Reset internal episode state (ADR-009 §Decision; P6 reproducibility).

    Args:
        seed: Optional root seed for the partner's deterministic RNG
            substream. Implementations route this through
            :func:`concerto.training.seeding.derive_substream`.
    """
    ...  # pragma: no cover
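The signature of concerto.training.seeding.derive_substream is not shown in this reference. As a hypothetical illustration of what "a deterministic RNG substream" means, the sketch below derives one reproducible NumPy generator per (root seed, stream name) pair. `derive_substream_sketch` and the stream names are assumptions, not the concerto API.

```python
from __future__ import annotations

import hashlib

import numpy as np


def derive_substream_sketch(root_seed: int, name: str) -> np.random.Generator:
    """Hypothetical stand-in: one independent, reproducible RNG per (root_seed, name)."""
    # Fold the stream name into a stable 64-bit integer. Python's built-in hash()
    # is salted per process, so it would break reproducibility across runs.
    name_key = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "little")
    # SeedSequence mixes the (non-negative) root seed and name key into a well-spread state.
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))
```

A stochastic partner could call something like this from reset(seed=...). The deterministic partners documented on this page ignore the seed instead.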

OpenVLAPartner

Bases: PartnerBase

OpenVLA LoRA-adapted specialist — Phase-1 stub (ADR-010 §Decision Option B).

LoRA fine-tune protocol (ADR-010 §Decision; ADR-009 §Decision): 10-150 demos, single GPU at ≤7.0 GB VRAM, 256-bin-per-DoF action tokenisation. The 5-15 Hz throughput ceiling is enforced at the env wrapper layer (chamber.envs.action_repeat), not here.

Source code in src/chamber/partners/stubs/openvla.py
@register_partner("openvla_lora_specialist")
class OpenVLAPartner(PartnerBase):
    """OpenVLA LoRA-adapted specialist — Phase-1 stub (ADR-010 §Decision Option B).

    LoRA fine-tune protocol (ADR-010 §Decision; ADR-009 §Decision):
    10-150 demos, single GPU at ≤7.0 GB VRAM, 256-bin-per-DoF action
    tokenisation. The 5-15 Hz throughput ceiling is enforced at the env
    wrapper layer (``chamber.envs.action_repeat``), not here.
    """

    def reset(self, *, seed: int | None = None) -> NoReturn:
        """Phase-1 stub — raises (ADR-010 §Decision Option B).

        Args:
            seed: Accepted for Protocol conformance only.

        Raises:
            NotImplementedError: Always; the real LoRA inference harness is
                Phase-1 work.
        """
        del seed
        raise NotImplementedError(
            f"OpenVLAPartner.reset is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
        )

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Phase-1 stub — raises (ADR-010 §Decision Option B).

        Args:
            obs: Gymnasium observation mapping.
            deterministic: Accepted for Protocol conformance only.

        Raises:
            NotImplementedError: Always; the real 256-bin action-token
                inference is Phase-1 work.
        """
        del obs, deterministic
        raise NotImplementedError(
            f"OpenVLAPartner.act is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
        )
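The stub does not implement the "256-bin-per-DoF action tokenisation" it refers to. As a generic illustration only, the sketch below discretises each degree of freedom independently into 256 uniform bins and maps tokens back to bin centres. The [-1, 1] action range and the uniform bin edges are assumptions; this is not the OpenVLA implementation, and `tokenize` / `detokenize` are hypothetical names.

```python
from __future__ import annotations

import numpy as np
from numpy.typing import NDArray

N_BINS = 256  # bins per degree of freedom, as stated above
LOW, HIGH = -1.0, 1.0  # assumption: actions are pre-normalised to [-1, 1]


def tokenize(action: NDArray[np.floating]) -> NDArray[np.int64]:
    """Map each DoF independently to an integer bin in [0, N_BINS - 1]."""
    scaled = (np.clip(action, LOW, HIGH) - LOW) / (HIGH - LOW)  # now in [0, 1]
    # Truncation floors the non-negative values; the top edge (1.0) joins the last bin.
    return np.minimum((scaled * N_BINS).astype(np.int64), N_BINS - 1)


def detokenize(tokens: NDArray[np.integer]) -> NDArray[np.float64]:
    """Return each bin's centre; the round-trip error is at most half a bin width."""
    width = (HIGH - LOW) / N_BINS
    return LOW + (tokens.astype(np.float64) + 0.5) * width
```

With 256 bins over a width-2 range, each bin spans 2/256 = 0.0078125, so the reconstruction error per DoF is at most 0.00390625.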

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Phase-1 stub — raises (ADR-010 §Decision Option B).

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation mapping.

  • deterministic (bool, default: True ) –

    Accepted for Protocol conformance only.

Raises:

  • NotImplementedError

    Always; the real 256-bin action-token inference is Phase-1 work.

Source code in src/chamber/partners/stubs/openvla.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Phase-1 stub — raises (ADR-010 §Decision Option B).

    Args:
        obs: Gymnasium observation mapping.
        deterministic: Accepted for Protocol conformance only.

    Raises:
        NotImplementedError: Always; the real 256-bin action-token
            inference is Phase-1 work.
    """
    del obs, deterministic
    raise NotImplementedError(
        f"OpenVLAPartner.act is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
    )

reset

reset(*, seed: int | None = None) -> NoReturn

Phase-1 stub — raises (ADR-010 §Decision Option B).

Parameters:

  • seed (int | None, default: None ) –

    Accepted for Protocol conformance only.

Raises:

  • NotImplementedError

    Always; the real LoRA inference harness is Phase-1 work.

Source code in src/chamber/partners/stubs/openvla.py
def reset(self, *, seed: int | None = None) -> NoReturn:
    """Phase-1 stub — raises (ADR-010 §Decision Option B).

    Args:
        seed: Accepted for Protocol conformance only.

    Raises:
        NotImplementedError: Always; the real LoRA inference harness is
            Phase-1 work.
    """
    del seed
    raise NotImplementedError(
        f"OpenVLAPartner.reset is Phase-1 work; see ADR-010 §Decision and {_PHASE1_TICKET}."
    )

PartnerBase

Common base class for every concrete partner (ADR-009 §Consequences).

Subclasses MUST implement reset(*, seed) and act(obs, *, deterministic) per the FrozenPartner Protocol. The base class blocks any attribute lookup in _FORBIDDEN_ATTRS so the AHT no-joint-training constraint is enforced even when a careless caller imports the underlying torch module directly.

Attributes:

  • spec (PartnerSpec) –

    The PartnerSpec the partner was constructed from. Read-only by convention; the registry assumes Class(spec) is the only constructor signature.

Source code in src/chamber/partners/interface.py
class PartnerBase:
    """Common base class for every concrete partner (ADR-009 §Consequences).

    Subclasses MUST implement ``reset(*, seed)`` and
    ``act(obs, *, deterministic)`` per the
    :class:`~chamber.partners.api.FrozenPartner` Protocol. The base class
    blocks any attribute lookup in :data:`_FORBIDDEN_ATTRS` so the AHT
    no-joint-training constraint is enforced even when a careless caller
    imports the underlying torch module directly.

    Attributes:
        spec: The :class:`~chamber.partners.api.PartnerSpec` the partner was
            constructed from. Read-only by convention; the registry assumes
            ``Class(spec)`` is the only constructor signature.
    """

    spec: PartnerSpec

    def __init__(self, spec: PartnerSpec) -> None:
        """Bind the partner spec (ADR-009 §Decision).

        Args:
            spec: Identity-bearing handle from
                :class:`~chamber.partners.api.PartnerSpec`. The constructor
                does no checkpoint I/O so subclasses can defer heavy loads
                until :meth:`reset`.
        """
        self.spec = spec

    def __getattr__(self, name: str) -> NoReturn:
        """Block forbidden attribute lookups (ADR-009 §Consequences).

        ``__getattr__`` is only invoked for names not found via the normal
        attribute lookup path, so this method does not interfere with
        legitimate methods defined on subclasses or with ``self.spec``.

        Args:
            name: Attribute name being looked up.

        Raises:
            AttributeError: Always — either with the ADR-009 message for
                forbidden names, or with the Pythonic-default message for
                anything else (so ``hasattr`` works as expected).
        """
        if name in _FORBIDDEN_ATTRS:
            raise AttributeError(
                f"FrozenPartner of class {type(self).__name__!r} forbids attribute "
                f"{name!r} (ADR-009 §Consequences: black-box AHT — no joint training)."
            )
        raise AttributeError(name)
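The shield relies on a Python data-model detail: __getattr__ runs only after normal attribute lookup (instance dict, class, bases) has failed. The sketch below mirrors the mechanism. `FORBIDDEN_ATTRS_SKETCH` is an assumption: the real _FORBIDDEN_ATTRS contents are not shown here, so the three names are taken from the FrozenPartner docstring. `ShieldedPartner` and `SneakyPartner` are hypothetical classes.

```python
from __future__ import annotations

from typing import NoReturn

# Assumption: names taken from the FrozenPartner docstring (train / learn / update_params).
FORBIDDEN_ATTRS_SKETCH = frozenset({"train", "learn", "update_params"})


class ShieldedPartner:
    """Minimal mirror of the PartnerBase __getattr__ shield (hypothetical)."""

    def __init__(self, spec: object) -> None:
        self.spec = spec  # attribute assignment never goes through __getattr__

    def __getattr__(self, name: str) -> NoReturn:
        # Only reached when normal lookup fails.
        if name in FORBIDDEN_ATTRS_SKETCH:
            raise AttributeError(
                f"{type(self).__name__!r} forbids attribute {name!r} (no joint training)"
            )
        raise AttributeError(name)  # plain AttributeError keeps hasattr() working


class SneakyPartner(ShieldedPartner):
    def train(self) -> str:
        # Defined on the class, so normal lookup succeeds and __getattr__ never runs.
        return "training"
```

As SneakyPartner shows, this mechanism only intercepts names that normal lookup does not find. In this sketch, a subclass that defines one of the forbidden names itself is not stopped by __getattr__.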

__getattr__

__getattr__(name: str) -> NoReturn

Block forbidden attribute lookups (ADR-009 §Consequences).

__getattr__ is only invoked for names not found via the normal attribute lookup path, so this method does not interfere with legitimate methods defined on subclasses or with self.spec.

Parameters:

  • name (str) –

    Attribute name being looked up.

Raises:

  • AttributeError

    Always — either with the ADR-009 message for forbidden names, or with the Pythonic-default message for anything else (so hasattr works as expected).

Source code in src/chamber/partners/interface.py
def __getattr__(self, name: str) -> NoReturn:
    """Block forbidden attribute lookups (ADR-009 §Consequences).

    ``__getattr__`` is only invoked for names not found via the normal
    attribute lookup path, so this method does not interfere with
    legitimate methods defined on subclasses or with ``self.spec``.

    Args:
        name: Attribute name being looked up.

    Raises:
        AttributeError: Always — either with the ADR-009 message for
            forbidden names, or with the Pythonic-default message for
            anything else (so ``hasattr`` works as expected).
    """
    if name in _FORBIDDEN_ATTRS:
        raise AttributeError(
            f"FrozenPartner of class {type(self).__name__!r} forbids attribute "
            f"{name!r} (ADR-009 §Consequences: black-box AHT — no joint training)."
        )
    raise AttributeError(name)

__init__

__init__(spec: PartnerSpec) -> None

Bind the partner spec (ADR-009 §Decision).

Parameters:

  • spec (PartnerSpec) –

    Identity-bearing handle from chamber.partners.api.PartnerSpec. The constructor does no checkpoint I/O so subclasses can defer heavy loads until reset().

Source code in src/chamber/partners/interface.py
def __init__(self, spec: PartnerSpec) -> None:
    """Bind the partner spec (ADR-009 §Decision).

    Args:
        spec: Identity-bearing handle from
            :class:`~chamber.partners.api.PartnerSpec`. The constructor
            does no checkpoint I/O so subclasses can defer heavy loads
            until :meth:`reset`.
    """
    self.spec = spec

PartnerSpec dataclass

Identity-bearing spec used to build, load, and address a partner (ADR-009 §Decision).

The partner_id property is the stable hash that the M3 conformal filter reads on every step (ADR-004 §risk-mitigation #2; ADR-006 risk #3). Two specs that hash to the same partner_id refer to the same logical partner; a partner-set change between steps re-initialises the conformal lambda.

Attributes:

  • class_name (str) –

    Registry key: the string passed to chamber.partners.registry.register_partner.

  • seed (int) –

    Training-time seed; for scripted partners use 0.

  • checkpoint_step (int | None) –

    Training step at which the checkpoint was taken; None for scripted partners or for stubs.

  • weights_uri (str | None) –

    local:// or remote URI pointing at the frozen checkpoint; None for scripted partners.

  • extra (dict[str, str]) –

    Free-form string→string dict for task / embodiment / uid metadata. Read by the partner implementation, NOT by the hash.
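The relationship between class_name, the register_partner decorator, and the "Class(spec) is the only constructor signature" convention can be sketched as a dictionary-backed registry. Every name below (`SpecStub`, `register_partner_sketch`, `build_partner_sketch`, `NoopPartner`) is hypothetical. The real chamber.partners.registry module is not documented on this page, so details such as duplicate-name handling are assumptions.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field
from typing import TypeVar


@dataclass(frozen=True)
class SpecStub:
    """Hypothetical stand-in carrying the PartnerSpec fields."""

    class_name: str
    seed: int = 0
    checkpoint_step: int | None = None
    weights_uri: str | None = None
    extra: dict[str, str] = field(default_factory=dict)


T = TypeVar("T")
_REGISTRY: dict[str, type] = {}


def register_partner_sketch(name: str) -> Callable[[type[T]], type[T]]:
    """Class decorator: file the class under `name` and return it unchanged."""

    def decorator(cls: type[T]) -> type[T]:
        if name in _REGISTRY:  # assumption: duplicates are rejected
            raise ValueError(f"partner {name!r} already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


def build_partner_sketch(spec: SpecStub) -> object:
    """Look up spec.class_name and construct with the single-argument Class(spec) form."""
    try:
        cls = _REGISTRY[spec.class_name]
    except KeyError as exc:
        raise KeyError(f"no partner registered under {spec.class_name!r}") from exc
    return cls(spec)


@register_partner_sketch("noop")
class NoopPartner:
    def __init__(self, spec: SpecStub) -> None:
        self.spec = spec
```

Fixing the constructor to Class(spec) lets the builder stay generic: every per-partner knob travels inside the spec, typically as strings in extra.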

Source code in src/chamber/partners/api.py
@dataclass(frozen=True)
class PartnerSpec:
    """Identity-bearing spec used to build, load, and address a partner (ADR-009 §Decision).

    The :attr:`partner_id` property is the stable hash that the M3 conformal
    filter reads on every step (ADR-004 §risk-mitigation #2; ADR-006 risk #3).
    Two specs that hash the same partner_id refer to the same logical partner;
    a partner-set change between steps re-initialises the conformal ``lambda``.

    Attributes:
        class_name: Registry key — the string passed to
            :func:`chamber.partners.registry.register_partner`.
        seed: Training-time seed; for scripted partners use ``0``.
        checkpoint_step: Training step at which the checkpoint was taken;
            ``None`` for scripted partners or for stubs.
        weights_uri: ``local://`` or remote URI pointing at the frozen
            checkpoint; ``None`` for scripted partners.
        extra: Free-form string→string dict for task / embodiment / uid
            metadata. Read by the partner implementation, NOT by the hash.
    """

    class_name: str
    seed: int
    checkpoint_step: int | None
    weights_uri: str | None
    extra: dict[str, str] = field(default_factory=dict)

    @property
    def partner_id(self) -> str:
        """Stable 16-hex-char hash of ``(class_name, seed, checkpoint_step, weights_uri)``.

        ADR-004 §risk-mitigation #2 / ADR-006 risk #3: surfaced on
        ``obs["meta"]["partner_id"]`` so the conformal filter can re-initialise
        ``lambda`` on partner identity change. The 16-char (64-bit) keyspace
        is sufficient for the Phase-1 zoo target of 144 partners with vast
        margin (plan/04 §8 Notes).

        The hash material uses :func:`repr` of a tuple of the four identity
        fields (in this fixed order) rather than f-string interpolation so
        ``None`` cannot collide with the literal string ``"None"`` — a real
        risk if a Phase-1 partner ever ships with ``weights_uri="None"``
        (ADR-006 risk #3 swap-detection contract). Reordering the fields
        in :class:`PartnerSpec` would silently change every existing
        partner_id; do not reorder without a new ADR.

        Returns:
            Lowercase 16-character hex digest of the SHA-256 of
            ``repr((class_name, seed, checkpoint_step, weights_uri))``.
            ``extra`` is intentionally excluded so per-task / per-embodiment
            metadata does not collide with the conformal filter's
            swap-detection contract (plan/04 §3.1).
        """
        material = repr((self.class_name, self.seed, self.checkpoint_step, self.weights_uri))
        return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
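The partner_id recipe can be checked directly. The sketch copies the hashing logic from the source above into a local dataclass so that it runs without chamber installed. `SpecSketch` is a hypothetical name. The test confirms the two documented properties: extra does not affect the id, and weights_uri=None differs from weights_uri="None".

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpecSketch:
    """Local copy of the PartnerSpec identity fields plus the unhashed `extra`."""

    class_name: str
    seed: int
    checkpoint_step: int | None
    weights_uri: str | None
    extra: dict[str, str] = field(default_factory=dict)

    @property
    def partner_id(self) -> str:
        # Same recipe as the source: SHA-256 over repr() of the four identity
        # fields in fixed order, truncated to 16 hex chars (64 bits).
        material = repr((self.class_name, self.seed, self.checkpoint_step, self.weights_uri))
        return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

As a rough check on "vast margin", the birthday approximation for 144 partners in a 64-bit space gives a collision probability of about (144 · 143 / 2) / 2^64 ≈ 5.6 × 10^-16.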

partner_id property

partner_id: str

Stable 16-hex-char hash of (class_name, seed, checkpoint_step, weights_uri).

ADR-004 §risk-mitigation #2 / ADR-006 risk #3: surfaced on obs["meta"]["partner_id"] so the conformal filter can re-initialise lambda on partner identity change. The 16-char (64-bit) keyspace is sufficient for the Phase-1 zoo target of 144 partners with a vast margin (plan/04 §8 Notes).

The hash material uses repr() of a tuple of the four identity fields (in this fixed order) rather than f-string interpolation, so None cannot collide with the literal string "None", a real risk if a Phase-1 partner ever ships with weights_uri="None" (ADR-006 risk #3 swap-detection contract). Reordering the fields in PartnerSpec would silently change every existing partner_id; do not reorder without a new ADR.

Returns:

  • str

    Lowercase 16-character hex digest of the SHA-256 of repr((class_name, seed, checkpoint_step, weights_uri)). extra is intentionally excluded so per-task / per-embodiment metadata does not collide with the conformal filter's swap-detection contract (plan/04 §3.1).

ScriptedHeuristicPartner

Bases: PartnerBase

Greedy planar-reach heuristic partner (ADR-009 §Consequences; plan/04 §3.4).

Reads the partner's own xy from obs["comm"]["pose"][uid]["xyz"] (ADR-003 §Decision) when available, else from obs["agent"][uid]["state"][:2], and falls back to (0.0, 0.0) when neither is present. Outputs a unit-clipped xy velocity toward the configured target xy.

The partner's own uid is read from spec.extra["uid"] and the target from spec.extra["target_xy"] (default "0.0,0.0"). Action dim is set via spec.extra["action_dim"] (default "2"); components beyond the planar xy are zeroed.

Source code in src/chamber/partners/heuristic.py
@register_partner("scripted_heuristic")
class ScriptedHeuristicPartner(PartnerBase):
    """Greedy planar-reach heuristic partner (ADR-009 §Consequences; plan/04 §3.4).

    Reads the partner's own xy from ``obs["comm"]["pose"][uid]["xyz"]`` (ADR-003
    §Decision) when available, else from ``obs["agent"][uid]["state"][:2]``.
    Outputs a unit-clipped xy velocity toward the configured target xy.

    The partner's own uid is read from ``spec.extra["uid"]`` and the target
    from ``spec.extra["target_xy"]`` (default ``"0.0,0.0"``). Action dim is
    set via ``spec.extra["action_dim"]`` (default ``"2"``); components beyond
    the planar xy are zeroed.
    """

    def __init__(self, spec: PartnerSpec) -> None:
        """Bind the spec and parse the heuristic's tunables (ADR-009 §Decision).

        Args:
            spec: Partner identity. ``spec.extra`` controls the policy:
                ``uid`` selects the env-side agent, ``target_xy`` the goal,
                ``action_dim`` the output vector length.

        Raises:
            ValueError: If ``spec.extra["target_xy"]`` is malformed or
                ``action_dim`` is not a positive integer ≥ 2.
        """
        super().__init__(spec)
        self._uid: str = spec.extra.get("uid", "")
        self._target_xy: tuple[float, float] = _parse_xy(spec.extra.get("target_xy", "0.0,0.0"))
        action_dim_raw = spec.extra.get("action_dim", "2")
        try:
            action_dim = int(action_dim_raw)
        except ValueError as exc:
            raise ValueError(
                f"action_dim must be a positive int ≥ 2; got {action_dim_raw!r}"
            ) from exc
        if action_dim < _MIN_ACTION_DIM:
            raise ValueError(f"action_dim must be a positive int ≥ 2; got {action_dim}")
        self._action_dim: int = action_dim
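Because spec.extra is a string-to-string dict, the heuristic's tunables have to be parsed from text. The action_dim check below mirrors the source. The "x,y" format for target_xy is an assumption inferred from the default "0.0,0.0", since _parse_xy itself is not shown. `parse_xy_sketch` and `parse_action_dim_sketch` are hypothetical names.

```python
from __future__ import annotations

MIN_ACTION_DIM = 2


def parse_xy_sketch(raw: str) -> tuple[float, float]:
    """Parse "x,y" into two floats (assumed format; the real _parse_xy is not shown)."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 2:
        raise ValueError(f"target_xy must look like 'x,y'; got {raw!r}")
    try:
        return float(parts[0]), float(parts[1])
    except ValueError as exc:
        raise ValueError(f"target_xy must look like 'x,y'; got {raw!r}") from exc


def parse_action_dim_sketch(raw: str) -> int:
    """Mirror of the action_dim check: int(raw) must succeed and be >= 2."""
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"action_dim must be a positive int >= 2; got {raw!r}") from exc
    if value < MIN_ACTION_DIM:
        raise ValueError(f"action_dim must be a positive int >= 2; got {value}")
    return value
```

For example, extra = {"uid": "panda", "target_xy": "0.5,0.5", "action_dim": "4"} would yield a target of (0.5, 0.5) and a 4-element action.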

    def reset(self, *, seed: int | None = None) -> None:
        """No-op reset (ADR-009 §Decision; plan/04 §2 stateless across episodes).

        Phase-0 heuristic carries no episode state. The seed is ignored
        deliberately; the policy is fully deterministic from ``obs``.

        Args:
            seed: Accepted for Protocol conformance only.
        """
        del seed

    def act(
        self,
        obs: Mapping[str, object],
        *,
        deterministic: bool = True,
    ) -> NDArray[np.floating]:
        """Greedy planar-reach action toward ``spec.extra["target_xy"]`` (ADR-009 §Consequences).

        Args:
            obs: Gymnasium observation mapping keyed by ``"agent"`` and
                optionally ``"comm"``. ADR-003 §Decision: when ``"comm"`` is
                present, pose is read from ``obs["comm"]["pose"][uid]``.
            deterministic: Ignored; the heuristic is deterministic regardless.

        Returns:
            Float32 action vector of length ``spec.extra["action_dim"]``;
            components 0/1 hold the unit-clipped xy velocity toward target,
            remaining components are zero.
        """
        del deterministic
        x, y = self._read_agent_xy(obs)
        tx, ty = self._target_xy
        dx = float(np.clip(tx - x, -_ACTION_CLIP, _ACTION_CLIP))
        dy = float(np.clip(ty - y, -_ACTION_CLIP, _ACTION_CLIP))
        action = np.zeros(self._action_dim, dtype=np.float32)
        action[0] = dx
        action[1] = dy
        return action
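The greedy step itself reduces to a per-axis clip of (target - xy). In the sketch below, "unit-clipped" is read as clipping each component to [-1, 1]. That reading is an assumption, because the module constant _ACTION_CLIP is not shown. `planar_reach_action` is a hypothetical name.

```python
from __future__ import annotations

import numpy as np
from numpy.typing import NDArray

ACTION_CLIP_SKETCH = 1.0  # assumption: stands in for the unshown _ACTION_CLIP


def planar_reach_action(
    xy: tuple[float, float], target_xy: tuple[float, float], action_dim: int = 2
) -> NDArray[np.float32]:
    """Per-axis clipped (target - xy) in slots 0 and 1; any extra slots stay zero."""
    if action_dim < 2:
        raise ValueError(f"action_dim must be >= 2; got {action_dim}")
    dx = float(np.clip(target_xy[0] - xy[0], -ACTION_CLIP_SKETCH, ACTION_CLIP_SKETCH))
    dy = float(np.clip(target_xy[1] - xy[1], -ACTION_CLIP_SKETCH, ACTION_CLIP_SKETCH))
    action = np.zeros(action_dim, dtype=np.float32)
    action[0] = dx
    action[1] = dy
    return action
```

The clip is applied to each component separately, not to the vector norm. Far from the target on both axes, the action is (±1, ±1), whose Euclidean length is √2.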

    def _read_agent_xy(self, obs: Mapping[str, object]) -> tuple[float, float]:
        """Extract the partner's planar xy from ``obs`` (ADR-003 §Decision).

        Reads ``obs["comm"]["pose"][uid]["xyz"]`` when present (the M2
        fixed-format channel surface), falling back to
        ``obs["agent"][uid]["state"][:2]`` for envs without comm wired up
        (e.g. the unit-test :class:`tests.fakes.FakeMultiAgentEnv`).

        Args:
            obs: Gymnasium observation mapping.

        Returns:
            ``(x, y)`` tuple of Python floats.
        """
        comm = obs.get("comm")
        if isinstance(comm, Mapping):
            pose_table = comm.get("pose")
            if isinstance(pose_table, Mapping) and self._uid in pose_table:
                pose = pose_table[self._uid]
                if isinstance(pose, Mapping):
                    xyz = pose.get("xyz")
                    if isinstance(xyz, (tuple, list)) and len(xyz) >= _XY_DIMS:
                        return float(xyz[0]), float(xyz[1])
        agent = obs.get("agent")
        if isinstance(agent, Mapping) and self._uid in agent:
            entry = agent[self._uid]
            if isinstance(entry, Mapping):
                state = entry.get("state")
                if isinstance(state, np.ndarray) and state.shape[0] >= _XY_DIMS:
                    arr = cast("NDArray[np.floating]", state)
                    return float(arr[0]), float(arr[1])
        return 0.0, 0.0

__init__

__init__(spec: PartnerSpec) -> None

Bind the spec and parse the heuristic's tunables (ADR-009 §Decision).

Parameters:

  • spec (PartnerSpec) –

    Partner identity. spec.extra controls the policy: uid selects the env-side agent, target_xy the goal, action_dim the output vector length.

Raises:

  • ValueError

    If spec.extra["target_xy"] is malformed or action_dim is not a positive integer ≥ 2.

Source code in src/chamber/partners/heuristic.py
def __init__(self, spec: PartnerSpec) -> None:
    """Bind the spec and parse the heuristic's tunables (ADR-009 §Decision).

    Args:
        spec: Partner identity. ``spec.extra`` controls the policy:
            ``uid`` selects the env-side agent, ``target_xy`` the goal,
            ``action_dim`` the output vector length.

    Raises:
        ValueError: If ``spec.extra["target_xy"]`` is malformed or
            ``action_dim`` is not a positive integer ≥ 2.
    """
    super().__init__(spec)
    self._uid: str = spec.extra.get("uid", "")
    self._target_xy: tuple[float, float] = _parse_xy(spec.extra.get("target_xy", "0.0,0.0"))
    action_dim_raw = spec.extra.get("action_dim", "2")
    try:
        action_dim = int(action_dim_raw)
    except ValueError as exc:
        raise ValueError(
            f"action_dim must be a positive int ≥ 2; got {action_dim_raw!r}"
        ) from exc
    if action_dim < _MIN_ACTION_DIM:
        raise ValueError(f"action_dim must be a positive int ≥ 2; got {action_dim}")
    self._action_dim: int = action_dim

act

act(
    obs: Mapping[str, object], *, deterministic: bool = True
) -> NDArray[np.floating]

Greedy planar-reach action toward spec.extra["target_xy"] (ADR-009 §Consequences).

Parameters:

  • obs (Mapping[str, object]) –

    Gymnasium observation mapping keyed by "agent" and optionally "comm". ADR-003 §Decision: when "comm" carries a pose for uid, it is read from obs["comm"]["pose"][uid]; otherwise the xy falls back to obs["agent"][uid]["state"][:2].

  • deterministic (bool, default: True ) –

    Ignored; the heuristic is deterministic regardless.

Returns:

  • NDArray[floating]

    Float32 action vector of length spec.extra["action_dim"]; components 0/1 hold the unit-clipped xy velocity toward target, remaining components are zero.

Source code in src/chamber/partners/heuristic.py
def act(
    self,
    obs: Mapping[str, object],
    *,
    deterministic: bool = True,
) -> NDArray[np.floating]:
    """Greedy planar-reach action toward ``spec.extra["target_xy"]`` (ADR-009 §Consequences).

    Args:
        obs: Gymnasium observation mapping keyed by ``"agent"`` and
            optionally ``"comm"``. ADR-003 §Decision: when ``"comm"`` is
            present, pose is read from ``obs["comm"]["pose"][uid]``.
        deterministic: Ignored; the heuristic is deterministic regardless.

    Returns:
        Float32 action vector of length ``spec.extra["action_dim"]``;
        components 0/1 hold the unit-clipped xy velocity toward target,
        remaining components are zero.
    """
    del deterministic
    x, y = self._read_agent_xy(obs)
    tx, ty = self._target_xy
    dx = float(np.clip(tx - x, -_ACTION_CLIP, _ACTION_CLIP))
    dy = float(np.clip(ty - y, -_ACTION_CLIP, _ACTION_CLIP))
    action = np.zeros(self._action_dim, dtype=np.float32)
    action[0] = dx
    action[1] = dy
    return action
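The greedy reach rule in act() reduces to a clipped displacement in the first two slots and zeros elsewhere. The sketch below is a standalone version that makes two assumptions. First, the clip bound is 1.0: the Returns text says "unit-clipped", but _ACTION_CLIP is not shown. Second, the agent's xy is passed in directly, which skips the observation parsing done by _read_agent_xy.

```python
import numpy as np

# Hypothetical clip bound; the real _ACTION_CLIP constant is not shown here.
ACTION_CLIP = 1.0


def greedy_reach_action(
    agent_xy: tuple[float, float],
    target_xy: tuple[float, float],
    action_dim: int,
) -> np.ndarray:
    """Clip the xy displacement to [-ACTION_CLIP, ACTION_CLIP] and zero-pad."""
    x, y = agent_xy
    tx, ty = target_xy
    action = np.zeros(action_dim, dtype=np.float32)
    action[0] = np.clip(tx - x, -ACTION_CLIP, ACTION_CLIP)  # x component
    action[1] = np.clip(ty - y, -ACTION_CLIP, ACTION_CLIP)  # y component
    return action  # slots 2.. stay zero


print(greedy_reach_action((0.0, 0.0), (3.0, -0.25), action_dim=4))
```

The output is a pure function of pose and target, so the deterministic flag has nothing to switch. That is why act() discards it.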

reset

reset(*, seed: int | None = None) -> None

No-op reset (ADR-009 §Decision; plan/04 §2 stateless across episodes).

Phase-0 heuristic carries no episode state. The seed is ignored deliberately; the policy is fully deterministic from obs.

Parameters:

  • seed (int | None, default: None ) –

    Accepted for Protocol conformance only.

Source code in src/chamber/partners/heuristic.py
def reset(self, *, seed: int | None = None) -> None:
    """No-op reset (ADR-009 §Decision; plan/04 §2 stateless across episodes).

    Phase-0 heuristic carries no episode state. The seed is ignored
    deliberately; the policy is fully deterministic from ``obs``.

    Args:
        seed: Accepted for Protocol conformance only.
    """
    del seed

list_registered

list_registered() -> list[str]

Return the sorted list of registered partner class names (ADR-009 §Decision).

Used by the eval CLI to enumerate the available zoo strata and by tests to assert the Phase-0 draft zoo classes are wired up.

Returns:

  • list[str]

    A new list, freshly sorted on every call.

Source code in src/chamber/partners/registry.py
def list_registered() -> list[str]:
    """Return the sorted list of registered partner class names (ADR-009 §Decision).

    Used by the eval CLI to enumerate the available zoo strata and by tests
    to assert the Phase-0 draft zoo classes are wired up.

    Returns:
        A new list, freshly sorted on every call.
    """
    return sorted(_REGISTRY.keys())

load_partner

load_partner(spec: PartnerSpec) -> FrozenPartner

Build a partner instance from its spec via the registry (ADR-009 §Decision).

Parameters:

  • spec (PartnerSpec) –

    The PartnerSpec whose class_name attribute selects the registered class.

Returns:

  • FrozenPartner

    A fresh FrozenPartner instance built by Class(spec).

Raises:

  • KeyError

    If spec.class_name is not registered. The error message lists the currently-registered keys to make typos easy to fix.

Source code in src/chamber/partners/registry.py
def load_partner(spec: PartnerSpec) -> FrozenPartner:
    """Build a partner instance from its spec via the registry (ADR-009 §Decision).

    Args:
        spec: The :class:`~chamber.partners.api.PartnerSpec` whose
            :attr:`~chamber.partners.api.PartnerSpec.class_name` selects the
            registered class.

    Returns:
        A fresh :class:`~chamber.partners.api.FrozenPartner` instance built
        by ``Class(spec)``.

    Raises:
        KeyError: If ``spec.class_name`` is not registered. The error message
            lists the currently-registered keys to make typos easy to fix.
    """
    if spec.class_name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY.keys())) or "<none>"
        raise KeyError(f"No partner class registered as {spec.class_name!r}; known: {known}")
    cls = _REGISTRY[spec.class_name]
    # ``FrozenPartner`` is a Protocol without a constructor signature, so
    # pyright cannot prove ``cls(spec)`` is well-typed. The registry's
    # contract (plan/04 §3.2) is that every registered class accepts
    # ``__init__(self, spec: PartnerSpec)`` — :class:`PartnerBase` enforces
    # this. Do NOT widen the Protocol with ``__init__`` to silence this.
    return cls(spec)  # type: ignore[call-arg]
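The lookup-then-construct path of load_partner can be sketched with a plain dict. Spec, EchoPartner, REGISTRY, and load below are hypothetical stand-ins, not chamber classes. The error message uses the same format as the source above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for PartnerSpec; only class_name is used here."""

    class_name: str


class EchoPartner:
    """Toy partner that stores its spec, as a PartnerBase subclass would."""

    def __init__(self, spec: Spec) -> None:
        self.spec = spec


# Hypothetical registry contents; the real _REGISTRY is filled by register_partner.
REGISTRY: dict[str, type] = {"echo": EchoPartner}


def load(spec: Spec) -> object:
    """Mirror load_partner: look the class up by name, then call Class(spec)."""
    if spec.class_name not in REGISTRY:
        known = ", ".join(sorted(REGISTRY)) or "<none>"
        raise KeyError(f"No partner class registered as {spec.class_name!r}; known: {known}")
    return REGISTRY[spec.class_name](spec)


partner = load(Spec(class_name="echo"))
try:
    load(Spec(class_name="ecoh"))  # deliberate typo
except KeyError as err:
    # str(KeyError) wraps the message in quotes, so read args[0] instead.
    print(err.args[0])  # prints No partner class registered as 'ecoh'; known: echo
```

Listing the sorted keys in the message is what makes a typo like "ecoh" easy to spot. list_registered() returns the same sorted key list.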

make_phase0_draft_zoo

make_phase0_draft_zoo() -> list[PartnerSpec]

Return the 3-partner Stage-3 draft zoo (ADR-009 §Consequences "draft-zoo scoping").

The draft zoo is the minimum viable zoo that lets the ADR-007 rev-3 Stage-3 PF spike measure the ≥20pp trained-with vs frozen-novel gap and instrument the partner-swap λ-reset transient (ADR-006 risk #3 / ADR-004 §risk-mitigation #2). All three specs target the Stage-0 smoke task so they share an env shape; the per-partner uid matches the ADR-001 smoke robot tuple (panda_wristcam / fetch / allegro_hand_right).

The frozen-RL checkpoint URIs (local://artifacts/...) are produced by M4b training runs. Both FrozenMAPPOPartner (T4.5, registered as frozen_mappo) and FrozenHARLPartner (T4.6, registered as frozen_harl) load their specs via chamber.partners.registry.load_partner. The full M4 gate (plan/04 §6 #1, T4.9 draft-zoo round-trip integration test) lands in a follow-up PR.

Returns:

  • list[PartnerSpec]

    A fresh list of 3 PartnerSpec instances. Callers may append to the list freely; the function rebuilds it every time so the canonical zoo is immutable.

Source code in src/chamber/partners/selection.py
def make_phase0_draft_zoo() -> list[PartnerSpec]:
    """Return the 3-partner Stage-3 draft zoo (ADR-009 §Consequences "draft-zoo scoping").

    The draft zoo is the minimum viable zoo that lets the ADR-007 rev-3
    Stage-3 PF spike measure trained-with vs frozen-novel ≥20pp gap and
    instrument the partner-swap λ-reset transient (ADR-006 risk #3 / ADR-004
    §risk-mitigation #2). All three specs target the Stage-0 smoke task so
    they share an env shape; the per-partner ``uid`` matches the ADR-001
    smoke robot tuple (panda_wristcam / fetch / allegro_hand_right).

    The frozen-RL checkpoint URIs (``local://artifacts/...``) are produced
    by M4b training runs. Both
    :class:`~chamber.partners.frozen_mappo.FrozenMAPPOPartner` (T4.5,
    registered as ``frozen_mappo``) and
    :class:`~chamber.partners.frozen_harl.FrozenHARLPartner` (T4.6,
    registered as ``frozen_harl``) load their specs via
    :func:`chamber.partners.registry.load_partner`. The full M4 gate
    (plan/04 §6 #1, T4.9 draft-zoo round-trip integration test) lands in
    a follow-up PR.

    Returns:
        A fresh list of 3 :class:`~chamber.partners.api.PartnerSpec`
        instances. Callers may append to the list freely; the function
        rebuilds it every time so the canonical zoo is immutable.
    """
    return [
        PartnerSpec(
            class_name="scripted_heuristic",
            seed=0,
            checkpoint_step=None,
            weights_uri=None,
            extra={
                "uid": "fetch",
                "task": "stage0_smoke",
                "target_xy": "0.0,0.0",
                "action_dim": "2",
            },
        ),
        PartnerSpec(
            class_name="frozen_mappo",
            seed=42,
            checkpoint_step=100_000,
            weights_uri="local://artifacts/mappo_seed42_step100k.pt",
            extra={
                "uid": "panda_wristcam",
                "task": "stage0_smoke",
            },
        ),
        PartnerSpec(
            class_name="frozen_harl",
            seed=7,
            checkpoint_step=50_000,
            weights_uri="local://artifacts/happo_seed7_step50k.pt",
            extra={
                "uid": "allegro_hand_right",
                "task": "stage0_smoke",
                "checkpoint_tier": "50pct_reward",
            },
        ),
    ]
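The "rebuild every call" guarantee can be illustrated with a reduced version of the function. Spec is a hypothetical frozen stand-in that carries only three of PartnerSpec's fields. The real PartnerSpec may or may not be frozen. Each call constructs new instances anyway, so neither the list nor its elements are shared between callers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for PartnerSpec with a subset of its fields."""

    class_name: str
    seed: int
    checkpoint_step: int | None


def make_draft_zoo() -> list[Spec]:
    # A brand-new list (and new specs) on every call, so the canonical zoo
    # cannot be changed by anything a caller does to a returned list.
    return [
        Spec("scripted_heuristic", 0, None),
        Spec("frozen_mappo", 42, 100_000),
        Spec("frozen_harl", 7, 50_000),
    ]


mine = make_draft_zoo()
mine.append(Spec("my_extra_partner", 1, None))  # local change only
print(len(mine), len(make_draft_zoo()))  # prints 4 3
```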

register_partner

register_partner(
    class_name: str,
) -> Callable[[type[T]], type[T]]

Register a partner class by name (ADR-009 §Decision; plan/04 §3.2).

Used as a class decorator:

@register_partner("scripted_heuristic")
class ScriptedHeuristicPartner(PartnerBase): ...

Parameters:

  • class_name (str) –

    Registry key. Must be unique process-wide; re-registering the same key raises ValueError so name collisions are caught at import time.

Returns:

  • Callable[[type[T]], type[T]]

    A decorator that records the class against class_name and returns the class unchanged.

Raises:

  • ValueError

    If class_name is already registered.

Source code in src/chamber/partners/registry.py
def register_partner(class_name: str) -> Callable[[type[T]], type[T]]:
    """Register a partner class by name (ADR-009 §Decision; plan/04 §3.2).

    Used as a class decorator:

    .. code-block:: python

        @register_partner("scripted_heuristic")
        class ScriptedHeuristicPartner(PartnerBase): ...

    Args:
        class_name: Registry key. Must be unique process-wide; re-registering
            the same key raises :class:`ValueError` so name collisions are
            caught at import time.

    Returns:
        A decorator that records the class against ``class_name`` and returns
        the class unchanged.

    Raises:
        ValueError: If ``class_name`` is already registered.
    """

    def _decorator(cls: type[T]) -> type[T]:
        if class_name in _REGISTRY:
            raise ValueError(f"Partner class {class_name!r} is already registered")
        _REGISTRY[class_name] = cls
        return cls

    return _decorator

select_zoo

select_zoo(*args: object, **kwargs: object) -> NoReturn

FCP/MEP-style zoo construction — Phase-1 stub (ADR-009 §Decision).

The Phase-1 implementation is the three-axis pipeline (policy class x seed x checkpoint) with MEP Population Entropy as the per-stratum admission filter (ADR-009 §Validation criteria "By Phase-1 end").

Parameters:

  • *args (object, default: () ) –

    Reserved for the Phase-1 signature.

  • **kwargs (object, default: {} ) –

    Reserved for the Phase-1 signature.

Raises:

  • NotImplementedError

    Always. Phase-0 callers must use make_phase0_draft_zoo() instead.

Source code in src/chamber/partners/selection.py
def select_zoo(*args: object, **kwargs: object) -> NoReturn:
    """FCP/MEP-style zoo construction — Phase-1 stub (ADR-009 §Decision).

    The Phase-1 implementation is the three-axis pipeline (policy class x
    seed x checkpoint) with MEP Population Entropy as the per-stratum
    admission filter (ADR-009 §Validation criteria "By Phase-1 end").

    Args:
        *args: Reserved for the Phase-1 signature.
        **kwargs: Reserved for the Phase-1 signature.

    Raises:
        NotImplementedError: Always. Phase-0 callers must use
            :func:`make_phase0_draft_zoo` instead.
    """
    del args, kwargs
    raise NotImplementedError(
        "Phase-1 work; see ADR-009 §Validation criteria 'By Phase-1 end' for "
        "the FCP/MEP-style construction protocol with Population Entropy "
        "admission filter. Phase-0 callers should use make_phase0_draft_zoo()."
    )

chamber.evaluation

CHAMBER evaluation harness — HRS bundle, leaderboard, safety reports.

Implements the HRS bundle composition (Option D, axis-survival rule) from ADR-008 and consumes the three-table safety reporting format from ADR-014. Statistical tooling (cluster + paired-cluster bootstrap) and the pre-registration discipline (ADR-007 §Discipline) live in sub-modules of this package.

Public modules:

  • chamber.evaluation.results: Pydantic schema for spike runs and leaderboard entries (ADR-008 §Decision).
  • chamber.evaluation.prereg: YAML loader + git-tag SHA verification (ADR-007 §Discipline).
  • chamber.evaluation.bootstrap: cluster + paired-cluster bootstrap with rliable-compatible aggregate metrics (ADR-008 §Decision; reviewer P1-9).
  • chamber.evaluation.hrs: HRS vector + scalar (ADR-008 §Decision; reviewer P1-8).
  • chamber.evaluation.render: three-table safety report + leaderboard renderers (ADR-014 §Decision, ADR-008 §Decision).

BootstrapCI dataclass

Bootstrap point estimate + 95% CI (ADR-008 §Decision).

Attributes:

  • iqm (float) –

    Interquartile mean of the resampled point estimates.

  • mean (float) –

    Plain arithmetic mean of the resampled point estimates.

  • ci_low (float) –

    2.5th percentile of the resampled estimates (lower bound of the 95% CI).

  • ci_high (float) –

    97.5th percentile of the resampled estimates (upper bound of the 95% CI).

  • n_resamples (int) –

    Number of bootstrap resamples that produced the interval.

Source code in src/chamber/evaluation/bootstrap.py
@dataclass(frozen=True)
class BootstrapCI:
    """Bootstrap point estimate + 95% CI (ADR-008 §Decision).

    Attributes:
        iqm: Interquartile mean of the resampled point estimates.
        mean: Plain arithmetic mean of the resampled point estimates.
        ci_low: 2.5th percentile of the resampled estimates (lower
            bound of the 95% CI).
        ci_high: 97.5th percentile of the resampled estimates (upper
            bound of the 95% CI).
        n_resamples: Number of bootstrap resamples that produced the
            interval.
    """

    iqm: float
    mean: float
    ci_low: float
    ci_high: float
    n_resamples: int
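The sketch below shows how the five BootstrapCI fields could be filled from a seed-clustered bootstrap. Whole seeds are resampled with replacement, and each resample's point estimate is the mean over the episodes of the drawn seeds. The iqm and mean fields are then taken over those estimates, as the attribute descriptions say. The IQM is computed as a 25%-per-tail trimmed mean, the usual IQM definition. cluster_bootstrap and interquartile_mean are hypothetical helpers, not the functions in chamber.evaluation.bootstrap.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class BootstrapCI:
    """Local copy of the documented BootstrapCI fields."""

    iqm: float
    mean: float
    ci_low: float
    ci_high: float
    n_resamples: int


def interquartile_mean(values: np.ndarray) -> float:
    """Mean of the middle 50%: drop the lowest and highest 25% after sorting."""
    x = np.sort(np.asarray(values, dtype=float))
    cut = int(0.25 * x.size)
    return float(x[cut : x.size - cut].mean())


def cluster_bootstrap(
    per_seed: dict[int, list[float]], n_resamples: int = 2000, rng_seed: int = 0
) -> BootstrapCI:
    """Resample whole seeds (clusters) with replacement, never single episodes."""
    rng = np.random.default_rng(rng_seed)
    clusters = [np.asarray(v, dtype=float) for v in per_seed.values()]
    estimates = np.empty(n_resamples)
    for i in range(n_resamples):
        picks = rng.integers(0, len(clusters), size=len(clusters))
        # Point estimate for this resample: mean over all episodes of the drawn seeds.
        estimates[i] = np.concatenate([clusters[j] for j in picks]).mean()
    return BootstrapCI(
        iqm=interquartile_mean(estimates),
        mean=float(estimates.mean()),
        ci_low=float(np.percentile(estimates, 2.5)),
        ci_high=float(np.percentile(estimates, 97.5)),
        n_resamples=n_resamples,
    )


ci = cluster_bootstrap({0: [1, 1, 0, 1], 1: [0, 0, 1, 0], 2: [1, 1, 1, 0]})
print(ci)
```

Resampling at the seed level keeps correlated episodes together. That is the reason PreregistrationSpec defaults to a cluster bootstrap rather than a pooled IID one.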

ConditionPair

Bases: BaseModel

Pre-registered homogeneous-vs-heterogeneous baseline pair (ADR-007 §Validation criteria).

Each Phase-0 spike runs at least one homogeneous baseline (the "control" side of the ≥20pp test) and at least one heterogeneous condition (the "treatment" side). Identifiers are free-form but must match the labels in the pre-registration YAML.

Attributes:

  • homogeneous_id (str) –

    Identifier of the homogeneous baseline.

  • heterogeneous_id (str) –

    Identifier of the heterogeneous condition.

Source code in src/chamber/evaluation/results.py
class ConditionPair(BaseModel):
    """Pre-registered homogeneous-vs-heterogeneous baseline pair (ADR-007 §Validation criteria).

    Each Phase-0 spike runs at least one homogeneous baseline (the
    "control" side of the ≥20pp test) and at least one heterogeneous
    condition (the "treatment" side). Identifiers are free-form but
    must match the labels in the pre-registration YAML.

    Attributes:
        homogeneous_id: Identifier of the homogeneous baseline.
        heterogeneous_id: Identifier of the heterogeneous condition.
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    homogeneous_id: str
    heterogeneous_id: str

ConditionResult

Bases: BaseModel

Aggregated result for one ADR-007 axis (ADR-008 §Decision).

Carries the homogeneous-vs-heterogeneous success-rate gap and the safety signals that feed the HRS-vector entry for this axis. gap_pp is the ≥20pp gap statistic from ADR-007 §Validation criteria; ci_low_pp / ci_high_pp are the cluster-bootstrap 95% bounds from chamber.evaluation.bootstrap.

Attributes:

  • axis (str) –

    ADR-007 axis name.

  • n_episodes (NonNegativeInt) –

    Total episodes contributing to this row.

  • homogeneous_success (NonNegativeFloat) –

    IQM success rate on the homogeneous side.

  • heterogeneous_success (NonNegativeFloat) –

    IQM success rate on the heterogeneous side.

  • gap_pp (float) –

    (homogeneous_success - heterogeneous_success) * 100 — the percentage-point gap (ADR-007 §Validation criteria).

  • ci_low_pp (float) –

    Lower bound of the 95% cluster-bootstrap CI on gap_pp.

  • ci_high_pp (float) –

    Upper bound of the 95% cluster-bootstrap CI on gap_pp.

  • violation_rate (NonNegativeFloat) –

    Per-step CBF-constraint violation rate across episodes (ADR-014 §Decision Table 2).

  • fallback_rate (NonNegativeFloat) –

    Fraction of episodes in which the braking fallback fired at least once (separate from violation_rate per reviewer P0).

Source code in src/chamber/evaluation/results.py
class ConditionResult(BaseModel):
    """Aggregated result for one ADR-007 axis (ADR-008 §Decision).

    Carries the homogeneous-vs-heterogeneous success-rate gap and the
    safety signals that feed the HRS-vector entry for this axis.
    ``gap_pp`` is the ≥20pp gap statistic from ADR-007 §Validation
    criteria; ``ci_low_pp`` / ``ci_high_pp`` are the cluster-bootstrap
    95% bounds from :mod:`chamber.evaluation.bootstrap`.

    Attributes:
        axis: ADR-007 axis name.
        n_episodes: Total episodes contributing to this row.
        homogeneous_success: IQM success rate on the homogeneous side.
        heterogeneous_success: IQM success rate on the heterogeneous
            side.
        gap_pp: ``(homogeneous_success - heterogeneous_success) *
            100`` — the percentage-point gap (ADR-007 §Validation
            criteria).
        ci_low_pp: Lower bound of the 95% cluster-bootstrap CI on
            ``gap_pp``.
        ci_high_pp: Upper bound of the 95% cluster-bootstrap CI on
            ``gap_pp``.
        violation_rate: Per-step CBF-constraint violation rate across
            episodes (ADR-014 §Decision Table 2).
        fallback_rate: Fraction of episodes in which the braking
            fallback fired at least once (separate from
            ``violation_rate`` per reviewer P0).
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    axis: str
    n_episodes: NonNegativeInt
    homogeneous_success: NonNegativeFloat
    heterogeneous_success: NonNegativeFloat
    gap_pp: float
    ci_low_pp: float
    ci_high_pp: float
    violation_rate: NonNegativeFloat
    fallback_rate: NonNegativeFloat
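gap_pp is a signed difference scaled to percentage points. As a worked example, 75% homogeneous success against 50% heterogeneous success gives a 25 pp gap. A negative value means the heterogeneous side did better. The helper name below is illustrative only.

```python
def gap_pp(homogeneous_success: float, heterogeneous_success: float) -> float:
    """Signed percentage-point gap: positive when the homogeneous side does better."""
    return (homogeneous_success - heterogeneous_success) * 100.0


print(gap_pp(0.75, 0.50))  # prints 25.0
```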

EpisodeResult

Bases: BaseModel

A single evaluation episode of a CHAMBER spike (ADR-007 §Validation criteria).

The fields cover the three observable quantities the project reports per ADR-014 §Decision Table 2: episode success, peak constraint-violation magnitude, and fallback-fire count. The force_peak slot is the SA-axis-specific signal used in ADR-014's Table 1 / Table 2 vendor-compliance rows when ADR-007 Stage-3 SA decomposes the safety axis.

Attributes:

  • seed (int) –

    Root seed for this episode (ADR-002 P6 determinism).

  • episode_idx (NonNegativeInt) –

    Zero-based episode index within the (seed, condition) cell.

  • initial_state_seed (int) –

    Sub-stream seed for the initial state reset, used to pair homogeneous-vs-heterogeneous episodes in chamber.evaluation.bootstrap.pacluster_bootstrap (ADR-007 §Validation criteria; reviewer P1-9).

  • success (bool) –

    True iff the task succeeded per the env's terminal reward.

  • constraint_violation_peak (float) –

    Worst-case CBF-constraint gap over the episode (ADR-014 §Decision Table 2 "constraint violation" column).

  • fallback_fired (NonNegativeInt) –

    Number of braking-fallback fires during the episode (ADR-014 §Decision Table 2 "fallback fired" column; separate from violations per reviewer P0).

  • force_peak (float | None) –

    Peak contact force in newtons; None outside the SA spike (ADR-007 Stage 3).

  • metadata (dict[str, Any]) –

    Free-form per-episode metadata.

Source code in src/chamber/evaluation/results.py
class EpisodeResult(BaseModel):
    """A single evaluation episode of a CHAMBER spike (ADR-007 §Validation criteria).

    The fields cover the three observable quantities the project
    reports per ADR-014 §Decision Table 2: episode success, peak
    constraint-violation magnitude, and fallback-fire count. The
    ``force_peak`` slot is the SA-axis specific signal used in
    ADR-014's Table 1 / Table 2 vendor-compliance rows when ADR-007
    Stage-3 SA decomposes the safety axis.

    Attributes:
        seed: Root seed for this episode (ADR-002 P6 determinism).
        episode_idx: Zero-based episode index within the ``(seed,
            condition)`` cell.
        initial_state_seed: Sub-stream seed for the initial state
            reset, used to pair homogeneous-vs-heterogeneous episodes
            in :func:`chamber.evaluation.bootstrap.pacluster_bootstrap`
            (ADR-007 §Validation criteria; reviewer P1-9).
        success: ``True`` iff the task succeeded per the env's
            terminal reward.
        constraint_violation_peak: Worst-case CBF-constraint gap over
            the episode (ADR-014 §Decision Table 2 "constraint
            violation" column).
        fallback_fired: Number of braking-fallback fires during the
            episode (ADR-014 §Decision Table 2 "fallback fired"
            column; separate from violations per reviewer P0).
        force_peak: Peak contact force in newtons; ``None`` outside
            the SA spike (ADR-007 Stage 3).
        metadata: Free-form per-episode metadata.
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    seed: int
    episode_idx: NonNegativeInt
    initial_state_seed: int
    success: bool
    constraint_violation_peak: float = 0.0
    fallback_fired: NonNegativeInt = 0
    force_peak: float | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)
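EpisodeResult.fallback_fired is a per-episode count. ConditionResult.fallback_rate is instead documented as the fraction of episodes in which the fallback fired at least once. An episode with five fires therefore counts once. Episode and fallback_rate below are hypothetical stand-ins that keep only the fields needed for this sketch. The actual aggregation code in chamber.evaluation is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical subset of EpisodeResult: only the fields used below."""

    success: bool
    fallback_fired: int = 0


def fallback_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes whose braking fallback fired at least once."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.fallback_fired > 0 for ep in episodes) / len(episodes)


batch = [Episode(True, 0), Episode(False, 5), Episode(True, 1), Episode(True, 0)]
print(fallback_rate(batch))  # prints 0.5
```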

HRSVector

Bases: BaseModel

Per-axis HRS vector emitted alongside the scalar (ADR-008 §Decision).

Reviewer P1-8 requires the vector to be emitted unconditionally so consumers can recompute the scalar under a different weighting; the leaderboard renderer enforces this by refusing entries that carry only the scalar.

Attributes:

  • entries (list[HRSVectorEntry]) –

    Per-axis entries in the ADR-008 default order (CM > PF > CR > SA > OM > AS), subject to axis survival.

Source code in src/chamber/evaluation/results.py
class HRSVector(BaseModel):
    """Per-axis HRS vector emitted alongside the scalar (ADR-008 §Decision).

    Reviewer P1-8 requires the vector to be emitted *unconditionally*
    so consumers can recompute the scalar under a different
    weighting; the leaderboard renderer enforces this by refusing
    entries that carry only the scalar.

    Attributes:
        entries: Per-axis entries in the ADR-008 default order
            (CM > PF > CR > SA > OM > AS), subject to axis survival.
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    entries: list[HRSVectorEntry]

HRSVectorEntry

Bases: BaseModel

One axis row of the HRS vector (ADR-008 §Decision).

ADR-008 binds the HRS bundle to the surviving ADR-007 axes and fixes the default ordering CM > PF > CR > SA > OM > AS. Each entry carries the per-axis score plus the gap_pp value that feeds chamber.evaluation.hrs.compute_hrs_scalar.

Attributes:

  • axis (str) –

    ADR-007 axis name.

  • score (NonNegativeFloat) –

    Normalised per-axis score in [0, 1] (success rate on the heterogeneous condition, by default).

  • gap_pp (float) –

    Percentage-point gap from the matching ConditionResult (kept for traceability).

  • weight (NonNegativeFloat) –

    Weight assigned to this axis in the scalar aggregation (defaults to ADR-008 Option D ordering).

Source code in src/chamber/evaluation/results.py
class HRSVectorEntry(BaseModel):
    """One axis row of the HRS vector (ADR-008 §Decision).

    ADR-008 binds the HRS bundle to the surviving ADR-007 axes and
    fixes the default ordering CM > PF > CR > SA > OM > AS. Each
    entry carries the per-axis score plus the gap-PP that feeds
    :func:`chamber.evaluation.hrs.compute_hrs_scalar`.

    Attributes:
        axis: ADR-007 axis name.
        score: Normalised per-axis score in ``[0, 1]`` (success rate
            on the heterogeneous condition, by default).
        gap_pp: Percentage-point gap from the matching
            :class:`ConditionResult` (kept for traceability).
        weight: Weight assigned to this axis in the scalar
            aggregation (defaults to ADR-008 Option D ordering).
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    axis: str
    score: NonNegativeFloat
    gap_pp: float
    weight: NonNegativeFloat
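Because the per-axis vector is always emitted, a consumer can re-aggregate it under different weights. The sketch assumes a normalised weighted mean, sum(w_i * score_i) / sum(w_i). That is an assumption: compute_hrs_scalar's actual formula is not shown on this page. Entry is a hypothetical stand-in for HRSVectorEntry, and the scores and weights are illustrative, not ADR-008 values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for HRSVectorEntry (gap_pp omitted)."""

    axis: str
    score: float
    weight: float


def weighted_scalar(entries: list[Entry], weights: dict[str, float] | None = None) -> float:
    """Normalised weighted mean of per-axis scores.

    ASSUMPTION: one plausible aggregation, not necessarily the formula in
    chamber.evaluation.hrs.compute_hrs_scalar. Passing `weights` replaces the
    per-entry weights, which is the recomputation the vector makes possible.
    """
    w = [e.weight if weights is None else weights[e.axis] for e in entries]
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(wi * e.score for wi, e in zip(w, entries)) / total


# Illustrative scores and weights only.
vector = [Entry("CM", 0.8, 3.0), Entry("PF", 0.5, 2.0), Entry("CR", 0.2, 1.0)]
print(weighted_scalar(vector))  # default per-entry weights
print(weighted_scalar(vector, {"CM": 1.0, "PF": 1.0, "CR": 1.0}))  # equal weights
```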

LeaderboardEntry

Bases: BaseModel

A single method/result pair as it appears on the CHAMBER leaderboard (ADR-008 §Decision).

Ranks by hrs_scalar. The accompanying HRSVector is emitted unconditionally per reviewer P1-8; the renderer in chamber.evaluation.render displays the vector as a sub-row below the scalar.

Attributes:

  • method_id (str) –

    Identifier of the method being evaluated (e.g. "concerto-ego-aht-v0.1").

  • spike_runs (list[str]) –

    Spike-run IDs that contributed to this entry.

  • hrs_vector (HRSVector) –

    Per-axis HRS values (always emitted).

  • hrs_scalar (NonNegativeFloat) –

    Aggregated HRS scalar (ADR-008 §Decision; the ranking metric).

  • violation_rate (NonNegativeFloat) –

    Aggregate per-step violation rate across the contributing spike runs.

  • fallback_rate (NonNegativeFloat) –

    Aggregate fraction of episodes triggering the braking fallback (ADR-014 §Decision).

  • schema_version (int) –

    Result-archive schema version (default SCHEMA_VERSION).

Source code in src/chamber/evaluation/results.py
class LeaderboardEntry(BaseModel):
    """A single method/result pair as it appears on the CHAMBER leaderboard (ADR-008 §Decision).

    Ranks by ``hrs_scalar``. The accompanying :class:`HRSVector` is
    emitted unconditionally per reviewer P1-8; the renderer in
    :mod:`chamber.evaluation.render` displays the vector as a sub-row
    below the scalar.

    Attributes:
        method_id: Identifier of the method being evaluated (e.g.
            ``"concerto-ego-aht-v0.1"``).
        spike_runs: Spike-run IDs that contributed to this entry.
        hrs_vector: Per-axis HRS values (always emitted).
        hrs_scalar: Aggregated HRS scalar (ADR-008 §Decision; the
            ranking metric).
        violation_rate: Aggregate per-step violation rate across the
            contributing spike runs.
        fallback_rate: Aggregate fraction of episodes triggering the
            braking fallback (ADR-014 §Decision).
        schema_version: Result-archive schema version
            (default :data:`SCHEMA_VERSION`).
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    method_id: str
    spike_runs: list[str]
    hrs_vector: HRSVector
    hrs_scalar: NonNegativeFloat
    violation_rate: NonNegativeFloat
    fallback_rate: NonNegativeFloat
    schema_version: int = SCHEMA_VERSION

PairedEpisode dataclass

One paired homogeneous-vs-heterogeneous episode (ADR-007 §Validation criteria).

Pairs are matched on (seed, episode_idx, initial_state_seed) so the gap test in pacluster_bootstrap is computed on matched pairs rather than on pooled means (reviewer P1-9).

Attributes:

  • seed (int) –

    Shared root seed.

  • episode_idx (int) –

    Shared episode index within the seed.

  • initial_state_seed (int) –

    Shared env-reset sub-stream seed.

  • homogeneous (float) –

    Per-episode metric on the homogeneous side.

  • heterogeneous (float) –

    Per-episode metric on the heterogeneous side.

Source code in src/chamber/evaluation/bootstrap.py
@dataclass(frozen=True)
class PairedEpisode:
    """One paired homogeneous-vs-heterogeneous episode (ADR-007 §Validation criteria).

    Pairs are matched on ``(seed, episode_idx, initial_state_seed)``
    so the gap test in :func:`pacluster_bootstrap` is computed on
    matched pairs rather than on pooled means (reviewer P1-9).

    Attributes:
        seed: Shared root seed.
        episode_idx: Shared episode index within the seed.
        initial_state_seed: Shared env-reset sub-stream seed.
        homogeneous: Per-episode metric on the homogeneous side.
        heterogeneous: Per-episode metric on the heterogeneous side.
    """

    seed: int
    episode_idx: int
    initial_state_seed: int
    homogeneous: float
    heterogeneous: float
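Matching on the (seed, episode_idx, initial_state_seed) key can be sketched as a strict join. Any episode present on only one side is rejected rather than silently dropped. pair_episodes is a hypothetical helper. Only the pairing step is shown: a paired-cluster bootstrap would then resample these per-pair differences by seed.

```python
from dataclasses import dataclass

Key = tuple[int, int, int]  # (seed, episode_idx, initial_state_seed)


@dataclass(frozen=True)
class PairedEpisode:
    """Local copy of the documented PairedEpisode fields."""

    seed: int
    episode_idx: int
    initial_state_seed: int
    homogeneous: float
    heterogeneous: float


def pair_episodes(homog: dict[Key, float], heterog: dict[Key, float]) -> list[PairedEpisode]:
    """Join the two sides on the shared key; refuse unmatched episodes."""
    if homog.keys() != heterog.keys():
        missing = sorted(homog.keys() ^ heterog.keys())
        raise ValueError(f"unmatched episode keys: {missing}")
    return [
        PairedEpisode(*key, homogeneous=homog[key], heterogeneous=heterog[key])
        for key in sorted(homog)
    ]


homog = {(0, 0, 11): 1.0, (0, 1, 12): 1.0, (1, 0, 21): 0.0}
heterog = {(0, 0, 11): 0.0, (0, 1, 12): 1.0, (1, 0, 21): 0.0}
pairs = pair_episodes(homog, heterog)
diffs = [p.homogeneous - p.heterogeneous for p in pairs]
print(diffs)  # prints [1.0, 0.0, 0.0]
```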

PreregistrationError

Bases: RuntimeError

Raised when a pre-registration fails the ADR-007 §Discipline check.

Source code in src/chamber/evaluation/prereg.py
class PreregistrationError(RuntimeError):
    """Raised when a pre-registration fails the ADR-007 §Discipline check (ADR-007 §Discipline)."""

PreregistrationSpec

Bases: BaseModel

Schema for an ADR-007 pre-registration YAML (ADR-007 §Discipline).

Required fields mirror the operational contract in docs/reference/evaluation.md §3: the homogeneous-vs-heterogeneous pair, the seed list, the episodes-per-seed budget, the estimator, and the bootstrap method. The default bootstrap is cluster per reviewer P1-9, because a pooled IID bootstrap on seed-clustered data understates the CI width.

Attributes:

  • axis (str) –

    One of the ADR-007 §3.4 axes ("CR" / "AS" / "OM" / "CM" / "PF" / "SA").

  • condition_pair (ConditionPair) –

    Homogeneous-vs-heterogeneous baseline pair.

  • seeds (list[int]) –

    List of root seeds for ADR-002 P6 determinism.

  • episodes_per_seed (PositiveInt) –

    Episode budget per (seed, condition) cell.

  • estimator (str) –

    Identifier of the headline estimator (e.g. "iqm_success_rate").

  • bootstrap_method (BootstrapMethod) –

    Default "cluster" (reviewer P1-9); "hierarchical" is an alias accepted for back-compat; "iid" is allowed only for non-leaderboard runs (power calculations and debugging), never for leaderboard entries (reviewer P1-5, enforced by _check_bootstrap_policy).

  • failure_policy (FailurePolicy) –

    "strict" (the spike fails if any seed errors) or "best_effort" (errored seeds are dropped and reported).

  • git_tag (str) –

    Pre-registration git tag this YAML is locked to (ADR-007 §Discipline; verified by verify_git_tag).

  • notes (str) –

    Free-form notes carried into the leaderboard entry.

  • run_purpose (RunPurpose) –

    "leaderboard" (default), "power", or "debug" (reviewer P1-5). Only "leaderboard" runs are admitted to the public ranking; the other two values relax the iid-bootstrap ban so power calculations and local debug runs can still use a pooled iid baseline.

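The interaction between bootstrap_method and run_purpose can be sketched as a plain function: "iid" is rejected only when the run targets the leaderboard. Two details here are assumptions. Normalising "hierarchical" to "cluster" is how this sketch treats the documented alias, and the Literal aliases only mirror the documented type names. The real validator runs inside Pydantic and may report errors differently.

```python
from typing import Literal

# Hypothetical mirrors of the BootstrapMethod / RunPurpose types named above.
BootstrapMethod = Literal["cluster", "hierarchical", "iid"]
RunPurpose = Literal["leaderboard", "power", "debug"]


def check_bootstrap_policy(
    method: BootstrapMethod, run_purpose: RunPurpose = "leaderboard"
) -> str:
    """Return the effective method, rejecting an iid bootstrap on leaderboard runs."""
    if method == "hierarchical":
        method = "cluster"  # back-compat alias (normalisation is an assumption)
    if method == "iid" and run_purpose == "leaderboard":
        raise ValueError("iid bootstrap is not allowed for leaderboard runs; use 'cluster'")
    return method


print(check_bootstrap_policy("iid", run_purpose="power"))  # prints iid
```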
Source code in src/chamber/evaluation/prereg.py
class PreregistrationSpec(BaseModel):
    """Schema for an ADR-007 pre-registration YAML (ADR-007 §Discipline).

    Required fields mirror the operational contract in
    ``docs/reference/evaluation.md`` §3: the homogeneous-vs-
    heterogeneous pair, the seed list, the episodes-per-seed budget,
    the estimator, and the bootstrap method. The default bootstrap is
    ``cluster`` per reviewer P1-9 — pooled IID bootstrap on
    seed-clustered data understates the CI width.

    Attributes:
        axis: One of the ADR-007 §3.4 axes
            (``"CR"`` / ``"AS"`` / ``"OM"`` / ``"CM"`` / ``"PF"`` /
            ``"SA"``).
        condition_pair: Homogeneous-vs-heterogeneous baseline pair.
        seeds: List of root seeds for ADR-002 P6 determinism.
        episodes_per_seed: Episode budget per ``(seed, condition)``
            cell.
        estimator: Identifier of the headline estimator (e.g.
            ``"iqm_success_rate"``).
        bootstrap_method: Default ``"cluster"`` (reviewer P1-9);
            ``"hierarchical"`` is an alias accepted for back-compat;
            ``"iid"`` is allowed only for power calculations, not
            leaderboard entries (reviewer P1-5, enforced by
            :meth:`_check_bootstrap_policy`).
        failure_policy: ``"strict"`` (the spike fails if any seed
            errors) or ``"best_effort"`` (errored seeds are dropped
            and reported).
        git_tag: Pre-registration git tag this YAML is locked to
            (ADR-007 §Discipline; verified by
            :func:`verify_git_tag`).
        notes: Free-form notes carried into the leaderboard entry.
        run_purpose: ``"leaderboard"`` (default), ``"power"``, or
            ``"debug"`` (reviewer P1-5). Only ``"leaderboard"`` runs
            are admitted to the public ranking; the other two values
            relax the iid-bootstrap ban so power calculations and
            local debug runs can still use a pooled iid baseline.
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    axis: str
    condition_pair: ConditionPair
    seeds: list[int]
    episodes_per_seed: PositiveInt
    estimator: str
    bootstrap_method: BootstrapMethod = "cluster"
    failure_policy: FailurePolicy = "strict"
    git_tag: str
    notes: str = Field(default="")
    run_purpose: RunPurpose = "leaderboard"

    @model_validator(mode="after")
    def _check_bootstrap_policy(self) -> PreregistrationSpec:
        """Enforce the iid-not-allowed-for-leaderboard rule (reviewer P1-5).

        The pooled IID bootstrap on seed-clustered episode data
        understates CI width; admitting it to the leaderboard would
        let entries claim tighter intervals than the data supports.
        Power-analysis and debug runs are exempt because they are not
        published.
        """
        if self.run_purpose == "leaderboard" and self.bootstrap_method == "iid":
            msg = (
                "iid bootstrap is not permitted for leaderboard "
                "entries; use cluster (default) or hierarchical."
            )
            raise ValueError(msg)
        return self

    def normalised_axis(self) -> str:
        """Return the validated ADR-007 axis label (ADR-007 §Decision).

        Raises:
            ValueError: When :attr:`axis` is not one of the ADR-007
                §3.4 Option D shortlist.
        """
        if self.axis not in _AXIS_LABELS:
            msg = (
                f"axis {self.axis!r} is not one of the ADR-007 §3.4 shortlist "
                f"{sorted(_AXIS_LABELS)!r}"
            )
            raise ValueError(msg)
        return self.axis

normalised_axis

normalised_axis() -> str

Return the validated ADR-007 axis label (ADR-007 §Decision).

Raises:

  • ValueError

    When :attr:axis is not one of the ADR-007 §3.4 Option D shortlist.


SlackAggregate dataclass

Per-condition OSCBF slack aggregate (ADR-014 §Revision history 2026-05-16).

Carries the two scalar columns ADR-014 Table 2 v2 adds to :class:concerto.safety.reporting.ConditionRow. The values are aggregated by :func:aggregate_oscbf_slack over the per-step :class:concerto.safety.oscbf.OSCBFResult stream within a single ADR-014 Table 2 condition.

Attributes:

  • max_slack (float) –

    Maximum over steps k in the condition of OSCBFResult.max_slack — the worst single-step OSCBF relaxation observed across the condition. Non-negative by construction (the source signal is non-negative).

  • slack_l2 (float) –

    Mean over steps k of OSCBFResult.slack_l2 — the mean L2-norm of the per-step slack vector across the condition. Non-negative by construction.

Source code in src/chamber/evaluation/oscbf_aggregate.py
@dataclass(frozen=True)
class SlackAggregate:
    """Per-condition OSCBF slack aggregate (ADR-014 §Revision history 2026-05-16).

    Carries the two scalar columns ADR-014 Table 2 v2 adds to
    :class:`concerto.safety.reporting.ConditionRow`. The values are
    aggregated by :func:`aggregate_oscbf_slack` over the per-step
    :class:`concerto.safety.oscbf.OSCBFResult` stream within a single
    ADR-014 Table 2 condition.

    Attributes:
        max_slack: Maximum over steps ``k`` in the condition of
            ``OSCBFResult.max_slack`` — the worst single-step OSCBF
            relaxation observed across the condition. Non-negative by
            construction (the source signal is non-negative).
        slack_l2: Mean over steps ``k`` of ``OSCBFResult.slack_l2`` —
            the mean L2-norm of the per-step slack vector across the
            condition. Non-negative by construction.
    """

    max_slack: float
    slack_l2: float

SpikeRun

Bases: BaseModel

Full record of one ADR-007 staged spike (ADR-007 §Discipline; ADR-016 §Decision).

Carries the pre-registration provenance (prereg_sha / git_tag) so the leaderboard renderer can refuse entries that do not match the locked YAML. The episode_results list is the raw observation feed for :mod:chamber.evaluation.bootstrap and :mod:chamber.evaluation.hrs.

Attributes:

  • spike_id (str) –

    Free-form identifier (e.g. "stage2_cm_urllc").

  • prereg_sha (str) –

    Git-blob SHA of the locked pre-registration YAML (ADR-007 §Discipline; verified by :func:chamber.evaluation.prereg.verify_git_tag).

  • git_tag (str) –

    Git tag the pre-registration was locked to (ADR-007 §Discipline).

  • axis (str) –

    Name of the ADR-007 heterogeneity axis under test (one of "CR", "AS", "OM", "CM", "PF", "SA").

  • sub_stage (SubStage) –

    ADR-007 §Implementation-staging sub-stage label (one of "1a" / "1b" / "2" / "3"). Required, no default — see ADR-016 §Decision and §Rationale for why the silent default-by-axis fallback the v1 schema relied on was retired.

  • condition_pair (ConditionPair) –

    Homogeneous-vs-heterogeneous pair.

  • seeds (list[int]) –

    List of root seeds used in this spike (ADR-002 P6).

  • episode_results (list[EpisodeResult]) –

    Flat list of per-episode results across all seeds and both pair sides.

  • schema_version (int) –

    Result-archive schema version (default :data:SCHEMA_VERSION; currently 2 per ADR-016 §Decision).

Source code in src/chamber/evaluation/results.py
class SpikeRun(BaseModel):
    """Full record of one ADR-007 staged spike (ADR-007 §Discipline; ADR-016 §Decision).

    Carries the pre-registration provenance (``prereg_sha`` /
    ``git_tag``) so the leaderboard renderer can refuse entries that
    do not match the locked YAML. The ``episode_results`` list is the
    raw observation feed for
    :mod:`chamber.evaluation.bootstrap` and
    :mod:`chamber.evaluation.hrs`.

    Attributes:
        spike_id: Free-form identifier (e.g. ``"stage2_cm_urllc"``).
        prereg_sha: Git-blob SHA of the locked pre-registration YAML
            (ADR-007 §Discipline; verified by
            :func:`chamber.evaluation.prereg.verify_git_tag`).
        git_tag: Git tag the pre-registration was locked to
            (ADR-007 §Discipline).
        axis: Name of the ADR-007 heterogeneity axis under test
            (one of ``"CR"``, ``"AS"``, ``"OM"``, ``"CM"``,
            ``"PF"``, ``"SA"``).
        sub_stage: ADR-007 §Implementation-staging sub-stage label
            (one of ``"1a"`` / ``"1b"`` / ``"2"`` / ``"3"``).
            Required, no default — see ADR-016 §Decision and
            §Rationale for why the silent default-by-axis fallback
            the v1 schema relied on was retired.
        condition_pair: Homogeneous-vs-heterogeneous pair.
        seeds: List of root seeds used in this spike (ADR-002 P6).
        episode_results: Flat list of per-episode results across all
            seeds and both pair sides.
        schema_version: Result-archive schema version
            (default :data:`SCHEMA_VERSION`; currently 2 per
            ADR-016 §Decision).
    """

    model_config = ConfigDict(extra="forbid", frozen=True)

    spike_id: str
    prereg_sha: str
    git_tag: str
    axis: str
    sub_stage: SubStage
    condition_pair: ConditionPair
    seeds: list[int]
    episode_results: list[EpisodeResult]
    schema_version: int = SCHEMA_VERSION

aggregate_metrics

aggregate_metrics(
    values: Mapping[int, Sequence[float]],
    *,
    metrics: Sequence[str] = (
        "iqm",
        "optimality_gap",
        "performance_profile",
    ),
    threshold: float = 1.0,
) -> dict[str, float | None]

rliable-compatible aggregate metrics (ADR-008 §Decision; ADR-014 §Decision).

If the optional rliable package is installed, the "performance_profile" column is computed with its library.create_performance_profile helper and summarised as the area under the profile. IQM and optimality gap are always computed natively; without the extra, the "performance_profile" slot is None and a runtime warning explains that performance profiles require the optional extra. The native path is sufficient for unit tests and the Phase-0 golden-file integration test.

Parameters:

  • values (Mapping[int, Sequence[float]]) –

    {seed: [per-episode metric]} mapping.

  • metrics (Sequence[str], default: ('iqm', 'optimality_gap', 'performance_profile') ) –

    Subset of {"iqm", "optimality_gap", "performance_profile"} to compute.

  • threshold (float, default: 1.0 ) –

    Reference score for optimality gap (default 1.0 for normalised success rates).

Returns:

  • dict[str, float | None]

    Dict keyed by metric name with float values; the "performance_profile" slot is None when the optional rliable extra is not installed.
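
The native IQM and optimality-gap path delegates to private helpers (_interquartile_mean, _optimality_gap) whose bodies are not shown in this section. Assuming they follow rliable's definitions (IQM as the 25%-trimmed mean, optimality gap as the mean shortfall below threshold), the computation on the pooled {seed: [per-episode metric]} values can be sketched with the standard library. The function names below are hypothetical, and returning NaN on empty input is an assumption.

```python
import math


def interquartile_mean(scores: list[float]) -> float:
    """Mean of the middle 50% after trimming int(0.25 * n) values from each end."""
    if not scores:
        return math.nan  # assumption: the real helper's empty-input behaviour is not shown
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number trimmed from each tail
    middle = ordered[cut : len(ordered) - cut]
    return sum(middle) / len(middle)


def optimality_gap(scores: list[float], threshold: float = 1.0) -> float:
    """Average shortfall below threshold; scores above it contribute zero."""
    if not scores:
        return math.nan
    return sum(threshold - min(s, threshold) for s in scores) / len(scores)


# Per-seed episode scores are pooled before aggregation, as in aggregate_metrics.
values = {0: [1.0, 0.0, 1.0, 1.0], 1: [0.0, 1.0, 1.0, 0.0]}
flat = [x for episodes in values.values() for x in episodes]
print(interquartile_mean(flat), optimality_gap(flat))  # 0.75 0.375
```

Pooling across seeds matches the np.concatenate step in the function body; the seed structure only matters for the bootstrap intervals, not for these point estimates.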

Source code in src/chamber/evaluation/bootstrap.py
def aggregate_metrics(
    values: Mapping[int, Sequence[float]],
    *,
    metrics: Sequence[str] = ("iqm", "optimality_gap", "performance_profile"),
    threshold: float = 1.0,
) -> dict[str, float | None]:
    """rliable-compatible aggregate metrics (ADR-008 §Decision; ADR-014 §Decision).

    If the optional ``rliable`` package is installed, the
    ``"performance_profile"`` column is computed with its
    ``library.create_performance_profile`` helper and summarised as
    the area under the profile. IQM and optimality gap are always
    computed natively; without the extra, a runtime warning explains
    that performance profiles require it. The native path is
    sufficient for unit tests and the Phase-0 golden-file integration
    test.

    Args:
        values: ``{seed: [per-episode metric]}`` mapping.
        metrics: Subset of ``{"iqm", "optimality_gap",
            "performance_profile"}`` to compute.
        threshold: Reference score for optimality gap (default
            ``1.0`` for normalised success rates).

    Returns:
        Dict keyed by metric name with float values; the
        ``"performance_profile"`` slot is ``None`` when the optional
        ``rliable`` extra is not installed.
    """
    flat = np.concatenate(
        [np.asarray(v, dtype=np.float64) for v in values.values() if len(v) > 0]
        or [np.array([], dtype=np.float64)]
    )
    out: dict[str, float | None] = {}
    if "iqm" in metrics:
        out["iqm"] = _interquartile_mean(flat)
    if "optimality_gap" in metrics:
        out["optimality_gap"] = _optimality_gap(flat, threshold=threshold)
    if "performance_profile" in metrics:
        if _RLIABLE_AVAILABLE:
            # The rliable performance-profile API returns a histogram of
            # threshold-vs-coverage; CHAMBER's renderer summarises it to
            # area-under-the-profile to fit a single leaderboard column.
            from rliable import library as rly  # type: ignore[import-not-found]  # noqa: PLC0415

            tau = np.linspace(0.0, 1.0, num=51)
            profile, _ = rly.create_performance_profile({"x": flat[None, :]}, tau)
            # Trapezoidal area under the performance profile; np.trapezoid
            # supersedes np.trapz from NumPy 2.0.
            trapezoid = getattr(np, "trapezoid", None) or np.trapz  # type: ignore[attr-defined]
            out["performance_profile"] = float(trapezoid(profile["x"], tau))
        else:
            warnings.warn(
                "performance_profile requires the optional `rliable` extra "
                "(`uv pip install rliable`); returning None for this column",
                RuntimeWarning,
                stacklevel=2,
            )
            out["performance_profile"] = None
    return out
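
The performance-profile column collapses a curve into one number: the trapezoidal area under the profile on a 51-point tau grid over [0, 1]. The sketch below assumes the profile is the fraction of runs whose score strictly exceeds tau, which is how rliable's score distributions are usually described; profile_area is a hypothetical name, and it uses i / 50 rather than np.linspace, so the grid agrees only up to floating-point rounding.

```python
def profile_area(scores: list[float], num: int = 51) -> float:
    """Trapezoidal area under the fraction-of-runs-above-tau curve on [0, 1].

    Requires a non-empty score list.
    """
    taus = [i / (num - 1) for i in range(num)]  # approximately np.linspace(0, 1, num)
    profile = [sum(s > tau for s in scores) / len(scores) for tau in taus]
    area = 0.0
    for k in range(1, num):
        # Trapezoid between consecutive grid points.
        area += 0.5 * (profile[k - 1] + profile[k]) * (taus[k] - taus[k - 1])
    return area


print(round(profile_area([0.0, 0.5, 1.0]), 4))  # 0.4933, close to the mean score 0.5
```

For scores in [0, 1] the exact area under this survival-style curve equals the mean score, so the column behaves like a mean with a small discretisation error from the grid.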

aggregate_oscbf_slack

aggregate_oscbf_slack(
    results: Iterable[OSCBFResult],
) -> SlackAggregate

Aggregate per-step :class:OSCBFResult values into the ADR-014 contract.

Implements the locked aggregation contract from ADR-014 §Revision history 2026-05-16:

  • max_slack is the max over steps of OSCBFResult.max_slack.
  • slack_l2 is the mean over steps of OSCBFResult.slack_l2.

An empty iterable (including an already-exhausted iterator) produces SlackAggregate(0.0, 0.0), so the result composes with the :class:concerto.safety.reporting.ConditionRow defaults (the Stage-1a EGO_ONLY path that does not exercise the OSCBF inner filter; issue #142).

Parameters:

  • results (Iterable[OSCBFResult]) –

    Iterable of per-step :class:concerto.safety.oscbf.OSCBFResult values from the steps within a single ADR-014 Table 2 condition. Order is irrelevant; the aggregator consumes the iterable exactly once.

Returns:

  • SlackAggregate

    The :class:SlackAggregate carrying the two ADR-014 v2 ConditionRow columns.

Source code in src/chamber/evaluation/oscbf_aggregate.py
def aggregate_oscbf_slack(results: Iterable[OSCBFResult]) -> SlackAggregate:
    """Aggregate per-step :class:`OSCBFResult` values into the ADR-014 contract.

    Implements the locked aggregation contract from ADR-014 §Revision
    history 2026-05-16:

    - ``max_slack`` is the max over steps of ``OSCBFResult.max_slack``.
    - ``slack_l2`` is the mean over steps of ``OSCBFResult.slack_l2``.

    A consumed empty iterable produces ``SlackAggregate(0.0, 0.0)`` so
    the result composes with the
    :class:`concerto.safety.reporting.ConditionRow` defaults (the
    Stage-1a EGO_ONLY path that does not exercise the OSCBF inner
    filter; issue #142).

    Args:
        results: Iterable of per-step
            :class:`concerto.safety.oscbf.OSCBFResult` values from the
            steps within a single ADR-014 Table 2 condition. Order is
            irrelevant; the aggregator consumes the iterable exactly
            once.

    Returns:
        The :class:`SlackAggregate` carrying the two ADR-014 v2
        ConditionRow columns.
    """
    max_slack = 0.0
    slack_l2_sum = 0.0
    n_steps = 0
    for result in results:
        max_slack = max(max_slack, result.max_slack)
        slack_l2_sum += result.slack_l2
        n_steps += 1
    slack_l2_mean = slack_l2_sum / n_steps if n_steps > 0 else 0.0
    return SlackAggregate(max_slack=max_slack, slack_l2=slack_l2_mean)

cluster_bootstrap

cluster_bootstrap(
    values: Mapping[int, Sequence[float]],
    *,
    n_resamples: int,
    rng: Generator,
) -> BootstrapCI

Cluster bootstrap over {seed: [per-episode values]} (reviewer P1-9; ADR-008 §Decision).

Resamples seeds with replacement (the cluster level), then resamples episodes with replacement within each resampled seed. Returns IQM, plain mean, and the 95% CI on the IQM estimator. This is the headline aggregator for ADR-014 §Decision Table 2 when the bootstrap method is "cluster".

Parameters:

  • values (Mapping[int, Sequence[float]]) –

    {seed: [per-episode metric]} mapping. Empty seeds are dropped from the resample but a fully empty input yields a degenerate nan interval.

  • n_resamples (int) –

    Number of bootstrap resamples (e.g. 2000). Must be positive; a non-positive value raises ValueError.

  • rng (Generator) –

    Deterministic np.random.Generator (ADR-002 P6).

Returns:

  • BootstrapCI

    :class:BootstrapCI with the IQM, plain mean, and 95% bounds.
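
The two-level resampling can be sketched with the standard library. This sketch swaps np.random.Generator for random.Random, returns a plain tuple instead of BootstrapCI, and assumes the private _resample_cluster helper draws seeds with replacement and then episodes with replacement within each drawn seed, as the description says. All function names here are hypothetical. The percentile helper uses linear interpolation, which is NumPy's default percentile method.

```python
import math
import random
import statistics


def iqm(xs: list[float]) -> float:
    """25%-trimmed mean (interquartile mean); NaN for an empty list."""
    if not xs:
        return math.nan
    ordered = sorted(xs)
    cut = int(0.25 * len(ordered))
    return statistics.fmean(ordered[cut : len(ordered) - cut])


def resample_cluster(values: dict[int, list[float]], rng: random.Random) -> list[float]:
    """Seeds with replacement (cluster level), then episodes within each drawn seed."""
    seeds = [s for s, episodes in values.items() if episodes]  # drop empty seeds
    if not seeds:
        return []
    out: list[float] = []
    for seed in rng.choices(seeds, k=len(seeds)):
        episodes = values[seed]
        out.extend(rng.choices(episodes, k=len(episodes)))
    return out


def percentile(sorted_xs: list[float], q: float) -> float:
    """Linear-interpolation percentile on already-sorted data."""
    pos = (len(sorted_xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def cluster_bootstrap_ci(values: dict[int, list[float]], *, n_resamples: int, rng: random.Random):
    """Return (point IQM, 2.5th percentile, 97.5th percentile) of bootstrap IQMs."""
    if n_resamples <= 0:
        raise ValueError(f"n_resamples must be positive, got {n_resamples}")
    flat = [x for episodes in values.values() for x in episodes]
    draws = [iqm(resample_cluster(values, rng)) for _ in range(n_resamples)]
    finite = sorted(d for d in draws if math.isfinite(d))
    if not finite:  # fully empty input: degenerate NaN interval
        return iqm(flat), math.nan, math.nan
    return iqm(flat), percentile(finite, 2.5), percentile(finite, 97.5)


values = {0: [1.0, 1.0, 0.0, 1.0], 1: [0.0, 0.0, 1.0, 0.0], 2: [1.0, 1.0, 1.0, 1.0]}
print(cluster_bootstrap_ci(values, n_resamples=2000, rng=random.Random(0)))
```

Resampling whole seeds first is what carries between-seed variance into the interval; a pooled iid resample of the twelve episodes would treat them as independent and typically report a narrower CI, which is the failure mode reviewer P1-9 flags.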

Source code in src/chamber/evaluation/bootstrap.py
def cluster_bootstrap(
    values: Mapping[int, Sequence[float]],
    *,
    n_resamples: int,
    rng: np.random.Generator,
) -> BootstrapCI:
    """Cluster bootstrap over ``{seed: [per-episode values]}`` (reviewer P1-9; ADR-008 §Decision).

    Resamples seeds with replacement (the cluster level), then
    resamples episodes with replacement within each resampled seed.
    Returns IQM, plain mean, and the 95% CI on the *IQM* estimator.
    This is the headline aggregator for ADR-014 §Decision Table 2
    when the bootstrap method is ``"cluster"``.

    Args:
        values: ``{seed: [per-episode metric]}`` mapping. Empty
            seeds are dropped from the resample but a fully empty
            input yields a degenerate ``nan`` interval.
        n_resamples: Number of bootstrap resamples (e.g. ``2000``).
        rng: Deterministic ``np.random.Generator`` (ADR-002 P6).

    Returns:
        :class:`BootstrapCI` with the IQM, plain mean, and 95% bounds.
    """
    if n_resamples <= 0:
        msg = f"n_resamples must be positive, got {n_resamples}"
        raise ValueError(msg)

    all_values = np.concatenate(
        [np.asarray(v, dtype=np.float64) for v in values.values() if len(v) > 0]
        or [np.array([], dtype=np.float64)]
    )
    point_iqm = _interquartile_mean(all_values)
    point_mean = float(np.mean(all_values)) if all_values.size else float("nan")

    iqm_samples = np.empty(n_resamples, dtype=np.float64)
    for k in range(n_resamples):
        resample = _resample_cluster(values, rng=rng)
        iqm_samples[k] = _interquartile_mean(resample) if resample.size else np.nan

    finite = iqm_samples[np.isfinite(iqm_samples)]
    if finite.size == 0:
        return BootstrapCI(
            iqm=point_iqm,
            mean=point_mean,
            ci_low=float("nan"),
            ci_high=float("nan"),
            n_resamples=n_resamples,
        )
    ci_low = float(np.percentile(finite, 2.5))
    ci_high = float(np.percentile(finite, 97.5))
    return BootstrapCI(
        iqm=point_iqm,
        mean=point_mean,
        ci_low=ci_low,
        ci_high=ci_high,
        n_resamples=n_resamples,
    )

compute_hrs_scalar

compute_hrs_scalar(
    vector: HRSVector,
    *,
    weights: dict[str, float] | None = None,
) -> float

Aggregate the HRS vector into a single scalar (ADR-008 §Decision).

Defined as sum(w_i * s_i) / sum(w_i) over the entries in vector — a weighted mean over the surviving axes, with the ADR-008 Option D ordering as default. Returns 0.0 for an empty vector (no surviving axes), which makes a method that failed every ≥20pp test sort to the bottom of the leaderboard.

Parameters:

  • vector (HRSVector) –

    Per-axis HRS values from :func:compute_hrs_vector.

  • weights (dict[str, float] | None, default: None ) –

    Override the per-axis weight map; if None, the weights embedded in vector.entries are used (so the scalar matches the vector that ships with it).

Returns:

  • float

    Scalar in [0, 1].

Source code in src/chamber/evaluation/hrs.py
def compute_hrs_scalar(
    vector: HRSVector,
    *,
    weights: dict[str, float] | None = None,
) -> float:
    """Aggregate the HRS vector into a single scalar (ADR-008 §Decision).

    Defined as ``sum(w_i * s_i) / sum(w_i)`` over the entries in
    ``vector`` — a weighted mean over the surviving axes, with the
    ADR-008 Option D ordering as default. Returns ``0.0`` for an
    empty vector (no surviving axes), which makes a method that
    failed every ≥20pp test sort to the bottom of the leaderboard.

    Args:
        vector: Per-axis HRS values from :func:`compute_hrs_vector`.
        weights: Override the per-axis weight map; if ``None``, the
            weights embedded in ``vector.entries`` are used (so the
            scalar matches the vector that ships with it).

    Returns:
        Scalar in ``[0, 1]``.
    """
    if not vector.entries:
        return 0.0
    if weights is None:
        weight_seq = [e.weight for e in vector.entries]
    else:
        weight_seq = [weights[e.axis] for e in vector.entries]
    score_seq = [e.score for e in vector.entries]
    weight_total = sum(weight_seq)
    if weight_total <= 0.0:
        return 0.0
    return sum(w * s for w, s in zip(weight_seq, score_seq, strict=True)) / weight_total

compute_hrs_vector

compute_hrs_vector(
    condition_results: dict[str, ConditionResult],
    *,
    weights: dict[str, float] | None = None,
) -> HRSVector

Compute the per-axis HRS vector (ADR-008 §Decision; reviewer P1-8).

The per-axis score is the heterogeneous-side success rate (normalised to [0, 1]). Axes missing from condition_results are not emitted as zero-score entries — ADR-007's axis-survival rule means missing axes are cut from the headline, not silently scored as failure.

Parameters:

  • condition_results (dict[str, ConditionResult]) –

    {axis: ConditionResult} covering the surviving ADR-007 axes.

  • weights (dict[str, float] | None, default: None ) –

    Override the per-axis weight map; defaults to :data:DEFAULT_AXIS_WEIGHTS (ADR-008 §Decision Option D).

Returns:

  • HRSVector

    :class:HRSVector with one entry per axis in the canonical ADR-008 ordering (CM > PF > CR > SA > OM > AS), filtered to the axes present in condition_results.
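
The vector-then-scalar pipeline can be worked through with concrete numbers. In this sketch only the axis order of the weight map comes from the docs; the numeric weights are hypothetical because the values of DEFAULT_AXIS_WEIGHTS are not shown, and condition_results is simplified to {axis: heterogeneous_success} floats instead of ConditionResult objects.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Stand-in for HRSVectorEntry (gap_pp omitted for brevity)."""
    axis: str
    score: float
    weight: float


# Hypothetical weights; the key order mirrors CM > PF > CR > SA > OM > AS.
WEIGHTS = {"CM": 6.0, "PF": 5.0, "CR": 4.0, "SA": 3.0, "OM": 2.0, "AS": 1.0}


def hrs_vector(success: dict[str, float], weights: dict[str, float] = WEIGHTS) -> list[Entry]:
    """One entry per surviving axis, in the weight map's key order; no zero-fill."""
    return [Entry(a, float(success[a]), float(weights[a])) for a in weights if a in success]


def hrs_scalar(entries: list[Entry]) -> float:
    """Weighted mean sum(w_i * s_i) / sum(w_i) over the surviving axes."""
    total = sum(e.weight for e in entries)
    if total <= 0.0:  # also covers the empty vector: no surviving axes
        return 0.0
    return sum(e.weight * e.score for e in entries) / total


# Heterogeneous-side success rates for three surviving axes (illustrative numbers).
vec = hrs_vector({"AS": 0.9, "CM": 0.5, "OM": 0.7})
print([e.axis for e in vec])  # ['CM', 'OM', 'AS']
print(round(hrs_scalar(vec), 4))  # (6*0.5 + 2*0.7 + 1*0.9) / 9
```

Because missing axes are dropped rather than scored as zero, the denominator only sums the weights of surviving axes; a method with no surviving axes gets 0.0 and sorts to the bottom of the leaderboard.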

Source code in src/chamber/evaluation/hrs.py
def compute_hrs_vector(
    condition_results: dict[str, ConditionResult],
    *,
    weights: dict[str, float] | None = None,
) -> HRSVector:
    """Compute the per-axis HRS vector (ADR-008 §Decision; reviewer P1-8).

    The per-axis score is the heterogeneous-side success rate
    (normalised to ``[0, 1]``). Axes missing from
    ``condition_results`` are *not* emitted as zero-score entries —
    ADR-007's axis-survival rule means missing axes are cut from the
    headline, not silently scored as failure.

    Args:
        condition_results: ``{axis: ConditionResult}`` covering the
            surviving ADR-007 axes.
        weights: Override the per-axis weight map; defaults to
            :data:`DEFAULT_AXIS_WEIGHTS` (ADR-008 §Decision Option D).

    Returns:
        :class:`HRSVector` with one entry per axis in the canonical
        ADR-008 ordering (CM > PF > CR > SA > OM > AS), filtered to
        the axes present in ``condition_results``.
    """
    chosen_weights = weights if weights is not None else DEFAULT_AXIS_WEIGHTS
    ordered_axes = [a for a in chosen_weights if a in condition_results]
    entries = [
        HRSVectorEntry(
            axis=axis,
            score=float(condition_results[axis].heterogeneous_success),
            gap_pp=float(condition_results[axis].gap_pp),
            weight=float(chosen_weights[axis]),
        )
        for axis in ordered_axes
    ]
    return HRSVector(entries=entries)

load_prereg

load_prereg(path: Path) -> PreregistrationSpec

Load and validate a pre-registration YAML (ADR-007 §Discipline).

Parameters:

  • path (Path) –

    Absolute path to the YAML file.

Returns:

  • PreregistrationSpec

    The validated :class:PreregistrationSpec.

Raises:

  • FileNotFoundError

    When path does not exist.

  • pydantic.ValidationError

    When the YAML payload does not match the schema.

  • ValueError

    When :attr:PreregistrationSpec.axis is not on the ADR-007 §3.4 shortlist.

Source code in src/chamber/evaluation/prereg.py
def load_prereg(path: Path) -> PreregistrationSpec:
    """Load and validate a pre-registration YAML (ADR-007 §Discipline).

    Args:
        path: Absolute path to the YAML file.

    Returns:
        The validated :class:`PreregistrationSpec`.

    Raises:
        FileNotFoundError: When ``path`` does not exist.
        pydantic.ValidationError: When the YAML payload does not
            match the schema.
        ValueError: When :attr:`PreregistrationSpec.axis` is not on
            the ADR-007 §3.4 shortlist.
    """
    raw = path.read_text(encoding="utf-8")
    data = yaml.safe_load(raw) or {}
    spec = PreregistrationSpec.model_validate(data)
    spec.normalised_axis()
    return spec

make_condition_row

make_condition_row(
    *,
    predictor: str,
    conformal_mode: str,
    vendor_compliance: str | None,
    n_episodes: int,
    violations: int,
    fallback_fires: int,
    oscbf_results: Iterable[OSCBFResult] = (),
) -> ConditionRow

Construct a :class:ConditionRow with slack aggregates wired in (issue #142).

The natural call-site at three_tables.json emission time. Routes the per-step :class:concerto.safety.oscbf.OSCBFResult stream through :func:aggregate_oscbf_slack so the v2 slack columns are populated per the locked ADR-014 §Revision history 2026-05-16 contract. Callers on the EGO_ONLY outer-filter path (the Stage-1a default) pass the default empty tuple and get the schema-v2 0.0 defaults — exactly the behaviour documented on :class:concerto.safety.reporting.ConditionRow.

Parameters:

  • predictor (str) –

    "gt" or "pred" (Huriot & Sibai 2025 Table I labels).

  • conformal_mode (str) –

    "noLearn" or "Learn".

  • vendor_compliance (str | None) –

    ADR-007 rev 3 placeholder; None in Phase-0.

  • n_episodes (int) –

    Number of episodes in this condition.

  • violations (int) –

    Count of CBF-constraint violations.

  • fallback_fires (int) –

    Count of braking-fallback fires.

  • oscbf_results (Iterable[OSCBFResult], default: () ) –

    Per-step :class:OSCBFResult stream for the condition. Defaults to an empty tuple — the Stage-1a EGO_ONLY path that does not exercise the OSCBF inner filter (issue #142).

Returns:

  • ConditionRow

    A :class:ConditionRow carrying both the Phase-0 columns and the ADR-014 v2 slack aggregates.

Source code in src/chamber/evaluation/oscbf_aggregate.py
def make_condition_row(
    *,
    predictor: str,
    conformal_mode: str,
    vendor_compliance: str | None,
    n_episodes: int,
    violations: int,
    fallback_fires: int,
    oscbf_results: Iterable[OSCBFResult] = (),
) -> ConditionRow:
    """Construct a :class:`ConditionRow` with slack aggregates wired in (issue #142).

    The natural call-site at ``three_tables.json`` emission time. Routes
    the per-step :class:`concerto.safety.oscbf.OSCBFResult` stream through
    :func:`aggregate_oscbf_slack` so the v2 slack columns are populated
    per the locked ADR-014 §Revision history 2026-05-16 contract. Callers
    on the EGO_ONLY outer-filter path (the Stage-1a default) pass the
    default empty tuple and get the schema-v2 ``0.0`` defaults — exactly
    the behaviour documented on
    :class:`concerto.safety.reporting.ConditionRow`.

    Args:
        predictor: ``"gt"`` or ``"pred"`` (Huriot & Sibai 2025 Table I
            labels).
        conformal_mode: ``"noLearn"`` or ``"Learn"``.
        vendor_compliance: ADR-007 rev 3 placeholder; ``None`` in
            Phase-0.
        n_episodes: Number of episodes in this condition.
        violations: Count of CBF-constraint violations.
        fallback_fires: Count of braking-fallback fires.
        oscbf_results: Per-step :class:`OSCBFResult` stream for the
            condition. Defaults to an empty tuple — the Stage-1a
            EGO_ONLY path that does not exercise the OSCBF inner
            filter (issue #142).

    Returns:
        A :class:`ConditionRow` carrying both the Phase-0 columns and
        the ADR-014 v2 slack aggregates.
    """
    aggregate = aggregate_oscbf_slack(oscbf_results)
    return ConditionRow(
        predictor=predictor,
        conformal_mode=conformal_mode,
        vendor_compliance=vendor_compliance,
        n_episodes=n_episodes,
        violations=violations,
        fallback_fires=fallback_fires,
        max_slack=aggregate.max_slack,
        slack_l2=aggregate.slack_l2,
    )

pacluster_bootstrap

pacluster_bootstrap(
    pairs: Iterable[PairedEpisode],
    *,
    n_resamples: int,
    rng: Generator,
) -> BootstrapCI

Paired-cluster bootstrap for the ≥20pp gap test (ADR-007 §Validation criteria).

Pairs are matched on (seed, episode_idx, initial_state_seed) so the homogeneous - heterogeneous delta is computed on the same initial state, removing one of the dominant sources of cross-condition variance. The cluster level is the seed, exactly as in :func:cluster_bootstrap.

Parameters:

  • pairs (Iterable[PairedEpisode]) –

    Iterable of :class:PairedEpisode records.

  • n_resamples (int) –

    Number of bootstrap resamples.

  • rng (Generator) –

    Deterministic np.random.Generator (ADR-002 P6).

Returns:

  • BootstrapCI

    :class:BootstrapCI on the per-pair delta (in raw units, not percentage points; the renderer multiplies by 100 when formatting).
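
Before any resampling, pacluster_bootstrap groups the pairs by seed (the cluster level) and takes homogeneous - heterogeneous for each matched pair. The Pair tuple below is a hypothetical stand-in for PairedEpisode: seed, homogeneous and heterogeneous are the attributes the code reads, while episode_idx and initial_state_seed are assumed from the matching key named in the docstring. The last line shows the raw-units to percentage-points conversion the renderer applies.

```python
from collections import defaultdict
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical stand-in for PairedEpisode."""
    seed: int
    episode_idx: int
    initial_state_seed: int
    homogeneous: float
    heterogeneous: float


def deltas_by_seed(pairs: list[Pair]) -> dict[int, list[float]]:
    """Group matched episodes by seed and take the per-pair delta."""
    by_seed: dict[int, list[float]] = defaultdict(list)
    for p in pairs:
        # Both sides share the same initial state, so the delta is a paired difference.
        by_seed[p.seed].append(p.homogeneous - p.heterogeneous)
    return dict(by_seed)


pairs = [
    Pair(seed=0, episode_idx=0, initial_state_seed=11, homogeneous=1.0, heterogeneous=0.0),
    Pair(seed=0, episode_idx=1, initial_state_seed=12, homogeneous=1.0, heterogeneous=1.0),
    Pair(seed=1, episode_idx=0, initial_state_seed=21, homogeneous=1.0, heterogeneous=0.0),
]
deltas = deltas_by_seed(pairs)
print(deltas)  # {0: [1.0, 0.0], 1: [1.0]}
mean_delta = sum(d for ds in deltas.values() for d in ds) / len(pairs)
print(f"{100 * mean_delta:.1f} pp")  # raw units to percentage points: 66.7 pp
```

The per-seed delta lists have the same {seed: [values]} shape that cluster_bootstrap consumes, which is why the cluster level can stay the seed in both functions.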

Source code in src/chamber/evaluation/bootstrap.py
def pacluster_bootstrap(
    pairs: Iterable[PairedEpisode],
    *,
    n_resamples: int,
    rng: np.random.Generator,
) -> BootstrapCI:
    """Paired-cluster bootstrap for the ≥20pp gap test (ADR-007 §Validation criteria).

    Pairs are matched on ``(seed, episode_idx, initial_state_seed)``
    so the ``homogeneous - heterogeneous`` delta is computed on the
    *same* initial state, removing one of the dominant sources of
    cross-condition variance. The cluster level is the seed, exactly
    as in :func:`cluster_bootstrap`.

    Args:
        pairs: Iterable of :class:`PairedEpisode` records.
        n_resamples: Number of bootstrap resamples.
        rng: Deterministic ``np.random.Generator`` (ADR-002 P6).

    Returns:
        :class:`BootstrapCI` on the per-pair delta (in raw units,
        not percentage points — the renderer multiplies by 100 when
        formatting).
    """
    if n_resamples <= 0:
        msg = f"n_resamples must be positive, got {n_resamples}"
        raise ValueError(msg)

    by_seed: dict[int, list[PairedEpisode]] = {}
    for pair in pairs:
        by_seed.setdefault(pair.seed, []).append(pair)

    all_deltas = np.asarray(
        [p.homogeneous - p.heterogeneous for episodes in by_seed.values() for p in episodes],
        dtype=np.float64,
    )
    point_iqm = _interquartile_mean(all_deltas)
    point_mean = float(np.mean(all_deltas)) if all_deltas.size else float("nan")

    iqm_samples = np.empty(n_resamples, dtype=np.float64)
    for k in range(n_resamples):
        resample = _resample_paired_cluster(by_seed, rng=rng)
        iqm_samples[k] = _interquartile_mean(resample) if resample.size else np.nan

    finite = iqm_samples[np.isfinite(iqm_samples)]
    if finite.size == 0:
        return BootstrapCI(
            iqm=point_iqm,
            mean=point_mean,
            ci_low=float("nan"),
            ci_high=float("nan"),
            n_resamples=n_resamples,
        )
    return BootstrapCI(
        iqm=point_iqm,
        mean=point_mean,
        ci_low=float(np.percentile(finite, 2.5)),
        ci_high=float(np.percentile(finite, 97.5)),
        n_resamples=n_resamples,
    )
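The resampling scheme can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration rather than the module's code: `Pair`, `iqm`, and `paired_cluster_ci` are hypothetical names, and the IQM is assumed to be the 25%-trimmed mean (the mean of the middle half of the sorted deltas), which may differ in edge-case handling from the private `_interquartile_mean`. Whole seeds are drawn with replacement and every pair under a drawn seed is kept, so within-seed correlation survives into each resample.

```python
from collections import defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Pair:
    # Hypothetical stand-in for PairedEpisode; only the fields used here.
    seed: int
    homogeneous: float
    heterogeneous: float


def iqm(x: np.ndarray) -> float:
    """Interquartile mean: drop the lowest and highest 25% of sorted values, average the rest."""
    x = np.sort(np.asarray(x, dtype=np.float64))
    k = int(0.25 * x.size)
    return float(np.mean(x[k : x.size - k]))


def paired_cluster_ci(
    pairs: list[Pair], *, n_resamples: int, rng: np.random.Generator
) -> tuple[float, float, float]:
    """Return (point IQM, 2.5th percentile, 97.5th percentile) of the per-pair delta."""
    if n_resamples <= 0:
        raise ValueError(f"n_resamples must be positive, got {n_resamples}")
    # Group per-pair deltas by seed: the seed is the cluster unit.
    by_seed: dict[int, list[float]] = defaultdict(list)
    for p in pairs:
        by_seed[p.seed].append(p.homogeneous - p.heterogeneous)
    seeds = list(by_seed)

    samples = np.empty(n_resamples, dtype=np.float64)
    for k in range(n_resamples):
        # Draw whole clusters with replacement and keep all pairs inside each one.
        drawn = rng.integers(0, len(seeds), size=len(seeds))
        resample = np.concatenate([by_seed[seeds[i]] for i in drawn])
        samples[k] = iqm(resample)

    point = iqm(np.concatenate([by_seed[s] for s in seeds]))
    return point, float(np.percentile(samples, 2.5)), float(np.percentile(samples, 97.5))
```

With a fixed `np.random.default_rng(seed)` the interval is reproducible, which is the same determinism the real function asks of its `rng` argument (ADR-002 P6). The sketch assumes at least one pair; the real function instead returns NaN bounds when no finite resample exists.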

render_leaderboard

render_leaderboard(
    entries: Sequence[LeaderboardEntry],
) -> str

Render the CHAMBER leaderboard (ADR-008 §Decision; reviewer P1-8).

Entries are sorted by hrs_scalar descending. Each method has two rows: the headline row carrying the scalar + aggregate violation / fallback rates, and a sub-row showing the HRS vector as axis: score (weight) pairs. The vector sub-row is unconditional per reviewer P1-8 — entries lacking an HRSVector raise :class:ValueError.

Entries built from a single spike-run archive are tagged [PARTIAL: <axis>] in front of the method name (reviewer P1-3) so a one-axis row is never mistaken for a complete HRS-bundle row (ADR-008 §Decision binds the bundle to the surviving-axis set).

Parameters:

  • entries (Sequence[LeaderboardEntry]) –

    Sequence of :class:LeaderboardEntry records.

Returns:

  • str

    Markdown table suitable for embedding in README or leaderboard pages.

Raises:

  • ValueError

    When any entry's :class:HRSVector is empty (no surviving axes); per reviewer P1-8, the leaderboard refuses methods that did not produce an HRS vector.

Source code in src/chamber/evaluation/render.py
def render_leaderboard(entries: Sequence[LeaderboardEntry]) -> str:
    """Render the CHAMBER leaderboard (ADR-008 §Decision; reviewer P1-8).

    Entries are sorted by ``hrs_scalar`` descending. Each method has
    two rows: the headline row carrying the scalar + aggregate
    violation / fallback rates, and a sub-row showing the HRS
    vector as ``axis: score (weight)`` pairs. The vector sub-row is
    *unconditional* per reviewer P1-8 — entries lacking an HRSVector
    raise :class:`ValueError`.

    Entries built from a single spike-run archive are tagged
    ``[PARTIAL: <axis>]`` in front of the method name (reviewer P1-3)
    so a one-axis row is never mistaken for a complete HRS-bundle row
    (ADR-008 §Decision binds the bundle to the surviving-axis set).

    Args:
        entries: Sequence of :class:`LeaderboardEntry` records.

    Returns:
        Markdown table suitable for embedding in README or
        leaderboard pages.

    Raises:
        ValueError: When any entry's :class:`HRSVector` is empty
            (no surviving axes) — the leaderboard refuses methods
            that did not produce a HRS vector per reviewer P1-8.
    """
    for entry in entries:
        if not entry.hrs_vector.entries:
            msg = (
                f"leaderboard entry for method {entry.method_id!r} has an empty HRS vector; "
                "ADR-008 §Decision requires the vector to be emitted unconditionally "
                "(reviewer P1-8)"
            )
            raise ValueError(msg)

    sorted_entries = sorted(entries, key=lambda e: e.hrs_scalar, reverse=True)
    lines: list[str] = [
        "| Rank | Method | HRS scalar | Violation rate | Fallback rate |",
        "| --- | --- | ---: | ---: | ---: |",
    ]
    for rank, entry in enumerate(sorted_entries, start=1):
        if len(entry.spike_runs) == 1:
            partial_axis = entry.hrs_vector.entries[0].axis
            method_label = f"[PARTIAL: {partial_axis}] `{entry.method_id}`"
        else:
            method_label = f"`{entry.method_id}`"
        lines.append(
            f"| {rank} | {method_label} | {entry.hrs_scalar:.3f} "
            f"| {_fmt_pct(entry.violation_rate)} "
            f"| {_fmt_pct(entry.fallback_rate)} |"
        )
        vector_repr = ", ".join(
            f"{e.axis}={e.score:.3f}(w={e.weight:.2f})" for e in entry.hrs_vector.entries
        )
        lines.append(f"|   | ↳ HRS vector | {vector_repr} | | |")
    return "\n".join(lines) + "\n"
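The sort order and the `[PARTIAL: <axis>]` tagging rule can be checked with minimal stand-in records. This is a hedged sketch: `AxisScore`, `Entry`, `method_label`, and `rows` are hypothetical names, and the violation-rate and fallback-rate columns are omitted because the formatting done by the private `_fmt_pct` helper is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisScore:
    axis: str
    score: float
    weight: float


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for LeaderboardEntry: only the fields used below.
    method_id: str
    hrs_scalar: float
    axes: tuple[AxisScore, ...]
    spike_runs: tuple[str, ...]


def method_label(entry: Entry) -> str:
    # A single spike-run archive covers one axis, so flag the row as partial.
    if len(entry.spike_runs) == 1:
        return f"[PARTIAL: {entry.axes[0].axis}] `{entry.method_id}`"
    return f"`{entry.method_id}`"


def rows(entries: list[Entry]) -> list[str]:
    # Refuse entries without an HRS vector, mirroring the ValueError above.
    if any(not entry.axes for entry in entries):
        raise ValueError("every entry needs a non-empty HRS vector")
    ranked = sorted(entries, key=lambda e: e.hrs_scalar, reverse=True)
    out: list[str] = []
    for rank, entry in enumerate(ranked, start=1):
        out.append(f"| {rank} | {method_label(entry)} | {entry.hrs_scalar:.3f} |")
        vector = ", ".join(f"{a.axis}={a.score:.3f}(w={a.weight:.2f})" for a in entry.axes)
        out.append(f"|   | ↳ HRS vector | {vector} |")
    return out
```

Because the vector row is emitted for every entry, a reader can always see which axes stand behind each ranked scalar, and the `[PARTIAL: …]` prefix makes a one-axis row visibly different from a full-bundle row.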

render_three_table_safety_report

render_three_table_safety_report(
    report: Mapping[str, Any], *, fmt: str = "markdown"
) -> str

Render the ADR-014 three-table safety report (ADR-014 §Decision).

Consumes the JSON-serialised payload produced by :func:concerto.safety.reporting.emit_three_tables (or any structurally equivalent ThreeTableReport.to_jsonable() output) and emits Markdown or LaTeX. The fallback-fired column in Table 2 is rendered separately from constraint-violation per ADR-014 §Decision and the reviewer P0 review on safety reporting.

Parameters:

  • report (Mapping[str, Any]) –

    A dict matching the :class:concerto.safety.reporting.ThreeTableReport to_jsonable() shape (table_1, table_2, table_3).

  • fmt (str, default: 'markdown' ) –

    Output format — "markdown" or "latex".

Returns:

  • str

    The rendered three-table report as a string.

Raises:

  • ValueError

    When fmt is not one of the supported formats, or the report dict is missing required keys.

Source code in src/chamber/evaluation/render.py
def render_three_table_safety_report(
    report: Mapping[str, Any],
    *,
    fmt: str = "markdown",
) -> str:
    """Render the ADR-014 three-table safety report (ADR-014 §Decision).

    Consumes the JSON-serialised payload produced by
    :func:`concerto.safety.reporting.emit_three_tables` (or any
    structurally equivalent ``ThreeTableReport.to_jsonable()``
    output) and emits Markdown or LaTeX. The fallback-fired column
    in Table 2 is rendered separately from constraint-violation per
    ADR-014 §Decision and the reviewer P0 review on safety reporting.

    Args:
        report: A dict matching the
            :class:`concerto.safety.reporting.ThreeTableReport`
            ``to_jsonable()`` shape (``table_1``, ``table_2``,
            ``table_3``).
        fmt: Output format — ``"markdown"`` or ``"latex"``.

    Returns:
        The rendered three-table report as a string.

    Raises:
        ValueError: When ``fmt`` is not one of the supported
            formats, or the report dict is missing required keys.
    """
    for key in ("table_1", "table_2", "table_3"):
        if key not in report:
            msg = f"three-table report is missing required key {key!r}"
            raise ValueError(msg)

    if fmt == "markdown":
        return _three_table_markdown(report)
    if fmt == "latex":
        return _three_table_latex(report)
    msg = f"fmt must be one of {{'markdown', 'latex'}}, got {fmt!r}"
    raise ValueError(msg)
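The validate-then-dispatch flow can be sketched with a lookup table of renderers. The renderer bodies below are hypothetical placeholders (the real `_three_table_markdown` and `_three_table_latex` are not shown), and `RENDERERS` and `render` are illustrative names. As in the function above, the required keys are checked before `fmt`, so a malformed report is rejected even when the format is also invalid.

```python
from collections.abc import Callable, Mapping
from typing import Any

REQUIRED_KEYS = ("table_1", "table_2", "table_3")


def _stub_markdown(report: Mapping[str, Any]) -> str:
    # Hypothetical placeholder for the real Markdown renderer.
    return "\n".join(f"## {key}" for key in REQUIRED_KEYS)


def _stub_latex(report: Mapping[str, Any]) -> str:
    # Hypothetical placeholder for the real LaTeX renderer.
    return "\n".join(f"\\section*{{{key}}}" for key in REQUIRED_KEYS)


RENDERERS: dict[str, Callable[[Mapping[str, Any]], str]] = {
    "markdown": _stub_markdown,
    "latex": _stub_latex,
}


def render(report: Mapping[str, Any], *, fmt: str = "markdown") -> str:
    # Validate the payload shape first, as render_three_table_safety_report does.
    for key in REQUIRED_KEYS:
        if key not in report:
            raise ValueError(f"three-table report is missing required key {key!r}")
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"fmt must be one of {sorted(RENDERERS)}, got {fmt!r}") from None
    return renderer(report)
```

A dict of renderers keeps the supported-format list and the error message in one place; the explicit `if` chain in the real function is equally valid for two formats.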

verify_git_tag

verify_git_tag(
    spec: PreregistrationSpec,
    prereg_path: Path,
    *,
    repo_path: Path,
) -> str

Verify the pre-registration matches its git tag (ADR-007 §Discipline).

Confirms (i) the tag exists in the repository and (ii) the blob SHA of prereg_path on disk matches the blob SHA of the same file as stored at the tag. The second check is the lock that catches the "edit-the-YAML-after-launch" anti-pattern: any change to the YAML's bytes after the tag was cut shifts the on-disk blob SHA but not the tagged blob SHA.

Parameters:

  • spec (PreregistrationSpec) –

    The loaded :class:PreregistrationSpec.

  • prereg_path (Path) –

    Path to the YAML file on disk (absolute or repo-relative; will be resolved against repo_path).

  • repo_path (Path) –

    Root of the git working tree.

Returns:

  • str

    The verified blob SHA (hex digest, 40-char SHA-1).

Raises:

  • PreregistrationError

    When the tag is missing, the file is outside the repo, or the blob SHAs disagree.

Source code in src/chamber/evaluation/prereg.py
def verify_git_tag(
    spec: PreregistrationSpec,
    prereg_path: Path,
    *,
    repo_path: Path,
) -> str:
    """Verify the pre-registration matches its git tag (ADR-007 §Discipline).

    Confirms (i) the tag exists in the repository and (ii) the blob
    SHA of ``prereg_path`` on disk matches the blob SHA of the same
    file as stored at the tag. The second check is the lock that
    catches the "edit-the-YAML-after-launch" anti-pattern: any change
    to the YAML's bytes after the tag was cut shifts the on-disk
    blob SHA but not the tagged blob SHA.

    Args:
        spec: The loaded :class:`PreregistrationSpec`.
        prereg_path: Path to the YAML file on disk (absolute or
            repo-relative; will be resolved against ``repo_path``).
        repo_path: Root of the git working tree.

    Returns:
        The verified blob SHA (hex digest, 40-char SHA-1).

    Raises:
        PreregistrationError: When the tag is missing, the file is
            outside the repo, or the blob SHAs disagree.
    """
    abs_path = prereg_path if prereg_path.is_absolute() else (repo_path / prereg_path)
    try:
        rel = abs_path.resolve().relative_to(repo_path.resolve())
    except ValueError as exc:
        msg = f"pre-registration path {abs_path} is outside repo {repo_path}"
        raise PreregistrationError(msg) from exc

    try:
        _git("rev-parse", "--verify", f"refs/tags/{spec.git_tag}", repo_path=repo_path)
    except PreregistrationError as exc:
        msg = f"git tag {spec.git_tag!r} does not exist in {repo_path}"
        raise PreregistrationError(msg) from exc

    on_disk = _blob_sha_on_disk(path=abs_path, repo_path=repo_path)
    at_tag = _blob_sha_at_tag(tag=spec.git_tag, file_relpath=str(rel), repo_path=repo_path)
    if on_disk != at_tag:
        msg = (
            f"pre-registration blob SHA mismatch: on-disk {on_disk} != tagged {at_tag} "
            f"(tag={spec.git_tag}, file={rel})"
        )
        raise PreregistrationError(msg)
    return on_disk
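The lock works because git names a blob by the SHA-1 of a short header plus the file's raw bytes, so any byte-level edit yields a different ID. The sketch below computes that object ID with `hashlib`. It assumes a SHA-1 object-format repository and no clean filters (such as CRLF normalisation), under which it should agree with `git hash-object <file>`; whether `_blob_sha_on_disk` actually shells out to `git hash-object` is an assumption. The tagged side can be read with `git rev-parse <tag>:<path>`.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Object ID git assigns to a blob holding `content` (SHA-1 object format)."""
    # Header is b"blob <decimal byte length>\0", followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


tagged = git_blob_sha1(b"n_resamples: 10000\n")
edited = git_blob_sha1(b"n_resamples: 10001\n")
# A one-byte edit to the YAML after the tag was cut changes the blob SHA.
print(tagged != edited)
```

Comparing blob IDs rather than commit IDs means unrelated commits after the tag do not trip the check; only changes to the pre-registration file's bytes do.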

chamber.benchmarks

CHAMBER spike runners — Stage 0/1/2/3 benchmark experiments.

Implements the staged heterogeneity-axis spike protocol from ADR-007. Each spike module reads a pre-registered YAML (spikes/preregistration/) and refuses to run if the SHA does not match its git tag (project principle P4).

chamber.benchmarks.training_runner

Chamber-side ego-AHT training-run launcher (T4b.11; ADR-002 §Decisions).

Bridges the method-side :func:concerto.training.ego_aht.train (which intentionally does not import chamber.*) to the concrete env + partner instances that live on the benchmark side. The dependency direction stays clean: chamber.benchmarks.training_runner imports both concerto.* (the algorithm-agnostic loop + config) and chamber.* (the env + partner registry), and orchestrates them.

Public surface:

  • :func:build_env: turns an :class:~concerto.training.config.EnvConfig into a concrete env via the task dispatch table ("mpe_cooperative_push" → :class:MPECooperativePushEnv; "stage0_smoke" → :func:chamber.benchmarks.stage0_smoke_adapter.make_stage0_training_env; "stage1_pickplace" → :func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env).
  • :func:build_partner — turns a :class:~concerto.training.config.PartnerConfig into a frozen partner via :func:chamber.partners.registry.load_partner (ADR-009 §Decision).
  • :func:run_training — the end-to-end entry point: builds env, builds partner, calls :func:concerto.training.ego_aht.train, returns the :class:~concerto.training.ego_aht.TrainingResult (P1.04 NamedTuple wrapping the :class:RewardCurve + trained :class:EgoTrainer).

ADR-002 §Decisions / plan/05 §3.5: this module is the natural home for the env-task → env-class dispatch table; concrete env classes live here. M4b-7 will extend the dispatch with the HARL fork's concerto_env_adapter wrapper for stage0_smoke.

build_bounds_for_env

build_bounds_for_env(env: Env[Any, Any]) -> Bounds

Source per-task :class:Bounds from the env (P1.04.5; ADR-006 §Decision).

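Delegates to env.build_bounds() when present (:class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv adds this in P1.04.5). Envs that don't expose the method (Tier-1 fake envs, MPE stand-in) get a Phase-0 default matched to MPE Cooperative-Push: action_linf_component=1.0, cartesian_accel_capacity=1.0, action_rate=10.0, comm_latency_ms=0.0, force_limit=50.0.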

Parameters:

  • env (Env[Any, Any]) –

    A Gymnasium env. The Stage-1b env's :meth:Stage1PickPlaceEnv.build_bounds returns the per-task envelope pinned at construction (cartesian_accel_capacity from PANDA_CARTESIAN_ACCEL_CAPACITY_MS2 = 10.0 m/s²).

Returns:

  • Bounds

    :class:concerto.safety.api.Bounds.

Source code in src/chamber/benchmarks/training_runner.py
def build_bounds_for_env(env: gym.Env[Any, Any]) -> Bounds:
    """Source per-task :class:`Bounds` from the env (P1.04.5; ADR-006 §Decision).

    Delegates to ``env.build_bounds()`` when present
    (:class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv` adds this
    in P1.04.5); falls back to a sensible default for envs that don't
    expose the method (Tier-1 fake envs, MPE stand-in).

    Args:
        env: A Gymnasium env. The Stage-1b env's
            :meth:`Stage1PickPlaceEnv.build_bounds` returns the per-task
            envelope pinned at construction (cartesian_accel_capacity
            from ``PANDA_CARTESIAN_ACCEL_CAPACITY_MS2 = 10.0 m/s²``).

    Returns:
        :class:`concerto.safety.api.Bounds`.
    """
    from concerto.safety.api import Bounds

    builder = getattr(env, "build_bounds", None)
    if callable(builder):
        return cast("Bounds", builder())
    # Phase-0 fallback for envs without the method (MPE / Tier-1 fakes).
    # Values matched to the Phase-0 empirical-guarantee env (MPE
    # Cooperative-Push); per-task overrides happen via env.build_bounds.
    return Bounds(
        action_linf_component=1.0,
        cartesian_accel_capacity=1.0,
        action_rate=10.0,
        comm_latency_ms=0.0,
        force_limit=50.0,
    )
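The delegate-or-default pattern can be sketched with a frozen dataclass standing in for `concerto.safety.api.Bounds`. Field names are copied from the fallback above; the stand-in class, `bounds_for`, and the two fake envs are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Bounds:
    # Stand-in mirroring the fields the Phase-0 fallback sets.
    action_linf_component: float
    cartesian_accel_capacity: float
    action_rate: float
    comm_latency_ms: float
    force_limit: float


PHASE0_DEFAULT = Bounds(
    action_linf_component=1.0,
    cartesian_accel_capacity=1.0,
    action_rate=10.0,
    comm_latency_ms=0.0,
    force_limit=50.0,
)


def bounds_for(env: Any) -> Bounds:
    # Duck-typed lookup: use the env's own envelope when it provides one.
    builder = getattr(env, "build_bounds", None)
    if callable(builder):
        return builder()
    return PHASE0_DEFAULT


class FakeMPEEnv:
    """No build_bounds method, so bounds_for takes the fallback path."""


class FakePandaEnv:
    def build_bounds(self) -> Bounds:
        # 10.0 m/s^2 mirrors PANDA_CARTESIAN_ACCEL_CAPACITY_MS2; other values are illustrative.
        return Bounds(
            action_linf_component=1.0,
            cartesian_accel_capacity=10.0,
            action_rate=10.0,
            comm_latency_ms=0.0,
            force_limit=50.0,
        )
```

Checking `callable(...)` rather than `hasattr(...)` means a non-callable attribute that happens to be named `build_bounds` also falls through to the default instead of raising at call time.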

build_control_models

build_control_models(
    env: Env[Any, Any],
) -> Mapping[str, AgentControlModel]

Build a per-uid :class:AgentControlModel map from an env (ADR-004 §Decision; spike_004A).

Two-tier dispatch (P1.03 / Task B):

  1. Env-provided. If the env exposes a callable build_control_models instance method (the contract :class:chamber.envs.stage1_pickplace.Stage1PickPlaceEnv adds in P1.03), delegate to it. The Stage-1b env's method returns a :class:JacobianControlModel for the 7-DOF panda uid with a working Jacobian callable, plus :class:DoubleIntegratorControlModel for the fetch uid — the per-embodiment dispatch lives there because the env knows its own URDF and articulation best (ADR-004 §Decision; ADR-007 §Stage 1b).
  2. Default fallback. Read each uid's action_space[uid] Box shape and return a :class:DoubleIntegratorControlModel per uid sized to shape[0]. Phase-0 envs in CHAMBER expose their actions as Cartesian velocities or accelerations (the safety filter's CBF assumes Cartesian acceleration), so the double-integrator identity model is the correct default for the MPE / Stage-0 envs.

Parameters:

  • env (Env[Any, Any]) –

    A multi-agent Gymnasium-conformant env. If the env exposes build_control_models, the helper delegates; otherwise it falls back to the default-DI dispatch over env.action_space.spaces.

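Returns:

  • Mapping[str, AgentControlModel]

    {uid: AgentControlModel} per uid in env.action_space.spaces (or per the env's own build_control_models() if delegated).
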
Raises:

  • TypeError

    If the fallback path is taken and env.action_space is not a :class:gymnasium.spaces.Dict or any action_space[uid] is not a :class:gymnasium.spaces.Box.

  • ValueError

    If the fallback path is taken and any uid's action Box is not 1-D (multi-D action shapes would need a custom control model rather than the double-integrator identity).

Source code in src/chamber/benchmarks/training_runner.py
def build_control_models(env: gym.Env[Any, Any]) -> Mapping[str, AgentControlModel]:
    """Build a per-uid :class:`AgentControlModel` map from an env (ADR-004 §Decision; spike_004A).

    Two-tier dispatch (P1.03 / Task B):

    1. **Env-provided.** If the env exposes a callable
       ``build_control_models`` instance method (the contract
       :class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv` adds
       in P1.03), delegate to it. The Stage-1b env's method returns a
       :class:`JacobianControlModel` for the 7-DOF panda uid with a
       working Jacobian callable, plus
       :class:`DoubleIntegratorControlModel` for the fetch uid — the
       per-embodiment dispatch lives there because the env knows its
       own URDF and articulation best (ADR-004 §Decision; ADR-007
       §Stage 1b).
    2. **Default fallback.** Read each uid's ``action_space[uid]`` Box
       shape and return a :class:`DoubleIntegratorControlModel` per uid
       sized to ``shape[0]``. Phase-0 envs in CHAMBER expose their
       actions as Cartesian velocities or accelerations (the safety
       filter's CBF assumes Cartesian acceleration), so the
       double-integrator identity model is the correct default for the
       MPE / Stage-0 envs.

    Args:
        env: A multi-agent Gymnasium-conformant env. If the env
            exposes ``build_control_models``, the helper delegates;
            otherwise it falls back to the default-DI dispatch over
            ``env.action_space.spaces``.

    Returns:
        ``{uid: AgentControlModel}`` per uid in
        ``env.action_space.spaces`` (or per the env's own
        ``build_control_models()`` if delegated).

    Raises:
        TypeError: If the fallback path is taken and
            ``env.action_space`` is not a :class:`gymnasium.spaces.Dict`
            or any ``action_space[uid]`` is not a :class:`gymnasium.spaces.Box`.
        ValueError: If the fallback path is taken and any uid's
            action Box is not 1-D (multi-D action shapes would need a
            custom control model rather than the double-integrator
            identity).
    """
    env_builder = getattr(env, "build_control_models", None)
    if callable(env_builder):
        # Env-side dispatch — Stage-1b path. The env's method is
        # responsible for sizing each uid's model correctly (panda gets
        # JacobianControlModel with a working jacobian_fn; fetch gets
        # DoubleIntegratorControlModel sized to the action space).
        # pyright sees getattr's return as ``object``; the cast pins
        # the contract Stage1PickPlaceEnv.build_control_models honours.
        return cast("Mapping[str, AgentControlModel]", env_builder())
    action_space = env.action_space
    if not isinstance(action_space, gym.spaces.Dict):
        msg = (
            "build_control_models requires env.action_space to be a "
            f"gym.spaces.Dict; got {type(action_space).__name__}."
        )
        raise TypeError(msg)
    control_models: dict[str, AgentControlModel] = {}
    for uid, sub in action_space.spaces.items():
        if not isinstance(sub, gym.spaces.Box):
            msg = f"action_space[{uid!r}] must be a gym.spaces.Box; got {type(sub).__name__}."
            raise TypeError(msg)
        if len(sub.shape) != 1:
            msg = (
                f"action_space[{uid!r}].shape={sub.shape} is not 1-D; "
                "non-1-D actions require a custom AgentControlModel rather "
                "than the DoubleIntegratorControlModel default."
            )
            raise ValueError(msg)
        model: AgentControlModel = DoubleIntegratorControlModel(
            uid=uid, action_dim=int(sub.shape[0])
        )
        control_models[uid] = model
    return control_models
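The default-fallback branch can be sketched without Gymnasium by passing each uid's Box shape directly. `DoubleIntegratorModel` and `default_control_models` are hypothetical stand-ins; the real helper additionally raises `TypeError` when the action space is not a `gymnasium.spaces.Dict` of `Box` spaces, which this shape-only sketch cannot detect.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleIntegratorModel:
    # Hypothetical stand-in for DoubleIntegratorControlModel.
    uid: str
    action_dim: int


def default_control_models(
    action_shapes: Mapping[str, tuple[int, ...]],
) -> dict[str, DoubleIntegratorModel]:
    """Fallback path: one double-integrator model per uid, sized to its 1-D action shape."""
    models: dict[str, DoubleIntegratorModel] = {}
    for uid, shape in action_shapes.items():
        if len(shape) != 1:
            # Multi-dimensional actions need a custom control model instead.
            raise ValueError(f"action_space[{uid!r}].shape={shape} is not 1-D")
        models[uid] = DoubleIntegratorModel(uid=uid, action_dim=int(shape[0]))
    return models
```

Sizing each model from `shape[0]` keeps the fallback correct for agents with different action widths in the same env, as long as every action is a flat Cartesian vector.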

build_emergency_controllers

build_emergency_controllers(
    env: Env[Any, Any],
) -> dict[str, EmergencyController]

Build per-uid emergency-controller dispatch for the cell's env (P1.04.6).

ADR-007 §Stage 1b Rev 8: the training-time :func:concerto.safety.braking.maybe_brake call routes each dangerous-pair uid's aggregate repulsion through its :class:~concerto.safety.emergency.EmergencyController to translate the Cartesian vector into a control-space override. Heterogeneous embodiments need heterogeneous controllers:

  • 7-DOF panda uids (action_dim != position_dim): dispatch to :class:~concerto.safety.emergency.JacobianEmergencyController with the env-built :class:JacobianControlModel (same instance the outer CBF row uses, so the kinematic convention is configured once at chamber-side wiring time per ADR-004 §Revision history 2026-05-17).
  • Double-integrator uids (action_dim == position_dim, fetch / MPE / Stage-0 default): dispatch to :class:~concerto.safety.emergency.CartesianAccelEmergencyController.

Dispatch on :class:JacobianControlModel instance identity rather than on uid strings or action_dim heuristics — the model is the source-of-truth for the embodiment / control-space mapping.

Parameters:

  • env (Env[Any, Any]) –

    A multi-agent Gymnasium-conformant env. Must expose build_control_models() (the env-side dispatch) or match the default-DI fallback in :func:build_control_models.

Returns:

  • dict[str, EmergencyController]

    {uid: EmergencyController} for every uid in the cell.

Source code in src/chamber/benchmarks/training_runner.py
def build_emergency_controllers(
    env: gym.Env[Any, Any],
) -> dict[str, EmergencyController]:
    """Build per-uid emergency-controller dispatch for the cell's env (P1.04.6).

    ADR-007 §Stage 1b Rev 8: the training-time
    :func:`concerto.safety.braking.maybe_brake` call routes each
    dangerous-pair uid's aggregate repulsion through its
    :class:`~concerto.safety.emergency.EmergencyController` to translate
    the Cartesian vector into a control-space override. Heterogeneous
    embodiments need heterogeneous controllers:

    - **7-DOF panda uids** (action_dim != position_dim): dispatch to
      :class:`~concerto.safety.emergency.JacobianEmergencyController`
      with the env-built :class:`JacobianControlModel` (same instance
      the outer CBF row uses, so the kinematic convention is configured
      once at chamber-side wiring time per ADR-004 §Revision history
      2026-05-17).
    - **Double-integrator uids** (action_dim == position_dim, fetch /
      MPE / Stage-0 default): dispatch to
      :class:`~concerto.safety.emergency.CartesianAccelEmergencyController`.

    Dispatch on :class:`JacobianControlModel` instance identity rather
    than on ``uid`` strings or ``action_dim`` heuristics — the model is
    the source-of-truth for the embodiment / control-space mapping.

    Args:
        env: A multi-agent Gymnasium-conformant env. Must expose
            ``build_control_models()`` (the env-side dispatch) or match
            the default-DI fallback in :func:`build_control_models`.

    Returns:
        ``{uid: EmergencyController}`` for every uid in the cell.
    """
    from concerto.safety.emergency import (
        CartesianAccelEmergencyController,
        JacobianEmergencyController,
    )

    control_models = build_control_models(env)
    controllers: dict[str, EmergencyController] = {}
    for uid, model in control_models.items():
        if isinstance(model, JacobianControlModel):
            controllers[uid] = JacobianEmergencyController(control_model=model)
        else:
            controllers[uid] = CartesianAccelEmergencyController()
    return controllers
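The type-based dispatch can be sketched with minimal stand-in classes. All class names below are hypothetical placeholders for `JacobianControlModel`, `DoubleIntegratorControlModel`, `JacobianEmergencyController`, and `CartesianAccelEmergencyController`; the point illustrated is that the Jacobian controller receives the very same model object the env built, so the kinematic convention is configured once.

```python
from dataclasses import dataclass


class ControlModel:
    """Hypothetical base for per-uid control models."""


@dataclass
class JacobianModel(ControlModel):
    # Stand-in for JacobianControlModel (joint-space actions, e.g. a 7-DOF panda).
    uid: str


@dataclass
class DoubleIntegratorModel(ControlModel):
    # Stand-in for DoubleIntegratorControlModel (Cartesian-acceleration actions).
    uid: str


@dataclass
class JacobianEmergency:
    control_model: JacobianModel


class CartesianEmergency:
    """Stand-in for the Cartesian sum-and-saturate override."""


def emergency_controllers(models: dict[str, ControlModel]) -> dict[str, object]:
    # Dispatch on the model's type, not on uid strings or action_dim heuristics.
    controllers: dict[str, object] = {}
    for uid, model in models.items():
        if isinstance(model, JacobianModel):
            controllers[uid] = JacobianEmergency(control_model=model)
        else:
            controllers[uid] = CartesianEmergency()
    return controllers
```

Any model that is not a Jacobian model falls through to the Cartesian controller, matching the two-way split described above.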

build_env

build_env(env_cfg: EnvConfig, *, root_seed: int) -> EnvLike

Construct the env from :class:EnvConfig (T4b.11; ADR-002 §Decisions; plan/05 §3.5).

Dispatch table:

  • "mpe_cooperative_push" → :class:MPECooperativePushEnv (T4b.13's empirical-guarantee env; Phase-0 stand-in).
  • "stage0_smoke" → :func:chamber.benchmarks.stage0_smoke_adapter.make_stage0_training_env (T4b.3 / plan/05 §3.4). Constructs the rig-validated ADR-001 3-robot env and adapts it to the :class:~concerto.training.ego_aht.EnvLike shape. Requires a Vulkan-capable GPU.
  • "stage1_pickplace" → :func:chamber.envs.stage1_pickplace.make_stage1_pickplace_env (P1.04 / ADR-007 §Stage 1b). The Stage-1b science-evaluation env; panda + fetch / panda + panda_partner tuple. EnvConfig.condition_id drives the per-condition selection and must be set, because the OM-homo and OM-hetero conditions share the same agent_uids tuple (:func:chamber.envs.stage1_pickplace.resolve_condition resolves it via the prereg-driven condition_id table). :class:TrainedPolicyFactory invokes this dispatch from its __call__ body through the chamber-side :func:run_training wrapper. Requires SAPIEN + Vulkan.

Parameters:

  • env_cfg (EnvConfig) –

    The validated :class:~concerto.training.config.EnvConfig.

  • root_seed (int) –

    Project root seed routed to the env's :func:concerto.training.seeding.derive_substream substream for P6 reproducibility.

Returns:

  • EnvLike

    Concrete env instance satisfying the :class:~concerto.training.ego_aht.EnvLike Protocol.

Raises:

  • ValueError

    If env_cfg.task is not in the dispatch table, or if env_cfg.task is "stage1_pickplace" and env_cfg.condition_id is None.

  • ChamberEnvCompatibilityError

    If a SAPIEN-backed env (stage0_smoke / stage1_pickplace) is requested and SAPIEN / Vulkan is unavailable (see ADR-001 §Risks).

Source code in src/chamber/benchmarks/training_runner.py
def build_env(env_cfg: EnvConfig, *, root_seed: int) -> EnvLike:
    """Construct the env from :class:`EnvConfig` (T4b.11; ADR-002 §Decisions; plan/05 §3.5).

    Dispatch table:

    - ``"mpe_cooperative_push"`` → :class:`MPECooperativePushEnv` (T4b.13's
      empirical-guarantee env; Phase-0 stand-in).
    - ``"stage0_smoke"`` → :func:`chamber.benchmarks.stage0_smoke_adapter.make_stage0_training_env`
      (T4b.3 / plan/05 §3.4). Constructs the rig-validated ADR-001
      3-robot env and adapts it to the
      :class:`~concerto.training.ego_aht.EnvLike` shape. Requires a
      Vulkan-capable GPU.
    - ``"stage1_pickplace"`` → :func:`chamber.envs.stage1_pickplace.make_stage1_pickplace_env`
      (P1.04 / ADR-007 §Stage 1b). The Stage-1b science-evaluation env;
      panda + fetch / panda + panda_partner tuple. ``EnvConfig.agent_uids``
      drives the per-condition selection
      (:func:`chamber.envs.stage1_pickplace.resolve_condition` resolves
      via the prereg-driven ``condition_id`` table).
      :class:`TrainedPolicyFactory` invokes this dispatch from its
      ``__call__`` body through the chamber-side
      :func:`run_training` wrapper. Requires SAPIEN + Vulkan.

    Args:
        env_cfg: The validated :class:`~concerto.training.config.EnvConfig`.
        root_seed: Project root seed routed to the env's
            :func:`concerto.training.seeding.derive_substream` substream
            for P6 reproducibility.

    Returns:
        Concrete env instance satisfying the
        :class:`~concerto.training.ego_aht.EnvLike` Protocol.

    Raises:
        ValueError: If ``env_cfg.task`` is not in the dispatch table.
        ChamberEnvCompatibilityError: If a SAPIEN-backed env
            (``stage0_smoke`` / ``stage1_pickplace``) is requested and
            SAPIEN / Vulkan is unavailable (see ADR-001 §Risks).
    """
    if env_cfg.task == "mpe_cooperative_push":
        return MPECooperativePushEnv(
            agent_uids=env_cfg.agent_uids,
            episode_length=env_cfg.episode_length,
            root_seed=root_seed,
        )
    if env_cfg.task == "stage0_smoke":
        return make_stage0_training_env(
            agent_uids=env_cfg.agent_uids,
            episode_length=env_cfg.episode_length,
            root_seed=root_seed,
        )
    if env_cfg.task == "stage1_pickplace":
        # Lazy import to keep build_env Tier-1-safe — the Stage-1b env
        # defers ManiSkill / SAPIEN imports to its own factory body
        # (ADR-007 §Stage 1b; Tier-1 import-safety on Vulkan-less hosts).
        from chamber.envs.stage1_pickplace import (
            make_stage1_pickplace_env,
        )

        # The condition_id is required for Stage-1b; the OM-homo vs
        # OM-hetero conditions share the same agent_uids tuple
        # ("panda_wristcam", "fetch"), so the tuple alone is insufficient
        # to disambiguate. EnvConfig.condition_id (P1.04) is the explicit
        # disambiguator: the cfg yaml sets a default; the
        # TrainedPolicyFactory overrides per call from env.condition_id.
        if env_cfg.condition_id is None:
            msg = (
                "build_env: task='stage1_pickplace' requires "
                "EnvConfig.condition_id to be set (one of the four "
                "Stage-1 prereg condition_id strings; see "
                "chamber.envs.stage1_pickplace.resolve_condition). The "
                "cfg yaml's env.condition_id field carries a default; "
                "TrainedPolicyFactory overrides per cell."
            )
            raise ValueError(msg)
        # make_stage1_pickplace_env returns gym.Env[Any, Any]; pyright
        # can't verify the structural EnvLike Protocol from that opaque
        # type. The Stage1PickPlaceEnv class (defined inside the factory
        # body) satisfies EnvLike at runtime (the class has the right
        # reset / step signatures); pin via cast so the dispatch typechecks.
        return cast(
            "EnvLike",
            make_stage1_pickplace_env(
                condition_id=env_cfg.condition_id,
                episode_length=env_cfg.episode_length,
                root_seed=root_seed,
            ),
        )
    raise ValueError(
        f"Unknown env task {env_cfg.task!r}; supported: "
        "'mpe_cooperative_push', 'stage0_smoke', 'stage1_pickplace'."
    )
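The `if` chain in `build_env` is equivalent to a dict-based dispatch, sketched below with placeholder factories that return plain dicts instead of real envs. `EnvCfg`, `FACTORIES`, and `build` are hypothetical names, and `stage0_smoke` is left out because its real factory needs SAPIEN and Vulkan.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EnvCfg:
    # Hypothetical subset of EnvConfig.
    task: str
    agent_uids: tuple[str, ...]
    episode_length: int
    condition_id: Optional[str] = None


def _make_mpe(cfg: EnvCfg, root_seed: int) -> dict[str, Any]:
    # Placeholder for MPECooperativePushEnv(...).
    return {"env": "mpe_cooperative_push", "uids": cfg.agent_uids, "seed": root_seed}


def _make_stage1(cfg: EnvCfg, root_seed: int) -> dict[str, Any]:
    # agent_uids cannot tell OM-homo from OM-hetero, so condition_id is mandatory.
    if cfg.condition_id is None:
        raise ValueError("task='stage1_pickplace' requires condition_id")
    return {"env": "stage1_pickplace", "condition": cfg.condition_id, "seed": root_seed}


FACTORIES: dict[str, Callable[[EnvCfg, int], dict[str, Any]]] = {
    "mpe_cooperative_push": _make_mpe,
    "stage1_pickplace": _make_stage1,
}


def build(cfg: EnvCfg, *, root_seed: int) -> dict[str, Any]:
    try:
        factory = FACTORIES[cfg.task]
    except KeyError:
        raise ValueError(f"Unknown env task {cfg.task!r}; supported: {sorted(FACTORIES)}") from None
    return factory(cfg, root_seed)
```

One reason to keep the explicit `if` chain in the real code is that the Stage-1b import stays lazy inside its own branch; a dict of factories preserves that property only if each factory performs its own import.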

build_partner

build_partner(partner_cfg: PartnerConfig) -> PartnerLike

Construct the frozen partner from :class:PartnerConfig (ADR-009 §Decision).

Builds a :class:~chamber.partners.api.PartnerSpec from the config fields and routes through :func:chamber.partners.registry.load_partner so the M4a registry's loud-fail contract (KeyError listing known classes) applies uniformly.

Parameters:

  • partner_cfg (PartnerConfig) –

    The validated :class:~concerto.training.config.PartnerConfig.

Returns:

  • PartnerLike

    Concrete frozen partner satisfying the :class:~concerto.training.ego_aht.PartnerLike Protocol.

Raises:

  • KeyError

    If partner_cfg.class_name is not registered in :data:chamber.partners.registry._REGISTRY.

Source code in src/chamber/benchmarks/training_runner.py
def build_partner(partner_cfg: PartnerConfig) -> PartnerLike:
    """Construct the frozen partner from :class:`PartnerConfig` (ADR-009 §Decision).

    Builds a :class:`~chamber.partners.api.PartnerSpec` from the config
    fields and routes through :func:`chamber.partners.registry.load_partner`
    so the M4a registry's loud-fail contract (KeyError listing known
    classes) applies uniformly.

    Args:
        partner_cfg: The validated
            :class:`~concerto.training.config.PartnerConfig`.

    Returns:
        Concrete frozen partner satisfying the
        :class:`~concerto.training.ego_aht.PartnerLike` Protocol.

    Raises:
        KeyError: If ``partner_cfg.class_name`` is not registered in
            :data:`chamber.partners.registry._REGISTRY`.
    """
    spec = PartnerSpec(
        class_name=partner_cfg.class_name,
        seed=partner_cfg.seed,
        checkpoint_step=partner_cfg.checkpoint_step,
        weights_uri=partner_cfg.weights_uri,
        extra=dict(partner_cfg.extra),
    )
    return load_partner(spec)
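The registry's loud-fail contract can be illustrated with a small standalone sketch. SketchPartnerSpec, _SKETCH_REGISTRY, register, load_partner_sketch, ScriptedPartner, and the "scripted_heuristic" name are all hypothetical; the real PartnerSpec and load_partner in chamber.partners may differ in fields, registration mechanism, and error text. What the sketch shows is the pattern described above: an unknown class_name raises KeyError with the registered names in the message, and extra is copied so the spec does not alias the config.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class SketchPartnerSpec:
    """Hypothetical stand-in for chamber.partners.api.PartnerSpec."""

    class_name: str
    seed: int = 0
    checkpoint_step: Optional[int] = None
    weights_uri: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry: class_name -> constructor that accepts the spec.
_SKETCH_REGISTRY: Dict[str, Callable[[SketchPartnerSpec], Any]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a partner class under ``name``."""

    def deco(cls: type) -> type:
        _SKETCH_REGISTRY[name] = cls
        return cls

    return deco


def load_partner_sketch(spec: SketchPartnerSpec) -> Any:
    """Build the registered partner, failing loudly on unknown names."""
    try:
        ctor = _SKETCH_REGISTRY[spec.class_name]
    except KeyError:
        known = ", ".join(sorted(_SKETCH_REGISTRY)) or "<none>"
        raise KeyError(
            f"Unknown partner class {spec.class_name!r}; known classes: {known}"
        ) from None
    return ctor(spec)


@register("scripted_heuristic")
class ScriptedPartner:
    """Hypothetical frozen partner: stores its spec, has no trainable state."""

    def __init__(self, spec: SketchPartnerSpec) -> None:
        self.spec = spec


# Mirrors build_partner: copy ``extra`` so the spec does not alias the config.
cfg_extra = {"speed": 0.5}
partner = load_partner_sketch(
    SketchPartnerSpec(class_name="scripted_heuristic", seed=7, extra=dict(cfg_extra))
)
```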

build_safety_dt

build_safety_dt(
    env: Env[Any, Any], *, fallback: float = 0.02
) -> float

Source the control-step dt from the env (P1.04.5; ADR-007 §Stage 1b).

Reads :attr:env.control_timestep (the ManiSkill BaseEnv default; forwarded through the Stage-1b wrapper chain via the :class:Stage1ASStateSynthesizer.__getattr__ forwarder). Falls back to fallback (default 0.02 s — the ManiSkill v3 default) on envs that don't expose control_timestep (Tier-1 fake envs that aren't ManiSkill-backed).

Parameters:

  • env (Env[Any, Any]) –

    The cell's env.

  • fallback (float, default: 0.02 ) –

    Default dt in seconds when the env has no control_timestep. The training loop's safety wiring uses this for the conformal predictor's lookahead horizon and for the numerical-difference velocity computation in :meth:Stage1PickPlaceEnv.build_agent_snapshots.

Returns:

  • float

    The control-step dt in seconds.

Source code in src/chamber/benchmarks/training_runner.py
def build_safety_dt(env: gym.Env[Any, Any], *, fallback: float = 0.02) -> float:
    """Source the control-step dt from the env (P1.04.5; ADR-007 §Stage 1b).

    Reads :attr:`env.control_timestep` (the ManiSkill BaseEnv default;
    forwarded through the Stage-1b wrapper chain via the
    :class:`Stage1ASStateSynthesizer.__getattr__` forwarder). Falls
    back to ``fallback`` (default 0.02 s — the ManiSkill v3 default)
    on envs that don't expose ``control_timestep`` (Tier-1 fake envs
    that aren't ManiSkill-backed).

    Args:
        env: The cell's env.
        fallback: Default dt in seconds when the env has no
            ``control_timestep``. The training loop's safety wiring
            uses this for the conformal predictor's lookahead horizon
            and for the numerical-difference velocity computation in
            :meth:`Stage1PickPlaceEnv.build_agent_snapshots`.

    Returns:
        The control-step dt in seconds.
    """
    dt = getattr(env, "control_timestep", None)
    if dt is None and hasattr(env, "get_wrapper_attr"):
        try:
            dt = env.get_wrapper_attr("control_timestep")
        except AttributeError:
            dt = None
    if dt is None:
        return fallback
    try:
        return float(dt)
    except (TypeError, ValueError):
        return fallback
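The three lookup paths in build_safety_dt (a direct attribute, which a __getattr__ forwarder can satisfy; a get_wrapper_attr chain lookup; and the fallback) can be exercised with fake envs. lookup_dt mirrors the listing above so the sketch runs without chamber or Gymnasium installed; the wrapper and env classes, and the 0.05 s timestep, are hypothetical.

```python
from typing import Any


def lookup_dt(env: Any, *, fallback: float = 0.02) -> float:
    """Mirror of build_safety_dt's lookup order, for a standalone sketch."""
    dt = getattr(env, "control_timestep", None)
    if dt is None and hasattr(env, "get_wrapper_attr"):
        try:
            dt = env.get_wrapper_attr("control_timestep")
        except AttributeError:
            dt = None
    if dt is None:
        return fallback
    try:
        return float(dt)
    except (TypeError, ValueError):
        return fallback


class FakeManiSkillEnv:
    control_timestep = 0.05  # hypothetical value, not a ManiSkill default


class ForwardingWrapper:
    """Forwards unknown attributes inward, like a __getattr__ forwarder."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup on the wrapper fails.
        return getattr(self.env, name)


class ChainLookupWrapper:
    """No forwarding, but exposes a get_wrapper_attr-style chain lookup."""

    def __init__(self, env: Any) -> None:
        self._inner = env

    def get_wrapper_attr(self, name: str) -> Any:
        return getattr(self._inner, name)  # raises AttributeError if absent


class TierOneFakeEnv:
    """Not ManiSkill-backed: no control_timestep anywhere."""


class BadDtEnv:
    control_timestep = "not-a-number"  # float() raises ValueError
```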

build_safety_filter

build_safety_filter(
    env: Env[Any, Any],
) -> EgoOnlySafetyFilter

Construct the CBF-QP outer filter for the cell's env (P1.04.5; ADR-007 §Stage 1b).

Wraps :meth:concerto.safety.cbf_qp.ExpCBFQP.ego_only with the per-uid control-model map sourced from :func:build_control_models (the env's build_control_models() method when present; otherwise the default-DI dispatch). The filter is constructed once per (seed, condition) cell by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory and handed to :func:concerto.training.ego_aht.train via the safety_filter kwarg.

Construction is cheap (no scene-state assumed); the filter's per-step :meth:filter method is what does the work.

Parameters:

  • env (Env[Any, Any]) –

    A multi-agent Gymnasium-conformant env exposing a build_control_models() method (or matching the default-DI dispatch in :func:build_control_models).

Returns:

  • EgoOnlySafetyFilter

    :class:concerto.safety.api.EgoOnlySafetyFilter instance.

Source code in src/chamber/benchmarks/training_runner.py
def build_safety_filter(env: gym.Env[Any, Any]) -> EgoOnlySafetyFilter:
    """Construct the CBF-QP outer filter for the cell's env (P1.04.5; ADR-007 §Stage 1b).

    Wraps :meth:`concerto.safety.cbf_qp.ExpCBFQP.ego_only` with the
    per-uid control-model map sourced from
    :func:`build_control_models` (the env's
    ``build_control_models()`` method when present; otherwise the
    default-DI dispatch). The filter is constructed once per
    ``(seed, condition)`` cell by
    :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
    and handed to :func:`concerto.training.ego_aht.train` via the
    ``safety_filter`` kwarg.

    Construction is cheap (no scene-state assumed); the filter's
    per-step :meth:`filter` method is what does the work.

    Args:
        env: A multi-agent Gymnasium-conformant env exposing a
            ``build_control_models()`` method (or matching the default-
            DI dispatch in :func:`build_control_models`).

    Returns:
        :class:`concerto.safety.api.EgoOnlySafetyFilter` instance.
    """
    from concerto.safety.cbf_qp import ExpCBFQP

    control_models = build_control_models(env)
    return ExpCBFQP.ego_only(control_models=control_models)
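The dispatch rule the filter relies on (use the env's build_control_models() when it exists, otherwise a default double-integrator model per uid) can be sketched without the safety stack. DoubleIntegratorModel, control_models_for, its explicit agent_uids argument, and both env classes are hypothetical; the real build_control_models takes only the env, and the real model types live in concerto.safety.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class DoubleIntegratorModel:
    """Hypothetical default control model: Cartesian double integrator."""

    position_dim: int = 2


def control_models_for(env: Any, agent_uids: Sequence[str]) -> Dict[str, Any]:
    """Prefer the env's own hook; otherwise give every uid the default model."""
    hook = getattr(env, "build_control_models", None)
    if callable(hook):
        return dict(hook())
    return {uid: DoubleIntegratorModel() for uid in agent_uids}


class ArmEnv:
    """Hypothetical env that knows its embodiments (e.g. a 7-DOF arm)."""

    def build_control_models(self) -> Dict[str, Any]:
        return {"panda_wristcam": "joint-space-model", "fetch": DoubleIntegratorModel(3)}


class PointMassEnv:
    """Hypothetical env with no hook; falls through to the default."""
```

Letting the env own the mapping keeps embodiment knowledge (such as a 7-DOF arm that is not a double integrator) next to the env, while simple point-mass envs need no extra code.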

build_safety_snapshot_builder

build_safety_snapshot_builder(
    env: Env[Any, Any],
) -> (
    Callable[[gym.Env[Any, Any]], dict[str, AgentSnapshot]]
    | None
)

Return a callable wrapping the env's build_agent_snapshots method, or None (P1.04.5; ADR-007 §Stage 1b).

The training loop calls the returned callable once per step (when the safety filter is wired) to build the per-uid Cartesian snapshot map the conformal layer consumes. For envs that don't expose build_agent_snapshots the function returns None, and the training loop then treats the safety stack as inert (the kwargs validation in :func:concerto.training.ego_aht.train raises if the operator passed a filter without a snapshot builder).

Parameters:

  • env (Env[Any, Any]) –

    A Gymnasium env. The Stage-1b env's :meth:Stage1PickPlaceEnv.build_agent_snapshots returns the per-uid snapshot dict.

Returns:

  • Callable[[Env[Any, Any]], dict[str, AgentSnapshot]] | None

    A callable wrapping the env's bound build_agent_snapshots method, or None. The callable's signature is (env) -> dict[str, AgentSnapshot] for symmetry with the other build_* helpers; it ignores the env argument because the wrapped method is already bound to the env.

Source code in src/chamber/benchmarks/training_runner.py
def build_safety_snapshot_builder(
    env: gym.Env[Any, Any],
) -> Callable[[gym.Env[Any, Any]], dict[str, AgentSnapshot]] | None:
    """Return the env's ``build_agent_snapshots`` method, or ``None`` (P1.04.5; ADR-007 §Stage 1b).

    The training loop calls the returned callable once per step (when
    the safety filter is wired) to build the per-uid Cartesian snapshot
    map the conformal layer consumes. Envs that don't expose
    ``build_agent_snapshots`` return ``None``; the training loop then
    treats the safety stack as inert (the kwargs validation in
    :func:`concerto.training.ego_aht.train` raises if the operator
    passed a filter without a snapshot builder).

    Args:
        env: A Gymnasium env. The Stage-1b env's
            :meth:`Stage1PickPlaceEnv.build_agent_snapshots` returns
            the per-uid snapshot dict.

    Returns:
        The bound method or ``None``. The returned callable's signature
        is ``(env) -> dict[str, AgentSnapshot]`` for symmetry with the
        other ``build_*`` helpers — implementations may ignore the
        ``env`` arg if they're bound methods on the env itself.
    """
    builder = getattr(env, "build_agent_snapshots", None)
    if not callable(builder):
        return None

    def _adapter(_env: gym.Env[Any, Any]) -> dict[str, AgentSnapshot]:
        del _env  # builder is bound to the env already.
        return cast("dict[str, AgentSnapshot]", builder())

    return _adapter
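Because the adapter wraps a bound method rather than a cached value, each call re-reads the env's current state. A minimal sketch under assumptions: Snapshot (a uid-to-(x, y) dict), CountingEnv, and run_steps are hypothetical stand-ins for AgentSnapshot, the Stage-1b env, and the training loop.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical snapshot type: uid -> (x, y) position.
Snapshot = Dict[str, Tuple[float, float]]


def snapshot_builder_for(env: Any) -> Optional[Callable[[Any], Snapshot]]:
    """Same shape as build_safety_snapshot_builder: adapt a bound method."""
    builder = getattr(env, "build_agent_snapshots", None)
    if not callable(builder):
        return None

    def _adapter(_env: Any) -> Snapshot:
        del _env  # the bound method already knows its env
        return builder()

    return _adapter


class CountingEnv:
    """Hypothetical env whose snapshot reads live state on every call."""

    def __init__(self) -> None:
        self.t = 0

    def build_agent_snapshots(self) -> Snapshot:
        return {"ego": (float(self.t), 0.0), "partner": (0.0, 1.0)}


def run_steps(env: Any, builder: Optional[Callable[[Any], Snapshot]], n: int) -> List[Snapshot]:
    """Toy loop: one snapshot per step, only when a builder is wired."""
    seen: List[Snapshot] = []
    for _ in range(n):
        env.t += 1
        if builder is not None:
            seen.append(builder(env))
    return seen
```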

run_training

run_training(
    cfg: EgoAHTConfig,
    *,
    trainer_factory: TrainerFactory | None = None,
    repo_root: Path | None = None,
    safety_filter: EgoOnlySafetyFilter | None = None,
    safety_state: SafetyState | None = None,
    safety_bounds: Bounds | None = None,
    safety_snapshot_builder: Callable[
        [EnvLike], dict[str, AgentSnapshot]
    ]
    | None = None,
    safety_dt: float | None = None,
    safety_emergency_controllers: Mapping[
        str, EmergencyController
    ]
    | None = None,
) -> TrainingResult

Build env + partner and run the ego-AHT training loop (T4b.11; ADR-002 §Decisions).

Drives :func:concerto.training.ego_aht.train with the chamber-side env + partner instances. The default trainer_factory is :meth:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config (M4b-8a / plan/05 §3.5): the HARL-HAPPO-backed ego-PPO trainer that the empirical-guarantee experiment (M4b-8b / T4b.13) measures. Passing trainer_factory=None (the default) therefore selects this learning trainer; callers that explicitly want :class:~concerto.training.ego_aht.RandomEgoTrainer (test fixtures, smoke runs) must pass it in explicitly via trainer_factory.

Returns a :class:~concerto.training.ego_aht.TrainingResult NamedTuple (P1.04 / ADR-007 §Stage 1b) carrying both the diagnostic :class:~concerto.training.ego_aht.RewardCurve and the trained :class:~concerto.training.ego_aht.EgoTrainer instance. The trainer is exposed alongside the curve so Phase-1 callers (:class:chamber.benchmarks.stage1_common.TrainedPolicyFactory) can wrap result.trainer.act in a per-cell closure without paying a checkpoint round-trip cost. Pre-P1.04 the return type was the bare :class:RewardCurve; the NamedTuple wrapper is backward-compatible at the tuple-unpack level (curve, trainer = run_training(cfg)).

Parameters:

  • cfg (EgoAHTConfig) –

    Validated :class:~concerto.training.config.EgoAHTConfig.

  • trainer_factory (TrainerFactory | None, default: None ) –

    Optional :class:~concerto.training.ego_aht.TrainerFactory. None (default) selects the M4b-8a :class:~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer via its from_config classmethod.

  • repo_root (Path | None, default: None ) –

    Working-tree root for run-metadata provenance.

  • safety_filter (EgoOnlySafetyFilter | None, default: None ) –

    Optional CBF-QP outer filter (P1.04.5; ADR-007 §Stage 1b). All five safety kwargs are threaded through to :func:concerto.training.ego_aht.train. The chamber-side :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory constructs them per cell via :meth:_build_safety_for_cell.

  • safety_state (SafetyState | None, default: None ) –

    Optional :class:SafetyState (conformal slack vector; mutated in place per step).

  • safety_bounds (Bounds | None, default: None ) –

    Optional per-task :class:Bounds.

  • safety_snapshot_builder (Callable[[EnvLike], dict[str, AgentSnapshot]] | None, default: None ) –

    Optional callable building the per-uid :class:AgentSnapshot map per step.

  • safety_dt (float | None, default: None ) –

    Optional control-step dt in seconds for the predictor lookahead.

  • safety_emergency_controllers (Mapping[str, EmergencyController] | None, default: None ) –

    Optional per-uid :class:~concerto.safety.emergency.EmergencyController map (P1.04.6; ADR-007 §Stage 1b Rev 8). Threaded through to :func:concerto.training.ego_aht.train for the in-rollout :func:concerto.safety.braking.maybe_brake call. Built per cell by :class:chamber.benchmarks.stage1_common.TrainedPolicyFactory via :func:build_emergency_controllers.

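Returns:

  • TrainingResult

    :class:~concerto.training.ego_aht.TrainingResult; result.curve is the :class:RewardCurve and result.trainer is the trained :class:EgoTrainer instance.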

Raises:

  • ValueError

    If the env task is unknown.

  • NotImplementedError

    If the env task is stage0_smoke (deferred).

  • KeyError

    If the partner class is not registered.

Source code in src/chamber/benchmarks/training_runner.py
def run_training(
    cfg: EgoAHTConfig,
    *,
    trainer_factory: TrainerFactory | None = None,
    repo_root: Path | None = None,
    # P1.04.5 / ADR-007 §Stage 1b: safety-stack kwargs threaded through
    # to train(). All-None ⇒ pre-P1.04.5 unfiltered behaviour. When
    # cfg.safety.enabled is True the caller must populate all five;
    # train() loud-fails on intent mismatch.
    safety_filter: EgoOnlySafetyFilter | None = None,
    safety_state: SafetyState | None = None,
    safety_bounds: Bounds | None = None,
    safety_snapshot_builder: (Callable[[EnvLike], dict[str, AgentSnapshot]] | None) = None,
    safety_dt: float | None = None,
    # P1.04.6 / ADR-007 §Stage 1b Rev 8: per-uid emergency-controller
    # dispatch for :func:`concerto.safety.braking.maybe_brake`. ``None``
    # falls back to the Cartesian default — correct for double-
    # integrator MPE-stand-in cells but a silent dimension-mismatch
    # foot-gun for the Stage-1b 7-DOF panda uid. Stage-1b callers must
    # populate via :func:`build_emergency_controllers`.
    safety_emergency_controllers: Mapping[str, EmergencyController] | None = None,
) -> TrainingResult:
    """Build env + partner and run the ego-AHT training loop (T4b.11; ADR-002 §Decisions).

    Drives :func:`concerto.training.ego_aht.train` with the chamber-side
    env + partner instances. The default ``trainer_factory`` is
    :meth:`~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer.from_config`
    (M4b-8a / plan/05 §3.5): the HARL-HAPPO-backed ego-PPO trainer that
    the empirical-guarantee experiment (M4b-8b / T4b.13) measures.
    Passing ``trainer_factory=None`` therefore selects this learning
    trainer; callers that explicitly want
    :class:`~concerto.training.ego_aht.RandomEgoTrainer` (test fixtures,
    smoke runs) must construct it and pass it in.

    Returns a :class:`~concerto.training.ego_aht.TrainingResult` NamedTuple
    (P1.04 / ADR-007 §Stage 1b) carrying both the diagnostic
    :class:`~concerto.training.ego_aht.RewardCurve` and the trained
    :class:`~concerto.training.ego_aht.EgoTrainer` instance. The trainer
    is exposed alongside the curve so Phase-1 callers
    (:class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`) can
    wrap ``result.trainer.act`` in a per-cell closure without paying a
    checkpoint round-trip cost. Pre-P1.04 the return type was the bare
    :class:`RewardCurve`; the NamedTuple wrapper is backward-compatible
    at the tuple-unpack level (``curve, trainer = run_training(cfg)``).

    Args:
        cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
        trainer_factory: Optional :class:`~concerto.training.ego_aht.TrainerFactory`.
            ``None`` (default) selects the M4b-8a
            :class:`~chamber.benchmarks.ego_ppo_trainer.EgoPPOTrainer`
            via its ``from_config`` classmethod.
        repo_root: Working-tree root for run-metadata provenance.
        safety_filter: Optional CBF-QP outer filter (P1.04.5; ADR-007
            §Stage 1b). All five safety kwargs are threaded through
            to :func:`concerto.training.ego_aht.train`. The chamber-
            side :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            constructs them per cell via :meth:`_build_safety_for_cell`.
        safety_state: Optional :class:`SafetyState` (conformal slack
            vector; mutated in place per step).
        safety_bounds: Optional per-task :class:`Bounds`.
        safety_snapshot_builder: Optional callable building the per-uid
            :class:`AgentSnapshot` map per step.
        safety_dt: Optional control-step dt in seconds for the
            predictor lookahead.
        safety_emergency_controllers: Optional per-uid
            :class:`~concerto.safety.emergency.EmergencyController`
            map (P1.04.6; ADR-007 §Stage 1b Rev 8). Threaded through to
            :func:`concerto.training.ego_aht.train` for the in-rollout
            :func:`concerto.safety.braking.maybe_brake` call. Built per
            cell by
            :class:`chamber.benchmarks.stage1_common.TrainedPolicyFactory`
            via :func:`build_emergency_controllers`.

    Returns:
        :class:`~concerto.training.ego_aht.TrainingResult` —
        ``result.curve`` is the :class:`RewardCurve`; ``result.trainer``
        is the trained :class:`EgoTrainer` instance.

    Raises:
        ValueError: If the env task is unknown.
        NotImplementedError: If the env task is ``stage0_smoke`` (deferred).
        KeyError: If the partner class is not registered.
    """
    env = build_env(cfg.env, root_seed=cfg.seed)
    partner = build_partner(cfg.partner)
    factory: TrainerFactory = (
        trainer_factory if trainer_factory is not None else EgoPPOTrainer.from_config
    )
    return train(
        cfg,
        env=env,
        partner=partner,
        trainer_factory=factory,
        repo_root=repo_root,
        safety_filter=safety_filter,
        safety_state=safety_state,
        safety_bounds=safety_bounds,
        safety_snapshot_builder=safety_snapshot_builder,
        safety_dt=safety_dt,
        safety_emergency_controllers=safety_emergency_controllers,
    )
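The return contract and the default-factory rule can be sketched with a plain NamedTuple. TrainingResultSketch, pick_factory, and the two toy factories are hypothetical stand-ins for TrainingResult, the inline selection in run_training, EgoPPOTrainer.from_config, and RandomEgoTrainer; the strings they return stand in for trainer instances.

```python
from typing import Any, Callable, List, NamedTuple, Optional


class TrainingResultSketch(NamedTuple):
    """Hypothetical mirror of TrainingResult: (curve, trainer)."""

    curve: List[float]
    trainer: Any


def pick_factory(
    trainer_factory: Optional[Callable[..., Any]],
    default: Callable[..., Any],
) -> Callable[..., Any]:
    """run_training's rule: ``None`` selects the default (learning) trainer."""
    return trainer_factory if trainer_factory is not None else default


def default_factory(**_: Any) -> str:
    return "ppo-trainer"  # stands in for EgoPPOTrainer.from_config


def random_factory(**_: Any) -> str:
    return "random-trainer"  # stands in for a RandomEgoTrainer factory


result = TrainingResultSketch(
    curve=[0.0, 0.4, 0.9], trainer=pick_factory(None, default_factory)()
)
curve, trainer = result  # tuple-unpack form
```

Note that a caller still using the pre-P1.04 form curve = run_training(cfg) now binds the whole NamedTuple rather than the curve, so such call sites should switch to result.curve or the two-name unpack.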

Ego-PPO trainer wrapping HARL's HAPPO actor (M4b-8a; ADR-002 §Decisions).

The :class:EgoPPOTrainer is the concrete :class:~concerto.training.ego_aht.EgoTrainer that drives the Phase-0 ego-AHT empirical-guarantee experiment (T4b.13 / plan/05 §3.5). It satisfies the Protocol with a thin layer over HARL's :class:harl.algorithms.actors.happo.HAPPO actor + a hand-rolled MLP critic + a hand-rolled rollout buffer.

Architecture (plan/05 §3.5; ADR-002 §Decisions):

  • The HARL fork's :class:harl.algorithms.actors.ego_aht_happo.EgoAHTHAPPO is a thin reference subclass for external researchers; its :meth:from_config body is intentionally :class:NotImplementedError.
  • CONCERTO's training run binds :meth:EgoPPOTrainer.from_config instead. The bridge from :class:~concerto.training.config.EgoAHTConfig (project-specific Pydantic) to HARL's 18-key args dict lives here, on the CONCERTO side, where project-specific surfaces belong.
  • Rollout / GAE / advantage normalization / PPO update / critic update all run in this module — not in the fork, not via HARL's :class:OnPolicyHARunner. This satisfies the architecture rule in plan/05 §3.5 ("rollout/update logic on the CONCERTO side") and lets the test_advantage_decomposition.py property test compare against a hand-rolled GAE reference without tautology.

Public surface:

  • :func:compute_gae — module-level GAE recurrence (ADR-002 §Decisions; Schulman et al. 2016 §6.3). Used both by :class:EgoPPOTrainer and by the property test (T4b.13 / plan/05 §5).
  • :func:normalize_advantages — module-level mean/std normalization. Same formula HARL applies inside :meth:HAPPO.train for state_type "EP" (without nan-masking, since the ego is always active in our 2-agent task — documented in the function's docstring).
  • :class:EgoPPOTrainer — the trainer.

ADR-002 risk-mitigation #1: the empirical-guarantee assertion in T4b.13 / plan/05 §6 criterion 4 is a trip-wire for the framework choice. Do not change the GAE / normalization / PPO update math without re-running the property test and re-validating the reference implementation against the canonical PPO paper formulation.
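The GAE recurrence and the terminated/truncated split kept in the trainer's rollout buffer can be sketched in NumPy. gae_sketch and normalize_sketch are illustrative, not compute_gae or normalize_advantages; their signatures are hypothetical and the 1e-5 epsilon is an assumption. The sketch uses the standard form delta_t = r_t + gamma * V(s_{t+1}) * (1 - terminated_t) - V(s_t) and A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}, where a truncated step bootstraps from the critic's value of the final observation instead of zero.

```python
from typing import Tuple

import numpy as np


def gae_sketch(
    rewards: np.ndarray,
    values: np.ndarray,
    terminated: np.ndarray,
    truncated: np.ndarray,
    truncation_bootstraps: np.ndarray,
    last_value: float,
    gamma: float,
    lam: float,
) -> Tuple[np.ndarray, np.ndarray]:
    """Backward GAE pass over one rollout; returns (advantages, returns)."""
    n = len(rewards)
    advantages = np.zeros(n, dtype=np.float64)
    running = 0.0
    for t in reversed(range(n)):
        if truncated[t]:
            # Time limit: bootstrap from V(final obs of the cut episode).
            next_value = truncation_bootstraps[t]
        elif t + 1 < n:
            next_value = values[t + 1]
        else:
            # Rollout boundary mid-episode: bootstrap from the cached next obs.
            next_value = last_value
        not_terminal = 0.0 if terminated[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_terminal - values[t]
        # Either kind of episode end stops the trace leaking across episodes.
        continues = 0.0 if (terminated[t] or truncated[t]) else 1.0
        running = delta + gamma * lam * continues * running
        advantages[t] = running
    return advantages, advantages + values


def normalize_sketch(advantages: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Mean/std normalization over the whole batch (no masking)."""
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```

As a worked case with gamma = 0.5, zero values, unit rewards, and a truncated first step whose bootstrap is 10, the first advantage is 1 + 0.5 * 10 = 6; treating the same step as terminal would give 1.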

EgoPPOTrainer

Ego-PPO trainer wrapping HARL's HAPPO actor (M4b-8a; ADR-002 §Decisions).

Implements the :class:~concerto.training.ego_aht.EgoTrainer Protocol over a HARL :class:HAPPO actor + a hand-rolled :class:_EgoCritic. The training loop in :func:concerto.training.ego_aht.train calls :meth:act per step, :meth:observe per step (with the next obs + reward + done), and :meth:update at every rollout_length boundary. The trainer's internal rollout buffer accumulates (obs, action, log_prob, value, reward, done) tuples between update calls; :meth:update then runs GAE + normalization + a HARL PPO update + a hand-rolled critic update, then clears the buffer.

Determinism caveat (P6 / ADR-002): HARL's :class:DiagGaussian samples actions via :meth:torch.distributions.Normal.sample, which uses torch's global RNG. The trainer seeds that global RNG once at construction time via :func:concerto.training.seeding.derive_substream; this is enough for byte-identical CPU runs at the same seed but means concurrent trainers in the same process would alias each other's draws. Phase-0 runs are single-trainer per process, so this is acceptable.

See plan/05 §3.5 for the architecture rationale (rollout/update logic on the CONCERTO side, not in the fork) and plan/05 §5 for the empirical-guarantee verification contract (T4b.13).
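The act/observe/update lifecycle described above can be sketched as a pure bookkeeping object. RolloutBufferSketch, PendingStep, and the standard-normal placeholder policy are hypothetical; EgoPPOTrainer's real buffer also stores critic values and truncation bootstraps and hands the rows to GAE and a HARL PPO update, none of which happens here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class PendingStep:
    """What act() caches for observe() to complete (hypothetical layout)."""

    obs: np.ndarray
    action: np.ndarray
    log_prob: float
    value: float


Row = Tuple[np.ndarray, np.ndarray, float, float, float, bool, bool]


class RolloutBufferSketch:
    """Hypothetical act/observe/update bookkeeping with no learning inside."""

    def __init__(self, act_dim: int, rollout_length: int, seed: int = 0) -> None:
        self.act_dim = act_dim
        self.rollout_length = rollout_length
        self.rng = np.random.default_rng(seed)
        self.rows: List[Row] = []
        self.pending: Optional[PendingStep] = None
        self.last_next_obs: Optional[np.ndarray] = None
        self.n_updates = 0

    def act(self, obs: np.ndarray) -> np.ndarray:
        # Placeholder policy: a ~ N(0, I), with that Gaussian's log-density.
        action = self.rng.standard_normal(self.act_dim)
        log_prob = float(-0.5 * (action @ action) - 0.5 * self.act_dim * np.log(2 * np.pi))
        self.pending = PendingStep(obs=obs, action=action, log_prob=log_prob, value=0.0)
        return action

    def observe(self, next_obs: np.ndarray, reward: float, terminated: bool, truncated: bool) -> None:
        if self.pending is None:
            raise RuntimeError("observe() called without a preceding act()")
        p = self.pending
        self.rows.append((p.obs, p.action, p.log_prob, p.value, reward, terminated, truncated))
        self.pending = None
        self.last_next_obs = next_obs  # kept for the bootstrap value at update time

    def update(self) -> int:
        # A real trainer would run GAE, normalization, and PPO epochs here.
        n_rows = len(self.rows)
        self.rows.clear()
        self.n_updates += 1
        return n_rows


buf = RolloutBufferSketch(act_dim=2, rollout_length=4)
obs = np.zeros(3)
batch_sizes = []
for step in range(1, 9):
    buf.act(obs)
    next_obs = obs + 1.0
    buf.observe(next_obs, reward=1.0, terminated=False, truncated=False)
    obs = next_obs
    if step % buf.rollout_length == 0:
        batch_sizes.append(buf.update())
```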

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
class EgoPPOTrainer:
    """Ego-PPO trainer wrapping HARL's HAPPO actor (M4b-8a; ADR-002 §Decisions).

    Implements the :class:`~concerto.training.ego_aht.EgoTrainer` Protocol
    over a HARL :class:`HAPPO` actor + a hand-rolled :class:`_EgoCritic`.
    The training loop in :func:`concerto.training.ego_aht.train` calls
    :meth:`act` per step, :meth:`observe` per step (with the *next* obs +
    reward + done), and :meth:`update` at every ``rollout_length``
    boundary. The trainer's internal rollout buffer accumulates
    ``(obs, action, log_prob, value, reward, done)`` tuples between
    update calls; :meth:`update` then runs GAE + normalization + a HARL
    PPO update + a hand-rolled critic update, then clears the buffer.

    Determinism caveat (P6 / ADR-002): HARL's :class:`DiagGaussian` samples
    actions via :meth:`torch.distributions.Normal.sample`, which uses
    torch's *global* RNG. The trainer seeds that global RNG once at
    construction time via
    :func:`concerto.training.seeding.derive_substream`; this is enough
    for byte-identical CPU runs at the same seed but means concurrent
    trainers in the same process would alias each other's draws.
    Phase-0 runs are single-trainer per process, so this is acceptable.

    See plan/05 §3.5 for the architecture rationale (rollout/update logic
    on the CONCERTO side, not in the fork) and plan/05 §5 for the
    empirical-guarantee verification contract (T4b.13).
    """

    #: Class-level marker for the property test ``test_advantage_decomposition.py``
    #: to assert this trainer routes through the module-level
    #: :func:`compute_gae` + :func:`normalize_advantages` (rather than a
    #: shadow re-implementation). Keeps the test honest if a refactor
    #: ever inlines the GAE math into :meth:`update`.
    USES_MODULE_GAE: ClassVar[bool] = True

    def __init__(
        self,
        *,
        cfg: EgoAHTConfig,
        ego_uid: str,
        ego_obs_space: gym.spaces.Box,  # type: ignore[type-arg]
        ego_act_space: gym.spaces.Box,  # type: ignore[type-arg]
        partner: PartnerLike,
        device: torch.device | None = None,
    ) -> None:
        """Build the trainer (M4b-8a; ADR-002 §Decisions; plan/05 §3.5).

        Prefer :meth:`from_config` over direct construction — it derives
        ``ego_obs_space`` and ``ego_act_space`` from the env so the call
        site cannot drift from the env's actual shapes.

        The partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3)
        runs first, before any tensor allocation: a partner with any
        ``requires_grad=True`` parameter (and no
        :class:`~chamber.partners.interface.PartnerBase` shield blocking
        ``named_parameters`` access) aborts the construction with a
        :class:`ValueError` that names the offending parameter path.

        Args:
            cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
                The trainer reads ``cfg.seed``, ``cfg.happo.*``.
            ego_uid: The env-side uid the ego acts on (matches
                ``cfg.env.agent_uids[0]``).
            ego_obs_space: The Box space for the ego's flat state vector.
                Synthesized by :meth:`from_config` from the env's nested
                observation_space.
            ego_act_space: The Box space for the ego's action vector.
                Read directly from ``env.action_space[ego_uid]``.
            partner: The frozen partner instance the ego will be trained
                against. Validated via :func:`_assert_partner_is_frozen`
                before any tensor allocation. The trainer does not retain
                a reference — the partner is consumed by the training
                loop in :func:`concerto.training.ego_aht.train`, not by
                the trainer itself.
            device: Torch device. Defaults to CPU when called directly;
                :meth:`from_config` resolves it from
                :attr:`RuntimeConfig.device` (auto / cpu / cuda / mps)
                via :func:`_resolve_device` and applies the determinism
                flag from :attr:`RuntimeConfig.deterministic_torch` on
                CPU before constructing the trainer (ADR-002 §Decisions;
                plan/08 §4).

        Raises:
            ValueError: If ``partner`` exposes a torch parameter with
                ``requires_grad=True`` (ADR-009 §Consequences).
        """
        # ADR-009 §Consequences + plan/05 §6 #3: partner-freeze gate is
        # the first construction step, before any tensor allocation.
        _assert_partner_is_frozen(partner)
        self._device = device or torch.device("cpu")
        self._ego_uid = ego_uid
        self._gamma = cfg.happo.gamma
        self._gae_lambda = cfg.happo.gae_lambda
        self._n_epochs = cfg.happo.n_epochs
        self._batch_size = cfg.happo.batch_size
        self._rollout_length = cfg.happo.rollout_length
        # Seed torch's global RNG. ``derive_substream`` is the project-wide
        # P6 reproducibility entry-point; we cast its 64-bit draw down to
        # int32 because ``torch.manual_seed`` documents 64-bit input but
        # has historically failed silently on values > 2**63 on macOS.
        torch_seed_int = int(
            derive_substream(_TORCH_SUBSTREAM, root_seed=cfg.seed)
            .default_rng()
            .integers(0, 2**31 - 1)
        )
        torch.manual_seed(torch_seed_int)
        self._minibatch_rng: np.random.Generator = derive_substream(
            _MINIBATCH_SUBSTREAM, root_seed=cfg.seed
        ).default_rng()

        harl_args = _build_harl_args(happo=cfg.happo)
        self._harl_args = harl_args
        self._happo: HAPPO = HAPPO(harl_args, ego_obs_space, ego_act_space, self._device)
        obs_dim = int(ego_obs_space.shape[0])
        self._obs_dim = obs_dim
        self._act_dim = int(ego_act_space.shape[0])
        self._critic = _EgoCritic(obs_dim=obs_dim, hidden_dim=cfg.happo.hidden_dim).to(self._device)
        self._critic_optim = torch.optim.Adam(self._critic.parameters(), lr=cfg.happo.lr)

        # Rollout buffer (cleared after each update).
        self._buf_obs: list[NDArray[np.float32]] = []
        self._buf_actions: list[NDArray[np.float32]] = []
        self._buf_log_probs: list[NDArray[np.float32]] = []
        self._buf_values: list[float] = []
        self._buf_rewards: list[float] = []
        # ADR-002 §Decisions + Pardo 2017 / issue #62: terminated and
        # truncated are stored *separately* so :meth:`update` can build a
        # per-step bootstrap target. Treating truncation as termination
        # zeros V(s_next) at every time-limit boundary, biasing advantages
        # and slowing learning 2-3x on time-limit-only envs.
        self._buf_terminated: list[bool] = []
        self._buf_truncated: list[bool] = []
        # V(s_truncated_final) computed via the critic at observe time
        # when a truncation occurs. Indexed parallel to the buffer; unused
        # at non-truncation steps but populated with 0.0 so the array
        # shapes stay aligned.
        self._buf_truncation_bootstraps: list[float] = []
        # Pre-step cache (filled by :meth:`act`, consumed by :meth:`observe`).
        self._pending: _PendingStep | None = None
        # Last next-obs (cached by :meth:`observe`) for the critic
        # bootstrap at update time.
        self._last_next_obs: NDArray[np.float32] | None = None

    @classmethod
    def from_config(
        cls,
        cfg: EgoAHTConfig,
        *,
        env: EnvLike,
        partner: PartnerLike,
        ego_uid: str,
    ) -> EgoPPOTrainer:
        """Build from :class:`EgoAHTConfig` + a concrete env + the frozen partner.

        ADR-002 §Decisions / ADR-009 §Consequences / plan/05 §3.5 + §6 #3.

        This is the :class:`~concerto.training.ego_aht.TrainerFactory`
        callable that :func:`concerto.training.ego_aht.train` plugs in via
        dependency injection. The factory reads the ego's observation +
        action shapes off the env directly so the trainer's network
        widths cannot drift from what the env actually emits.

        The partner-freeze gate (:func:`_assert_partner_is_frozen`) runs
        first, before any tensor allocation, so a bad partner aborts
        cheaply (plan/05 §6 #3).

        Per plan/05 §3.5 the bridge from :class:`EgoAHTConfig` to HARL's
        args dict lives in this CONCERTO-side module rather than in the
        fork's :meth:`EgoAHTHAPPO.from_config` (which intentionally
        stays :class:`NotImplementedError` — see
        ``scripts/harl-fork-patches/v0.1.0-aht/`` README for the
        role-split rationale).

        Args:
            cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
            env: Concrete env satisfying :class:`EnvLike`. Must expose
                ``observation_space["agent"][ego_uid]["state"]`` as a
                :class:`gym.spaces.Box` and ``action_space[ego_uid]``
                as a :class:`gym.spaces.Box`.
            partner: Frozen partner instance the ego will train against.
                Validated by :func:`_assert_partner_is_frozen` as the
                first construction step (ADR-009 §Consequences).
            ego_uid: The env-side uid the ego acts on.

        Returns:
            A constructed :class:`EgoPPOTrainer` ready for
            :func:`concerto.training.ego_aht.train`.

        Raises:
            ValueError: If ``partner`` exposes a torch parameter with
                ``requires_grad=True`` (ADR-009 §Consequences).
            TypeError: If the env's ego observation or action space is
                not a :class:`gym.spaces.Box`.
        """
        # ADR-009 §Consequences / plan/05 §6 #3: partner-freeze gate
        # runs first, before any env-space introspection or tensor
        # allocation. A bad partner aborts the construction cheaply.
        _assert_partner_is_frozen(partner)
        env_obs_space = env.observation_space  # type: ignore[attr-defined]
        ego_state_space = env_obs_space["agent"][ego_uid]["state"]
        if not isinstance(ego_state_space, gym.spaces.Box):
            raise TypeError(
                "EgoPPOTrainer.from_config requires "
                f"env.observation_space['agent'][{ego_uid!r}]['state'] to be "
                f"a gym.spaces.Box; got {type(ego_state_space).__name__}."
            )
        env_act_space = env.action_space  # type: ignore[attr-defined]
        ego_act_space = env_act_space[ego_uid]
        if not isinstance(ego_act_space, gym.spaces.Box):
            raise TypeError(
                "EgoPPOTrainer.from_config requires "
                f"env.action_space[{ego_uid!r}] to be a gym.spaces.Box; "
                f"got {type(ego_act_space).__name__}."
            )
        device = _resolve_device(cfg.runtime.device)
        if device.type == "cpu" and cfg.runtime.deterministic_torch:
            # CPU determinism is the project's P6 contract (plan/08 §4).
            # Skip on CUDA/MPS where the contract is relaxed because
            # cuDNN / Metal kernels are not bit-deterministic even with
            # this flag.
            _set_torch_determinism(True)
        return cls(
            cfg=cfg,
            ego_uid=ego_uid,
            ego_obs_space=ego_state_space,
            ego_act_space=ego_act_space,
            partner=partner,
            device=device,
        )

    def act(
        self,
        obs: Mapping[str, Any],
        *,
        deterministic: bool = False,
    ) -> NDArray[np.floating]:
        """Sample the ego action; cache log-prob + value for :meth:`observe` (ADR-002 §Decisions).

        Caches ``(obs, action, log_prob, value)`` in private state so the
        next call to :meth:`observe` can insert them into the rollout
        buffer with the actual reward + done flag. The pairing relies on
        the contract in :func:`concerto.training.ego_aht.train` (each
        ``act(obs)`` is followed by exactly one
        ``observe(obs_next, reward, done)`` before the next ``act``).

        Args:
            obs: Env obs dict. The ego's flat state vector is read from
                ``obs["agent"][ego_uid]["state"]``.
            deterministic: When ``True``, returns the distribution mode
                instead of a sample. Used by evaluation paths
                (M4b-8b's per-checkpoint deterministic eval).

        Returns:
            The ego action as a length-``act_dim`` float32 array, ready
            to be packed into the env's dict-action.
        """
        flat = _flat_ego_obs(obs, self._ego_uid)
        # HARL expects shape ``(batch, obs_dim)`` and a positional
        # ``(rnn_states, masks)`` even when the policy is feedforward.
        # rnn_states is ignored downstream (use_recurrent_policy=False);
        # masks=1.0 means "do not reset the RNN state" — also a no-op
        # without an RNN. We pass shape-correct placeholders so HARL's
        # internal ``check`` calls accept them.
        obs_t = flat.reshape(1, -1)
        rnn_states = np.zeros(
            (1, self._harl_args["recurrent_n"], self._harl_args["hidden_sizes"][-1]),
            dtype=np.float32,
        )
        masks = np.ones((1, 1), dtype=np.float32)
        with torch.no_grad():
            action_t, log_prob_t, _ = self._happo.get_actions(
                obs_t, rnn_states, masks, None, deterministic
            )
            value_t = self._critic(torch.from_numpy(flat).to(self._device).unsqueeze(0))
        action = action_t.detach().cpu().numpy().squeeze(0).astype(np.float32)
        log_prob = log_prob_t.detach().cpu().numpy().squeeze(0).astype(np.float32)
        value = float(value_t.detach().cpu().numpy().squeeze())
        self._pending = _PendingStep(obs=flat, action=action, log_prob=log_prob, value=value)
        return action

    def observe(
        self,
        obs: Mapping[str, Any],
        reward: float,
        done: bool,
        *,
        truncated: bool = False,
    ) -> None:
        """Buffer ``(pending, reward, terminated, truncated)`` (ADR-002 §Decisions; Pardo 2017).

        The trainer keeps a single ``(obs, action, log_prob, value)``
        cache — set by :meth:`act` — and pairs it with the
        ``(reward, done, truncated)`` from :meth:`observe`. The post-step
        ``obs`` is also remembered as the bootstrap target for
        :meth:`update`.

        ``done`` keeps the historical "this step ended the episode" meaning
        (terminated OR truncated). ``truncated`` is the new keyword-only
        flag that distinguishes time-limit truncation from true
        termination. The trainer derives
        ``terminated = done and not truncated`` internally; the GAE
        bootstrap then treats truncation correctly per Pardo et al. 2017
        (V(s_truncated_final) flows through, not 0) — see :func:`compute_gae`
        + project issue #62 for the root-cause writeup.

        At truncation boundaries this method calls the critic on the
        post-step obs once to capture V(s_truncated_final); that value
        is later threaded into :func:`compute_gae` as the per-step
        bootstrap target. At non-truncation steps the recorded value is
        ``0.0`` and unused.

        Args:
            obs: Post-step env obs (the obs the env returned after the
                action passed to the most recent :meth:`act` call).
            reward: Per-step ego reward (the env's shared scalar).
            done: ``True`` if the env terminated *or* truncated this step
                (i.e. the loop should reset the env for the next step).
            truncated: ``True`` iff this step ended the episode via
                time-limit truncation (env returned ``truncated=True``).
                Default ``False`` is the conservative legacy behavior —
                callers that don't yet distinguish truncation from
                termination see the historical "treat as terminal"
                semantics.
        """
        pending = self._pending
        if pending is None:
            # No prior :meth:`act` — defensive no-op so the trainer can
            # safely ignore a stray ``observe`` (e.g. a pre-rollout
            # bookkeeping call).
            return
        terminated = bool(done) and not bool(truncated)
        truncated_b = bool(truncated)
        self._buf_obs.append(pending.obs)
        self._buf_actions.append(pending.action)
        self._buf_log_probs.append(pending.log_prob)
        self._buf_values.append(pending.value)
        self._buf_rewards.append(float(reward))
        self._buf_terminated.append(terminated)
        self._buf_truncated.append(truncated_b)
        next_flat = _flat_ego_obs(obs, self._ego_uid)
        # Capture V(s_truncated_final) eagerly at the boundary. Doing it
        # here (rather than at update time) is the cleanest place because
        # ``next_flat`` is the actual post-step obs of the truncated
        # episode; by update time the training loop has already reset
        # the env, and the per-step "next obs in buffer" at index
        # ``t + 1`` is from the *new* episode. The cost is one extra
        # critic forward pass per truncation boundary.
        if truncated_b and not terminated:
            with torch.no_grad():
                v_trunc = float(
                    self._critic(torch.from_numpy(next_flat).to(self._device).unsqueeze(0))
                    .squeeze()
                    .item()
                )
            self._buf_truncation_bootstraps.append(v_trunc)
        else:
            self._buf_truncation_bootstraps.append(0.0)
        self._last_next_obs = next_flat
        # Clear the pre-step cache so a stray double-observe cannot
        # double-insert the same tuple.
        self._pending = None

    def _build_next_values(
        self,
        values: NDArray[np.float32],
        terminated: NDArray[np.bool_],
        truncated: NDArray[np.bool_],
        truncation_bootstraps: NDArray[np.float32],
    ) -> NDArray[np.float32]:
        """Per-step ``V(s_{t+1})`` bootstrap target for GAE (Pardo 2017 §4; issue #62).

        Mid-rollout non-boundary steps use ``values[t + 1]``. At
        terminations the bootstrap is ``0`` (true terminal: no value
        after). At truncations it is ``V(s_truncated_final)`` captured
        at observe time. The rollout-tail step (``t = n-1``) follows the
        same three-way rule, with the live critic call against the
        remembered next-obs as the non-boundary tail bootstrap.
        """
        n_steps = values.shape[0]
        next_values = np.zeros(n_steps, dtype=np.float32)
        for t in range(n_steps - 1):
            if terminated[t]:
                next_values[t] = 0.0
            elif truncated[t]:
                next_values[t] = truncation_bootstraps[t]
            else:
                next_values[t] = values[t + 1]
        last_t = n_steps - 1
        if terminated[last_t]:
            next_values[last_t] = 0.0
        elif truncated[last_t]:
            next_values[last_t] = truncation_bootstraps[last_t]
        elif self._last_next_obs is None:
            # Defensive: no remembered next-obs (rollout never observed
            # anything past the last step). Falls back to zero — safe
            # because the rollout will be cleared and the next one starts
            # fresh.
            next_values[last_t] = 0.0
        else:
            with torch.no_grad():
                next_values[last_t] = float(
                    self._critic(
                        torch.from_numpy(self._last_next_obs).to(self._device).unsqueeze(0)
                    )
                    .squeeze()
                    .item()
                )
        return next_values

    def update(self) -> None:
        """Compute GAE; run PPO + critic updates; clear the buffer (ADR-002 §Decisions; Pardo 2017).

        Order of operations (Schulman 2017, lifted to ego-only / frozen-
        partner per plan/05 §3.5):

        1. Build the per-step ``next_values`` bootstrap array (Pardo 2017
           §4): ``V(s_truncated_final_t)`` at truncation boundaries (cached
           at observe time), ``0.0`` at terminations, ``values[t + 1]``
           otherwise; rollout-tail bootstrap at ``t = T - 1`` is the same
           rule applied to ``self._last_next_obs``.
        2. :func:`compute_gae` → raw advantages ``A_t``.
        3. ``returns_t = A_t + V(s_t)`` (the critic regression target).
        4. :func:`normalize_advantages` → normalized ``A_t`` for the
           PPO importance-weighted surrogate loss.
        5. For each of ``n_epochs`` epochs, shuffle indices and walk
           ``max(1, n_steps // batch_size)`` minibatches (the last one
           absorbs any remainder). For each minibatch, build
           the 9-tuple :meth:`HAPPO.update` expects and call it directly
           (skipping :meth:`HAPPO.train`'s buffer-machinery wrapper).
        6. Train the critic on the same minibatch with MSE on returns.
        7. Clear the buffer.

        The ``factor_batch`` argument :meth:`HAPPO.update` requires for
        HAPPO's sequential multi-agent update is set to all ``1.0`` — the
        ego is the only updating agent (ADR-009 §Decision: black-box AHT,
        partner is frozen), so previous-agent importance weights are
        identically 1.
        """
        if not self._buf_rewards:
            return

        rewards = np.asarray(self._buf_rewards, dtype=np.float32)
        values = np.asarray(self._buf_values, dtype=np.float32)
        terminated = np.asarray(self._buf_terminated, dtype=np.bool_)
        truncated = np.asarray(self._buf_truncated, dtype=np.bool_)
        truncation_bootstraps = np.asarray(self._buf_truncation_bootstraps, dtype=np.float32)
        next_values = self._build_next_values(values, terminated, truncated, truncation_bootstraps)
        episode_boundaries = terminated | truncated
        raw_advantages = compute_gae(
            rewards,
            values,
            next_values,
            episode_boundaries,
            gamma=self._gamma,
            gae_lambda=self._gae_lambda,
        )
        returns = (raw_advantages + values).astype(np.float32)
        norm_advantages = normalize_advantages(raw_advantages)

        obs_arr = np.stack(self._buf_obs).astype(np.float32)
        actions_arr = np.stack(self._buf_actions).astype(np.float32)
        log_probs_arr = np.stack(self._buf_log_probs).astype(np.float32)

        n_steps = rewards.shape[0]
        n_minibatches = max(1, n_steps // self._batch_size)
        mb_size = max(1, n_steps // n_minibatches)

        for _epoch in range(self._n_epochs):
            perm = self._minibatch_rng.permutation(n_steps)
            for mb in range(n_minibatches):
                start = mb * mb_size
                end = start + mb_size if mb < n_minibatches - 1 else n_steps
                mb_idx = perm[start:end]
                mb_obs = obs_arr[mb_idx]
                mb_actions = actions_arr[mb_idx]
                mb_old_log_probs = log_probs_arr[mb_idx]
                # HAPPO expects ``adv_targ`` shape ``(mb, 1)``. log-probs
                # for diag-Gaussian are already per-action-dim; HAPPO's
                # action_aggregation="prod" reduces them inside update().
                mb_advantages = norm_advantages[mb_idx].reshape(-1, 1).astype(np.float32)
                this_mb_size = mb_idx.shape[0]
                mb_rnn_states = np.zeros(
                    (
                        this_mb_size,
                        self._harl_args["recurrent_n"],
                        self._harl_args["hidden_sizes"][-1],
                    ),
                    dtype=np.float32,
                )
                mb_masks = np.ones((this_mb_size, 1), dtype=np.float32)
                mb_active_masks = np.ones((this_mb_size, 1), dtype=np.float32)
                # ``factor_batch`` = 1 because the ego is the only updating
                # agent (ADR-009 §Decision); see plan/05 §3.5.
                mb_factor = np.ones((this_mb_size, 1), dtype=np.float32)
                sample = (
                    mb_obs,
                    mb_rnn_states,
                    mb_actions,
                    mb_masks,
                    mb_active_masks,
                    mb_old_log_probs,
                    mb_advantages,
                    None,
                    mb_factor,
                )
                self._happo.update(sample)

                # Critic update: simple MSE regression on returns.
                mb_returns = torch.from_numpy(returns[mb_idx]).to(self._device).float()
                mb_obs_t = torch.from_numpy(mb_obs).to(self._device).float()
                value_pred = self._critic(mb_obs_t).squeeze(-1)
                critic_loss = ((value_pred - mb_returns) ** 2).mean()
                self._critic_optim.zero_grad()
                critic_loss.backward()
                self._critic_optim.step()

        # Reset rollout buffer.
        self._buf_obs.clear()
        self._buf_actions.clear()
        self._buf_log_probs.clear()
        self._buf_values.clear()
        self._buf_rewards.clear()
        self._buf_terminated.clear()
        self._buf_truncated.clear()
        self._buf_truncation_bootstraps.clear()
        self._last_next_obs = None

    def state_dict(self) -> dict[str, Any]:
        """Return a flat dict with actor + critic + optimizer states (ADR-002 §Decisions; T4b.12).

        The returned dict is what :func:`concerto.training.checkpoints.save_checkpoint`
        ``torch.save``s to the ``.pt`` artefact. Loading is the dual:
        each component reads its own sub-key off the loaded dict via
        :meth:`load_state_dict`.
        """
        return {
            "actor": self._happo.actor.state_dict(),
            "actor_optim": self._happo.actor_optimizer.state_dict(),
            "critic": self._critic.state_dict(),
            "critic_optim": self._critic_optim.state_dict(),
        }
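The bootstrap rule that `observe`, `_build_next_values` and `update` implement together can be sketched in plain NumPy. This is a hypothetical re-implementation for illustration only: `build_next_values` and `gae` are not the project's `compute_gae`. The backward recursion is the standard Schulman et al. GAE form, and it is an assumption (not confirmed by this listing) that `compute_gae` cuts the lambda-trace at `episode_boundaries` in the same way.

```python
import numpy as np


def build_next_values(values, terminated, truncated, truncation_bootstraps, tail_value):
    """Per-step V(s_{t+1}): 0 at terminations, cached V(s_final) at truncations (sketch)."""
    n = values.shape[0]
    next_values = np.empty(n, dtype=np.float32)
    for t in range(n):
        if terminated[t]:
            next_values[t] = 0.0  # true terminal state: no value after it
        elif truncated[t]:
            next_values[t] = truncation_bootstraps[t]  # time limit: bootstrap V(s_final)
        elif t + 1 < n:
            next_values[t] = values[t + 1]  # mid-episode: next buffered value
        else:
            next_values[t] = tail_value  # rollout tail: critic on the remembered next obs
    return next_values


def gae(rewards, values, next_values, boundaries, gamma=0.99, lam=0.95):
    """Backward GAE recursion; the lambda-trace is reset at every episode boundary."""
    n = rewards.shape[0]
    adv = np.zeros(n, dtype=np.float32)
    running = 0.0  # holds A_{t+1} at the start of iteration t
    for t in reversed(range(n)):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        if boundaries[t]:
            running = 0.0  # do not leak advantage from the next episode into this one
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

With `gamma=0.5`, `lam=1.0`, zero rewards and values, and a time limit at step 0 whose cached `V(s_final)` is 2.0, the sketch gives `A_0 = 1.0`; treating the same step as terminal zeroes the bootstrap and gives `A_0 = 0.0`, which is the bias the buffer comments attribute to issue #62. As in `update`, the critic target would then be `returns = adv + values`.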

__init__

__init__(
    *,
    cfg: EgoAHTConfig,
    ego_uid: str,
    ego_obs_space: Box,
    ego_act_space: Box,
    partner: PartnerLike,
    device: device | None = None,
) -> None

Build the trainer (M4b-8a; ADR-002 §Decisions; plan/05 §3.5).

Prefer :meth:from_config over direct construction — it derives ego_obs_space and ego_act_space from the env so the call site cannot drift from the env's actual shapes.

The partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3) runs first, before any tensor allocation: a partner with any requires_grad=True parameter (and no :class:~chamber.partners.interface.PartnerBase shield blocking named_parameters access) aborts the construction with a :class:ValueError that names the offending parameter path.

Parameters:

  • cfg (EgoAHTConfig) –

    Validated :class:~concerto.training.config.EgoAHTConfig. The trainer reads cfg.seed, cfg.happo.*.

  • ego_uid (str) –

    The env-side uid the ego acts on (matches cfg.env.agent_uids[0]).

  • ego_obs_space (Box) –

    The Box space for the ego's flat state vector. Synthesized by :meth:from_config from the env's nested observation_space.

  • ego_act_space (Box) –

    The Box space for the ego's action vector. Read directly from env.action_space[ego_uid].

  • partner (PartnerLike) –

    The frozen partner instance the ego will be trained against. Validated via :func:_assert_partner_is_frozen before any tensor allocation. The trainer does not retain a reference — the partner is consumed by the training loop in :func:concerto.training.ego_aht.train, not by the trainer itself.

  • device (device | None, default: None ) –

    Torch device. Defaults to CPU when called directly; :meth:from_config resolves it from :attr:RuntimeConfig.device (auto / cpu / cuda / mps) via :func:_resolve_device and applies the determinism flag from :attr:RuntimeConfig.deterministic_torch on CPU before constructing the trainer (ADR-002 §Decisions; plan/08 §4).

Raises:

  • ValueError

    If partner exposes a torch parameter with requires_grad=True (ADR-009 §Consequences).
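The `device` parameter notes that `from_config` resolves `RuntimeConfig.device` (auto / cpu / cuda / mps) and enables deterministic torch only on CPU. The helpers below are hypothetical; the `auto` preference order (CUDA, then MPS, then CPU) and the errors for unavailable backends are assumptions, not the documented behaviour of `_resolve_device`. In a real caller the availability flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
_KNOWN_DEVICES = ("auto", "cpu", "cuda", "mps")


def resolve_device_name(requested: str, *, cuda_available: bool, mps_available: bool) -> str:
    """Map a RuntimeConfig-style device string to a concrete device name (hypothetical)."""
    name = requested.lower()
    if name not in _KNOWN_DEVICES:
        raise ValueError(f"unknown device {requested!r}; expected one of {_KNOWN_DEVICES}")
    if name == "auto":
        # Assumed preference order: CUDA, then Apple MPS, then CPU.
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    if name == "cuda" and not cuda_available:
        raise RuntimeError("device='cuda' requested but CUDA is not available")
    if name == "mps" and not mps_available:
        raise RuntimeError("device='mps' requested but MPS is not available")
    return name


def should_enable_determinism(device_name: str, deterministic_torch: bool) -> bool:
    """Mirror the from_config rule: only force deterministic kernels on CPU."""
    return device_name == "cpu" and deterministic_torch
```

Keeping the availability probes outside the resolver makes the rule testable on machines without a GPU.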

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def __init__(
    self,
    *,
    cfg: EgoAHTConfig,
    ego_uid: str,
    ego_obs_space: gym.spaces.Box,  # type: ignore[type-arg]
    ego_act_space: gym.spaces.Box,  # type: ignore[type-arg]
    partner: PartnerLike,
    device: torch.device | None = None,
) -> None:
    """Build the trainer (M4b-8a; ADR-002 §Decisions; plan/05 §3.5).

    Prefer :meth:`from_config` over direct construction — it derives
    ``ego_obs_space`` and ``ego_act_space`` from the env so the call
    site cannot drift from the env's actual shapes.

    The partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3)
    runs first, before any tensor allocation: a partner with any
    ``requires_grad=True`` parameter (and no
    :class:`~chamber.partners.interface.PartnerBase` shield blocking
    ``named_parameters`` access) aborts the construction with a
    :class:`ValueError` that names the offending parameter path.

    Args:
        cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
            The trainer reads ``cfg.seed``, ``cfg.happo.*``.
        ego_uid: The env-side uid the ego acts on (matches
            ``cfg.env.agent_uids[0]``).
        ego_obs_space: The Box space for the ego's flat state vector.
            Synthesized by :meth:`from_config` from the env's nested
            observation_space.
        ego_act_space: The Box space for the ego's action vector.
            Read directly from ``env.action_space[ego_uid]``.
        partner: The frozen partner instance the ego will be trained
            against. Validated via :func:`_assert_partner_is_frozen`
            before any tensor allocation. The trainer does not retain
            a reference — the partner is consumed by the training
            loop in :func:`concerto.training.ego_aht.train`, not by
            the trainer itself.
        device: Torch device. Defaults to CPU when called directly;
            :meth:`from_config` resolves it from
            :attr:`RuntimeConfig.device` (auto / cpu / cuda / mps)
            via :func:`_resolve_device` and applies the determinism
            flag from :attr:`RuntimeConfig.deterministic_torch` on
            CPU before constructing the trainer (ADR-002 §Decisions;
            plan/08 §4).

    Raises:
        ValueError: If ``partner`` exposes a torch parameter with
            ``requires_grad=True`` (ADR-009 §Consequences).
    """
    # ADR-009 §Consequences + plan/05 §6 #3: partner-freeze gate is
    # the first construction step, before any tensor allocation.
    _assert_partner_is_frozen(partner)
    self._device = device or torch.device("cpu")
    self._ego_uid = ego_uid
    self._gamma = cfg.happo.gamma
    self._gae_lambda = cfg.happo.gae_lambda
    self._n_epochs = cfg.happo.n_epochs
    self._batch_size = cfg.happo.batch_size
    self._rollout_length = cfg.happo.rollout_length
    # Seed torch's global RNG. ``derive_substream`` is the project-wide
    # P6 reproducibility entry-point; we cast its 64-bit draw down to
    # int32 because ``torch.manual_seed`` documents 64-bit input but
    # has historically failed silently on values > 2**63 on macOS.
    torch_seed_int = int(
        derive_substream(_TORCH_SUBSTREAM, root_seed=cfg.seed)
        .default_rng()
        .integers(0, 2**31 - 1)
    )
    torch.manual_seed(torch_seed_int)
    self._minibatch_rng: np.random.Generator = derive_substream(
        _MINIBATCH_SUBSTREAM, root_seed=cfg.seed
    ).default_rng()

    harl_args = _build_harl_args(happo=cfg.happo)
    self._harl_args = harl_args
    self._happo: HAPPO = HAPPO(harl_args, ego_obs_space, ego_act_space, self._device)
    obs_dim = int(ego_obs_space.shape[0])
    self._obs_dim = obs_dim
    self._act_dim = int(ego_act_space.shape[0])
    self._critic = _EgoCritic(obs_dim=obs_dim, hidden_dim=cfg.happo.hidden_dim).to(self._device)
    self._critic_optim = torch.optim.Adam(self._critic.parameters(), lr=cfg.happo.lr)

    # Rollout buffer (cleared after each update).
    self._buf_obs: list[NDArray[np.float32]] = []
    self._buf_actions: list[NDArray[np.float32]] = []
    self._buf_log_probs: list[NDArray[np.float32]] = []
    self._buf_values: list[float] = []
    self._buf_rewards: list[float] = []
    # ADR-002 §Decisions + Pardo 2017 / issue #62: terminated and
    # truncated are stored *separately* so :meth:`update` can build a
    # per-step bootstrap target. Treating truncation as termination
    # zeros V(s_next) at every time-limit boundary, biasing advantages
    # and slowing learning 2-3x on time-limit-only envs.
    self._buf_terminated: list[bool] = []
    self._buf_truncated: list[bool] = []
    # V(s_truncated_final) computed via the critic at observe time
    # when a truncation occurs. Indexed parallel to the buffer; unused
    # at non-truncation steps but populated with 0.0 so the array
    # shapes stay aligned.
    self._buf_truncation_bootstraps: list[float] = []
    # Pre-step cache (filled by :meth:`act`, consumed by :meth:`observe`).
    self._pending: _PendingStep | None = None
    # Last next-obs (cached by :meth:`observe`) for the critic
    # bootstrap at update time.
    self._last_next_obs: NDArray[np.float32] | None = None
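`derive_substream` is project-specific and its internals are not shown here. The sketch below assumes only what the listing implies: a root seed plus a stream name yields an independent, reproducible generator, and the torch seed is drawn from `[0, 2**31 - 1)`. `substream_rng`, `torch_seed_from` and the names `"torch"` / `"minibatch"` are hypothetical; the real `_TORCH_SUBSTREAM` and `_MINIBATCH_SUBSTREAM` values are not on this page.

```python
import hashlib

import numpy as np


def substream_rng(name: str, root_seed: int) -> np.random.Generator:
    """Hypothetical stand-in for derive_substream(name, root_seed=...).default_rng()."""
    # Python's built-in hash() is salted per process, so use a stable digest of the name.
    key = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "little")
    # Same root seed with a different spawn_key gives an independent stream.
    return np.random.default_rng(np.random.SeedSequence(entropy=root_seed, spawn_key=(key,)))


def torch_seed_from(root_seed: int) -> int:
    # Draw directly from [0, 2**31 - 1) so the value always fits a signed 32-bit int.
    return int(substream_rng("torch", root_seed).integers(0, 2**31 - 1))
```

A caller would then do `torch.manual_seed(torch_seed_from(cfg.seed))` and keep a separate `substream_rng("minibatch", cfg.seed)` for shuffling, so adding or removing torch draws never shifts the minibatch order.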

act

act(
    obs: Mapping[str, Any], *, deterministic: bool = False
) -> NDArray[np.floating]

Sample the ego action; cache log-prob + value for :meth:observe (ADR-002 §Decisions).

Caches (obs, action, log_prob, value) in private state so the next call to :meth:observe can insert them into the rollout buffer with the actual reward + done flag. The pairing relies on the contract in :func:concerto.training.ego_aht.train (each act(obs) is followed by exactly one observe(obs_next, reward, done) before the next act).
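The pairing contract (one `act`, then exactly one `observe`) is easiest to see in a driver loop. `ToyTrainer`, `ToyEnv` and `run` are hypothetical stand-ins, not `concerto.training.ego_aht.train`. The env follows the Gymnasium five-tuple `step` convention, and the loop passes `done = terminated or truncated` plus the separate `truncated` flag the way `observe` expects. Unlike `EgoPPOTrainer`, the toy trainer asserts on a missing `observe`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyTrainer:
    """Hypothetical trainer double recording (pending, reward, terminated, truncated)."""

    rollout_length: int
    pending: Any = None
    buffer: list = field(default_factory=list)
    updates: int = 0

    def act(self, obs: dict, *, deterministic: bool = False) -> list:
        # Unlike the real trainer, this double fails loudly on a missing observe().
        assert self.pending is None, "act() called twice without an observe()"
        action = [0.0]
        self.pending = (obs, action)
        return action

    def observe(self, obs: dict, reward: float, done: bool, *, truncated: bool = False) -> None:
        if self.pending is None:
            return  # stray observe(): ignored, as in EgoPPOTrainer.observe
        terminated = bool(done) and not bool(truncated)
        self.buffer.append((self.pending, reward, terminated, bool(truncated)))
        self.pending = None

    def update(self) -> None:
        self.buffer.clear()
        self.updates += 1


class ToyEnv:
    """Hypothetical env: reward 1.0 per step, time limit after `horizon` steps, never terminates."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"t": self.t}

    def step(self, action: list) -> tuple:
        self.t += 1
        # Gymnasium-style five-tuple: (obs, reward, terminated, truncated, info).
        return {"t": self.t}, 1.0, False, self.t >= self.horizon, {}


def run(trainer: ToyTrainer, env: ToyEnv, total_steps: int) -> None:
    obs = env.reset()
    for _ in range(total_steps):
        action = trainer.act(obs)
        next_obs, reward, terminated, truncated, _info = env.step(action)
        # done keeps the "episode ended" meaning; truncated is passed separately.
        trainer.observe(next_obs, reward, terminated or truncated, truncated=truncated)
        obs = env.reset() if (terminated or truncated) else next_obs
        if len(trainer.buffer) >= trainer.rollout_length:
            trainer.update()
```

With `rollout_length=4`, `horizon=3` and 10 steps, the loop runs two updates and leaves steps 9 and 10 in the buffer; step 9 is recorded as truncated, never as terminated.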

Parameters:

  • obs (Mapping[str, Any]) –

    Env obs dict. The ego's flat state vector is read from obs["agent"][ego_uid]["state"].

  • deterministic (bool, default: False ) –

    When True, returns the distribution mode instead of a sample. Used by evaluation paths (M4b-8b's per-checkpoint deterministic eval).

Returns:

  • NDArray[floating]

    The ego action as a length-act_dim float32 array, ready to be packed into the env's dict-action.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def act(
    self,
    obs: Mapping[str, Any],
    *,
    deterministic: bool = False,
) -> NDArray[np.floating]:
    """Sample the ego action; cache log-prob + value for :meth:`observe` (ADR-002 §Decisions).

    Caches ``(obs, action, log_prob, value)`` in private state so the
    next call to :meth:`observe` can insert them into the rollout
    buffer with the actual reward + done flag. The pairing relies on
    the contract in :func:`concerto.training.ego_aht.train` (each
    ``act(obs)`` is followed by exactly one
    ``observe(obs_next, reward, done)`` before the next ``act``).

    Args:
        obs: Env obs dict. The ego's flat state vector is read from
            ``obs["agent"][ego_uid]["state"]``.
        deterministic: When ``True``, returns the distribution mode
            instead of a sample. Used by evaluation paths
            (M4b-8b's per-checkpoint deterministic eval).

    Returns:
        The ego action as a length-``act_dim`` float32 array, ready
        to be packed into the env's dict-action.
    """
    flat = _flat_ego_obs(obs, self._ego_uid)
    # HARL expects shape ``(batch, obs_dim)`` and a positional
    # ``(rnn_states, masks)`` even when the policy is feedforward.
    # rnn_states is ignored downstream (use_recurrent_policy=False);
    # masks=1.0 means "do not reset the RNN state" — also a no-op
    # without an RNN. We pass shape-correct placeholders so HARL's
    # internal ``check`` calls accept them.
    obs_t = flat.reshape(1, -1)
    rnn_states = np.zeros(
        (1, self._harl_args["recurrent_n"], self._harl_args["hidden_sizes"][-1]),
        dtype=np.float32,
    )
    masks = np.ones((1, 1), dtype=np.float32)
    with torch.no_grad():
        action_t, log_prob_t, _ = self._happo.get_actions(
            obs_t, rnn_states, masks, None, deterministic
        )
        value_t = self._critic(torch.from_numpy(flat).to(self._device).unsqueeze(0))
    action = action_t.detach().cpu().numpy().squeeze(0).astype(np.float32)
    log_prob = log_prob_t.detach().cpu().numpy().squeeze(0).astype(np.float32)
    value = float(value_t.detach().cpu().numpy().squeeze())
    self._pending = _PendingStep(obs=flat, action=action, log_prob=log_prob, value=value)
    return action
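The comment in `act` explains why feedforward HAPPO still receives `rnn_states` and `masks`. A hypothetical helper that builds shape-correct placeholders for either a single observation (as `act` does) or a minibatch (as `update` does) might look like this; `recurrent_n` and `hidden_size` stand in for `harl_args["recurrent_n"]` and `harl_args["hidden_sizes"][-1]`.

```python
import numpy as np


def feedforward_placeholders(obs, recurrent_n: int, hidden_size: int):
    """Return (obs_batch, rnn_states, masks) with the shapes HARL's checks accept (sketch)."""
    # (obs_dim,) -> (1, obs_dim); an already-batched (B, obs_dim) array is left as-is.
    obs_batch = np.atleast_2d(np.asarray(obs, dtype=np.float32))
    batch = obs_batch.shape[0]
    # Ignored downstream when use_recurrent_policy=False, but the shape must still match.
    rnn_states = np.zeros((batch, recurrent_n, hidden_size), dtype=np.float32)
    # 1.0 means "do not reset the RNN state", a no-op without an RNN.
    masks = np.ones((batch, 1), dtype=np.float32)
    return obs_batch, rnn_states, masks
```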

from_config classmethod

from_config(
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    ego_uid: str,
) -> EgoPPOTrainer

Build from :class:EgoAHTConfig + a concrete env + the frozen partner.

ADR-002 §Decisions / ADR-009 §Consequences / plan/05 §3.5 + §6 #3.

This is the :class:~concerto.training.ego_aht.TrainerFactory callable that :func:concerto.training.ego_aht.train plugs in via dependency injection. The factory reads the ego's observation + action shapes off the env directly so the trainer's network widths cannot drift from what the env actually emits.
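The dependency-injection arrangement can be sketched with `typing.Protocol`. `TrainerLike`, `TrainerFactorySketch`, `build_trainer` and the stub classes are hypothetical; the only grounded detail is the call shape `(cfg, *, env, partner, ego_uid)`, taken from `from_config`'s signature. The real `TrainerFactory` and `train` signatures may differ.

```python
from typing import Any, Protocol


class TrainerLike(Protocol):
    """The trainer surface a loop needs (hypothetical subset)."""

    def act(self, obs: Any, *, deterministic: bool = False) -> Any: ...

    def observe(self, obs: Any, reward: float, done: bool, *, truncated: bool = False) -> None: ...

    def update(self) -> None: ...


class TrainerFactorySketch(Protocol):
    """Same call shape as EgoPPOTrainer.from_config(cfg, *, env, partner, ego_uid)."""

    def __call__(self, cfg: Any, *, env: Any, partner: Any, ego_uid: str) -> TrainerLike: ...


def build_trainer(
    cfg: Any, *, env: Any, partner: Any, ego_uid: str, trainer_factory: TrainerFactorySketch
) -> TrainerLike:
    # The loop never imports a concrete trainer class; it only calls the injected factory.
    return trainer_factory(cfg, env=env, partner=partner, ego_uid=ego_uid)


class StubTrainer:
    """A do-nothing trainer that satisfies TrainerLike, useful as a test double."""

    def __init__(self, ego_uid: str) -> None:
        self.ego_uid = ego_uid

    def act(self, obs: Any, *, deterministic: bool = False) -> Any:
        return 0.0

    def observe(self, obs: Any, reward: float, done: bool, *, truncated: bool = False) -> None:
        return None

    def update(self) -> None:
        return None


def stub_factory(cfg: Any, *, env: Any, partner: Any, ego_uid: str) -> StubTrainer:
    return StubTrainer(ego_uid)
```

In production code the argument would be `trainer_factory=EgoPPOTrainer.from_config`; in tests a stub factory like the one above avoids constructing HAPPO at all.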

The partner-freeze gate (:func:_assert_partner_is_frozen) runs first, before any tensor allocation, so a bad partner aborts cheaply (plan/05 §6 #3).

Per plan/05 §3.5 the bridge from :class:EgoAHTConfig to HARL's args dict lives in this CONCERTO-side module rather than in the fork's :meth:EgoAHTHAPPO.from_config (which intentionally stays :class:NotImplementedError — see scripts/harl-fork-patches/v0.1.0-aht/ README for the role-split rationale).

Parameters:

  • cfg (EgoAHTConfig) –

    Validated :class:~concerto.training.config.EgoAHTConfig.

  • env (EnvLike) –

    Concrete env satisfying :class:EnvLike. Must expose observation_space["agent"][ego_uid]["state"] as a :class:gym.spaces.Box and action_space[ego_uid] as a :class:gym.spaces.Box.

  • partner (PartnerLike) –

    Frozen partner instance the ego will train against. Validated by :func:_assert_partner_is_frozen as the first construction step (ADR-009 §Consequences).

  • ego_uid (str) –

    The env-side uid the ego acts on.

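Returns:

  • EgoPPOTrainer

    A constructed :class:EgoPPOTrainer ready for :func:concerto.training.ego_aht.train.

Raises: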

  • ValueError

    If partner exposes a torch parameter with requires_grad=True (ADR-009 §Consequences).

  • TypeError

    If the env's ego observation or action space is not a :class:gym.spaces.Box.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
@classmethod
def from_config(
    cls,
    cfg: EgoAHTConfig,
    *,
    env: EnvLike,
    partner: PartnerLike,
    ego_uid: str,
) -> EgoPPOTrainer:
    """Build from :class:`EgoAHTConfig` + a concrete env + the frozen partner.

    ADR-002 §Decisions / ADR-009 §Consequences / plan/05 §3.5 + §6 #3.

    This is the :class:`~concerto.training.ego_aht.TrainerFactory`
    callable that :func:`concerto.training.ego_aht.train` plugs in via
    dependency injection. The factory reads the ego's observation +
    action shapes off the env directly so the trainer's network
    widths cannot drift from what the env actually emits.

    The partner-freeze gate (:func:`_assert_partner_is_frozen`) runs
    first, before any tensor allocation, so a bad partner aborts
    cheaply (plan/05 §6 #3).

    Per plan/05 §3.5 the bridge from :class:`EgoAHTConfig` to HARL's
    args dict lives in this CONCERTO-side module rather than in the
    fork's :meth:`EgoAHTHAPPO.from_config` (which intentionally
    stays :class:`NotImplementedError` — see
    ``scripts/harl-fork-patches/v0.1.0-aht/`` README for the
    role-split rationale).

    Args:
        cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`.
        env: Concrete env satisfying :class:`EnvLike`. Must expose
            ``observation_space["agent"][ego_uid]["state"]`` as a
            :class:`gym.spaces.Box` and ``action_space[ego_uid]``
            as a :class:`gym.spaces.Box`.
        partner: Frozen partner instance the ego will train against.
            Validated by :func:`_assert_partner_is_frozen` as the
            first construction step (ADR-009 §Consequences).
        ego_uid: The env-side uid the ego acts on.

    Returns:
        A constructed :class:`EgoPPOTrainer` ready for
        :func:`concerto.training.ego_aht.train`.

    Raises:
        ValueError: If ``partner`` exposes a torch parameter with
            ``requires_grad=True`` (ADR-009 §Consequences).
        TypeError: If the env's ego observation or action space is
            not a :class:`gym.spaces.Box`.
    """
    # ADR-009 §Consequences / plan/05 §6 #3: partner-freeze gate
    # runs first, before any env-space introspection or tensor
    # allocation. A bad partner aborts the construction cheaply.
    _assert_partner_is_frozen(partner)
    env_obs_space = env.observation_space  # type: ignore[attr-defined]
    ego_state_space = env_obs_space["agent"][ego_uid]["state"]
    if not isinstance(ego_state_space, gym.spaces.Box):
        raise TypeError(
            "EgoPPOTrainer.from_config requires "
            f"env.observation_space['agent'][{ego_uid!r}]['state'] to be "
            f"a gym.spaces.Box; got {type(ego_state_space).__name__}."
        )
    env_act_space = env.action_space  # type: ignore[attr-defined]
    ego_act_space = env_act_space[ego_uid]
    if not isinstance(ego_act_space, gym.spaces.Box):
        raise TypeError(
            "EgoPPOTrainer.from_config requires "
            f"env.action_space[{ego_uid!r}] to be a gym.spaces.Box; "
            f"got {type(ego_act_space).__name__}."
        )
    device = _resolve_device(cfg.runtime.device)
    if device.type == "cpu" and cfg.runtime.deterministic_torch:
        # CPU determinism is the project's P6 contract (plan/08 §4).
        # Skip on CUDA/MPS where the contract is relaxed because
        # cuDNN / Metal kernels are not bit-deterministic even with
        # this flag.
        _set_torch_determinism(True)
    return cls(
        cfg=cfg,
        ego_uid=ego_uid,
        ego_obs_space=ego_state_space,
        ego_act_space=ego_act_space,
        partner=partner,
        device=device,
    )

observe

observe(
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None

Buffer (pending, reward, terminated, truncated) (ADR-002 §Decisions; Pardo 2017).

The trainer keeps a single (obs, action, log_prob, value) cache — set by :meth:act — and pairs it with the (reward, done, truncated) from :meth:observe. The post-step obs is also remembered as the bootstrap target for :meth:update.

done keeps the historical "this step ended the episode" meaning (terminated OR truncated). truncated is the new keyword-only flag that distinguishes time-limit truncation from true termination. The trainer derives terminated = done and not truncated internally; the GAE bootstrap then treats truncation correctly per Pardo et al. 2017 (V(s_truncated_final) flows through, not 0) — see :func:compute_gae + project issue #62 for the root-cause writeup.

At truncation boundaries this method calls the critic on the post-step obs once to capture V(s_truncated_final); that value is later threaded into :func:compute_gae as the per-step bootstrap target. At non-truncation steps the recorded value is 0.0 and unused.
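The flag handling and bootstrap capture described above reduce to a few lines. This is a minimal sketch that assumes hypothetical helpers `split_done` and `truncation_bootstrap`. A plain `critic` callable on a list of floats stands in for the trainer's torch critic on the flattened post-step obs.

```python
from typing import Callable


def split_done(done: bool, truncated: bool = False) -> tuple[bool, bool]:
    """Derive (terminated, truncated) from the legacy ``done`` flag (hypothetical helper)."""
    truncated_b = bool(truncated)
    # A step that ended via time limit is never treated as a true termination.
    terminated = bool(done) and not truncated_b
    return terminated, truncated_b


def truncation_bootstrap(
    truncated: bool, next_obs: list[float], critic: Callable[[list[float]], float]
) -> float:
    """V(s_truncated_final) at a truncation boundary; 0.0 (recorded but unused) otherwise."""
    # The critic is evaluated only at truncation boundaries, once per boundary.
    return float(critic(next_obs)) if truncated else 0.0
```

Calling `split_done(True)` keeps the legacy "treat as terminal" result `(True, False)`. Calling `split_done(True, truncated=True)` yields `(False, True)`, so the step's bootstrap is the critic's value of the post-step obs.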

Parameters:

  • obs (Mapping[str, Any]) –

    Post-step env obs (the obs the env returned after the action passed to the most recent :meth:act call).

  • reward (float) –

    Per-step ego reward (the env's shared scalar).

  • done (bool) –

    True if the env terminated or truncated this step (i.e. the loop should reset the env for the next step).

  • truncated (bool, default: False ) –

    True iff this step ended the episode via time-limit truncation (env returned truncated=True). Default False is the conservative legacy behavior — callers that don't yet distinguish truncation from termination see the historical "treat as terminal" semantics.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def observe(
    self,
    obs: Mapping[str, Any],
    reward: float,
    done: bool,
    *,
    truncated: bool = False,
) -> None:
    """Buffer ``(pending, reward, terminated, truncated)`` (ADR-002 §Decisions; Pardo 2017).

    The trainer keeps a single ``(obs, action, log_prob, value)``
    cache — set by :meth:`act` — and pairs it with the
    ``(reward, done, truncated)`` from :meth:`observe`. The post-step
    ``obs`` is also remembered as the bootstrap target for
    :meth:`update`.

    ``done`` keeps the historical "this step ended the episode" meaning
    (terminated OR truncated). ``truncated`` is the new keyword-only
    flag that distinguishes time-limit truncation from true
    termination. The trainer derives
    ``terminated = done and not truncated`` internally; the GAE
    bootstrap then treats truncation correctly per Pardo et al. 2017
    (V(s_truncated_final) flows through, not 0) — see :func:`compute_gae`
    + project issue #62 for the root-cause writeup.

    At truncation boundaries this method calls the critic on the
    post-step obs once to capture V(s_truncated_final); that value
    is later threaded into :func:`compute_gae` as the per-step
    bootstrap target. At non-truncation steps the recorded value is
    ``0.0`` and unused.

    Args:
        obs: Post-step env obs (the obs the env returned after the
            action passed to the most recent :meth:`act` call).
        reward: Per-step ego reward (the env's shared scalar).
        done: ``True`` if the env terminated *or* truncated this step
            (i.e. the loop should reset the env for the next step).
        truncated: ``True`` iff this step ended the episode via
            time-limit truncation (env returned ``truncated=True``).
            Default ``False`` is the conservative legacy behavior —
            callers that don't yet distinguish truncation from
            termination see the historical "treat as terminal"
            semantics.
    """
    pending = self._pending
    if pending is None:
        # No prior :meth:`act` — defensive no-op so the trainer can
        # safely ignore a stray ``observe`` (e.g. a pre-rollout
        # bookkeeping call).
        return
    terminated = bool(done) and not bool(truncated)
    truncated_b = bool(truncated)
    self._buf_obs.append(pending.obs)
    self._buf_actions.append(pending.action)
    self._buf_log_probs.append(pending.log_prob)
    self._buf_values.append(pending.value)
    self._buf_rewards.append(float(reward))
    self._buf_terminated.append(terminated)
    self._buf_truncated.append(truncated_b)
    next_flat = _flat_ego_obs(obs, self._ego_uid)
    # Capture V(s_truncated_final) eagerly at the boundary. Doing it
    # here (rather than at update time) is the cleanest place because
    # ``next_flat`` is the actual post-step obs of the truncated
    # episode; by update time the training loop has already reset
    # the env, and the per-step "next obs in buffer" at index
    # ``t + 1`` is from the *new* episode. The cost is one extra
    # critic forward pass per truncation boundary.
    if truncated_b and not terminated:
        with torch.no_grad():
            v_trunc = float(
                self._critic(torch.from_numpy(next_flat).to(self._device).unsqueeze(0))
                .squeeze()
                .item()
            )
        self._buf_truncation_bootstraps.append(v_trunc)
    else:
        self._buf_truncation_bootstraps.append(0.0)
    self._last_next_obs = next_flat
    # Clear the pre-step cache so a stray double-observe cannot
    # double-insert the same tuple.
    self._pending = None

state_dict

state_dict() -> dict[str, Any]

Return a flat dict with actor + critic + optimizer states (ADR-002 §Decisions; T4b.12).

The returned dict is what :func:concerto.training.checkpoints.save_checkpoint writes to the .pt artefact via torch.save. Loading is the dual: each component reads its own sub-key off the loaded dict via :meth:load_state_dict.
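The save/load duality can be illustrated with stand-in components. This is a sketch only. `Component` is a hypothetical object exposing the `state_dict()` / `load_state_dict()` pair that torch modules and optimizers provide. `trainer_state` and `load_trainer_state` are hypothetical helpers, not the project's methods. The four keys match the dict returned in the source below.

```python
import copy
from typing import Any


class Component:
    """Hypothetical stand-in for a torch module or optimizer."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def state_dict(self) -> dict[str, Any]:
        return {"weight": self.weight}

    def load_state_dict(self, state: dict[str, Any]) -> None:
        self.weight = state["weight"]


KEYS = ("actor", "actor_optim", "critic", "critic_optim")


def trainer_state(components: dict[str, Component]) -> dict[str, Any]:
    # One top-level key per component, mirroring the trainer's state_dict().
    return {k: components[k].state_dict() for k in KEYS}


def load_trainer_state(components: dict[str, Component], state: dict[str, Any]) -> None:
    # The dual: each component reads back only its own sub-key.
    for k in KEYS:
        components[k].load_state_dict(state[k])
```

In this sketch, `copy.deepcopy` of the saved dict plays the role of the torch.save / torch.load round trip.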

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def state_dict(self) -> dict[str, Any]:
    """Return a flat dict with actor + critic + optimizer states (ADR-002 §Decisions; T4b.12).

    The returned dict is what :func:`concerto.training.checkpoints.save_checkpoint`
    ``torch.save``s to the ``.pt`` artefact. Loading is the dual:
    each component reads its own sub-key off the loaded dict via
    :meth:`load_state_dict`.
    """
    return {
        "actor": self._happo.actor.state_dict(),
        "actor_optim": self._happo.actor_optimizer.state_dict(),
        "critic": self._critic.state_dict(),
        "critic_optim": self._critic_optim.state_dict(),
    }

update

update() -> None

Compute GAE; run PPO + critic updates; clear the buffer (ADR-002 §Decisions; Pardo 2017).

Order of operations (Schulman 2017, lifted to ego-only / frozen-partner per plan/05 §3.5):

  1. Build the per-step next_values bootstrap array (Pardo 2017 §4): V(s_truncated_final_t) at truncation boundaries (cached at observe time), 0.0 at terminations, values[t + 1] otherwise; rollout-tail bootstrap at t = T - 1 is the same rule applied to self._last_next_obs.
  2. :func:compute_gae → raw advantages A_t.
  3. returns_t = A_t + V(s_t) (the critic regression target).
  4. :func:normalize_advantages → normalized A_t for the PPO importance-weighted surrogate loss.
  5. For each of n_epochs epochs, shuffle indices and walk actor_num_mini_batch minibatches. For each minibatch, build the 9-tuple :meth:HAPPO.update expects and call it directly (skipping :meth:HAPPO.train's buffer-machinery wrapper).
  6. Train the critic on the same minibatch with MSE on returns.
  7. Clear the buffer.
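Step 1 is the only non-standard part of the pipeline. The following numpy sketch implements the rule and assumes the rollout is non-empty (update() returns early otherwise). The tail bootstrap is passed in as `tail_value`, which stands for the critic's value of `self._last_next_obs`. `build_next_values` is a hypothetical function, not the trainer's private `_build_next_values`, whose signature differs.

```python
import numpy as np
from numpy.typing import NDArray


def build_next_values(
    values: NDArray[np.float32],
    terminated: NDArray[np.bool_],
    truncated: NDArray[np.bool_],
    truncation_bootstraps: NDArray[np.float32],
    tail_value: float,
) -> NDArray[np.float32]:
    """Per-step bootstrap targets following the Pardo 2017 rule (hypothetical helper)."""
    # Default: the next step's value; the final step falls back to the tail bootstrap.
    next_values = np.empty_like(values, dtype=np.float32)
    next_values[:-1] = values[1:]
    next_values[-1] = tail_value
    # True terminal states have no value after them.
    next_values[terminated] = 0.0
    # Time-limit truncation keeps the value of the real post-step obs.
    next_values[truncated] = truncation_bootstraps[truncated]
    return next_values
```

In this sketch, the boundary rules override the tail fallback. A rollout whose last step terminates still gets 0.0 there.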

The factor_batch argument :meth:HAPPO.update requires for HAPPO's sequential multi-agent update is set to all 1.0 — the ego is the only updating agent (ADR-009 §Decision: black-box AHT, partner is frozen), so previous-agent importance weights are identically 1.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def update(self) -> None:
    """Compute GAE; run PPO + critic updates; clear the buffer (ADR-002 §Decisions; Pardo 2017).

    Order of operations (Schulman 2017, lifted to ego-only / frozen-
    partner per plan/05 §3.5):

    1. Build the per-step ``next_values`` bootstrap array (Pardo 2017
       §4): ``V(s_truncated_final_t)`` at truncation boundaries (cached
       at observe time), ``0.0`` at terminations, ``values[t + 1]``
       otherwise; rollout-tail bootstrap at ``t = T - 1`` is the same
       rule applied to ``self._last_next_obs``.
    2. :func:`compute_gae` → raw advantages ``A_t``.
    3. ``returns_t = A_t + V(s_t)`` (the critic regression target).
    4. :func:`normalize_advantages` → normalized ``A_t`` for the
       PPO importance-weighted surrogate loss.
    5. For each of ``n_epochs`` epochs, shuffle indices and walk
       ``actor_num_mini_batch`` minibatches. For each minibatch, build
       the 9-tuple :meth:`HAPPO.update` expects and call it directly
       (skipping :meth:`HAPPO.train`'s buffer-machinery wrapper).
    6. Train the critic on the same minibatch with MSE on returns.
    7. Clear the buffer.

    The ``factor_batch`` argument :meth:`HAPPO.update` requires for
    HAPPO's sequential multi-agent update is set to all ``1.0`` — the
    ego is the only updating agent (ADR-009 §Decision: black-box AHT,
    partner is frozen), so previous-agent importance weights are
    identically 1.
    """
    if not self._buf_rewards:
        return

    rewards = np.asarray(self._buf_rewards, dtype=np.float32)
    values = np.asarray(self._buf_values, dtype=np.float32)
    terminated = np.asarray(self._buf_terminated, dtype=np.bool_)
    truncated = np.asarray(self._buf_truncated, dtype=np.bool_)
    truncation_bootstraps = np.asarray(self._buf_truncation_bootstraps, dtype=np.float32)
    next_values = self._build_next_values(values, terminated, truncated, truncation_bootstraps)
    episode_boundaries = terminated | truncated
    raw_advantages = compute_gae(
        rewards,
        values,
        next_values,
        episode_boundaries,
        gamma=self._gamma,
        gae_lambda=self._gae_lambda,
    )
    returns = (raw_advantages + values).astype(np.float32)
    norm_advantages = normalize_advantages(raw_advantages)

    obs_arr = np.stack(self._buf_obs).astype(np.float32)
    actions_arr = np.stack(self._buf_actions).astype(np.float32)
    log_probs_arr = np.stack(self._buf_log_probs).astype(np.float32)

    n_steps = rewards.shape[0]
    n_minibatches = max(1, n_steps // self._batch_size)
    mb_size = max(1, n_steps // n_minibatches)

    for _epoch in range(self._n_epochs):
        perm = self._minibatch_rng.permutation(n_steps)
        for mb in range(n_minibatches):
            start = mb * mb_size
            end = start + mb_size if mb < n_minibatches - 1 else n_steps
            mb_idx = perm[start:end]
            mb_obs = obs_arr[mb_idx]
            mb_actions = actions_arr[mb_idx]
            mb_old_log_probs = log_probs_arr[mb_idx]
            # HAPPO expects ``adv_targ`` shape ``(mb, 1)``. log-probs
            # for diag-Gaussian are already per-action-dim; HAPPO's
            # action_aggregation="prod" reduces them inside update().
            mb_advantages = norm_advantages[mb_idx].reshape(-1, 1).astype(np.float32)
            this_mb_size = mb_idx.shape[0]
            mb_rnn_states = np.zeros(
                (
                    this_mb_size,
                    self._harl_args["recurrent_n"],
                    self._harl_args["hidden_sizes"][-1],
                ),
                dtype=np.float32,
            )
            mb_masks = np.ones((this_mb_size, 1), dtype=np.float32)
            mb_active_masks = np.ones((this_mb_size, 1), dtype=np.float32)
            # ``factor_batch`` = 1 because the ego is the only updating
            # agent (ADR-009 §Decision); see plan/05 §3.5.
            mb_factor = np.ones((this_mb_size, 1), dtype=np.float32)
            sample = (
                mb_obs,
                mb_rnn_states,
                mb_actions,
                mb_masks,
                mb_active_masks,
                mb_old_log_probs,
                mb_advantages,
                None,
                mb_factor,
            )
            self._happo.update(sample)

            # Critic update: simple MSE regression on returns.
            mb_returns = torch.from_numpy(returns[mb_idx]).to(self._device).float()
            mb_obs_t = torch.from_numpy(mb_obs).to(self._device).float()
            value_pred = self._critic(mb_obs_t).squeeze(-1)
            critic_loss = ((value_pred - mb_returns) ** 2).mean()
            self._critic_optim.zero_grad()
            critic_loss.backward()
            self._critic_optim.step()

    # Reset rollout buffer.
    self._buf_obs.clear()
    self._buf_actions.clear()
    self._buf_log_probs.clear()
    self._buf_values.clear()
    self._buf_rewards.clear()
    self._buf_terminated.clear()
    self._buf_truncated.clear()
    self._buf_truncation_bootstraps.clear()
    self._last_next_obs = None
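The minibatch schedule in the loop above can be isolated for inspection. This sketch reproduces the arithmetic as written: `n_minibatches = max(1, n_steps // batch_size)`, and the last minibatch absorbs the remainder. `minibatch_slices` is a hypothetical helper, and `np.random.default_rng` stands in for the trainer's `_minibatch_rng`.

```python
import numpy as np


def minibatch_slices(n_steps: int, batch_size: int) -> list[tuple[int, int]]:
    """(start, end) offsets into a permutation, mirroring update()'s loop (hypothetical helper)."""
    n_minibatches = max(1, n_steps // batch_size)
    mb_size = max(1, n_steps // n_minibatches)
    slices = []
    for mb in range(n_minibatches):
        start = mb * mb_size
        # The final minibatch runs to n_steps so no sample is dropped.
        end = start + mb_size if mb < n_minibatches - 1 else n_steps
        slices.append((start, end))
    return slices


rng = np.random.default_rng(0)
perm = rng.permutation(10)
batches = [perm[s:e] for s, e in minibatch_slices(10, batch_size=4)]
```

With 10 steps and `batch_size=4`, this arithmetic yields two minibatches of 5. The effective minibatch size can therefore exceed the configured batch size.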

compute_gae

compute_gae(
    rewards: NDArray[float32],
    values: NDArray[float32],
    next_values: NDArray[float32],
    episode_boundaries: NDArray[bool_],
    *,
    gamma: float,
    gae_lambda: float,
) -> NDArray[np.float32]

GAE with per-step bootstrap (Schulman 2016 §6.3; Pardo 2017; ADR-002 §Decisions).

Pardo et al. 2017 ("Time Limits in Reinforcement Learning") motivates the per-step bootstrap target shape: time-limit truncation is not a terminal state — the policy should still receive the value of the state it would have transitioned to. Treating truncation as termination zeros that bootstrap and biases advantages systematically, slowing learning materially (see project issue #62).

The recurrence is:

delta_t = r_t + gamma * next_values[t] - V(s_t)
A_t     = delta_t + gamma * lambda * (1 - boundary_t) * A_{t+1}
A_{T-1} = delta_{T-1}

where next_values[t] is the caller-supplied bootstrap target — in production it is:

  • values[t + 1] for non-boundary steps (typical mid-rollout).
  • V(s_truncated_final_t) at truncation boundaries (the value of the actual post-step obs of the truncated episode, computed by the caller via the critic).
  • 0.0 at termination boundaries (true terminal state — no value after).
  • The rollout-tail bootstrap at the final timestep t = T - 1: V(s_T), the critic's value of the post-rollout obs, if the rollout ends mid-episode; 0.0 if it ends on a termination; V(s_truncated_final) if it ends on a truncation.

episode_boundaries[t] is True iff step t ended an episode (terminated or truncated). It zeros GAE propagation across the boundary — advantages from a different episode (steps > t after a boundary) must not contribute to A_t.

plan/05 §5 + the property test tests/property/test_advantage_decomposition.py verify this formula matches a freshly hand-rolled reference to within 1e-6 on synthetic trajectories that include both termination and truncation boundaries.
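The property test's hand-rolled reference is not shown here. One independent check is the closed form the recurrence unrolls to: A_t = sum_k (gamma * lambda)^k * delta_{t+k}. The sum stops after the first boundary at or after t. This sketch is illustrative only. `gae_recursive` transcribes the loop in the source below, and `gae_explicit_sum` is a hypothetical forward-sum reference. Neither is the project's `reference_gae`.

```python
import numpy as np


def gae_recursive(rewards, values, next_values, boundaries, gamma, lam):
    # Backward recursion with a float64 accumulator, as in compute_gae.
    adv = np.zeros(len(rewards), dtype=np.float32)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = float(rewards[t]) + gamma * float(next_values[t]) - float(values[t])
        gae = delta + gamma * lam * (0.0 if boundaries[t] else 1.0) * gae
        adv[t] = np.float32(gae)
    return adv


def gae_explicit_sum(rewards, values, next_values, boundaries, gamma, lam):
    # Forward sum of discounted TD errors, stopping after the first boundary.
    n = len(rewards)
    deltas = [
        float(rewards[t]) + gamma * float(next_values[t]) - float(values[t]) for t in range(n)
    ]
    adv = np.zeros(n, dtype=np.float32)
    for t in range(n):
        total, weight = 0.0, 1.0
        for k in range(t, n):
            total += weight * deltas[k]
            if boundaries[k]:
                break  # the boundary step's own delta counts; later steps do not
            weight *= gamma * lam
        adv[t] = np.float32(total)
    return adv


rng = np.random.default_rng(0)
T = 50
r = rng.normal(size=T).astype(np.float32)
v = rng.normal(size=T).astype(np.float32)
nv = rng.normal(size=T).astype(np.float32)
b = rng.random(T) < 0.1
a_rec = gae_recursive(r, v, nv, b, 0.99, 0.95)
a_sum = gae_explicit_sum(r, v, nv, b, 0.99, 0.95)
```

For a single step with r = 1, V(s_0) = 0.5 and gamma = 0.9, termination (next value 0.0) gives A_0 = 0.5. Truncation with V(s_truncated_final) = 0.5 gives A_0 = 1 + 0.45 - 0.5 = 0.95. This is the bias that treating truncation as termination introduces.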

Parameters:

  • rewards (NDArray[float32]) –

    Per-step ego reward. Shape (T,).

  • values (NDArray[float32]) –

    Per-step critic value estimate V(s_t). Shape (T,).

  • next_values (NDArray[float32]) –

    Per-step bootstrap target. Shape (T,). Computed by the caller (see module docstring + issue #62 root-cause writeup for why this split matters).

  • episode_boundaries (NDArray[bool_]) –

    True at steps that ended an episode (terminated OR truncated). Shape (T,). Zeros GAE propagation across the boundary.

  • gamma (float) –

    Discount factor in [0, 1].

  • gae_lambda (float) –

    GAE exponential smoothing in [0, 1].

Returns:

  • NDArray[float32]

    Advantages A_t, same shape and dtype (float32) as rewards.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def compute_gae(
    rewards: NDArray[np.float32],
    values: NDArray[np.float32],
    next_values: NDArray[np.float32],
    episode_boundaries: NDArray[np.bool_],
    *,
    gamma: float,
    gae_lambda: float,
) -> NDArray[np.float32]:
    """GAE with per-step bootstrap (Schulman 2016 §6.3; Pardo 2017; ADR-002 §Decisions).

    Pardo et al. 2017 ("Time Limits in Reinforcement Learning") motivates
    the per-step bootstrap target shape: time-limit truncation is **not**
    a terminal state — the policy should still receive the value of the
    state it would have transitioned to. Treating truncation as
    termination zeros that bootstrap and biases advantages
    systematically, slowing learning materially (see project issue #62).

    The recurrence is::

        delta_t = r_t + gamma * next_values[t] - V(s_t)
        A_t     = delta_t + gamma * lambda * (1 - boundary_t) * A_{t+1}
        A_{T-1} = delta_{T-1}

    where ``next_values[t]`` is the *caller-supplied* bootstrap target —
    in production it is:

    - ``values[t + 1]`` for non-boundary steps (typical mid-rollout).
    - ``V(s_truncated_final_t)`` at truncation boundaries (the value of
      the actual post-step obs of the truncated episode, computed by the
      caller via the critic).
    - ``0.0`` at termination boundaries (true terminal state — no value
      after).
    - The rollout-tail bootstrap (``V(s_T+1)`` if the rollout ends
      mid-episode; ``0.0`` if it ends on an episode boundary) at the
      final timestep ``t = T - 1``.

    ``episode_boundaries[t]`` is ``True`` iff step ``t`` ended an episode
    (terminated *or* truncated). It zeros GAE propagation across the
    boundary — advantages from a *different* episode (steps ``> t`` after
    a boundary) must not contribute to ``A_t``.

    plan/05 §5 + the property test
    ``tests/property/test_advantage_decomposition.py`` verify this formula
    matches a freshly hand-rolled reference to within 1e-6 on synthetic
    trajectories that include both termination and truncation boundaries.

    Args:
        rewards: Per-step ego reward. Shape ``(T,)``.
        values: Per-step critic value estimate ``V(s_t)``. Shape ``(T,)``.
        next_values: Per-step bootstrap target. Shape ``(T,)``. Caller
            responsibility (see module docstring + issue #62 root-cause
            writeup for why this split matters).
        episode_boundaries: ``True`` at steps that ended an episode
            (terminated OR truncated). Shape ``(T,)``. Zeros GAE
            propagation across the boundary.
        gamma: Discount factor in ``[0, 1]``.
        gae_lambda: GAE exponential smoothing in ``[0, 1]``.

    Returns:
        Advantages ``A_t``, same shape and dtype (float32) as ``rewards``.
    """
    n = rewards.shape[0]
    advantages = np.zeros(n, dtype=np.float32)
    # Accumulate ``gae`` in Python float64 — keeping intermediate
    # advantages at higher precision avoids the 1e-6 rounding drift that
    # float32 in-loop accumulation accrues over long horizons. The final
    # store to ``advantages[t]`` casts back to the declared float32
    # storage. The property test against ``reference_gae`` relies on
    # both implementations using the same accumulation precision.
    gae = 0.0
    for t in reversed(range(n)):
        boundary = bool(episode_boundaries[t])
        delta = float(rewards[t]) + gamma * float(next_values[t]) - float(values[t])
        # Zero GAE propagation across episode boundaries: advantages from
        # the next episode must not flow back into this episode (Pardo 2017
        # §4; issue #62 root-cause writeup).
        gae = delta + gamma * gae_lambda * (0.0 if boundary else 1.0) * gae
        advantages[t] = np.float32(gae)
    return advantages

normalize_advantages

normalize_advantages(
    advantages: NDArray[float32],
) -> NDArray[np.float32]

Mean/std-normalize advantages (ADR-002 §Decisions; matches HAPPO state_type='EP').

The formula:

A_normalized = (A - mean(A)) / (std(A) + 1e-5)

matches the EP-state branch in :meth:HAPPO.train (lines 122-127 of harl/algorithms/actors/happo.py) without the nan-masking it applies to inactive agents. The ego in our 2-agent task is never inactive (use_policy_active_masks=False in :data:_HARL_FIXED_DEFAULTS), so the nan-masking branch is dead code in our setting and the formula collapses to a plain mean/std normalization.
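The claim that the masked branch collapses to the plain formula when every agent is active can be checked directly. This sketch paraphrases the description above and is not a copy of HARL's code. It assumes the masked branch sets inactive entries to NaN and normalizes with `np.nanmean` / `np.nanstd`. `normalize_plain` and `normalize_masked` are hypothetical names.

```python
import numpy as np


def normalize_plain(adv):
    # Same formula as normalize_advantages.
    mean = float(adv.mean())
    std = float(adv.std())
    return ((adv - mean) / (std + 1e-5)).astype(np.float32)


def normalize_masked(adv, active):
    # Assumed masked variant: inactive entries are excluded from the statistics via NaN.
    masked = adv.astype(np.float64).copy()
    masked[~active] = np.nan
    mean = float(np.nanmean(masked))
    std = float(np.nanstd(masked))
    return ((adv - mean) / (std + 1e-5)).astype(np.float32)


rng = np.random.default_rng(1)
adv = rng.normal(3.0, 2.0, size=64).astype(np.float32)
all_active = np.ones(64, dtype=bool)
```

With `all_active`, the two functions agree up to float32 rounding. The 1e-5 term keeps the division finite when every advantage in the batch is equal.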

plan/05 §5: pulled out of HARL into this module so the property test (T4b.13) can verify the formula matches single-agent PPO without tautology.

Parameters:

  • advantages (NDArray[float32]) –

    Raw advantages (typically from :func:compute_gae).

Returns:

  • NDArray[float32]

    Normalized advantages, same shape and dtype.

Source code in src/chamber/benchmarks/ego_ppo_trainer.py
def normalize_advantages(advantages: NDArray[np.float32]) -> NDArray[np.float32]:
    """Mean/std-normalize advantages (ADR-002 §Decisions; matches HAPPO state_type='EP').

    The formula::

        A_normalized = (A - mean(A)) / (std(A) + 1e-5)

    matches the EP-state branch in :meth:`HAPPO.train`
    (lines 122-127 of ``harl/algorithms/actors/happo.py``) without the
    nan-masking it applies to inactive agents. The ego in our 2-agent
    task is never inactive (``use_policy_active_masks=False`` in
    :data:`_HARL_FIXED_DEFAULTS`), so the nan-masking branch is dead
    code in our setting and the formula collapses to a plain
    mean/std normalization.

    plan/05 §5: pulled out of HARL into this module so the property test
    (T4b.13) can verify the formula matches single-agent PPO without
    tautology.

    Args:
        advantages: Raw advantages (typically from :func:`compute_gae`).

    Returns:
        Normalized advantages, same shape and dtype.
    """
    mean = float(advantages.mean())
    std = float(advantages.std())
    return ((advantages - mean) / (std + 1e-5)).astype(np.float32)

Stage-1 spike-adapter common surface (ADR-007 §Stage 1a / §Stage 1b; plan/07 §T5b.2).

Single source of truth for the seam that Stage-1 AS and OM adapters (:mod:chamber.benchmarks.stage1_as, :mod:chamber.benchmarks.stage1_om) share — the :class:EgoActionFactory Protocol the Phase-1 trained-ego path plugs into, and the Stage-1a production-default :func:_zero_ego_action_factory.

Why a Protocol (not a per-step Callable):

  • The factory contract is called once per (seed, condition) pair. That matches the natural shape of a trained-policy injection point — Phase 1 (ADR-007 §Stage 1b) supplies a factory that calls :func:concerto.training.ego_aht.train for the homogeneous condition's env at the given seed, then returns a closure that wraps the trained policy's act (the closure is then re-used across the 20 evaluation episodes within that (seed, condition) cell).
  • The Phase-0 per-step ego_action_fn was unable to express this contract — there was no place to hang a "this happens once per (seed, condition), not once per step" idea. The Protocol seam pins the lifecycle explicitly.

Phase-1 wiring contract (verbatim, do not rewrite without an ADR-007 amendment):

Phase-1 supplies a factory that calls concerto.training.ego_aht.train for the homogeneous condition's env at the given seed, returns lambda obs: trained_policy.act(obs). The factory is called once per (seed, condition) pair so the trained policy is reused across the 20 evaluation episodes within a seed.

Stage-1a production default: :func:_zero_ego_action_factory returns a closure that emits a zero action vector of the env's ego-action shape. Stage-1a's purpose is to exercise the rig (env steps, partner acts, episode terminations, success thresholding, SpikeRun aggregation, chamber-eval pipeline) without confounding the rig signal with a trainer signal; a learning ego is sequenced to Stage-1b (ADR-007 §Stage 1b).
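A zero-action factory of the shape described above might look like the following sketch. It rests on three assumptions. The env is duck-typed with a dict-valued `action_space` whose entries expose `.shape`, as `gym.spaces.Box` does. The ego uid is passed explicitly. `make_zero_ego_action_factory` is hypothetical; the project's `_zero_ego_action_factory` may resolve the uid and shape differently.

```python
from types import SimpleNamespace
from typing import Any, Callable, Mapping

import numpy as np
from numpy.typing import NDArray

EgoPolicy = Callable[[Mapping[str, Any]], NDArray[np.float32]]


def make_zero_ego_action_factory(ego_uid: str) -> Callable[[Any, int], EgoPolicy]:
    """Return an (env, seed) -> policy factory that always emits zeros (hypothetical)."""

    def factory(env: Any, seed: int) -> EgoPolicy:
        # The seed is accepted for Protocol conformance but unused: no randomness.
        del seed
        shape = tuple(env.action_space[ego_uid].shape)

        def act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
            # A fresh array each step, so callers cannot mutate a shared buffer.
            return np.zeros(shape, dtype=np.float32)

        return act

    return factory


# Toy env exposing only what the sketch reads.
env = SimpleNamespace(action_space={"ego": SimpleNamespace(shape=(7,))})
policy = make_zero_ego_action_factory("ego")(env, seed=0)
```

In this sketch, the shape is read once at factory time, so the per-step closure does no space lookups.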

Stage-1b production default (P1.04): :class:TrainedPolicyFactory — wires :func:chamber.benchmarks.training_runner.run_training for each (seed, condition) cell and returns a closure that wraps the trained :class:~concerto.training.ego_aht.EgoTrainer's act method (deterministic mode). One ego-AHT training run per cell (5 GPU-h per axis on A100 at the 100k-frame budget per ADR-007 §Stage 1b); the trainer is recovered from the :class:~concerto.training.ego_aht.TrainingResult NamedTuple rather than reloaded from disk, so no checkpoint round-trip per cell. The Phase-0 _zero_ego_action_factory is unchanged — Stage-1a runs keep the rig-validation contract; the Stage-1b dispatch swap is sequenced to P1.05 (separate slice).

EgoActionFactory

Bases: Protocol

Stage-1 ego-action factory contract (ADR-007 §Stage 1a / §Stage 1b).

Called once per (seed, condition) pair by the Stage-1 spike adapter's :func:_run_axis_with_factories. The returned callable is invoked once per env step for the 20 evaluation episodes within that (seed, condition) cell.

Two production implementations satisfy this Protocol:

  • :func:_zero_ego_action_factory — the Stage-1a default (rig validation). Returns a closure that emits a zero action vector of the env's ego-action shape. No randomness; no training.
  • :class:TrainedPolicyFactory — the Stage-1b default (science evaluation; ADR-007 §Stage 1b). Wires :func:concerto.training.ego_aht.train via :func:chamber.benchmarks.training_runner.run_training for each (seed, condition) cell, then returns a closure that wraps the trained :class:~concerto.training.ego_aht.EgoTrainer's act method (deterministic mode for evaluation).

The Protocol is intentionally tiny — (env, seed) → callable — so an external user can plug in their own algorithm without changing :func:_run_axis_with_factories. Project test fixtures can build a counting factory (see tests.integration.test_stage1_*_fake) to assert the lifecycle contract.
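The lifecycle contract is that the factory runs once per cell and the returned callable runs once per step. A counting factory can assert this. The following is a sketch of such a fixture plus a toy driver. `CountingFactory` and `run_cells` are hypothetical, and `run_cells` stands in for `_run_axis_with_factories`. The env is a toy dict carrying only a condition label rather than a `gym.Env`. The 20-episode count comes from the text above; the 3-step episode length is arbitrary.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Mapping

import numpy as np
from numpy.typing import NDArray

Policy = Callable[[Mapping[str, Any]], NDArray[np.float32]]


class CountingFactory:
    """Hypothetical fixture: counts factory calls per cell and policy calls overall."""

    def __init__(self) -> None:
        self.factory_calls: Counter = Counter()
        self.step_calls = 0

    def __call__(self, env: Mapping[str, Any], seed: int) -> Policy:
        self.factory_calls[(seed, env["condition"])] += 1

        def act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
            self.step_calls += 1
            return np.zeros(2, dtype=np.float32)

        return act


def run_cells(
    factory: Callable[[Mapping[str, Any], int], Policy],
    seeds: Iterable[int],
    conditions: Iterable[str],
    episodes: int = 20,
    steps_per_episode: int = 3,
) -> None:
    conditions = tuple(conditions)
    for seed in seeds:
        for condition in conditions:
            env = {"condition": condition}  # toy env: only carries the condition label
            policy = factory(env, seed)  # once per (seed, condition) cell
            for _ in range(episodes):
                for _ in range(steps_per_episode):
                    policy({})  # once per env step, reusing the same closure
```

Driving 5 seeds across 2 conditions produces the 10 factory calls per Stage-1 spike cited in the TrainedPolicyFactory lifecycle note.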

Source code in src/chamber/benchmarks/stage1_common.py
class EgoActionFactory(Protocol):
    """Stage-1 ego-action factory contract (ADR-007 §Stage 1a / §Stage 1b).

    Called once per ``(seed, condition)`` pair by the Stage-1 spike
    adapter's :func:`_run_axis_with_factories`. The returned callable
    is invoked once per env step for the 20 evaluation episodes within
    that ``(seed, condition)`` cell.

    Two production implementations satisfy this Protocol:

    - :func:`_zero_ego_action_factory` — the Stage-1a default (rig
      validation). Returns a closure that emits a zero action vector
      of the env's ego-action shape. No randomness; no training.
    - :class:`TrainedPolicyFactory` — the Stage-1b default (science
      evaluation; ADR-007 §Stage 1b). Wires
      :func:`concerto.training.ego_aht.train` via
      :func:`chamber.benchmarks.training_runner.run_training` for each
      ``(seed, condition)`` cell, then returns a closure that wraps
      the trained :class:`~concerto.training.ego_aht.EgoTrainer`'s
      ``act`` method (deterministic mode for evaluation).

    The Protocol is intentionally tiny — ``(env, seed)`` → callable —
    so an external user can plug in their own algorithm without
    changing :func:`_run_axis_with_factories`. Project test fixtures
    can build a counting factory (see
    ``tests.integration.test_stage1_*_fake``) to assert the lifecycle
    contract.
    """

    def __call__(
        self,
        env: gym.Env[Any, Any],
        seed: int,
    ) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]:
        """Build the per-step ego-action callable for one ``(seed, condition)`` cell."""
        ...  # pragma: no cover

__call__

__call__(
    env: Env[Any, Any], seed: int
) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]

Build the per-step ego-action callable for one (seed, condition) cell.

Source code in src/chamber/benchmarks/stage1_common.py
def __call__(
    self,
    env: gym.Env[Any, Any],
    seed: int,
) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]:
    """Build the per-step ego-action callable for one ``(seed, condition)`` cell."""
    ...  # pragma: no cover

TrainedPolicyFactory

Phase-1 EgoActionFactory that wires the ego-AHT training loop per cell.

ADR-007 §Stage 1b production factory. Implements the EgoActionFactory Protocol (this module) by calling chamber.benchmarks.training_runner.run_training for each (seed, condition) cell and returning a closure that wraps the trained concerto.training.ego_aht.EgoTrainer's act method (deterministic mode for evaluation).

Lifecycle (ADR-007 §Stage 1b lifecycle contract):

The factory is called once per (seed, condition) pair by the Stage-1 spike adapter's _run_axis_with_factories (5 seeds x 2 conditions = 10 calls per Stage-1 spike). Each call drives one full training run (10 calls x 100k frames = 1M frames per axis on A100; ~5 GPU-h per axis per the compute plan). The returned closure is reused across the 20 evaluation episodes within the cell, so the trained policy is not retrained per episode.
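A minimal sketch of this lifecycle follows. It uses a hypothetical driver `run_axis`, which is not the real `_run_axis_with_factories`, together with a counting factory. The factory is invoked once per cell, and a single closure serves every step of the 20 evaluation episodes in that cell. The condition labels and the 3-step episodes are illustrative. The frame budget assumes 100k frames per call, which is what makes the 1M-per-axis figure add up over 10 calls.

```python
from collections.abc import Callable, Mapping
from typing import Any

import numpy as np

SEEDS = range(5)
CONDITIONS = ("homo", "hetero")  # illustrative labels for the two conditions of an axis
EPISODES_PER_CELL = 20
FRAMES_PER_CALL = 100_000  # assumed per-call training budget


class CountingFactory:
    """Hypothetical fixture: records every seed it is called with."""

    def __init__(self) -> None:
        self.seeds: list[int] = []

    def __call__(self, env: Any, seed: int) -> Callable[[Mapping[str, Any]], np.ndarray]:
        self.seeds.append(seed)  # one entry per (seed, condition) cell
        return lambda obs: np.zeros(2, dtype=np.float32)


def run_axis(factory: CountingFactory, steps_per_episode: int = 3) -> dict[str, int]:
    """Hypothetical driver: one factory call per cell, closure reused across episodes."""
    closures_built = 0
    steps = 0
    for condition in CONDITIONS:
        for seed in SEEDS:
            env = {"condition": condition}  # stand-in for the per-condition eval env
            act = factory(env, seed)  # "train" once for this cell
            closures_built += 1
            for _episode in range(EPISODES_PER_CELL):
                for _t in range(steps_per_episode):
                    act({})  # same closure every step; no retraining per episode
                    steps += 1
    return {"closures": closures_built, "steps": steps}


factory = CountingFactory()
stats = run_axis(factory)
total_frames = len(factory.seeds) * FRAMES_PER_CALL  # 10 calls x 100k = 1M frames
```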

Determinism (ADR-007 §Discipline + ADR-002 §Decisions; P6):

The per-call seed is written into cfg.seed via Pydantic model_copy; the underlying training run derives all its randomness from concerto.training.seeding.derive_substream. Two invocations at the same (seed, condition) pair produce byte-identical trained policies on CPU; on GPU, the cuDNN / cuBLAS non-bit-determinism caveat (plan/08 §4) applies.
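To illustrate the byte-identical-on-CPU property, the sketch below derives every random draw from named substreams of one seed. `derive_substream_sketch` is a hypothetical stand-in built on NumPy's `SeedSequence`. The real `concerto.training.seeding.derive_substream` signature is not shown on this page and may differ.

```python
import hashlib

import numpy as np


def derive_substream_sketch(root_seed: int, name: str) -> np.random.Generator:
    """Hypothetical stand-in: a named, independent RNG stream derived from one root seed."""
    # Hash the stream name to a stable integer so the mapping does not depend on
    # Python's per-process string-hash randomisation.
    name_key = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "little")
    return np.random.default_rng(np.random.SeedSequence([root_seed, name_key]))


def toy_training_run(seed: int) -> bytes:
    """Toy 'training': every random draw comes from a derived substream."""
    init_rng = derive_substream_sketch(seed, "policy_init")
    noise_rng = derive_substream_sketch(seed, "rollout_noise")
    weights = init_rng.normal(size=16).astype(np.float32)
    for _ in range(100):
        weights += 0.01 * noise_rng.normal(size=16).astype(np.float32)
    return weights.tobytes()  # serialised 'policy' for a byte-level comparison


run_a = toy_training_run(seed=3)
run_b = toy_training_run(seed=3)
run_c = toy_training_run(seed=4)
```

Because no draw touches global RNG state, two runs at the same seed serialise to identical bytes on CPU. Nondeterministic GPU kernels are exactly what would break this guarantee.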

Caching policy (intentionally NO cache):

The factory does NOT cache trained policies across __call__ invocations; each call trains a fresh policy. Caching would silently violate the per-(seed, condition) determinism contract if the cache key were not exhaustive (env identity, partner identity, prereg tag, code state). To resume a partial run, use the concerto.training.checkpoints.CheckpointMetadata sidecar serialised under cfg.artifacts_root (the training loop emits one every cfg.checkpoint_every steps) rather than an in-memory cache.

Partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3):

The partner-freeze gate is honoured by the _assert_partner_is_frozen call in EgoPPOTrainer.__init__, which runs first, before any tensor allocation. The factory does not bypass it.

Per-condition partner-uid routing (P1.04 / Q-C refinement):

The factory reads the env's partner_uid and ego_uid properties at __call__ time and overrides cfg.env.agent_uids accordingly. The env layer is therefore the single source of truth, and the factory carries no per-condition dispatch table. For AS-homo (partner=panda_partner) vs AS-hetero / OM-* (partner=fetch), the env's condition_id drives the routing via chamber.envs.stage1_pickplace._CONDITION_TABLE. Envs that don't expose the properties (Tier-1 fake envs, MPE stand-in) fall back to cfg.env.agent_uids[0] / [1], the Phase-0 contract that predates the properties.
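The resolution rule can be exercised in isolation with two hypothetical env stand-ins, one exposing the properties and one not. `resolve_uids` is a simplified sketch of the getattr-with-isinstance fallback used by `_resolve_ego_uid` / `_resolve_partner_uid` in the source below, not the shipped helper. The uid strings are illustrative.

```python
from typing import Any


def resolve_uids(env: Any, cfg_agent_uids: tuple[str, str]) -> tuple[str, str]:
    """Prefer env.ego_uid / env.partner_uid; fall back to cfg agent_uids[0] / [1]."""
    ego = getattr(env, "ego_uid", None)
    partner = getattr(env, "partner_uid", None)
    return (
        ego if isinstance(ego, str) else cfg_agent_uids[0],
        partner if isinstance(partner, str) else cfg_agent_uids[1],
    )


class PropertyEnv:
    """Hypothetical Stage-1b-like env exposing the P1.04 properties."""

    @property
    def ego_uid(self) -> str:
        return "panda_wristcam"

    @property
    def partner_uid(self) -> str:
        return "panda_partner"  # illustrative AS-homo partner


class FakeEnv:
    """Hypothetical Tier-1 fake env without the properties."""


CFG_UIDS = ("panda_wristcam", "fetch")
with_props = resolve_uids(PropertyEnv(), CFG_UIDS)
without_props = resolve_uids(FakeEnv(), CFG_UIDS)
```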

Action-dim handoff to partner (P1.04 / Q-C refinement):

The chamber.partners.heuristic.ScriptedHeuristicPartner reads its action dimension from spec.extra["action_dim"] and zero-pads beyond the planar xy. At call time the factory reads the actual env.action_space.spaces[partner_uid].shape[0] and overrides cfg.partner.extra["action_dim"], so the partner's action vector matches the env's expectation per condition. This avoids a side-table: action_dim is sourced from the env that owns the contract. If the env's action space has no Box entry for partner_uid (Tier-1 fake envs), the factory falls back to the existing cfg.partner.extra["action_dim"] (2 if absent or unparseable).
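The sketch below shows why the action_dim handoff matters: a partner that only plans a planar xy command has to zero-pad it to whatever width the env's action space declares. `pad_planar_action` is a hypothetical helper, not the ScriptedHeuristicPartner code. It assumes the xy components occupy the first two slots.

```python
import numpy as np
from numpy.typing import NDArray


def pad_planar_action(xy: NDArray[np.float32], action_dim: int) -> NDArray[np.float32]:
    """Hypothetical helper: place a planar xy command in slots 0-1, zeros elsewhere."""
    if action_dim < 2:
        raise ValueError(f"action_dim must be >= 2 to hold a planar xy command, got {action_dim}")
    out = np.zeros(action_dim, dtype=np.float32)
    out[:2] = xy
    return out


xy = np.array([0.3, -0.1], dtype=np.float32)
# Same planar command, two illustrative per-condition widths read from the env.
narrow = pad_planar_action(xy, action_dim=2)
wide = pad_planar_action(xy, action_dim=8)
```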

Parameters:

  • cfg (EgoAHTConfig) –

    A fully-resolved concerto.training.config.EgoAHTConfig. On each call the factory rewrites cfg.seed via cfg.model_copy(update={"seed": call_seed}) and applies the per-condition env and partner overrides described above (plus cfg.env.condition_id when the env exposes one); all other fields are honoured as-is.

Example wiring at the Stage-1b adapter (sequenced to P1.05):

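from pathlib import Path

from chamber.benchmarks.stage1_common import TrainedPolicyFactory
from concerto.training.config import load_config

# spec, prereg_sha, _default_env_factory and _run_axis_with_factories are
# assumed to be in scope in the Stage-1b adapter that performs this wiring.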
factory = TrainedPolicyFactory(
    cfg=load_config(
        config_path=Path("configs/training/ego_aht_happo/stage1_pickplace.yaml")
    )
)
spike_run = _run_axis_with_factories(
    prereg=spec,
    prereg_sha=prereg_sha,
    env_factory=_default_env_factory,
    ego_action_factory=factory,
)
Source code in src/chamber/benchmarks/stage1_common.py
class TrainedPolicyFactory:
    """Phase-1 :class:`EgoActionFactory` — wires the ego-AHT training loop per cell.

    ADR-007 §Stage 1b production factory. Implements the
    :class:`EgoActionFactory` Protocol (this module) by calling
    :func:`chamber.benchmarks.training_runner.run_training` for each
    ``(seed, condition)`` cell and returning a closure that wraps the
    trained :class:`~concerto.training.ego_aht.EgoTrainer`'s ``act``
    method (deterministic mode for evaluation).

    Lifecycle (ADR-007 §Stage 1b lifecycle contract):

    The factory is called *once per ``(seed, condition)`` pair* by the
    Stage-1 spike adapter's :func:`_run_axis_with_factories` (5 seeds x
    2 conditions = 10 calls per Stage-1 spike). Each call drives one
    full training run (10 calls x 100k frames = 1M frames per axis
    on A100; ~5 GPU-h per axis per the compute plan); the returned
    closure is re-used across the 20 evaluation episodes within the
    cell so the trained policy is not retrained per-episode.

    Determinism (ADR-007 §Discipline + ADR-002 §Decisions; P6):

    The trained-policy seed is set via Pydantic ``model_copy`` on
    ``cfg.seed``; the underlying training run derives all its
    randomness from
    :func:`concerto.training.seeding.derive_substream`. Two invocations
    at the same ``(seed, condition)`` pair produce byte-identical
    trained policies on CPU; on GPU, the cuDNN / cuBLAS
    non-bit-determinism caveat (plan/08 §4) applies.

    Caching policy (intentionally NO cache):

    The factory does NOT cache trained policies across ``__call__``
    invocations — each call trains a fresh policy. Caching would
    silently violate the per-``(seed, condition)`` determinism contract
    if the cache key were not exhaustive (env identity, partner
    identity, prereg tag, code state). If the operator needs to resume
    a partial run, the right tool is the
    :class:`~concerto.training.checkpoints.CheckpointMetadata` sidecar
    serialised under ``cfg.artifacts_root`` (the training loop emits
    one per ``cfg.checkpoint_every`` steps); not an in-memory cache.

    Partner-freeze gate (ADR-009 §Consequences; plan/05 §6 #3):

    The partner-freeze gate is honoured by
    :meth:`EgoPPOTrainer.__init__`'s :func:`_assert_partner_is_frozen`
    call (run first, before any tensor allocation). The factory does
    not bypass it.

    Per-condition partner-uid routing (P1.04 / Q-C refinement):

    The factory reads the env's :attr:`partner_uid` and :attr:`ego_uid`
    properties at ``__call__`` time and overrides
    ``cfg.env.agent_uids`` accordingly — single source of truth at the
    env layer; the factory carries no per-condition dispatch table.
    For AS-homo (partner=``panda_partner``) vs AS-hetero / OM-*
    (partner=``fetch``), the env's ``condition_id`` drives the routing
    via the
    :data:`chamber.envs.stage1_pickplace._CONDITION_TABLE`. Envs that
    don't expose the properties (Tier-1 fake envs, MPE stand-in) fall
    back to ``cfg.env.agent_uids[0]`` / ``[1]`` — the Phase-0 contract
    that pre-dates the property.

    Action-dim handoff to partner (P1.04 / Q-C refinement):

    The :class:`~chamber.partners.heuristic.ScriptedHeuristicPartner`
    reads its action dimension from ``spec.extra["action_dim"]`` and
    zero-pads beyond the planar xy. The factory reads the actual
    ``env.action_space.spaces[partner_uid].shape[0]`` at call time and
    overrides ``cfg.partner.extra["action_dim"]`` so the partner's
    action vector matches the env's expectation per-condition. This
    avoids a side-table; the action_dim is sourced from the env that
    owns the contract.

    Args:
        cfg: A fully-resolved :class:`~concerto.training.config.EgoAHTConfig`.
            The factory rewrites ``cfg.seed`` per call via
            ``cfg.model_copy(update={"seed": call_seed})`` and applies
            the per-condition env+partner overrides described above;
            all other fields are honoured as-is.

    Example wiring at the Stage-1b adapter (sequenced to P1.05):

    .. code-block:: python

        from pathlib import Path

        from chamber.benchmarks.stage1_common import TrainedPolicyFactory
        from concerto.training.config import load_config

        factory = TrainedPolicyFactory(
            cfg=load_config(
                config_path=Path("configs/training/ego_aht_happo/stage1_pickplace.yaml")
            )
        )
        spike_run = _run_axis_with_factories(
            prereg=spec,
            prereg_sha=prereg_sha,
            env_factory=_default_env_factory,
            ego_action_factory=factory,
        )
    """

    def __init__(self, *, cfg: EgoAHTConfig) -> None:
        """Build the factory; capture the base config.

        Args:
            cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`
                — the Stage-1b config (typically loaded from
                ``configs/training/ego_aht_happo/stage1_pickplace.yaml``).
                The original instance is never mutated; per-call
                overrides are applied via ``model_copy``.
        """
        self._cfg = cfg

    def __call__(
        self,
        env: gym.Env[Any, Any],
        seed: int,
    ) -> EgoActionCallable:
        """Train a per-cell policy and return the per-step closure (ADR-007 §Stage 1b).

        Lifecycle: invoked once per ``(seed, condition)`` cell by the
        Stage-1 adapter's :func:`_run_axis_with_factories`. The returned
        callable is reused across the 20 evaluation episodes within
        the cell.

        Args:
            env: The per-condition evaluation env. ``env.ego_uid`` /
                ``env.partner_uid`` (P1.04 properties on
                :class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv`)
                drive the per-condition overrides applied to the base
                ``cfg``. ``env`` itself is NOT stepped by the factory;
                the training run inside :func:`run_training` builds
                its own env from the (overridden) ``cfg.env``. The
                eval env is what the spike adapter's loop steps after
                this factory returns.
            seed: Per-cell seed; the factory overrides ``cfg.seed`` via
                ``model_copy``.

        Returns:
            ``(obs) -> action`` callable that wraps
            ``trained_policy.act(obs, deterministic=True)`` and casts
            the output to ``float32``.

        Raises:
            ValueError: Propagated from :meth:`EgoPPOTrainer.__init__`
                if the partner is non-frozen (ADR-009 §Consequences).
                Any other exception raised by
                :func:`chamber.benchmarks.training_runner.run_training`
                also propagates unchanged (per the
                ``test_run_training_failure_propagates`` contract in
                ``tests/integration/test_trained_policy_factory_fake.py``).
        """
        # Imported lazily so chamber.benchmarks.stage1_common stays
        # cheap to import for Stage-1a-only consumers (the Phase-0 path
        # that does not need the HARL HAPPO trainer in scope).
        from chamber.benchmarks.training_runner import run_training

        # 1. Resolve the per-condition ego + partner uids from the env
        #    (P1.04 / Q-C: single source of truth at the env layer).
        #    Fall back to the cfg's agent_uids for envs that don't
        #    expose the properties (Tier-1 fake envs, MPE stand-in).
        ego_uid = self._resolve_ego_uid(env)
        partner_uid = self._resolve_partner_uid(env)

        # 2. Resolve the partner's action_dim from the env's
        #    action_space — avoids a per-condition side-table.
        partner_action_dim = self._resolve_partner_action_dim(env, partner_uid)

        # 3. Resolve the env's condition_id for Stage-1b disambiguation
        #    (OM-homo vs OM-hetero share the same agent_uids tuple, so
        #    the condition_id is the explicit signal that drives the
        #    Stage-1b env build inside run_training → build_env). None
        #    for non-Stage-1b envs; build_env ignores the field when
        #    task != "stage1_pickplace".
        condition_id = self._resolve_condition_id(env)

        # 4. Compose the per-call cfg via Pydantic model_copy so the
        #    base config passed at __init__ is never mutated.
        cfg_for_call = self._compose_cfg_for_call(
            seed=seed,
            ego_uid=ego_uid,
            partner_uid=partner_uid,
            partner_action_dim=partner_action_dim,
            condition_id=condition_id,
        )

        # 5. Build the per-cell safety stack (P1.04.5; ADR-007 §Stage 1b).
        #    When cfg.safety.enabled=True, construct the filter +
        #    SafetyState + Bounds + snapshot builder + dt + per-uid
        #    emergency controllers now and pass them through to
        #    run_training. When False, all six stay None and train()
        #    runs the pre-P1.04.5 unfiltered path.
        safety_kwargs = self._build_safety_for_cell(env=env, cfg=cfg_for_call)

        # 6. Drive the chamber-side training entry point. Returns a
        #    TrainingResult NamedTuple (P1.04) carrying the curve +
        #    the trained EgoTrainer instance — no checkpoint
        #    round-trip needed.
        result = run_training(cfg_for_call, repo_root=None, **safety_kwargs)
        trainer = result.trainer

        # 7. Build the per-step closure. Deterministic mode for eval;
        #    cast to float32 so the adapter's step-loop dict can be
        #    handed straight to the env's step boundary.
        def _act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
            """ADR-007 §Stage 1b per-step ego closure (deterministic)."""
            action = trainer.act(obs, deterministic=True)
            return np.asarray(action, dtype=np.float32)

        return _act

    # ----- Internal composition helpers -----

    def _resolve_ego_uid(self, env: gym.Env[Any, Any]) -> str:
        """Read ``env.ego_uid`` (P1.04 property); fall back to ``cfg.env.agent_uids[0]``."""
        uid = getattr(env, "ego_uid", None)
        if isinstance(uid, str):
            return uid
        return self._cfg.env.agent_uids[0]

    def _resolve_partner_uid(self, env: gym.Env[Any, Any]) -> str:
        """Read ``env.partner_uid`` (P1.04 property); fall back to ``cfg.env.agent_uids[1]``."""
        uid = getattr(env, "partner_uid", None)
        if isinstance(uid, str):
            return uid
        return self._cfg.env.agent_uids[1]

    def _resolve_partner_action_dim(
        self,
        env: gym.Env[Any, Any],
        partner_uid: str,
    ) -> int:
        """Read partner action_dim from ``env.action_space.spaces[partner_uid].shape[0]``.

        Falls back to the cfg's existing ``partner.extra["action_dim"]`` if
        the env's action_space doesn't carry the partner uid (Tier-1 fake
        envs that don't index by the same uid).
        """
        action_space = env.action_space
        if isinstance(action_space, gym.spaces.Dict):
            sub = action_space.spaces.get(partner_uid)
            if isinstance(sub, gym.spaces.Box) and sub.shape is not None:
                return int(sub.shape[0])
        # Fallback: trust the cfg.
        existing = self._cfg.partner.extra.get("action_dim", "2")
        try:
            return int(existing)
        except (TypeError, ValueError):
            return 2

    def _resolve_condition_id(self, env: gym.Env[Any, Any]) -> str | None:
        """Read ``env.condition_id`` (P1.03 property); ``None`` for non-Stage-1b envs.

        The condition_id disambiguates OM-homo vs OM-hetero at the
        chamber-side build_env dispatch (both conditions share the
        ``("panda_wristcam", "fetch")`` agent_uids tuple, so the
        condition_id is the only signal that drives the right Stage-1b
        env build).
        """
        cid = getattr(env, "condition_id", None)
        return cid if isinstance(cid, str) else None

    def _compose_cfg_for_call(
        self,
        *,
        seed: int,
        ego_uid: str,
        partner_uid: str,
        partner_action_dim: int,
        condition_id: str | None,
    ) -> EgoAHTConfig:
        """Apply seed + env + partner overrides via Pydantic ``model_copy`` (P6)."""
        env_update: dict[str, object] = {"agent_uids": (ego_uid, partner_uid)}
        if condition_id is not None:
            env_update["condition_id"] = condition_id
        env_override = self._cfg.env.model_copy(update=env_update)
        partner_extra = dict(self._cfg.partner.extra)
        partner_extra["uid"] = partner_uid
        partner_extra["action_dim"] = str(partner_action_dim)
        partner_override = self._cfg.partner.model_copy(update={"extra": partner_extra})
        return self._cfg.model_copy(
            update={
                "seed": seed,
                "env": env_override,
                "partner": partner_override,
            }
        )

    def _build_safety_for_cell(
        self,
        *,
        env: gym.Env[Any, Any],
        cfg: EgoAHTConfig,
    ) -> dict[str, Any]:
        """Construct the per-cell safety stack (P1.04.5; ADR-007 §Stage 1b).

        Called once per ``(seed, condition)`` cell by
        :meth:`__call__`. When ``cfg.safety.enabled=True`` builds the
        filter + SafetyState + Bounds + snapshot builder + dt from the
        env via the chamber-side helpers; otherwise returns an all-
        ``None`` dict that's a no-op kwargs unpack at the
        :func:`run_training` boundary.

        The :class:`SafetyState` is sized to ``n_pairs = n_agents *
        (n_agents - 1) / 2`` (1 for Stage-1b's 2-agent cell) and
        warm-started via
        :func:`concerto.safety.conformal.reset_on_partner_swap` with
        ``cfg.safety.lambda_safe`` + ``cfg.safety.n_warmup_steps``.
        Per D3 of the P1.04.5 design pass, the state is NOT reset per
        episode — the conformal slack accumulates across episodes
        within the cell per Huriot-Sibai §IV.A's long-run rule.

        Args:
            env: The per-cell evaluation env (the same env the
                returned closure will step). Used to source
                ``build_safety_filter``, ``build_bounds_for_env``,
                ``build_safety_snapshot_builder``, ``build_safety_dt``.
            cfg: The per-call cfg (post seed + condition_id overrides).
                ``cfg.safety`` drives the stack's parameters.

        Returns:
            ``dict`` with keys ``safety_filter``, ``safety_state``,
            ``safety_bounds``, ``safety_snapshot_builder``,
            ``safety_dt``, and ``safety_emergency_controllers``
            (P1.04.6 addition; ADR-007 §Stage 1b Rev 8). All-``None``
            when ``cfg.safety.enabled`` is False; all-populated
            otherwise. The dict unpacks directly into the
            :func:`run_training` kwargs (intent-mismatch detection for
            the first five keys lives in
            :func:`concerto.training.ego_aht.train`'s validation
            block; ``safety_emergency_controllers`` is a free-standing
            hint, only consulted when the first five are wired and
            :func:`concerto.safety.braking.maybe_brake` fires).
        """
        if not cfg.safety.enabled:
            return {
                "safety_filter": None,
                "safety_state": None,
                "safety_bounds": None,
                "safety_snapshot_builder": None,
                "safety_dt": None,
                "safety_emergency_controllers": None,
            }
        # Lazy imports: keep the factory Tier-1-import-safe on
        # Vulkan-less hosts; safety_state construction only resolves
        # when the cfg says so.
        from chamber.benchmarks.training_runner import (
            build_bounds_for_env,
            build_emergency_controllers,
            build_safety_dt,
            build_safety_filter,
            build_safety_snapshot_builder,
        )
        from concerto.safety.api import SafetyState, make_lambda_dict
        from concerto.safety.conformal import reset_on_partner_swap

        # Construct the filter + bounds from the env.
        safety_filter = build_safety_filter(env)
        safety_bounds = build_bounds_for_env(env)
        snapshot_builder = build_safety_snapshot_builder(env)
        safety_dt = build_safety_dt(env)
        # P1.04.6 (ADR-007 §Stage 1b Rev 8): per-uid emergency-controller
        # dispatch for the in-rollout :func:`maybe_brake` call. Built
        # from the same env-side control-model map :func:`build_safety_filter`
        # consumes, so the Jacobian-vs-DI dispatch matches the outer CBF
        # row's per-uid embodiment assumption.
        emergency_controllers = build_emergency_controllers(env)
        if snapshot_builder is None:
            msg = (
                "TrainedPolicyFactory._build_safety_for_cell: cfg.safety.enabled "
                "is True but env exposes no build_agent_snapshots() method. "
                "The Stage-1b env (chamber.envs.stage1_pickplace.Stage1PickPlaceEnv) "
                "ships this method in P1.04.5; envs that don't are not eligible "
                "for the safety stack."
            )
            raise ValueError(msg)
        # SafetyState construction + warm-start. Issue #144 promoted
        # ``lambda_`` to a :data:`concerto.safety.api.LambdaDict` keyed
        # by canonical UID-pair tuples over the cell's partner set.
        agent_uids = cfg.env.agent_uids
        state = SafetyState(lambda_=make_lambda_dict(agent_uids))
        reset_on_partner_swap(
            state,
            uids=agent_uids,
            lambda_safe=cfg.safety.lambda_safe,
            n_warmup_steps=cfg.safety.n_warmup_steps,
        )
        return {
            "safety_filter": safety_filter,
            "safety_state": state,
            "safety_bounds": safety_bounds,
            "safety_snapshot_builder": snapshot_builder,
            "safety_dt": safety_dt,
            "safety_emergency_controllers": emergency_controllers,
        }
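_build_safety_for_cell sizes the SafetyState by unordered agent pairs, n_agents * (n_agents - 1) / 2, which gives 1 for the 2-agent Stage-1b cell. The sketch below builds a UID-pair-keyed λ dict on that basis. `lambda_dict_sketch` is hypothetical, and both the canonical ordering (sorted tuples) and the initial value of 0.0 are assumptions; the real `make_lambda_dict` may canonicalise or initialise differently.

```python
from collections.abc import Sequence
from itertools import combinations


def n_pairs(n_agents: int) -> int:
    """Number of unordered agent pairs: n * (n - 1) / 2."""
    return n_agents * (n_agents - 1) // 2


def lambda_dict_sketch(agent_uids: Sequence[str], init: float = 0.0) -> dict[tuple[str, ...], float]:
    """Hypothetical: one entry per unordered UID pair, keyed by a sorted tuple."""
    # Sorting each pair means (a, b) and (b, a) map to the same canonical key.
    return {tuple(sorted(pair)): init for pair in combinations(agent_uids, 2)}


stage1b = lambda_dict_sketch(("panda_wristcam", "fetch"))
three_agents = lambda_dict_sketch(("a", "b", "c"))
```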

__call__

__call__(
    env: Env[Any, Any], seed: int
) -> EgoActionCallable

Train a per-cell policy and return the per-step closure (ADR-007 §Stage 1b).

Lifecycle: invoked once per (seed, condition) cell by the Stage-1 adapter's _run_axis_with_factories. The returned callable is reused across the 20 evaluation episodes within the cell.

Parameters:

  • env (Env[Any, Any]) –

    The per-condition evaluation env. env.ego_uid / env.partner_uid (P1.04 properties on chamber.envs.stage1_pickplace.Stage1PickPlaceEnv) drive the per-condition overrides applied to the base cfg. The factory does NOT step env itself; the training run inside run_training builds its own env from the (overridden) cfg.env. When cfg.safety.enabled is True, env also sources the safety filter, bounds, snapshot builder, dt and emergency controllers. The eval env is what the spike adapter's loop steps after this factory returns.

  • seed (int) –

    Per-cell seed; the factory overrides cfg.seed via model_copy.

Returns:

  • EgoActionCallable –

    (obs) -> action callable that wraps trained_policy.act(obs, deterministic=True) and casts the output to float32.
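The returned closure is a thin adapter. The sketch below uses a hypothetical `DummyTrainer` whose `act` returns plain Python floats, and shows the two guarantees the closure adds: `deterministic=True` is always passed, and the action reaches the env as float32. `make_closure` mirrors the `_act` closure in the source but is illustrative only.

```python
from collections.abc import Callable, Mapping
from typing import Any

import numpy as np
from numpy.typing import NDArray


class DummyTrainer:
    """Hypothetical trainer: records the deterministic flag, returns float64-valued actions."""

    def __init__(self) -> None:
        self.flags: list[bool] = []

    def act(self, obs: Mapping[str, Any], deterministic: bool = False) -> list[float]:
        self.flags.append(deterministic)
        return [0.25, -0.5, 1.0]


def make_closure(trainer: DummyTrainer) -> Callable[[Mapping[str, Any]], NDArray[np.float32]]:
    def _act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
        action = trainer.act(obs, deterministic=True)  # greedy action for evaluation
        return np.asarray(action, dtype=np.float32)  # cast at the env step boundary

    return _act


trainer = DummyTrainer()
act = make_closure(trainer)
out = act({"state": np.zeros(3)})
```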

Raises:

  • ValueError

    Propagated from EgoPPOTrainer.__init__ if the partner is non-frozen (ADR-009 §Consequences). Also raised when cfg.safety.enabled is True but env exposes no build_agent_snapshots() method, so the safety stack cannot be built. Any other exception raised by chamber.benchmarks.training_runner.run_training propagates unchanged (per the test_run_training_failure_propagates contract in tests/integration/test_trained_policy_factory_fake.py).

Source code in src/chamber/benchmarks/stage1_common.py
def __call__(
    self,
    env: gym.Env[Any, Any],
    seed: int,
) -> EgoActionCallable:
    """Train a per-cell policy and return the per-step closure (ADR-007 §Stage 1b).

    Lifecycle: invoked once per ``(seed, condition)`` cell by the
    Stage-1 adapter's :func:`_run_axis_with_factories`. The returned
    callable is reused across the 20 evaluation episodes within
    the cell.

    Args:
        env: The per-condition evaluation env. ``env.ego_uid`` /
            ``env.partner_uid`` (P1.04 properties on
            :class:`chamber.envs.stage1_pickplace.Stage1PickPlaceEnv`)
            drive the per-condition overrides applied to the base
            ``cfg``. ``env`` itself is NOT stepped by the factory;
            the training run inside :func:`run_training` builds
            its own env from the (overridden) ``cfg.env``. The
            eval env is what the spike adapter's loop steps after
            this factory returns.
        seed: Per-cell seed; the factory overrides ``cfg.seed`` via
            ``model_copy``.

    Returns:
        ``(obs) -> action`` callable that wraps
        ``trained_policy.act(obs, deterministic=True)`` and casts
        the output to ``float32``.

    Raises:
        ValueError: Propagated from :meth:`EgoPPOTrainer.__init__`
            if the partner is non-frozen (ADR-009 §Consequences).
            Any other exception raised by
            :func:`chamber.benchmarks.training_runner.run_training`
            also propagates unchanged (per the
            ``test_run_training_failure_propagates`` contract in
            ``tests/integration/test_trained_policy_factory_fake.py``).
    """
    # Imported lazily so chamber.benchmarks.stage1_common stays
    # cheap to import for Stage-1a-only consumers (the Phase-0 path
    # that does not need the HARL HAPPO trainer in scope).
    from chamber.benchmarks.training_runner import run_training

    # 1. Resolve the per-condition ego + partner uids from the env
    #    (P1.04 / Q-C: single source of truth at the env layer).
    #    Fall back to the cfg's agent_uids for envs that don't
    #    expose the properties (Tier-1 fake envs, MPE stand-in).
    ego_uid = self._resolve_ego_uid(env)
    partner_uid = self._resolve_partner_uid(env)

    # 2. Resolve the partner's action_dim from the env's
    #    action_space — avoids a per-condition side-table.
    partner_action_dim = self._resolve_partner_action_dim(env, partner_uid)

    # 3. Resolve the env's condition_id for Stage-1b disambiguation
    #    (OM-homo vs OM-hetero share the same agent_uids tuple, so
    #    the condition_id is the explicit signal that drives the
    #    Stage-1b env build inside run_training → build_env). None
    #    for non-Stage-1b envs; build_env ignores the field when
    #    task != "stage1_pickplace".
    condition_id = self._resolve_condition_id(env)

    # 4. Compose the per-call cfg via Pydantic model_copy so the
    #    base config passed at __init__ is never mutated.
    cfg_for_call = self._compose_cfg_for_call(
        seed=seed,
        ego_uid=ego_uid,
        partner_uid=partner_uid,
        partner_action_dim=partner_action_dim,
        condition_id=condition_id,
    )

    # 5. Build the per-cell safety stack (P1.04.5; ADR-007 §Stage 1b).
    #    When cfg.safety.enabled=True, construct the filter +
    #    SafetyState + Bounds + snapshot builder + dt + per-uid
    #    emergency controllers now and pass them through to
    #    run_training. When False, all six stay None and train()
    #    runs the pre-P1.04.5 unfiltered path.
    safety_kwargs = self._build_safety_for_cell(env=env, cfg=cfg_for_call)

    # 6. Drive the chamber-side training entry point. Returns a
    #    TrainingResult NamedTuple (P1.04) carrying the curve +
    #    the trained EgoTrainer instance — no checkpoint
    #    round-trip needed.
    result = run_training(cfg_for_call, repo_root=None, **safety_kwargs)
    trainer = result.trainer

    # 7. Build the per-step closure. Deterministic mode for eval;
    #    cast to float32 so the adapter's step-loop dict can be
    #    handed straight to the env's step boundary.
    def _act(obs: Mapping[str, Any]) -> NDArray[np.float32]:
        """ADR-007 §Stage 1b per-step ego closure (deterministic)."""
        action = trainer.act(obs, deterministic=True)
        return np.asarray(action, dtype=np.float32)

    return _act

__init__

__init__(*, cfg: EgoAHTConfig) -> None

Build the factory; capture the base config.

Parameters:

  • cfg (EgoAHTConfig) –

    Validated concerto.training.config.EgoAHTConfig: the Stage-1b config (typically loaded from configs/training/ego_aht_happo/stage1_pickplace.yaml). The original instance is never mutated; per-call overrides are applied via model_copy.

Source code in src/chamber/benchmarks/stage1_common.py
def __init__(self, *, cfg: EgoAHTConfig) -> None:
    """Build the factory; capture the base config.

    Args:
        cfg: Validated :class:`~concerto.training.config.EgoAHTConfig`
            — the Stage-1b config (typically loaded from
            ``configs/training/ego_aht_happo/stage1_pickplace.yaml``).
            The original instance is never mutated; per-call
            overrides are applied via ``model_copy``.
    """
    self._cfg = cfg