6. Optimization pipeline (optimizer.py)
The Optimizer class is the user‑facing entry point for “take a Clifford+phase circuit, and try to reduce its T‑count (and maybe depth) using Reed–Muller decoding.” It sits on top of:
- the circuit/phase‑polynomial primitives in core.py,
- the decoding strategies and policies in decoders.py, and
- the autotuner in autotune.py.
At a high level, the pipeline is:
- Extract the phase polynomial from a circuit.
- Turn its \(\mathbb{Z}_8\) coefficients into a punctured binary word (oddness).
- Choose a decoding strategy and parameters.
- Decode in \(\mathrm{RM}(r, n)\) to get a codeword and monomial set.
- Lift that correction back to \(\mathbb{Z}_8\), apply it to the coefficients, and re‑synthesize.
- Optionally check consistency contracts and return an OptimizeReport.
The rest of this section unpacks how that is wired up.
6.1 Effort and parameter mapping
The library tries to make “how hard should you try?” a single knob. Internally, several helper functions interpret that knob into concrete decoder parameters.
_beam_from_effort(effort) turns an integer effort level into a beam/list size. If effort is None, the default is a beam of 8. Otherwise the value is clamped to the range 1–5 and mapped to:
- effort 1 → beam 2
- effort 2 → beam 4
- effort 3 → beam 8
- effort 4 → beam 16
- effort 5 → beam 32
The pattern is simply 1 << effort, with the clamping handling out‑of‑range values.
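A minimal sketch of that mapping, under the assumption that it is implemented as a plain clamp-and-shift (the real code in optimizer.py may differ in details):
def _beam_from_effort(effort):
    # None -> default beam of 8; otherwise clamp to 1..5 and shift.
    if effort is None:
        return 8
    e = max(1, min(5, int(effort)))
    return 1 << e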
_rpa_iters_from_effort(e) similarly maps an effort level into the number of RPA iterations:
- 1 → 1 iteration
- 2 → 2 iterations
- 3 → 2 iterations
- 4 → 3 iterations
- 5 → 3 iterations
So mid‑range efforts already get a couple of RPA passes; going to the very highest efforts gives you more iterations but doesn’t grow unbounded.
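A hedged sketch of this helper as a small lookup table (again, the actual implementation may differ):
def _rpa_iters_from_effort(e):
    # Effort 1..5 -> 1, 2, 2, 3, 3 RPA iterations; out-of-range values are clamped.
    table = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3}
    return table[max(1, min(5, int(e)))]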
For SNAP (local refinement), _snap_params_from_effort(e) chooses:
- a local search radius snap_t (whether to consider size‑1, size‑2, or size‑3 subsets),
- a pool size snap_pool, and
- whether strong (branch‑and‑bound) SNAP should be considered a natural default.
When e is None, it returns (2, 16, False), i.e. size‑2 SNAP on a pool of 16 rows with no strong SNAP. For explicit effort values:
- effort 1 → (1, 8, False) – very light, only size‑1.
- effort 2 → (2, 12, False).
- effort 3 → (2, 16, False).
- effort 4 → (2, 24, True) – larger pool, and strong SNAP becomes the default.
- effort 5 → (3, 24, True) – allow size‑3 local moves and strong SNAP.
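Sketched the same way as the other helpers (a plausible table-based implementation, not necessarily the real one):
def _snap_params_from_effort(e):
    # Returns (snap_t, snap_pool, strong_default).
    if e is None:
        return (2, 16, False)
    table = {1: (1, 8, False), 2: (2, 12, False), 3: (2, 16, False),
             4: (2, 24, True), 5: (3, 24, True)}
    return table[max(1, min(5, int(e)))]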
Those helpers are used in _choose to avoid duplicating “effort → parameters” logic for each decoder flavor.
Finally, there’s a special parsing helper, _parse_auto_latency(effort), for one specific use case: “I don’t care about effort, just make it run in about X milliseconds.”
If effort is a string of the form "auto-latency-<Xms>" (whitespace ignored, ms suffix optional), it returns the latency budget X as a float. For anything else (None, integers, other strings), it returns None.
This is the hook that lets you write, for example:
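opt = Optimizer(effort="auto-latency-50ms")  # the 50 ms budget here is purely illustrative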
and have the optimizer delegate parameter choice to the autotuner rather than to fixed hard‑coded tables.
6.2 Cost model placeholder (CostModel)
There is a small CostModel dataclass:
from dataclasses import dataclass
from typing import Dict

@dataclass
class CostModel:
    bitplane_cost: Dict[int, float] = None

    def __post_init__(self):
        # Fall back to a trivial single-plane cost table.
        if self.bitplane_cost is None:
            self.bitplane_cost = {0: 1.0}
Right now it is a placeholder for future work. The idea is that in multi‑order and bitplane‑aware optimizations, you might want to assign different relative costs to different bitplanes or layers (e.g., T vs S vs Z planes, or different architectural penalties). For the current Clifford+T pipeline, this is not yet wired into the decoder decisions, so bitplane_cost defaults to a trivial {0: 1.0}.
If we were to extend the library, this is where per‑plane or per‑bit cost models would be introduced, and then threaded into decoders’ policy logic or into rotk.py’s multi‑order routines.
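As a purely hypothetical illustration (nothing in the current pipeline consumes these weights, and the 0.4 value is arbitrary), a custom table could be supplied like this:
default_cm = CostModel()                               # bitplane_cost defaults to {0: 1.0}
custom_cm = CostModel(bitplane_cost={0: 1.0, 1: 0.4})  # hypothetical per-plane weights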
6.3 The Optimizer class
Optimizer is what you construct in user code:
from rmsynth import Optimizer
opt = Optimizer(decoder="rpa", effort=3, policy="distance+depth", policy_lambda=5)
new_circ, rep = opt.optimize(circ)
The constructor takes a fairly rich set of knobs, but most have sensible defaults.
- decoder: which top‑level strategy to use. If you pass None, the constructor will choose "auto" when the C++ extension is available, and "ml-exact" otherwise.
- effort: an integer 1–5 or a string "auto-latency-<Xms>". Integer effort is interpreted via _beam_from_effort, _rpa_iters_from_effort, and _snap_params_from_effort. The special string is reserved for autotuning (see §6.4).
- list_size: explicit beam size that overrides the mapping from effort; _beam_from_effort is only consulted when list_size is None.
- chase_t, chase_limit: parameters for "dumer-list-adv" (Dumer‑List‑Chase). These control how many reliability flips and how deep the chase search may go.
- SNAP / RPA parameters:
  - snap_effort: separate effort level for SNAP; if omitted, the main effort is reused.
  - snap_t, snap_pool: explicit overrides for the local SNAP parameters; if left None, they are derived from snap_effort.
  - snap_strong: whether strong branch‑and‑bound SNAP is a default; if None, the default comes from _snap_params_from_effort.
  - snap_time_ms, snap_node_limit: soft limits that are passed down to snap_branch_and_bound_bits.
  - rpa_iters: how many RPA iterations to use (if you don’t want the effort mapping).
- OSD options:
  - osd_top: how many beam candidates to feed into beam-osd* strategies; defaults to min(list_size, 8) when not specified.
- Policy knobs:
  - policy: textual policy name to balance distance vs depth, understood by _lambda_from_policy and policy_decide (see §5.5).
  - policy_lambda: numeric weight used for "distance+depth" style policies.
  - depth_tradeoff: legacy explicit guard parameter; still supported and forwarded directly into decode_rm. If you set both policy and depth_tradeoff, the mapping logic in decoders will combine them.
- Autotuner knobs:
  - autotune_selector: controls which selector the autotuner uses ("quality-under-target" vs "pareto", matching autotune.calibrate_single).
  - autotune_pareto_dist, autotune_pareto_lat: Pareto slack and latency factor for the autotuner when using the Pareto selector.
- check_contracts: if True, the optimizer will run strict consistency checks from contracts.py after every optimization; if False, it will call the cheaper maybe_check_all.
The constructor stores all of these as self.* attributes and also initializes two introspection fields:
- self.last_decoder_used: the final strategy string ("dumer", "rpa-adv", etc.) chosen in the last call to optimize.
- self.last_params_used: the exact keyword dictionary passed into decode_rm (list size, RPA iterations, SNAP knobs, policy parameters).
The CLI uses these to print human‑readable summaries of what actually ran; you can also use them in tests or notebooks to inspect the optimizer’s decisions.
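For example (the values shown in the comments are illustrative, not guaranteed outputs):
new_circ, rep = opt.optimize(circ)
print(opt.last_decoder_used)   # e.g. "rpa-adv"
print(opt.last_params_used)    # e.g. {"list_size": 8, "rpa_iters": 2, ...}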
6.4 Strategy selection: _choose(...)
The method Optimizer._choose(self, n: int, before_t: int) -> (strategy, kwargs) encapsulates all of the high‑level decision making: given the number of qubits n and the current T‑count before_t, it decides which decoder strategy to call and with which parameters.
The docstring says it explicitly:
Select a decoder strategy + kwargs, honoring:
- explicit decoder choices first,
- auto‑latency (if requested) → rpa‑adv with autotuned params,
- 'auto' policy: small n → dumer; medium n (n=6) → dumer‑list; large n or high T → rpa‑adv.
First it builds a policy_kwargs dictionary. If either policy or depth_tradeoff was set on the optimizer, it includes:
policy_kwargs = {
    "policy": self.policy,
    "policy_lambda": self.policy_lambda,
    "depth_tradeoff": self.depth_tradeoff,
}
Those are then threaded into any strategy that calls decode_rm, so all internal comparisons share the same distance–depth semantics.
Auto‑latency path
Before looking at self.decoder, _choose checks whether effort encodes a latency target:
auto_ms = _parse_auto_latency(self.effort)
if auto_ms is not None and _load_rmcore() is not None:
    params = suggest_params(..., target_ms=auto_ms, ...)
    ...
    return "rpa-adv", kwargs
If _parse_auto_latency returns a time budget and the C++ extension is available, _choose delegates to the autotuner:
- It calls autotune.suggest_params(n, before_t, auto_ms, ...), passing through the same policy and selector knobs you gave to the optimizer.
- suggest_params returns an EffortParams dataclass with fields like beam, rpa_iters, snap_pool, etc.
- _choose then builds a kwargs dict for "rpa-adv" from those fields.
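The construction itself is not reproduced here, but assuming the EffortParams field names above, the resulting kwargs plausibly look something like this sketch:
# Hedged sketch: beam/rpa_iters/snap_pool are the fields named in the text;
# the real _choose may rename or add to them.
kwargs = {
    "list_size": params.beam,
    "rpa_iters": params.rpa_iters,
    "snap": True,
    "snap_pool": params.snap_pool,
    **policy_kwargs,
}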
In this mode, effort is essentially an instruction to autotune for a target latency, not a discrete level. The choice of decoder is fixed to "rpa-adv"; beam and SNAP parameters are derived from real timing measurements cached in autotune.py.
Explicit decoders
If the auto‑latency path didn’t trigger, _choose honours your decoder string more or less literally.
"ml-exact"→ returns("ml-exact", {}).decode_rmwill use the brute‑force ML decoder fromcore.py."dumer"→("dumer", {})."dumer-list"→("dumer-list", {"list_size": ls, **policy_kwargs}), wherelsis either your explicitlist_sizeor derived from_beam_from_effort(self.effort)."dumer-list-adv"→ maps to("dumer-list-chase", {list_size, chase_t, chase_limit, **policy_kwargs}). This is the more advanced Dumer‑List‑Chase strategy."osd1","osd2","osd3"→ return the same string as strategy, plus alist_size. These are handled indecode_rmby callingosd_decodeon top of a Dumer‑List baseline."beam-osd1","beam-osd2","beam-osd3"→ similar, but also include anosd_topparameter (defaulting tomin(ls, 8)) sodecode_rmknows how many beam seeds to refine with OSD.
For the RPA families, _choose sets up everything needed for "rpa-adv" and "rpa2" based on effort and SNAP parameters:
"rpa":- Compute
lsfromlist_sizeor effort. - Compute
itfromrpa_itersor_rpa_iters_from_effort(self.effort). - Compute
(st, sp, strong_default)from_snap_params_from_effort, usingsnap_effortif set, otherwiseeffort. - Override
standspwith explicitly providedsnap_tandsnap_poolif any. - Decide
strongbased onsnap_strongor thestrong_default.
- Compute
Then return:
("rpa-adv",
{"list_size": ls, "rpa_iters": it, "snap": True,
"snap_t": st, "snap_pool": sp, "snap_strong": strong,
"snap_time_ms": ..., "snap_node_limit": ...,
**policy_kwargs})
"rpa2" follows exactly the same mapping, but returns "rpa2" as the strategy instead of "rpa-adv".
So "rpa" is really just a shorthand for “RPA‑1 with SNAP and possibly strong SNAP, using effort‑based defaults.”
"auto" decoder policy
If you set decoder="auto", _choose inspects both the dimension n and the current T‑count before_t and follows a simple three‑regime heuristic (assuming the C++ extension is present):
- If _load_rmcore() fails, "auto" degrades to "ml-exact" because no fast decoders are available.
- For larger or more complex circuits (n >= 7 or before_t >= 24), it chooses "rpa-adv" with parameters derived from effort and SNAP settings. This is the “go heavy” regime: use RPA and SNAP because the search space is large and big improvements are likely.
- For medium‑sized instances (n >= 6 or before_t >= 16, but not in the previous case), it chooses "dumer-list" with an effort‑derived list_size. The idea is that Dumer‑List alone is strong enough here and cheaper than running full RPA.
- For very small circuits, it uses plain "dumer" with no list decoding and no RPA.
If decoder is something else that _choose does not recognize, it falls back to returning (self.decoder, {}), letting decode_rm raise a ValueError if the strategy string is truly invalid. This makes it easy to experiment with new strategies by adding them first to decoders.py and then passing the name through Optimizer.
6.5 Main entrypoint: optimize(circ)
The method Optimizer.optimize(self, circ: Circuit) -> (Circuit, OptimizeReport) is the core of the user‑visible pipeline. It takes a Circuit (as defined in core.py) and returns:
* a new optimized Circuit, and
* an OptimizeReport summarizing what happened.
Let’s walk through it.
- Extract phase coefficients and T‑count.
  n = circ.n
  a = extract_phase_coeffs(circ)
  vec = coeffs_to_vec(a, n)
  before_t = t_count_of_coeffs(vec)
  extract_phase_coeffs tracks how CNOTs redistribute phase forms and collects all phase‑gate exponents (mod 8) into a dict {mask -> k}. coeffs_to_vec then maps that dict into a fixed‑order \(\mathbb{Z}_8\) vector vec of length \(2^n-1\). t_count_of_coeffs counts how many entries in vec are odd; that is exactly the T‑count of the circuit.
- Determine RM parameters and build the “word” to decode. The code sets r = n - 4; this is the RM code order used throughout the library: for n qubits, we work in \(\mathrm{RM}(n-4, n)\) as in the reference construction. It also notes the length L = len(vec) and builds the punctured binary word w_bits from the parities of the entries of vec. This is the oddness pattern you would like to correct using an RM decoder.
- Choose a strategy. _choose is called with (n, before_t):
  strategy, kwargs = self._choose(n, before_t)
  self.last_decoder_used = strategy
  self.last_params_used = kwargs
  At this point, strategy is a string understood by decode_rm ("dumer", "rpa-adv", "auto-latency" → "rpa-adv", etc.), and kwargs contains the list size, number of RPA iterations, SNAP knobs, and policy settings.
- Decode in \(\mathrm{RM}(r, n)\) (or handle trivial \(r < 0\)). If r < 0 (which only happens for very small n, where \(\mathrm{RM}(n-4,n)\) is trivial), the code skips decoding:
  if r < 0:
      code_bits = [0] * ((1 << n) - 1)
      selected = []
      dist = sum(w_bits)
  else:
      code_bits, selected, dist = decode_rm(w_bits, n, r, strategy, **kwargs)
  In the nontrivial case, decode_rm performs all the heavy lifting described in §5: Dumer, RPA, OSD, SNAP, GL search, etc. The result is a punctured RM codeword code_bits, a list of selected monomials selected (compatible with get_monom_list(n, r)), and the punctured Hamming distance dist.
- Lift the correction and apply it to the coefficients. The selected monomials describe a correction vector c in the phase‑polynomial domain, and lift_to_c constructs the \(\mathbb{Z}_8\) vector of that correction. Here, lift_to_c adds 1 (mod 8) along every position where a chosen monomial evaluates to 1, and add_mod8_vec adds that correction pointwise modulo 8 to give the new coefficient vector vec_opt. The key invariant is that vec_opt has odd coefficients exactly at the positions that correspond to \(w_{bits} \oplus code_{bits}\); the new T‑count after_t is just the number of odd entries of vec_opt. By construction, after_t is either <= before_t or (in rare cases) slightly worse, and only if you asked for a distance‑depth trade‑off that allows it (even then the loss is bounded by the policy).
- Re‑synthesize the optimized circuit. The new phase polynomial vec_opt is turned back into a circuit using the same monomial‑to‑CNOT‑gadget construction described in §3.4. If you want depth‑aware synthesis, you can call synthesize_with_schedule at a higher level with the same vec_opt.
- Run consistency contracts (optional). If you constructed the optimizer with check_contracts=True, it runs the strict contract checks. Otherwise, it calls maybe_check_all(...), which only runs the contracts when the RM_CHECKS environment variable is set. The contracts verify three things:
  - all selected_monomials have degree \(\le r\) (they lie in \(\mathrm{RM}(r,n)\)),
  - code_bits agrees with the XOR of generator rows corresponding to selected_monomials, and
  - the reported dist agrees with the T‑count change implied by code_bits and the original coefficients.
  If any of these fail in strict mode, an AssertionError is raised. This is extremely useful for testing and for catching subtle decoder bugs.
- Compute a signature and assemble the report. Finally, the optimized coefficient vector is hashed with SHA‑256 to produce a signature sig, and the result is used to construct an OptimizeReport:
  rep = OptimizeReport(
      n=n,
      before_t=before_t,
      after_t=after_t,
      distance=dist,
      r=r,
      bitlen=L,
      selected_monomials=selected,
      signature=sig,
  )
  OptimizeReport.summary() prints a compact one‑line summary of the form:
  > [rmsynth] n=6, r=2, length=63: T-count 32 -> 18 (distance=14). Signature=abcd1234...
  The SHA‑256 signature acts as a stable identifier for the optimized phase polynomial: if you run the same optimizer settings and get the same signature, you know you got the same coefficients, even if the internal decoding path broke ties differently or used different permutations.
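A typical way to consume the report in a script or test (the field names match the OptimizeReport construction above; the printed numbers are illustrative):
new_circ, rep = opt.optimize(circ)
print(rep.summary())                # e.g. "[rmsynth] n=6, r=2, length=63: T-count 32 -> 18 ..."
saved = rep.before_t - rep.after_t
print(f"T gates saved: {saved}, signature: {rep.signature[:8]}")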
Putting it all together, Optimizer.optimize guarantees:
* the output circuit implements the same unitary as the input,
* its phase polynomial lies in the original affine space but shifted by a valid \(\mathrm{RM}(r,n)\) codeword,
* all RM contracts are satisfied (when enabled), and
* the T‑count and/or T‑depth are improved according to the policy you specified, subject to well‑defined guardrails.