Inference walkthroughs

Three real frames, end-to-end through the deployed cascade

These are real responses from the live router on three hand-picked test frames. For each example we show four panels side by side:

  1. Input — the 200×200 NEU test image as the camera “sees” it.
  2. L1 reconstruction — what the autoencoder thinks the surface should look like. If this is close to the input, MSE is low and the frame exits the cascade in milliseconds.
  3. L1 difference heatmap: |input − reconstruction|; brighter means the AE was more surprised. This is what the gatekeeper actually keys off.
  4. L2 detection overlay — top-3 YOLO bounding boxes (red = #1 by confidence). The v1 YOLO is the COCO-pretrained YOLOv8n; its outputs are deliberately shown unedited so the architectural reason for L3’s existence is visible.

The textual block under each panel reproduces the exact router trace from reports/eval_cascade.jsonl plus the GPT-4.1-mini reasoning string when L3 was invoked.
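The routing logic those traces come from can be condensed into a few lines. This is a minimal sketch, not the deployed router: the function and field names are assumptions, and only the two thresholds (τ = 0.0067 for the L1 MSE gate, 0.7 for the L2 escalate-threshold) are taken from the traces below.

```python
from dataclasses import dataclass

# Thresholds quoted in the router traces below; everything else is illustrative.
L1_MSE_THRESHOLD = 0.0067   # tau for the autoencoder MSE gate
L2_CONF_THRESHOLD = 0.7     # YOLO escalate-threshold

@dataclass
class Trace:
    stopped_at: str
    decision: str

def route(mse: float, yolo_conf: float, oracle_decision: str) -> Trace:
    """Three-layer short-circuit: each layer either commits or escalates."""
    if mse <= L1_MSE_THRESHOLD:
        return Trace("L1", "no_defect")       # cheap exit, milliseconds
    if yolo_conf >= L2_CONF_THRESHOLD:
        return Trace("L2", "defect")          # never fires with COCO-YOLO v1
    return Trace("L3", oracle_decision)       # GPT-4.1-mini Oracle

# Frame 1: MSE 0.000931 -> exits at L1
# Frame 2: MSE 0.00877, best YOLO confidence 0.012 -> falls through to L3
```

Note that with a COCO-pretrained L2, the middle branch is structurally dead: every L1-positive frame reaches the Oracle, which is exactly what examples 2 and 3 show.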


1. Layer-1 short-circuit — the cheap path

Input — inclusion/img_001.jpg

L1 reconstruction

L1 diff heatmap

L2 (not invoked) — COCO-YOLO sanity check
Figure 1: Layer-1 reconstruction is near-perfect. MSE is well below threshold, so the frame exits in 65 ms without ever touching L2 or L3.

Router trace (true label: inclusion, decision: no_defect, stopped at L1, total 65 ms)

  • L1 → no_defect · MSE 0.000931 · threshold 0.0067

Honest reading. This is a defective frame the cascade incorrectly dropped. NEU’s inclusion class happens to look “normal-ish” against the rolled-in_scale proxy the AE was trained on. The cascade plumbing did its job — speed-wise it’s exactly right — but the underlying AE has no genuine “normal” class to compare against. This single failure mode is what motivates the Phase J swap to VisA.
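The L1 gate and the diff heatmap in the panels come from the same per-pixel error. A minimal sketch, assuming float images on a common scale (function and variable names are hypothetical; only the τ = 0.0067 threshold is from the trace):

```python
import numpy as np

def l1_gate(img: np.ndarray, recon: np.ndarray, tau: float = 0.0067):
    """Layer-1 gatekeeper sketch: scalar MSE drives the routing decision,
    the absolute difference is what gets rendered as the heatmap."""
    img = img.astype(np.float32)
    recon = recon.astype(np.float32)
    mse = float(((img - recon) ** 2).mean())
    heatmap = np.abs(img - recon)             # brighter = AE more surprised
    decision = "no_defect" if mse <= tau else "defect_candidate"
    return decision, mse, heatmap
```

On Frame 1 the reconstruction is near-perfect, so the MSE (0.000931) sits well under τ and the function would return "no_defect" regardless of the true label, which is the failure mode described above.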


2. Full cascade L1 → L2 → L3 — the Oracle nails it

Input — crazing/img_005.jpg

L1 reconstruction

L1 diff heatmap (clearly noisy across the network of cracks)

L2 — top-3 YOLO boxes (cat, person, cat); not defect-trained
Figure 2: The AE flags the frame, the COCO-YOLO returns nonsense at very low confidence, the router escalates to the Oracle, GPT-4.1-mini gets it right.

Router trace (true label: crazing, decision: defect, stopped at L3, total 55,431 ms)

  • L1 → defect_candidate · MSE 0.00877 (> τ=0.0067) · escalate
  • L2 → defect_detected class=cat confidence=0.012 (COCO-YOLO is not defect-trained; confidence below escalate-threshold 0.7 → escalate)
  • L3 → defect class=crazing confidence=0.95 · 2,574 tokens · "The image shows fine network-like cracks typical of crazing defects."

What this shows. Layer 2’s failure mode is the actual reason the Oracle exists. Until YOLO is fine-tuned on real defect bounding boxes (Phase F.1, queued for Phase J), every L1-positive frame escalates to L3. That is expensive and slow — but it is correct, and the router design means the moment YOLO gets retrained, cost and latency drop without any other code change.
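The L2 rule implied by the trace can be sketched in isolation: keep the top-3 boxes for the overlay, but only commit when the best confidence clears the escalate-threshold. Detection tuples of (class_name, confidence) and the function name are assumptions; 0.7 is the threshold quoted above.

```python
def l2_decision(detections, conf_threshold: float = 0.7):
    """Layer-2 sketch: top-3 boxes are retained for display, but the router
    commits only if the best box clears the escalate-threshold."""
    top3 = sorted(detections, key=lambda d: d[1], reverse=True)[:3]
    best_cls, best_conf = top3[0]
    if best_conf >= conf_threshold:
        return "defect_detected", best_cls, top3
    return "escalate", best_cls, top3

# Frame 2's COCO-YOLO output: nothing remotely near 0.7, so escalate.
frame2 = [("cat", 0.012), ("person", 0.009), ("cat", 0.007)]
```

Once YOLO is fine-tuned on defect boxes (Phase F.1), the same rule starts committing frames at L2 with no router change, which is the cost/latency drop the paragraph above describes.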


3. Same path, different defect class

Input — patches/img_000.jpg

L1 reconstruction (struggles on the dark patches)

L1 diff heatmap (hot wherever the patches are)

L2 — top-3 boxes still off-domain
Figure 3: Same routing path, different defect class — proof the Oracle generalises across the six NEU categories from a single 6-shot prompt.

Router trace (true label: patches, decision: defect, stopped at L3)

  • L1 → defect_candidate · MSE 0.0263 (4× threshold) · escalate
  • L2 → defect_detected class=cat confidence=0.103 (still off-domain) · escalate
  • L3 → defect class=patches confidence=0.95 · 2,577 tokens · "The image shows irregular dark areas with diffuse edges similar to the reference examples of patches."

What this shows. The router is class-agnostic. All six defect classes flow through the same logic; only the L3 system prompt encodes domain vocabulary. Adding a seventh class for v2 would mean adding one few-shot exemplar and one Pydantic enum value — no router or container changes.
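The one-enum-value claim can be made concrete. This sketch uses stdlib types for portability (the real schema is described as a Pydantic model; the class and field names here are assumptions, while the six class values are the standard NEU categories):

```python
from dataclasses import dataclass
from enum import Enum

class DefectClass(str, Enum):
    # The six NEU categories the v1 system prompt knows about.
    crazing = "crazing"
    inclusion = "inclusion"
    patches = "patches"
    pitted_surface = "pitted_surface"
    rolled_in_scale = "rolled-in_scale"
    scratches = "scratches"
    # A v2 class would be exactly one new member here,
    # plus one few-shot exemplar in the L3 system prompt.

@dataclass
class OracleVerdict:
    decision: str              # "defect" / "no_defect"
    defect_class: DefectClass
    confidence: float
    reasoning: str
```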


What to take from these three frames

|            | Frame 1 (inclusion)    | Frame 2 (crazing) | Frame 3 (patches) |
|------------|------------------------|-------------------|-------------------|
| Stopped at | L1                     | L3                | L3                |
| Wall-clock | 65 ms                  | 55.4 s*           | ~55 s*            |
| Cost (USD) | $0                     | ~$0.0005          | ~$0.0005          |
| Decision   | wrong (false negative) | correct           | correct           |

* L3 wall-clock includes a cold-start on Azure Container Apps; warm L3 latency runs ~2 s. The Evaluation page reports p50/p95 over the full 60-image run.
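For reference, p50/p95 over a JSONL trace file can be computed in a few lines. This is a sketch, not the Evaluation page's actual code, and it assumes each line of reports/eval_cascade.jsonl carries a per-frame latency field (the field name `total_ms` is a guess):

```python
import json
import statistics

def latency_percentiles(jsonl_path: str):
    """p50/p95 wall-clock from a JSONL trace file; assumes one JSON object
    per line with a numeric 'total_ms' field (field name is an assumption)."""
    latencies = []
    with open(jsonl_path) as f:
        for line in f:
            latencies.append(json.loads(line)["total_ms"])
    p50 = statistics.median(latencies)
    p95 = statistics.quantiles(latencies, n=20)[18]   # 19 cut points; index 18 = 95th
    return p50, p95
```

Cold-start outliers like the 55.4 s frame above are exactly why the page reports p95 alongside p50 rather than a single mean.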

The pattern across all 60 evaluation frames matches what these three illustrate:

  • The L1 short-circuit is fast and cheap but its accuracy is bounded by what the AE has seen as “normal”. On NEU, that ceiling is low.
  • L2 is currently a no-op that exists structurally — its outputs are sane only on COCO classes. Phase J fixes this with proper bbox training.
  • L3 is slow and expensive but, on every frame the router actually commits to it, correct (27/27 in the v1 run).

The cost-savings story (55% cost reduction, 100% accuracy on the frames the cascade commits to) survives the dataset critique because it is a property of the router, not of the v1 dataset. The Oracle is invoked only when needed; that is the architectural win. Phase J should push the L1 drop rate from 53% to 80%+, which is where the cost ratio gets really interesting.
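The drop-rate arithmetic is worth making explicit. A back-of-envelope sketch: only frames that fall through L1 (and, once L2 is retrained, are not committed there) pay the Oracle price. The ~$0.0005 per-call figure comes from the traces above; the function name and the L2 commit rate are assumptions.

```python
def expected_cost_per_frame(l1_drop_rate: float,
                            oracle_cost: float = 0.0005,
                            l2_commit_rate: float = 0.0):
    """Expected USD per frame: only L1 escapees that L2 doesn't commit
    reach the Oracle. l2_commit_rate = 0.0 models the v1 no-op L2."""
    escalate_rate = (1.0 - l1_drop_rate) * (1.0 - l2_commit_rate)
    return escalate_rate * oracle_cost

v1 = expected_cost_per_frame(0.53)   # 53% dropped at L1, L2 is a no-op
v2 = expected_cost_per_frame(0.80)   # Phase J target drop rate
```

Moving the drop rate from 53% to 80% cuts the expected Oracle spend per frame by more than half before any L2 retraining is counted at all.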