```mermaid
flowchart LR
INPUT([Raw frame])
subgraph L1["Layer 1 — Gatekeeper"]
AE["Conv Autoencoder<br/>MSE reconstruction"]
T1{MSE > τ?}
end
subgraph L2["Layer 2 — Specialist"]
YOLO["YOLOv8n<br/>4-class detector"]
T2{conf ≥ 0.85?}
end
subgraph L3["Layer 3 — Oracle"]
GPT["Azure OpenAI<br/>vision + few-shot"]
end
DISCARD([No defect])
LOG([Log + label])
INPUT --> AE --> T1
T1 -- No --> DISCARD
T1 -- Yes --> YOLO --> T2
T2 -- Yes --> LOG
T2 -- No --> GPT --> LOG
style L1 fill:#dbeafe,stroke:#3b82f6
style L2 fill:#dcfce7,stroke:#22c55e
style L3 fill:#fef9c3,stroke:#eab308
```
Project Cascade Defect
A cost-effective ML cascade for real-time rolled-metal surface defect detection
Overview
Project Cascade Defect is an applied ML portfolio project that treats one question as the whole point: if a multimodal LLM can already answer “is this metal sheet defective?”, how do you build a system around it that makes the answer cheap enough to run on every camera frame?
The answer is a three-layer cascade. A tiny convolutional autoencoder discards anything that looks like clean metal. A small object detector (YOLOv8n) handles the easy positives. Only ambiguous frames escalate to an Azure OpenAI vision model. Most frames stop within milliseconds; only the genuinely hard ones incur the vision model's per-call cost.
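The routing logic can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual router: `route_frame`, `Verdict`, the two callables, and the class names used in the usage note are hypothetical. Only the two decision rules (MSE > τ, confidence ≥ 0.85) come from the diagram. The sketch also assumes that a frame with no detections at all escalates to Layer 3, which the diagram does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical detection type: (class_label, confidence).
Detection = Tuple[str, float]

CONF_THRESHOLD = 0.85  # from the diagram: "conf >= 0.85?"


@dataclass
class Verdict:
    layer: int            # layer that produced the final decision (1, 2 or 3)
    label: Optional[str]  # None means "no defect"


def route_frame(
    mse: float,
    tau: float,
    detect: Callable[[], Sequence[Detection]],
    ask_oracle: Callable[[], str],
) -> Verdict:
    """Route one frame through the three layers, stopping as early as possible."""
    # Layer 1: low reconstruction error means the frame looks like clean metal.
    if mse <= tau:
        return Verdict(layer=1, label=None)

    # Layer 2: accept the detector's most confident box if it clears the bar.
    detections = detect()
    if detections:
        label, conf = max(detections, key=lambda d: d[1])
        if conf >= CONF_THRESHOLD:
            return Verdict(layer=2, label=label)

    # Layer 3: no box, or a low-confidence box, goes to the oracle (assumption).
    return Verdict(layer=3, label=ask_oracle())
```

For example, `route_frame(0.2, 0.05, lambda: [("scratch", 0.9)], lambda: "unknown")` stops at Layer 2 with the label `"scratch"`. Passing the detector and the oracle as callables keeps them lazy, so a frame that stops at Layer 1 never triggers the more expensive layers.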
The data pivot
Earlier iterations of this project trained on whatever surface-defect dataset was easiest to obtain: first NEU (a balanced 6-class classification benchmark with no defect-free frames), and briefly an electronics dataset (VisA / PCB1) that had the right shape of data but the wrong domain. Both made the autoencoder’s gatekeeper role contrived: neither gave it in-domain, defect-free metal to learn as “normal”.
The current build uses two datasets that match the actual goal — flag damage on rolled metal surfaces — and that ship with abundant defect-free imagery:
| Dataset | Domain | Total | Normal | Defective |
|---|---|---|---|---|
| KSDD2 (Kolektor Surface-Defect Dataset 2) | Commutator metal surface | 3,335 | ~2,979 (89%) | ~356 |
| Severstal Steel Defect Detection | Flat-rolled steel sheet | ~12,500 | ~5,900 (47%) | ~6,600 |
Both contribute defect-free frames to the autoencoder’s training pool; Severstal additionally supplies the bounding-box training signal for YOLO. KSDD2 defects are deliberately held out as an out-of-distribution test for the detector. See Data Strategy for the full split design.
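The role each frame plays, as described in the paragraph above, can be written as a small lookup. This is only a sketch of the stated rules: the function name, the dataset keys, and the role strings are hypothetical, and it does not model any train/validation split.

```python
def assign_role(dataset: str, defective: bool) -> str:
    """Map a frame to its role in the pipeline (role names are hypothetical)."""
    if dataset not in {"ksdd2", "severstal"}:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if not defective:
        # Defect-free frames from both datasets feed the autoencoder.
        return "autoencoder_train"
    if dataset == "severstal":
        # Severstal defects carry the labels the detector trains on.
        return "yolo_train"
    # KSDD2 defects are never trained on; they form the out-of-distribution test.
    return "ood_test"
```

Under these rules the detector never sees a KSDD2 defect during training, which is what makes that set a meaningful out-of-distribution check.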
Why a cascade?
| Approach | Cost / frame | Latency | Accuracy on classified frames |
|---|---|---|---|
| Pure MLLM (every frame) | ~$0.001 | ~3 s | ★★★★★ |
| Pure YOLO (every frame) | ~$0.000001 | ~15 ms | ★★★★☆ |
| Cascade (this project) | ~$0.00001 avg | ~25 ms median | ★★★★★ |
Of the three, the cascade is the only architecture that buys you the MLLM’s accuracy without its bill, and the only one whose economics improve as more of the camera feed turns out to be defect-free.
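A back-of-the-envelope model shows where the cascade's average cost comes from. In the sketch below, the Layer 2 and Layer 3 costs are taken from the table rows for pure YOLO and pure MLLM. The Layer 1 cost and both escalation fractions are assumptions chosen for illustration, not measured values. The function name is hypothetical.

```python
def expected_cost_per_frame(
    cost_l1: float,
    cost_l2: float,
    cost_l3: float,
    frac_to_l2: float,
    frac_to_l3: float,
) -> float:
    """Average cost when every frame pays L1 and only a fraction reach L2 / L3."""
    if not 0.0 <= frac_to_l3 <= frac_to_l2 <= 1.0:
        raise ValueError("need 0 <= frac_to_l3 <= frac_to_l2 <= 1")
    return cost_l1 + frac_to_l2 * cost_l2 + frac_to_l3 * cost_l3


avg = expected_cost_per_frame(
    cost_l1=1e-7,     # assumed autoencoder cost; not stated in the table
    cost_l2=1e-6,     # "Pure YOLO" row
    cost_l3=1e-3,     # "Pure MLLM" row
    frac_to_l2=0.10,  # assumed share of frames that pass the gatekeeper
    frac_to_l3=0.01,  # assumed share of frames that reach the oracle
)
print(f"average cost per frame: ${avg:.7f}")
```

With these placeholder numbers the Layer 3 term dominates: an escalation rate of about 1% at ~$0.001 per call already accounts for roughly $0.00001 per frame, so the average is driven almost entirely by how rarely the oracle is called.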
Try it live
A public demo of the cascade router is deployed on Azure Container Apps with scale-to-zero: the layers spin up on the first request (≈30–60 s cold start) and shut down again after a few minutes of idle time. Three example images are pre-loaded, one for each cascade outcome (cheap stop at L1, specialist stop at L2, oracle escalation at L3).
The router URL is printed by `make demo-deploy` and exposed via `make apps-show`. The demo is rate-limited (10 req/min/IP, 200/day) and guarded by a $5/day Azure OpenAI spending cap.
Quick links
- Architecture → — system design and Azure topology
- Data Strategy → — KSDD2 + Severstal combination plan
- Evaluation → — measured cost / latency / accuracy
- Inferences → — frame-by-frame walkthroughs