Project Cascade Defect

A cost-effective ML cascade for real-time rolled-metal surface defect detection

Author

j-jayes

Published

May 13, 2026

Overview

Project Cascade Defect is an applied ML portfolio project that treats one question as the whole point: if a multimodal LLM can already answer “is this metal sheet defective?”, how do you build a system around it that makes the answer cheap enough to run on every camera frame?

The answer is a three-layer cascade. A tiny convolutional autoencoder discards anything that looks like clean metal. A small object detector (YOLOv8n) handles the easy positives. Only ambiguous frames, those where the detector's confidence falls below 0.85, escalate to an Azure OpenAI vision model. Most frames stop in milliseconds; only the genuinely hard ones incur the vision-model charge, roughly a tenth of a cent per frame.

flowchart LR
    INPUT([Raw frame])

    subgraph L1["Layer 1 — Gatekeeper"]
        AE["Conv Autoencoder<br/>MSE reconstruction"]
        T1{MSE > τ?}
    end

    subgraph L2["Layer 2 — Specialist"]
        YOLO["YOLOv8n<br/>4-class detector"]
        T2{conf ≥ 0.85?}
    end

    subgraph L3["Layer 3 — Oracle"]
        GPT["Azure OpenAI<br/>vision + few-shot"]
    end

    DISCARD([No defect])
    LOG([Log + label])

    INPUT --> AE --> T1
    T1 -- No --> DISCARD
    T1 -- Yes --> YOLO --> T2
    T2 -- Yes --> LOG
    T2 -- No --> GPT --> LOG

    style L1 fill:#dbeafe,stroke:#3b82f6
    style L2 fill:#dcfce7,stroke:#22c55e
    style L3 fill:#fef9c3,stroke:#eab308

Figure 1: High-level cascade pipeline
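
The routing in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the project's router: `route_frame`, `Verdict`, and the three callables (`reconstruction_mse`, `detect`, `ask_oracle`) are hypothetical names standing in for the autoencoder, YOLOv8n, and the Azure OpenAI call. Only the two decision rules (MSE > τ and conf ≥ 0.85) come from the figure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Frame = object  # placeholder type; a real build would likely pass an image array


@dataclass
class Verdict:
    layer: int            # 1, 2 or 3: the layer at which the frame stopped
    label: Optional[str]  # logged label, or None when the frame was discarded


def route_frame(
    frame: Frame,
    reconstruction_mse: Callable[[Frame], float],   # hypothetical Layer 1 hook
    detect: Callable[[Frame], Tuple[str, float]],   # hypothetical Layer 2 hook
    ask_oracle: Callable[[Frame], str],             # hypothetical Layer 3 hook
    tau: float,
    conf_threshold: float = 0.85,
) -> Verdict:
    """Send one frame through the cascade and report where it stopped."""
    # Layer 1: low reconstruction error means the frame resembles clean metal.
    if reconstruction_mse(frame) <= tau:
        return Verdict(layer=1, label=None)

    # Layer 2: accept the detector's label only when it is confident enough.
    label, confidence = detect(frame)
    if confidence >= conf_threshold:
        return Verdict(layer=2, label=label)

    # Layer 3: everything else escalates to the (expensive) vision model.
    return Verdict(layer=3, label=ask_oracle(frame))
```

Passing the three models in as callables keeps the routing rule testable without loading any weights or calling a paid API.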

The data pivot

Earlier iterations of this project trained on whatever surface-defect dataset was easiest to obtain: first NEU (a balanced 6-class classification benchmark with no defect-free frames), and briefly an electronics dataset (VisA / PCB1) that had the right shape of data but the wrong domain. Both made the autoencoder’s gatekeeper role contrived: NEU gave it nothing to learn as “normal”, and the normal frames in VisA / PCB1 were not metal surfaces.

The current build uses two datasets that match the actual goal — flag damage on rolled metal surfaces — and that ship with abundant defect-free imagery:

Dataset	Domain	Total images	Normal (share of total)	Defective
KSDD2 (Kolektor Surface-Defect Dataset 2)	Commutator metal surface	3,335	~2,979 (89%)	~356
Severstal Steel Defect Detection	Flat-rolled steel sheet	~12,500	~5,900 (47%)	~6,600

Both contribute defect-free frames to the autoencoder’s training pool; Severstal additionally supplies the bounding-box training signal for YOLO. KSDD2 defects are deliberately held out as an out-of-distribution test for the detector. See Data Strategy for the full split design.
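
One way the bounding-box signal could be derived is sketched below. It assumes the Kaggle release of Severstal, where labels arrive as run-length-encoded masks (1-indexed, column-major, on 256×1600 images) rather than as boxes; the page itself does not say how boxes are produced. `rle_to_bbox` and `bbox_to_yolo` are hypothetical helper names.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels


def rle_to_bbox(rle: str, height: int = 256, width: int = 1600) -> Box:
    """Tight box around one RLE mask (assumed 1-indexed, column-major)."""
    tokens = [int(t) for t in rle.split()]
    if not tokens or len(tokens) % 2:
        raise ValueError("RLE must be a non-empty list of start/length pairs")

    x_min, y_min, x_max, y_max = width, height, -1, -1
    for start, length in zip(tokens[0::2], tokens[1::2]):
        first = start - 1              # convert to a 0-based pixel index
        last = first + length - 1
        x0, y0 = divmod(first, height)  # column-major: index = x * height + y
        x1, y1 = divmod(last, height)
        x_min, x_max = min(x_min, x0), max(x_max, x1)
        if x1 > x0:
            # The run wraps into a later column, so it touches the bottom of
            # column x0 and the top of column x1.
            y_min, y_max = 0, height - 1
        else:
            y_min, y_max = min(y_min, y0), max(y_max, y1)
    return x_min, y_min, x_max, y_max


def bbox_to_yolo(box: Box, height: int = 256, width: int = 1600):
    """Convert an inclusive pixel box to normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    w = x_max - x_min + 1
    h = y_max - y_min + 1
    return ((x_min + w / 2) / width, (y_min + h / 2) / height, w / width, h / height)
```

A single box per mask is coarse: a mask holding several separate defect regions collapses into one large box, so a real conversion would probably split the mask into connected components first.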

Why a cascade?

Approach	Cost / frame	Latency	Accuracy on classified frames
Pure MLLM (every frame)	~$0.001	~3 s	★★★★★
Pure YOLO (every frame)	~$0.000001	~15 ms	★★★★☆
Cascade (this project)	~$0.00001 avg	~25 ms median	★★★★★

Of the three options, the cascade is the only one that buys you the MLLM’s accuracy without its bill, and the only one whose economics improve as more of the camera feed turns out to be defect-free: every frame that stops at Layer 1 never pays for the detector or the vision model.
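
The cost argument can be made concrete with a small expected-value calculation. The sketch below takes the per-frame prices for YOLO and the MLLM from the table; the Layer 1 cost and both pass-through rates are assumptions chosen for illustration, not measured values, and `expected_cost_per_frame` is a hypothetical helper.

```python
def expected_cost_per_frame(
    p_pass_l1: float,       # share of frames the autoencoder forwards (assumed)
    p_escalate: float,      # share of Layer 2 frames sent to the MLLM (assumed)
    cost_l1: float = 0.0,   # autoencoder cost; not given, treated as negligible
    cost_l2: float = 1e-6,  # YOLO cost per frame, from the table
    cost_l3: float = 1e-3,  # MLLM cost per frame, from the table
) -> float:
    """Average cost in dollars of pushing one frame through the cascade."""
    for p in (p_pass_l1, p_escalate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    # Every frame pays Layer 1; only forwarded frames pay Layer 2;
    # only escalated frames pay Layer 3.
    return cost_l1 + p_pass_l1 * (cost_l2 + p_escalate * cost_l3)


if __name__ == "__main__":
    for p_pass in (0.5, 0.1, 0.01):
        cost = expected_cost_per_frame(p_pass_l1=p_pass, p_escalate=0.1)
        print(f"forwarded={p_pass:.0%}  avg cost=${cost:.8f}")
```

With 10% of frames forwarded and 10% of those escalated, the average comes out near the ~$0.00001 in the table, and it falls further as the forwarded share shrinks. Those two rates are illustrative only; the measured figures belong to the Evaluation page.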

Try it live

A public demo of the cascade router is deployed on Azure Container Apps with scale-to-zero — the layers spin up on the first request (≈30–60 s cold start) and shut down again after a few minutes of idle. Three example images are pre-loaded covering each cascade outcome (cheap-stop at L1, specialist-stop at L2, oracle escalation at L3).

The router URL is printed by make demo-deploy and exposed via make apps-show. The demo is rate-limited (10 req/min/IP, 200/day) and guarded by a $5/day cap on Azure OpenAI spend.
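
For readers curious how limits like these can be enforced, here is a minimal in-memory sliding-window sketch. It is not the demo's implementation: `SlidingWindowLimiter` is a hypothetical class, and it assumes the 200/day figure applies per IP, which the text above does not specify.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

DAY_SECONDS = 86_400.0


class SlidingWindowLimiter:
    """Per-IP limiter with a one-minute window and a one-day window."""

    def __init__(
        self,
        per_minute: int = 10,
        per_day: int = 200,  # assumed to be per IP
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = ((60.0, per_minute), (DAY_SECONDS, per_day))
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record the request and return True, or return False if over limit."""
        now = self.clock()
        hits = self.hits[ip]
        # Forget requests that have aged out of the longest window.
        while hits and now - hits[0] >= DAY_SECONDS:
            hits.popleft()
        for window, limit in self.limits:
            recent = sum(1 for t in hits if now - t < window)
            if recent >= limit:
                return False
        hits.append(now)
        return True
```

The injectable `clock` makes the limiter testable without sleeping. State lives in process memory, so a deployment that scales to zero or runs several replicas would need a shared store instead.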

Quick links

Architecture: system design and Azure topology
Data Strategy: KSDD2 + Severstal combination plan
Evaluation: measured cost / latency / accuracy
Inferences: frame-by-frame walkthroughs