Project Cascade Defect

A cost-effective ML cascade for real-time rolled-metal surface defect detection

Author

j-jayes

Published

May 13, 2026

Overview

Project Cascade Defect is an applied ML portfolio project that treats one question as the whole point: if a multimodal LLM can already answer “is this metal sheet defective?”, how do you build a system around it that makes the answer cheap enough to run on every camera frame?

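The answer is a three-layer cascade. A tiny convolutional autoencoder discards any frame it reconstructs well, meaning the frame looks like clean metal (reconstruction MSE at or below a threshold τ). A small object detector (YOLOv8n) settles the easy positives, those it detects with a confidence of at least 0.85. Only the remaining ambiguous frames escalate to an Azure OpenAI vision model. Most frames stop in milliseconds; only the genuinely hard ones incur the per-call cost of the vision model.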

flowchart LR
    INPUT([Raw frame])

    subgraph L1["Layer 1 — Gatekeeper"]
        AE["Conv Autoencoder<br/>MSE reconstruction"]
        T1{MSE > τ?}
    end

    subgraph L2["Layer 2 — Specialist"]
        YOLO["YOLOv8n<br/>4-class detector"]
        T2{conf ≥ 0.85?}
    end

    subgraph L3["Layer 3 — Oracle"]
        GPT["Azure OpenAI<br/>vision + few-shot"]
    end

    DISCARD([No defect])
    LOG([Log + label])

    INPUT --> AE --> T1
    T1 -- No --> DISCARD
    T1 -- Yes --> YOLO --> T2
    T2 -- Yes --> LOG
    T2 -- No --> GPT --> LOG

    style L1 fill:#dbeafe,stroke:#3b82f6
    style L2 fill:#dcfce7,stroke:#22c55e
    style L3 fill:#fef9c3,stroke:#eab308
Figure 1: High-level cascade pipeline
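
The routing decisions in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the project's router: the names `Detection`, `mse`, and `route` are hypothetical, the value of τ is left to the caller, and treating "no detection at all" as an escalation to Layer 3 is an assumption that the figure does not spell out. Only the 0.85 confidence threshold and the direction of the MSE test come from the diagram.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

CONF_THRESHOLD = 0.85  # T2 threshold taken from Figure 1


@dataclass
class Detection:
    """Highest-confidence detection from the Layer 2 detector (hypothetical type)."""
    label: str
    confidence: float


def mse(frame: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between a flattened frame and its reconstruction."""
    if not frame or len(frame) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)


def route(recon_error: float, tau: float, best: Optional[Detection]) -> str:
    """Return the layer that settles the frame: 'L1', 'L2' or 'L3'."""
    if recon_error <= tau:
        return "L1"  # T1 "No": reconstructs well, treated as clean metal
    if best is not None and best.confidence >= CONF_THRESHOLD:
        return "L2"  # T2 "Yes": log the detector's label
    # T2 "No": low confidence, or (assumption) no detection at all
    return "L3"
```

In a real pipeline the detector would only run for frames that pass Layer 1; here `best` is passed in directly to keep the sketch free of model dependencies.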

The data pivot

Earlier iterations of this project trained on whatever surface-defect dataset was easiest to obtain: first NEU (a balanced 6-class classification benchmark with no defect-free frames), and briefly an electronics dataset (VisA / PCB1) that had the right shape of data but the wrong domain. Both made the autoencoder’s gatekeeper role contrived. NEU gave it no defect-free frames to learn as “normal”, and the “normal” in VisA / PCB1 was circuit boards rather than metal.

The current build uses two datasets that match the actual goal, flagging damage on rolled metal surfaces, and that ship with abundant defect-free imagery:

| Dataset | Domain | Total images | Normal (share) | Defective |
| --- | --- | --- | --- | --- |
| KSDD2 (Kolektor Surface-Defect Dataset 2) | Commutator metal surface | 3,335 | ~2,979 (89%) | ~356 |
| Severstal Steel Defect Detection | Flat-rolled steel sheet | ~12,500 | ~5,900 (47%) | ~6,600 |

Both contribute defect-free frames to the autoencoder’s training pool; Severstal additionally supplies the bounding-box training signal for YOLO. KSDD2 defects are deliberately held out as an out-of-distribution test for the detector. See Data Strategy for the full split design.
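
As a rough illustration of that routing of data, the sketch below assigns each frame to a pool based on its source dataset and whether it is defective. The `Frame` type and the pool names are hypothetical, and the sketch deliberately ignores any validation or test holdouts of defect-free frames that the full split design may define.

```python
from typing import NamedTuple

KNOWN_DATASETS = {"ksdd2", "severstal"}


class Frame(NamedTuple):
    dataset: str     # "ksdd2" or "severstal"
    defective: bool


def assign_pool(frame: Frame) -> str:
    """Map a frame to the pool described in the paragraph above (names are illustrative)."""
    if frame.dataset not in KNOWN_DATASETS:
        raise ValueError(f"unknown dataset: {frame.dataset!r}")
    if not frame.defective:
        return "autoencoder_train"   # defect-free frames from both datasets
    if frame.dataset == "severstal":
        return "yolo_train"          # Severstal defects carry the box labels
    return "ood_test"                # KSDD2 defects are held out from training
```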

Why a cascade?

| Approach | Cost / frame | Latency | Accuracy on classified frames |
| --- | --- | --- | --- |
| Pure MLLM (every frame) | ~$0.001 | ~3 s | ★★★★★ |
| Pure YOLO (every frame) | ~$0.000001 | ~15 ms | ★★★★☆ |
| Cascade (this project) | ~$0.00001 avg | ~25 ms median | ★★★★★ |

The cascade is the only architecture that buys you the MLLM’s accuracy without its bill — and the only one whose economics improve as more of the camera feed turns out to be defect-free.
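
The economics can be made concrete with a small expected-cost calculation. The per-frame costs for YOLO and the MLLM below are the approximate figures from the table. Everything else is an assumption for illustration: the autoencoder's cost is treated as negligible, and the two pass-through fractions are hypothetical values, not measured results.

```python
def expected_cost_per_frame(
    p_pass_l1: float,    # fraction of frames the autoencoder forwards to YOLO
    p_escalate: float,   # fraction of those frames YOLO escalates to the MLLM
    c_l1: float = 0.0,   # assumption: autoencoder cost is negligible
    c_l2: float = 1e-6,  # ~$0.000001 per YOLO frame (table above)
    c_l3: float = 1e-3,  # ~$0.001 per MLLM frame (table above)
) -> float:
    """Average cost of one frame: every frame pays L1, forwarded frames pay L2,
    and escalated frames additionally pay L3."""
    for p in (p_pass_l1, p_escalate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return c_l1 + p_pass_l1 * (c_l2 + p_escalate * c_l3)


# Hypothetical mix: 10% of frames pass the gatekeeper, 9% of those escalate.
cost = expected_cost_per_frame(p_pass_l1=0.10, p_escalate=0.09)
print(f"${cost:.7f} per frame")
```

With these made-up fractions the average lands near the table's ~$0.00001 figure, and lowering `p_pass_l1` (a cleaner camera feed) lowers the cost proportionally, which is the effect the paragraph above describes.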

Try it live

A public demo of the cascade router is deployed on Azure Container Apps with scale-to-zero: the layers spin up on the first request (≈30–60 s cold start) and shut down again after a few minutes of idle time. Three example images are pre-loaded, one for each cascade outcome (cheap-stop at L1, specialist-stop at L2, oracle escalation at L3).

The router URL is printed by make demo-deploy and exposed via make apps-show. The demo is rate-limited (10 req/min/IP, 200/day) and guarded by a $5/day Azure OpenAI cap.
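
For readers unfamiliar with this kind of guard, here is a minimal sliding-window limiter that enforces both request limits. It is a sketch, not the demo's actual implementation: the class name is hypothetical, state is kept in memory for a single process, and the daily limit is assumed to apply per IP, which the figures above do not make explicit. The $5/day spend cap is not modelled.

```python
from collections import defaultdict, deque

MINUTE = 60.0
DAY = 86_400.0


class SlidingWindowLimiter:
    """Allow at most `per_minute` and `per_day` accepted requests per IP."""

    def __init__(self, per_minute: int = 10, per_day: int = 200) -> None:
        self.limits = ((MINUTE, per_minute), (DAY, per_day))
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        """Return True and record the request if `ip` is under both limits at `now`."""
        q = self.hits[ip]
        # Forget requests that have left the longest (daily) window.
        while q and now - q[0] >= DAY:
            q.popleft()
        for window, limit in self.limits:
            recent = sum(1 for t in q if now - t < window)
            if recent >= limit:
                return False  # rejected requests are not recorded
        q.append(now)
        return True
```

A caller would pass a monotonic timestamp in seconds, for example `limiter.allow(client_ip, time.monotonic())`. Taking `now` as an argument keeps the class easy to test.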

Quick links

  • Architecture: system design and Azure topology
  • Data Strategy: KSDD2 + Severstal combination plan
  • Evaluation: measured cost / latency / accuracy
  • Inferences: frame-by-frame walkthroughs
