Project Cascade Defect

A Cost-Effective ML Cascade Pipeline for Real-Time Rolled-Metal Surface Defect Detection

Author

j-jayes

Published

April 24, 2026

Overview

Project Cascade Defect is a proof-of-concept machine learning portfolio project that demonstrates how to optimise a defect-detection system for inference cost, latency, and accuracy simultaneously.

Instead of running every factory camera frame through an expensive Multimodal Large Language Model (MLLM), we route frames through a three-layer Cascade Architecture, where most frames are handled cheaply and quickly and only ambiguous edge cases escalate to GPT-4o.

flowchart LR
    INPUT([🖼️ Raw Frame])

    subgraph L1["Layer 1 — Gatekeeper"]
        AE["Conv Autoencoder\nMSE Reconstruction"]
        T1{"MSE > τ?"}
    end

    subgraph L2["Layer 2 — Specialist"]
        YOLO["YOLOv8n\nInference"]
        T2{"Conf ≥ 85%?"}
    end

    subgraph L3["Layer 3 — Oracle"]
        GPT["GPT-4o\nFew-Shot Prompt"]
    end

    DISCARD(["✅ No Defect"])
    LOG(["📋 Log & Archive"])

    INPUT --> AE
    AE --> T1
    T1 -- No --> DISCARD
    T1 -- Yes --> YOLO
    YOLO --> T2
    T2 -- Yes --> LOG
    T2 -- No --> GPT
    GPT --> LOG

    style L1 fill:#dbeafe,stroke:#3b82f6
    style L2 fill:#dcfce7,stroke:#22c55e
    style L3 fill:#fef9c3,stroke:#eab308
Figure 1: High-level cascade pipeline overview

Why a Cascade?

| Approach                          | Cost per frame | Latency    | Accuracy |
|-----------------------------------|----------------|------------|----------|
| Pure MLLM (GPT-4o for everything) | ~$0.002        | ~3 s       | ★★★★★    |
| Pure Classical (YOLOv8 only)      | ~$0.000001     | ~15 ms     | ★★★★☆    |
| Cascade (this project)            | ~$0.0000015    | ~18 ms avg | ★★★★★    |

The cascade architecture captures the best of both worlds: classical computer vision handles the 95%+ of “easy” cases in milliseconds, while GPT-4o is reserved for the rare edge cases where certainty matters most.

Key Technologies

  • Model Training: Azure Machine Learning (AML) on Standard_NC6s_v3 compute
  • Inference: Azure Container Apps (ACA) with NVIDIA T4 GPU, scale-to-zero via KEDA
  • Event Bus: Azure Service Bus (decoupled queue between cascade layers)
  • Storage: Azure Data Lake Storage Gen2
  • MLLM: Azure OpenAI gpt-4o with Pydantic-enforced structured JSON output
  • Package Manager: uv (fast, reproducible Python dependency management)
  • Website: Quarto (this site — Mermaid.js diagrams, executable Python cells)
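The "Pydantic-enforced structured JSON output" mentioned above can be sketched with a response schema like the one below. The field names are illustrative assumptions, not the project's actual schema; the point is that a malformed or out-of-range model reply fails validation instead of silently entering the log.

```python
from pydantic import BaseModel, Field


# Hypothetical verdict schema for the Layer 3 oracle (field names are illustrative).
class DefectVerdict(BaseModel):
    defect_present: bool
    defect_type: str = Field(description="e.g. 'scratch', 'inclusion', 'rolled-in scale'")
    confidence: float = Field(ge=0.0, le=1.0)  # rejected if outside [0, 1]
    rationale: str


# Validate a mock GPT-4o JSON reply; bad JSON or bad values raise ValidationError.
raw = (
    '{"defect_present": true, "defect_type": "scratch",'
    ' "confidence": 0.92, "rationale": "Linear mark along rolling direction"}'
)
verdict = DefectVerdict.model_validate_json(raw)
print(verdict.defect_type)
```

The same model class can also be handed to the OpenAI SDK's structured-output support so the service constrains generation to the schema, rather than validating after the fact.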

Project Structure

cascade-defect/
├── .agents/skills/          # Copilot domain-knowledge files
├── .claude/plans/           # Project implementation plan
├── .devcontainer/           # VS Code / Codespaces container config
├── .pre-commit-config.yaml  # nbstripout + ruff hooks
├── docs/                    # This Quarto website
├── src/cascade_defect/      # Python source package
│   ├── data/                # Dataset split utilities
│   ├── layer1_autoencoder/  # Conv AE model, training, FastAPI app
│   ├── layer2_yolo/         # YOLOv8 inference FastAPI app
│   └── layer3_gpt4o/        # GPT-4o oracle FastAPI app
├── tests/                   # pytest unit tests
└── pyproject.toml           # uv-managed project config
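The core of `layer1_autoencoder/` is the MSE gate from Figure 1. A minimal sketch, assuming the trained autoencoder's forward pass is available elsewhere (the function names and default τ here are illustrative, not the package's actual API):

```python
import numpy as np


def reconstruction_mse(frame: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean squared error between a frame and its autoencoder reconstruction."""
    diff = frame.astype(np.float32) - reconstruction.astype(np.float32)
    return float(np.mean(diff ** 2))


def is_anomalous(frame: np.ndarray, reconstruction: np.ndarray, tau: float = 0.01) -> bool:
    """Layer 1 gate: True → escalate to YOLOv8n, False → discard as 'No Defect'."""
    return reconstruction_mse(frame, reconstruction) > tau
```

Because the autoencoder is trained only on defect-free surfaces, defective frames reconstruct poorly and exceed τ, which is what lets this cheap layer discard the bulk of normal traffic.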

Quick Links

  • Architecture → — Detailed system design with Mermaid diagrams
  • Data Strategy → — Dataset splits and pseudo-labelling pipeline
  • Evaluation → — Latency, cost, and accuracy benchmarks
