flowchart LR
INPUT([🖼️ Raw Frame])
subgraph L1["Layer 1 — Gatekeeper"]
AE["Conv Autoencoder\nMSE Reconstruction"]
T1{"MSE > τ?"}
end
subgraph L2["Layer 2 — Specialist"]
YOLO["YOLOv8n\nInference"]
T2{"Conf ≥ 85%?"}
end
subgraph L3["Layer 3 — Oracle"]
GPT["GPT-4o\nFew-Shot Prompt"]
end
DISCARD(["✅ No Defect"])
LOG(["📋 Log & Archive"])
INPUT --> AE
AE --> T1
T1 -- No --> DISCARD
T1 -- Yes --> YOLO
YOLO --> T2
T2 -- Yes --> LOG
T2 -- No --> GPT
GPT --> LOG
style L1 fill:#dbeafe,stroke:#3b82f6
style L2 fill:#dcfce7,stroke:#22c55e
style L3 fill:#fef9c3,stroke:#eab308
Project Cascade Defect
A Cost-Effective ML Cascade Pipeline for Real-Time Rolled-Metal Surface Defect Detection
Overview
Project Cascade Defect is a proof-of-concept machine learning portfolio project that demonstrates how to optimise a defect-detection system for inference cost, latency, and accuracy simultaneously.
Instead of running every factory camera frame through an expensive Multimodal Large Language Model (MLLM), we route frames through a three-layer Cascade Architecture, in which most frames are handled cheaply and quickly and only ambiguous edge cases escalate to GPT-4o.
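The routing decisions in the flowchart can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: `route_frame`, `Verdict`, and the three callables are hypothetical names, and the value of `TAU` is a placeholder because the diagram does not state τ. Only the 85% confidence cut-off and the order of the layers come from the diagram.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed values: the diagram gives the 85% confidence cut-off but not tau,
# so TAU is a placeholder that would be calibrated on defect-free frames.
TAU = 0.01
CONF_THRESHOLD = 0.85


@dataclass
class Verdict:
    label: str       # "no_defect" or a defect class name
    decided_by: str  # which layer produced the final answer


def route_frame(
    frame: object,
    reconstruction_mse: Callable[[object], float],
    detect: Callable[[object], tuple[str, float]],
    ask_oracle: Callable[[object], str],
    tau: float = TAU,
    conf_threshold: float = CONF_THRESHOLD,
) -> Verdict:
    """Route one frame through the three layers, stopping as early as possible."""
    # Layer 1 (Gatekeeper): low reconstruction error means the frame looks normal.
    if reconstruction_mse(frame) <= tau:
        return Verdict("no_defect", "layer1_autoencoder")

    # Layer 2 (Specialist): accept the detector's answer when it is confident.
    label, confidence = detect(frame)
    if confidence >= conf_threshold:
        return Verdict(label, "layer2_yolo")

    # Layer 3 (Oracle): only ambiguous frames reach the expensive model.
    return Verdict(ask_oracle(frame), "layer3_gpt4o")
```

Passing the layers in as callables keeps the sketch independent of any model library, so stub functions can stand in for the autoencoder, YOLOv8n, and GPT-4o while testing the routing on its own.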
Why a Cascade?
| Approach | Cost per frame | Latency | Accuracy |
|---|---|---|---|
| Pure MLLM (GPT-4o for everything) | ~$0.002 | ~3 s | ★★★★★ |
| Pure Classical (YOLOv8 only) | ~$0.000001 | ~15 ms | ★★★★☆ |
| Cascade (this project) | ~$0.0000015 | ~18 ms avg | ★★★★★ |
The cascade architecture captures the best of both worlds: classical computer vision handles the 95%+ of frames that are “easy” in milliseconds, while GPT-4o is reserved for the rare edge cases where certainty matters most.
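To see how the blended figures depend on how often frames escalate, here is a rough back-of-the-envelope model. It is a sketch under stated assumptions, not the project's benchmark method: it takes the approximate per-frame values from the table, assumes every frame pays roughly the YOLOv8-only cost and latency for Layers 1 and 2, and adds the GPT-4o cost and latency only for the escalated fraction. The function names are illustrative.

```python
# Per-frame figures copied from the comparison table (all approximate).
GPT4O_COST_USD = 0.002
GPT4O_LATENCY_MS = 3000.0
LOCAL_COST_USD = 0.000001  # assumption: Layers 1-2 cost about as much as YOLOv8 alone
LOCAL_LATENCY_MS = 15.0    # same assumption for latency


def blended(escalation_rate: float) -> tuple[float, float]:
    """Return (expected cost in USD, expected latency in ms) per frame."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    cost = LOCAL_COST_USD + escalation_rate * GPT4O_COST_USD
    latency = LOCAL_LATENCY_MS + escalation_rate * GPT4O_LATENCY_MS
    return cost, latency


def implied_escalation_rate(target_cost_usd: float) -> float:
    """Escalation rate at which the blended cost equals target_cost_usd."""
    return (target_cost_usd - LOCAL_COST_USD) / GPT4O_COST_USD


# Print the blended figures for a few illustrative escalation rates.
for rate in (0.0001, 0.001, 0.05):
    cost, latency = blended(rate)
    print(f"escalation {rate:.2%}: ${cost:.7f}/frame, {latency:.1f} ms avg")
```

Because GPT-4o is about three orders of magnitude more expensive and slower per frame than the local layers, both blended figures grow almost linearly with the escalation rate under this model, which makes the measured escalation rate the key number to track.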
Key Technologies
- Model Training: Azure Machine Learning (AML) on `Standard_NC6s_v3` compute
- Inference: Azure Container Apps (ACA) with NVIDIA T4 GPU, scale-to-zero via KEDA
- Event Bus: Azure Service Bus (decoupled queue between cascade layers)
- Storage: Azure Data Lake Storage Gen2
- MLLM: Azure OpenAI `gpt-4o` with Pydantic-enforced structured JSON output
- Package Manager: `uv` (fast, reproducible Python dependency management)
- Website: Quarto (this site: Mermaid.js diagrams, executable Python cells)
Project Structure
cascade-defect/
├── .agents/skills/ # Copilot domain-knowledge files
├── .claude/plans/ # Project implementation plan
├── .devcontainer/ # VS Code / Codespaces container config
├── .pre-commit-config.yaml # nbstripout + ruff hooks
├── docs/ # This Quarto website
├── src/cascade_defect/ # Python source package
│ ├── data/ # Dataset split utilities
│ ├── layer1_autoencoder/ # Conv AE model, training, FastAPI app
│ ├── layer2_yolo/ # YOLOv8 inference FastAPI app
│ └── layer3_gpt4o/ # GPT-4o oracle FastAPI app
├── tests/ # pytest unit tests
└── pyproject.toml # uv-managed project config
Quick Links
- Architecture → — Detailed system design with Mermaid diagrams
- Data Strategy → — Dataset splits and pseudo-labelling pipeline
- Evaluation → — Latency, cost, and accuracy benchmarks