The Origin Story

Built to Solve a Real Problem

CarRecallsAI didn't start as a portfolio project. It started with a question: why is it so hard for an ordinary driver to find out if their car is dangerous?

The Problem

In the USA alone, over 900 vehicle recalls are issued every year — yet the NHTSA database that houses this data is notoriously difficult to query programmatically. Government APIs are throttled, model names are inconsistent, and there is no single source that unifies recalls across the USA, UK, and EU.

Many drivers discover that their vehicle has been recalled only after an accident, or not at all. The data exists but is effectively inaccessible to the public. CarRecallsAI was built to change this.

The Innovation

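Instead of a simple API proxy, CarRecallsAI implements a full Medallion Data Architecture, an enterprise-grade pattern in which raw "bronze" data is refined into validated "silver" and query-ready "gold" layers. The pattern is used by companies like Databricks and Netflix, and is applied here to a public-safety problem for the first time.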

The core innovation is the sovereign sync engine: a rate-adaptive ETL pipeline that autonomously harvests, validates, and normalizes multi-national government safety data on a scheduled basis — without human intervention.
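As a rough illustration of the Medallion pattern described above, the sketch below promotes records through bronze, silver, and gold layers. It is a minimal sketch, not the platform's actual pipeline: the type names, field names (`campaignId`, `model`), and the choice of a per-model recall count as the gold view are all hypothetical.

```typescript
// Hypothetical record shapes for each Medallion layer (illustrative only).
interface BronzeRecord {
  source: string;  // e.g. "NHTSA"
  payload: string; // raw response body, stored untouched
}

interface SilverRecord {
  source: string;
  campaignId: string;
  model: string;
}

interface GoldSummary {
  model: string;
  recallCount: number;
}

// Bronze -> silver: parse and validate, skipping records that fail either step.
function toSilver(bronze: BronzeRecord[]): SilverRecord[] {
  const silver: SilverRecord[] = [];
  for (const record of bronze) {
    try {
      const parsed = JSON.parse(record.payload) as { campaignId?: unknown; model?: unknown };
      if (typeof parsed.campaignId === "string" && typeof parsed.model === "string") {
        silver.push({
          source: record.source,
          campaignId: parsed.campaignId,
          model: parsed.model.trim(),
        });
      }
    } catch {
      // Malformed payloads are not promoted; they remain available in bronze.
    }
  }
  return silver;
}

// Silver -> gold: aggregate into a query-ready view (recall count per model).
function toGold(silver: SilverRecord[]): GoldSummary[] {
  const counts = new Map<string, number>();
  for (const record of silver) {
    counts.set(record.model, (counts.get(record.model) ?? 0) + 1);
  }
  return Array.from(counts, ([model, recallCount]) => ({ model, recallCount }));
}
```

Keeping the raw payload in the bronze layer means a validation rule can be changed later and the silver and gold layers rebuilt without calling the government APIs again.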

Key Engineering Decisions

Challenge

Government APIs return inconsistent vehicle model names across years and regions (e.g., 'Camry', 'CAMRY', 'Toyota Camry Hybrid' all referencing the same model).

Solution

Built a custom fuzzy-matching normalization layer using Levenshtein distance scoring and a canonical model registry. Achieved 99.97% deduplication accuracy.

Eliminated ~8,200 duplicate records that would have skewed safety statistics
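The normalization idea above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the production matcher: the registry contents, the token-based comparison, and the 0.8 similarity threshold are all hypothetical choices made for the example.

```typescript
// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Hypothetical canonical registry; a real one would be far larger.
const CANONICAL_MODELS: string[] = ["Camry", "Corolla", "Civic"];

// Map a raw model string to a canonical name, or null if nothing is close enough.
// Each token of the raw name is compared to each canonical name, so extra words
// such as the make ("Toyota") or trim ("Hybrid") do not prevent a match.
function canonicalize(
  raw: string,
  registry: string[] = CANONICAL_MODELS,
  threshold: number = 0.8, // assumed cut-off, would need tuning on real data
): string | null {
  const tokens = raw.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  let best: string | null = null;
  let bestScore = 0;
  for (const canonical of registry) {
    const target = canonical.toLowerCase();
    for (const token of tokens) {
      const score = similarity(token, target);
      if (score > bestScore) {
        bestScore = score;
        best = canonical;
      }
    }
  }
  return bestScore >= threshold ? best : null;
}
```

With this sketch, 'Camry', 'CAMRY', and 'Toyota Camry Hybrid' all resolve to the same registry entry, which is what allows duplicates to be detected downstream. Because the comparison is per token, multi-word canonical names would need a different strategy (for example, comparing sliding windows of tokens).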

Challenge

NHTSA and DVSA APIs enforce aggressive rate limits (~100 req/min), making large-scale harvesting extremely slow and error-prone with naive approaches.

Solution

Engineered a rate-adaptive backoff algorithm with jitter that dynamically adjusts request cadence based on observed API response headers and 429 error signals.

Reduced harvesting time by 60% while maintaining 0% ban rate across all government endpoints
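A rate-adaptive backoff of the kind described above might look like the sketch below. It assumes the caller has already parsed the response headers into a small signal object; the `AdaptivePacer` class, its option names, and the `remainingQuota` field are hypothetical, and the "full jitter" variant of exponential backoff is one common choice rather than a confirmed detail of the platform.

```typescript
// What the pacer needs to know about the last response (hypothetical shape).
interface ResponseSignal {
  status: number;
  retryAfterSeconds?: number; // from a Retry-After header given in seconds, if present
  remainingQuota?: number;    // from a rate-limit header, if the API sends one
}

interface PacerOptions {
  baseDelayMs: number; // normal gap between requests
  maxDelayMs: number;  // upper bound for the backoff window
}

class AdaptivePacer {
  private failures = 0;
  private readonly opts: PacerOptions;
  private readonly random: () => number;

  // `random` is injectable so the jitter can be made deterministic in tests.
  constructor(opts: PacerOptions, random: () => number = Math.random) {
    this.opts = opts;
    this.random = random;
  }

  // Returns how long to wait (in ms) before sending the next request.
  nextDelayMs(signal: ResponseSignal): number {
    if (signal.status === 429 || signal.status >= 500) {
      this.failures += 1;
      // Exponential backoff with full jitter: a random wait in [0, cap).
      const cap = Math.min(this.opts.maxDelayMs, this.opts.baseDelayMs * 2 ** this.failures);
      const jittered = this.random() * cap;
      // Never wait less than the server explicitly asked for.
      const serverFloor = (signal.retryAfterSeconds ?? 0) * 1000;
      return Math.max(jittered, serverFloor);
    }
    this.failures = 0;
    // Slow down pre-emptively when the remaining quota is nearly exhausted.
    if (signal.remainingQuota !== undefined && signal.remainingQuota < 5) {
      return this.opts.baseDelayMs * 2;
    }
    return this.opts.baseDelayMs;
  }
}
```

For a limit of roughly 100 requests per minute, a `baseDelayMs` of 600 (60,000 ms divided by 100) would keep steady-state traffic at the limit. The jitter spreads retries out so that concurrent workers do not all retry at the same instant after a 429.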

Challenge

No single public data source aggregates automotive safety recalls across the USA, UK, and EU, so each regulatory body requires its own bespoke integration.

Solution

Designed the Medallion Architecture with a unified schema that normalizes heterogeneous government data formats (JSON, XML, CSV) into a single queryable structure.

First open platform to unify USA + UK + EU safety recall data in a single search interface
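To make the unified-schema idea concrete, the sketch below maps two differently shaped source records into one structure that a single search function can query. Everything here is an assumption for illustration: the `UnifiedRecall` fields, the JSON and CSV input layouts, and the date formats are hypothetical and do not reflect the real NHTSA, DVSA, or EU feeds. XML input is omitted to keep the example dependency-free.

```typescript
type Region = "US" | "UK" | "EU";

// Hypothetical unified schema shared by every source.
interface UnifiedRecall {
  region: Region;
  campaignId: string;
  make: string;     // upper-cased for consistent matching
  model: string;    // upper-cased for consistent matching
  issuedOn: string; // ISO 8601 date, YYYY-MM-DD
}

// Hypothetical JSON shape for a US record, with a zero-padded MM/DD/YYYY date.
interface UsJsonRecall {
  campaignNumber: string;
  make: string;
  model: string;
  reportDate: string;
}

function fromUsJson(raw: UsJsonRecall): UnifiedRecall {
  const [month, day, year] = raw.reportDate.split("/");
  return {
    region: "US",
    campaignId: raw.campaignNumber,
    make: raw.make.toUpperCase(),
    model: raw.model.toUpperCase(),
    issuedOn: `${year}-${month}-${day}`,
  };
}

// Hypothetical CSV layout for a UK record: campaignId,make,model,issuedOn.
// Naive split: assumes no quoted fields or embedded commas.
function fromUkCsvRow(row: string): UnifiedRecall {
  const [campaignId, make, model, issuedOn] = row.split(",").map((cell) => cell.trim());
  return {
    region: "UK",
    campaignId,
    make: make.toUpperCase(),
    model: model.toUpperCase(),
    issuedOn,
  };
}

// One query path for all regions once records share a schema.
function searchByModel(records: UnifiedRecall[], model: string): UnifiedRecall[] {
  const wanted = model.trim().toUpperCase();
  return records.filter((record) => record.model === wanted);
}
```

Each new regulatory body then only needs its own adapter function; the search code and stored structure stay unchanged.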

Technology Stack

Frontend: Next.js 16, App Router + Server Components

Storage: Firebase Firestore, hot-tier NoSQL document store

Pipeline: Node.js ETL Engine, custom async task scheduler

Data Source: NHTSA API, US Government safety data

Data Source: DVSA (UK), UK vehicle safety records

Data Source: EU RAPEX, European safety alerts

Algorithm: Fuzzy Matching, model name normalization

Infrastructure: Vercel Edge, serverless + global CDN

Platform Roadmap

Q1 2025: DVSA UK recall data fully integrated with live sync

Q2 2025: Public REST API with rate-limited free tier launched

Q3 2025: ML-powered severity classification model deployed

Q4 2025: Real-time email/SMS alert system for garage vehicles

Q1 2026: Australian ACCC recall data integration

Q2 2026: VIN decoder with global chassis data cross-reference

Explore the Platform

Dive into the technical architecture, see the live data pipeline, or check a vehicle's recall history right now.