Overview
CircuitSage is a domain-specific AI platform for electrical engineers. From the outside, each tool looks like a single LLM call — a prompt in, structured output back. The reality is a multi-layered stack: retrieval-augmented context, live distributor data, atomic credit accounting, streaming infrastructure, and tool-specific UX. This post documents how those layers fit together and what each one contributes to the product.
The point is not to defend the product against the "AI wrapper" framing. The point is to show what is actually involved in shipping a reliable, billable, domain-aware AI tool — for any engineer planning to build something similar.
1. Retrieval-Augmented Datasheet Analysis
When a user uploads a PDF to the Datasheet AI tool, the document is parsed page-by-page, chunked with overlap to preserve cross-section context, and embedded using OpenAI text-embedding-3-small. Embeddings are written to an Upstash Vector index keyed by document and page number. On subsequent queries against the same component, the top-matching chunks are retrieved by cosine similarity and injected as authoritative context before the generation call.
This shifts the model from recall (which is unreliable for component minutiae) to comprehension. Pin assignments, absolute maximum ratings, and application notes come from the source document, not the model's training distribution. The UI surfaces page-level citations so engineers can verify any claim in seconds.
- 1536-dimensional vector index on Upstash Vector with cosine similarity
- Page-level chunking with sliding window to preserve cross-section context
- Similarity threshold of 0.75 to filter low-confidence retrievals
- Non-blocking ingestion path so the corpus grows with every upload
- Citation chips bound to source filename and page number in every result
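The chunking and retrieval steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the production pipeline: the names `Chunk`, `chunk_page`, and `retrieve`, the window size of 200 words, and the 40-word overlap are all hypothetical, and a plain Python list stands in for the hosted vector index. A hosted index may also scale its similarity scores differently from raw cosine, so the 0.75 threshold here applies to this sketch only.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str    # source filename, kept for citations
    page: int   # 1-based page number, kept for citations
    text: str


def chunk_page(doc, page, words, size=200, overlap=40):
    """Split one page's words into overlapping windows (sliding window)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(doc, page, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the last window already reached the end of the page
    return chunks


def cosine(a, b):
    """Raw cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=5, threshold=0.75):
    """index is a list of (vector, Chunk) pairs.

    Returns (score, Chunk) pairs, best first, dropping low-confidence matches.
    """
    scored = [(cosine(query_vec, vec), chunk) for vec, chunk in index]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:top_k]
```

Because every `Chunk` carries its filename and page number, each retrieved passage can be rendered with a citation without a second lookup.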
2. Live Distributor Data via Nexar
Pricing and availability shift constantly. Component costs can move by an order of magnitude between design revisions due to supply chain conditions, and lifecycle status, whether active, NRND (not recommended for new designs), or obsolete, affects whether a design is even viable. Presenting model-estimated prices as current would be both inaccurate and misleading.
The BOM Generator integrates the Nexar GraphQL API — the official data source behind Octopart — to fetch real prices, stock levels, lifecycle status, and RoHS compliance per part. Lookups run in parallel across all rows in a generated BOM with a bounded timeout so a slow upstream call never blocks the user.
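A minimal sketch of the bounded parallel lookup described above, using `asyncio` from the standard library. The `fetch` callable stands in for the real Nexar request and is an assumption, as are the function names and the shape of the returned rows; the 8-second default mirrors the per-call timeout mentioned in this post.

```python
import asyncio


async def lookup_with_timeout(fetch, mpn, timeout_s=8.0):
    """Run one part lookup; on timeout or error, return a marked fallback row."""
    try:
        data = await asyncio.wait_for(fetch(mpn), timeout=timeout_s)
        return {"mpn": mpn, "source": "live", **data}
    except Exception:
        # Covers the timeout raised by wait_for as well as upstream failures.
        # The caller can fill this row with model-estimated values instead.
        return {"mpn": mpn, "source": "estimated"}


async def lookup_all(fetch, mpns, timeout_s=8.0):
    """Look up every part concurrently; results keep the order of `mpns`."""
    tasks = [lookup_with_timeout(fetch, mpn, timeout_s) for mpn in mpns]
    return await asyncio.gather(*tasks)
```

Since each call carries its own timeout, one slow part delays the whole BOM by at most `timeout_s`, and the `source` field lets the UI distinguish live rows from fallback rows.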
- OAuth2 client credentials flow with access token cached in Redis (TTL minus 60s safety buffer)
- GraphQL query for mpn, manufacturer, description, RoHS status, lifecycle, and offers
- Parallel part lookups across all BOM rows with an 8-second per-call timeout
- Lifecycle badges (Active / NRND / Obsolete) and RoHS indicators surfaced per row
- Live pricing shown with a distinct "LIVE" indicator versus the model-estimated fallback
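The token caching rule in the list above (cache for the token lifetime minus a 60-second buffer) can be illustrated with an in-process cache. This is a sketch only: the `TokenCache` class and the `fetch_token` callable are hypothetical, and the production system is described as caching in Redis, where the same rule would presumably be expressed as a key expiry of `expires_in - 60` seconds.

```python
import time


class TokenCache:
    """Caches an OAuth2 access token until shortly before it expires."""

    def __init__(self, fetch_token, safety_buffer_s=60, clock=time.monotonic):
        # fetch_token() must return (access_token, expires_in_seconds).
        self._fetch_token = fetch_token
        self._buffer = safety_buffer_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, refreshing it once the buffered TTL has passed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch_token()
            self._token = token
            # Expire early so an in-flight request never carries a stale token.
            self._expires_at = now + max(expires_in - self._buffer, 0)
        return self._token
```

The buffer trades a slightly earlier refresh for the guarantee that a token handed out by `get()` still has about a minute of validity left when the upstream call is made.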
3. Atomic Credit and Billing System
A credit-based usage model is straightforward to describe and difficult to implement correctly. Two concurrent requests must not be able to both pass on the same balance, failed AI calls must not silently consume credits, lifetime users must never be downgraded by a stale subscription event, and replayed webhooks must not double-grant entitlements.
The system uses Redis INCRBY for atomic reservation, an explicit rollback path on overage, a SET NX idempotency key on every webhook delivery, and a variant-id whitelist that prevents accidental plan switches. Each plan tier maps to a distinct key schema chosen for its reset semantics.
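The reserve-then-rollback pattern can be sketched as follows. It is an illustration under assumptions: `FakeRedis` is an in-memory stand-in whose lock mimics the atomicity of INCRBY, the counter is assumed to track credits used against a plan limit, and `reserve_credits` and `run_with_credits` are hypothetical names.

```python
import threading


class FakeRedis:
    """In-memory stand-in for Redis; the lock mimics INCRBY atomicity."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incrby(self, key, amount):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def decrby(self, key, amount):
        return self.incrby(key, -amount)

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def reserve_credits(store, key, cost, limit):
    """Reserve first, check second: the increment itself is the lock."""
    used = store.incrby(key, cost)
    if used > limit:
        store.decrby(key, cost)  # roll back the overage
        return False
    return True


def run_with_credits(store, key, cost, limit, generate):
    """Charge for a generation only if it succeeds."""
    if not reserve_credits(store, key, cost, limit):
        raise PermissionError("insufficient credits")
    try:
        return generate()
    except Exception:
        store.decrby(key, cost)  # failed AI call: refund the reservation
        raise
```

Two concurrent requests cannot both pass on the same balance because each one sees the counter value that already includes its own increment. One known trade-off of this sketch is that a request can be rejected while another request's overage is still waiting to be rolled back; a single server-side script that checks and increments together would avoid that.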
- Plan tiers: Free (150/mo), Student (500/mo), Pro (1,000/mo) on a monthly rolling window
- Lifetime tiers tracked by a daily UTC-keyed counter with a 25h TTL for automatic reset
- Atomic slot reservation for Lifetime purchases via INCR with a threshold check before checkout
- Emergency credit packs that never expire and stack across purchases
- Webhook idempotency keys (SET NX, 7-day TTL) to neutralize re-deliveries from the billing provider
- Subscription-expired guard that refuses to downgrade users on the lifetime plan
- Refund eligibility computed from subscription start time plus usage count
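The webhook safeguards in the list above (idempotency key, variant whitelist, lifetime guard) can be combined in one handler. Everything named here is hypothetical: the event fields, the event type strings, the variant ids, and the `FakeRedis.set_nx` helper, which imitates a Redis SET with the NX and EX options. Treating every subscription event as a no-op for lifetime users is also an assumption of this sketch.

```python
import time

SEVEN_DAYS_S = 7 * 24 * 3600
# Hypothetical variant ids; anything not listed is rejected.
ALLOWED_VARIANTS = {"variant_student": "student", "variant_pro": "pro"}


class FakeRedis:
    """In-memory stand-in that imitates SET key value NX EX seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set_nx(self, key, value, ex):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and has not expired
        self._data[key] = (value, now + ex)
        return True


def handle_webhook(store, users, event):
    """Apply a billing event at most once and never downgrade lifetime users."""
    if not store.set_nx(f"webhook:{event['id']}", "1", ex=SEVEN_DAYS_S):
        return "duplicate"  # replayed delivery: do nothing
    user = users[event["user_id"]]
    if user["plan"] == "lifetime":
        return "ignored"  # stale subscription events cannot touch lifetime
    if event["type"] == "subscription_expired":
        user["plan"] = "free"
        return "downgraded"
    if event["type"] == "subscription_created":
        plan = ALLOWED_VARIANTS.get(event.get("variant_id"))
        if plan is None:
            return "rejected"  # unknown variant: refuse the plan switch
        user["plan"] = plan
        return "granted"
    return "ignored"
```

Claiming the idempotency key before processing keeps the check atomic, at the cost that an event whose processing crashes would also be treated as a duplicate on redelivery. A real handler would need to release the key or record completion separately to cover that case.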
4. Domain-Specific Tooling
Each of the nine tools has a dedicated system prompt, structured output schema, and result display tuned for its task. The Component Finder enforces that returned part numbers must be real, in-catalog SKUs with known distributors. The PLC Logic generator distinguishes between Siemens TIA Portal and Allen-Bradley ControlLogix dialects. The Fault Detector requests measurement checkpoints before suggesting component replacement. The BOM Generator returns structured tables suitable for CSV export rather than free-form prose.
- Nine purpose-built tools, each with a domain-specific system prompt
- Schema-validated JSON output with parser-level error recovery
- Server-sent events streaming for paid users with progress and partial-result events
- Tool-specific UI: BOM tables, pinout chips, citation badges, syntax-highlighted code blocks
- Exports: BOM CSV, KiCad netlist, Altium CSV, Eagle XML, plus code copy and PDF download
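To illustrate schema-validated output with parser-level recovery, followed by a CSV export, here is a small standard-library sketch. The row schema, the `rows` wrapper key, and the function names are assumptions made for the example; the recovery step shown (retrying on the outermost `{...}` span) is one simple strategy and not necessarily the one CircuitSage uses.

```python
import csv
import io
import json

# Hypothetical minimal schema for one BOM row: field name -> required type.
BOM_ROW_SCHEMA = {"mpn": str, "manufacturer": str, "quantity": int}


def extract_json(raw):
    """Parse model output as JSON, recovering from prose or fences around it."""
    candidates = [raw]
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start:end + 1])  # outermost object only
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    raise ValueError("no valid JSON object in model output")


def validate_rows(payload):
    """Check that payload is {"rows": [...]} and each row matches the schema."""
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        raise ValueError("expected an object with a 'rows' list")
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i}: expected an object")
        for field, expected in BOM_ROW_SCHEMA.items():
            if not isinstance(row.get(field), expected):
                raise ValueError(
                    f"row {i}: field '{field}' missing or not {expected.__name__}"
                )
    return rows


def rows_to_csv(rows):
    """Serialize validated rows to CSV text, ignoring fields outside the schema."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(BOM_ROW_SCHEMA), extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Validating before export means a malformed generation fails with a specific message that can trigger a retry, instead of producing a CSV with missing columns.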
5. Storage, Caching, and Infrastructure
Successful tool calls are persisted to Supabase PostgreSQL — indexed by user, tool, and timestamp — so engineers can revisit and build on prior work. Text-only queries are cached in Upstash Redis with per-tool TTLs, so repeated lookups against the same input return immediately and at zero AI cost. PDFs are stored on Vercel Blob behind signed URLs. Authentication and middleware are handled by Clerk, with a route-aware protection matcher and an automatic Pro entitlement for founder emails.
- Supabase Postgres history indexed by userId, tool, and createdAt
- Upstash Redis cache with namespaced keys and per-tool TTLs
- Vercel Blob for PDF storage with signed URL access only
- Clerk middleware with a route-matcher whitelist for public surfaces
- Vercel functions with maxDuration set to 300s for long-running generation tasks
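The namespaced cache with per-tool TTLs can be sketched as below. The key format, the TTL values, the tool names used as dictionary keys, and the in-memory `TTLCache` (standing in for Redis) are all assumptions chosen for illustration.

```python
import hashlib
import json
import time

# Hypothetical per-tool TTLs in seconds.
TOOL_TTLS_S = {"component-finder": 3600, "bom-generator": 900}
DEFAULT_TTL_S = 600


def cache_key(tool, payload):
    """Build a namespaced key from a canonical form of the request payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"cache:{tool}:{digest}"


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # drop missing-or-expired entries
            return None
        return entry[0]

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)


def cached_call(cache, tool, payload, generate):
    """Return a cached result when present; otherwise generate and store it."""
    key = cache_key(tool, payload)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = generate(payload)
    cache.set(key, result, TOOL_TTLS_S.get(tool, DEFAULT_TTL_S))
    return result
```

Serializing the payload with sorted keys makes the cache insensitive to field order, so two requests that differ only in how the client ordered its JSON map to the same entry.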
What Replication Actually Involves
For anyone evaluating the effort to build a comparable system, the work is roughly distributed across the following surfaces:
- Account provisioning across OpenAI, Anthropic, Nexar, Upstash (Redis + Vector), Supabase, Clerk, Lemon Squeezy, and Vercel Blob
- A chunked PDF ingestion pipeline with embedding storage and retrieval logic
- OAuth2 token caching for the Nexar API with TTL management and refresh on expiry
- Atomic Redis-backed credit accounting with rollback and correct key schemas per plan tier
- Nine independent tool UIs with streaming transport, structured JSON parsing, and domain-aware result components
- Webhook event handling with idempotency, variant whitelisting, and downgrade guards
- Plan-aware rate limiting and authentication middleware across all generation routes
- Environment variable management across local, preview, and production environments
- Edge-case test coverage for concurrent spend, expired tokens, failed generations, and malformed inputs
💡 Tip: The model call is one component of the system. Most of the engineering effort lives in the pipeline around it — retrieval, distributor integration, billing correctness, streaming, and domain-specific UX.
Why the Architecture Matters
Each layer compounds. The RAG corpus grows with every upload, so over time more component questions are answered from source documents rather than from model recall. The Nexar integration produces verifiable BOM pricing, a property a generic AI wrapper cannot claim. The atomic billing system makes the product economically sustainable, which in turn funds further development of the other layers.
A demo can be built quickly. A production system that engineers will trust with real designs requires every layer above to behave correctly, together, under load. That is the engineering surface CircuitSage is built on.