CAD Dataset Infrastructure — Carbon & the Annotation Tool¶


Engineering Log Entry — March 2026

High-level overview of Carbon and how the feature annotation tool fits within it — bridging the dataset infrastructure to a working annotation interface through Supabase.

The dataset infrastructure design describes data flowing through an annotation stage: parts enter as 'unannotated', receive ML pre-labels, and move through human correction in Carbon before being snapshotted for training. This page explains what Carbon is, how the annotation tool lives inside it, and how the two systems connect through Supabase.

The feature annotation tool is currently a standalone app (annotation_tool_web) — a React + FastAPI application with a Three.js 3D viewer, pythonocc STEP processing, and integration with three ML detectors (AAGNet, BrepMFR, BRepFormer). It supports 37 feature types, GD&T annotations, instance grouping, and export to multiple formats. It has no database (sessions are file-backed JSON), no authentication, and no multi-user support. Migrating it into Carbon gives it persistent storage, auth, multi-user annotation queues, and a direct line into the same Supabase instance that backs the dataset catalog.

This page covers Carbon's architecture, where the annotation tool fits, and how it reads and writes to Supabase. It does not cover the feature detection backend, ML model deployment, or the dataset CLI — those are described in sibling pages.

Integration Roadmap¶

The migration into Carbon is happening in two phases. Issue #110 tracks the initial integration — getting the annotation tool running as a Carbon app with its own Supabase schema. The dataset infrastructure integration described in the rest of this page follows as a second step.

Phase A: Standalone Carbon App (Issue #110)¶

The initial integration ports the annotation tool into apps/annotation-tool as a self-contained app. Key technical decisions for this phase:

- STEP processing moves from pythonocc (Python) to OpenCascade.js running server-side in a Trigger.dev task. The STEP file is uploaded to S3, tessellated into a pre-computed mesh JSON, and stored back to S3. The browser loads the mesh, with no WASM or STEP parsing on the client.
- Storage uses S3 (Carbon's existing file storage) rather than R2.
- Schema is self-contained: an annotationSession table (metadata, S3 path, geometry) and an annotationDocument table (full annotation JSON per session). This mirrors the current tool's session model, now backed by Supabase instead of local JSON files.
- Auth is internal only: employees behind requireAuthSession, no signup, no annotator roles.
- Single-user: collaborative annotation and real-time queue updates are deferred.

This phase delivers a working annotation tool inside Carbon with persistence, auth, and no Docker/FastAPI dependency. Engineers can upload parts, annotate features, and export results.

Phase B: Dataset Infrastructure Integration¶

Once the standalone app is working, the annotation tool connects to the dataset infrastructure described in this design:

- Schema convergence: annotationSession and annotationDocument evolve to read from and write to dataset_parts and dataset_annotations. Annotation data maps into the JSONB annotation_data column, with the model_assisted flag distinguishing human labels from ML pre-labels. The session concept may persist as a UI convenience, but the source of truth becomes the dataset catalog.
- Storage convergence: parts are sourced from R2 (content-addressed, pushed via mds push) rather than uploaded ad hoc to S3. The presigned URL pattern replaces direct S3 access.
- Multi-user annotation queue: offshore annotators get scoped Supabase Auth accounts with the annotator role. RLS policies restrict visibility to assigned or unclaimed parts. The claim-next-part Edge Function handles part assignment.
- Pre-label display: ML-generated annotations written by the batch pre-labeling pipeline appear as initial labels when the annotator opens a part, using the same confidence-based rendering the tool already supports.
- Snapshot integration: completed annotations feed into mds snapshot create, freezing the dataset state for ML training runs.
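
To make the schema convergence step concrete, here is a minimal sketch of how features from a Phase A session document might map onto dataset_annotations rows. The row columns are the ones named on this page. The SessionFeature shape, the keys inside annotation_data, and the choice to leave annotator_id empty for ML pre-labels are assumptions for illustration, not the tool's actual schema.

```typescript
// Hypothetical shape of one feature inside a Phase A annotationDocument.
interface SessionFeature {
  featureType: string;
  faceIds: number[];
  confidence: number; // 1.0 = human label, < 1.0 = ML pre-label
}

// Row shape built from the dataset_annotations columns named on this page.
// The keys inside annotation_data are assumed.
interface ConvergedAnnotationRow {
  part_id: string;
  annotator_id: string | null;
  annotation_type: "feature";
  annotation_data: { feature_type: string; face_ids: number[]; confidence: number };
  model_assisted: boolean;
}

function sessionToRows(
  partId: string,
  annotatorId: string,
  features: SessionFeature[]
): ConvergedAnnotationRow[] {
  return features.map((f): ConvergedAnnotationRow => {
    const modelAssisted = f.confidence < 1.0;
    return {
      part_id: partId,
      // Assumption: ML pre-labels carry no human annotator.
      annotator_id: modelAssisted ? null : annotatorId,
      annotation_type: "feature",
      annotation_data: {
        feature_type: f.featureType,
        face_ids: f.faceIds.slice(),
        confidence: f.confidence,
      },
      model_assisted: modelAssisted,
    };
  });
}
```

Deriving model_assisted from the confidence value keeps the session model and the catalog flag in agreement during the migration.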

The gap between the two phases is primarily a data layer migration — the frontend annotation experience (3D viewer, feature taxonomy, GD&T, undo/redo) remains the same throughout.

What is Carbon?¶

Carbon is Anvil's manufacturing platform — an ERP, MES, and quality system built as a TypeScript monorepo. It is deployed on Vercel and backed entirely by Supabase (Postgres, Auth, Realtime, Edge Functions, Storage).

Tech stack: React 18, React Router 7 (SSR), Supabase, Tailwind CSS, Radix UI, Zustand, react-three-fiber, Trigger.dev (background jobs).

The monorepo is organized into apps (deployable applications) and packages (shared libraries):

graph TB
    subgraph Apps["apps/"]
        ERP["erp<br/><small>ERP & quoting</small>"]
        MES["mes<br/><small>Shop floor execution</small>"]
        DFM["dfm-analyzer<br/><small>3D model viewer</small>"]
        AT["annotation-tool<br/><small>Feature annotation</small>"]
    end

    subgraph Packages["packages/"]
        AUTH["@carbon/auth<br/><small>Supabase client, sessions,<br/>permissions</small>"]
        DB["@carbon/database<br/><small>Generated types,<br/>migrations</small>"]
        UI["@carbon/react<br/><small>450+ UI components</small>"]
        JOBS["@carbon/jobs<br/><small>Trigger.dev tasks</small>"]
    end

    subgraph Infra["Infrastructure"]
        SB["Supabase<br/><small>Postgres · Auth · Realtime<br/>Edge Functions · Storage</small>"]
    end

    ERP --> AUTH
    ERP --> DB
    ERP --> UI
    MES --> AUTH
    MES --> DB
    MES --> UI
    DFM --> AUTH
    DFM --> DB
    DFM --> UI
    AT --> AUTH
    AT --> DB
    AT --> UI

    AUTH --> SB
    DB --> SB
    JOBS --> SB

    style AT fill:#e8f5e9,stroke:#43a047

Every app imports the same typed Supabase client from @carbon/auth, queries the same Postgres instance, and shares a common set of UI components from @carbon/react. Adding a new app to the monorepo means adding a directory under apps/ and importing these packages — routing, auth, and database access come for free.

Key point for this design: The dataset catalog tables (dataset_parts, dataset_annotations, etc.) live in the same Supabase Postgres instance that Carbon already uses. The annotation tool does not need a separate backend — it queries the dataset tables directly through the typed Supabase client, just like any other Carbon app.

How the Annotation Tool Fits In¶

The feature annotation tool lives as a new app under apps/ in the Carbon monorepo — lightweight and focused on the annotator workflow.

The Current Tool¶

annotation_tool_web is a full-featured annotation application built with the same frontend stack as Carbon (React 18, Three.js via react-three-fiber, Zustand, Tailwind). Its backend is FastAPI with pythonocc for STEP file tessellation and geometry extraction.

Core capabilities:

- 37 feature types across 9 categories: holes (through, blind, threaded, countersink, counterbore), passages, slots, steps, pockets, chamfers, fillets, walls, bosses, contour surfaces, undercuts, and more
- 3D viewer: face-level mesh rendering with per-feature color overlays, face/edge selection modes, clipping planes, wireframe toggle, preset camera views, and mesh opacity control
- GD&T annotations: 19 types across datums, dimensions, and geometric tolerances (flatness, position, profile, etc.), plus surface finish specifications
- Instance grouping: features auto-computed as connected components via BFS on the Attribute Adjacency Graph (AAG), with split constraints to prevent unwanted merges
- ML detection: integrates three detector containers (AAGNet, BrepMFR, BRepFormer) with confidence thresholds and merge controls
- Undo/redo: full command history with deep clone snapshots
- Export: JSON, CSV, Markdown, and PDF output formats
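
The instance grouping step can be sketched as a breadth-first search over the face adjacency graph. This is a minimal illustration, not the tool's implementation: the function name, the data shapes, and the modelling of a split constraint as a cut edge between two faces are all assumptions.

```typescript
// A split constraint is modelled here as a cut edge between two adjacent faces.
interface SplitConstraint {
  a: number;
  b: number;
}

// Order-independent key for an undirected edge between two face ids.
function edgeKey(a: number, b: number): string {
  return a < b ? `${a}-${b}` : `${b}-${a}`;
}

// Groups labelled faces into instances: connected components of same-label
// faces over AAG edges, skipping edges cut by a split constraint.
function groupInstances(
  labels: Record<number, string>,
  adjacency: Record<number, number[]>,
  splits: SplitConstraint[]
): number[][] {
  const cut: Record<string, boolean> = {};
  for (const s of splits) cut[edgeKey(s.a, s.b)] = true;

  const visited: Record<number, boolean> = {};
  const instances: number[][] = [];
  const faceIds = Object.keys(labels).map(Number).sort((x, y) => x - y);

  for (const start of faceIds) {
    if (visited[start]) continue;
    visited[start] = true;
    const component: number[] = [];
    const frontier: number[] = [start];
    while (frontier.length > 0) {
      const face = frontier.shift() as number;
      component.push(face);
      for (const next of adjacency[face] || []) {
        const sameLabel = labels[next] !== undefined && labels[next] === labels[start];
        if (!sameLabel || visited[next] || cut[edgeKey(face, next)]) continue;
        visited[next] = true;
        frontier.push(next);
      }
    }
    instances.push(component.sort((x, y) => x - y));
  }
  return instances;
}
```

Under this cut-edge model, two faces still merge if another uncut path of same-label faces connects them, so a split may need more than one constraint on a closed loop of faces.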

What it lacks: persistence is file-backed JSON (sessions in backend/data/sessions/), there is no authentication or user accounts, and it runs as a single-user local tool. These are exactly the gaps that Carbon fills.

What Carbon Adds¶

Current Tool	Carbon
File-backed JSON sessions (local disk)	Supabase Postgres (persistent, queryable, multi-user)
No authentication	Supabase Auth with role-based access (engineer / annotator)
Single user, local STEP files	Multi-user annotation queue with part claiming
Manual STEP file upload	Presigned R2 URLs via Edge Function
Custom UI components	`@carbon/react` component library (450+ components)

What carries over directly — the frontend annotation experience is largely preserved:

- Three.js / react-three-fiber 3D viewer with OrbitControls and face selection
- Feature visualization: per-feature color overlays, visibility toggles, instance grouping
- Zustand state management pattern (the annotation store is 1700+ lines of well-structured state and actions)
- The 37-type feature taxonomy and GD&T annotation model
- Undo/redo interaction pattern
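
For readers unfamiliar with the undo/redo pattern being carried over, the sketch below shows one way a snapshot-based history can work. It is an illustration under assumptions: the class and method names are hypothetical, and the real annotation store lives in Zustand and may structure its history differently.

```typescript
// JSON round-trip deep copy; adequate for plain annotation data
// (no Dates, Maps, or functions).
function deepCopy<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Snapshot-based history: each commit stores a deep copy of the whole state.
class SnapshotHistory<T> {
  private past: T[] = [];
  private future: T[] = [];
  private present: T;

  constructor(initial: T) {
    this.present = deepCopy(initial);
  }

  // Returns a copy so callers cannot mutate stored snapshots.
  current(): T {
    return deepCopy(this.present);
  }

  commit(next: T): void {
    this.past.push(this.present);
    this.present = deepCopy(next);
    this.future = []; // a new action invalidates the redo stack
  }

  undo(): boolean {
    if (this.past.length === 0) return false;
    this.future.push(this.present);
    this.present = this.past.pop() as T;
    return true;
  }

  redo(): boolean {
    if (this.future.length === 0) return false;
    this.past.push(this.present);
    this.present = this.future.pop() as T;
    return true;
  }
}
```

Because the history holds plain data, it does not depend on where that data is persisted, which is consistent with the pattern surviving the move from file-backed sessions to Supabase.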

What changes — the backend and data layer are replaced:

- FastAPI + pythonocc backend → STEP processing moves server-side, out of the request path. Tessellation produces the mesh data the viewer needs; Phase A runs it with OpenCascade.js in a Trigger.dev background task rather than with pythonocc.
- File-backed sessions → annotation state persisted in dataset_annotations via the typed Supabase client
- Local ML detector containers → feature detection runs as a batch pre-labeling pipeline (see Annotation Pipeline)
- Export service → can remain server-side or move to a Supabase Edge Function

Supabase as the Glue¶

Supabase is the integration layer between Carbon and the dataset infrastructure. Every interaction the annotation tool has with the dataset — reading the queue, loading a part, displaying pre-labels, saving corrections — goes through the same Supabase client that the rest of Carbon uses.

sequenceDiagram
    participant A as Annotator (Browser)
    participant C as Carbon App
    participant SB as Supabase
    participant EF as Edge Function
    participant R2 as Cloudflare R2

    A->>C: Sign in
    C->>SB: supabase.auth.signInWithPassword()
    SB-->>C: JWT (role: annotator)

    A->>C: View annotation queue
    C->>SB: SELECT * FROM dataset_parts<br/>WHERE annotation_status = 'pre-labeled'
    SB-->>C: Parts list (RLS-filtered)

    A->>C: Claim part
    C->>EF: POST /functions/v1/claim-next-part
    EF->>SB: UPDATE dataset_parts SET annotator_id, annotation_status
    EF-->>C: { part_id }

    A->>C: Load part viewer
    C->>EF: GET /functions/v1/dataset-url?part_id=...
    EF->>SB: Verify annotator owns part
    EF->>R2: Generate presigned URL
    EF-->>C: { url }
    C->>R2: Fetch STEP file
    R2-->>C: STEP binary

    C->>SB: SELECT * FROM dataset_annotations<br/>WHERE part_id = ... AND model_assisted = true
    SB-->>C: Pre-labels

    A->>C: Correct features, submit
    C->>SB: UPSERT dataset_annotations
    C->>SB: UPDATE dataset_parts SET annotation_status = 'in-review'
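
The claim step in the sequence above relies on a conditional update: a part is assigned only if it is still pre-labeled and unclaimed. The in-memory sketch below models that rule only. The real Edge Function would need to perform the check and the update as one atomic database operation; the function name, the row shape, and the oldest-first ordering are assumptions.

```typescript
type PartStatus = "unannotated" | "pre-labeled" | "in-progress" | "in-review" | "approved";

interface ClaimablePart {
  part_id: string;
  annotation_status: PartStatus;
  annotator_id: string | null;
  created_at: string; // ISO timestamp, so string order matches time order
}

// Assigns the oldest unclaimed pre-labeled part to the annotator.
// Returns the claimed part_id, or null when nothing is available.
function claimNextPart(table: ClaimablePart[], annotatorId: string): string | null {
  const candidates = table
    .filter((p) => p.annotation_status === "pre-labeled" && p.annotator_id === null)
    .sort((x, y) => (x.created_at < y.created_at ? -1 : x.created_at > y.created_at ? 1 : 0));
  if (candidates.length === 0) return null;

  // The candidate objects are the same references as in `table`,
  // so this mutation updates the table row itself.
  const chosen = candidates[0];
  chosen.annotator_id = annotatorId;
  chosen.annotation_status = "in-progress";
  return chosen.part_id;
}
```

Once a part is claimed it no longer matches the filter, which is what prevents a second annotator from receiving the same part.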

Typed Client¶

Carbon's Supabase client is created through @carbon/auth and typed against the full database schema generated from @carbon/database:

import { getCarbon } from "@carbon/auth";

// Returns SupabaseClient<Database> — all queries are type-checked
const carbon = getCarbon(accessToken);

The dataset tables (dataset_parts, dataset_annotations, dataset_snapshots, etc.) are added to the same Postgres instance and included in the generated types. No separate client or connection — the annotation tool queries the dataset catalog the same way the ERP queries inventory.
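
As a rough picture of what type-checked queries buy, the sketch below writes out row types by hand for the columns named on this page and shows how a column list can be checked at compile time. The generated types in @carbon/database will differ; the TypeScript types chosen for each column and the pickColumns helper are assumptions for illustration.

```typescript
// Columns are those named in the queries on this page; their types are assumed.
interface DatasetPartRow {
  part_id: string;
  filename: string;
  tags: string[];
  annotation_status: string;
  annotator_id: string | null;
  created_at: string;
}

interface DatasetAnnotationRecord {
  part_id: string;
  annotator_id: string | null;
  annotation_type: string;
  annotation_data: Record<string, unknown>; // JSONB
  model_assisted: boolean;
}

// Hypothetical helper: K is constrained to real column names, so a typo such
// as "file_name" fails to compile instead of failing at runtime.
function pickColumns<Row, K extends keyof Row>(row: Row, columns: K[]): Pick<Row, K> {
  const out = {} as Pick<Row, K>;
  for (const c of columns) out[c] = row[c];
  return out;
}

const examplePart: DatasetPartRow = {
  part_id: "p1",
  filename: "bracket.step",
  tags: ["3-axis"],
  annotation_status: "pre-labeled",
  annotator_id: null,
  created_at: "2026-03-01T00:00:00Z",
};

const exampleAnnotation: DatasetAnnotationRecord = {
  part_id: "p1",
  annotator_id: null,
  annotation_type: "feature",
  annotation_data: { feature_type: "through_hole", face_ids: [1, 2], confidence: 0.9 },
  model_assisted: true,
};

const queueEntry = pickColumns(examplePart, ["part_id", "filename"]);
```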

Reading the Annotation Queue¶

The annotation tool's home view is a list of parts ready for annotation. This is a standard Supabase query on dataset_parts, filtered by status:

const { data: queue } = await carbon
  .from("dataset_parts")
  .select("part_id, filename, tags, annotation_status, created_at")
  .eq("annotation_status", "pre-labeled")
  .is("annotator_id", null)
  .order("created_at", { ascending: true });

RLS policies ensure annotators only see parts they're allowed to access — pre-labeled parts that are unassigned, or parts already assigned to them. Engineers see everything. The query code is the same for both roles; the database enforces the scope.
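
The visibility rule can be restated as a plain predicate, which is useful for reasoning about the policy and for unit tests. This is a sketch of the rule as described above, not the RLS policy itself, which is enforced in Postgres; the type and function names are hypothetical.

```typescript
type ViewerRole = "engineer" | "annotator";

interface QueueViewer {
  id: string;
  role: ViewerRole;
}

interface QueuePart {
  part_id: string;
  annotation_status: string;
  annotator_id: string | null;
}

// Engineers see everything. Annotators see unassigned pre-labeled parts
// and any part already assigned to them.
function canSeePart(viewer: QueueViewer, part: QueuePart): boolean {
  if (viewer.role === "engineer") return true;
  const unclaimedPreLabeled =
    part.annotation_status === "pre-labeled" && part.annotator_id === null;
  const assignedToViewer = part.annotator_id === viewer.id;
  return unclaimedPreLabeled || assignedToViewer;
}
```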

Loading a STEP File¶

The annotation tool never accesses R2 directly. When an annotator opens a part, the app calls a Supabase Edge Function that verifies auth, checks part assignment, and returns a time-limited presigned URL:

const { data } = await carbon.functions.invoke("dataset-url", {
  body: { part_id: partId },
});
// data.url → presigned R2 URL (1-hour expiry)

The STEP file is fetched from R2 via this URL and loaded into the Three.js viewer. The Edge Function and authorization logic are described in detail in the Access & Authentication page.

Displaying Pre-Labels¶

When the viewer loads, the app fetches any existing ML-generated annotations for the part:

const { data: preLabels } = await carbon
  .from("dataset_annotations")
  .select("*")
  .eq("part_id", partId)
  .eq("model_assisted", true);

Each pre-label row contains an annotation_type (e.g. 'feature') and annotation_data (JSONB with feature-specific fields like face IDs, feature type, confidence). The viewer renders these as colored overlays on the corresponding mesh faces — the same visualization pattern used in the current annotation tool, where confidence < 1.0 indicates ML-generated labels and confidence = 1.0 indicates human labels.
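
A minimal sketch of that confidence-based rendering rule follows. The keys inside annotation_data, the threshold parameter, and the opacity formula are assumptions for illustration; only the distinction between confidence < 1.0 (ML) and confidence = 1.0 (human) comes from the current tool.

```typescript
// Assumed keys inside the JSONB annotation_data column.
interface LabelRow {
  annotation_data: { feature_type: string; face_ids: number[]; confidence: number };
}

interface FaceOverlay {
  faceId: number;
  featureType: string;
  source: "ml" | "human";
  opacity: number;
}

// Expands label rows into one overlay per face, dropping labels below
// the confidence threshold.
function buildOverlays(rows: LabelRow[], minConfidence: number): FaceOverlay[] {
  const overlays: FaceOverlay[] = [];
  for (const row of rows) {
    const d = row.annotation_data;
    if (d.confidence < minConfidence) continue;
    const source: "ml" | "human" = d.confidence < 1.0 ? "ml" : "human";
    for (const faceId of d.face_ids) {
      overlays.push({
        faceId,
        featureType: d.feature_type,
        source,
        // Hypothetical styling: fainter overlays for less confident labels.
        opacity: Math.max(0.3, d.confidence),
      });
    }
  }
  return overlays;
}
```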

Saving Corrections¶

When the annotator corrects or confirms features and submits, the app writes back to dataset_annotations and updates the part's status:

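// Write corrected annotations. Note: upsert only updates existing rows when
// they match the primary key or an onConflict target; otherwise it inserts.
const { error: saveError } = await carbon.from("dataset_annotations").upsert(
  correctedAnnotations.map((a) => ({
    part_id: partId,
    annotator_id: userId,
    annotation_type: "feature",
    annotation_data: a.data,
    model_assisted: false,
  }))
);
if (saveError) throw saveError;

// Advance the part status only after the annotations are saved
const { error: statusError } = await carbon
  .from("dataset_parts")
  .update({ annotation_status: "in-review" })
  .eq("part_id", partId);
if (statusError) throw statusError;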

RLS policies ensure annotators can only write annotations for parts assigned to them. The model_assisted: false flag distinguishes human corrections from ML pre-labels.

Real-Time Updates¶

Carbon's useRealtimeChannel hook (from @carbon/react) can subscribe to Postgres changes on dataset_parts. This enables live queue updates — when another annotator claims a part, it disappears from your queue without a page refresh:

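import { useRevalidator } from "react-router";
import { useRealtimeChannel } from "@carbon/react";

// Inside a route component: re-run the loader whenever dataset_parts changes
const revalidator = useRevalidator();

useRealtimeChannel({
  topic: "postgres_changes:dataset_parts",
  setup(channel) {
    return channel.on(
      "postgres_changes",
      { event: "*", schema: "public", table: "dataset_parts" },
      () => revalidator.revalidate()
    );
  },
});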

This is the same pattern used in Carbon's MES app for live production floor updates. It is not required for an initial version but is available when the annotation team scales to multiple concurrent annotators.

Annotator Workflow¶

The end-to-end workflow from the annotator's perspective:

1. Sign in: email/password or magic link through Carbon's standard login page. Supabase Auth issues a JWT with role: "annotator" in app_metadata.
2. See the queue: the home view shows parts with status 'pre-labeled', each displaying filename, source, tags (difficulty, machine type, etc.), and creation date. Filterable and sortable.
3. Claim a part: clicking a part (or "next part") calls the claim-next-part Edge Function, which atomically assigns the part and prevents double-claiming. The part's status moves to 'in-progress'.
4. Load and view: the STEP file loads into the 3D viewer via a presigned R2 URL. ML pre-labels appear as colored feature overlays on the mesh faces, with confidence scores shown in a sidebar.
5. Annotate: the annotator works through the part using the same interaction model as the current tool:
   - Click faces to assign feature types from the 37-type taxonomy (through hole, blind pocket, chamfer, fillet, etc.)
   - Multi-select faces (Shift+click) and apply labels in batch
   - Feature instances auto-compute as connected components on the AAG, so no manual grouping is needed
   - Add split constraints where adjacent same-type faces should remain separate instances
   - Visibility toggles to isolate individual features or types
   - Undo/redo for all annotation actions
   - Optionally add GD&T annotations (datums, tolerances, surface finish) if the workflow requires it
6. Submit: corrections are saved to dataset_annotations. The part's status advances to 'in-review' (if the annotator is unsure and wants engineer review) or 'approved' (if confident).
7. Continue: the annotator returns to the queue and claims the next part. Completed parts disappear from the queue view.
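
The status changes in this workflow can be summarized as a small transition table. This is a sketch: the status names come from this page, but the transition from 'in-review' to 'approved' and the absence of any backward transitions are assumptions that the review process may not share.

```typescript
type WorkflowStatus = "unannotated" | "pre-labeled" | "in-progress" | "in-review" | "approved";

// Allowed next statuses for each status.
const allowedTransitions: Record<WorkflowStatus, WorkflowStatus[]> = {
  "unannotated": ["pre-labeled"], // batch pre-labeling pipeline
  "pre-labeled": ["in-progress"], // annotator claims the part
  "in-progress": ["in-review", "approved"], // annotator submits
  "in-review": ["approved"], // assumption: engineer review ends in approval
  "approved": [],
};

function canTransition(from: WorkflowStatus, to: WorkflowStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```

A table like this could back a guard in the submit handler or a database check constraint, so that a part cannot skip the claim step.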