
CAD Dataset Infrastructure — Access & Authentication


Engineering Log Entry — March 2026

Access tiers, authentication flows, row-level security policies, and credential management for the dataset infrastructure.

The dataset infrastructure serves four classes of users with different trust levels. This page defines who can access what, how they authenticate, and how credentials are managed.

Access Tiers

| Tier | Actors | Auth method | R2 access | DB access |
|---|---|---|---|---|
| engineer | Internal engineers, ML researchers | Supabase Auth via CLI (mds login) | Direct boto3 (shared read/write R2 token) | Full read/write (RLS) |
| annotator | Offshore annotators | Supabase Auth via Carbon (browser) | Presigned URLs only (Edge Function) | Assigned parts plus the unassigned pre-labeled queue (RLS) |
| ci | GitHub Actions pipelines | Service role key + env vars | Direct boto3 (read-only R2 token) | Read-only |
| service | Edge Functions, background workers | Supabase service role key | Direct S3 API (read/write R2 token, for presigned URL generation) | Full (server-side) |

Why engineers get direct R2 access: The mds CLI pushes and pulls thousands of STEP files in parallel. Routing every file through an Edge Function would add an extra round trip and a function invocation per file. Engineers are trusted internal team members, and the R2 API key in ~/.mds/config.toml is scoped to the machenit-dataset bucket.

Why annotators do not get direct R2 access: Offshore annotators sit outside the internal trust boundary. They should see only the parts assigned to them, and only through Carbon's UI. The presigned URL mechanism (see Annotation Pipeline) is the mediation layer: the Edge Function checks the assignment before it signs a URL.

Authentication

CLI — Engineers and ML Researchers

The mds CLI authenticates against Supabase Auth, which gives each engineer an individual identity for audit trails; the schema already has created_by UUID REFERENCES auth.users(id) on dataset_snapshots.

Auth flow:

mds login
  1. Prompt for email + password
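  2. Call supabase.auth.sign_in_with_password({"email": email, "password": password})
  3. Receive access_token (JWT, 1hr expiry) + refresh_token
  4. Store tokens in ~/.mds/auth.json (file permissions 0600)
  5. On subsequent commands, use access_token for Supabase API calls
  6. If access_token has expired, exchange refresh_token for a new session and write both new tokens back to auth.json (Supabase rotates the refresh token on each use)
  7. If the refresh fails (refresh token revoked, already used, or the session has ended), prompt for re-login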
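The token handling in steps 4 to 7 can be sketched in Python, matching the boto3-based CLI. This is a minimal sketch, not the mds implementation: it assumes expires_at is stored as an ISO 8601 UTC string as in the auth.json example that follows, and save_tokens, load_tokens, next_action and the 60-second REFRESH_MARGIN are hypothetical names and values.

```python
import json
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Hypothetical margin: refresh slightly early so a token cannot expire mid-command.
REFRESH_MARGIN = timedelta(seconds=60)


def save_tokens(path: Path, tokens: dict) -> None:
    """Write the session to auth.json with owner-only (0600) permissions."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Passing the mode to os.open means the file is never created world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f, indent=2)
    # os.open's mode only applies on creation; tighten a pre-existing file too.
    os.chmod(path, 0o600)


def load_tokens(path: Path) -> Optional[dict]:
    """Return the stored session, or None if the user has never logged in."""
    if not path.exists():
        return None
    with path.open() as f:
        return json.load(f)


def next_action(tokens: Optional[dict], now: datetime) -> str:
    """Decide what a command does before calling Supabase (steps 5 to 7)."""
    if tokens is None:
        return "login"
    # Assumed format: ISO 8601 UTC with a trailing "Z", as in the example file.
    expires_at = datetime.fromisoformat(tokens["expires_at"].replace("Z", "+00:00"))
    if now + REFRESH_MARGIN < expires_at:
        return "use_access_token"
    # The caller refreshes; if that request fails, it falls back to "login".
    return "refresh"
```

A command would call next_action(load_tokens(Path.home() / ".mds" / "auth.json"), datetime.now(timezone.utc)) before its first Supabase request. Setting the mode in os.open, rather than writing the file and calling chmod afterwards, avoids a window in which the refresh token is readable by other local users.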

Auth token file (~/.mds/auth.json, permissions 0600):

{
  "access_token": "eyJ...",
  "refresh_token": "...",
  "expires_at": "2026-03-19T12:00:00Z",
  "user_id": "uuid-here",
  "email": "david@anvil.co"
}

R2 credentials live in ~/.mds/config.toml (see Dataset CLI). The R2 token carries no individual identity; identity comes from the Supabase JWT. In the audit trail, the JWT records who acted, while the shared R2 access key only authorizes the blob transfer (upload/download). R2-side logs therefore cannot attribute a blob operation to an individual engineer.

Carbon — Annotators

Carbon already uses Supabase Auth, so no new mechanism is needed. Annotators are created as Supabase Auth users with their role set in app_metadata (see Role Encoding below). They log in through Carbon's existing login flow and interact with STEP files only via presigned URLs.

CI/CD — GitHub Actions

CI pipelines authenticate via environment variables. The mds CLI detects these and skips interactive login:

- name: Pull benchmark parts
  env:
    MDS_SUPABASE_URL: ${{ secrets.MDS_SUPABASE_URL }}
    MDS_SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.MDS_SUPABASE_SERVICE_ROLE_KEY }}
    MDS_R2_ENDPOINT: ${{ secrets.MDS_R2_ENDPOINT }}
    MDS_R2_ACCESS_KEY_ID: ${{ secrets.MDS_R2_ACCESS_KEY_ID }}
    MDS_R2_SECRET_ACCESS_KEY: ${{ secrets.MDS_R2_SECRET_ACCESS_KEY }}
  run: |
    mds pull --snapshot v003 --part-ids-from tests/cam_benchmark/parts.json --output assets/parts/

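CI uses a read-only R2 token: it can pull parts but cannot push new files or modify the bucket. The Supabase service role key, however, bypasses RLS, so CI's read-only database access is a convention of the commands it runs rather than an enforced restriction.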
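The detection step might look like the sketch below. It assumes the trigger is "all five variables present and non-empty"; whether mds rejects a partial set, and the name auth_mode, are hypothetical choices rather than documented behavior.

```python
from typing import Mapping

# The five variables exported by the GitHub Actions step above.
CI_ENV_VARS = (
    "MDS_SUPABASE_URL",
    "MDS_SUPABASE_SERVICE_ROLE_KEY",
    "MDS_R2_ENDPOINT",
    "MDS_R2_ACCESS_KEY_ID",
    "MDS_R2_SECRET_ACCESS_KEY",
)


def auth_mode(env: Mapping[str, str]) -> str:
    """Return "env" if every CI variable is set and non-empty, else "interactive".

    A partial set raises instead of silently mixing environment credentials
    with ~/.mds/config.toml and ~/.mds/auth.json.
    """
    present = [name for name in CI_ENV_VARS if env.get(name)]
    if len(present) == len(CI_ENV_VARS):
        return "env"
    if present:
        missing = [name for name in CI_ENV_VARS if name not in present]
        raise RuntimeError("incomplete CI credentials; missing: " + ", ".join(missing))
    return "interactive"
```

In the CLI entry point this would be called as auth_mode(os.environ). Failing loudly on a partial set surfaces a missing GitHub secret immediately, instead of falling through to an interactive login prompt that CI cannot answer.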

Role Encoding

Roles are stored in Supabase Auth's app_metadata field, set by an admin via the Supabase dashboard or admin API:

{ "role": "engineer" }

Why app_metadata and not user_metadata: user_metadata can be modified by the signed-in user via supabase.auth.updateUser(), so an annotator could escalate their own role. app_metadata cannot be changed from the client; only the admin API (service role key) or the dashboard can write it.

A SQL helper function extracts the role from the JWT for use in RLS policies:

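CREATE OR REPLACE FUNCTION public.user_role()
RETURNS TEXT AS $$
  SELECT COALESCE(
    (select auth.jwt()) -> 'app_metadata' ->> 'role',
    'anonymous'
  );
$$ LANGUAGE sql STABLE SECURITY DEFINER
-- Pin search_path: a SECURITY DEFINER function should not resolve names via the caller's path
SET search_path = '';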
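When debugging a policy it helps to see what user_role() will return for a given session. The Python mirror below decodes the access token's payload and applies the same rule as the SQL function. It is a hypothetical helper: it does not verify the JWT signature, so use it only to inspect your own token, never to make authorization decisions.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def user_role(claims: dict) -> str:
    """Mirror of public.user_role(): app_metadata.role, else 'anonymous'."""
    # user_metadata is ignored on purpose: any user can edit their own copy.
    role = (claims.get("app_metadata") or {}).get("role")
    return "anonymous" if role is None else str(role)
```

Because only app_metadata is read, a user who writes "engineer" into their own user_metadata still gets the role an admin assigned.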

Upgrade path

If the role model becomes more complex (e.g., per-project roles, multiple roles per user), migrate to a user_roles table with a Supabase custom access token hook. The hook injects role data into the JWT at issuance time, keeping RLS policies unchanged. See the Supabase custom claims docs for this pattern.
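The hook's job is a claims transformation. Supabase hooks are usually written as Postgres functions (HTTP hooks are also supported); to keep one language here, this is the transformation as a pure Python function. It is a sketch under stated assumptions: the event carries user_id and claims and the hook returns {"claims": ...} (check the current Supabase hook documentation for the exact payload), roles_by_user stands in for the hypothetical user_roles table, and ROLE_PRECEDENCE is an invented rule for picking the single role that existing policies read.

```python
import copy
from typing import Mapping, Sequence

# Hypothetical precedence used to collapse several roles into the single
# app_metadata.role claim that user_role() and the RLS policies read.
ROLE_PRECEDENCE = ("engineer", "annotator")


def custom_access_token_hook(event: dict, roles_by_user: Mapping[str, Sequence[str]]) -> dict:
    """Return {"claims": ...} with roles from a user_roles lookup injected."""
    claims = copy.deepcopy(event["claims"])  # never mutate the incoming event
    roles = list(roles_by_user.get(event["user_id"], []))
    app_metadata = claims.setdefault("app_metadata", {})
    app_metadata["roles"] = roles  # full list, for future per-project policies
    primary = next((r for r in ROLE_PRECEDENCE if r in roles), None)
    if primary is None:
        # No row in user_roles: drop any stale role so user_role() returns 'anonymous'.
        app_metadata.pop("role", None)
    else:
        app_metadata["role"] = primary
    return {"claims": claims}
```

Writing the collapsed role back to app_metadata.role is what keeps user_role() and every existing policy unchanged; only new policies would need to read the roles list.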

Authorization — RLS Policies

All dataset_* tables (and tag_definitions) have Row-Level Security enabled. The policies below use (select auth.uid()) and (select user_role()): wrapping each call in a SELECT lets Postgres evaluate it once per statement (as an initPlan) and reuse the result instead of re-evaluating it for every row, which is critical for performance on large tables.

dataset_parts

ALTER TABLE dataset_parts ENABLE ROW LEVEL SECURITY;

-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_parts
  FOR ALL
  TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read parts assigned to them, or unassigned pre-labeled parts
CREATE POLICY "annotators_read_parts" ON dataset_parts
  FOR SELECT
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND (
      annotator_id = (select auth.uid())
      OR (annotation_status = 'pre-labeled' AND annotator_id IS NULL)
    )
  );

-- Annotators: update only their assigned parts
CREATE POLICY "annotators_update_parts" ON dataset_parts
  FOR UPDATE
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  )
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  );
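Permissive policies on the same table and command are OR'd together, and for UPDATE the USING clause is checked against the existing row while WITH CHECK is checked against the new row. A small Python model of the dataset_parts policies makes those rules easy to unit-test. It is a hypothetical mirror of the SQL above, not something Postgres executes, and PartRow, can_select_part and can_update_part are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartRow:
    part_id: str
    annotator_id: Optional[str]  # None models SQL NULL
    annotation_status: str


def can_select_part(role: str, uid: str, row: PartRow) -> bool:
    """OR of the permissive SELECT policies on dataset_parts."""
    if role == "engineer":  # engineers_full_access
        return True
    if role == "annotator":  # annotators_read_parts
        return row.annotator_id == uid or (
            row.annotation_status == "pre-labeled" and row.annotator_id is None
        )
    return False


def can_update_part(role: str, uid: str, old: PartRow, new: PartRow) -> bool:
    """USING is evaluated on the existing row, WITH CHECK on the updated row."""
    if role == "engineer":  # engineers_full_access
        return True
    if role == "annotator":  # annotators_update_parts
        return old.annotator_id == uid and new.annotator_id == uid
    return False
```

The model shows why annotators cannot self-assign through RLS: an unassigned row has a NULL annotator_id, so it fails the UPDATE policy's USING clause even though the annotator can read it. Claiming therefore has to go through the server-side function described under Part Claiming.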

dataset_annotations

ALTER TABLE dataset_annotations ENABLE ROW LEVEL SECURITY;

-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_annotations
  FOR ALL
  TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read their own annotations + model pre-labels for their parts
CREATE POLICY "annotators_read_annotations" ON dataset_annotations
  FOR SELECT
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND (
      annotator_id = (select auth.uid())
      OR (model_assisted = true AND part_id IN (
        SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
      ))
    )
  );

-- Annotators: insert annotations only for their assigned parts
CREATE POLICY "annotators_insert_annotations" ON dataset_annotations
  FOR INSERT
  TO authenticated
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
    AND part_id IN (
      SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
    )
  );

-- Annotators: update only their own annotations
CREATE POLICY "annotators_update_annotations" ON dataset_annotations
  FOR UPDATE
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  )
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  );
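The annotation policies depend on a subquery against dataset_parts. That subquery runs as the caller, so dataset_parts' own RLS applies to it; for an annotator it still returns their assigned parts. The sketch below models the read and insert rules in Python, passing the subquery's result in as a set. The names are hypothetical and the model is for reasoning and tests only.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class AnnotationRow:
    part_id: str
    annotator_id: Optional[str]
    model_assisted: bool


def can_read_annotation(role: str, uid: str, row: AnnotationRow, assigned: AbstractSet[str]) -> bool:
    """annotators_read_annotations; `assigned` is the policy's part_id subquery result."""
    if role == "engineer":
        return True
    if role == "annotator":
        return row.annotator_id == uid or (row.model_assisted and row.part_id in assigned)
    return False


def can_insert_annotation(role: str, uid: str, row: AnnotationRow, assigned: AbstractSet[str]) -> bool:
    """annotators_insert_annotations: own annotator_id AND a part assigned to the caller."""
    if role == "engineer":
        return True
    if role == "annotator":
        return row.annotator_id == uid and row.part_id in assigned
    return False
```

Both conditions in the insert rule matter: checking only annotator_id would let an annotator attach annotations to parts assigned to someone else.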

Versioning Tables (Engineer-Only)

Annotators have no visibility into snapshots, pins, or versioning internals.

ALTER TABLE dataset_snapshots ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_snapshot_parts ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_pins ENABLE ROW LEVEL SECURITY;

CREATE POLICY "engineers_full_access" ON dataset_snapshots
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

CREATE POLICY "engineers_full_access" ON dataset_snapshot_parts
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

CREATE POLICY "engineers_full_access" ON dataset_pins
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

tag_definitions

ALTER TABLE tag_definitions ENABLE ROW LEVEL SECURITY;

CREATE POLICY "engineers_full_access" ON tag_definitions
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read-only (for Carbon UI autocomplete)
CREATE POLICY "annotators_read_tags" ON tag_definitions
  FOR SELECT TO authenticated
  USING ((select user_role()) = 'annotator');

Part Claiming

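When an annotator requests the next part to annotate, Carbon calls a server-side Edge Function that claims a part on the annotator's behalf, rather than letting the annotator self-assign. The claim is a single conditional UPDATE that only matches a part that is still pre-labeled and unassigned, so two annotators working simultaneously cannot be assigned the same part, and annotators cannot pick arbitrary parts.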

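// POST /functions/v1/claim-next-part
// Excerpt from the handler: `req` is the incoming Request, `supabaseAdmin` a service-role client.
const jwt = req.headers.get('Authorization')?.replace('Bearer ', '');
if (!jwt) return new Response('Unauthorized', { status: 401 });

const { data: { user } } = await supabaseAdmin.auth.getUser(jwt);
if (!user) return new Response('Unauthorized', { status: 401 });
if (user.app_metadata?.role !== 'annotator') return new Response('Forbidden', { status: 403 });

// Conditional update: only a row that is still pre-labeled and unassigned can match,
// so two annotators can never be assigned the same part.
const { data: part, error } = await supabaseAdmin
  .from('dataset_parts')
  .update({ annotator_id: user.id, annotation_status: 'in-progress' })
  .eq('annotation_status', 'pre-labeled')
  .is('annotator_id', null)
  .order('created_at', { ascending: true })
  .limit(1)
  .select('part_id')
  .maybeSingle();

if (error) return new Response(error.message, { status: 500 });
// No row claimed: the queue is empty, or a concurrent claim took the candidate row
if (!part) return new Response(null, { status: 204 });

return Response.json({ part_id: part.part_id });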
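The property the claim relies on is a compare-and-set: the UPDATE succeeds only if the row is still unclaimed, and a caller that loses the race updates nothing. The sketch below demonstrates this with Python's built-in sqlite3 standing in for Postgres, interleaving two claims by hand on one connection. It is a hypothetical illustration (oldest_unclaimed, try_claim, claim_next_part and the retry count are not part of the Edge Function). A common Postgres alternative is a function using SELECT ... FOR UPDATE SKIP LOCKED, called via rpc, which lets concurrent claimers skip rows being claimed instead of retrying.

```python
import sqlite3
from typing import Optional

# Candidate selection mirrors the Edge Function's filters and ordering.
SELECT_OLDEST = (
    "SELECT part_id FROM dataset_parts "
    "WHERE annotation_status = 'pre-labeled' AND annotator_id IS NULL "
    "ORDER BY created_at LIMIT 1"
)
# Compare-and-set: the WHERE clause re-checks that the row is still unclaimed.
CLAIM = (
    "UPDATE dataset_parts SET annotator_id = ?, annotation_status = 'in-progress' "
    "WHERE part_id = ? AND annotation_status = 'pre-labeled' AND annotator_id IS NULL"
)


def oldest_unclaimed(conn: sqlite3.Connection) -> Optional[str]:
    row = conn.execute(SELECT_OLDEST).fetchone()
    return row[0] if row else None


def try_claim(conn: sqlite3.Connection, annotator_id: str, part_id: str) -> bool:
    """True if this caller won the row; False if someone else claimed it first."""
    cur = conn.execute(CLAIM, (annotator_id, part_id))
    conn.commit()
    return cur.rowcount == 1


def claim_next_part(conn: sqlite3.Connection, annotator_id: str, attempts: int = 3) -> Optional[str]:
    """Claim the oldest queued part, retrying if another claim wins the race."""
    for _ in range(attempts):
        part_id = oldest_unclaimed(conn)
        if part_id is None:
            return None  # queue is empty
        if try_claim(conn, annotator_id, part_id):
            return part_id
    return None
```

If two annotators both read part-0001 as the oldest candidate, the first try_claim returns True and the second returns False without overwriting the first assignment; the loser's next claim_next_part call moves on to the next queued part.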

Presigned URL Authorization

The presigned URL Edge Function (see Annotation Pipeline) must verify the caller's identity and authorization before generating a URL:

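// GET /functions/v1/dataset-url?part_id=part-0001
// Excerpt from the handler: `req` is the incoming Request, `supabaseAdmin` a service-role client.
const jwt = req.headers.get('Authorization')?.replace('Bearer ', '');
if (!jwt) return new Response('Unauthorized', { status: 401 });

const { data: { user } } = await supabaseAdmin.auth.getUser(jwt);
if (!user) return new Response('Unauthorized', { status: 401 });

const role = user.app_metadata?.role;
if (role !== 'engineer' && role !== 'annotator') {
  return new Response('Forbidden', { status: 403 });
}

const partId = new URL(req.url).searchParams.get('part_id');
if (!partId) return new Response('Missing part_id', { status: 400 });

// Look up the blob to sign. Engineers can access any part;
// annotators only parts assigned to them.
let query = supabaseAdmin
  .from('dataset_parts')
  .select('blob_hash')
  .eq('part_id', partId);
if (role === 'annotator') query = query.eq('annotator_id', user.id);

const { data: part } = await query.maybeSingle();
// Unknown part, or (for annotators) a part assigned to someone else
if (!part) return new Response('Forbidden', { status: 403 });

// Generate presigned URL for part.blob_hash (1-hour expiry)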

R2 Bucket Credentials

R2 does not have per-object access policies. Access is controlled at the API token level:

| Token name | Permissions | Holders |
|---|---|---|
| mds-readwrite | Object Read + Write on machenit-dataset | Engineers (in ~/.mds/config.toml), Edge Functions (secrets) |
| mds-readonly | Object Read on machenit-dataset | CI pipelines (in GitHub Actions secrets) |

No R2 credentials are ever exposed to annotators or the browser. Annotators access blobs exclusively via time-limited presigned URLs (1-hour expiry).

Credential Management

Credential Inventory

| Credential | Where stored | Who has it | Rotation |
|---|---|---|---|
| Supabase anon key | Carbon frontend (public) | Everyone | Low risk; RLS enforces access |
| Supabase service_role key | Edge Function secrets, GitHub Actions secrets | Infrastructure only | Rotate if compromised |
| R2 read/write token | ~/.mds/config.toml on engineer machines, Edge Function secrets | Engineers, Edge Functions | On engineer departure |
| R2 read-only token | GitHub Actions secrets | CI pipelines | Annually or on CI change |
| Supabase auth tokens | ~/.mds/auth.json (auto-refreshed) | Individual engineers | Auto-refreshed (1hr access token) |

Onboarding

Engineer:

  1. Admin creates Supabase Auth account with app_metadata: {"role": "engineer"}
  2. Engineer installs mds CLI and runs mds init (writes template config with Supabase URL and R2 endpoint)
  3. Engineer receives R2 API keys via secure channel (1Password or equivalent) and adds them to ~/.mds/config.toml
  4. Engineer runs mds login with their Supabase email/password
  5. Verification: mds query --count returns the current part count

Annotator:

  1. Admin creates Supabase Auth account with app_metadata: {"role": "annotator"}
  2. Annotator receives the Carbon URL and login credentials
  3. Annotator logs in to Carbon and sees only pre-labeled parts in the annotation queue
  4. No CLI access, no R2 credentials, no direct database access

CI pipeline:

  1. Add secrets to GitHub Actions: MDS_SUPABASE_URL, MDS_SUPABASE_SERVICE_ROLE_KEY, MDS_R2_ENDPOINT, MDS_R2_ACCESS_KEY_ID, MDS_R2_SECRET_ACCESS_KEY
  2. The CI workflow exports these as env vars; the mds CLI skips interactive login when they are present
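The step that is easiest to get wrong in both account flows is where the role lands. A small helper, hypothetical and not part of mds, can build the attributes for the admin create-user call and reject unknown roles before the account is created:

```python
# Roles this document defines for human users (ci and service use keys, not accounts).
HUMAN_ROLES = frozenset({"engineer", "annotator"})


def new_user_attributes(email: str, password: str, role: str) -> dict:
    """Attributes for an admin create-user call, with the role in app_metadata.

    Validating the role matters: a typo such as "Engineer" would fail every
    RLS policy comparison and leave the account with no access.
    """
    if role not in HUMAN_ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(HUMAN_ROLES)}")
    return {
        "email": email,
        "password": password,
        # app_metadata is admin-writable only; never put the role in user_metadata.
        "app_metadata": {"role": role},
    }
```

With supabase-py the resulting dict would be passed to supabase.auth.admin.create_user() on a client created with the service role key; confirm the exact call against the client version in use.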

Offboarding

Engineer departure:

  1. Disable the Supabase Auth account. Token refresh is blocked immediately, but an access token that was already issued stays valid until it expires (up to 1 hour)
  2. Create a new R2 read/write token and distribute it to the remaining engineers
  3. Update Edge Function secrets with the new R2 token, then revoke the old token

Annotator departure:

  1. Disable Supabase Auth account
  2. Reassign in-progress parts, leaving completed work attributed to them: UPDATE dataset_parts SET annotator_id = NULL, annotation_status = 'pre-labeled' WHERE annotator_id = '<their-uuid>' AND annotation_status = 'in-progress'
  3. No R2 credential rotation needed — annotators never had R2 credentials

Phasing

Access and authentication work aligns with the overall implementation roadmap:

Phase 1 (Foundation): mds login / mds init commands, environment variable overrides for CI, two R2 API tokens (read/write + read-only), user_role() SQL function, engineer RLS policies on all tables, CI credentials in GitHub Actions.

Phase 2 (Annotation Integration): Annotator RLS policies, claim-next-part Edge Function, presigned URL authorization check, annotator onboarding process.

Phase 3 (Hardening): Audit logging for push/snapshot/annotation events, R2 key rotation automation, SSO/PKCE upgrade for CLI auth, per-engineer R2 credentials, rate limiting on presigned URL Edge Function.