CAD Dataset Infrastructure: Access & Authentication

Engineering Log Entry — March 2026

Access tiers, authentication flows, row-level security policies, and credential management for the dataset infrastructure.

The dataset infrastructure serves four classes of users with different trust levels. This page defines who can access what, how they authenticate, and how credentials are managed.

Access Tiers

| Tier | Actors | Auth Method | R2 Access | DB Access |
| --- | --- | --- | --- | --- |
| engineer | Internal engineers, ML researchers | Supabase Auth via CLI (`mds login`) | Direct boto3 (shared read/write R2 token) | Full read/write (RLS) |
| annotator | Offshore annotators | Supabase Auth via Carbon (browser) | Presigned URLs only (Edge Function) | Scoped to assigned parts (RLS) |
| ci | GitHub Actions pipelines | Service role key + env vars | Direct boto3 (read-only R2 token) | Read-only |
| service | Edge Functions, background workers | Supabase service role key | Direct boto3 (for presigned URL generation) | Full (server-side) |

Why engineers get direct R2 access: The mds CLI pushes and pulls thousands of STEP files in parallel. Routing every file through an Edge Function would add latency and complexity. Engineers are trusted internal team members — the R2 API key in ~/.mds/config.toml is scoped to the machenit-dataset bucket.

Why annotators do not get direct R2 access: Offshore annotators sit outside the internal trust boundary. They should see only the parts assigned to them, and only through Carbon's UI. The presigned URL mechanism (see Annotation Pipeline) is the mediation layer.

Authentication

CLI: Engineers and ML Researchers

The mds CLI authenticates against Supabase Auth. This gives individual identity for audit trails — the schema already has created_by UUID REFERENCES auth.users(id) on dataset_snapshots.

Auth flow:

mds login
  1. Prompt for email + password
  2. Call supabase.auth.sign_in_with_password({"email": email, "password": password})
  3. Receive access_token (JWT, 1hr expiry) + refresh_token
  4. Store tokens in ~/.mds/auth.json (file permissions 0600)
  5. On subsequent commands, use access_token for Supabase API calls
  6. If access_token expired, use refresh_token to get a new one
  7. If refresh_token expired, prompt for re-login

Auth token file (~/.mds/auth.json, permissions 0600):

{
  "access_token": "eyJ...",
  "refresh_token": "...",
  "expires_at": "2026-03-19T12:00:00Z",
  "user_id": "uuid-here",
  "email": "david@anvil.co"
}
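
The storage and expiry steps of the flow above could look roughly like the sketch below. It is an illustration under stated assumptions, not the `mds` implementation: `save_tokens`, `needs_refresh`, and the 60-second `REFRESH_MARGIN` are hypothetical names and values, and the actual refresh call to Supabase is left out.

```python
import json
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Assumption: refresh slightly before the stated expiry to absorb clock skew.
REFRESH_MARGIN = timedelta(seconds=60)


def save_tokens(path: Path, tokens: dict) -> None:
    """Write the token file with owner-only permissions (0600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 from the start rather than chmod-ing later.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f, indent=2)
    # Also tighten a pre-existing file that had looser permissions.
    os.chmod(path, 0o600)


def needs_refresh(tokens: dict, now: Optional[datetime] = None) -> bool:
    """True when the access token is expired or within REFRESH_MARGIN of expiry."""
    now = now or datetime.now(timezone.utc)
    expires_at = datetime.fromisoformat(tokens["expires_at"].replace("Z", "+00:00"))
    return now >= expires_at - REFRESH_MARGIN
```

A command wrapper would call `needs_refresh` before each Supabase request and fall back to re-login only when the refresh itself fails.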

R2 credentials live in ~/.mds/config.toml (see Dataset CLI). The R2 token does not carry individual identity — that comes from the Supabase JWT. The audit trail is: Supabase JWT identifies who, R2 access key enables what (blob upload/download).

Carbon: Annotators

Carbon already uses Supabase Auth — no new mechanism needed. Annotators are created as Supabase Auth users with role set in their app_metadata (see Role Encoding below). They log in through Carbon's existing login flow and interact with STEP files only via presigned URLs.

CI/CD: GitHub Actions

CI pipelines authenticate via environment variables. The mds CLI detects these and skips interactive login:

- name: Pull benchmark parts
  env:
    MDS_SUPABASE_URL: ${{ secrets.MDS_SUPABASE_URL }}
    MDS_SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.MDS_SUPABASE_SERVICE_ROLE_KEY }}
    MDS_R2_ENDPOINT: ${{ secrets.MDS_R2_ENDPOINT }}
    MDS_R2_ACCESS_KEY_ID: ${{ secrets.MDS_R2_ACCESS_KEY_ID }}
    MDS_R2_SECRET_ACCESS_KEY: ${{ secrets.MDS_R2_SECRET_ACCESS_KEY }}
  run: |
    mds pull --snapshot v003 --part-ids-from tests/cam_benchmark/parts.json --output assets/parts/

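CI uses the read-only R2 token: it can pull parts but cannot push new files or modify the bucket. The Supabase service role key, by contrast, bypasses RLS, so read-only database access in CI is a convention of the pipeline rather than something the key enforces.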
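
One way the CLI might detect the CI environment is sketched below. The variable names come from the workflow above; the function name `ci_credentials` and the choice to fail loudly on a partially configured environment are assumptions, not documented `mds` behavior.

```python
import os
from typing import Mapping, Optional

CI_ENV_VARS = (
    "MDS_SUPABASE_URL",
    "MDS_SUPABASE_SERVICE_ROLE_KEY",
    "MDS_R2_ENDPOINT",
    "MDS_R2_ACCESS_KEY_ID",
    "MDS_R2_SECRET_ACCESS_KEY",
)


def ci_credentials(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return CI credentials from the environment, or None to use interactive login."""
    env = os.environ if env is None else env
    values = {name: env.get(name, "") for name in CI_ENV_VARS}
    missing = [name for name, value in values.items() if not value]
    if len(missing) == len(CI_ENV_VARS):
        return None  # nothing set: this is an engineer machine
    if missing:
        # Assumption: a half-configured pipeline should fail, not prompt for a password.
        raise RuntimeError("Missing CI variables: " + ", ".join(missing))
    return values
```

Failing on a partial set keeps a misconfigured pipeline from hanging on an interactive prompt that nobody will answer.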

Role Encoding

Roles are stored in Supabase Auth's app_metadata field, set by an admin via the Supabase dashboard or admin API:

{ "role": "engineer" }

Why app_metadata and not user_metadata: users can modify their own user_metadata through supabase.auth.updateUser(), so an annotator could escalate their own role. app_metadata cannot be written from the client side; it changes only through the admin API or the dashboard.

A SQL helper function extracts the role from the JWT for use in RLS policies:

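-- Returns the caller's role from the JWT, or 'anonymous' when no role claim is present.
CREATE OR REPLACE FUNCTION public.user_role()
RETURNS TEXT AS $$
  SELECT COALESCE(
    (select auth.jwt()) -> 'app_metadata' ->> 'role',
    'anonymous'
  );
-- Pin search_path because the function runs as SECURITY DEFINER.
$$ LANGUAGE sql STABLE SECURITY DEFINER SET search_path = '';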
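
For debugging a token locally, the same lookup can be mirrored in Python. This is a sketch only: `decode_claims_unverified` and the Python `user_role` are hypothetical helpers, and the decoder does not verify the signature, so it must never be used to make an access decision.

```python
import base64
import json


def decode_claims_unverified(jwt: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def user_role(claims: dict) -> str:
    """Mirror of the SQL helper: app_metadata.role, else 'anonymous'."""
    role = (claims.get("app_metadata") or {}).get("role")
    return role if role is not None else "anonymous"
```

A role placed in `user_metadata` is ignored by this lookup, which is the property the RLS policies rely on.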

Upgrade path

If the role model becomes more complex (e.g., per-project roles, multiple roles per user), migrate to a user_roles table with a Supabase custom access token hook. The hook injects role data into the JWT at issuance time, keeping RLS policies unchanged. See the Supabase custom claims docs for this pattern.

Authorization: RLS Policies

All dataset_* tables have Row-Level Security enabled. The policies below call (select auth.uid()) and (select user_role()) rather than the bare functions. Wrapping each call in a scalar subquery lets Postgres evaluate it once per statement and reuse the result instead of calling the function for every row, which is critical for performance on large tables.

`dataset_parts`

ALTER TABLE dataset_parts ENABLE ROW LEVEL SECURITY;

-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_parts
  FOR ALL
  TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read parts assigned to them, or unassigned pre-labeled parts
CREATE POLICY "annotators_read_parts" ON dataset_parts
  FOR SELECT
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND (
      annotator_id = (select auth.uid())
      OR (annotation_status = 'pre-labeled' AND annotator_id IS NULL)
    )
  );

-- Annotators: update only their assigned parts
CREATE POLICY "annotators_update_parts" ON dataset_parts
  FOR UPDATE
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  )
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  );
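
To reason about what these three policies allow, it can help to restate them as plain predicates. The sketch below is a hypothetical test oracle, not part of the schema: `Part`, `can_select`, and `can_update` are illustrative names, and only the columns the policies reference are modeled. For updates, the USING clause is checked against the existing row and WITH CHECK against the new row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    part_id: str
    annotation_status: str
    annotator_id: Optional[str]


def can_select(role: str, uid: str, part: Part) -> bool:
    """engineers_full_access OR annotators_read_parts."""
    if role == "engineer":
        return True
    if role == "annotator":
        unassigned_prelabeled = (
            part.annotation_status == "pre-labeled" and part.annotator_id is None
        )
        return part.annotator_id == uid or unassigned_prelabeled
    return False


def can_update(role: str, uid: str, old: Part, new: Part) -> bool:
    """engineers_full_access OR annotators_update_parts (USING on old, WITH CHECK on new)."""
    if role == "engineer":
        return True
    if role == "annotator":
        return old.annotator_id == uid and new.annotator_id == uid
    return False
```

Read this way, an annotator can see an unassigned pre-labeled part but cannot assign it to themselves, and cannot hand one of their own parts to someone else. Both transitions have to go through server-side code.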

`dataset_annotations`

ALTER TABLE dataset_annotations ENABLE ROW LEVEL SECURITY;

-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_annotations
  FOR ALL
  TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read their own annotations + model pre-labels for their parts
CREATE POLICY "annotators_read_annotations" ON dataset_annotations
  FOR SELECT
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND (
      annotator_id = (select auth.uid())
      OR (model_assisted = true AND part_id IN (
        SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
      ))
    )
  );

-- Annotators: insert annotations only for their assigned parts
CREATE POLICY "annotators_insert_annotations" ON dataset_annotations
  FOR INSERT
  TO authenticated
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
    AND part_id IN (
      SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
    )
  );

-- Annotators: update only their own annotations
CREATE POLICY "annotators_update_annotations" ON dataset_annotations
  FOR UPDATE
  TO authenticated
  USING (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  )
  WITH CHECK (
    (select user_role()) = 'annotator'
    AND annotator_id = (select auth.uid())
  );

Versioning Tables (Engineer-Only)

Annotators have no visibility into snapshots, pins, or versioning internals.

ALTER TABLE dataset_snapshots ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_snapshot_parts ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_pins ENABLE ROW LEVEL SECURITY;

CREATE POLICY "engineers_full_access" ON dataset_snapshots
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

CREATE POLICY "engineers_full_access" ON dataset_snapshot_parts
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

CREATE POLICY "engineers_full_access" ON dataset_pins
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

`tag_definitions`

ALTER TABLE tag_definitions ENABLE ROW LEVEL SECURITY;

CREATE POLICY "engineers_full_access" ON tag_definitions
  FOR ALL TO authenticated
  USING ((select user_role()) = 'engineer')
  WITH CHECK ((select user_role()) = 'engineer');

-- Annotators: read-only (for Carbon UI autocomplete)
CREATE POLICY "annotators_read_tags" ON tag_definitions
  FOR SELECT TO authenticated
  USING ((select user_role()) = 'annotator');

Part Claiming

When an annotator requests the next part to annotate, Carbon calls a server-side Edge Function rather than allowing the annotator to self-assign. This prevents race conditions when multiple annotators work simultaneously and ensures annotators cannot pick arbitrary parts.

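// POST /functions/v1/claim-next-part
const { data: { user } } = await supabase.auth.getUser(
  req.headers.get('Authorization')?.replace('Bearer ', '')
);
if (!user) return new Response('Unauthorized', { status: 401 });
if (user.app_metadata?.role !== 'annotator') return new Response('Forbidden', { status: 403 });

// Claim the oldest unassigned pre-labeled part in a single UPDATE.
const { data: part, error } = await supabaseAdmin
  .from('dataset_parts')
  .update({ annotator_id: user.id, annotation_status: 'in-progress' })
  .eq('annotation_status', 'pre-labeled')
  .is('annotator_id', null)
  .order('created_at', { ascending: true })
  .order('part_id', { ascending: true }) // tie-break for a total order; assumes part_id is unique
  .limit(1)
  .select()
  .maybeSingle(); // null instead of an error when the queue is empty

if (error) return new Response('Claim failed', { status: 500 });

// part_id is null when there is nothing left to claim.
return Response.json({ part_id: part?.part_id ?? null });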
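
The property that matters here is that finding the next part and assigning it happen as one atomic step, so two annotators never receive the same part. The sketch below illustrates that property with Python's built-in `sqlite3` as a stand-in for Postgres; it is not the Edge Function. `open_queue` and `claim_next_part` are hypothetical names, and the table holds only the columns the claim touches.

```python
import sqlite3
from typing import Optional


def open_queue(rows) -> sqlite3.Connection:
    """In-memory stand-in for dataset_parts."""
    # isolation_level=None lets us control transactions explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE dataset_parts (part_id TEXT PRIMARY KEY, created_at TEXT, "
        "annotation_status TEXT, annotator_id TEXT)"
    )
    conn.executemany("INSERT INTO dataset_parts VALUES (?, ?, ?, ?)", rows)
    return conn


def claim_next_part(conn: sqlite3.Connection, annotator_id: str) -> Optional[str]:
    """Assign the oldest unclaimed pre-labeled part; None when the queue is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT part_id FROM dataset_parts "
            "WHERE annotation_status = 'pre-labeled' AND annotator_id IS NULL "
            "ORDER BY created_at, part_id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE dataset_parts "
                "SET annotator_id = ?, annotation_status = 'in-progress' "
                "WHERE part_id = ?",
                (annotator_id, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row is not None else None
```

Because the lock is taken before the read, a second caller waits until the first claim commits and then sees the part as already assigned.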

Presigned URL Authorization

The presigned URL Edge Function (see Annotation Pipeline) must verify the caller's identity and authorization before generating a URL:

// GET /functions/v1/dataset-url?part_id=part-0001
const { data: { user } } = await supabase.auth.getUser(
  req.headers.get('Authorization')?.replace('Bearer ', '')
);
if (!user) return new Response('Unauthorized', { status: 401 });

// Read the requested part from the query string.
const partId = new URL(req.url).searchParams.get('part_id');
if (!partId) return new Response('Missing part_id', { status: 400 });

const role = user.app_metadata?.role;

if (role === 'engineer') {
  // Engineers can access any blob
} else if (role === 'annotator') {
  // Verify the part is assigned to this annotator
  const { data: part } = await supabaseAdmin
    .from('dataset_parts')
    .select('blob_hash')
    .eq('part_id', partId)
    .eq('annotator_id', user.id)
    .maybeSingle(); // null when the part is missing or assigned to someone else

  if (!part) return new Response('Forbidden', { status: 403 });
} else {
  return new Response('Forbidden', { status: 403 });
}

// Generate presigned URL (1-hour expiry)
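
The access decision above reduces to a small table, which makes it easy to unit-test separately from the HTTP handler. The function below is a hypothetical restatement for that purpose; `presign_status` is not a name from the codebase, and it returns the HTTP status the handler would produce before any URL is generated.

```python
from typing import Optional


def presign_status(
    role: Optional[str],
    user_id: Optional[str],
    part_annotator_id: Optional[str],
) -> int:
    """HTTP status for a presigned-URL request: 200 means a URL may be issued."""
    if user_id is None:
        return 401  # no valid session
    if role == "engineer":
        return 200  # engineers can access any blob
    if role == "annotator" and part_annotator_id == user_id:
        return 200  # annotators only for parts assigned to them
    return 403  # unassigned part, someone else's part, or unknown role
```

Note that an unassigned part is forbidden to annotators here even though the read policy lets them see its row; the blob becomes reachable only after the part is claimed.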

R2 Bucket Credentials

R2 does not have per-object access policies. Access is controlled at the API token level:

| Token Name | Permissions | Holders |
| --- | --- | --- |
| `mds-readwrite` | Object Read + Write on `machenit-dataset` | Engineers (in `~/.mds/config.toml`), Edge Functions (secrets) |
| `mds-readonly` | Object Read on `machenit-dataset` | CI pipelines (in GitHub Actions secrets) |

No R2 credentials are ever exposed to annotators or the browser. Annotators access blobs exclusively via time-limited presigned URLs (1-hour expiry).

Credential Management

Credential Inventory

| Credential | Where Stored | Who Has It | Rotation |
| --- | --- | --- | --- |
| Supabase `anon` key | Carbon frontend (public) | Everyone | Not rotated routinely; low risk because RLS enforces access |
| Supabase `service_role` key | Edge Function secrets, GitHub Actions secrets | Infrastructure only | Rotate if compromised |
| R2 read/write token | `~/.mds/config.toml` on engineer machines, Edge Function secrets | Engineers, Edge Functions | On engineer departure |
| R2 read-only token | GitHub Actions secrets | CI pipelines | Annually or on CI change |
| Supabase auth tokens | `~/.mds/auth.json` (auto-refreshed) | Individual engineers | Auto-refreshed (1hr access token) |

Onboarding

Engineer:

Admin creates Supabase Auth account with app_metadata: {"role": "engineer"}
Engineer installs mds CLI and runs mds init (writes template config with Supabase URL and R2 endpoint)
Engineer receives R2 API keys via secure channel (1Password or equivalent) and adds to ~/.mds/config.toml
Engineer runs mds login with their Supabase email/password
Verification: mds query --count returns current part count

Annotator:

Admin creates Supabase Auth account with app_metadata: {"role": "annotator"}
Annotator receives Carbon URL and login credentials
Annotator logs in to Carbon and sees only pre-labeled parts in the annotation queue
No CLI access, no R2 credentials, no direct database access

CI pipeline:

Add secrets to GitHub Actions: MDS_SUPABASE_URL, MDS_SUPABASE_SERVICE_ROLE_KEY, MDS_R2_ENDPOINT, MDS_R2_ACCESS_KEY_ID, MDS_R2_SECRET_ACCESS_KEY
CI workflow uses env vars; the mds CLI skips interactive login when these are present

Offboarding

Engineer departure:

Disable Supabase Auth account. New logins and token refreshes are blocked immediately; an access token that was already issued stays valid until it expires (up to 1 hour)
Rotate the R2 read/write token and distribute the new key to remaining engineers
Update Edge Function secrets with the new R2 token

Annotator departure:

Disable Supabase Auth account
Reassign in-progress parts: UPDATE dataset_parts SET annotator_id = NULL, annotation_status = 'pre-labeled' WHERE annotator_id = '<their-uuid>' AND annotation_status = 'in-progress'
No R2 credential rotation needed — annotators never had R2 credentials
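
The reassignment step can be checked against a throwaway database before running it in production. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres and assumes that only in-progress parts should return to the queue, as the step's wording suggests. `release_in_progress_parts` is a hypothetical helper, and the `'annotated'` status used in the usage example is an assumed value for finished work.

```python
import sqlite3


def release_in_progress_parts(conn: sqlite3.Connection, annotator_id: str) -> int:
    """Return a departing annotator's unfinished parts to the queue; returns rows changed."""
    cur = conn.execute(
        "UPDATE dataset_parts "
        "SET annotator_id = NULL, annotation_status = 'pre-labeled' "
        "WHERE annotator_id = ? AND annotation_status = 'in-progress'",
        (annotator_id,),
    )
    conn.commit()
    return cur.rowcount


# Usage against an in-memory table with one unfinished and one finished part.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_parts (part_id TEXT PRIMARY KEY, "
    "annotation_status TEXT, annotator_id TEXT)"
)
conn.executemany(
    "INSERT INTO dataset_parts VALUES (?, ?, ?)",
    [("part-0001", "in-progress", "u1"), ("part-0002", "annotated", "u1")],
)
released = release_in_progress_parts(conn, "u1")
```

Filtering on the status keeps finished work attributed to the annotator who did it, which preserves the audit trail.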

Phasing

Access and authentication work aligns with the overall implementation roadmap:

Phase 1 (Foundation): mds login / mds init commands, environment variable overrides for CI, two R2 API tokens (read/write + read-only), user_role() SQL function, engineer RLS policies on all tables, CI credentials in GitHub Actions.

Phase 2 (Annotation Integration): Annotator RLS policies, claim-next-part Edge Function, presigned URL authorization check, annotator onboarding process.

Phase 3 (Hardening): Audit logging for push/snapshot/annotation events, R2 key rotation automation, SSO/PKCE upgrade for CLI auth, per-engineer R2 credentials, rate limiting on presigned URL Edge Function.