CAD Dataset Infrastructure: Access & Authentication
Engineering Log Entry — March 2026
Access tiers, authentication flows, row-level security policies, and credential management for the dataset infrastructure.
The dataset infrastructure serves four classes of users with different trust levels. This page defines who can access what, how they authenticate, and how credentials are managed.
Access Tiers
| Tier | Actors | Auth Method | R2 Access | DB Access |
|---|---|---|---|---|
| engineer | Internal engineers, ML researchers | Supabase Auth via CLI (mds login) | Direct boto3 (shared read/write R2 token) | Full read/write (RLS) |
| annotator | Offshore annotators | Supabase Auth via Carbon (browser) | Presigned URLs only (Edge Function) | Scoped to assigned parts (RLS) |
| ci | GitHub Actions pipelines | Service role key + env vars | Direct boto3 (read-only R2 token) | Read-only by convention (the service role key bypasses RLS) |
| service | Edge Functions, background workers | Supabase service role key | Direct boto3 (for presigned URL generation) | Full (server-side) |
Why engineers get direct R2 access: The mds CLI pushes and pulls thousands of STEP files in parallel. Routing every file through an Edge Function would add latency and complexity. Engineers are trusted internal team members — the R2 API key in ~/.mds/config.toml is scoped to the machenit-dataset bucket.
Why annotators do not get direct R2 access: Offshore annotators are a trust boundary. They should only see parts assigned to them, and only through Carbon's UI. The presigned URL mechanism (see Annotation Pipeline) is the mediation layer.
Authentication
CLI: Engineers and ML Researchers
The mds CLI authenticates against Supabase Auth. This gives individual identity for audit trails — the schema already has created_by UUID REFERENCES auth.users(id) on dataset_snapshots.
Auth flow:
mds login
1. Prompt for email + password
2. Call supabase.auth.sign_in_with_password(email, password)
3. Receive access_token (JWT, 1hr expiry) + refresh_token
4. Store tokens in ~/.mds/auth.json (file permissions 0600)
5. On subsequent commands, use access_token for Supabase API calls
6. If access_token expired, use refresh_token to get a new one
7. If refresh_token expired, prompt for re-login
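Steps 5 to 7 reduce to a small decision made before each command. The following is a minimal sketch, not the actual mds implementation: the function name `next_auth_action`, the 60-second clock-skew margin, and the returned action strings are assumptions for illustration. The token field names follow the stored token file.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(expires_at: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11
    return datetime.fromisoformat(expires_at.replace("Z", "+00:00"))


def next_auth_action(tokens, now, skew=timedelta(seconds=60)):
    """Return "login", "refresh", or "use_token" (hypothetical action names)."""
    if not tokens or "refresh_token" not in tokens:
        return "login"  # nothing stored yet: interactive login required
    if now + skew >= parse_expiry(tokens["expires_at"]):
        return "refresh"  # access token expired or about to expire
    return "use_token"


tokens = {
    "access_token": "eyJ...",
    "refresh_token": "...",
    "expires_at": "2026-03-19T12:00:00Z",
}
print(next_auth_action(tokens, datetime(2026, 3, 19, 11, 0, tzinfo=timezone.utc)))
```

If the refresh call itself is rejected (step 7), the caller would fall back to the same interactive login path.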
Auth token file (~/.mds/auth.json, permissions 0600):
{
"access_token": "eyJ...",
"refresh_token": "...",
"expires_at": "2026-03-19T12:00:00Z",
"user_id": "uuid-here",
"email": "david@anvil.co"
}
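Writing this file with 0600 permissions can be sketched as follows. This assumes a POSIX filesystem; `save_tokens` and `load_tokens` are hypothetical helper names, not confirmed parts of the mds CLI.

```python
import json
import os
from pathlib import Path


def save_tokens(path: Path, tokens: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with mode 0o600 so the file is never briefly world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f, indent=2)
    # A pre-existing file keeps its old mode, so tighten it explicitly
    os.chmod(path, 0o600)


def load_tokens(path: Path):
    """Return the stored tokens, or None if the user has never logged in."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return None
```

Passing the mode to `os.open` rather than calling `chmod` after a plain `open` avoids a window in which another local user could read a freshly created token file.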
R2 credentials live in ~/.mds/config.toml (see Dataset CLI). The R2 token does not carry individual identity — that comes from the Supabase JWT. The audit trail is: Supabase JWT identifies who, R2 access key enables what (blob upload/download).
Carbon: Annotators
Carbon already uses Supabase Auth — no new mechanism needed. Annotators are created as Supabase Auth users with role set in their app_metadata (see Role Encoding below). They log in through Carbon's existing login flow and interact with STEP files only via presigned URLs.
CI/CD: GitHub Actions
CI pipelines authenticate via environment variables. The mds CLI detects these and skips interactive login:
- name: Pull benchmark parts
env:
MDS_SUPABASE_URL: ${{ secrets.MDS_SUPABASE_URL }}
MDS_SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.MDS_SUPABASE_SERVICE_ROLE_KEY }}
MDS_R2_ENDPOINT: ${{ secrets.MDS_R2_ENDPOINT }}
MDS_R2_ACCESS_KEY_ID: ${{ secrets.MDS_R2_ACCESS_KEY_ID }}
MDS_R2_SECRET_ACCESS_KEY: ${{ secrets.MDS_R2_SECRET_ACCESS_KEY }}
run: |
mds pull --snapshot v003 --part-ids-from tests/cam_benchmark/parts.json --output assets/parts/
CI uses a read-only R2 token — it can pull parts but cannot push new files or modify the bucket.
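The detection step could look roughly like the sketch below. The variable names come from the workflow above. The function name `ci_credentials`, and the choice to fail loudly when only some of the variables are set, are assumptions rather than documented mds behavior.

```python
import os

CI_ENV_VARS = (
    "MDS_SUPABASE_URL",
    "MDS_SUPABASE_SERVICE_ROLE_KEY",
    "MDS_R2_ENDPOINT",
    "MDS_R2_ACCESS_KEY_ID",
    "MDS_R2_SECRET_ACCESS_KEY",
)


def ci_credentials(environ=os.environ):
    """Return CI credentials as a dict, or None to fall back to interactive login."""
    values = {name: environ.get(name, "") for name in CI_ENV_VARS}
    if not any(values.values()):
        return None  # no CI variables set: normal engineer workflow
    missing = [name for name in CI_ENV_VARS if not values[name]]
    if missing:
        # Assumed design choice: a half-configured pipeline should fail early
        raise RuntimeError("incomplete CI credentials, missing: " + ", ".join(missing))
    return values
```

Failing on a partial set avoids a pipeline silently hanging on an interactive prompt when one secret is misnamed.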
Role Encoding
Roles are stored in Supabase Auth's app_metadata field, set by an admin via the Supabase dashboard or admin API, for example app_metadata: {"role": "engineer"} or app_metadata: {"role": "annotator"}.
Why app_metadata and not user_metadata: user_metadata can be modified by the user themselves via supabase.auth.updateUser(), so an annotator could escalate their own role. app_metadata cannot be modified from the client and can only be changed through the admin API or dashboard.
A SQL helper function extracts the role from the JWT for use in RLS policies:
CREATE OR REPLACE FUNCTION public.user_role()
RETURNS TEXT AS $$
SELECT COALESCE(
(select auth.jwt()) -> 'app_metadata' ->> 'role',
'anonymous'
);
$$ LANGUAGE sql STABLE SECURITY DEFINER;
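For unit tests that run outside the database, the same lookup can be mirrored in a few lines. This is an illustrative sketch operating on already-verified JWT claims; the Python function is hypothetical and is not part of the schema.

```python
def user_role(claims):
    """Mirror of the SQL helper: app_metadata.role, defaulting to 'anonymous'."""
    app_metadata = claims.get("app_metadata") if isinstance(claims, dict) else None
    role = app_metadata.get("role") if isinstance(app_metadata, dict) else None
    return "anonymous" if role is None else role


# A role placed in user_metadata is ignored, so self-assigned roles have no effect
print(user_role({"app_metadata": {"role": "annotator"}}))
print(user_role({"user_metadata": {"role": "engineer"}}))
```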
Upgrade path
If the role model becomes more complex (e.g., per-project roles, multiple roles per user), migrate to a user_roles table with a Supabase custom access token hook. The hook injects role data into the JWT at issuance time, keeping RLS policies unchanged. See the Supabase custom claims docs for this pattern.
Authorization: RLS Policies
All dataset_* tables have Row-Level Security enabled. The policies below use (select auth.uid()) and (select user_role()) — wrapping in SELECT tells Postgres to evaluate once per query rather than per row, which is critical for performance on large tables.
dataset_parts
ALTER TABLE dataset_parts ENABLE ROW LEVEL SECURITY;
-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_parts
FOR ALL
TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
-- Annotators: read parts assigned to them, or unassigned pre-labeled parts
CREATE POLICY "annotators_read_parts" ON dataset_parts
FOR SELECT
TO authenticated
USING (
(select user_role()) = 'annotator'
AND (
annotator_id = (select auth.uid())
OR (annotation_status = 'pre-labeled' AND annotator_id IS NULL)
)
);
-- Annotators: update only their assigned parts
CREATE POLICY "annotators_update_parts" ON dataset_parts
FOR UPDATE
TO authenticated
USING (
(select user_role()) = 'annotator'
AND annotator_id = (select auth.uid())
)
WITH CHECK (
(select user_role()) = 'annotator'
AND annotator_id = (select auth.uid())
);
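The combined effect of the three dataset_parts policies can be written as a small truth-table model, which may be useful as a test oracle. It is a simplified sketch: permissive policies are OR-ed together, USING is applied to the existing row and WITH CHECK to the new row, and the function names are hypothetical.

```python
def can_select_part(role, uid, part):
    if role == "engineer":
        return True
    if role == "annotator":
        assigned_to_me = part["annotator_id"] == uid
        unclaimed = part["annotation_status"] == "pre-labeled" and part["annotator_id"] is None
        return assigned_to_me or unclaimed
    return False


def can_update_part(role, uid, old, new):
    if role == "engineer":
        return True
    if role == "annotator":
        # USING checks the row before the update, WITH CHECK the row after it
        return old["annotator_id"] == uid and new["annotator_id"] == uid
    return False
```

One consequence visible in the model: an annotator can see an unclaimed pre-labeled part but cannot update it, so assignment has to go through a server-side path.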
dataset_annotations
ALTER TABLE dataset_annotations ENABLE ROW LEVEL SECURITY;
-- Engineers: full read/write
CREATE POLICY "engineers_full_access" ON dataset_annotations
FOR ALL
TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
-- Annotators: read their own annotations + model pre-labels for their parts
CREATE POLICY "annotators_read_annotations" ON dataset_annotations
FOR SELECT
TO authenticated
USING (
(select user_role()) = 'annotator'
AND (
annotator_id = (select auth.uid())
OR (model_assisted = true AND part_id IN (
SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
))
)
);
-- Annotators: insert annotations only for their assigned parts
CREATE POLICY "annotators_insert_annotations" ON dataset_annotations
FOR INSERT
TO authenticated
WITH CHECK (
(select user_role()) = 'annotator'
AND annotator_id = (select auth.uid())
AND part_id IN (
SELECT part_id FROM dataset_parts WHERE annotator_id = (select auth.uid())
)
);
-- Annotators: update only their own annotations
CREATE POLICY "annotators_update_annotations" ON dataset_annotations
FOR UPDATE
TO authenticated
USING (
(select user_role()) = 'annotator'
AND annotator_id = (select auth.uid())
)
WITH CHECK (
(select user_role()) = 'annotator'
AND annotator_id = (select auth.uid())
);
Versioning Tables (Engineer-Only)
Annotators have no visibility into snapshots, pins, or versioning internals.
ALTER TABLE dataset_snapshots ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_snapshot_parts ENABLE ROW LEVEL SECURITY;
ALTER TABLE dataset_pins ENABLE ROW LEVEL SECURITY;
CREATE POLICY "engineers_full_access" ON dataset_snapshots
FOR ALL TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
CREATE POLICY "engineers_full_access" ON dataset_snapshot_parts
FOR ALL TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
CREATE POLICY "engineers_full_access" ON dataset_pins
FOR ALL TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
tag_definitions
ALTER TABLE tag_definitions ENABLE ROW LEVEL SECURITY;
CREATE POLICY "engineers_full_access" ON tag_definitions
FOR ALL TO authenticated
USING ((select user_role()) = 'engineer')
WITH CHECK ((select user_role()) = 'engineer');
-- Annotators: read-only (for Carbon UI autocomplete)
CREATE POLICY "annotators_read_tags" ON tag_definitions
FOR SELECT TO authenticated
USING ((select user_role()) = 'annotator');
Part Claiming
When an annotator requests the next part to annotate, Carbon calls a server-side Edge Function rather than allowing the annotator to self-assign. This prevents race conditions when multiple annotators work simultaneously and ensures annotators cannot pick arbitrary parts.
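// POST /functions/v1/claim-next-part
const { data: { user } } = await supabase.auth.getUser(
  req.headers.get('Authorization')?.replace('Bearer ', '')
);
if (!user) return new Response('Unauthorized', { status: 401 });
if (user.app_metadata?.role !== 'annotator') return new Response('Forbidden', { status: 403 });
// Claim the oldest unassigned pre-labeled part for this annotator
const { data: part, error } = await supabaseAdmin
  .from('dataset_parts')
  .update({ annotator_id: user.id, annotation_status: 'in-progress' })
  .eq('annotation_status', 'pre-labeled')
  .is('annotator_id', null)
  .order('created_at', { ascending: true })
  .limit(1)
  .select()
  .maybeSingle();
if (error) return new Response('Claim failed', { status: 500 });
// maybeSingle() yields null when the queue is empty
if (!part) return new Response('No parts available', { status: 404 });
return Response.json({ part_id: part.part_id });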
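The race-safety of a claim comes from making the update conditional on the row still being unassigned. The sketch below illustrates that pattern with SQLite from the Python standard library as a stand-in for Postgres; the column names come from the policies above, and the helper name `claim_next_part` is an assumption. In Postgres the same idea is often expressed as a single statement using FOR UPDATE SKIP LOCKED.

```python
import sqlite3


def claim_next_part(conn, annotator_id):
    """Claim the oldest unassigned pre-labeled part; return its id or None."""
    row = conn.execute(
        "SELECT part_id FROM dataset_parts "
        "WHERE annotation_status = 'pre-labeled' AND annotator_id IS NULL "
        "ORDER BY created_at ASC, part_id ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue is empty
    with conn:  # commit the update, or roll back on error
        cur = conn.execute(
            "UPDATE dataset_parts "
            "SET annotator_id = ?, annotation_status = 'in-progress' "
            "WHERE part_id = ? AND annotator_id IS NULL",
            (annotator_id, row[0]),
        )
    # rowcount 0 means another worker claimed the row after our SELECT
    return row[0] if cur.rowcount == 1 else None
```

A caller that receives None while parts remain can simply retry; the guard in the WHERE clause ensures a part is never handed to two annotators.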
Presigned URL Authorization
The presigned URL Edge Function (see Annotation Pipeline) must verify the caller's identity and authorization before generating a URL:
// GET /functions/v1/dataset-url?part_id=part-0001
const { data: { user } } = await supabase.auth.getUser(
  req.headers.get('Authorization')?.replace('Bearer ', '')
);
if (!user) return new Response('Unauthorized', { status: 401 });
// Read the requested part from the query string
const partId = new URL(req.url).searchParams.get('part_id');
if (!partId) return new Response('Missing part_id', { status: 400 });
const role = user.app_metadata?.role;
if (role === 'engineer') {
  // Engineers can access any blob
} else if (role === 'annotator') {
  // Verify the part is assigned to this annotator
  const { data: part } = await supabaseAdmin
    .from('dataset_parts')
    .select('blob_hash')
    .eq('part_id', partId)
    .eq('annotator_id', user.id)
    .maybeSingle();
  if (!part) return new Response('Forbidden', { status: 403 });
} else {
  return new Response('Forbidden', { status: 403 });
}
// Generate presigned URL (1-hour expiry)
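The decision the function makes before signing can be captured as a pure function, which keeps the status-code rules easy to test. This is a hedged sketch: `authorize_blob_request` is a hypothetical name, and `part_annotator_id` stands for the annotator_id of the requested row in dataset_parts (None if the part is unassigned or does not exist).

```python
def authorize_blob_request(user, part_annotator_id):
    """Return the HTTP status to answer with before generating a presigned URL."""
    if user is None:
        return 401  # no valid JWT
    role = (user.get("app_metadata") or {}).get("role")
    if role == "engineer":
        return 200  # engineers can access any blob
    if role == "annotator" and part_annotator_id is not None:
        # annotators only get URLs for parts assigned to them
        return 200 if part_annotator_id == user["id"] else 403
    return 403
```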
R2 Bucket Credentials
R2 does not have per-object access policies. Access is controlled at the API token level:
| Token Name | Permissions | Holders |
|---|---|---|
| mds-readwrite | Object Read + Write on machenit-dataset | Engineers (in ~/.mds/config.toml), Edge Functions (secrets) |
| mds-readonly | Object Read on machenit-dataset | CI pipelines (in GitHub Actions secrets) |
No R2 credentials are ever exposed to annotators or the browser. Annotators access blobs exclusively via time-limited presigned URLs (1-hour expiry).
Credential Management
Credential Inventory
| Credential | Where Stored | Who Has It | Rotation |
|---|---|---|---|
| Supabase anon key | Carbon frontend (public) | Everyone | Low risk; RLS enforces access |
| Supabase service_role key | Edge Function secrets, GitHub Actions secrets | Infrastructure only | Rotate if compromised |
| R2 read/write token | ~/.mds/config.toml on engineer machines | Engineers, Edge Functions | On engineer departure |
| R2 read-only token | GitHub Actions secrets | CI pipelines | Annually or on CI change |
| Supabase auth tokens | ~/.mds/auth.json (auto-refreshed) | Individual engineers | Auto-refreshed (1hr access token) |
Onboarding
Engineer:
- Admin creates Supabase Auth account with app_metadata: {"role": "engineer"}
- Engineer installs mds CLI and runs mds init (writes template config with Supabase URL and R2 endpoint)
- Engineer receives R2 API keys via secure channel (1Password or equivalent) and adds them to ~/.mds/config.toml
- Engineer runs mds login with their Supabase email/password
- Verification: mds query --count returns current part count
Annotator:
- Admin creates Supabase Auth account with app_metadata: {"role": "annotator"}
- Annotator receives Carbon URL and login credentials
- Annotator logs in to Carbon and sees only pre-labeled parts in the annotation queue
- No CLI access, no R2 credentials, no direct database access
CI pipeline:
- Add secrets to GitHub Actions: MDS_SUPABASE_URL, MDS_SUPABASE_SERVICE_ROLE_KEY, MDS_R2_ENDPOINT, MDS_R2_ACCESS_KEY_ID, MDS_R2_SECRET_ACCESS_KEY
- CI workflow uses env vars; mds CLI skips interactive login when these are present
Offboarding
Engineer departure:
- Disable Supabase Auth account (blocks new logins and token refreshes at once; an already-issued access token stays valid until its 1-hour expiry)
- Rotate the R2 read/write token and distribute the new key to remaining engineers
- Update Edge Function secrets with the new R2 token
Annotator departure:
- Disable Supabase Auth account
- Reassign in-progress parts:
UPDATE dataset_parts SET annotator_id = NULL, annotation_status = 'pre-labeled' WHERE annotator_id = '<their-uuid>' AND annotation_status = 'in-progress'
- No R2 credential rotation needed; annotators never had R2 credentials
Phasing
Access and authentication work aligns with the overall implementation roadmap:
Phase 1 (Foundation): mds login / mds init commands, environment variable overrides for CI, two R2 API tokens (read/write + read-only), user_role() SQL function, engineer RLS policies on all tables, CI credentials in GitHub Actions.
Phase 2 (Annotation Integration): Annotator RLS policies, claim-next-part Edge Function, presigned URL authorization check, annotator onboarding process.
Phase 3 (Hardening): Audit logging for push/snapshot/annotation events, R2 key rotation automation, SSO/PKCE upgrade for CLI auth, per-engineer R2 credentials, rate limiting on presigned URL Edge Function.