AutoCAM Testing Suite — Design
Engineering Log Entry — March 2026
Architecture, component design, and implementation roadmap for the AutoCAM testing and benchmarking suite. See also the requirements.
Architecture Overview
The testing suite has four layers: test harness (runs the CAM CLI, collects results), CI integration (triggers runs), results backend (stores and serves data), and web frontend (dashboard + deep dive viewer).
flowchart TB
subgraph Triggers
PR[Pull Request]
Push[Push to main]
Manual[Manual / Dry Run]
end
subgraph CI["CI Layer (GitHub Actions)"]
GHA[GitHub Actions Workflow]
Runner["Self-Hosted Runner\n(Windows + Licenses)"]
end
subgraph Harness["Test Harness"]
Manifest[Test Manifest\nparts.json]
CLI["machenit CLI\n--json"]
Collector[Result Collector]
end
subgraph Storage["Results Backend"]
DB[(SQLite DB\nMetrics)]
Artifacts["Artifact Store\n(PLY, Reports)"]
API[REST API\nFastAPI]
end
subgraph Frontend["Web Frontend"]
Dashboard[Dashboard\nTrend Charts]
DeepDive[Deep Dive Viewer\nThree.js]
end
PR --> GHA
Push --> GHA
Manual --> GHA
GHA --> Runner
Runner --> Manifest
Manifest --> CLI
CLI --> Collector
Collector --> DB
Collector --> Artifacts
DB --> API
Artifacts --> API
API --> Dashboard
API --> DeepDive
Component Design
1. Test Manifest
A JSON file versioned in the CAM repository that defines the test corpus.
File: tests/cam_benchmark/parts.json
{
"version": 1,
"defaults": {
"tool_library": "assets/tools/machenit_tooling_library.json",
"stepdown": 2.0,
"stepover": 0.0,
"stock_allowance": 5.0
},
"parts": [
{
"id": "pocket-simple-01",
"name": "Simple Rectangular Pocket",
"file": "assets/parts/simple_pocket.stp",
"category": "pockets",
"difficulty": 1,
"strategy": "simple-planar",
"description": "Single rectangular pocket, 20mm deep",
"parameters": {},
"tags": ["should_have_no_gouges"]
}
]
}
Parameters in a part entry override the defaults. This keeps manifest entries minimal for standard configurations.
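The override rule can be sketched as a shallow merge. This is a minimal illustration, not the harness's actual code: the function name `effective_parameters` and the second part `pocket-deep-02` are hypothetical, and a shallow (non-nested) merge is an assumption.

```python
import json


def effective_parameters(manifest: dict, part: dict) -> dict:
    """Return the manifest defaults overlaid with the part's own parameters."""
    merged = dict(manifest.get("defaults", {}))
    merged.update(part.get("parameters", {}))  # part-level values win
    return merged


manifest = json.loads("""
{
  "version": 1,
  "defaults": {"stepdown": 2.0, "stepover": 0.0, "stock_allowance": 5.0},
  "parts": [
    {"id": "pocket-simple-01", "parameters": {}},
    {"id": "pocket-deep-02", "parameters": {"stepdown": 1.0}}
  ]
}
""")

for part in manifest["parts"]:
    print(part["id"], effective_parameters(manifest, part))
```

A part with an empty `parameters` object inherits every default, which is what keeps standard entries short.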
2. Test Harness
A Python script that orchestrates test execution. Python is chosen for scripting convenience, cross-platform support, and easy JSON handling.
File: tests/cam_benchmark/run_benchmark.py
Responsibilities:
- Parse parts.json manifest.
- For each part: build the CLI command, invoke machenit --json, capture stdout/stderr, measure wall-clock time.
- Parse JSON output and extract metrics.
- Produce a results bundle — a single JSON file containing all results plus run metadata.
- Upload results to the backend (or write to a local file for dry-run mode).
Results bundle schema:
{
"run_id": "20260312-143022-a1b2c3d",
"commit_sha": "a1b2c3d4e5f6",
"branch": "feature/adaptive-roughing",
"trigger": "pull_request",
"timestamp": "2026-03-12T14:30:22Z",
"runner": "windows-cam-01",
"suite_duration_ms": 842000,
"results": [
{
"part_id": "pocket-simple-01",
"status": "success",
"wall_time_ms": 12340,
"cli_output": { },
"metrics": {
"generation_time_ms": 8200,
"evaluation_time_ms": 3100,
"total_moves": 1842,
"total_toolpath_length_mm": 24500.3,
"collision_count": 0,
"gouge_count": 0,
"max_gouge_mm": 0.0,
"excess_count": 12,
"max_excess_mm": 0.42,
"tool_count": 3,
"setup_count": 1
},
"artifacts": ["pocket-simple-01.ply"]
}
]
}
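One way to assemble the run-level fields is sketched below. The `run_id` layout (UTC date, UTC time, 7-character SHA prefix) is inferred from the example value above rather than specified anywhere, and the helper names `make_run_id` and `make_bundle` are hypothetical.

```python
from datetime import datetime, timezone


def make_run_id(commit_sha: str, started_at: datetime) -> str:
    """Compose <UTC date>-<UTC time>-<short SHA>; format inferred from the example."""
    utc = started_at.astimezone(timezone.utc)
    return f"{utc:%Y%m%d-%H%M%S}-{commit_sha[:7]}"


def make_bundle(commit_sha: str, branch: str, trigger: str, runner: str,
                started_at: datetime, suite_duration_ms: int, results: list) -> dict:
    """Wrap per-part results with the run metadata shown in the schema."""
    utc = started_at.astimezone(timezone.utc)
    return {
        "run_id": make_run_id(commit_sha, started_at),
        "commit_sha": commit_sha,
        "branch": branch,
        "trigger": trigger,
        "timestamp": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "runner": runner,
        "suite_duration_ms": suite_duration_ms,
        "results": results,
    }


started = datetime(2026, 3, 12, 14, 30, 22, tzinfo=timezone.utc)
bundle = make_bundle("a1b2c3d4e5f6", "feature/adaptive-roughing", "pull_request",
                     "windows-cam-01", started, 842000, [])
print(bundle["run_id"])  # 20260312-143022-a1b2c3d
```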
Error handling: If machenit crashes or times out (configurable, default 5 minutes per part), the result entry gets "status": "error" with stderr captured. The harness continues to the next part.
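A minimal sketch of that behaviour using only the standard library follows. The function name `run_part` and the shape of the returned entry are assumptions, and the demo launches the Python interpreter as a stand-in for machenit so the example runs anywhere.

```python
import subprocess
import sys
import time


def run_part(command: list[str], timeout_s: float = 300.0) -> dict:
    """Run one CLI invocation and return a result entry instead of raising."""
    start = time.perf_counter()
    error, stdout = None, ""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        stdout = proc.stdout
        if proc.returncode != 0:
            error = f"exit code {proc.returncode}\n{proc.stderr}"
    except subprocess.TimeoutExpired as exc:
        partial = exc.stderr or ""
        if isinstance(partial, bytes):  # partial output may arrive undecoded
            partial = partial.decode(errors="replace")
        error = f"timed out after {timeout_s:g}s\n{partial}"
    except OSError as exc:  # e.g. the binary does not exist
        error = f"failed to launch: {exc}"

    entry = {
        "status": "error" if error else "success",
        "wall_time_ms": round((time.perf_counter() - start) * 1000),
    }
    if error:
        entry["error_message"] = error
    else:
        entry["stdout"] = stdout
    return entry


# Stand-in commands: a clean exit, a crash, and a hang that hits the timeout.
ok = run_part([sys.executable, "-c", "print('{}')"])
crashed = run_part([sys.executable, "-c", "raise SystemExit(3)"])
hung = run_part([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1.0)
print(ok["status"], crashed["status"], hung["status"])
```

Because every failure path returns an entry rather than raising, the calling loop can append the result and move on to the next part.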
3. CI Integration
A GitHub Actions workflow on the CAM repository using a self-hosted Windows runner.
# .github/workflows/cam-benchmark.yml
name: CAM Benchmark Suite
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      parts_filter:
        description: 'Comma-separated part IDs (empty = all)'
        required: false
jobs:
  benchmark:
    runs-on: [self-hosted, Windows, cam-licensed]
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - name: Build machenit
        run: |
          cmake -S . -B build -DMW_DIR=C:/sdk/moduleworks -DASITUS_DIR=C:/sdk/asitus
          cmake --build build --config Release
      # The multi-line commands below use a folded scalar (>-) so the lines are
      # joined into one command; with "|" each line would run as its own command.
      - name: Run benchmark suite
        run: >-
          python tests/cam_benchmark/run_benchmark.py
          --manifest tests/cam_benchmark/parts.json
          --binary build/Release/machenit.exe
          --output results.json
          --export-ply
      - name: Upload results
        run: >-
          python tests/cam_benchmark/upload_results.py
          --results results.json
          --artifacts output/
          --api-url ${{ secrets.BENCHMARK_API_URL }}
      - name: Post PR comment
        if: github.event_name == 'pull_request'
        run: >-
          python tests/cam_benchmark/pr_comment.py
          --results results.json
          --repo ${{ github.repository }}
          --pr ${{ github.event.pull_request.number }}
Self-hosted runner setup:
- Install as a Windows service for auto-start on reboot.
- Custom label cam-licensed ensures only this runner picks up benchmark jobs.
- SDK paths configured as machine-level environment variables.
- Runner workspace excluded from Windows Defender real-time scanning.
PR comment: After a PR run, a bot comment summarizes key metrics with a comparison table against the latest main baseline and a link to the full dashboard view.
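The comparison table in such a comment could be rendered roughly as follows. This is a sketch only: `format_pr_comment` is a hypothetical helper, the column layout is an assumption, and the sample numbers are illustrative rather than measured.

```python
def format_pr_comment(baseline: dict, current: dict, metrics: list[str]) -> str:
    """Render a Markdown table comparing PR metrics with the main baseline."""
    lines = ["| Metric | main | PR | Delta |", "|---|---|---|---|"]
    for name in metrics:
        base, cur = baseline.get(name), current.get(name)
        # A metric missing on either side cannot be compared.
        delta = "n/a" if base is None or cur is None else f"{cur - base:+g}"
        lines.append(f"| {name} | {base} | {cur} | {delta} |")
    return "\n".join(lines)


# Illustrative values only.
baseline = {"generation_time_ms": 8200, "gouge_count": 0, "max_excess_mm": 0.42}
current = {"generation_time_ms": 7900, "gouge_count": 2, "max_excess_mm": 0.42}
comment = format_pr_comment(baseline, current, list(baseline))
print(comment)
```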
4. Results Backend
A lightweight API server that stores results and serves them to the frontend.
Storage: SQLite + File System
Why SQLite:
- Zero-ops — single file, no database server to manage.
- Handles the expected data volume easily (100 parts x 10 runs/day x 365 days = ~365K rows/year).
- Portable — can move the file between machines if the runner changes.
- Built-in full-text search and JSON functions.
Schema:
CREATE TABLE runs (
id TEXT PRIMARY KEY, -- run_id
commit_sha TEXT NOT NULL,
branch TEXT NOT NULL,
trigger TEXT NOT NULL, -- pull_request | push | manual
timestamp TEXT NOT NULL, -- ISO 8601
runner TEXT,
suite_duration_ms INTEGER,
metadata TEXT -- JSON blob for extensibility
);
CREATE TABLE results (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL REFERENCES runs(id),
part_id TEXT NOT NULL,
status TEXT NOT NULL, -- success | error | timeout
wall_time_ms INTEGER,
-- Flattened metrics for fast queries:
generation_time_ms REAL,
evaluation_time_ms REAL,
total_moves INTEGER,
total_toolpath_length_mm REAL,
collision_count INTEGER,
gouge_count INTEGER,
max_gouge_mm REAL,
excess_count INTEGER,
max_excess_mm REAL,
tool_count INTEGER,
setup_count INTEGER,
-- Full CLI output for deep dive:
cli_output TEXT, -- JSON blob
error_message TEXT
);
CREATE TABLE artifacts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id TEXT NOT NULL REFERENCES runs(id),
part_id TEXT NOT NULL,
filename TEXT NOT NULL,
file_path TEXT NOT NULL, -- path in artifact store
size_bytes INTEGER
);
-- Indexes for dashboard queries
CREATE INDEX idx_results_part_run ON results(part_id, run_id);
CREATE INDEX idx_runs_branch_ts ON runs(branch, timestamp);
CREATE INDEX idx_runs_commit ON runs(commit_sha);
Artifact store: A directory on the server (or S3-compatible bucket) organized as artifacts/{run_id}/{part_id}/. Stores PLY files, detailed collision reports, and any future large outputs.
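Ingesting a results bundle into this schema might look like the sketch below. It uses an in-memory database and a copy of the runs and results tables; `ingest_bundle` is a hypothetical name, the failing part `pocket-deep-02` is made up for the demo, and quoting the `trigger` column is a defensive choice because TRIGGER is also an SQL keyword.

```python
import json
import sqlite3

METRIC_COLUMNS = [
    "generation_time_ms", "evaluation_time_ms", "total_moves",
    "total_toolpath_length_mm", "collision_count", "gouge_count",
    "max_gouge_mm", "excess_count", "max_excess_mm", "tool_count", "setup_count",
]

SCHEMA = """
CREATE TABLE runs (
    id TEXT PRIMARY KEY, commit_sha TEXT NOT NULL, branch TEXT NOT NULL,
    "trigger" TEXT NOT NULL, timestamp TEXT NOT NULL, runner TEXT,
    suite_duration_ms INTEGER, metadata TEXT
);
CREATE TABLE results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL REFERENCES runs(id),
    part_id TEXT NOT NULL, status TEXT NOT NULL, wall_time_ms INTEGER,
    generation_time_ms REAL, evaluation_time_ms REAL, total_moves INTEGER,
    total_toolpath_length_mm REAL, collision_count INTEGER, gouge_count INTEGER,
    max_gouge_mm REAL, excess_count INTEGER, max_excess_mm REAL,
    tool_count INTEGER, setup_count INTEGER,
    cli_output TEXT, error_message TEXT
);
"""


def ingest_bundle(conn: sqlite3.Connection, bundle: dict) -> None:
    """Insert one run and its per-part results in a single transaction."""
    columns = ["run_id", "part_id", "status", "wall_time_ms",
               *METRIC_COLUMNS, "cli_output", "error_message"]
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            'INSERT INTO runs (id, commit_sha, branch, "trigger", timestamp, runner,'
            " suite_duration_ms) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (bundle["run_id"], bundle["commit_sha"], bundle["branch"], bundle["trigger"],
             bundle["timestamp"], bundle.get("runner"), bundle.get("suite_duration_ms")),
        )
        for result in bundle["results"]:
            metrics = result.get("metrics", {})
            values = [bundle["run_id"], result["part_id"], result["status"],
                      result.get("wall_time_ms"),
                      *(metrics.get(name) for name in METRIC_COLUMNS),
                      json.dumps(result.get("cli_output", {})),
                      result.get("error_message")]
            conn.execute(
                f"INSERT INTO results ({', '.join(columns)}) VALUES ({placeholders})",
                values,
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ingest_bundle(conn, {
    "run_id": "20260312-143022-a1b2c3d", "commit_sha": "a1b2c3d4e5f6",
    "branch": "feature/adaptive-roughing", "trigger": "pull_request",
    "timestamp": "2026-03-12T14:30:22Z", "runner": "windows-cam-01",
    "suite_duration_ms": 842000,
    "results": [
        {"part_id": "pocket-simple-01", "status": "success", "wall_time_ms": 12340,
         "metrics": {"gouge_count": 0, "total_moves": 1842}},
        {"part_id": "pocket-deep-02", "status": "error", "error_message": "crash"},
    ],
})
rows = conn.execute("SELECT part_id, status, gouge_count FROM results ORDER BY id").fetchall()
print(rows)
```

Metrics absent from a result (for example on an error entry) are stored as NULL, so failed parts still appear in the run without distorting aggregates.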
API: FastAPI (Python)
A small FastAPI application serving the frontend. Key endpoints:
| Method | Path | Description |
|---|---|---|
| POST | /api/runs | Ingest a results bundle |
| GET | /api/runs | List runs (filterable by branch, trigger, date range) |
| GET | /api/runs/{id} | Get a specific run with all results |
| GET | /api/parts | List all known parts with latest metrics |
| GET | /api/parts/{id}/history | Time-series metrics for a part on a branch |
| GET | /api/metrics/aggregate | Corpus-wide aggregate metrics over time |
| GET | /api/compare | Compare two runs side-by-side |
| GET | /api/artifacts/{run_id}/{part_id}/{filename} | Serve artifact files |
Why FastAPI: Lightweight, auto-generates OpenAPI docs, async support, easy to deploy as a single process. The entire backend is a single Python file + SQLite file — trivial to move between machines.
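As an illustration of what sits behind an endpoint such as /api/parts/{id}/history, the query could be sketched with the standard sqlite3 module as below. The FastAPI routing layer is omitted, the tables are trimmed to the columns the query needs, and `part_history` plus the sample rows are hypothetical.

```python
import sqlite3

# Metric names are interpolated into SQL, so only known columns are accepted.
ALLOWED_METRICS = {"gouge_count", "max_excess_mm"}


def part_history(conn: sqlite3.Connection, part_id: str, branch: str, metric: str) -> list[dict]:
    """Return one metric for one part on one branch, oldest run first."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric}")
    sql = f"""
        SELECT runs.timestamp, runs.commit_sha, results.{metric}
        FROM results JOIN runs ON runs.id = results.run_id
        WHERE results.part_id = ? AND runs.branch = ? AND results.status = 'success'
        ORDER BY runs.timestamp
    """
    return [{"timestamp": ts, "commit_sha": sha, "value": value}
            for ts, sha, value in conn.execute(sql, (part_id, branch))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (id TEXT PRIMARY KEY, commit_sha TEXT, branch TEXT, timestamp TEXT);
    CREATE TABLE results (run_id TEXT, part_id TEXT, status TEXT,
                          gouge_count INTEGER, max_excess_mm REAL);
    INSERT INTO runs VALUES ('r1', 'aaa1111', 'main', '2026-03-10T09:00:00Z'),
                            ('r2', 'bbb2222', 'main', '2026-03-11T09:00:00Z'),
                            ('r3', 'ccc3333', 'feature/x', '2026-03-11T12:00:00Z');
    INSERT INTO results VALUES ('r2', 'pocket-simple-01', 'success', 1, 0.40),
                               ('r1', 'pocket-simple-01', 'success', 0, 0.42),
                               ('r3', 'pocket-simple-01', 'success', 4, 0.90);
""")
history = part_history(conn, "pocket-simple-01", "main", "gouge_count")
print(history)
```

ISO 8601 timestamps in a single timezone sort correctly as text, which is why ordering by the TEXT timestamp column works here.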
5. Web Frontend — Dashboard
A single-page application for viewing metrics trends and navigating to deep dives.
Tech Stack
| Layer | Choice | Rationale |
|---|---|---|
| Framework | React (Vite) | Fast builds, large ecosystem, team familiarity |
| Charts | Recharts or Plotly.js | Time-series charts with zoom, hover, and click-through |
| 3D viewer | Three.js | De facto standard for WebGL; PLY/STL loaders ship with the library as addons |
| Styling | Tailwind CSS | Utility-first, fast to build with, minimal custom CSS |
| Hosting | Cloudflare Pages | Already used for docs site, free tier sufficient |
Dashboard Views
Corpus Overview — Landing page showing:
- Summary cards: total parts, latest run status, aggregate gouge/collision counts.
- Metric trend chart: selectable metric (y-axis) plotted over commits to main (x-axis).
- Part table: sortable/filterable table of all parts with their latest metrics. Each row links to the part's detail view.
Part Detail — Shows a specific part's metrics over time:
- Trend charts for all metrics on main.
- Table of recent runs with metric values.
- Each run row links to the deep dive.
Run Detail — Shows all parts for a specific run:
- Summary statistics.
- Per-part results table (sortable, color-coded by status).
- If a PR run: comparison columns showing deltas vs. main baseline.
PR Comparison — Side-by-side view of a PR run vs. its main baseline:
- Delta table (green = improved, red = regressed, gray = unchanged).
- Filterable to show only regressions.
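The delta colouring could be driven by a small classifier like the sketch below. Two things here are assumptions rather than decisions from this design: that every listed metric improves when it decreases, and the 2% relative tolerance used to treat small changes as noise.

```python
# Assumption: each of these metrics is better when lower.
LOWER_IS_BETTER = {
    "generation_time_ms", "evaluation_time_ms", "collision_count",
    "gouge_count", "max_gouge_mm", "excess_count", "max_excess_mm",
}
COLORS = {"improved": "green", "regressed": "red", "unchanged": "gray"}


def classify_delta(metric: str, baseline: float, current: float, rel_tol: float = 0.02) -> str:
    """Label a metric change as improved, regressed, or unchanged."""
    if metric not in LOWER_IS_BETTER:
        raise ValueError(f"no direction defined for metric: {metric}")
    delta = current - baseline
    noise = rel_tol * max(abs(baseline), abs(current))  # hypothetical tolerance
    if abs(delta) <= noise:
        return "unchanged"
    return "improved" if delta < 0 else "regressed"


def regressions(baseline: dict, current: dict) -> list[str]:
    """Return the metrics that regressed, for the regressions-only filter."""
    return [name for name in baseline
            if name in current and classify_delta(name, baseline[name], current[name]) == "regressed"]


# Illustrative values only.
main_run = {"generation_time_ms": 8200, "gouge_count": 0, "max_excess_mm": 0.42}
pr_run = {"generation_time_ms": 8250, "gouge_count": 2, "max_excess_mm": 0.30}
for name in main_run:
    label = classify_delta(name, main_run[name], pr_run[name])
    print(name, label, COLORS[label])
```

A relative tolerance keeps timing jitter from showing up as red cells, while count metrics that start at zero still flag any increase.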
6. Web Frontend — Deep Dive Viewer
The deep dive page is a dedicated 3D visualization and analysis tool for a single (run, part) pair.
Layout
+----------------------------------------------------------+
| Part: Simple Pocket | Run: abc123 | Branch: main |
+----------------------------------------------------------+
| +------------------+ +-------------------------------+ |
| | Metrics Panel | | | |
| | - Generation: 8s | | 3D Viewport | |
| | - Gouges: 0 | | (Three.js) | |
| | - Excess: 12 | | | |
| | - Length: 24.5m | | | |
| | - Tools: 3 | | | |
| +------------------+ +-------------------------------+ |
| +------------------+ +-------------------------------+ |
| | Operations List | | Playback Controls | |
| | [x] Facing | | [|<] [<] [>||] [>] [>|] | |
| | [x] Roughing | | Speed: [1x] [2x] [5x] [10x] | |
| | [ ] Finishing | | Progress: =====>------ 62% | |
| | [ ] Drilling | | Move: 1142 / 1842 | |
| +------------------+ +-------------------------------+ |
+----------------------------------------------------------+
3D Viewport Features
Phase 1 — Toolpath Visualization:
- Render toolpath moves as colored lines (rapid = dashed gray, cutting = solid, color-coded by operation type).
- Show tool position as a cylinder/sphere at the current animation point.
- Load part geometry from the STEP file (tessellated and exported as STL/GLB during the test run) as a translucent reference surface.
- Orbit, pan, zoom controls (Three.js OrbitControls).
- Click on a gouge/excess marker to see its details (position, deviation, move index).
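Placing the tool marker at the current animation point amounts to interpolating along the toolpath polyline. The sketch below shows the arithmetic in Python for consistency with the harness; the viewer itself would do the equivalent in the browser. The point-list representation and both function names are assumptions, since the toolpath JSON format is not defined here.

```python
import math
from bisect import bisect_right
from itertools import accumulate

Point = tuple[float, float, float]


def cumulative_lengths(points: list[Point]) -> list[float]:
    """Distance travelled from the first point to each point along the path."""
    segments = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return [0.0, *accumulate(segments)]


def position_at(points: list[Point], cumulative: list[float], distance: float) -> Point:
    """Interpolate the tool position after travelling `distance` along the path."""
    distance = min(max(distance, 0.0), cumulative[-1])  # clamp to the path
    i = min(bisect_right(cumulative, distance) - 1, len(points) - 2)
    span = cumulative[i + 1] - cumulative[i]
    t = 0.0 if span == 0 else (distance - cumulative[i]) / span
    a, b = points[i], points[i + 1]
    return tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))


# Illustrative path: plunge 10 mm, then two straight cuts of 30 mm and 40 mm.
points = [(0.0, 0.0, 10.0), (0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (30.0, 40.0, 0.0)]
cumulative = cumulative_lengths(points)
print(cumulative[-1])                           # total path length: 80.0
print(position_at(points, cumulative, 25.0))    # (15.0, 0.0, 0.0)
```

The last cumulative value is also the total path length, the same quantity the harness reports as total_toolpath_length_mm.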
Phase 2 — Stock Removal Simulation:
Three approaches to evaluate, in order of implementation ease:
Approach A: Pre-computed snapshots
During the test run, the harness exports stock state at regular intervals (e.g., every N moves) as lightweight meshes (PLY or GLB). The viewer loads these snapshots and interpolates between them during playback.
Pros: Simple viewer logic, no in-browser simulation needed, leverages ModuleWorks' high-fidelity dexel sim.
Cons: Large artifact storage per run, snapshot granularity limits scrubbing resolution. At ~50 snapshots x ~2MB each = ~100MB per part per run.
Approach B: In-browser CSG
Use three-bvh-csg to perform boolean subtractions of swept tool volumes from the stock mesh in the browser. Batch moves into groups (e.g., 50–100 moves per CSG operation) to keep performance tractable.
Pros: No pre-computed data needed, smooth scrubbing, small artifact size.
Cons: Lower fidelity than dexel-based sim, performance limits with complex parts, must generate swept tool volumes in JS.
Approach C: WebGPU dexel simulation
Implement a simplified dexel grid in a WebGPU compute shader. The tool geometry and moves are uploaded as buffers; the shader subtracts tool volumes from the dexel grid in parallel.
Pros: High fidelity, real-time performance, runs entirely in browser.
Cons: Significant development effort, WebGPU browser support still maturing. Best suited for a later phase when the viewer is proven valuable.
Recommended path: Start with Approach A (pre-computed snapshots) for the MVP because it reuses the existing ModuleWorks simulation and keeps the viewer simple. Transition to Approach C if interactive scrubbing and fidelity become important enough to justify the investment.
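For the snapshot approach, syncing playback to the stock state reduces to finding the two snapshots that bracket the current move. The sketch below assumes each snapshot records the move index after which it was captured, with snapshot 0 being the initial stock; the function name and the interval of 37 moves are illustrative. It returns only a blend factor: how meshes with different topology are actually blended is left open.

```python
from bisect import bisect_right


def bracket_snapshots(snapshot_moves: list[int], move_index: int) -> tuple[int, int, float]:
    """Return (earlier snapshot, later snapshot, blend factor in [0, 1))."""
    last = len(snapshot_moves) - 1
    i = max(0, min(bisect_right(snapshot_moves, move_index) - 1, last))
    j = min(i + 1, last)
    if i == j:  # at or past the final snapshot
        return i, j, 0.0
    span = snapshot_moves[j] - snapshot_moves[i]
    return i, j, (move_index - snapshot_moves[i]) / span


# 1842 moves with a snapshot every 37 moves, plus one for the finished part.
snapshot_moves = list(range(0, 1842, 37)) + [1842]
print(len(snapshot_moves))                      # 51
print(bracket_snapshots(snapshot_moves, 1142))  # snapshots 30 and 31
```

Storing explicit move indices rather than a fixed interval lets the harness take denser snapshots where material removal is fastest without changing the viewer.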
Phase 3 — Machine Visualization (Future):
- Render a simplified 5-axis machine model (imported as GLTF) with kinematic chain.
- Replay the toolpath with the machine moving (table rotation, spindle translation).
- This is a significant effort and should only be pursued after the toolpath viewer and stock sim are proven useful.
Data Requirements for Deep Dive
The test harness must export the following artifacts for each part to enable the deep dive viewer:
| Artifact | Format | Purpose | Approx. Size |
|---|---|---|---|
| Toolpath moves | JSON | Animate tool position | 1–5 MB |
| Part mesh | GLB | Reference geometry | 0.5–2 MB |
| Initial stock mesh | GLB | Starting material block | < 0.1 MB |
| Stock snapshots | GLB (x50) | Stock removal replay (Phase 2A) | ~100 MB |
| Collision markers | JSON | Highlight collision points | < 0.1 MB |
| Gouge/excess markers | JSON | Highlight deviation points | < 0.1 MB |
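A rough storage estimate follows from these sizes. The sketch below takes the upper bound of each row and reuses the 100 parts and 10 runs/day from the SQLite sizing earlier; it assumes every artifact is stored for every run with no deduplication, which is a worst case rather than a plan.

```python
# Upper-bound size in MB per part per run, taken from the table above.
ARTIFACT_MB = {
    "toolpath_moves": 5.0,
    "part_mesh": 2.0,
    "initial_stock_mesh": 0.1,
    "stock_snapshots": 100.0,
    "collision_markers": 0.1,
    "gouge_excess_markers": 0.1,
}


def storage_gb(parts: int, runs_per_day: int, days: int, include_snapshots: bool) -> float:
    """Worst-case artifact volume in GB (1 GB = 1000 MB) over a number of days."""
    per_part_mb = sum(size for name, size in ARTIFACT_MB.items()
                      if include_snapshots or name != "stock_snapshots")
    return per_part_mb * parts * runs_per_day * days / 1000.0


for include in (False, True):
    label = "with snapshots" if include else "without snapshots"
    yearly = storage_gb(100, 10, 365, include)
    monthly = storage_gb(100, 10, 30, include)
    print(f"{label}: {yearly:,.0f} GB/year, {monthly:,.0f} GB per 30 days")
```

Under these assumptions the total is roughly 2.7 TB/year without stock snapshots and roughly 39 TB/year with them, so the snapshot count, the run cadence, and the retention window dominate the storage budget.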
Design Alternatives Considered
Results Storage
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| SQLite + filesystem | Zero ops, portable, sufficient scale | Single-writer, no built-in replication | Chosen — right-sized for the team and data volume |
| PostgreSQL | Rich queries, concurrent writes | Requires a managed service or self-hosting | Overkill for < 1M rows and 2–5 users |
| JSON files in git | Zero infrastructure, version-controlled | Slow queries, repo bloat, merge conflicts | Too limited for time-series queries |
| InfluxDB / TimescaleDB | Purpose-built for time-series | Additional service to manage, learning curve | Over-engineered for this use case |
Dashboard Hosting
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Cloudflare Pages + Workers | Already used for docs, free tier, edge performance | Workers have execution limits | Chosen — unified with existing infra |
| GitHub Pages + static JSON | Zero cost, simple | No API, limited interactivity | Viable for MVP charts, not for deep dive |
| Self-hosted VM | Full control | Ops burden, cost | Unnecessary at this scale |
| Grafana | Powerful charting, alerting | Heavy for this use case, separate system | Better suited if we already ran it for other monitoring |
Benchmark Tracking Integration
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Custom (this design) | Full control over metrics and deep dive, CAM-specific features | More to build | Chosen — generic tools don't support 3D deep dive or CAM-specific metrics |
| github-action-benchmark | Free, simple, GitHub Pages charts | No deep dive, limited customization, charts only | Good for supplementary PR comments |
| Bencher.dev | Statistical regression detection, nice dashboard | No 3D viewer, SaaS dependency, limited customization | Could supplement but not replace |
3D Viewer
| Option | Pros | Cons | Verdict |
|---|---|---|---|
| Three.js (custom) | Full control, large ecosystem, PLY/STL loaders ship as addons | Must build viewer from scratch | Chosen — most flexible for our specific needs |
| Reuse native ImGui viewer | Already built, high fidelity | Not web-accessible, requires licenses on viewer machine | Could supplement for local deep debugging |
| CAMotics (WASM port) | Full stock sim, open source | Major porting effort, 3-axis only | Not viable for 5-axis |
Implementation Roadmap
Phase 1: CLI Test Harness (2–3 weeks)
Goal: Run the full corpus from the command line and produce a structured JSON report.
- [ ] Define parts.json manifest with 20 initial parts
- [ ] Write run_benchmark.py harness script
- [ ] Add total_toolpath_length_mm metric to machenit CLI JSON output
- [ ] Add --export-ply integration to the harness for artifact capture
- [ ] Add GLB export of part mesh and stock (via a small export utility or script)
- [ ] Validate harness on the Windows machine with licensed SDKs
- [ ] Document dry-run usage for developers
Deliverable: A developer can run python run_benchmark.py --manifest parts.json --binary machenit.exe and get a results.json file.
Phase 2: CI Integration (1–2 weeks)
Goal: Benchmark suite runs automatically on PRs and merges to main.
- [ ] Set up self-hosted GitHub Actions runner on the Windows machine (as service)
- [ ] Write .github/workflows/cam-benchmark.yml
- [ ] Implement pr_comment.py to post metric summaries on PRs
- [ ] Test with a real PR cycle
Deliverable: Every PR gets an automated comment showing CAM metrics vs. main.
Phase 3: Results Backend (2–3 weeks)
Goal: Store results persistently and serve them via API.
- [ ] Implement SQLite schema and ingestion script (upload_results.py)
- [ ] Build FastAPI server with endpoints for runs, parts, history, and comparison
- [ ] Set up artifact storage directory structure
- [ ] Deploy API (Cloudflare Worker, small VM, or on the runner machine itself)
- [ ] Backfill results from initial CI runs
Deliverable: API returns historical metrics for any part on any branch.
Phase 4: Dashboard (2–3 weeks)
Goal: Web-based metric trends and run exploration.
- [ ] Scaffold React app (Vite + Tailwind)
- [ ] Build corpus overview page with aggregate charts
- [ ] Build part detail page with per-part trend lines
- [ ] Build run detail page with per-part results table
- [ ] Build PR comparison view with delta highlighting
- [ ] Deploy to Cloudflare Pages
Deliverable: Team can view metric trends in a browser and navigate from aggregate to specific runs.
Phase 5: Deep Dive Viewer — Toolpath (2–3 weeks)
Goal: 3D toolpath visualization and replay for individual test results.
- [ ] Build Three.js viewport with OrbitControls
- [ ] Load and render part mesh (GLB via Three.js GLTFLoader)
- [ ] Render toolpath moves as colored lines
- [ ] Implement animated replay with play/pause/speed/scrub
- [ ] Show tool position cylinder moving along the path
- [ ] Overlay gouge/excess markers as colored spheres
- [ ] Wire up navigation from dashboard to deep dive
Deliverable: Click a part in the dashboard, see its toolpath animated over the part geometry.
Phase 6: Deep Dive Viewer — Stock Removal (3–4 weeks)
Goal: Interactive stock removal simulation in the browser.
- [ ] Integrate snapshot export into the test harness (GLB at intervals)
- [ ] Load and display stock snapshots in the viewer, synced to playback position
- [ ] Add color-coded deviation overlay (gouge = red, excess = blue, clean = green)
- [ ] Implement smooth interpolation between snapshots
- [ ] Evaluate three-bvh-csg as an alternative to pre-computed snapshots
- [ ] Optimize for parts with large move counts
Deliverable: Replay shows material being removed from stock in sync with the toolpath.
Future Phases
- Run-to-run comparison viewer — Side-by-side 3D comparison of two runs.
- Machine visualization — Animated 5-axis machine model with kinematic replay.
- WebGPU dexel simulation — High-fidelity in-browser stock removal.
- Alerting — Slack/email notifications on significant regressions.
- Vericut integration — Run top candidates through Vericut for production-grade validation.
- Expanded corpus management — UI for adding/categorizing/retiring test parts.
Deployment Architecture
flowchart LR
subgraph GitHub
Repo[CAM Repo]
GHA[GitHub Actions]
end
subgraph WinRunner["Windows Machine"]
Runner[GH Actions Runner\nWindows Service]
MW[ModuleWorks SDK]
AS[Analysis Situs SDK]
Build["machenit.exe"]
Harness["run_benchmark.py"]
end
subgraph Server["API Server"]
API[FastAPI]
SQLite[(SQLite)]
ArtifactDir["/artifacts"]
end
subgraph CF["Cloudflare"]
Pages[Cloudflare Pages\nDashboard SPA]
end
Repo -->|trigger| GHA
GHA -->|dispatch| Runner
Runner --> Build
Build --> Harness
Harness -->|POST results| API
Harness -->|upload artifacts| ArtifactDir
Pages -->|fetch data| API
API --> SQLite
API --> ArtifactDir
API Server Location
The API server can run on the same Windows machine as the runner initially, or on a small cloud VM. Since the dashboard is a static SPA on Cloudflare Pages, the API just needs to be reachable from the browser. A Cloudflare Tunnel can expose a local server without a public IP.
Key Design Decisions
| Decision | Rationale |
|---|---|
| Python for harness | Cross-platform, easy JSON handling, no compilation step. The harness is glue code — simplicity matters more than performance. |
| SQLite over Postgres | Zero ops overhead. At our data volume (< 1M rows), SQLite is more than sufficient and can be backed up by copying a single file. |
| FastAPI over serverless | A single long-running process is simpler to reason about than serverless functions for this workload. Can be moved to serverless later if needed. |
| Pre-computed stock snapshots | Leverages the existing high-fidelity ModuleWorks simulation rather than rebuilding a lower-fidelity version in the browser. Trading storage for simplicity. |
| Separate repo for dashboard | The dashboard and API are decoupled from the CAM code. The CAM repo has the harness and manifest; the dashboard repo has the frontend and API. This keeps CI concerns separate. |
| Self-hosted runner with labels | Licensing constraints require specific hardware. Labels ensure benchmark jobs only run on the licensed machine while other CI (linting, docs) can use GitHub-hosted runners. |
Open Questions
- API hosting — Run on the Windows machine behind a Cloudflare Tunnel, or provision a small Linux VM? The tunnel approach is zero-cost but ties availability to the Windows machine.
- Artifact retention — How long to keep PLY/GLB artifacts for deep dive? Full history could grow to ~100GB/year at 100 parts, and considerably more once Phase 2A stock snapshots are stored (~100 MB per part per run, so roughly 10 GB per full 100-part run). Consider a retention policy (e.g., keep last 30 days of artifacts, metrics forever).
- Stock snapshot format — GLB is compact and Three.js-native. PLY is what machenit already exports. Adding a GLB export path to the harness (via a conversion step) is low effort and worth the viewer simplicity.
- Multi-strategy runs — Should each part run exactly one assigned strategy, or should some parts run multiple strategies for comparison? The manifest supports both; the question is whether to exercise it early.
- Notification channel — Where should regression alerts go? GitHub PR comments are the MVP. Slack integration is a natural next step for push-to-main regressions.