AutoCAM Testing Suite — Requirements
Engineering Log Entry — March 2026
Functional and non-functional requirements for the AutoCAM testing and benchmarking suite. Related to the AutoCAM Pipeline component of the system design.
Purpose
Define a continuous, automated testing and benchmarking system for the AutoCAM toolpath generation pipeline. The system measures CAM output quality and performance across a curated corpus of test parts, tracks metrics over time, and provides tools for deep investigation of individual results.
Goals
- Observability: Quantify how well the AutoCAM pipeline performs across a diverse set of parts and track that performance as the codebase evolves.
- Regression detection: Surface when a code change degrades toolpath quality or performance relative to previous runs on main.
- Developer workflow integration: Run automatically on PRs, on merge to main, and on demand from a developer's machine.
- Deep investigation: Provide a way to drill into individual test results with detailed metrics, toolpath visualization, and stock removal replay.
- Historical tracking: Plot metrics over time for the main branch to show progress as the AutoCAM system matures.
Non-Goals
- Strict pass/fail gates that block merges (monitoring-first approach).
- Full machine simulation or kinematic validation (covered by the Virtual CNC Simulation workstream).
- Production job scheduling or quoting.
- Replacing commercial verification tools like Vericut (may be adopted later as a complementary layer).
Functional Requirements
FR-1: Test Part Corpus
| ID | Requirement |
|---|---|
| FR-1.1 | The system maintains a curated corpus of STEP files with associated metadata. |
| FR-1.2 | Each test part has a manifest entry specifying: a unique part ID, file path, assigned strategy, display name, category (e.g., pockets, drilling, multi-setup, freeform), difficulty rating (1–5), and any parameter overrides (stepdown, stepover, stock allowance). |
| FR-1.3 | The initial corpus is ~20 parts. The architecture supports scaling to 100+ parts without redesign. |
| FR-1.4 | Test parts and the manifest are version-controlled alongside the CAM source code. |
| FR-1.5 | Each part may specify expected-behavior tags (e.g., should_have_no_gouges, known_collision) for future alerting. These tags do not block runs initially. |
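FR-1.5 does not define what each tag means. The following minimal sketch shows one way tags could become advisory checks. It assumes should_have_no_gouges means gouge_count == 0 and that known_collision means at least one collision is expected. TAG_CHECKS and advisory_warnings are hypothetical names, and the metric names follow the FR-3 table. Nothing in the sketch fails a run.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tag semantics: (predicate that should hold, message if it does not).
Check = Tuple[Callable[[dict], bool], Callable[[dict], str]]

TAG_CHECKS: Dict[str, Check] = {
    "should_have_no_gouges": (
        lambda m: m.get("gouge_count", 0) == 0,
        lambda m: f"expected no gouges, found {m.get('gouge_count')}",
    ),
    "known_collision": (
        lambda m: m.get("collision_count", 0) > 0,
        lambda m: "tagged known_collision, but no collisions were reported",
    ),
}


def advisory_warnings(tags: List[str], metrics: dict) -> List[str]:
    """Return advisory messages for tags whose expectation is not met.

    Unknown tags are reported rather than raising. Nothing here blocks a
    run, matching the non-blocking intent of FR-1.5.
    """
    warnings: List[str] = []
    for tag in tags:
        check = TAG_CHECKS.get(tag)
        if check is None:
            warnings.append(f"unknown tag: {tag}")
            continue
        predicate, message = check
        if not predicate(metrics):
            warnings.append(f"{tag}: {message(metrics)}")
    return warnings
```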
FR-2: Test Execution
| ID | Requirement |
|---|---|
| FR-2.1 | The test harness invokes the machenit CLI with --json for each test part, using the strategy and parameters from the manifest. |
| FR-2.2 | There are three trigger modes: (a) GitHub PR, which runs on every PR targeting main; (b) merge to main, which runs on every push to main; (c) manual/dry-run, where a developer triggers a run from their machine or via GitHub Actions workflow_dispatch. |
| FR-2.3 | Execution happens on a self-hosted GitHub Actions runner (a Windows machine with ModuleWorks and Analysis Situs licenses). |
| FR-2.4 | Each run produces a structured results bundle containing per-part JSON output, an aggregate summary, and run metadata (commit SHA, branch, timestamp, trigger type). |
| FR-2.5 | The harness captures wall-clock time per part (generation + evaluation) and for the full suite. |
| FR-2.6 | Failures (crashes, timeouts, invalid output) are captured as error entries rather than aborting the entire suite. |
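A per-part loop like the following minimal sketch could satisfy FR-2.1, FR-2.5, and FR-2.6. It assumes the following:

- The CLI writes its --json output to stdout.
- A caller-supplied build_command(entry) turns a manifest entry into the machenit argument list.

build_command and the 300-second default timeout are placeholders, because the actual CLI arguments beyond --json are not specified here.

```python
import json
import subprocess
import time
from typing import Callable, Dict, List


def run_part(cmd: List[str], timeout_s: float) -> Dict:
    """Run one CLI invocation and always return an entry; never raise."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "error", "error": "timeout",
                "wall_time_s": time.perf_counter() - start}
    except OSError as exc:  # e.g. the executable was not found
        return {"status": "error", "error": f"launch failed: {exc}",
                "wall_time_s": time.perf_counter() - start}
    elapsed = time.perf_counter() - start
    if proc.returncode != 0:
        return {"status": "error", "error": f"exit code {proc.returncode}",
                "stderr": proc.stderr[-2000:], "wall_time_s": elapsed}
    try:
        output = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"status": "error", "error": f"invalid JSON: {exc}",
                "wall_time_s": elapsed}
    return {"status": "ok", "output": output, "wall_time_s": elapsed}


def run_suite(manifest: List[Dict],
              build_command: Callable[[Dict], List[str]],  # hypothetical
              timeout_s: float = 300.0) -> Dict:             # placeholder budget
    """Run every manifest entry; collect per-part entries and total suite time."""
    suite_start = time.perf_counter()
    parts = {entry["id"]: run_part(build_command(entry), timeout_s)
             for entry in manifest}
    return {"parts": parts, "suite_wall_time_s": time.perf_counter() - suite_start}
```

run_part returns an error entry instead of raising, so one crashing part cannot abort the suite (NFR-2.1). On timeout, subprocess.run kills only the direct child process. If the CLI spawns its own worker processes, the real harness may need a process-tree kill.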
FR-3: Metrics Captured
For each test part, the following metrics are captured from the CLI JSON output:
| Metric | Source | Description |
|---|---|---|
| generation_time_ms | results[].generation_time_ms | Time to generate the plan |
| evaluation_time_ms | results[].evaluation_time_ms | Time to run stock simulation |
| total_moves | results[].total_moves | Total toolpath moves |
| collision_count | evaluation.collisions | Collisions (shaft, arbor, holder) |
| gouge_count | evaluation.gouges | Undercut count |
| max_gouge_mm | evaluation.max_gouge | Worst undercut depth |
| excess_count | evaluation.excesses | Excess material count |
| max_excess_mm | evaluation.max_excess | Worst excess amount |
| total_toolpath_length_mm | (to be added) | Sum of move distances; a proxy for cycle time |
| tool_count | results[].tools | Number of distinct tools used |
| setup_count | results[].setups | Number of clamping setups |
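The sketch below flattens one per-part result into the metric row above. It rests on these assumptions:

- Each entry of results[] carries its own evaluation object.
- The evaluation fields may be either report lists or precomputed counts.
- tools and setups are lists.

total_toolpath_length_mm has no source yet, so the sketch reads a hypothetical moves field of [x, y, z] endpoints.

```python
import math
from typing import Dict, Sequence


def _count(value) -> int:
    """Accept either a list of reports or a precomputed count."""
    return len(value) if isinstance(value, list) else int(value or 0)


def toolpath_length_mm(points: Sequence[Sequence[float]]) -> float:
    """Sum of straight-line distances between consecutive move endpoints."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def extract_metrics(result: Dict) -> Dict:
    """Map one results[] entry onto the FR-3 metric names."""
    ev = result.get("evaluation", {})
    metrics = {
        "generation_time_ms": result.get("generation_time_ms"),
        "evaluation_time_ms": result.get("evaluation_time_ms"),
        "total_moves": result.get("total_moves"),
        "collision_count": _count(ev.get("collisions", 0)),
        "gouge_count": _count(ev.get("gouges", 0)),
        "max_gouge_mm": ev.get("max_gouge"),
        "excess_count": _count(ev.get("excesses", 0)),
        "max_excess_mm": ev.get("max_excess"),
        "tool_count": _count(result.get("tools", [])),
        "setup_count": _count(result.get("setups", [])),
    }
    if "moves" in result:  # hypothetical field; the real source is still TBD
        metrics["total_toolpath_length_mm"] = toolpath_length_mm(result["moves"])
    return metrics
```

Straight-line distance underestimates arc moves. If the CLI emits arcs, the length should be computed from arc geometry instead.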
FR-4: Results Storage
| ID | Requirement |
|---|---|
| FR-4.1 | Run results are stored in a persistent, queryable format (not just CI logs). |
| FR-4.2 | Each result is keyed by (commit SHA, part ID, strategy) and tagged with branch name and trigger type. |
| FR-4.3 | Historical results for main are retained indefinitely for trend analysis. |
| FR-4.4 | PR run results are retained for at least 90 days. |
| FR-4.5 | Large artifacts (PLY exports, detailed collision reports) are stored separately from metric summaries. |
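The requirements do not name a storage backend. This minimal sketch assumes SQLite, a single file that is easy to back up or copy off the runner (NFR-2.3). The schema works as follows:

- Rows are keyed by (commit SHA, part ID, strategy), as in FR-4.2.
- Metrics live in a JSON column, so a new metric needs no schema migration (NFR-3.2).
- Large artifacts are referenced by a hypothetical artifact_uri rather than stored inline (FR-4.5).

The trigger label 'pr' is also an assumed convention.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    commit_sha   TEXT NOT NULL,
    part_id      TEXT NOT NULL,
    strategy     TEXT NOT NULL,
    branch       TEXT NOT NULL,
    trigger_type TEXT NOT NULL,          -- assumed labels, e.g. 'pr', 'merge', 'manual'
    recorded_at  TEXT NOT NULL,          -- UTC, ISO-8601 at one-second precision
    metrics      TEXT NOT NULL,          -- JSON object; new metrics need no migration
    artifact_uri TEXT,                   -- hypothetical pointer to PLY/collision reports
    PRIMARY KEY (commit_sha, part_id, strategy)
)
"""


def store(conn, commit_sha, part_id, strategy, branch, trigger_type, metrics,
          artifact_uri=None, recorded_at=None):
    """Insert or overwrite one result row; recorded_at must be UTC-aware."""
    recorded_at = recorded_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (commit_sha, part_id, strategy, branch, trigger_type,
         recorded_at.isoformat(timespec="seconds"), json.dumps(metrics),
         artifact_uri),
    )


def prune_pr_results(conn, now=None, keep_days=90):
    """Delete PR rows older than keep_days; main-branch history is never pruned."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=keep_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "DELETE FROM results WHERE trigger_type = 'pr' AND recorded_at < ?",
        (cutoff,))
    return cur.rowcount
```

INSERT OR REPLACE means that re-running the same commit overwrites its earlier row. This fits the FR-4.2 key but loses the earlier run. Adding a run ID to the key would keep both.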
FR-5: Dashboard & Visualization
| ID | Requirement |
|---|---|
| FR-5.1 | A web-based dashboard shows metric trends for main over time (x-axis: commit or date; y-axis: metric value). |
| FR-5.2 | The dashboard supports filtering by part, category, difficulty, strategy, and metric. |
| FR-5.3 | Each data point links back to the specific test run and commit. |
| FR-5.4 | PR runs show a comparison view: current-branch metrics vs. the main baseline. |
| FR-5.5 | Aggregate views show corpus-wide summaries (total gouges across all parts, average generation time, etc.). |
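This minimal sketch shows the computation behind the FR-5.4 comparison view. It produces per-part, per-metric deltas of a PR run against the main baseline. The results are labeled rather than gated, which keeps the monitoring-first approach. Three things are assumptions:

- The input shape, {part_id: {metric: value}}.
- The set of lower-is-better metrics.
- The relative tolerance used to absorb timing noise.

```python
from typing import Dict, Iterable, List

# Assumed direction: for these metrics a smaller value is better.
LOWER_IS_BETTER = frozenset({
    "gouge_count", "max_gouge_mm", "collision_count", "excess_count",
    "max_excess_mm", "generation_time_ms", "evaluation_time_ms",
})


def compare_runs(baseline: Dict[str, Dict[str, float]],
                 candidate: Dict[str, Dict[str, float]],
                 metrics: Iterable[str] = LOWER_IS_BETTER,
                 rel_tolerance: float = 0.0) -> List[Dict]:
    """Label each (part, metric) as regressed, improved, or unchanged.

    Parts missing from the baseline are labeled 'new'. Nothing is gated.
    """
    rows: List[Dict] = []
    for part_id, cand in sorted(candidate.items()):
        base = baseline.get(part_id)
        if base is None:
            rows.append({"part": part_id, "status": "new"})
            continue
        for metric in sorted(metrics):
            old, new = base.get(metric), cand.get(metric)
            if old is None or new is None:
                continue  # metric absent, or the part errored in one run
            delta = new - old
            threshold = abs(old) * rel_tolerance
            if delta > threshold:
                status = "regressed"
            elif delta < -threshold:
                status = "improved"
            else:
                status = "unchanged"
            rows.append({"part": part_id, "metric": metric, "old": old,
                         "new": new, "delta": delta, "status": status})
    return rows
```

When the baseline value is zero, the tolerance is also zero. Any new gouge or collision is then labeled a regression, which is usually the desired behavior for quality metrics.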
FR-6: Deep Dive Viewer
| ID | Requirement |
|---|---|
| FR-6.1 | From the dashboard, a user can select a specific (run, part) pair and navigate to a deep dive page. |
| FR-6.2 | The deep dive page shows detailed metrics: all collision, gouge, and excess reports with positions, per-operation breakdowns, and per-tool statistics. |
| FR-6.3 | The deep dive page includes a 3D toolpath viewer that renders toolpath moves over the part geometry. |
| FR-6.4 | The viewer supports animated replay of the toolpath (play/pause, speed control, scrubbing to specific operations). |
| FR-6.5 | The viewer includes interactive stock removal simulation: starting from raw stock, material is removed as the toolpath replays, with color-coded deviations (gouge vs. excess). |
| FR-6.6 | The deep dive page supports comparison between two runs of the same part (e.g., before/after a code change). |
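One simple structure for the scrub-to-operation control in FR-6.4 is a prefix sum of per-operation move counts. It maps a global move index to its operation by binary search, and it gives the first move of each operation. The sketch assumes the per-operation breakdown from FR-6.2 includes move counts. OperationTimeline is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


class OperationTimeline:
    """Maps global move indices to operations for replay scrubbing."""

    def __init__(self, moves_per_operation: List[int]):
        # starts[i] is the global index of operation i's first move.
        self.starts = [0] + list(accumulate(moves_per_operation))[:-1]
        self.total_moves = sum(moves_per_operation)

    def operation_at(self, move_index: int) -> int:
        """Operation that owns move_index (empty operations are skipped)."""
        if not 0 <= move_index < self.total_moves:
            raise IndexError(move_index)
        return bisect_right(self.starts, move_index) - 1

    def start_of(self, operation: int) -> int:
        """Global move index where the scrubber should jump for an operation."""
        return self.starts[operation]
```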
Non-Functional Requirements
NFR-1: Performance
| ID | Requirement |
|---|---|
| NFR-1.1 | A full 20-part suite completes in under 30 minutes on the self-hosted runner. |
| NFR-1.2 | The dashboard loads and renders trend charts in under 3 seconds. |
| NFR-1.3 | The deep dive 3D viewer loads part data and initializes in under 10 seconds for typical parts. |
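As a rough aid for sizing per-part timeouts, NFR-1.1 implies an average time budget per part. This figure covers generation plus evaluation and ignores harness overhead:

```latex
\[
\frac{30\ \text{min} \times 60\ \text{s/min}}{20\ \text{parts}} = 90\ \text{s per part (average)}
\]
```

This is an average rather than a cap. A stress-test part can exceed it as long as the suite total stays under 30 minutes, so a per-part timeout set to exactly 90 s would be too strict.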
NFR-2: Reliability
| ID | Requirement |
|---|---|
| NFR-2.1 | A single part failure does not abort the full suite run. |
| NFR-2.2 | The self-hosted runner recovers automatically after machine restarts (it runs as a Windows service). |
| NFR-2.3 | Results storage survives runner restarts and is backed up or stored externally. |
NFR-3: Maintainability
| ID | Requirement |
|---|---|
| NFR-3.1 | Adding a new test part requires only adding the STEP file and a manifest entry. |
| NFR-3.2 | Adding a new metric requires a change to the harness parser and a dashboard configuration, with no schema migration. |
| NFR-3.3 | The system is operable by a team of 2–5 developers without dedicated DevOps. |
| NFR-3.4 | All infrastructure is defined as code (GitHub Actions workflows, deployment configs). |
NFR-4: Portability
| ID | Requirement |
|---|---|
| NFR-4.1 | The test harness script runs on both Windows (primary) and Linux (future). |
| NFR-4.2 | The architecture supports migrating from cloud/self-hosted GitHub Actions to on-premises compute without rewriting the harness or storage layer. |
| NFR-4.3 | Results storage is decoupled from the CI system; results can be ingested from any runner environment. |
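One way to keep ingestion independent of the CI system (NFR-4.3) is for the harness to write its own run metadata (FR-2.4) into the results bundle. The sketch below makes these assumptions:

- In CI, it reads the GitHub Actions default environment variables GITHUB_EVENT_NAME, GITHUB_SHA, GITHUB_HEAD_REF, and GITHUB_REF_NAME.
- For manual dry-runs, it falls back to local git.
- The trigger labels are hypothetical.

For pull_request events, GITHUB_SHA is the merge commit that was actually tested, not the branch head.

```python
import os
import subprocess
from datetime import datetime, timezone
from typing import Callable, Dict, Mapping, Optional

# Hypothetical trigger labels keyed by GitHub event name.
TRIGGERS = {"pull_request": "pr", "push": "merge", "workflow_dispatch": "manual"}


def _git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True,
                          check=True).stdout.strip()


def run_metadata(env: Optional[Mapping[str, str]] = None,
                 git: Callable[..., str] = _git) -> Dict[str, str]:
    """Collect commit SHA, branch, timestamp, and trigger type for one run."""
    env = os.environ if env is None else env
    event = env.get("GITHUB_EVENT_NAME")
    if event:  # running inside GitHub Actions
        sha = env["GITHUB_SHA"]
        # GITHUB_HEAD_REF holds the PR source branch; it is empty or unset otherwise.
        branch = env.get("GITHUB_HEAD_REF") or env.get("GITHUB_REF_NAME", "")
        trigger = TRIGGERS.get(event, event)
    else:  # local dry-run: ask git directly
        sha = git("rev-parse", "HEAD")
        branch = git("rev-parse", "--abbrev-ref", "HEAD")
        trigger = "manual"
    return {"commit_sha": sha, "branch": branch, "trigger_type": trigger,
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds")}
```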
NFR-5: Cost
| ID | Requirement |
|---|---|
| NFR-5.1 | Self-hosted runner: existing Windows machine (no incremental compute cost). |
| NFR-5.2 | Dashboard hosting: < $20/month (static hosting or minimal server). |
| NFR-5.3 | Results storage: < $10/month at 100-part corpus scale with full history. |
Test Part Corpus Specification
Initial Corpus (20 parts)
The initial corpus covers the core feature categories the AutoCAM pipeline must handle:
| Category | Count | Description |
|---|---|---|
| Simple pockets | 3–4 | Rectangular and contoured pockets, varying depth |
| Holes & drilling | 2–3 | Through-holes, blind holes, tapped holes |
| Multi-feature | 4–5 | Parts combining pockets, holes, bosses |
| Freeform surfaces | 2–3 | Curved surfaces requiring 3D finishing strategies |
| Multi-setup | 2–3 | Parts requiring features on multiple faces |
| Prismatic | 2–3 | Walls, slots, step features |
| Stress tests | 1–2 | Large parts, very fine features, deep cavities |
Example manifest entry:

```json
{
  "id": "pocket-simple-01",
  "name": "Simple Rectangular Pocket",
  "file": "parts/simple_pocket.stp",
  "category": "pockets",
  "difficulty": 1,
  "strategy": "simple-planar",
  "description": "Single rectangular pocket, 20mm deep, sharp corners",
  "parameters": {
    "stepdown": 2.0,
    "stepover": 0.5
  },
  "tags": ["should_have_no_gouges"]
}
```
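This minimal sketch loads and checks a manifest of entries in the shape shown above. A missing field or a bad difficulty is then caught when someone adds a part (NFR-3.1). The following are assumptions, along with the function names:

- The manifest is a JSON array.
- File paths are relative to the manifest's directory.
- The stock allowance override is spelled stock_allowance.

```python
import json
from pathlib import Path
from typing import Dict, List

REQUIRED = ("id", "name", "file", "category", "difficulty", "strategy")
# "stock_allowance" is an assumed key; FR-1.2 names only the concept.
ALLOWED_PARAMS = {"stepdown", "stepover", "stock_allowance"}


def validate_entry(entry: Dict, corpus_root: Path) -> List[str]:
    """Return the problems with one manifest entry (empty list if valid)."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in entry]
    difficulty = entry.get("difficulty")
    if difficulty is not None and difficulty not in range(1, 6):
        problems.append(f"difficulty must be 1-5, got {difficulty!r}")
    unknown = set(entry.get("parameters", {})) - ALLOWED_PARAMS
    if unknown:
        problems.append(f"unknown parameter overrides: {sorted(unknown)}")
    if "file" in entry and not (corpus_root / entry["file"]).is_file():
        problems.append(f"STEP file not found: {entry['file']}")
    return problems


def load_manifest(path: Path) -> List[Dict]:
    """Load the manifest, rejecting duplicate IDs and invalid entries."""
    entries = json.loads(path.read_text(encoding="utf-8"))
    ids = [e.get("id") for e in entries]
    duplicates = sorted({i for i in ids if ids.count(i) > 1}, key=str)
    if duplicates:
        raise ValueError(f"duplicate part ids: {duplicates}")
    errors = {}
    for entry in entries:
        problems = validate_entry(entry, path.parent)
        if problems:
            errors[entry.get("id", "?")] = problems
    if errors:
        raise ValueError(f"invalid manifest entries: {errors}")
    return entries
```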
Growth Plan
| Phase | Corpus Size | Focus |
|---|---|---|
| MVP | ~20 parts | Core coverage across all categories |
| V1 | ~100 parts | Edge cases, real customer parts, strategy-specific tests |
| V2+ | 100+ parts | Expanded as new strategies and features are developed |
User Stories
- As a developer, I open a PR and within 30 minutes see a summary of how my changes affect toolpath quality across all test parts, so I can catch regressions before merging.
- As a developer, I run a dry-run command locally to test my changes against the full corpus before pushing, so I can iterate faster.
- As a technical lead, I view the dashboard and see trend lines for gouge counts and generation times on main over the past month, so I can assess whether the AutoCAM pipeline is improving.
- As a developer, I notice a gouge regression on a specific part in my PR, click through to the deep dive, and replay the toolpath to see exactly where the gouge occurs, so I can diagnose the root cause.
- As a technical lead, I compare two runs of the same part side by side to evaluate whether a strategy change improved cutting efficiency without introducing quality issues.
- As a new team member, I add a new test part by dropping a STEP file in the corpus directory and adding an entry to the manifest, without needing to understand the test infrastructure.