AutoCAM Testing Suite: Requirements

Engineering Log Entry — March 2026

Functional and non-functional requirements for the AutoCAM testing and benchmarking suite. Related to the AutoCAM Pipeline component of the system design.

Purpose

Define a continuous, automated testing and benchmarking system for the AutoCAM toolpath generation pipeline. The system measures CAM output quality and performance across a curated corpus of test parts, tracks metrics over time, and provides tools for deep investigation of individual results.

Goals

Observability — Quantify how well the AutoCAM pipeline performs across a diverse set of parts and track that performance as the codebase evolves.
Regression detection — Surface when a code change degrades toolpath quality or performance relative to previous runs on main.
Developer workflow integration — Run automatically on PRs, on merge to main, and on-demand from a developer's machine.
Deep investigation — Provide a way to drill into individual test results with detailed metrics, toolpath visualization, and stock removal replay.
Historical tracking — Plot metrics over time for the main branch to show progress as the AutoCAM system matures.

Non-Goals

Strict pass/fail gates that block merges (monitoring-first approach).
Full machine simulation or kinematic validation (covered by the Virtual CNC Simulation workstream).
Production job scheduling or quoting.
Replacing commercial verification tools like Vericut (may be adopted later as a complementary layer).

Functional Requirements

FR-1: Test Part Corpus

ID	Requirement
FR-1.1	The system maintains a curated corpus of STEP files with associated metadata.
FR-1.2	Each test part has a manifest entry specifying: file path, assigned strategy, display name, category (e.g., pockets, drilling, multi-setup, freeform), difficulty rating (1–5), and any parameter overrides (stepdown, stepover, stock allowance).
FR-1.3	Initial corpus is ~20 parts. The architecture supports scaling to 100+ parts without redesign.
FR-1.4	Test parts and the manifest are version-controlled alongside the CAM source code.
FR-1.5	Each part may specify expected behavior tags (e.g., `should_have_no_gouges`, `known_collision`) for future alerting, but these do not block runs initially.
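
The non-blocking tags in FR-1.5 could be turned into warnings rather than failures. The sketch below is one possible reading, not a defined behavior: the source does not spell out tag semantics, so it assumes `should_have_no_gouges` warns when `gouge_count` is above zero and `known_collision` silences the collision warning. The function name `check_tags` and the warning text are hypothetical.

```python
from typing import Dict, List


def check_tags(tags: List[str], metrics: Dict[str, float]) -> List[str]:
    """Return warnings for a part. Never raises, so a run is never blocked."""
    warnings = []
    # ASSUMPTION: this tag means the part is expected to machine without gouges.
    if "should_have_no_gouges" in tags and metrics.get("gouge_count", 0) > 0:
        warnings.append(f"expected no gouges, found {metrics['gouge_count']}")
    # ASSUMPTION: this tag marks collisions as a known issue, so collisions
    # are only reported when the tag is absent.
    if "known_collision" not in tags and metrics.get("collision_count", 0) > 0:
        warnings.append(f"unexpected collisions: {metrics['collision_count']}")
    return warnings


if __name__ == "__main__":
    print(check_tags(["should_have_no_gouges"],
                     {"gouge_count": 2, "collision_count": 0}))
```

Returning a list of strings keeps the check advisory: the harness can attach the warnings to the results bundle without changing the run's exit status.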

FR-2: Test Execution

ID	Requirement
FR-2.1	The test harness invokes the `machenit` CLI with `--json` for each test part using the strategy and parameters from the manifest.
FR-2.2	Three trigger modes: (a) GitHub PR — runs on every PR targeting `main`, (b) merge to `main` — runs on every push to `main`, (c) manual/dry-run — developer triggers from their machine or via GitHub Actions `workflow_dispatch`.
FR-2.3	Execution happens on a self-hosted GitHub Actions runner (Windows machine with ModuleWorks and Analysis Situs licenses).
FR-2.4	Each run produces a structured results bundle containing per-part JSON output, an aggregate summary, and run metadata (commit SHA, branch, timestamp, trigger type).
FR-2.5	The harness captures wall-clock time per part (generation + evaluation) and for the full suite.
FR-2.6	Failures (crashes, timeouts, invalid output) are captured as error entries rather than aborting the entire suite.
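
FR-2.4 through FR-2.6 can be illustrated with a small harness loop. This is a minimal sketch under stated assumptions, not the actual harness: the argument layout of the `machenit` CLI beyond `--json` is not given here, so the command is supplied by a caller-provided `build_cmd` function. The names `run_part`, `run_suite`, the bundle keys, and the 600-second default timeout are all hypothetical.

```python
import json
import subprocess
import sys
import time
from datetime import datetime, timezone
from typing import Callable, Dict, List

Command = Callable[[Dict], List[str]]


def run_part(part: Dict, build_cmd: Command, timeout_s: float) -> Dict:
    """Run the CLI for one part. Always returns an entry and never raises."""
    entry = {"part_id": part["id"], "status": "ok"}
    start = time.perf_counter()
    try:
        proc = subprocess.run(build_cmd(part), capture_output=True,
                              text=True, timeout=timeout_s)
        if proc.returncode != 0:
            entry.update(status="error", error=f"exit code {proc.returncode}")
        else:
            entry["output"] = json.loads(proc.stdout)
    except subprocess.TimeoutExpired:
        entry.update(status="error", error="timeout")
    except json.JSONDecodeError as exc:
        entry.update(status="error", error=f"invalid output: {exc}")
    except OSError as exc:  # e.g. the executable could not be started
        entry.update(status="error", error=f"launch failed: {exc}")
    entry["wall_clock_s"] = time.perf_counter() - start
    return entry


def run_suite(parts: List[Dict], build_cmd: Command, metadata: Dict,
              timeout_s: float = 600.0) -> Dict:
    """Run every part and return a results bundle with suite-level timing."""
    start = time.perf_counter()
    results = [run_part(p, build_cmd, timeout_s) for p in parts]
    return {
        "metadata": {**metadata,
                     "timestamp": datetime.now(timezone.utc).isoformat()},
        "summary": {
            "parts": len(results),
            "errors": sum(r["status"] == "error" for r in results),
            "wall_clock_s": time.perf_counter() - start,
        },
        "results": results,
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs without the real CLI installed.
    def fake_cmd(part: Dict) -> List[str]:
        ok = "print('{\"results\": []}')"
        code = ok if part["id"] != "bad" else "raise SystemExit(3)"
        return [sys.executable, "-c", code]

    bundle = run_suite([{"id": "pocket-simple-01"}, {"id": "bad"}], fake_cmd,
                       {"commit_sha": "abc123", "branch": "main",
                        "trigger": "manual"})
    print(json.dumps(bundle["summary"], indent=2))
```

Catching crashes, timeouts, and unparseable output inside `run_part` means one bad part becomes an error entry while the remaining parts still run.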

FR-3: Metrics Captured

For each test part, the following metrics are captured from the CLI JSON output:

Metric	Source	Description
`generation_time_ms`	`results[].generation_time_ms`	Time to generate the plan
`evaluation_time_ms`	`results[].evaluation_time_ms`	Time to run stock simulation
`total_moves`	`results[].total_moves`	Total toolpath moves
`collision_count`	`evaluation.collisions`	Collisions (shaft, arbor, holder)
`gouge_count`	`evaluation.gouges`	Number of gouges (tool cut into the target geometry)
`max_gouge_mm`	`evaluation.max_gouge`	Deepest gouge, in mm
`excess_count`	`evaluation.excesses`	Excess material count
`max_excess_mm`	`evaluation.max_excess`	Worst excess amount
`total_toolpath_length_mm`	(to be added)	Sum of move distances — proxy for cycle time
`tool_count`	`results[].tools`	Number of distinct tools used
`setup_count`	`results[].setups`	Number of clamping setups
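
The mapping in the table above could be expressed as a lookup table in the harness parser. The sketch below makes several assumptions that the table does not settle: that `evaluation` is nested inside each `results[]` entry, that `tools` and `setups` may arrive as lists to be counted, and that a missing field is recorded as `None`. The helper `toolpath_length_mm` shows one way the planned `total_toolpath_length_mm` could be computed from a list of XYZ points; the point-list input is also an assumption.

```python
import math
from typing import Any, Dict, Optional, Sequence, Tuple

# Metric name -> dotted path inside one entry of `results[]`.
# ASSUMPTION: `evaluation` lives inside each result entry.
METRIC_PATHS = {
    "generation_time_ms": "generation_time_ms",
    "evaluation_time_ms": "evaluation_time_ms",
    "total_moves": "total_moves",
    "collision_count": "evaluation.collisions",
    "gouge_count": "evaluation.gouges",
    "max_gouge_mm": "evaluation.max_gouge",
    "excess_count": "evaluation.excesses",
    "max_excess_mm": "evaluation.max_excess",
    "tool_count": "tools",
    "setup_count": "setups",
}


def lookup(obj: Any, dotted: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_metrics(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one result entry into the metric names used by the suite."""
    metrics = {}
    for name, path in METRIC_PATHS.items():
        value = lookup(result, path)
        # ASSUMPTION: list-valued fields (e.g. tools) are reported as counts.
        if isinstance(value, list):
            value = len(value)
        metrics[name] = value
    return metrics


def toolpath_length_mm(points: Sequence[Tuple[float, float, float]]) -> float:
    """Sum of straight-line distances between consecutive toolpath points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    sample = {"generation_time_ms": 120, "total_moves": 10,
              "evaluation": {"collisions": 0, "gouges": 2, "max_gouge": 0.05},
              "tools": ["T1", "T2"], "setups": 1}
    print(extract_metrics(sample))
    print(toolpath_length_mm([(0, 0, 0), (3, 4, 0), (3, 4, 12)]))
```

With a table-driven parser, adding a metric is a one-line change to `METRIC_PATHS`, which fits the intent of NFR-3.2.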

FR-4: Results Storage

ID	Requirement
FR-4.1	Run results are stored in a persistent, queryable format (not just CI logs).
FR-4.2	Each result is keyed by (commit SHA, part ID, strategy) and tagged with branch name and trigger type.
FR-4.3	Historical results for `main` are retained indefinitely for trend analysis.
FR-4.4	PR run results are retained for at least 90 days.
FR-4.5	Large artifacts (PLY exports, detailed collision reports) are stored separately from metric summaries.
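
One way to satisfy FR-4.1 and FR-4.2 is a single table keyed by (commit SHA, part ID, strategy) with the metrics held as a JSON object. The sketch below uses SQLite purely as an illustration; the choice of storage engine, the table and column names, and the functions `open_store`, `save_result`, and `metric_trend` are assumptions. Large artifacts (FR-4.5) are outside its scope.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    commit_sha   TEXT NOT NULL,
    part_id      TEXT NOT NULL,
    strategy     TEXT NOT NULL,
    branch       TEXT NOT NULL,
    trigger_type TEXT NOT NULL,
    run_at       TEXT NOT NULL,  -- UTC ISO 8601 timestamp, sorts chronologically
    metrics      TEXT NOT NULL,  -- JSON object, so new metrics need no migration
    PRIMARY KEY (commit_sha, part_id, strategy)
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_result(conn: sqlite3.Connection, commit_sha: str, part_id: str,
                strategy: str, branch: str, trigger_type: str, run_at: str,
                metrics: Dict[str, Any]) -> None:
    """Insert one result; a re-run with the same key overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?, ?, ?)",
        (commit_sha, part_id, strategy, branch, trigger_type, run_at,
         json.dumps(metrics)),
    )
    conn.commit()


def metric_trend(conn: sqlite3.Connection, part_id: str, metric: str,
                 branch: str = "main") -> List[Tuple[str, str, Any]]:
    """Return (run_at, commit_sha, value) for one metric, oldest first."""
    rows = conn.execute(
        "SELECT run_at, commit_sha, metrics FROM results "
        "WHERE part_id = ? AND branch = ? ORDER BY run_at",
        (part_id, branch),
    )
    return [(run_at, sha, json.loads(blob).get(metric))
            for run_at, sha, blob in rows]


if __name__ == "__main__":
    store = open_store(":memory:")
    save_result(store, "abc123", "pocket-simple-01", "simple-planar", "main",
                "push", "2026-03-01T00:00:00+00:00", {"gouge_count": 3})
    print(metric_trend(store, "pocket-simple-01", "gouge_count"))
```

Keeping metrics in a JSON column trades query convenience for flexibility: trend queries parse the blob in Python, but a metric added later needs no change to the table.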

FR-5: Dashboard & Visualization

ID	Requirement
FR-5.1	A web-based dashboard shows metric trends for `main` over time (x-axis: commits or date, y-axis: metric value).
FR-5.2	Dashboard supports filtering by: part, category, difficulty, strategy, and metric.
FR-5.3	Each data point links back to the specific test run and commit.
FR-5.4	PR runs show a comparison view: current branch metrics vs. the `main` baseline.
FR-5.5	Aggregate views: corpus-wide summaries (total gouges across all parts, average generation time, etc.).
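
The comparison view (FR-5.4) and the aggregate views (FR-5.5) both reduce to simple arithmetic over per-part metrics. The following sketch assumes each run has already been loaded as a mapping of part ID to metric values; the function names and the summary keys are hypothetical, and the two aggregates shown are just the examples named in FR-5.5.

```python
from statistics import mean
from typing import Dict, Optional

Run = Dict[str, Dict[str, Optional[float]]]  # part_id -> metric -> value


def compare_to_baseline(pr: Run, baseline: Run) -> Dict[str, Dict[str, float]]:
    """Per-part metric deltas (PR minus baseline).

    Metrics missing or None on either side are skipped, so a part that is
    new in the PR simply gets an empty delta dict.
    """
    deltas = {}
    for part_id, metrics in pr.items():
        base = baseline.get(part_id, {})
        deltas[part_id] = {
            name: value - base[name]
            for name, value in metrics.items()
            if value is not None and base.get(name) is not None
        }
    return deltas


def corpus_summary(run: Run) -> Dict[str, Optional[float]]:
    """Corpus-wide totals and averages across all parts in one run."""
    gouges = [m["gouge_count"] for m in run.values()
              if m.get("gouge_count") is not None]
    times = [m["generation_time_ms"] for m in run.values()
             if m.get("generation_time_ms") is not None]
    return {
        "total_gouges": sum(gouges),
        "avg_generation_time_ms": mean(times) if times else None,
    }


if __name__ == "__main__":
    main_run = {"p1": {"gouge_count": 1, "generation_time_ms": 150}}
    pr_run = {"p1": {"gouge_count": 3, "generation_time_ms": 100},
              "p2": {"gouge_count": 0, "generation_time_ms": 300}}
    print(compare_to_baseline(pr_run, main_run))
    print(corpus_summary(pr_run))
```

A positive delta on `gouge_count` or `generation_time_ms` indicates the PR is worse than the baseline on that part, which is the signal the comparison view would highlight.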

FR-6: Deep Dive Viewer

ID	Requirement
FR-6.1	From the dashboard, a user can select a specific (run, part) pair and navigate to a deep dive page.
FR-6.2	The deep dive page shows detailed metrics: all collision/gouge/excess reports with positions, per-operation breakdowns, per-tool statistics.
FR-6.3	The deep dive page includes a 3D toolpath viewer that renders toolpath moves over the part geometry.
FR-6.4	The viewer supports animated replay of the toolpath (play/pause, speed control, scrub to specific operations).
FR-6.5	The viewer includes interactive stock removal simulation: starting from raw stock, material is removed as the toolpath replays, with color-coded deviations (gouge vs. excess).
FR-6.6	The deep dive page supports comparison between two runs of the same part (e.g., before/after a code change).
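
To make the replay and deviation coloring in FR-6.4 and FR-6.5 concrete, here is a deliberately simplified model. It is not how the viewer or the underlying simulation is specified to work: it swaps in a one-dimensional heightmap (a single row of stock heights) and a flat-bottomed tool, and every name, the cell-based tool width, and the 0.01 mm tolerance are assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

Move = Tuple[int, float]  # (cell index of tool centre, tool-tip height in mm)


def replay(stock: Sequence[float], moves: Sequence[Move],
           tool_width: int) -> Iterator[List[float]]:
    """Yield the stock profile after each move, starting from raw stock.

    Each move lowers every cell under the tool to the tool-tip height,
    but never raises a cell (material cannot be added back).
    """
    heights = list(stock)
    for centre, z in moves:
        lo = max(0, centre - tool_width // 2)
        hi = min(len(heights), centre + tool_width // 2 + 1)
        for i in range(lo, hi):
            heights[i] = min(heights[i], z)
        yield list(heights)  # snapshot, so frames can be scrubbed later


def classify(heights: Sequence[float], target: Sequence[float],
             tol: float = 0.01) -> List[str]:
    """Label each cell: 'gouge' if cut below target, 'excess' if above."""
    labels = []
    for h, t in zip(heights, target):
        deviation = h - t
        if deviation < -tol:
            labels.append("gouge")
        elif deviation > tol:
            labels.append("excess")
        else:
            labels.append("ok")
    return labels


if __name__ == "__main__":
    stock = [10.0] * 5
    target = [10.0, 5.0, 5.0, 5.0, 10.0]
    frames = list(replay(stock, [(1, 5.0), (2, 4.0)], tool_width=1))
    print(frames[-1])
    print(classify(frames[-1], target))
```

Storing one snapshot per move is what makes scrubbing cheap in this model: jumping to an operation is an index into `frames`, and the labels from `classify` map directly onto display colors.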

Non-Functional Requirements

NFR-1: Performance

ID	Requirement
NFR-1.1	A full 20-part suite completes in under 30 minutes on the self-hosted runner.
NFR-1.2	The dashboard loads and renders trend charts in under 3 seconds.
NFR-1.3	The deep dive 3D viewer loads part data and initializes in under 10 seconds for typical parts.

NFR-2: Reliability

ID	Requirement
NFR-2.1	A single part failure does not abort the full suite run.
NFR-2.2	The self-hosted runner automatically recovers after machine restarts (runs as a Windows service).
NFR-2.3	Results storage survives runner restarts and is backed up or stored externally.

NFR-3: Maintainability

ID	Requirement
NFR-3.1	Adding a new test part requires only adding the STEP file and a manifest entry.
NFR-3.2	Adding a new metric requires only a harness parser change and a dashboard configuration change, with no schema migration.
NFR-3.3	The system is operable by a team of 2–5 developers without dedicated DevOps.
NFR-3.4	All infrastructure is defined as code (GitHub Actions workflows, deployment configs).

NFR-4: Portability

ID	Requirement
NFR-4.1	The test harness script runs on both Windows (primary) and Linux (future).
NFR-4.2	The architecture supports migrating from cloud/self-hosted GitHub Actions to on-premises compute without rewriting the harness or storage layer.
NFR-4.3	Results storage is decoupled from the CI system — results can be ingested from any runner environment.

NFR-5: Cost

ID	Requirement
NFR-5.1	Self-hosted runner: existing Windows machine (no incremental compute cost).
NFR-5.2	Dashboard hosting: < $20/month (static hosting or minimal server).
NFR-5.3	Results storage: < $10/month at 100-part corpus scale with full history.

Test Part Corpus Specification

Initial Corpus (20 parts)

The initial corpus covers the core feature categories the AutoCAM pipeline must handle:

Category	Count	Description
Simple pockets	3–4	Rectangular and contoured pockets, varying depth
Holes & drilling	2–3	Through-holes, blind holes, tapped holes
Multi-feature	4–5	Parts combining pockets, holes, bosses
Freeform surfaces	2–3	Curved surfaces requiring 3D finishing strategies
Multi-setup	2–3	Parts requiring features on multiple faces
Prismatic	2–3	Walls, slots, step features
Stress tests	1–2	Large parts, very fine features, deep cavities

Metadata Schema

```json
{
  "id": "pocket-simple-01",
  "name": "Simple Rectangular Pocket",
  "file": "parts/simple_pocket.stp",
  "category": "pockets",
  "difficulty": 1,
  "strategy": "simple-planar",
  "description": "Single rectangular pocket, 20mm deep, sharp corners",
  "parameters": {
    "stepdown": 2.0,
    "stepover": 0.5
  },
  "tags": ["should_have_no_gouges"]
}
```
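
A manifest entry of this shape could be validated when the harness starts, so that a malformed entry is reported before any part is run. The sketch below is one possible approach: it assumes the manifest is a list of such objects, that `description`, `parameters`, and `tags` are optional, and that STEP files end in `.stp` or `.step`. The class `PartEntry` and the function `load_manifest` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PartEntry:
    id: str
    name: str
    file: str
    category: str
    difficulty: int
    strategy: str
    # ASSUMPTION: the remaining fields are optional.
    description: str = ""
    parameters: Dict[str, float] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Difficulty is rated 1-5 per FR-1.2.
        if not 1 <= self.difficulty <= 5:
            raise ValueError(
                f"{self.id}: difficulty must be 1-5, got {self.difficulty}")
        if not self.file.lower().endswith((".stp", ".step")):
            raise ValueError(
                f"{self.id}: expected a STEP file, got {self.file}")


def load_manifest(entries: List[Dict[str, Any]]) -> List[PartEntry]:
    """Build validated entries; unknown keys raise TypeError, bad values
    and duplicate IDs raise ValueError."""
    parts = [PartEntry(**entry) for entry in entries]
    seen = set()
    for part in parts:
        if part.id in seen:
            raise ValueError(f"duplicate part id: {part.id}")
        seen.add(part.id)
    return parts


if __name__ == "__main__":
    entry = {
        "id": "pocket-simple-01",
        "name": "Simple Rectangular Pocket",
        "file": "parts/simple_pocket.stp",
        "category": "pockets",
        "difficulty": 1,
        "strategy": "simple-planar",
        "parameters": {"stepdown": 2.0, "stepover": 0.5},
        "tags": ["should_have_no_gouges"],
    }
    print(load_manifest([entry])[0])
```

Because `PartEntry(**entry)` rejects unknown keys, a misspelled field name in the manifest fails loudly instead of being silently ignored.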

Growth Plan

Phase	Corpus Size	Focus
MVP	~20 parts	Core coverage across all categories
V1	~100 parts	Edge cases, real customer parts, strategy-specific tests
V2+	100+	Expanded as new strategies and features are developed

User Stories

As a developer, I open a PR and within 30 minutes see a summary of how my changes affect toolpath quality across all test parts, so I can catch regressions before merging.
As a developer, I run a dry-run command locally to test my changes against the full corpus before pushing, so I can iterate faster.
As a technical lead, I view the dashboard and see trend lines for gouge counts and generation times on main over the past month, so I can assess whether the AutoCAM pipeline is improving.
As a developer, I notice a gouge regression on a specific part in my PR, click through to the deep dive, and replay the toolpath to see exactly where the gouge occurs, so I can diagnose the root cause.
As a technical lead, I compare two runs of the same part side-by-side to evaluate whether a strategy change improved cutting efficiency without introducing quality issues.
As a new team member, I add a new test part by dropping a STEP file in the corpus directory and adding a line to the manifest, without needing to understand the test infrastructure.