AutoCAM Testing Suite — Requirements

Engineering Log Entry — March 2026

Functional and non-functional requirements for the AutoCAM testing and benchmarking suite. Related to the AutoCAM Pipeline component of the system design.

Purpose

Define a continuous, automated testing and benchmarking system for the AutoCAM toolpath generation pipeline. The system measures CAM output quality and performance across a curated corpus of test parts, tracks metrics over time, and provides tools for deep investigation of individual results.

Goals

  • Observability — Quantify how well the AutoCAM pipeline performs across a diverse set of parts and track that performance as the codebase evolves.
  • Regression detection — Surface when a code change degrades toolpath quality or performance relative to previous runs on main.
  • Developer workflow integration — Run automatically on PRs, on merge to main, and on-demand from a developer's machine.
  • Deep investigation — Provide a way to drill into individual test results with detailed metrics, toolpath visualization, and stock removal replay.
  • Historical tracking — Plot metrics over time for the main branch to show progress as the AutoCAM system matures.

Non-Goals

  • Strict pass/fail gates that block merges (monitoring-first approach).
  • Full machine simulation or kinematic validation (covered by the Virtual CNC Simulation workstream).
  • Production job scheduling or quoting.
  • Replacing commercial verification tools like Vericut (may be adopted later as a complementary layer).

Functional Requirements

FR-1: Test Part Corpus

ID Requirement
FR-1.1 The system maintains a curated corpus of STEP files with associated metadata.
FR-1.2 Each test part has a manifest entry specifying: file path, assigned strategy, display name, category (e.g., pockets, drilling, multi-setup, freeform), difficulty rating (1–5), and any parameter overrides (stepdown, stepover, stock allowance).
FR-1.3 Initial corpus is ~20 parts. The architecture supports scaling to 100+ parts without redesign.
FR-1.4 Test parts and the manifest are version-controlled alongside the CAM source code.
FR-1.5 Each part may specify expected behavior tags (e.g., should_have_no_gouges, known_collision) for future alerting, but these do not block runs initially.

FR-2: Test Execution

ID Requirement
FR-2.1 The test harness invokes the machenit CLI with --json for each test part using the strategy and parameters from the manifest.
FR-2.2 Three trigger modes: (a) GitHub PR — runs on every PR targeting main, (b) merge to main — runs on every push to main, (c) manual/dry-run — developer triggers from their machine or via GitHub Actions workflow_dispatch.
FR-2.3 Execution happens on a self-hosted GitHub Actions runner (Windows machine with ModuleWorks and Analysis Situs licenses).
FR-2.4 Each run produces a structured results bundle containing: per-part JSON output, aggregate summary, run metadata (commit SHA, branch, timestamp, trigger type).
FR-2.5 The harness captures wall-clock time per part (generation + evaluation) and for the full suite.
FR-2.6 Failures (crashes, timeouts, invalid output) are captured as error entries rather than aborting the entire suite.
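
The failure-isolation and timing behavior in FR-2.5 and FR-2.6 could be sketched roughly as follows. This is a minimal sketch, not the harness itself: the function names, the `error_kind` labels, and the default timeout are hypothetical, and the command line for each part is assumed to be built elsewhere, because this page specifies only the `--json` flag of the machenit CLI.

```python
import json
import subprocess
import time


def run_part(command: list[str], part_id: str, timeout_s: float = 600.0) -> dict:
    """Run one CLI invocation; part failures become error entries, not exceptions."""
    started = time.perf_counter()
    entry = {"part_id": part_id, "status": "ok"}
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        if proc.returncode != 0:
            # Non-zero exit is treated as a crash; keep the tail of stderr for triage.
            entry.update(status="error", error_kind="crash", detail=proc.stderr[-2000:])
        else:
            entry["output"] = json.loads(proc.stdout)
    except subprocess.TimeoutExpired:
        entry.update(status="error", error_kind="timeout", detail=f"exceeded {timeout_s}s")
    except json.JSONDecodeError as exc:
        entry.update(status="error", error_kind="invalid_output", detail=str(exc))
    except OSError as exc:
        # For example, the executable could not be started at all.
        entry.update(status="error", error_kind="crash", detail=str(exc))
    entry["wall_clock_ms"] = round((time.perf_counter() - started) * 1000)
    return entry


def run_suite(commands: dict[str, list[str]], metadata: dict, timeout_s: float = 600.0) -> dict:
    """Run every part in turn and assemble a results bundle (FR-2.4)."""
    started = time.perf_counter()
    results = [run_part(cmd, part_id, timeout_s) for part_id, cmd in commands.items()]
    return {
        "metadata": metadata,  # commit SHA, branch, timestamp, trigger type
        "results": results,
        "summary": {
            "parts": len(results),
            "errors": sum(r["status"] == "error" for r in results),
            "wall_clock_ms": round((time.perf_counter() - started) * 1000),
        },
    }
```

Recording the wall-clock time after the `try` block means that failed parts also report how long they ran before failing, which helps when tuning the timeout.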

FR-3: Metrics Captured

For each test part, the following metrics are captured from the CLI JSON output:

Metric | Source | Description
generation_time_ms | results[].generation_time_ms | Time to generate the plan (ms)
evaluation_time_ms | results[].evaluation_time_ms | Time to run stock simulation (ms)
total_moves | results[].total_moves | Total toolpath moves
collision_count | evaluation.collisions | Collisions (shaft, arbor, holder)
gouge_count | evaluation.gouges | Undercut count
max_gouge_mm | evaluation.max_gouge | Worst undercut depth (mm)
excess_count | evaluation.excesses | Excess material count
max_excess_mm | evaluation.max_excess | Worst excess amount (mm)
total_toolpath_length_mm | (to be added) | Sum of move distances (mm); proxy for cycle time
tool_count | results[].tools | Number of distinct tools used
setup_count | results[].setups | Number of clamping setups
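
One way the harness parser might map the CLI output onto these metric names is sketched below. The exact JSON layout is not given on this page, so two things are assumptions: that the `evaluation` object sits inside each element of `results[]`, and that `tools` and `setups` may arrive either as counts or as lists. The helper names are hypothetical. The last function illustrates the planned `total_toolpath_length_mm` metric, treating every move as a straight segment.

```python
import math

# Metric name -> dotted path inside one element of results[].
# Nesting "evaluation" under each result is an assumption; adjust to the real output.
METRIC_PATHS = {
    "generation_time_ms": "generation_time_ms",
    "evaluation_time_ms": "evaluation_time_ms",
    "total_moves": "total_moves",
    "collision_count": "evaluation.collisions",
    "gouge_count": "evaluation.gouges",
    "max_gouge_mm": "evaluation.max_gouge",
    "excess_count": "evaluation.excesses",
    "max_excess_mm": "evaluation.max_excess",
    "tool_count": "tools",
    "setup_count": "setups",
}


def dig(obj, path):
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_metrics(result: dict) -> dict:
    """Flatten one result object into {metric_name: value}; missing metrics become None."""
    metrics = {}
    for name, path in METRIC_PATHS.items():
        value = dig(result, path)
        # Assumption: a count may be reported as a number or as a list of items.
        metrics[name] = len(value) if isinstance(value, list) else value
    return metrics


def toolpath_length_mm(points) -> float:
    """Sum of straight-line distances between consecutive (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Because new metrics are added by extending `METRIC_PATHS`, a metric that an older CLI build does not emit is recorded as `None` rather than failing the run.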

FR-4: Results Storage

ID Requirement
FR-4.1 Run results are stored in a persistent, queryable format (not just CI logs).
FR-4.2 Each result is keyed by (commit SHA, part ID, strategy) and tagged with branch name and trigger type.
FR-4.3 Historical results for main are retained indefinitely for trend analysis.
FR-4.4 PR run results are retained for at least 90 days.
FR-4.5 Large artifacts (PLY exports, detailed collision reports) are stored separately from metric summaries.
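
As an illustration of FR-4.1 and FR-4.2, the sketch below stores one row per (commit SHA, part ID, strategy) and keeps the metric values in a JSON column, so that a new metric does not force a schema migration. SQLite is used here only because it ships with Python; this page does not choose a storage backend, and the table, column, and function names are hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    commit_sha   TEXT NOT NULL,
    part_id      TEXT NOT NULL,
    strategy     TEXT NOT NULL,
    branch       TEXT NOT NULL,
    trigger_type TEXT NOT NULL,
    recorded_at  TEXT NOT NULL,  -- ISO 8601 timestamp, so text order is time order
    metrics_json TEXT NOT NULL,  -- all metrics in one JSON object
    PRIMARY KEY (commit_sha, part_id, strategy)
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_result(conn, commit_sha, part_id, strategy, branch, trigger_type, recorded_at, metrics):
    """Insert one result; a re-run of the same key replaces the earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?, ?, ?)",
        (commit_sha, part_id, strategy, branch, trigger_type, recorded_at, json.dumps(metrics)),
    )
    conn.commit()


def metric_history(conn, part_id, metric, branch="main"):
    """Return [(commit_sha, recorded_at, value)] for one metric, oldest first."""
    rows = conn.execute(
        "SELECT commit_sha, recorded_at, metrics_json FROM results "
        "WHERE part_id = ? AND branch = ? ORDER BY recorded_at",
        (part_id, branch),
    )
    return [(sha, ts, json.loads(blob).get(metric)) for sha, ts, blob in rows]
```

A call such as `metric_history(conn, "pocket-simple-01", "gouge_count")` would return the series a trend chart needs. Large artifacts (FR-4.5) are deliberately left out of this table.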

FR-5: Dashboard & Visualization

ID Requirement
FR-5.1 A web-based dashboard shows metric trends for main over time (x-axis: commits or date, y-axis: metric value).
FR-5.2 Dashboard supports filtering by: part, category, difficulty, strategy, and metric.
FR-5.3 Each data point links back to the specific test run and commit.
FR-5.4 PR runs show a comparison view: current branch metrics vs. the main baseline.
FR-5.5 Aggregate views: corpus-wide summaries (total gouges across all parts, average generation time, etc.).
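
The PR comparison view (FR-5.4) and the corpus-wide summaries (FR-5.5) reduce to simple arithmetic over per-part metric dictionaries. The following is a rough sketch under the assumption that results are available as `{part_id: {metric: value}}`; both function names are hypothetical.

```python
def compare_to_baseline(current: dict, baseline: dict) -> dict:
    """Return {part_id: {metric: {"baseline", "current", "delta"}}} for shared numeric metrics."""
    report = {}
    for part_id, metrics in current.items():
        base = baseline.get(part_id)
        if base is None:
            continue  # a part added in this PR has no main baseline yet
        rows = {}
        for name, value in metrics.items():
            ref = base.get(name)
            # Skip metrics that are missing (None) or non-numeric on either side.
            if isinstance(value, (int, float)) and isinstance(ref, (int, float)):
                rows[name] = {"baseline": ref, "current": value, "delta": value - ref}
        report[part_id] = rows
    return report


def corpus_total(results: dict, metric: str) -> float:
    """Sum one metric across all parts, counting missing values as zero."""
    return sum(metrics.get(metric) or 0 for metrics in results.values())
```

Since the suite is monitoring-first rather than a merge gate, the deltas are reported as-is; any thresholds for highlighting a regression would be a dashboard concern.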

FR-6: Deep Dive Viewer

ID Requirement
FR-6.1 From the dashboard, a user can select a specific (run, part) pair and navigate to a deep dive page.
FR-6.2 The deep dive page shows detailed metrics: all collision/gouge/excess reports with positions, per-operation breakdowns, per-tool statistics.
FR-6.3 The deep dive page includes a 3D toolpath viewer that renders toolpath moves over the part geometry.
FR-6.4 The viewer supports animated replay of the toolpath (play/pause, speed control, scrub to specific operations).
FR-6.5 The viewer includes interactive stock removal simulation: starting from raw stock, material is removed as the toolpath replays, with color-coded deviations (gouge vs. excess).
FR-6.6 The deep dive page supports comparison between two runs of the same part (e.g., before/after a code change).
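
The color-coded deviations in FR-6.5 imply a per-sample classification step. The sketch below shows one possible form of it. Everything in it is an assumption made for illustration: the sign convention (negative means material removed below the target surface, positive means material left behind), the default tolerance, and the color values.

```python
# Hypothetical display colors as RGB tuples.
DEVIATION_COLORS = {
    "gouge": (220, 40, 40),          # red: cut below the target surface
    "excess": (40, 90, 220),         # blue: material left above the target surface
    "in_tolerance": (60, 180, 75),   # green: within the tolerance band
}


def classify_deviation(deviation_mm: float, tolerance_mm: float = 0.05) -> str:
    """Classify a signed deviation from the target surface.

    Assumed convention: negative = gouge, positive = excess.
    """
    if deviation_mm < -tolerance_mm:
        return "gouge"
    if deviation_mm > tolerance_mm:
        return "excess"
    return "in_tolerance"


def color_for(deviation_mm: float, tolerance_mm: float = 0.05) -> tuple:
    """Look up the display color for one sampled deviation."""
    return DEVIATION_COLORS[classify_deviation(deviation_mm, tolerance_mm)]
```

If the evaluation output reports gouges and excesses with a different sign convention, only `classify_deviation` would need to change.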

Non-Functional Requirements

NFR-1: Performance

ID Requirement
NFR-1.1 A full 20-part suite completes in under 30 minutes on the self-hosted runner.
NFR-1.2 The dashboard loads and renders trend charts in under 3 seconds.
NFR-1.3 The deep dive 3D viewer loads part data and initializes in under 10 seconds for typical parts.

NFR-2: Reliability

ID Requirement
NFR-2.1 A single part failure does not abort the full suite run.
NFR-2.2 The self-hosted runner automatically recovers after machine restarts (runs as a Windows service).
NFR-2.3 Results storage survives runner restarts and is backed up or stored externally.

NFR-3: Maintainability

ID Requirement
NFR-3.1 Adding a new test part requires only adding the STEP file and a manifest entry.
NFR-3.2 Adding a new metric requires only a change to the harness parser and a dashboard configuration change, with no schema migration.
NFR-3.3 The system is operable by a team of 2–5 developers without dedicated DevOps.
NFR-3.4 All infrastructure is defined as code (GitHub Actions workflows, deployment configs).

NFR-4: Portability

ID Requirement
NFR-4.1 The test harness script runs on both Windows (primary) and Linux (future).
NFR-4.2 The architecture supports migrating from cloud/self-hosted GitHub Actions to on-premises compute without rewriting the harness or storage layer.
NFR-4.3 Results storage is decoupled from the CI system — results can be ingested from any runner environment.

NFR-5: Cost

ID Requirement
NFR-5.1 Self-hosted runner: existing Windows machine (no incremental compute cost).
NFR-5.2 Dashboard hosting: < $20/month (static hosting or minimal server).
NFR-5.3 Results storage: < $10/month at 100-part corpus scale with full history.

Test Part Corpus Specification

Initial Corpus (20 parts)

The initial corpus covers the core feature categories the AutoCAM pipeline must handle:

Category | Count | Description
Simple pockets | 3–4 | Rectangular and contoured pockets, varying depth
Holes & drilling | 2–3 | Through-holes, blind holes, tapped holes
Multi-feature | 4–5 | Parts combining pockets, holes, bosses
Freeform surfaces | 2–3 | Curved surfaces requiring 3D finishing strategies
Multi-setup | 2–3 | Parts requiring features on multiple faces
Prismatic | 2–3 | Walls, slots, step features
Stress tests | 1–2 | Large parts, very fine features, deep cavities

Metadata Schema

{
  "id": "pocket-simple-01",
  "name": "Simple Rectangular Pocket",
  "file": "parts/simple_pocket.stp",
  "category": "pockets",
  "difficulty": 1,
  "strategy": "simple-planar",
  "description": "Single rectangular pocket, 20mm deep, sharp corners",
  "parameters": {
    "stepdown": 2.0,
    "stepover": 0.5
  },
  "tags": ["should_have_no_gouges"]
}
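
A manifest loader could validate each entry against this schema before the suite runs. The sketch below is one possible shape, assuming that `description`, `parameters`, and `tags` are optional and the remaining fields are required; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Fields treated as mandatory here; which fields are optional is an assumption.
REQUIRED_FIELDS = ("id", "name", "file", "category", "difficulty", "strategy")


@dataclass(frozen=True)
class PartEntry:
    id: str
    name: str
    file: str
    category: str
    difficulty: int
    strategy: str
    description: str = ""
    parameters: dict = field(default_factory=dict)
    tags: tuple = ()


def parse_part(entry: dict) -> PartEntry:
    """Validate one manifest entry and convert it to a PartEntry."""
    missing = [key for key in REQUIRED_FIELDS if key not in entry]
    if missing:
        raise ValueError(f"manifest entry missing fields: {missing}")
    difficulty = entry["difficulty"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(difficulty, bool) or not isinstance(difficulty, int) or not 1 <= difficulty <= 5:
        raise ValueError(f"{entry['id']}: difficulty must be an integer from 1 to 5")
    return PartEntry(
        id=entry["id"],
        name=entry["name"],
        file=entry["file"],
        category=entry["category"],
        difficulty=difficulty,
        strategy=entry["strategy"],
        description=entry.get("description", ""),
        parameters=dict(entry.get("parameters", {})),
        tags=tuple(entry.get("tags", ())),
    )
```

Failing fast on a malformed entry keeps a manifest typo from surfacing later as a confusing per-part error in the results.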

Growth Plan

Phase | Corpus Size | Focus
MVP | ~20 parts | Core coverage across all categories
V1 | ~100 parts | Edge cases, real customer parts, strategy-specific tests
V2+ | 100+ parts | Expanded as new strategies and features are developed

User Stories

  1. As a developer, I open a PR and within 30 minutes see a summary of how my changes affect toolpath quality across all test parts, so I can catch regressions before merging.

  2. As a developer, I run a dry-run command locally to test my changes against the full corpus before pushing, so I can iterate faster.

  3. As a technical lead, I view the dashboard and see trend lines for gouge counts and generation times on main over the past month, so I can assess whether the AutoCAM pipeline is improving.

  4. As a developer, I notice a gouge regression on a specific part in my PR, click through to the deep dive, and replay the toolpath to see exactly where the gouge occurs, so I can diagnose the root cause.

  5. As a technical lead, I compare two runs of the same part side-by-side to evaluate whether a strategy change improved cutting efficiency without introducing quality issues.

  6. As a new team member, I add a new test part by dropping a STEP file in the corpus directory and adding an entry to the manifest, without needing to understand the test infrastructure.