11 Verification & Validation Plan

11.1 Verification & Validation (V&V) Strategy

Per [1, Secs. 2.3.5.9, 2.3.5.11], this plan defines how we confirm the system meets requirements (verification) and stakeholder needs (validation).

The V&V strategy mirrors the layered architecture of the system itself. The SysML v2 MCP server is built on a tree-sitter grammar, wrapped in Rust crates, and exposed via the MCP protocol. Each layer has distinct failure modes and a matching verification technique: grammar correctness is verified by corpus tests against known-good parse trees, Rust crate behavior by unit tests run with cargo test, and MCP protocol compliance by integration tests that exercise the full JSON-RPC message flow. Verifying each layer separately catches defects at the earliest possible stage: a grammar error surfaces in corpus testing before it can propagate to MCP tool responses.

The dual-CI strategy (GitHub Actions for tree-sitter-sysml, GitLab CI for the capstone ecosystem) reflects the grammar’s dual contribution path: tree-sitter organization conventions require GitHub-hosted CI, while project-level coverage tracking and security scanning use GitLab Ultimate features.

Method                  Scope                           Environment
----------------------  ------------------------------  ------------------------------
Corpus Testing          tree-sitter grammar constructs  Local (tree-sitter test)
Coverage Testing        Training file parse rate        GitLab CI
Unit Testing            Rust crates                     Local (cargo test)
Integration Testing     MCP protocol compliance         Local (stdio)
Container Testing       Image builds, runtime           GitLab CI only
HTTP Transport Testing  Remote MCP connections          GitLab CI (service containers)
Acceptance Testing      End-to-end with Claude/VS Code  Local (stdio) + manual

System verification (this chapter) establishes that the tools are functional and meet their requirements. The benchmark vignettes (Section C.1) then use these verified tools to evaluate AI-MBSE workflow effectiveness for the GVSETS publication, measuring whether MCP-enabled AI outperforms baseline approaches on real SE tasks. The two activities are complementary: verification is a prerequisite for meaningful benchmarking.

11.2 Verification Methods

Per [1, Sec. 2.3.5.9], verification uses the four IADT methods (Inspection, Analysis, Demonstration, Test):

Method         Abbreviation  Description                      When Used
-------------  ------------  -------------------------------  --------------------------------
Inspection     I             Visual examination of artifacts  Documentation, code review
Analysis       A             Mathematical/logical evaluation  Performance, security assessment
Demonstration  D             Functional operation shown       MCP protocol interaction
Test           T             Execution with defined inputs    Unit tests, integration tests

11.2.1 Verification Method Assignment

Test (T) dominates the verification method assignments below because this is a software-intensive system where most requirements are directly executable. The MCP protocol, repository integration, and SysML parsing requirements all produce observable, deterministic outputs given controlled inputs — making automated testing the most efficient and repeatable verification approach. Inspection (I) is reserved for documentation requirements where pass/fail is assessed by human review, and Demonstration (D) supplements testing for protocol compliance where showing a working client interaction provides additional confidence beyond unit-level assertions.

Requirement               Method  Rationale
------------------------  ------  ---------------------------------------------------
FR-MCP-001                T, D    Test server initialization, demonstrate with client
FR-MCP-002, FR-MCP-005    T       Test tool enumeration and execution
FR-REPO-001, FR-REPO-002  T       Test file read from Git repositories
FR-SYS-001                T       Test parsing via tree-sitter corpus tests
FR-SYS-006                T       Test grammar subset via training file parse rate
FR-SYS-007                T       Test error recovery (tree-sitter ERROR nodes)
FR-SYS-008                I       Inspect tree-sitter-sysml README coverage docs
NFR-DEP-001               T, A    Test binary builds, analyze size
NFR-DEP-002               T       Test container builds in CI
NFR-DOC-001               I       Inspect Quarto output for completeness

Tailoring Note

The Verification Method Assignment (VMA) table above covers the 11 highest-risk requirements that are directly verifiable through the current test infrastructure. The remaining 23 system requirements (covering HTTP transport, SysML v2 API integration, container deployment, and security) are deferred to post-Phase 1 verification as their corresponding features are implemented. Per INCOSE Handbook 4.3.4, this tailoring is appropriate for a software-intensive academic project where verification activities are prioritized by implementation phase.

11.3 Acceptance Criteria

Requirement Category     Verification Method  Acceptance Criteria
-----------------------  -------------------  ---------------------------------------------------------
MCP Protocol Compliance  Integration test     Server initializes, lists tools/resources, executes tools
Repository Integration   Integration test     Read files from GitLab (reference) and self-hosted
SysML v2 Validation      System test          Validates correct/incorrect SysML syntax
Container Deployment     CI pipeline          Image builds, runs, responds to MCP requests
Documentation            Inspection           Quarto renders, deploys to GitLab Pages

11.4 Enabling Systems

Per [1, Sec. 2.3.5.9], enabling systems support verification activities.

Enabling System       Purpose                             Responsibility
--------------------  ----------------------------------  -------------------------
tree-sitter CLI       Grammar testing (tree-sitter test)  Local + CI
Cargo Test Framework  Rust unit and integration testing   Built into Rust toolchain
GitLab CI/CD          Automated pipeline execution        GitLab SaaS runners
GitHub Actions        tree-sitter grammar CI              GitHub runners
Buildah/Podman        Container image builds              CI environment only
Claude Desktop        Manual acceptance testing           Local development
MCP Inspector         Protocol debugging                  Local development
Quarto                Documentation builds                Local + CI

11.4.1 Test Environment Configuration

Environment     Transport  External Services   Use Case
--------------  ---------  ------------------  ---------------------------
Local Dev       stdio      Mocked/optional     Unit tests, rapid iteration
CI Test         stdio      Mocked              Automated test suite
CI Integration  HTTP       GitLab API (PAT)    Integration tests
CI Container    HTTP       Service containers  End-to-end container tests

11.5 Test Cases

11.5.1 MCP Protocol Tests

ID          Test Case                          Expected Result                     Method
----------  ---------------------------------  ----------------------------------  ------
TC-MCP-001  Send initialize request            Server responds with capabilities   T
TC-MCP-002  Request tools/list                 Returns list including sysml_parse  T
TC-MCP-003  Call sysml_parse with valid SysML  Returns parsed elements             T
TC-MCP-004  Request resources/list             Returns example resources           T
TC-MCP-005  Read sysml://examples/hello        Returns vehicle model content       T
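
The first two test cases can be sketched as plain JSON-RPC 2.0 messages. The following Python sketch is illustrative only and is not the project's Rust test code: the helper names, the client name `vv-harness`, the protocol revision string, and the canned responses are all assumptions made for the example. It shows how a harness might build the `initialize` and `tools/list` requests and check the shape of the replies.

```python
import json


def make_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request object of the kind MCP uses."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def check_initialize_response(response, request_id):
    """TC-MCP-001: the reply echoes the request id and advertises capabilities."""
    return (
        response.get("jsonrpc") == "2.0"
        and response.get("id") == request_id
        and "capabilities" in response.get("result", {})
    )


def check_tools_list_response(response, request_id, expected_tool="sysml_parse"):
    """TC-MCP-002: the returned tool list includes the expected tool name."""
    tools = response.get("result", {}).get("tools", [])
    names = {tool.get("name") for tool in tools}
    return response.get("id") == request_id and expected_tool in names


# Requests a harness could write to the server's stdin, one JSON object per line.
initialize = make_request(1, "initialize", {
    "protocolVersion": "2024-11-05",  # assumed protocol revision
    "capabilities": {},
    "clientInfo": {"name": "vv-harness", "version": "0.0.0"},  # hypothetical client
})
tools_list = make_request(2, "tools/list")

# Canned replies standing in for real server output (illustrative only).
fake_init = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {}},
        "serverInfo": {"name": "example-server", "version": "0.0.0"},
    },
}
fake_tools = {"jsonrpc": "2.0", "id": 2, "result": {"tools": [{"name": "sysml_parse"}]}}

print(json.dumps(initialize))
print(check_initialize_response(fake_init, 1))
print(check_tools_list_response(fake_tools, 2))
```

Keeping the checks as pure functions over decoded messages means the same assertions could be reused for stdio and HTTP transports, with only the I/O layer changing.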

11.5.2 Repository Integration Tests

ID           Test Case                           Expected Result            Method
-----------  ----------------------------------  -------------------------  ------
TC-REPO-001  Read file from public repo          Returns file content       T
TC-REPO-002  Read file with PAT auth             Returns file content       T
TC-REPO-003  List .sysml files in directory      Returns file list          T
TC-REPO-004  Read from self-hosted Git provider  Returns file content       T
TC-REPO-005  Handle non-existent file            Returns appropriate error  T

11.5.3 SysML Parsing Tests

11.5.3.1 tree-sitter Corpus Tests

ID          Test Case                     Expected Result                   Method
----------  ----------------------------  --------------------------------  ----------
TC-SYS-001  Parse package declaration     Correct CST structure             T (corpus)
TC-SYS-002  Parse part definition         Correct CST structure             T (corpus)
TC-SYS-003  Parse requirement definition  Correct CST structure             T (corpus)
TC-SYS-004  Parse nested elements         Correct CST hierarchy             T (corpus)
TC-SYS-005  Parse with syntax errors      ERROR node in CST, partial parse  T (corpus)

11.5.3.2 Training File Coverage Tests

ID          Test Case                     Expected Result                                        Method
----------  ----------------------------  -----------------------------------------------------  ------
TC-COV-001  Parse Module 01 files         Clean parse (no ERROR nodes)                           T (CI)
TC-COV-002  Parse Module 02 files         Clean parse (no ERROR nodes)                           T (CI)
TC-COV-003  Calculate overall parse rate  100% achieved (target was ≥10% Phase 1, ≥50% Phase 2)  A (CI)
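
The parse-rate calculation behind TC-COV-003 can be sketched as follows. This is a minimal Python illustration, not the project's CI script: the function names, the sample file paths, and the node names in the sample S-expressions are hypothetical. It assumes each file's parse tree is available as a tree-sitter S-expression string, and it treats `(MISSING ...)` nodes as failures alongside `(ERROR ...)` nodes, which is a stricter reading than the table states.

```python
def is_clean_parse(sexp):
    """Return True when a tree-sitter S-expression contains no error markers."""
    # tree-sitter prints unparseable input as (ERROR ...) and tokens it had to
    # insert as (MISSING ...). Counting MISSING as a failure is an assumption.
    return "(ERROR" not in sexp and "(MISSING" not in sexp


def parse_rate(results):
    """Percentage of files that parsed cleanly; `results` maps path -> S-expression."""
    if not results:
        raise ValueError("no files were parsed")
    clean = sum(1 for sexp in results.values() if is_clean_parse(sexp))
    return 100.0 * clean / len(results)


def meets_target(rate, phase):
    """Compare a parse rate against the phase targets listed for TC-COV-003."""
    targets = {1: 10.0, 2: 50.0}  # at least 10% in Phase 1, at least 50% in Phase 2
    return rate >= targets[phase]


# Hypothetical parse output for three files; node names are illustrative.
sample = {
    "module01/a.sysml": "(source_file (package_declaration))",
    "module01/b.sysml": "(source_file (ERROR (identifier)))",
    "module02/c.sysml": "(source_file (part_definition))",
}
rate = parse_rate(sample)
print(f"{rate:.1f}% clean, Phase 2 target met: {meets_target(rate, 2)}")
```

Checking for error markers in the printed tree keeps the sketch independent of any language binding; a Rust implementation would more likely query the parsed tree directly.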

11.6 Known Limitations

  1. Container testing: Cannot be performed locally on macOS; relies on CI (risk R5, accepted).
  2. HTTP transport: Requires CI service containers or a Linux machine.
  3. SysML v2 API: Requires a running API server; deferred to post-capstone. Basic parsing and repository operations work without the API dependency.
  4. Grammar coverage: The tree-sitter-sysml grammar parses 274 of 275 external files (99.6%) and all 100 OMG training files (100%). The single unparseable file uses non-standard UML syntax outside the SysML v2 specification. Full semantic compliance (type checking, import resolution) is out of scope for the tree-sitter grammar and deferred to future work (sysml.rs).

11.7 Continuous Integration/Continuous Delivery (CI/CD) Verification Pipeline

Per [1, Sec. 2.3.5.9], automated verification integrates into CI/CD.

11.7.1 Pipeline Stages

stages:
  - lint
  - test
  - build
  - integration
  - publish

11.7.2 tree-sitter-sysml Test Stage

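test:
  stage: test
  image: node:20-alpine
  script:
    - npm ci
    # Corpus tests against known-good parse trees
    - npx tree-sitter test
    # Parse every training file; --quiet suppresses the printed trees
    - npx tree-sitter parse --quiet training/**/*.sysml 2>&1 | tee parse-results.txt
  artifacts:
    # Keep the raw parse log alongside the metrics report
    paths:
      - parse-results.txt
    reports:
      # Assumption: coverage-metrics.txt is written by a step not shown here
      metrics: coverage-metrics.txt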
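
The job above reports a metrics file, so some step has to produce it. The Python sketch below shows one way such a file could be rendered, assuming the report takes plain `name value` lines in the OpenMetrics text style. The function name and the metric names are hypothetical, and the counts are passed in directly because the exact layout of the parse log is not specified here.

```python
def render_metrics(total_files, clean_files):
    """Render parse coverage as 'name value' lines for a metrics report."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    if not 0 <= clean_files <= total_files:
        raise ValueError("clean_files must be between 0 and total_files")
    rate = 100.0 * clean_files / total_files
    lines = [
        f"sysml_files_total {total_files}",  # hypothetical metric names
        f"sysml_files_clean {clean_files}",
        f"sysml_parse_rate_percent {rate:.1f}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: every one of 100 training files parsed cleanly.
    print(render_metrics(100, 100), end="")
```

In a pipeline, the printed output would be redirected into coverage-metrics.txt before the job finishes so that the report path in the job definition resolves to a real file.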

11.7.3 open-mcp-sysml Test Stage

test:
  stage: test
  image: rust:1.85
  script:
    - cargo test --workspace
    - cargo clippy --workspace -- -D warnings
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

11.7.4 Integration Test Stage

integration:
  stage: integration
  image: rust:1.85
  variables:
    GITLAB_TOKEN: $CI_JOB_TOKEN
  script:
    - cargo test --workspace --features integration
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

11.7.5 Container Test Stage

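container-test:
  # Runs after the build stage, assuming that stage pushes the image for this commit
  stage: integration
  image: quay.io/buildah/stable
  services:
    - name: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      alias: mcp-server
  script:
    # Smoke test: send an initialize request to the service container
    - echo '{"jsonrpc":"2.0","id":1,"method":"initialize"...}' | nc mcp-server 8080
  rules:
    - if: $CI_COMMIT_BRANCH == "main"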

11.8 Validation Approach

Per [1, Sec. 2.3.5.11], validation confirms the system meets stakeholder needs.

11.8.1 Validation Activities

Activity          Stakeholder Need        Method         Acceptance
----------------  ----------------------  -------------  ----------------------------------
End-to-end demo   AI tool integration     Demonstration  Claude reads SysML from GitLab
User acceptance   Developer experience    Interview      Positive feedback from pilot users
Paper submission  Academic validation     Peer review    GVSETS acceptance
Capstone review   Educational objectives  Review         Advisor approval

11.8.2 Validation Schedule

Milestone  Date          Validation Activity                                                  Status
---------  ------------  -------------------------------------------------------------------  --------
SRR        Feb 14, 2026  Requirements validated with stakeholders                             Complete
PDR        Feb 14, 2026  Architecture validated against requirements (and by implementation)  Complete
CDR        Mar 29, 2026  Implementation validated, acceptance tests pass                      Pending
Final      Apr 25, 2026  Stakeholder acceptance, capstone submission                          Pending

11.9 Review Verification

Review  Verification Activities
------  ------------------------------------------------
SRR     Requirements complete, traceable to stakeholders
PDR     Architecture addresses all requirements
CDR     All tests pass, acceptance criteria met