11 Verification & Validation Plan

11.1 Verification & Validation (V&V) Strategy

Per [1, Secs. 2.3.5.9, 2.3.5.11], this plan defines how we confirm the system meets requirements (verification) and stakeholder needs (validation).

The V&V strategy mirrors the layered architecture of the system itself. The SysML v2 MCP server is built on a tree-sitter grammar, wrapped in Rust crates, and exposed via the MCP protocol. Each layer has distinct failure modes and a matching verification technique: grammar correctness is verified by corpus tests against known-good parse trees, Rust crate behavior by unit tests run with cargo test, and MCP protocol compliance by integration tests that exercise the full JSON-RPC message flow. Layering the tests this way is intended to catch defects at the earliest possible stage: a grammar error should surface in corpus testing before it can propagate to MCP tool responses.

The dual-CI strategy (GitHub Actions for tree-sitter-sysml, GitLab CI for the capstone ecosystem) reflects the grammar’s dual contribution path: tree-sitter organization conventions require GitHub-hosted CI, while project-level coverage tracking and security scanning use GitLab Ultimate features.

Method	Scope	Environment
Corpus Testing	tree-sitter grammar constructs	Local (tree-sitter test)
Coverage Testing	Training file parse rate	GitLab CI
Unit Testing	Rust crates	Local (cargo test)
Integration Testing	MCP protocol compliance	Local (stdio)
Container Testing	Image builds, runtime	GitLab CI only
HTTP Transport Testing	Remote MCP connections	GitLab CI (service containers)
Acceptance Testing	End-to-end with Claude/VS Code	Local (stdio) + manual

System verification (this chapter) establishes that tools are functional and meet their requirements. The benchmark vignettes (Section C.1) then use these verified tools to evaluate AI-MBSE workflow effectiveness for the GVSETS publication — measuring whether MCP-enabled AI outperforms baseline approaches on real SE tasks. The two activities are complementary: verification is a prerequisite for meaningful benchmarking.

11.2 Verification Methods

Per [1, Sec. 2.3.5.9], verification uses IADT methods:

Method	Abbreviation	Description	When Used
Inspection	I	Visual examination of artifacts	Documentation, code review
Analysis	A	Mathematical/logical evaluation	Performance, security assessment
Demonstration	D	Functional operation shown	MCP protocol interaction
Test	T	Execution with defined inputs	Unit tests, integration tests

11.2.1 Verification Method Assignment

Test (T) dominates the verification method assignments below because this is a software-intensive system where most requirements are directly executable. The MCP protocol, repository integration, and SysML parsing requirements all produce observable, deterministic outputs given controlled inputs — making automated testing the most efficient and repeatable verification approach. Inspection (I) is reserved for documentation requirements where pass/fail is assessed by human review, and Demonstration (D) supplements testing for protocol compliance where showing a working client interaction provides additional confidence beyond unit-level assertions.

Requirement	Method	Rationale
FR-MCP-001	T, D	Test server initialization, demonstrate with client
FR-MCP-002, FR-MCP-005	T	Test tool enumeration and execution
FR-REPO-001, FR-REPO-002	T	Test file read from Git repositories
FR-SYS-001	T	Test parsing via tree-sitter corpus tests
FR-SYS-006	T	Test grammar subset via training file parse rate
FR-SYS-007	T	Test error recovery (tree-sitter ERROR nodes)
FR-SYS-008	I	Inspect tree-sitter-sysml README coverage docs
NFR-DEP-001	T, A	Test binary builds, analyze size
NFR-DEP-002	T	Test container builds in CI
NFR-DOC-001	I	Inspect Quarto output for completeness
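
The assignment table can also be held as data, so that a unit test can flag any requirement left without a verification method. The following is a minimal sketch under that assumption; the type and function names (`Method`, `Assignment`, `unassigned`, `count_using`) are hypothetical and not taken from the project, and only a few rows are transcribed.

```rust
/// IADT verification methods from Section 11.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Inspection,
    Analysis,
    Demonstration,
    Test,
}

/// One row of the assignment table: a requirement ID and its methods.
pub struct Assignment {
    pub requirement: &'static str,
    pub methods: &'static [Method],
}

/// A few rows of the table transcribed as data (illustrative, not exhaustive).
pub const ASSIGNMENTS: &[Assignment] = &[
    Assignment { requirement: "FR-MCP-001", methods: &[Method::Test, Method::Demonstration] },
    Assignment { requirement: "FR-SYS-001", methods: &[Method::Test] },
    Assignment { requirement: "FR-SYS-008", methods: &[Method::Inspection] },
    Assignment { requirement: "NFR-DEP-001", methods: &[Method::Test, Method::Analysis] },
];

/// Returns the IDs of requirements that have no verification method assigned.
pub fn unassigned(rows: &[Assignment]) -> Vec<&'static str> {
    rows.iter()
        .filter(|row| row.methods.is_empty())
        .map(|row| row.requirement)
        .collect()
}

/// Counts how many requirements use the given method.
pub fn count_using(rows: &[Assignment], method: Method) -> usize {
    rows.iter()
        .filter(|row| row.methods.contains(&method))
        .count()
}
```

A check such as `unassigned(ASSIGNMENTS).is_empty()` could then run under cargo test alongside the other unit tests.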

Tailoring Note

The verification method assignment (VMA) table above covers the 11 highest-risk requirements that are directly verifiable through the current test infrastructure. The remaining 23 system requirements (covering HTTP transport, SysML v2 API integration, container deployment, and security) are deferred to post-Phase 1 verification as their corresponding features are implemented. Per INCOSE Handbook 4.3.4, this tailoring is appropriate for a software-intensive academic project where verification activities are prioritized by implementation phase.

11.3 Acceptance Criteria

Requirement Category	Verification Method	Acceptance Criteria
MCP Protocol Compliance	Integration test	Server initializes, lists tools/resources, executes tools
Repository Integration	Integration test	Reads files from GitLab (reference) and from a self-hosted Git provider
SysML v2 Validation	System test	Accepts valid SysML syntax and reports errors for invalid syntax
Container Deployment	CI pipeline	Image builds, runs, responds to MCP requests
Documentation	Inspection	Quarto renders, deploys to GitLab Pages

11.4 Enabling Systems

Per [1, Sec. 2.3.5.9], enabling systems support verification activities.

Enabling System	Purpose	Responsibility
tree-sitter CLI	Grammar testing (`tree-sitter test`)	Local + CI
Cargo Test Framework	Rust unit and integration testing	Built into Rust toolchain
GitLab CI/CD	Automated pipeline execution	GitLab SaaS runners
GitHub Actions	tree-sitter grammar CI	GitHub runners
Buildah/Podman	Container image builds	CI environment only
Claude Desktop	Manual acceptance testing	Local development
MCP Inspector	Protocol debugging	Local development
Quarto	Documentation builds	Local + CI

11.4.1 Test Environment Configuration

Environment	Transport	External Services	Use Case
Local Dev	stdio	Mocked/optional	Unit tests, rapid iteration
CI Test	stdio	Mocked	Automated test suite
CI Integration	HTTP	GitLab API (PAT)	Integration tests
CI Container	HTTP	Service containers	End-to-end container tests

11.5 Test Cases

11.5.1 MCP Protocol Tests

ID	Test Case	Expected Result	Method
TC-MCP-001	Send initialize request	Server responds with capabilities	T
TC-MCP-002	Request tools/list	Returns list including sysml_parse	T
TC-MCP-003	Call sysml_parse with valid SysML	Returns parsed elements	T
TC-MCP-004	Request resources/list	Returns example resources	T
TC-MCP-005	Read sysml://examples/hello	Returns vehicle model content	T
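
The request and response handling behind these test cases might look roughly like the sketch below. It assumes the stdio transport carries one compact JSON-RPC 2.0 message per line; the helper names `request_line` and `is_success_for` are hypothetical, and the substring checks stand in for a real JSON parser purely to keep the example dependency-free.

```rust
/// Builds a JSON-RPC 2.0 request as a single line of text.
/// `params` must already be valid JSON (for example `{}`); the caller
/// appends the trailing newline when writing to the server's stdin.
pub fn request_line(id: u64, method: &str, params: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params}}}"#)
}

/// Naive success check for a response line: the id matches, a `result`
/// member is present, and no `error` member is present.
/// Assumes compact serialization (no whitespace around `:`).
pub fn is_success_for(response: &str, id: u64) -> bool {
    let needle = format!(r#""id":{id}"#);
    // Require a delimiter after the id so that id 1 does not match id 12.
    let id_matches = response.match_indices(needle.as_str()).any(|(pos, found)| {
        matches!(
            response[pos + found.len()..].chars().next(),
            Some(',') | Some('}')
        )
    });
    id_matches && response.contains(r#""result""#) && !response.contains(r#""error":"#)
}
```

An integration test would write `request_line(2, "tools/list", "{}")` to the server process, read one line back, and assert `is_success_for(&line, 2)` before inspecting the tool names.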

11.5.2 Repository Integration Tests

ID	Test Case	Expected Result	Method
TC-REPO-001	Read file from public repo	Returns file content	T
TC-REPO-002	Read file with PAT auth	Returns file content	T
TC-REPO-003	List .sysml files in directory	Returns file list	T
TC-REPO-004	Read from self-hosted Git provider	Returns file content	T
TC-REPO-005	Handle non-existent file	Returns appropriate error	T
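
Because the local and CI test environments mock external services, repository access is easiest to test behind a trait. The sketch below shows one possible shape, covering the happy path (TC-REPO-001), file listing (TC-REPO-003), and the missing-file case (TC-REPO-005). `FileSource`, `InMemorySource`, and `ReadError` are hypothetical names chosen for illustration, not the project's actual API.

```rust
use std::collections::HashMap;
use std::fmt;

/// Error returned when a repository read fails (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    NotFound(String),
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::NotFound(path) => write!(f, "file not found: {path}"),
        }
    }
}

impl std::error::Error for ReadError {}

/// Abstraction over a Git provider so tests can swap in a mock (hypothetical trait).
pub trait FileSource {
    fn read_file(&self, path: &str) -> Result<String, ReadError>;
    fn list_sysml_files(&self) -> Vec<String>;
}

/// In-memory stand-in for a repository, for environments where services are mocked.
pub struct InMemorySource {
    files: HashMap<String, String>,
}

impl InMemorySource {
    pub fn new(files: &[(&str, &str)]) -> Self {
        let files = files
            .iter()
            .map(|(path, body)| (path.to_string(), body.to_string()))
            .collect();
        Self { files }
    }
}

impl FileSource for InMemorySource {
    fn read_file(&self, path: &str) -> Result<String, ReadError> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| ReadError::NotFound(path.to_string()))
    }

    fn list_sysml_files(&self) -> Vec<String> {
        let mut paths: Vec<String> = self
            .files
            .keys()
            .filter(|path| path.ends_with(".sysml"))
            .cloned()
            .collect();
        paths.sort(); // HashMap iteration order is unspecified, so sort for stable output
        paths
    }
}
```

The same assertions can later run against a real provider implementation of the trait in the CI integration environment.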

11.5.3 SysML Parsing Tests

11.5.3.1 tree-sitter Corpus Tests

ID	Test Case	Expected Result	Method
TC-SYS-001	Parse package declaration	Correct CST structure	T (corpus)
TC-SYS-002	Parse part definition	Correct CST structure	T (corpus)
TC-SYS-003	Parse requirement definition	Correct CST structure	T (corpus)
TC-SYS-004	Parse nested elements	Correct CST hierarchy	T (corpus)
TC-SYS-005	Parse with syntax errors	ERROR node in CST, partial parse	T (corpus)
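
For the error-recovery case (TC-SYS-005), a test needs to tell a clean tree from one that contains recovery nodes. A minimal sketch, assuming the tree is available as the S-expression text that tree-sitter prints, where recovery nodes appear as `(ERROR ...)` and `(MISSING ...)`. The function names are hypothetical, and the node names used in the usage note (`source_file`, `package_declaration`) are placeholders rather than the grammar's actual rule names.

```rust
/// Counts `ERROR` and `MISSING` nodes in a tree-sitter S-expression dump.
pub fn count_error_nodes(sexp: &str) -> usize {
    sexp.matches("(ERROR").count() + sexp.matches("(MISSING").count()
}

/// A parse is clean when the tree contains no recovery nodes at all.
pub fn is_clean_parse(sexp: &str) -> bool {
    count_error_nodes(sexp) == 0
}

/// A parse is partial when recovery nodes exist but the parser still
/// produced at least one ordinary node besides them.
pub fn is_partial_parse(sexp: &str) -> bool {
    let total_nodes = sexp.matches('(').count();
    let errors = count_error_nodes(sexp);
    errors > 0 && total_nodes > errors
}
```

For example, `(source_file (package_declaration name: (identifier)) (ERROR (identifier)))` counts one recovery node and is reported as a partial parse, which is the outcome TC-SYS-005 expects.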

11.5.3.2 Training File Coverage Tests

ID	Test Case	Expected Result	Method
TC-COV-001	Parse Module 01 files	Clean parse (no ERROR nodes)	T (CI)
TC-COV-002	Parse Module 02 files	Clean parse (no ERROR nodes)	T (CI)
TC-COV-003	Calculate overall parse rate	Parse rate meets the phase target (≥10% Phase 1, ≥50% Phase 2); 100% achieved	A (CI)
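
The parse-rate calculation in TC-COV-003 is simple arithmetic: files parsed without ERROR nodes divided by total files, compared against the phase target. A minimal sketch follows; the function names are hypothetical.

```rust
/// Percentage of files that parsed cleanly.
/// Returns `None` for an empty corpus or inconsistent counts.
pub fn parse_rate(clean: usize, total: usize) -> Option<f64> {
    if total == 0 || clean > total {
        return None;
    }
    Some(clean as f64 * 100.0 / total as f64)
}

/// Checks a measured rate against a phase target, both in percent.
pub fn meets_target(rate: f64, target_percent: f64) -> bool {
    rate >= target_percent
}
```

With the figures reported under Known Limitations, `parse_rate(100, 100)` gives 100% for the training files and `parse_rate(274, 275)` rounds to 99.6% for the external files; both exceed the 10% and 50% phase targets.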

11.6 Known Limitations

Container testing: Cannot be performed locally on macOS; relies on CI (risk R5, accepted)
HTTP transport: Requires CI service containers or Linux machine
SysML v2 API: Requires running API server; deferred to post-capstone. Basic parsing and repository operations work without API dependency
Grammar coverage: The tree-sitter-sysml grammar achieves 99.6% coverage across 275 external files (274/275) and 100% coverage of OMG training files (100/100). The single unparseable file uses non-standard UML syntax outside the SysML v2 specification. Full semantic compliance (type checking, import resolution) is out of scope for the tree-sitter grammar and deferred to future work (sysml.rs)

11.7 Continuous Integration/Continuous Delivery (CI/CD) Verification Pipeline

Per [1, Sec. 2.3.5.9], automated verification integrates into CI/CD.

11.7.1 Pipeline Stages

stages:
  - lint
  - test
  - build
  - integration
  - publish

11.7.2 tree-sitter-sysml Test Stage

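test:
  stage: test
  image: node:20-alpine
  script:
    - npm ci
    - npx tree-sitter test
    - npx tree-sitter parse --quiet training/**/*.sysml 2>&1 | tee parse-results.txt
    # Assumption: in --quiet mode each failing file is reported on one line containing ERROR or MISSING.
    # The metric name below is illustrative; the file uses the one-metric-per-line text format.
    - total=$(ls training/**/*.sysml | wc -l)
    - failed=$(grep -cE 'ERROR|MISSING' parse-results.txt || true)
    - echo "sysml_parse_rate_percent $(( (total - failed) * 100 / total ))" > coverage-metrics.txt
  artifacts:
    reports:
      metrics: coverage-metrics.txt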

11.7.3 open-mcp-sysml Test Stage

test:
  stage: test
  image: rust:1.85
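  script:
    - rustup component add clippy  # the official rust image installs the minimal profile, which omits clippy
    - cargo test --workspace
    - cargo clippy --workspace -- -D warnings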
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

11.7.4 Integration Test Stage

integration:
  stage: integration
  image: rust:1.85
  variables:
    GITLAB_TOKEN: $CI_JOB_TOKEN
  script:
    - cargo test --workspace --features integration
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

11.7.5 Container Test Stage

container-test:
  stage: integration  # must run after build, because the service image for this commit has to exist first
  image: quay.io/buildah/stable
  services:
    - name: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      alias: mcp-server
  script:
    - echo '{"jsonrpc":"2.0","id":1,"method":"initialize"...}' | nc mcp-server 8080
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

11.8 Validation Approach

Per [1, Sec. 2.3.5.11], validation confirms the system meets stakeholder needs.

11.8.1 Validation Activities

Activity	Stakeholder Need	Method	Acceptance
End-to-end demo	AI tool integration	Demonstration	Claude reads SysML from GitLab
User acceptance	Developer experience	Interview	Positive feedback from pilot users
Paper submission	Academic validation	Peer review	GVSETS acceptance
Capstone review	Educational objectives	Review	Advisor approval

11.8.2 Validation Schedule

Milestone	Date	Validation Activity	Status
SRR	Feb 14, 2026	Requirements validated with stakeholders	Complete
PDR	Feb 14, 2026	Architecture validated against requirements and confirmed by the working implementation	Complete
CDR	Mar 29, 2026	Implementation validated, acceptance tests pass	Pending
Final	Apr 25, 2026	Stakeholder acceptance, capstone submission	Pending

11.9 Review Verification

Review	Verification Activities
SRR	Requirements complete, traceable to stakeholders
PDR	Architecture addresses all requirements
CDR	All tests pass, acceptance criteria met