Enabling AI-Augmented MBSE with the Model Context Protocol
Systems Engineering Capstone Project
0.1 Executive Summary
This document outlines the systems engineering plan for developing an open source SysML v2 Model Context Protocol (MCP) server. The project serves dual purposes:
- Open Source Contribution: Provide standalone tooling for AI-augmented Model-Based Systems Engineering (MBSE) workflows, filling a gap between commercial platforms and DIY scripting
- Academic Capstone: Demonstrate INCOSE systems engineering principles [1] for a Wayne State University master's engineering capstone project
Commercial tools like SysGit provide mature, licensed Git-based MBSE platforms with comprehensive SysML v2 support. This project offers a lightweight open source alternative focused specifically on AI/LLM integration via the Model Context Protocol.
0.1.1 Key Deliverables
Software:
- tree-sitter-sysml: Standalone SysML v2 grammar for tree-sitter (100% training file coverage)
- kebnf-to-tree-sitter: Automated converter from OMG KEBNF specifications to tree-sitter grammars
- open-mcp-sysml: Rust MCP server with Git integration (GitLab as reference) and SysML v2 support
Publications:
- NDIA GVSETS 2026: AI-Augmented MBSE via MCP (draft Mar 23, final Jun 5, presentation Aug 11)
- INCOSE/SysEng Journal: Grammar Transposition Methodology (kebnf-to-tree-sitter)
- INCOSE 2027: SE Benchmark for LLMs (future work)
Academic:
- Capstone SE documentation (SEP, SyRS, ADD, VVP, RTM)
0.1.2 Timeline
- Initial Research: Early January 2026 (SysML v2 specifications and prior art)
- Concept Phase Start: January 12, 2026 (Week 1)
- Capstone Delivery: April 25, 2026 (Week 15)
- Duration: 15 weeks
0.1.3 Project Status (Feb 14, 2026)
Repositories:
| Repository | Status | Next Step |
|---|---|---|
| tree-sitter-sysml | 99.6% coverage (274/275 files), 125/125 tests | Pre-release cleanup (queries, CHANGELOG) |
| kebnf-to-tree-sitter | 640 rules parsed, 335+ conflicts | Resolve DD-001 architecture decision |
| open-mcp-sysml | Phase 1 complete (5 MCP tools, 22 tests) | Execute benchmark vignettes, Phase 2 token strategies |
| sysml-grammar-benchmark | Scaffolded (0% functional) | Add corpus submodules, implement adapter |
| gvsets | ~7 pages, quantitative claims unvalidated | Execute V1/V4/V5, replace TODO placeholders |
| capstone | SRR + PDR complete (Feb 14) | VVP prose, conclusions rewrite |
Publications:
| Paper | Status | Due |
|---|---|---|
| GVSETS 2026 | Drafted, evaluation section placeholder | Draft Mar 23, Final Jun 5 |
| Grammar Transposition | Conflict resolution in progress | Q3-Q4 2026 |
| INCOSE 2027 Benchmark | 8 vignettes defined (Section C.1) | Q3 2027 |
tree-sitter-sysml (Brute-Force Grammar): PRODUCTION READY
- 125/125 corpus tests passing
- 99.6% coverage across 275 external files (OMG, GfSE, Advent)
- Context-sensitive definition bodies implemented
- 6 language bindings (C, Rust, Go, Python, Node.js, Swift)
- Pre-release cleanup: ~18-25 hours to 1.0.0
kebnf-to-tree-sitter (Spec-Driven Grammar):
- KEBNF parser complete (640/640 rules)
- Grammar generation produces 335+ conflicts (vs 54 in brute-force)
- 4 critical path decisions open (DD-001, DD-008, DD-009, DD-020)
open-mcp-sysml (MCP Server): PHASE 1 COMPLETE
- 3 crates: sysml-parser, repo-client, mcp-server
- 5 MCP tools: sysml_parse, sysml_validate, sysml_list_definitions, repo_list_files, repo_get_file
- L0/L1/L2 detail levels implemented (80-97% token reduction)
- 22 tests (unit, integration, MCP protocol compliance)
- Phase 2 PRD ready: 7 token reduction strategies (including overflow detection)
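The L0/L1/L2 detail levels above can be thought of as progressively richer renderings of the same parsed element. A minimal sketch of the idea follows; the element representation and field names here are illustrative assumptions, not open-mcp-sysml's actual data model:

```python
# Hypothetical parsed element; open-mcp-sysml's real model differs.
element = {
    "kind": "part def",
    "name": "Vehicle",
    "members": ["engine", "chassis", "wheels"],
    "doc": "Top-level vehicle part definition.",
}

def render(element, level):
    """Render an element at detail level L0 (name only), L1 (+members), L2 (full)."""
    if level == 0:
        return f"{element['kind']} {element['name']}"
    if level == 1:
        return f"{element['kind']} {element['name']} {{ {', '.join(element['members'])} }}"
    return "\n".join([
        f"{element['kind']} {element['name']} {{",
        f"    doc /* {element['doc']} */",
        *[f"    part {m};" for m in element["members"]],
        "}",
    ])

for lvl in (0, 1, 2):
    print(f"L{lvl}: {len(render(element, lvl))} chars")
```

Each level trades fidelity for tokens; the reported 80-97% reduction comes from serving L0/L1 responses wherever full bodies are not needed.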
sysml-grammar-benchmark:
- Repository scaffolded with PRD, CI, Quarto dashboard
- Python runner script complete, no adapters or corpora yet
- Placeholder data in dashboard
GVSETS Paper:
3-condition experiment designed with benchmark vignettes V1, V4, V5 (Section C.1):
- Baseline: All files concatenated (naive)
- Vanilla MCP: Simple tool calls
- Optimized MCP: Cache ID + Summary pattern
Next Priority: Execute benchmark vignettes V1/V4/V5 to validate GVSETS quantitative claims
0.2 Problem Statement
The Model Context Protocol [2] ecosystem has 75,000+ GitHub stars and 10+ official SDKs, while SysML v2 [3] achieved OMG adoption in July 2025. Yet their intersection remains largely unexplored. Defense and aerospace organizations need:
- Standardized AI-tool integration for MBSE workflows
- Lightweight programmatic access to SysML v2 models
- CI/CD integration for model validation
- Open source alternatives to proprietary vendor lock-in
0.3 MCP for SysML Context
The Model Context Protocol [2] standardizes how AI applications access external data and tools. An MCP server bridges AI assistants and domain-specific systems; in our case, SysML v2 models stored in Git repositories.
WITHOUT MCP SERVER:
┌────────────────┐                      ┌────────────────────┐
│    Engineer    │ ── copy/paste ─────▶ │    AI Assistant    │
│                │ ◀── copy/paste ───── │   (Claude, etc.)   │
└────────────────┘                      └────────────────────┘
        │                                         │
        ▼                                         ▼
┌────────────────┐                      ┌────────────────────┐
│    Git Repo    │   (no connection)    │   Generic SysML    │
│     .sysml     │                      │   knowledge only   │
└────────────────┘                      └────────────────────┘
Problems: AI sees snippets, not full project. Cannot validate.
Cannot commit. Context lost between conversations.
WITH MCP SERVER:
┌────────────────┐      MCP      ┌──────────────────┐
│    Engineer    │── Protocol ──▶│   AI Assistant   │
└────────────────┘               │  (Claude, etc.)  │
                                 └────────┬─────────┘
                                          │
                                          │ MCP
                                          ▼
                                 ┌──────────────────┐
                                 │   SysML v2 MCP   │
                                 │      Server      │
                                 └────────┬─────────┘
                                          │
             ┌────────────────────────────┼────────────────────────────┐
             │                            │                            │
             ▼                            ▼                            ▼
     ┌──────────────┐             ┌───────────────┐            ┌───────────────┐
     │   Git Repo   │             │   SysML v2    │            │     Local     │
     │    .sysml    │             │  API Server   │            │    Parser     │
     └──────────────┘             └───────────────┘            └───────────────┘
Benefits: AI reads full project. Validates models. Commits changes.
Structured understanding. Persists across conversations.
| Without MCP | With MCP Server |
|---|---|
| AI sees pasted snippets | AI reads entire project |
| No model validation | Validates against SysML v2 spec |
| Manual copy/paste workflow | Direct Git repository integration |
| Generic SysML knowledge | Structured element queries |
| Context lost between sessions | Project state persists |
This transforms the AI from a "SysML syntax helper" into an "MBSE collaborator" that understands actual project state and can take actions within it. For detailed MCP architecture and server design, see Section 4.1.
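Concretely, MCP clients invoke server tools via JSON-RPC 2.0 `tools/call` messages. A minimal sketch of a request to this project's `sysml_list_definitions` tool is shown below; the `path` and `ref` argument names are hypothetical placeholders, not the server's actual tool schema:

```python
import json

# JSON-RPC 2.0 envelope used by MCP for tool invocation.
# Tool name is from this project's server; the argument names
# ("path", "ref") are illustrative assumptions only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "sysml_list_definitions",
        "arguments": {
            "path": "models/vehicle.sysml",  # file inside the Git repo
            "ref": "main",                   # branch or commit to read from
        },
    },
}

print(json.dumps(request, indent=2))
```

The same envelope carries every tool in the table above, which is what lets any MCP-capable assistant drive the server without bespoke integration code.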
0.4 Project Objectives
- Develop an open source MCP server for SysML v2
- Integrate with Git providers for model persistence (GitLab as reference implementation)
- Connect to SysML v2 API Services for validation
- Demonstrate AI-augmented MBSE workflows using GitLab Duo
- Publish findings at NDIA GVSETS
0.5 Central Thesis: The Harness Matters
This MCP server is one component of a larger harness for leveraging LLMs in MBSE workflows. The thesis of this project is that harness design (how context is selected, structured, and presented to LLMs) may matter more than raw model capability.
This thesis draws inspiration from emerging practitioner frameworks, particularly Dex Horthy's 12-Factor Agents [4] and the concept of Context Engineering: the discipline of optimizing what information reaches an LLM and how it is structured. As Horthy notes: "Everything is context engineering. LLMs are stateless functions that turn inputs into outputs. To get the best outputs, you need to give them the best inputs."
OpenAI's "Harness Engineering" report [5] provides compelling industry validation: a team built a million-line production product with zero manually written code by investing in environment design rather than direct coding. Their central finding, "the primary job of our engineering team became enabling the agents to do useful work," directly parallels this project's thesis. Their experience with progressive disclosure ("give Codex a map, not a 1,000-page instruction manual"), repository-local knowledge stores, and mechanical enforcement of architectural invariants confirms that harness quality determines agent effectiveness, independent of model capability.
0.5.1 The Context Window Problem
Large language models exhibit measurable performance degradation when operating in the back half of their context windows. Even frontier models with 200K+ token limits show reasoning quality drops as context length increases. SysML v2 models exacerbate this:
- Enterprise systems contain thousands of elements across hundreds of files
- Naive "load everything" approaches exhaust token budgets before work begins
- Relevant elements become obscured within structural boilerplate
- Model relationships span files in ways that defeat simple truncation
A 40,000-token "project awareness" overhead (as observed in this document's own development) leaves limited budget for actual reasoning about complex models.
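To make the budget pressure concrete, here is the arithmetic under the numbers from this section plus two stated assumptions (an average .sysml file size of 2,000 tokens and a 60,000-token reserve for reasoning and output):

```python
window = 200_000            # frontier-model context limit (tokens), per this section
overhead = 40_000           # "project awareness" cost observed in this document's development
tokens_per_file = 2_000     # assumed average size of a .sysml file
reasoning_reserve = 60_000  # assumed tokens held back for actual reasoning/output

budget_for_model = window - overhead - reasoning_reserve
files_that_fit = budget_for_model // tokens_per_file
print(f"{budget_for_model} tokens for model content -> {files_that_fit} files")
# An enterprise model spanning hundreds of files cannot fit;
# selective retrieval is required, not wholesale loading.
```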
0.5.2 Intelligent Context Management
The value proposition extends beyond "MCP server provides model access" to "MCP server enables selective context presentation":
| Anti-Pattern | Harness-Aware Approach |
|---|---|
| Return entire .sysml files | Return specific elements by query |
| Dump full element hierarchies | Return element + immediate relationships |
| Include all metadata | Filter to semantically relevant properties |
| Load model into context upfront | Lazy-load via iterative tool calls |
| Single monolithic prompt | Decompose across agents/iterations |
The parser/grammar provides the foundation: structured access to model elements. The MCP server provides the interface: tools that can be designed for minimal, targeted context injection. The harness design determines whether this pipeline produces meaningful LLM contributions or context-stuffed hallucinations.
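A minimal sketch of the "return specific elements by query" pattern, using a toy regex over SysML v2 textual syntax. A real implementation would walk the tree-sitter parse tree; this matcher is an illustration only and assumes flat, non-nested definition bodies:

```python
import re

SOURCE = """
package Drivetrain {
    part def Engine { attribute power; }
    part def Transmission { attribute ratio; }
}
"""

def find_part_def(source, name):
    """Return just the named part definition instead of the whole file.

    Toy matcher: assumes the definition body contains no nested braces."""
    pattern = rf"part def {re.escape(name)}\s*\{{[^{{}}]*\}}"
    match = re.search(pattern, source)
    return match.group(0) if match else None

snippet = find_part_def(SOURCE, "Engine")
print(snippet)  # far fewer tokens than returning SOURCE wholesale
```

The point is the interface shape, not the matcher: a query tool that answers "give me Engine" injects one definition into context rather than the file, the package, or the model.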
0.5.3 Research Questions
This thesis motivates several questions addressed in the literature review (Section 3.1):
- What context management strategies do existing AI+MBSE systems employ?
- How much context is sufficient for meaningful LLM reasoning over models?
- What decomposition patterns (multi-agent, iterative refinement) reduce per-call context?
- How do we measure "meaningful performance" for LLM-MBSE interactions?
The architecture (Section 10.1) is designed to enable experimentation with these questions through configurable tool granularity and optional SysML v2 API integration for server-side query resolution.
0.5.4 Research-Plan-Implement Cycles
An emerging pattern in effective LLM agent design is the iterative research-before-planning approach: rather than planning upfront and executing linearly, high-quality agent workflows interleave targeted research slices with incremental planning. Each cycle:
- Research: Gather just enough context relevant to the immediate decision
- Plan: Make a focused plan for the next concrete step
- Implement: Execute the step, capturing results
- Repeat: Use implementation results to inform the next research slice
This pattern appears across multiple sources. Anthropic's "Building Effective Agents" [6] describes the evaluator-optimizer workflow, where "one LLM call generates a response while another provides evaluation and feedback in a loop." The 12-Factor Agents framework emphasizes small, focused agents that "own their context window" rather than attempting monolithic operations.
For SysML v2 models, this suggests MCP tools should support incremental exploration: query a subsystem, analyze its interfaces, decide what adjacent context is needed, fetch that context, then proceed, rather than loading an entire model upfront. The grammar and parser provide the foundation for these surgical context extractions.
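The research-plan-implement cycle can be sketched as a loop that never pulls more than a bounded context slice per step. The stub functions and the per-call budget figure below are illustrative assumptions, not part of this project's implementation:

```python
def research(question, budget):
    """Fetch just enough model context for the immediate decision (stub)."""
    return f"context for {question!r} (<= {budget} tokens)"

def plan(context):
    """Produce a focused plan for the next concrete step (stub)."""
    return f"next step given {context!r}"

def implement(step):
    """Execute the step and capture its result (stub)."""
    return f"result of {step!r}"

PER_CALL_BUDGET = 4_000  # assumed per-slice token cap
questions = ["which subsystem?", "which interfaces?", "what adjacent parts?"]

results = []
for q in questions:                  # each cycle: research -> plan -> implement
    ctx = research(q, PER_CALL_BUDGET)
    step = plan(ctx)
    results.append(implement(step))  # result informs the next research slice
print(len(results), "cycles completed")
```

Each iteration keeps the context window small and lets implementation results steer what gets researched next, rather than front-loading the whole model.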
0.6 Scope
0.6.1 In Scope
Grammar Development (Dual-Path):
- tree-sitter-sysml: Brute-force grammar with 100% training file coverage
- kebnf-to-tree-sitter: Spec-driven grammar converter for formal traceability
MCP Server:
- open-mcp-sysml (Rust): Consumes tree-sitter-sysml via bindings
- Git provider file read/write operations (GitLab as reference)
- SysML v2 API client integration
- stdio and HTTP transport mechanisms
- Container deployment
Publications:
- GVSETS 2026 paper on MCP architecture
- Grammar transposition methodology paper
- SE documentation (SEP, SyRS, ADD, VVP)
0.6.2 Out of Scope (Future Work)
- sysml.rs: Full SysML v2 semantic analysis in Rust, a research instrument for PhD work enabling import resolution, type checking, and constraint evaluation beyond tree-sitter's syntax-only capabilities
- AI benchmarking framework (INCOSE 2027 paper topic)
- Multi-agent architectures
- Commercial integrations
0.7 Document Structure
This book contains the complete systems engineering documentation:
- Chapter 1: Foundation (SysML v2 background)
- Chapter 3: Literature Review (AI + MBSE research, prior art)
- Chapter 4: Model Context Protocol
- Chapter 5: Tooling Ecosystem
- Chapter 6: Systems Engineering Plan (SEP)
- Chapter 7: Work Breakdown Structure (WBS)
- Chapter 8: Stakeholder Analysis
- Chapter 9: System Requirements Specification (SyRS)
- Chapter 10: Architecture Design Description (ADD)
- Chapter 11: Verification & Validation Plan (VVP)
- Chapter 12: Implementation
- Chapter 13: Conclusions
Appendices include glossary, references, traceability matrix, publication strategy, and benchmark vignettes.