12 Implementation
12.1 Overview
In February 2026, implementation pivoted to an architecture built on a tree-sitter grammar and a Rust MCP server. This chapter documents progress by deliverable.
| Deliverable | Status | Description |
|---|---|---|
| tree-sitter-sysml | ✅ Complete | Brute-force grammar, 99.6% external file coverage, 100% query coverage |
| kebnf-to-tree-sitter | ◐ In Progress | Spec-driven converter, 640 rules parsed, 335+ LR conflicts remaining |
| open-mcp-sysml | ✅ Phase 1 Complete | 5 MCP tools, 22 tests, tree-sitter integration working |
12.2 tree-sitter-sysml
A production-ready SysML v2 grammar for the tree-sitter ecosystem.
12.2.1 Status
| Metric | Value |
|---|---|
| Grammar size | ~2,236 lines |
| Corpus tests | 125/125 passing |
| Training file coverage | 100% (100/100 OMG examples) |
| External file coverage | 99.6% (274/275 files) |
| Query coverage | 100% (190/190 highlights, tags, folds) |
| Development time | ~25 hours |
12.2.2 Architecture
```text
tree-sitter-sysml/
├── grammar.js          # Grammar definition
├── src/parser.c        # Generated parser
├── bindings/{c,go,node,python,rust,swift}/
├── corpus/             # 125 test cases
└── .gitlab-ci.yml      # CI with coverage tracking
```
12.2.3 Methodology: Brute Force
The grammar was built through empirical iteration against training files:
- Run coverage harness against 100 OMG training files
- Identify failing files, inspect ERROR nodes
- Implement missing construct in grammar.js
- Generate parser, verify no regressions
- Repeat until 100% coverage
This approach achieved rapid results (~10% → 100% coverage in ~20 hours of focused grammar development; ~25 hours total including corpus and test infrastructure) but has limitations:
- Unknown specification alignment: the grammar was reverse-engineered from examples
- Likely over-acceptance: it may accept invalid syntax
- No semantic preservation: the AST structure is not aligned with the metamodel
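The coverage metric that drove each iteration can be sketched as follows. This is a minimal illustration, not the project's harness. `FileResult`, `coverage`, and `failing_files` are hypothetical names. The `has_error` flag stands in for checking whether a file's parse tree contains ERROR or MISSING nodes, which a real harness could read from the root node's `has_error()` in the tree-sitter Rust bindings.

```rust
/// Outcome of parsing one corpus file (hypothetical type for illustration).
#[derive(Debug, Clone)]
pub struct FileResult {
    pub path: String,
    /// True if the parse tree contains ERROR or MISSING nodes;
    /// a real harness could take this from the root node's `has_error()`.
    pub has_error: bool,
}

/// Percentage of files that parse without errors (0.0 for an empty corpus).
pub fn coverage(results: &[FileResult]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    let clean = results.iter().filter(|r| !r.has_error).count();
    100.0 * clean as f64 / results.len() as f64
}

/// Paths of failing files, i.e. the next candidates for ERROR-node inspection.
pub fn failing_files(results: &[FileResult]) -> Vec<&str> {
    results
        .iter()
        .filter(|r| r.has_error)
        .map(|r| r.path.as_str())
        .collect()
}
```

In this sketch, `failing_files` feeds the "inspect ERROR nodes" step, and the loop ends when it returns an empty list.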
12.2.4 Key Constructs Implemented
| Category | Constructs |
|---|---|
| Definitions | part, item, port, action, state, requirement, constraint, use case, view, metadata, allocation |
| Usages | All corresponding usage forms plus event, exhibit, message, satisfy, verify |
| Behavioral | Actions, states, transitions, fork/join/decision, if/while/loop |
| Expressions | Operators, invocations, feature chains, select (.?{}), collect (.{}), units |
12.2.5 Technical Debt Remediation
Session 5 addressed accumulated technical debt:
- Split the generic relationship_part rule into explicit specialization, subsetting, and redefinition rules
- Created context-sensitive body types for definitions (part_body vs action_body)
- Added a negative test framework (7/8 invalid patterns correctly rejected)
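The negative test framework inverts the coverage check: every invalid snippet should fail to parse, and any that parse cleanly expose over-acceptance. The sketch below shows one way a figure like 7/8 could be tallied. It assumes a hypothetical `NegativeCase` type whose `rejected` flag records whether the parser reported an error. The names are illustrative and are not taken from the project.

```rust
/// One invalid input that the grammar is expected to reject (hypothetical type).
pub struct NegativeCase {
    pub name: &'static str,
    /// True if parsing produced an error, i.e. the grammar correctly rejected it.
    pub rejected: bool,
}

/// Returns (correctly rejected, total, names of over-accepted cases).
pub fn tally(cases: &[NegativeCase]) -> (usize, usize, Vec<&'static str>) {
    // Cases that parsed without error are over-acceptance bugs in the grammar.
    let accepted: Vec<&'static str> = cases
        .iter()
        .filter(|c| !c.rejected)
        .map(|c| c.name)
        .collect();
    (cases.len() - accepted.len(), cases.len(), accepted)
}
```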
12.3 kebnf-to-tree-sitter
Automated converter from OMG KEBNF specifications to tree-sitter grammars.
12.3.1 Purpose
| Goal | Description |
|---|---|
| Reproducibility | Re-generate grammar when SysML v2 spec updates |
| Traceability | Map tree-sitter rules to KEBNF source lines |
| Research | Quantify automation rates for INCOSE paper |
12.3.2 Status
| Metric | Value |
|---|---|
| KEBNF rules parsed | 640/640 (100%) |
| Direct conversion | 38% (247 rules) |
| Strip-and-convert | 55% (353 rules) |
| Best-effort | 6% (37 rules) |
| Manual review | <1% (3 rules) |
| LR conflicts remaining | 335+ |
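The percentages in the table are derived from the rule counts over the 640 parsed rules. The snippet below recomputes the exact shares from those counts, which is a cheap consistency check to rerun whenever the converter's output changes. The constant and function names are illustrative.

```rust
/// Rule counts per conversion category, as reported in the status table.
pub const CATEGORY_COUNTS: [(&str, u32); 4] = [
    ("direct", 247),
    ("strip-and-convert", 353),
    ("best-effort", 37),
    ("manual review", 3),
];

/// Share of `count` in `total`, as a percentage.
pub fn percent(count: u32, total: u32) -> f64 {
    100.0 * f64::from(count) / f64::from(total)
}

/// Count for a named category (0 if the name is unknown).
pub fn count_of(name: &str) -> u32 {
    CATEGORY_COUNTS
        .iter()
        .find(|(n, _)| *n == name)
        .map_or(0, |&(_, c)| c)
}

/// Total classified rules; should equal the 640 parsed rules.
pub fn total_rules() -> u32 {
    CATEGORY_COUNTS.iter().map(|&(_, c)| c).sum()
}

/// Rules converted without manual work (direct + strip-and-convert).
pub fn automated_rules() -> u32 {
    count_of("direct") + count_of("strip-and-convert")
}
```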
12.3.3 Methodology: Spec-Driven
The tool parses official KEBNF files and generates tree-sitter grammar:
- Parse: Lex and parse KEBNF into an AST (using Chumsky 1.0)
- Classify: Categorize each construct (direct, strip, approximate, manual)
- Transform: Apply appropriate conversion strategy
- Emit: Generate grammar.js with mapping document
KEBNF extends standard EBNF with metamodel annotations (type annotations, property assignments, cross-references). These are stripped for tree-sitter and recorded in mapping.json for downstream tools.
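Strip-and-convert can be sketched as a recursive emitter over a simplified expression tree. This is an illustration under stated assumptions, not the code in emitter.rs or mapping.rs. The `Expr` enum, `emit`, and `emit_rule` are hypothetical, and only property assignments are modelled; type annotations and cross-references would presumably be handled the same way. The output uses tree-sitter's grammar DSL functions `seq`, `choice`, `optional`, and `repeat`. A real mapping.json entry would also need the rule name and KEBNF source line to provide the traceability described earlier.

```rust
/// Simplified KEBNF expression (hypothetical; the real AST is richer).
pub enum Expr {
    Keyword(&'static str), // quoted terminal such as 'part'
    Rule(&'static str),    // reference to another rule
    Seq(Vec<Expr>),
    Choice(Vec<Expr>),
    Optional(Box<Expr>), // ( ... )?
    Repeat(Box<Expr>),   // ( ... )*
    /// Property assignment such as `declaredName = NAME`; only `value` is syntax.
    Assign { property: &'static str, value: Box<Expr> },
}

/// Emit tree-sitter DSL text, stripping assignments and recording them in `stripped`.
pub fn emit(expr: &Expr, stripped: &mut Vec<String>) -> String {
    match expr {
        Expr::Keyword(k) => format!("'{k}'"),
        Expr::Rule(r) => format!("$.{r}"),
        Expr::Seq(items) => format!("seq({})", join(items, stripped)),
        Expr::Choice(items) => format!("choice({})", join(items, stripped)),
        Expr::Optional(e) => format!("optional({})", emit(e, stripped)),
        Expr::Repeat(e) => format!("repeat({})", emit(e, stripped)),
        Expr::Assign { property, value } => {
            // Keep the syntax, move the metamodel annotation into the side table.
            stripped.push(property.to_string());
            emit(value, stripped)
        }
    }
}

fn join(items: &[Expr], stripped: &mut Vec<String>) -> String {
    items
        .iter()
        .map(|e| emit(e, stripped))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Emit one grammar.js rule plus the list of annotations stripped from it.
pub fn emit_rule(name: &str, body: &Expr) -> (String, Vec<String>) {
    let mut stripped = Vec::new();
    let js = format!("{name}: $ => {}", emit(body, &mut stripped));
    (js, stripped)
}
```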
12.3.4 Conversion Categories
| Category | % | Handling |
|---|---|---|
| Direct | 38% | Basic syntax maps directly to tree-sitter |
| Strip & convert | 55% | Remove annotations, keep syntax structure |
| Best-effort | 6% | Approximate semantic actions |
| Manual review | <1% | Complex disambiguation needed |
Overall: ~93% automated (direct + strip-and-convert); ~7% (best-effort + manual review) requires manual intervention.
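The classify step can be pictured as a function from detected rule features to one of the four categories. This is a minimal sketch. `RuleTraits` and its flags are hypothetical, and so is the precedence, in which the most demanding feature decides the category. The actual classifier's criteria may differ. The `is_automated` split follows the chapter's own accounting, which counts only direct and strip-and-convert rules as automated.

```rust
/// Conversion strategy for one KEBNF rule (mirrors the categories table).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Category {
    Direct,
    StripAndConvert,
    BestEffort,
    ManualReview,
}

/// Features a classifier might detect in a rule (hypothetical flags).
#[derive(Default)]
pub struct RuleTraits {
    pub has_annotations: bool,      // type annotations, assignments, cross-references
    pub has_semantic_action: bool,  // behaviour tree-sitter can only approximate
    pub needs_disambiguation: bool, // ambiguity needing hand-written resolution
}

/// The most demanding trait wins; this ordering is an assumption for illustration.
pub fn classify(t: &RuleTraits) -> Category {
    if t.needs_disambiguation {
        Category::ManualReview
    } else if t.has_semantic_action {
        Category::BestEffort
    } else if t.has_annotations {
        Category::StripAndConvert
    } else {
        Category::Direct
    }
}

/// Only Direct and StripAndConvert count as fully automated.
pub fn is_automated(c: Category) -> bool {
    matches!(c, Category::Direct | Category::StripAndConvert)
}
```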
12.3.5 Architecture
```text
kebnf-to-tree-sitter/
├── src/
│   ├── ast.rs          # KEBNF AST representation
│   ├── parser.rs       # KEBNF parser
│   ├── emitter.rs      # Tree-sitter grammar emitter
│   └── mapping.rs      # Semantic mapping generator
└── docs/
    ├── KEBNF-SPEC.md   # KEBNF format documentation
    └── MAPPING.md      # Semantic gap documentation
```
See Section B.3 for the INCOSE paper outlining the methodology.
12.4 open-mcp-sysml
Rust MCP server providing SysML v2 tools to AI assistants.
12.4.1 Status
Phase 1 was completed on February 13, 2026. Tree-sitter integration works, with 5 MCP tools implemented and all 22 tests passing. The Phase 2 PRD, covering token reduction strategies, is ready.
| Metric | Value |
|---|---|
| MCP tools | 5 (sysml_parse, sysml_validate, sysml_list_definitions, repo_list_files, repo_get_file) |
| Tests | 22 (4 unit + 4 integration + 10 protocol + 1 repo + 3 doc) |
| Training coverage | 100% (100/100 files parse without errors) |
| Detail levels | L0 (names), L1 (structure), L2 (full) |
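The detail levels trade completeness for output size, which matters when results are placed in an AI assistant's context. The sketch below rests on assumptions. The `Detail` and `Definition` types are hypothetical and are not the sysml-parser crate's API. The content chosen for each level (the name only; the kind plus member names; the full source text) is also a guess.

```rust
/// Output detail levels (names follow the status table; contents are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Detail {
    L0, // names only
    L1, // structure
    L2, // full
}

/// Hypothetical summary of one definition extracted from the parse tree.
pub struct Definition {
    pub kind: &'static str,
    pub name: &'static str,
    pub members: Vec<&'static str>,
    pub source: &'static str,
}

/// Render a definition at the requested level; output grows from L0 to L2.
pub fn render(def: &Definition, level: Detail) -> String {
    match level {
        Detail::L0 => def.name.to_string(),
        Detail::L1 => format!("{} {}: [{}]", def.kind, def.name, def.members.join(", ")),
        Detail::L2 => def.source.to_string(),
    }
}
```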
12.4.2 Architecture
```text
open-mcp-sysml/
├── Cargo.toml                  # Workspace root
├── crates/
│   ├── sysml-parser/           # Tree-sitter wrapper, L0/L1/L2 detail levels
│   ├── repo-client/            # Provider-agnostic Git interface
│   └── mcp-server/             # MCP server binary (rmcp SDK)
└── tests/fixtures/sysml-v2/    # OMG training files (git submodule)
```
12.4.3 Implemented Tools
| Tool | Description | Backend | Status |
|---|---|---|---|
| sysml_parse | Parse SysML v2 text with L0/L1/L2 detail | tree-sitter-sysml | ✅ Complete |
| sysml_validate | Return parse diagnostics | tree-sitter-sysml | ✅ Complete |
| sysml_list_definitions | List all definitions in model | tree-sitter-sysml | ✅ Complete |
| repo_list_files | List .sysml files in repository | repo-client (GitLab) | ✅ Complete |
| repo_get_file | Read file from repository | repo-client (GitLab) | ✅ Complete |
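In MCP, each `tools/call` request names one of these tools, and the server routes it to a backend. The sketch below shows that name-to-backend routing with a hypothetical `Tool` enum. It is not the server's code: the real server registers its tools through the rmcp SDK, whose API is not reproduced here.

```rust
/// The five Phase 1 MCP tools (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tool {
    SysmlParse,
    SysmlValidate,
    SysmlListDefinitions,
    RepoListFiles,
    RepoGetFile,
}

impl Tool {
    pub const ALL: [Tool; 5] = [
        Tool::SysmlParse,
        Tool::SysmlValidate,
        Tool::SysmlListDefinitions,
        Tool::RepoListFiles,
        Tool::RepoGetFile,
    ];

    /// Map the tool name carried by a `tools/call` request to a tool.
    pub fn from_name(name: &str) -> Option<Tool> {
        match name {
            "sysml_parse" => Some(Tool::SysmlParse),
            "sysml_validate" => Some(Tool::SysmlValidate),
            "sysml_list_definitions" => Some(Tool::SysmlListDefinitions),
            "repo_list_files" => Some(Tool::RepoListFiles),
            "repo_get_file" => Some(Tool::RepoGetFile),
            _ => None,
        }
    }

    /// Backend that serves the tool, per the table above.
    pub fn backend(self) -> &'static str {
        match self {
            Tool::SysmlParse | Tool::SysmlValidate | Tool::SysmlListDefinitions => {
                "tree-sitter-sysml"
            }
            Tool::RepoListFiles | Tool::RepoGetFile => "repo-client",
        }
    }
}
```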
12.4.4 Repository Client Interface
Provider-agnostic trait with GitLab as reference implementation:
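```rust
/// Provider-agnostic access to a Git hosting service (GitLab is the reference implementation).
pub trait RepoClient: Send + Sync {
    /// Read the raw bytes of the file at `path` in `project`, at Git ref `ref_`.
    async fn read_file(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<u8>, RepoError>;
    /// List the entries under `path` in `project`, at Git ref `ref_`.
    async fn list_files(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<FileEntry>, RepoError>;
}
```
12.5 Dual-Path Grammar Strategy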
Both grammar approaches provide value:
| Aspect | tree-sitter-sysml | kebnf-to-tree-sitter |
|---|---|---|
| Approach | Empirical iteration | Spec conversion |
| Coverage | 99.6% of external files parse | 100% of KEBNF rules parsed |
| Conflicts | Manually tuned | Auto-detected |
| Traceability | None | Full source mapping |
| Use case | MCP server, editors | Research, spec updates |
Recommendation: Use the brute-force grammar for immediate needs (syntax highlighting, the MCP server) and the spec-driven converter for long-term maintenance and the INCOSE paper contribution.
12.6 Lessons Learned
12.6.1 On Grammar Development
- Test harnesses matter: Simple coverage metrics enabled rapid iteration
- Specification access crucial: KEBNF files enabled systematic approach
- Semantic vs syntactic: Tree-sitter handles parsing; semantics require additional layer
12.6.2 On Automation Assessment
| Initial Assumption | Actual Result |
|---|---|
| 40% fully automatable (estimated from KEBNF complexity analysis) | 93% automated (38% direct + 55% strip-and-convert) |
| 30% manual work required | 7% manual intervention |
| 60-100 hours effort | ~12 hours |
Key insight: Semantic annotations can be stripped and documented externally rather than requiring complex transformation.
12.6.3 On Practical vs Formal
- Both have value: Neither approach sufficient alone
- Practical catches edge cases: Specifications have gaps
- Formal enables maintenance: Reproducible > ad-hoc
- Hybrid recommended: Spec foundation + practical fixes
12.7 Next Steps
- Phase 2 token reduction: Implement vanilla baseline, Cache ID + Summary, and remaining optimization strategies per PRD
- Benchmark execution: Run V1, V4, V5 vignettes for GVSETS quantitative data
- kebnf conflict resolution: Iteratively resolve remaining LR conflicts for INCOSE paper
- Grammar benchmark dashboard: Wire tree-sitter adapter and corpus into sysml-grammar-benchmark