12  Implementation

12.1 Overview

Implementation pivoted in February 2026 to an architecture built on a tree-sitter grammar and a Rust MCP server. This chapter documents progress by deliverable.

Deliverable            Status                 Description
---------------------  ---------------------  ---------------------------------------------------------------------
tree-sitter-sysml      ✅ Complete            Brute-force grammar, 99.6% external file coverage, 100% query coverage
kebnf-to-tree-sitter   ◐ In Progress          Spec-driven converter, 640 rules parsed, 335+ conflicts
open-mcp-sysml         ✅ Phase 1 Complete    5 MCP tools, 22 tests, tree-sitter integration working

12.2 tree-sitter-sysml

A production-ready SysML v2 grammar for the tree-sitter ecosystem.

12.2.1 Status

Metric                   Value
-----------------------  ----------------------------------------
Grammar size             ~2,236 lines
Corpus tests             125/125 passing
Training file coverage   100% (100/100 OMG examples)
External file coverage   99.6% (274/275 files)
Query coverage           100% (190/190 highlights, tags, folds)
Development time         ~25 hours

12.2.2 Architecture

tree-sitter-sysml/
├── grammar.js              # Grammar definition
├── src/parser.c            # Generated parser
├── bindings/{c,go,node,python,rust,swift}/
├── corpus/                 # 125 test cases
└── .gitlab-ci.yml          # CI with coverage tracking

12.2.3 Methodology: Brute Force

The grammar was built through empirical iteration against training files:

  1. Run coverage harness against 100 OMG training files
  2. Identify failing files, inspect ERROR nodes
  3. Implement missing construct in grammar.js
  4. Generate parser, verify no regressions
  5. Repeat until 100% coverage
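Steps 1 and 2 reduce to a small coverage report. The Rust sketch below is a minimal, stdlib-only illustration, not the project's actual harness: `CoverageReport` and `run_coverage` are hypothetical names, and the `has_error` closure stands in for a real parse check (with the tree-sitter Rust bindings, that would typically mean parsing the source and calling `has_error()` on the root node).

```rust
/// Result of running the parser over a set of files (hypothetical type).
#[derive(Debug, Default)]
pub struct CoverageReport {
    pub passed: usize,
    /// Names of files that failed the parse check.
    pub failed: Vec<String>,
}

impl CoverageReport {
    /// Percentage of files that parsed cleanly (0.0 for an empty run).
    pub fn percent(&self) -> f64 {
        let total = self.passed + self.failed.len();
        if total == 0 {
            return 0.0;
        }
        100.0 * self.passed as f64 / total as f64
    }
}

/// Steps 1-2: parse every (name, source) pair and collect the failures.
/// `has_error` is a stand-in for a real tree-sitter parse check.
pub fn run_coverage<F>(files: &[(&str, &str)], has_error: F) -> CoverageReport
where
    F: Fn(&str) -> bool,
{
    let mut report = CoverageReport::default();
    for &(name, source) in files {
        if has_error(source) {
            report.failed.push(name.to_string());
        } else {
            report.passed += 1;
        }
    }
    report
}
```

The `failed` list is the input to step 2, and comparing `passed` before and after a grammar change is the regression check in step 4.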

This approach achieved rapid results (training-file coverage rose from ~10% to 100% in ~20 hours of focused grammar development, ~25 hours total including corpus and test infrastructure), but it has limitations:

  • Unknown specification alignment: The grammar was reverse-engineered from example files rather than derived from the specification
  • Likely over-acceptance: The grammar may accept invalid syntax
  • No semantic preservation: The AST structure is not aligned with the metamodel

12.2.4 Key Constructs Implemented

Category     Constructs
-----------  ------------------------------------------------------------------------------------------
Definitions  part, item, port, action, state, requirement, constraint, use case, view, metadata, allocation
Usages       All corresponding usage forms, plus event, exhibit, message, satisfy, verify
Behavioral   Actions, states, transitions, fork/join/decision, if/while/loop
Expressions  Operators, invocations, feature chains, select (.?{}), collect (.{}), units

12.2.5 Technical Debt Remediation

Session 5 addressed accumulated technical debt:

  • Split the generic relationship_part rule into explicit specialization, subsetting, and redefinition rules
  • Created context-sensitive body types for definitions (part_body vs. action_body)
  • Added a negative test framework (7 of 8 invalid patterns are correctly rejected)
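Splitting relationship_part matters because the same token can mean different things depending on where it appears. The Rust sketch below models that distinction in a simplified way. `OwnerKind`, `Relationship`, and `classify_relationship` are hypothetical names, and the token table follows common SysML v2 textual notation (`:>`/`specializes` on definitions; `:>`/`subsets` and `:>>`/`redefines` on usages) rather than the actual grammar.js rules.

```rust
/// Whether the relationship appears on a definition or a usage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OwnerKind {
    Definition,
    Usage,
}

/// The explicit relationship kinds that replace a generic `relationship_part`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Relationship {
    Specialization,
    Subsetting,
    Redefinition,
}

/// Map a relationship token to its kind, given where it appears.
/// Returns `None` for combinations this simplified model does not accept.
pub fn classify_relationship(owner: OwnerKind, token: &str) -> Option<Relationship> {
    match (owner, token) {
        // `part def Car :> Vehicle;` specializes a definition.
        (OwnerKind::Definition, ":>" | "specializes") => Some(Relationship::Specialization),
        // On a usage, `:>` subsets another feature...
        (OwnerKind::Usage, ":>" | "subsets") => Some(Relationship::Subsetting),
        // ...and `:>>` redefines an inherited one.
        (OwnerKind::Usage, ":>>" | "redefines") => Some(Relationship::Redefinition),
        _ => None,
    }
}
```

A single generic rule cannot reject `:>>` on a definition. Explicit, context-aware rules can, which is the kind of case a negative test exercises.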

12.3 kebnf-to-tree-sitter

An automated converter from OMG KEBNF specifications to tree-sitter grammars.

12.3.1 Purpose

Goal             Description
---------------  ------------------------------------------------------
Reproducibility  Regenerate the grammar when the SysML v2 spec is updated
Traceability     Map tree-sitter rules to KEBNF source lines
Research         Quantify automation rates for the INCOSE paper

12.3.2 Status

Metric                   Value
-----------------------  ------------------
KEBNF rules parsed       640/640 (100%)
Direct conversion        38% (247 rules)
Strip-and-convert        55% (353 rules)
Best-effort              6% (37 rules)
Manual review            <1% (3 rules)
LR conflicts remaining   335+

12.3.3 Methodology: Spec-Driven

The tool parses the official KEBNF files and generates a tree-sitter grammar:

  1. Parse: Lex and parse KEBNF into an AST (Chumsky 1.0)
  2. Classify: Assign each construct a conversion category (direct, strip-and-convert, best-effort, manual review)
  3. Transform: Apply the matching conversion strategy
  4. Emit: Generate grammar.js together with a mapping document
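Step 2 decides which conversion strategy each rule receives. The sketch below uses a hypothetical heuristic, not the converter's actual logic. It assumes the parser has already recorded a few boolean features per rule (`RuleFeatures` is an illustrative name) and checks them from most to least demanding, so each rule lands in exactly one category.

```rust
/// Conversion strategy assigned during the Classify step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Direct,
    StripAndConvert,
    BestEffort,
    ManualReview,
}

/// Features extracted from one parsed KEBNF rule (hypothetical).
#[derive(Debug, Default, Clone, Copy)]
pub struct RuleFeatures {
    /// Type annotations, property assignments, or cross-references present.
    pub has_annotations: bool,
    /// Semantic actions that tree-sitter can only approximate.
    pub has_semantic_actions: bool,
    /// Ambiguity that needs a human to resolve.
    pub needs_disambiguation: bool,
}

/// Check the hardest cases first so the most demanding feature wins.
pub fn classify(f: RuleFeatures) -> Category {
    if f.needs_disambiguation {
        Category::ManualReview
    } else if f.has_semantic_actions {
        Category::BestEffort
    } else if f.has_annotations {
        Category::StripAndConvert
    } else {
        Category::Direct
    }
}
```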

KEBNF extends standard EBNF with metamodel annotations (type annotations, property assignments, cross-references). These are stripped for tree-sitter and recorded in mapping.json for downstream tools.
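Stripping can be sketched over a token stream. The Rust sketch below assumes a simplified, hypothetical tokenization of a rule body:

  • A property assignment appears as three tokens: a property name, an operator (`=`, `+=`, or `?=`), and the element.
  • A cross-reference appears as `[`, a name, and `]`.

The real KEBNF lexer and the mapping.json schema may differ, and nested forms such as an assignment whose value is a cross-reference are out of scope here. Syntax tokens are kept, and each stripped annotation is recorded for the mapping.

```rust
/// One stripped annotation, destined for the mapping document (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Annotation {
    /// `property op element`, e.g. `ownedRelationship += OwnedSpecialization`.
    Assignment { property: String, op: String, element: String },
    /// `[Name]`, a reference resolved by name.
    CrossReference { target: String },
}

const ASSIGN_OPS: [&str; 3] = ["=", "+=", "?="];

/// Remove annotations from a rule body, returning the syntax-only tokens
/// plus a record of what was removed.
pub fn strip_annotations(tokens: &[&str]) -> (Vec<String>, Vec<Annotation>) {
    let mut syntax = Vec::new();
    let mut stripped = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        if i + 2 < tokens.len() && ASSIGN_OPS.contains(&tokens[i + 1]) {
            // Keep the element, drop `property op`.
            stripped.push(Annotation::Assignment {
                property: tokens[i].to_string(),
                op: tokens[i + 1].to_string(),
                element: tokens[i + 2].to_string(),
            });
            syntax.push(tokens[i + 2].to_string());
            i += 3;
        } else if tokens[i] == "[" && i + 2 < tokens.len() && tokens[i + 2] == "]" {
            // Keep the referenced name as plain syntax, drop the brackets.
            stripped.push(Annotation::CrossReference { target: tokens[i + 1].to_string() });
            syntax.push(tokens[i + 1].to_string());
            i += 3;
        } else {
            syntax.push(tokens[i].to_string());
            i += 1;
        }
    }
    (syntax, stripped)
}
```

Recording what was removed, rather than discarding it, is what lets downstream tools recover the metamodel information that the tree-sitter grammar no longer carries.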

12.3.4 Conversion Categories

Category           Share   Handling
-----------------  ------  -----------------------------------------
Direct             38%     Basic syntax maps directly to tree-sitter
Strip-and-convert  55%     Remove annotations, keep syntax structure
Best-effort        6%      Approximate semantic actions
Manual review      <1%     Complex disambiguation needed

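Overall: 600 of 640 rules (93.75%) are converted automatically (direct or strip-and-convert); the remaining 40 (6.25%) require manual intervention (best-effort or manual review).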
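The overall split follows from the rule counts in the 12.3.2 status table. This small Rust check recomputes it. The constants are the reported counts, and the function names are illustrative.

```rust
/// Rule counts per conversion category, from the 12.3.2 status table.
pub const DIRECT: u32 = 247;
pub const STRIP_AND_CONVERT: u32 = 353;
pub const BEST_EFFORT: u32 = 37;
pub const MANUAL_REVIEW: u32 = 3;

/// Percentage of `part` out of `whole`.
pub fn share(part: u32, whole: u32) -> f64 {
    100.0 * f64::from(part) / f64::from(whole)
}

/// Returns (total rules, automated %, manual-intervention %).
pub fn automation_rates() -> (u32, f64, f64) {
    let total = DIRECT + STRIP_AND_CONVERT + BEST_EFFORT + MANUAL_REVIEW;
    // "Automated" = direct + strip-and-convert; the rest needs a human.
    let automated = share(DIRECT + STRIP_AND_CONVERT, total);
    let manual = share(BEST_EFFORT + MANUAL_REVIEW, total);
    (total, automated, manual)
}
```

By the same arithmetic, direct conversion alone accounts for 247/640 ≈ 38.6% of rules.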

12.3.5 Architecture

kebnf-to-tree-sitter/
├── src/
│   ├── ast.rs            # KEBNF AST representation
│   ├── parser.rs         # KEBNF parser
│   ├── emitter.rs        # Tree-sitter grammar emitter
│   └── mapping.rs        # Semantic mapping generator
└── docs/
    ├── KEBNF-SPEC.md     # KEBNF format documentation
    └── MAPPING.md        # Semantic gap documentation
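For rules in the direct category, emitting tree-sitter DSL is a mechanical mapping:

  • Sequences become `seq(...)`, and alternatives become `choice(...)`.
  • The EBNF operators `*`, `+`, and `?` become `repeat`, `repeat1`, and `optional`.
  • Rule references become `$.rule_name`.

The Rust sketch below shows that mapping for a hypothetical, simplified `Expr` type. It is not the code in emitter.rs, and the PascalCase-to-snake_case renaming is an assumption about naming conventions.

```rust
/// Simplified EBNF expression (hypothetical; not the real ast.rs types).
pub enum Expr {
    Terminal(String),
    NonTerminal(String),
    Seq(Vec<Expr>),
    Choice(Vec<Expr>),
    Star(Box<Expr>),
    Plus(Box<Expr>),
    Opt(Box<Expr>),
}

/// `DefinitionBody` -> `definition_body`, the usual tree-sitter rule style.
fn snake_case(name: &str) -> String {
    let mut out = String::new();
    for (i, c) in name.chars().enumerate() {
        if c.is_ascii_uppercase() && i > 0 {
            out.push('_');
        }
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Emit each child and join them as DSL arguments.
fn join(items: &[Expr]) -> String {
    items.iter().map(emit).collect::<Vec<_>>().join(", ")
}

/// Render an expression as tree-sitter grammar.js DSL.
pub fn emit(e: &Expr) -> String {
    match e {
        // Escape backslashes and quotes so the JS string literal stays valid.
        Expr::Terminal(t) => format!("'{}'", t.replace('\\', "\\\\").replace('\'', "\\'")),
        Expr::NonTerminal(n) => format!("$.{}", snake_case(n)),
        Expr::Seq(items) => format!("seq({})", join(items)),
        Expr::Choice(items) => format!("choice({})", join(items)),
        Expr::Star(inner) => format!("repeat({})", emit(inner)),
        Expr::Plus(inner) => format!("repeat1({})", emit(inner)),
        Expr::Opt(inner) => format!("optional({})", emit(inner)),
    }
}
```

For example, a sequence of `'part'`, `'def'`, `DefinitionDeclaration`, and an optional `DefinitionBody` emits `seq('part', 'def', $.definition_declaration, optional($.definition_body))`.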

See Section B.3 for the INCOSE paper outlining the methodology.

12.4 open-mcp-sysml

A Rust Model Context Protocol (MCP) server that provides SysML v2 tools to AI assistants.

12.4.1 Status

Phase 1 was completed on Feb 13, 2026: tree-sitter integration works, with 5 MCP tools and 22 passing tests. The Phase 2 PRD, covering token-reduction strategies, is ready.

Metric              Value
------------------  -------------------------------------------------------------------------------------
MCP tools           5 (sysml_parse, sysml_validate, sysml_list_definitions, repo_list_files, repo_get_file)
Tests               22 (4 unit + 4 integration + 10 protocol + 1 repo + 3 doc)
Training coverage   100% (100/100 files parse without errors)
Detail levels       L0 (names), L1 (structure), L2 (full)
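The three detail levels trade completeness for tokens. The sketch below illustrates the idea on a hypothetical, already-extracted `Definition` record. Its field names and output format are assumptions, not the sysml-parser crate's API:

  • L0 returns only the name.
  • L1 adds the kind and member names.
  • L2 returns the full source text.

```rust
/// Detail levels exposed by sysml_parse.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetailLevel {
    L0,
    L1,
    L2,
}

/// A definition already extracted from the syntax tree (hypothetical shape).
pub struct Definition {
    pub kind: String,         // e.g. "part def"
    pub name: String,         // e.g. "Vehicle"
    pub members: Vec<String>, // names of owned usages
    pub source: String,       // exact source text
}

/// Render one definition at the requested level of detail.
pub fn render(def: &Definition, level: DetailLevel) -> String {
    match level {
        // L0: just the name, the cheapest answer.
        DetailLevel::L0 => def.name.clone(),
        // L1: kind, name, and member names; no bodies or expressions.
        DetailLevel::L1 => format!("{} {} {{ {} }}", def.kind, def.name, def.members.join(", ")),
        // L2: everything, verbatim.
        DetailLevel::L2 => def.source.clone(),
    }
}
```

A client can list a large model at L0 and then request L2 only for the definitions it actually needs.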

12.4.2 Architecture

open-mcp-sysml/
├── Cargo.toml              # Workspace root
├── crates/
│   ├── sysml-parser/       # Tree-sitter wrapper, L0/L1/L2 detail levels
│   ├── repo-client/        # Provider-agnostic Git interface
│   └── mcp-server/         # MCP server binary (rmcp SDK)
└── tests/fixtures/sysml-v2/ # OMG training files (git submodule)

12.4.3 Implemented Tools

Tool                     Description                                  Backend                 Status
-----------------------  -------------------------------------------  ----------------------  -----------
sysml_parse              Parse SysML v2 text at L0/L1/L2 detail       tree-sitter-sysml       ✅ Complete
sysml_validate           Return parse diagnostics                     tree-sitter-sysml       ✅ Complete
sysml_list_definitions   List all definitions in a model              tree-sitter-sysml       ✅ Complete
repo_list_files          List .sysml files in a repository            repo-client (GitLab)    ✅ Complete
repo_get_file            Read a file from a repository                repo-client (GitLab)    ✅ Complete
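An MCP client invokes a tool by name, so the server needs a mapping from the five names above to their backends. The stdlib-only sketch below shows one way to make that mapping exhaustive with an enum. `Tool`, `Backend`, and `from_name` are hypothetical names, and the actual server's dispatch through the rmcp SDK may look different.

```rust
/// The five Phase 1 tools.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tool {
    SysmlParse,
    SysmlValidate,
    SysmlListDefinitions,
    RepoListFiles,
    RepoGetFile,
}

/// Which crate serves a tool.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    TreeSitterSysml,
    RepoClient,
}

impl Tool {
    /// Resolve a tool name from a call request; unknown names yield `None`
    /// so the server can return an error instead of guessing.
    pub fn from_name(name: &str) -> Option<Tool> {
        match name {
            "sysml_parse" => Some(Tool::SysmlParse),
            "sysml_validate" => Some(Tool::SysmlValidate),
            "sysml_list_definitions" => Some(Tool::SysmlListDefinitions),
            "repo_list_files" => Some(Tool::RepoListFiles),
            "repo_get_file" => Some(Tool::RepoGetFile),
            _ => None,
        }
    }

    /// Exhaustive match: adding a tool without a backend fails to compile.
    pub fn backend(self) -> Backend {
        match self {
            Tool::SysmlParse | Tool::SysmlValidate | Tool::SysmlListDefinitions => {
                Backend::TreeSitterSysml
            }
            Tool::RepoListFiles | Tool::RepoGetFile => Backend::RepoClient,
        }
    }
}
```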

12.4.4 Repository Client Interface

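A provider-agnostic trait, with GitLab as the reference implementation:

pub trait RepoClient: Send + Sync {
    /// Read the raw bytes of `path` in `project` at Git ref `ref_`
    /// (`ref` is a Rust keyword, hence the trailing underscore).
    async fn read_file(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<u8>, RepoError>;
    /// List the file entries under `path` in `project` at Git ref `ref_`.
    async fn list_files(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<FileEntry>, RepoError>;
}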
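Because the trait is provider-agnostic, tests can swap GitLab for an in-memory implementation. The sketch below makes these assumptions:

  • Rust 1.75+ (async fn in traits).
  • Hypothetical stand-ins for `FileEntry` and `RepoError`.
  • Illustrative `InMemoryRepo` and `block_on` helpers. `block_on` uses a no-op waker, so it only suits futures that complete immediately, as these do.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical stand-ins for the repo-client types.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
}

#[derive(Debug, PartialEq)]
pub enum RepoError {
    NotFound(String),
}

pub trait RepoClient: Send + Sync {
    async fn read_file(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<u8>, RepoError>;
    async fn list_files(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<FileEntry>, RepoError>;
}

/// Test double: file contents keyed by (project, ref, path).
pub struct InMemoryRepo {
    pub files: HashMap<(String, String, String), Vec<u8>>,
}

impl RepoClient for InMemoryRepo {
    async fn read_file(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<u8>, RepoError> {
        let key = (project.to_string(), ref_.to_string(), path.to_string());
        self.files.get(&key).cloned().ok_or_else(|| RepoError::NotFound(path.to_string()))
    }

    async fn list_files(&self, project: &str, path: &str, ref_: &str)
        -> Result<Vec<FileEntry>, RepoError> {
        let mut entries: Vec<FileEntry> = self
            .files
            .keys()
            .filter(|(p, r, f)| p == project && r == ref_ && f.starts_with(path))
            .map(|(_, _, f)| FileEntry { path: f.clone() })
            .collect();
        // HashMap order is arbitrary; sort for deterministic output.
        entries.sort_by(|a, b| a.path.cmp(&b.path));
        Ok(entries)
    }
}

// No-op waker: nothing ever needs waking because our futures are ready at once.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
}

/// Minimal test-only executor: polls the future until it is ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions never dereference the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A real server would drive these futures with its async runtime instead of `block_on`.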

12.5 Dual-Path Grammar Strategy

Both grammar approaches provide value:

Aspect         tree-sitter-sysml       kebnf-to-tree-sitter
-------------  ----------------------  ------------------------------------------------
Approach       Empirical iteration     Spec conversion
Coverage       99.6% external files    100% of KEBNF rules parsed (335+ conflicts open)
Conflicts      Manually tuned          Auto-detected
Traceability   None                    Full source mapping
Use case       MCP server, editors     Research, spec updates

Recommendation: Use the brute-force grammar for immediate needs (syntax highlighting, the MCP server). Use the spec-driven converter for long-term maintenance and the INCOSE paper contribution.

12.6 Lessons Learned

12.6.1 On Grammar Development

  1. Test harnesses matter: Simple coverage metrics enabled rapid iteration
  2. Specification access is crucial: The KEBNF files enabled a systematic approach
  3. Semantic vs. syntactic: Tree-sitter handles parsing; semantics require an additional layer

12.6.2 On Automation Assessment

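Initial assumption                                                 Actual result
-----------------------------------------------------------------  -------------------------------------------------------
40% fully automatable (estimated from KEBNF complexity analysis)   93.75% automated (600/640: direct + strip-and-convert)
30% manual work required                                           6.25% manual intervention (40/640 rules)
60-100 hours effort                                                ~12 hours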

Key insight: Semantic annotations can be stripped and documented externally rather than requiring complex transformation.

12.6.3 On Practical vs Formal

  • Both have value: Neither approach is sufficient alone
  • Practical catches edge cases: Specifications have gaps
  • Formal enables maintenance: Reproducible > ad-hoc
  • Hybrid recommended: Spec foundation + practical fixes
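One concrete reading of "spec foundation + practical fixes" is a merge: start from the generated rules and let a small, reviewed set of hand-written overrides win. The sketch below assumes rules are keyed by name. `merge_rules` is a hypothetical helper; reporting which generated rules were overridden keeps the manual fixes traceable.

```rust
use std::collections::BTreeMap;

/// Merge generated rules with hand-written overrides.
/// Returns the merged rule set and the names of generated rules that were overridden.
/// Rules present only in `overrides` are added but not reported.
pub fn merge_rules(
    generated: &BTreeMap<String, String>,
    overrides: &BTreeMap<String, String>,
) -> (BTreeMap<String, String>, Vec<String>) {
    let mut merged = generated.clone();
    let mut overridden = Vec::new();
    for (name, body) in overrides {
        // `insert` returns the previous body if the rule was generated.
        if merged.insert(name.clone(), body.clone()).is_some() {
            overridden.push(name.clone());
        }
    }
    (merged, overridden)
}
```

Because the overrides live apart from the generated output, regenerating from a new spec version leaves the practical fixes in place and shows which of them still apply.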

12.7 Next Steps

  1. Phase 2 token reduction: Implement the vanilla baseline, Cache ID + Summary, and the remaining optimization strategies per the PRD
  2. Benchmark execution: Run the V1, V4, and V5 vignettes to collect quantitative data for GVSETS
  3. kebnf conflict resolution: Iteratively resolve the remaining LR conflicts for the INCOSE paper
  4. Grammar benchmark dashboard: Wire the tree-sitter adapter and corpus into sysml-grammar-benchmark