Full Suite Coverage

How each adapter performs against the complete test suite (85 tests). Coverage is the share of all 85 tests that pass, so it measures overall capability and completeness.

| Adapter | Coverage | Passed | Not Implemented |
|---|---|---|---|
| Claude Code | 73% | 62 / 85 | 21 |
| Deep Agents | 48% | 41 / 85 | 38 |
| Goose | 46% | 39 / 85 | 42 |
| Letta | 41% | 35 / 85 | 43 |
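The coverage figures can be reproduced from the passed counts. The following is a minimal sketch, assuming coverage is simply passed tests divided by the 85-test suite and rounded to the nearest whole percent; the function name `coverage` and the rounding rule are illustrative assumptions, not taken from the benchmark's own code.

```python
TOTAL_TESTS = 85  # size of the full suite, as stated above

# Passed counts copied from the Full Suite Coverage table.
PASSED = {
    "Claude Code": 62,
    "Deep Agents": 41,
    "Goose": 39,
    "Letta": 35,
}


def coverage(passed: int, total: int = TOTAL_TESTS) -> int:
    """Return coverage as a whole-number percentage (assumed rounding)."""
    return round(100 * passed / total)


for adapter, passed in PASSED.items():
    print(f"{adapter}: {coverage(passed)}% ({passed} / {TOTAL_TESTS})")
```

Because the denominator is always the full suite, an adapter that declares few capabilities scores low on coverage even if everything it declares works.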

Claimed Feature Reliability

How well each adapter implements the features it claims to support. Only tests for declared capabilities are counted, so reliability is the share of those tests that pass.

| Adapter | Reliability | Passed | Failed |
|---|---|---|---|
| Claude Code | 97% | 62 / 64 | 2 |
| Letta | 92% | 35 / 38 | 3 |
| Goose | 91% | 39 / 43 | 4 |
| Deep Agents | 87% | 41 / 47 | 6 |
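Reliability uses the same passed counts but a different denominator. The sketch below assumes reliability is passed tests divided by the number of tests for declared capabilities, rounded to the nearest whole percent, and that the failed count is the difference between the two. The `ClaimedResult` class is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ClaimedResult:
    """Passed and declared test counts for one adapter (hypothetical helper)."""

    passed: int
    claimed: int

    @property
    def failed(self) -> int:
        # Declared tests that did not pass.
        return self.claimed - self.passed

    @property
    def reliability(self) -> int:
        # Whole-number percentage; the rounding rule is an assumption.
        return round(100 * self.passed / self.claimed)


# Counts copied from the Claimed Feature Reliability table.
RESULTS = {
    "Claude Code": ClaimedResult(62, 64),
    "Letta": ClaimedResult(35, 38),
    "Goose": ClaimedResult(39, 43),
    "Deep Agents": ClaimedResult(41, 47),
}

for adapter, r in RESULTS.items():
    print(f"{adapter}: {r.reliability}% reliable, {r.failed} failed")
```

This is why the two rankings differ: Letta is last on coverage but second on reliability, since it declares fewer capabilities and passes most of the tests for them.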

Results by Category

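Test results grouped by capability area. Each cell shows tests passed out of tests in that category; -- marks a category with no reported results for that adapter.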

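| Category | Tests | Claude Code | Letta | Goose | Deep Agents |
|---|---|---|---|---|---|
| Execution | 6 | 6/6 | 5/6 | 5/6 | 4/6 |
| Streaming | 6 | 6/6 | 6/6 | 5/6 | 6/6 |
| Tool Events | 5 | 5/5 | 5/5 | 5/5 | 5/5 |
| Sessions | 7 | -- | -- | -- | -- |
| Agents | 7 | -- | 7/7 | -- | -- |
| Memory | 7 | -- | 7/7 | -- | -- |
| Subagents | 6 | 6/6 | -- | -- | 6/6 |
| MCP | 6 | 6/6 | -- | 6/6 | -- |
| Files | 8 | 8/8 | -- | 8/8 | 8/8 |
| Planning | 7 | 7/7 | -- | -- | 7/7 |
| Hooks | 6 | 6/6 | -- | -- | -- |
| Skills | 7 | 7/7 | -- | 7/7 | -- |
| Tools API | 7 | 5/7 | 5/7 | 3/7 | 5/7 |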
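A column of per-category cells can be rolled up into a single passed/total pair. The sketch below does this for the Claude Code and Letta columns, assuming that a `--` cell contributes nothing to either count; the `column_totals` function is a hypothetical helper, not part of the benchmark.

```python
# Cells copied from the Claude Code and Letta columns, in table order.
COLUMNS = {
    "Claude Code": ["6/6", "6/6", "5/5", "--", "--", "--", "6/6",
                    "6/6", "8/8", "7/7", "6/6", "7/7", "5/7"],
    "Letta": ["5/6", "6/6", "5/5", "--", "7/7", "7/7", "--",
              "--", "--", "--", "--", "--", "5/7"],
}


def column_totals(cells: list[str]) -> tuple[int, int]:
    """Sum passed and total counts, skipping "--" cells (assumed to mean no result)."""
    passed = total = 0
    for cell in cells:
        if cell == "--":
            continue
        p, t = cell.split("/")
        passed += int(p)
        total += int(t)
    return passed, total


for adapter, cells in COLUMNS.items():
    passed, total = column_totals(cells)
    print(f"{adapter}: {passed} / {total}")
```

For these two adapters the rolled-up totals agree with the Passed column of the Claimed Feature Reliability table.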

Understanding the Metrics

Known Issues