Evals

Did the agent understand the project?

Snipara evals ask whether the agent understood the project before editing: accepted decisions, impact, handoff state, source authority, and verification.

eval-summary.json

{
  "suite": "continuity",
  "cold_start": "rediscover decisions",
  "with_snipara": "load Project Brain",
  "score": ["decision", "impact", "verify"],
  "claim": "continuity improves"
}

299/300: Claude + Codex cloud tasks pass with Snipara, vs 32/300 for the cold baseline across the same continuity protocol.

170/180: local model tasks pass with Snipara (GPT-OSS, Qwen3-Coder, and Devstral, using hosted retrieval).

80.3%: less context than a broad raw window (6.3K selected tokens per query vs a 32K raw baseline).

1.2%: true hallucination rate (contradictions only; omissions are tracked separately).
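The headline figures above are simple ratios, and a short sketch makes the arithmetic easy to recheck. The function names `pass_rate` and `context_reduction` are illustrative and not part of any Snipara API; the inputs are the counts quoted on this page, with 6.3K and 32K treated as 6,300 and 32,000 tokens (both are rounded figures, so the result is approximate).

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the share of passing tasks as a percentage."""
    return 100.0 * passed / total


def context_reduction(selected_tokens: float, raw_tokens: float) -> float:
    """Return how much smaller the selected context is, as a percentage."""
    return 100.0 * (1.0 - selected_tokens / raw_tokens)


# Counts quoted on this page.
print(f"cloud, with Snipara: {pass_rate(299, 300):.1f}%")   # 99.7%
print(f"cloud, cold baseline: {pass_rate(32, 300):.1f}%")   # 10.7%
print(f"local, with Snipara: {pass_rate(170, 180):.1f}%")   # 94.4%

# 6.3K selected tokens vs a 32K raw window (rounded inputs).
print(f"context reduction: {context_reduction(6_300, 32_000):.1f}%")  # 80.3%
```

The 1.2% hallucination rate is not recomputed here because the page does not give the underlying counts.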

Cold vs briefed

Same tasks. Same repo. Different first minute.

The baseline starts with the repo and task only. The Snipara condition starts with retrieved decisions, handoff context, impact hints, and verification guidance.
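One way to picture the two conditions is as the same repo and task with an optional brief attached. The sketch below is an assumption about how such a comparison could be modeled, not Snipara's harness: `Brief`, `Condition`, and `make_pair` are hypothetical names, and the example strings are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Brief:
    """Hypothetical container for the retrieved project state."""
    decisions: Tuple[str, ...] = ()
    handoff: str = ""
    impact_hints: Tuple[str, ...] = ()
    verification: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Condition:
    """One eval condition: the cold baseline has no brief."""
    name: str
    repo: str
    task: str
    brief: Optional[Brief] = None

    def starting_context(self) -> List[str]:
        # Both conditions always see the repo and the task.
        parts = [f"repo: {self.repo}", f"task: {self.task}"]
        if self.brief is not None:
            parts += [f"decision: {d}" for d in self.brief.decisions]
            if self.brief.handoff:
                parts.append(f"handoff: {self.brief.handoff}")
            parts += [f"impact: {h}" for h in self.brief.impact_hints]
            parts += [f"verify: {v}" for v in self.brief.verification]
        return parts


def make_pair(repo: str, task: str, brief: Brief) -> Tuple[Condition, Condition]:
    """Build the cold and briefed conditions from identical repo and task."""
    return Condition("cold", repo, task), Condition("briefed", repo, task, brief)


# Invented example values, for illustration only.
brief = Brief(
    decisions=("use cursor pagination",),
    handoff="list endpoint done, export endpoint pending",
    impact_hints=("api/export.py", "tests/test_export.py"),
    verification=("run the export tests",),
)
cold, briefed = make_pair("example-repo", "add pagination to export", brief)
print(len(cold.starting_context()), len(briefed.starting_context()))  # 2 7
```

Building both conditions from one call keeps the repo and task identical, so the brief is the only variable that differs.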

Sonnet and Opus both reached 60/60 with hosted Snipara retrieval.

Three Codex CLI model runs, same repo tasks, same scoring.

Local coding models

GPT-OSS 20B, Qwen3-Coder, and Devstral moved from cold starts to usable continuity.

What we score

Passing tests is not the whole story.

A project-aware eval separates two agents whose changes both compile: one respects the current project truth, the other reintroduces an old decision.

01 Decision consistency: Did the agent respect the accepted product decision?

02 Impact awareness: Did it see callers, files, routes, and tests before editing?

03 Continuity: Did it resume from the right point after a handoff?

04 Verification: Did it identify the checks that make the result trusted?

05 Grounding: Did unsupported facts disappear from the answer pack?
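The five criteria above can be sketched as a small rubric that scores a run independently of whether it compiles. This is a minimal illustration under stated assumptions: the criterion keys, `RunResult`, and `score` are hypothetical names, and the two runs are invented to mirror the "both compile, one reintroduces an old decision" case.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical keys for the five criteria listed above.
CRITERIA = (
    "decision_consistency",
    "impact_awareness",
    "continuity",
    "verification",
    "grounding",
)


@dataclass
class RunResult:
    agent: str
    compiles: bool
    checks: Dict[str, bool]  # criterion -> passed


def score(run: RunResult) -> Tuple[int, List[str]]:
    """Return (criteria passed, names of failed criteria).

    A missing criterion counts as a failure, so a run cannot pass by omission.
    """
    failed = [c for c in CRITERIA if not run.checks.get(c, False)]
    return len(CRITERIA) - len(failed), failed


# Two invented runs: both compile, only one respects the accepted decision.
agent_a = RunResult("agent_a", True, {c: True for c in CRITERIA})
agent_b = RunResult(
    "agent_b", True, {**{c: True for c in CRITERIA}, "decision_consistency": False}
)

print(score(agent_a))  # (5, [])
print(score(agent_b))  # (4, ['decision_consistency'])
```

A compile-only check would rate these two runs the same; the rubric is what separates them.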

Deterministic harnesses first.

We lead with checks that can be rerun: source authority, stale handling, continuity contracts, Code Graph structure, and answer-pack grounding.

Golden context: 7/7 deterministic cases (source authority, stale caveats, routing).

Continuity contract: 6/6 deterministic cases (resume scope, supersession, next action).

Code Graph: 8/8 deterministic cases (callers, imports, symbol cards, test hints).

Answer packs: 4 -> 0 forbidden facts (unsupported claims removed from grounded packs).
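A forbidden-facts check is one example of a test that can be rerun with the same result every time. The sketch below assumes a simple case-insensitive substring match, which may differ from how Snipara's harness detects unsupported claims; `forbidden_hits` is a hypothetical helper, and the packs and facts are invented, so the counts are not the 4 -> 0 result reported above.

```python
from typing import List


def forbidden_hits(answer_pack: List[str], forbidden: List[str]) -> List[str]:
    """Return the forbidden facts found in the pack.

    Assumption: a fact counts as present if it appears as a
    case-insensitive substring of the pack text.
    """
    text = "\n".join(answer_pack).lower()
    return [fact for fact in forbidden if fact.lower() in text]


# Invented example data, for illustration only.
FORBIDDEN = [
    "sessions are stored in redis",
    "the v1 endpoint is still supported",
]
ungrounded_pack = [
    "Auth uses JWT.",
    "Sessions are stored in Redis.",
    "The v1 endpoint is still supported.",
]
grounded_pack = ["Auth uses JWT."]

before = forbidden_hits(ungrounded_pack, FORBIDDEN)
after = forbidden_hits(grounded_pack, FORBIDDEN)
print(f"{len(before)} -> {len(after)} forbidden facts")  # 2 -> 0 forbidden facts
```

Because the check has no model in the loop, the same pack and the same forbidden list always give the same count.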

Limits

Strong proof, narrow claim.

We keep the public claim simple: Snipara improves continuity and project grounding before coding agents act.

These are controlled continuity scenarios, not a global ranking of models.

Model-graded scores carry a run date and should be read as directional.

Retrieval precision depends on the project corpus, chunk freshness, and source authority.

The strongest claim is not smarter models. It is better project continuity before the model acts.
