RELP: Recursive Execution Loop Pipeline

RELP enables your LLM to work with documentation 100x larger than its context window by intelligently decomposing queries and orchestrating multiple context retrievals.

Reduced Hallucination

RELP grounds responses in your actual documentation, providing source-attributed answers that dramatically reduce LLM hallucination.

The Problem

Modern LLMs have context windows ranging from 8K to 200K tokens. But real-world documentation often exceeds these limits:

  • A typical codebase: 500K+ tokens of docs, comments, and READMEs
  • Enterprise documentation: Often 1M+ tokens across wikis and specs
  • API references: Can easily reach 200K+ tokens

Without RELP, you either truncate your docs (losing context) or pay to send everything (expensive and slow). Both approaches lead to incomplete answers and hallucination.

How RELP Works

RELP solves this by breaking complex queries into focused sub-queries, each retrieving precisely relevant context within your token budget.

┌─────────────────────────────────────────────────────────────┐
│  RELP WORKFLOW                                              │
│                                                             │
│  Step 1: Complex Query                                      │
│          "Implement user authentication with JWT"           │
│                    ↓                                        │
│  Step 2: rlm_decompose                                      │
│          → ["login flow", "JWT handling", "sessions"]       │
│                    ↓                                        │
│  Step 3: rlm_multi_query (for each sub-query)               │
│          → Retrieve 4K tokens of relevant context each      │
│                    ↓                                        │
│  Step 4: LLM synthesizes final answer                       │
│          → Grounded in actual documentation                 │
│          → Source-attributed, verifiable                    │
└─────────────────────────────────────────────────────────────┘
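
The workflow above can be sketched as a simple orchestration loop. The `rlm_decompose` and `rlm_multi_query` functions below are hypothetical stubs that mirror the documented response shapes; in practice the LLM invokes the real tools through your client.

```python
# Minimal sketch of the RELP loop, assuming stub implementations of the
# rlm_* tools; the real tools are invoked by the LLM through your client.

def rlm_decompose(query: str) -> dict:
    # Stub: the real tool analyzes the query and returns focused sub-queries.
    return {"sub_queries": ["login flow", "JWT handling", "sessions"],
            "estimated_tokens": 12000}

def rlm_multi_query(queries: list[str], max_tokens_per_query: int) -> dict:
    # Stub: the real tool retrieves relevant documentation per sub-query.
    return {q: f"<up to {max_tokens_per_query} tokens of context for {q!r}>"
            for q in queries}

def relp_answer(query: str) -> dict:
    plan = rlm_decompose(query)                                 # Step 2
    per_query = plan["estimated_tokens"] // len(plan["sub_queries"])
    contexts = rlm_multi_query(plan["sub_queries"], per_query)  # Step 3
    # Step 4: the LLM synthesizes a final answer from these contexts.
    return contexts

contexts = relp_answer("Implement user authentication with JWT")
print(len(contexts))  # one focused context per sub-query
```

Each sub-query stays small and focused, so the final synthesis step sees only relevant material instead of the whole corpus.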

The Three RELP Tools

rlm_decompose (Pro and above)

Breaks a complex query into focused sub-queries with dependency analysis. Returns an execution plan optimized for your token budget.

rlm_decompose("implement auth system")
{
  "sub_queries": [
    "login flow and validation",
    "JWT token generation",
    "session management"
  ],
  "estimated_tokens": 12000
}

rlm_multi_query (Pro and above)

Executes multiple queries in a single call with a shared token budget, retrieving context for all sub-queries efficiently.

rlm_multi_query({
  queries: ["login flow", "JWT tokens"],
  max_tokens_per_query: 4000
})

rlm_plan (Team and above)

Generates a complete execution plan for complex tasks. Includes dependency ordering, estimated tokens, and execution strategy.

rlm_plan("refactor auth to use OAuth2")
{
  "strategy": "RELEVANCE_FIRST",
  "steps": [...],
  "total_estimated_tokens": 25000
}
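
The dependency ordering such a plan relies on amounts to a topological sort over the steps. A minimal sketch, assuming a hypothetical step graph (the step names are illustrative, not RELP's actual plan output):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph for the refactor: each step maps to the
# set of steps that must complete before it can run.
steps = {
    "login flow": set(),
    "token handling": {"login flow"},
    "session storage": {"token handling"},
}
order = list(TopologicalSorter(steps).static_order())
print(order)  # ['login flow', 'token handling', 'session storage']
```

Because each step here depends on exactly one predecessor, the sort yields a single valid order: prerequisites always come before the steps that need them.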

Plan Availability

Tool                Free    Pro ($19/mo)    Team ($49/mo)    Enterprise
rlm_context_query    ✓           ✓               ✓               ✓
rlm_decompose        —           ✓               ✓               ✓
rlm_multi_query      —           ✓               ✓               ✓
rlm_plan             —           —               ✓               ✓

Token Budget Management

RELP intelligently distributes your token budget across sub-queries:

Example: 12,000 token budget for 3 sub-queries

Sub-query 1: "login flow"        → 4,000 tokens (high relevance)
Sub-query 2: "JWT handling"      → 4,000 tokens (high relevance)
Sub-query 3: "session storage"   → 4,000 tokens (medium relevance)

Total: 12,000 tokens across 3 focused retrievals
vs. 500,000 tokens if you sent everything
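
One way such a split could be computed is a relevance-weighted allocation. This is an illustrative policy, not necessarily RELP's exact algorithm; with equal weights it reproduces the even 4,000-token split shown above.

```python
def allocate_budget(weighted_queries, total_budget):
    """Split total_budget across sub-queries in proportion to a relevance
    weight. Illustrative scheme only; equal weights give an even split."""
    total_weight = sum(w for _, w in weighted_queries)
    alloc = {q: int(total_budget * w / total_weight)
             for q, w in weighted_queries}
    # Integer rounding can leave a few tokens unassigned;
    # hand them to the most relevant sub-query.
    top_query = max(weighted_queries, key=lambda qw: qw[1])[0]
    alloc[top_query] += total_budget - sum(alloc.values())
    return alloc

budget = allocate_budget(
    [("login flow", 1.0), ("JWT handling", 1.0), ("session storage", 1.0)],
    12000,
)
print(budget)  # {'login flow': 4000, 'JWT handling': 4000, 'session storage': 4000}
```

Raising the weight of one sub-query shifts tokens toward it while keeping the total fixed, so the overall spend never exceeds the budget.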

Reducing Hallucination

RELP significantly reduces LLM hallucination through several mechanisms:

  • Grounded responses: Every answer is based on actual documentation, not just training data
  • Focused context: Each sub-query gets precisely relevant sections, reducing ambiguity
  • Source attribution: Responses include file paths and line numbers for verification
  • Shared context: Team coding standards are always injected, preventing non-compliant suggestions
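
The source-attribution mechanism can be sketched as follows: each retrieved chunk carries its file path and line range, which are rendered into the context so answers can cite verifiable locations. The `Chunk` type and bracket format below are assumptions for illustration, not RELP's actual wire format.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # Hypothetical shape for a retrieved documentation chunk.
    text: str
    path: str
    start_line: int
    end_line: int

def render_with_sources(chunks):
    """Prefix each chunk with its file path and line range so the final
    answer can cite verifiable locations. Illustrative format only."""
    return "\n\n".join(
        f"[{c.path}:{c.start_line}-{c.end_line}]\n{c.text}" for c in chunks
    )

ctx = render_with_sources([
    Chunk("def login(user, password): ...", "auth/login.py", 10, 24),
])
print(ctx)  # [auth/login.py:10-24] followed by the chunk text
```

A reader (or reviewer) can then open `auth/login.py` at the cited lines and check the claim directly, which is what makes the responses verifiable.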

Example: Complex Task Execution

# Your LLM receives this complex request:
"Implement user authentication with JWT tokens,
 following our existing patterns and error handling"

# Step 1: LLM calls rlm_decompose
rlm_decompose("implement auth with JWT")

# Step 2: LLM calls rlm_multi_query for the sub-queries
rlm_multi_query(["login flow", "JWT tokens", "sessions"])

# Step 3: LLM calls rlm_shared_context for team standards
rlm_shared_context("error handling patterns")

# Result: an accurate, grounded implementation

Expected Query Usage

Different task complexities consume different numbers of queries:

Task Complexity    Example                                            Queries Used
Simple             "What is the API endpoint for users?"              1-2
Medium             "How does authentication work?"                    3-5
Complex            "Implement a new feature following our patterns"   8-15
Planning           "Refactor the entire auth system"                  15-25

Next Steps