RLM Runtime v2.0.0

Python CLI and SDK for complex multi-step tasks. Combines sandboxed code execution, Snipara context optimization, autonomous agents, sub-LLM orchestration, and a WebAssembly REPL for task decomposition, code generation, and validation.

Open Source

RLM Runtime is open source and works with any LLM provider. Snipara integration is optional but recommended. Available on PyPI and GitHub.

What is RLM Runtime?

RLM (Recursive Language Models) Runtime is a framework that enables LLMs to:

  • Recursively decompose complex tasks into sub-tasks
  • Execute code in sandboxed environments (local, Docker, or WebAssembly)
  • Query documentation using Snipara's context optimization
  • Log execution traces for debugging and review
  • Orchestrate sub-LLM calls (rlm_sub_complete, rlm_batch_complete)
  • Run autonomous agents (observe → think → act → terminate loop)
  • Execute code in WebAssembly REPL (browser-safe, no Docker needed)
  • Run tool calls in parallel via asyncio
  • Produce structured outputs with JSON schema constraints
  • Handle multi-modal inputs (images, audio)

┌─────────────────────────────────────────────────────────────────┐
│  RLM Orchestrator                                               │
│  • Manages recursion depth and token budgets                    │
│  • Coordinates LLM calls and tool execution                     │
│  • Cost tracking and budget enforcement                         │
├─────────────────────────────────────────────────────────────────┤
│  AgentRunner                                                    │
│  • Autonomous observe → think → act → terminate loop            │
│  • FINAL/FINAL_VAR termination protocol                         │
│  • Iteration, cost, and timeout budgets                         │
├─────────────────────────────────────────────────────────────────┤
│  LLM Backends              │  REPL Environments                 │
│  • LiteLLM (default)       │  • Local (RestrictedPython)        │
│  • OpenAI                  │  • Docker (isolated)               │
│  • Anthropic               │  • WebAssembly (Pyodide)           │
├─────────────────────────────────────────────────────────────────┤
│  Tool Registry                                                  │
│  • Builtin: file_read, execute_code                             │
│  • Sub-LLM: rlm_sub_complete, rlm_batch_complete                │
│  • Agent: FINAL, FINAL_VAR                                       │
│  • Snipara: context_query, sections, search (optional)          │
│  • Memory: rlm_remember, rlm_recall (optional)                   │
│  • Custom: your own tools                                       │
├─────────────────────────────────────────────────────────────────┤
│  MCP Server (rlm mcp-serve)                                     │
│  • 7 tools for Claude Desktop / Claude Code                     │
│  • Zero API keys required                                       │
└─────────────────────────────────────────────────────────────────┘

Installation

# Standard installation
pip install rlm-runtime
# With Docker support (recommended for isolation)
pip install rlm-runtime[docker]
# With WebAssembly support (browser-safe execution)
pip install rlm-runtime[wasm]
# With MCP server for Claude Desktop/Code
pip install rlm-runtime[mcp]
# With Snipara context optimization
pip install rlm-runtime[snipara]
# With trajectory visualizer
pip install rlm-runtime[visualizer]
# Full installation with all features
pip install rlm-runtime[all]

Quick Start

CLI Quick Start

# Initialize configuration
rlm init
# Run a completion
rlm run "Summarize the authentication flow"
# Run with Docker isolation
rlm run --env docker "Parse and analyze logs"
# Run an autonomous agent
rlm agent "Analyze all CSV files and generate a report"
# Launch the visualization dashboard
rlm visualize

Basic Python Usage

import asyncio
from rlm import RLM
async def main():
    rlm = RLM(
        model="gpt-4o-mini",
        environment="local"
    )
    result = await rlm.completion("Count the lines in data.csv")
    print(result.response)
asyncio.run(main())

With Snipara Context Optimization

from rlm import RLM
rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    # Snipara integration (optional but recommended)
    snipara_api_key="rlm_your_api_key",
    snipara_project_slug="your-project"
)
# LLM can now query your documentation
result = await rlm.completion(
    "Implement a new API endpoint following our coding standards"
)

Autonomous Agent Runner

Full autonomous agent loop: observe → think → act → terminate. The model explores documentation, writes code, spawns sub-LLM calls, and terminates via the FINAL/FINAL_VAR protocol when ready.

AgentRunner API

Configure max iterations, cost limits, token budgets, and timeout. The agent autonomously loops through observe-think-act cycles until it reaches a conclusion.

FINAL / FINAL_VAR Protocol

FINAL("answer") returns a natural language answer. FINAL_VAR("var") returns a computed REPL variable as the result. Graceful degradation forces FINAL when limits are hit.
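
For a concrete feel, the two termination forms look roughly like this (the answer text and variable name are hypothetical; FINAL and FINAL_VAR are actions the agent emits at the end of a run, not functions you call yourself):

FINAL("The three CSV files contain roughly 12,000 rows in total; the largest is sales.csv.")
FINAL_VAR("report_df")  # hand back the REPL variable report_df as the run's result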

Python API

from rlm.agent import AgentRunner
runner = AgentRunner(
    model="gpt-4o",
    environment="docker",
    max_iterations=10,   # Max observe-think-act cycles (clamped to 50)
    cost_limit=2.0,      # Dollar cap (max $10)
    token_budget=50000,  # Token budget
)
result = await runner.run("Analyze all CSV files and generate a report")
print(result.response)

CLI Usage

rlm agent "Analyze all CSV files and generate a report"
rlm agent --max-iterations 20 --cost-limit 5.0 "Complex multi-step task"

Safety Limits

Limit             Default   Hard Cap
Max iterations    10        50
Cost limit        $2.00     $10.00
Timeout           300s      600s
Recursion depth   4         5

Sub-LLM Orchestration

Models can delegate focused sub-problems to fresh LLM calls with their own context window and budget. The parent model decides when delegation is beneficial.

rlm_sub_complete

Spawn a single sub-LLM call with its own context and budget. Supports auto-context injection via the context_query parameter.

rlm_batch_complete

Parallel sub-LLM calls with shared budget. Execute multiple focused queries concurrently for faster results.
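
Exactly how the model issues these calls depends on the backend's tool-calling format; purely as an illustration, a delegation might look like the sketch below (the call shapes and parameter names other than context_query are assumptions, not documented signatures):

# Hypothetical sketch of sub-LLM delegation as the model might issue it
summary = rlm_sub_complete(
    "Summarize the retry logic in the HTTP client module",
    context_query="http retry logic",  # auto-context injection described above
)
answers = rlm_batch_complete([
    "Pros and cons of write-through caching",
    "Pros and cons of write-back caching",
    "Pros and cons of cache-aside",
])  # parallel sub-calls under a shared budget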

Budget Inheritance

Sub-calls inherit min(requested, remaining * 0.5) of the parent's budget. Per-session dollar caps, max sub-calls per turn, and depth limits enforce cost guardrails automatically.
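
As a rough illustration of that rule (the numbers are made up):

requested = 4_000   # tokens the sub-call asks for
remaining = 6_000   # tokens left in the parent's budget
inherited = min(requested, remaining * 0.5)  # = 3_000: the sub-call gets half the remainder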

rlm = RLM(
    model="gpt-4o",
    environment="docker",
    max_depth=4,  # Controls nesting depth for sub-calls
)
# Sub-calls are automatic - the model decides when to delegate
result = await rlm.completion("Research and compare 3 approaches to caching")

CLI Flags

rlm run --sub-calls "Complex task requiring delegation"
rlm run --no-sub-calls "Simple task, no delegation"
rlm run --max-sub-calls 5 "Limited delegation"

MCP Server for Claude

RLM Runtime includes an MCP server that provides sandboxed Python execution to Claude Desktop and Claude Code. Zero API keys required — designed to work within Claude's billing.

Available MCP Tools

Tool                 Description
execute_python       Run Python code in a sandboxed environment
get_repl_context     Get current REPL context variables
set_repl_context     Set a variable in REPL context
clear_repl_context   Clear all REPL context
rlm_agent_run        Start an autonomous agent that iteratively solves a task
rlm_agent_status     Check the status of an autonomous agent run
rlm_agent_cancel     Cancel a running autonomous agent

Configuration

Add to your Claude Desktop or Claude Code configuration:

{
  "mcpServers": {
    "rlm": {
      "command": "rlm",
      "args": ["mcp-serve"]
    }
  }
}

With Snipara (Optional)

To combine context retrieval with code execution, run snipara-mcp alongside rlm-runtime:

{
  "mcpServers": {
    "rlm": {
      "command": "rlm",
      "args": ["mcp-serve"]
    },
    "snipara": {
      "command": "snipara-mcp-server"
    }
  }
}

Configuration

Environment Variables

# LLM Provider (choose one)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# RLM Settings
export RLM_MODEL="gpt-4o-mini"
export RLM_ENVIRONMENT="docker"
# Snipara (optional)
export SNIPARA_API_KEY="rlm_..."
export SNIPARA_PROJECT_SLUG="my-project"

Configuration File (rlm.toml)

[rlm]
backend = "litellm"
model = "gpt-4o-mini"
environment = "docker"   # "local", "docker", or "wasm"
max_depth = 4
max_subcalls = 12
token_budget = 8000
verbose = false
# Docker settings
docker_image = "python:3.11-slim"
docker_cpus = 1.0
docker_memory = "512m"
# Snipara (optional)
snipara_api_key = "rlm_..."
snipara_project_slug = "your-project"
# Agent memory (requires Snipara)
memory_enabled = false

CLI Commands

Command          Description                       Example
rlm init         Initialize project with config    rlm init
rlm run          Execute a prompt                  rlm run "Parse the JSON files"
rlm agent        Run autonomous agent              rlm agent "Analyze CSV files"
rlm logs         View execution trajectory         rlm logs
rlm visualize    Launch trajectory dashboard       rlm visualize --port 8502
rlm mcp-serve    Start MCP server for Claude       rlm mcp-serve
rlm doctor       Check system health               rlm doctor

Execution Environments

Local

Fastest iteration with in-process execution. Uses RestrictedPython for sandboxing. Limited isolation. Suitable for development or trusted inputs only.

environment="local"

Docker

Stronger isolation with container execution. Configurable resource limits (CPU, memory). Network disabled by default. Recommended for production.

environment="docker"

WebAssembly

Browser-safe execution via Pyodide. Strongest isolation with no filesystem or network access. Suitable for web-based integrations.

environment="wasm"

Advanced Features

Structured Outputs

JSON schema-constrained responses via the response_format parameter. Get deterministic, parseable output from any completion.
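
A minimal sketch, assuming response_format accepts an OpenAI-style JSON-schema payload (check the SDK reference for the exact shape):

from rlm import RLM

rlm = RLM(model="gpt-4o-mini", environment="local")
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "line_count",
        "schema": {
            "type": "object",
            "properties": {"file": {"type": "string"}, "lines": {"type": "integer"}},
            "required": ["file", "lines"],
        },
    },
}
result = await rlm.completion("Count the lines in data.csv", response_format=schema)
print(result.response)  # JSON matching the schema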

Multi-Modal Support

Image and audio inputs via list-based Message.content, enabling vision and audio tasks alongside code execution.
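
A minimal sketch, assuming Message is importable from rlm and that content parts follow the common {"type": "text"} / {"type": "image_url"} shape; both are assumptions, so check the SDK reference:

from rlm import RLM, Message  # Message import path is an assumption

rlm = RLM(model="gpt-4o", environment="local")
message = Message(
    role="user",
    content=[  # list-based content, as described above
        {"type": "text", "text": "Describe this chart and extract its data points."},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
)
result = await rlm.completion([message])  # passing a message list is also an assumption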

Streaming

Real-time token streaming for simple completions via rlm.stream(). See tokens as they arrive.

async for chunk in rlm.stream("Explain X"):
    print(chunk, end="")

Cost Tracking

Per-model pricing, cost budgets, and token breakdown. Token budgets are enforced at runtime, not merely configured, and per-call cost is recorded in trajectory events.

Trajectory Visualization

Interactive Streamlit dashboard with execution tree, token charts, duration analysis, tool distribution, and cost breakdown.

rlm visualize --dir ./logs

Agent Memory

Persistent context via Snipara rlm_remember/rlm_recall. Gated by memory_enabled config. Requires Snipara integration.
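
To enable it, set the flag in rlm.toml alongside your Snipara credentials (same keys as in the configuration section above):

[rlm]
snipara_api_key = "rlm_..."
snipara_project_slug = "your-project"
memory_enabled = true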

Snipara Tools

When Snipara is configured, these tools become available to the LLM:

Tool             Description                              Use Case
context_query    Semantic search for documentation        "How does authentication work?"
shared_context   Get team best practices                  "What are our error handling conventions?"
decompose        Break complex queries into sub-queries   "Plan how to implement user permissions"
multi_query      Execute multiple queries efficiently     "Get info on auth, database, and API"
search           Regex pattern search                     "Find all TODO comments"
sections         List all documentation sections          "What documentation is available?"
rlm_remember     Store a memory for later recall          "Remember this API decision"
rlm_recall       Semantically recall stored memories      "What did we decide about caching?"

Example: Context-Aware Code Generation

from rlm import RLM
rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    snipara_api_key="rlm_...",
    snipara_project_slug="my-app"
)
# The LLM will:
# 1. Call context_query to find auth patterns
# 2. Call shared_context to get coding standards
# 3. Execute code to explore existing files
# 4. Spawn sub-LLM calls for focused sub-tasks
# 5. Generate new code following conventions
result = await rlm.completion("""
    Add a password reset endpoint to our auth system.
    Follow our existing patterns and coding standards.
    Include error handling and tests.
""")
print(result.response)
print(f"Tool calls: {result.tool_calls}")
print(f"Total tokens: {result.total_tokens}")

With vs Without Snipara

Feature           Without Snipara          With Snipara
File reading      Direct (full content)    Semantic (relevant only)
Token usage       High (500K+ possible)    Optimized (5K typical)
Search            Regex only               Hybrid (keyword + semantic)
Best practices    None                     Shared team context
Summaries         None                     Cached summaries
Agent memory      None                     Persistent rlm_remember/recall

Trajectory Logging

Every call emits JSONL events with:

  • trajectory_id - Unique ID for the request
  • call_id - ID of this specific call
  • parent_call_id - ID of parent call (for sub-calls)
  • sub_call_type - Type of sub-call (for sub-LLM orchestration)
  • depth - Recursion depth
  • prompt / response - Input/output
  • tool_calls / tool_results - Tool usage
  • token_usage / duration_ms - Metrics
  • cost - Per-call cost tracking
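
A single event might look roughly like this (one JSON object per line; the values, ID formats, and token_usage sub-fields are illustrative only):

{"trajectory_id": "tr-01", "call_id": "c-03", "parent_call_id": "c-01", "sub_call_type": "rlm_sub_complete", "depth": 1, "prompt": "...", "response": "...", "tool_calls": ["execute_code"], "tool_results": ["..."], "token_usage": {"prompt": 812, "completion": 164}, "duration_ms": 2140, "cost": 0.0031}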

Troubleshooting

Issue                        Cause                 Solution
"No providers configured"    Missing API keys      Set OPENAI_API_KEY or ANTHROPIC_API_KEY
"Recursion depth exceeded"   Task too complex      Increase max_depth or reduce task scope
"Sandbox timeout"            Slow execution        Increase environment timeout
"Token budget exhausted"     Token limit hit       Increase token_budget or simplify task
"Cost limit exceeded"        Dollar cap reached    Increase cost_limit in agent config
"Agent iteration limit"      Too many cycles       Increase max_iterations (up to 50)
"Invalid API key"            Snipara key issue     Check key starts with rlm_
"Project not found"          Wrong project slug    Verify slug in dashboard

Next Steps