RLM Runtime v2.0.0
A Python CLI and SDK for complex multi-step tasks. It combines sandboxed code execution, Snipara context optimization, autonomous agents, sub-LLM orchestration, and a WebAssembly REPL for task decomposition, code generation, and validation.
Open Source
RLM Runtime is open source and works with any LLM provider. Snipara integration is optional but recommended. Available on PyPI and GitHub.
What is RLM Runtime?
RLM (Recursive Language Models) Runtime is a framework that enables LLMs to:
- Recursively decompose complex tasks into sub-tasks
- Execute code in sandboxed environments (local, Docker, or WebAssembly)
- Query documentation using Snipara's context optimization
- Log execution traces for debugging and review
- Orchestrate sub-LLM calls (rlm_sub_complete, rlm_batch_complete)
- Run autonomous agents (observe → think → act → terminate loop)
- Execute code in WebAssembly REPL (browser-safe, no Docker needed)
- Run tool calls in parallel via asyncio
- Produce structured outputs with JSON schema constraints
- Accept multi-modal input (images, audio)
```
┌────────────────────────────────────────────────────────────────┐
│                        RLM Orchestrator                        │
│  • Manages recursion depth and token budgets                   │
│  • Coordinates LLM calls and tool execution                    │
│  • Cost tracking and budget enforcement                        │
├────────────────────────────────────────────────────────────────┤
│                          AgentRunner                           │
│  • Autonomous observe → think → act → terminate loop           │
│  • FINAL/FINAL_VAR termination protocol                        │
│  • Iteration, cost, and timeout budgets                        │
├────────────────────────────────────────────────────────────────┤
│  LLM Backends               │  REPL Environments               │
│  • LiteLLM (default)        │  • Local (RestrictedPython)      │
│  • OpenAI                   │  • Docker (isolated)             │
│  • Anthropic                │  • WebAssembly (Pyodide)         │
├────────────────────────────────────────────────────────────────┤
│                         Tool Registry                          │
│  • Builtin: file_read, execute_code                            │
│  • Sub-LLM: rlm_sub_complete, rlm_batch_complete               │
│  • Agent: FINAL, FINAL_VAR                                     │
│  • Snipara: context_query, sections, search (optional)         │
│  • Memory: rlm_remember, rlm_recall (optional)                 │
│  • Custom: your own tools                                      │
├────────────────────────────────────────────────────────────────┤
│                   MCP Server (rlm mcp-serve)                   │
│  • 7 tools for Claude Desktop / Claude Code                    │
│  • Zero API keys required                                      │
└────────────────────────────────────────────────────────────────┘
```
Installation
```bash
# Standard installation
pip install rlm-runtime

# With Docker support (recommended for isolation)
pip install rlm-runtime[docker]

# With WebAssembly support (browser-safe execution)
pip install rlm-runtime[wasm]

# With MCP server for Claude Desktop/Code
pip install rlm-runtime[mcp]

# With Snipara context optimization
pip install rlm-runtime[snipara]

# With trajectory visualizer
pip install rlm-runtime[visualizer]

# Full installation with all features
pip install rlm-runtime[all]
```

Quick Start
CLI Quick Start
```bash
# Initialize configuration
rlm init

# Run a completion
rlm run "Summarize the authentication flow"

# Run with Docker isolation
rlm run --env docker "Parse and analyze logs"

# Run an autonomous agent
rlm agent "Analyze all CSV files and generate a report"

# Launch the visualization dashboard
rlm visualize
```

Basic Python Usage
```python
import asyncio

from rlm import RLM

async def main():
    rlm = RLM(
        model="gpt-4o-mini",
        environment="local"
    )
    result = await rlm.completion("Count the lines in data.csv")
    print(result.response)

asyncio.run(main())
```

With Snipara Context Optimization
```python
from rlm import RLM

rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    # Snipara integration (optional but recommended)
    snipara_api_key="rlm_your_api_key",
    snipara_project_slug="your-project"
)

# LLM can now query your documentation
result = await rlm.completion(
    "Implement a new API endpoint following our coding standards"
)
```

Autonomous Agent Runner
Full autonomous agent loop: observe → think → act → terminate. The model explores documentation, writes code, spawns sub-LLM calls, and terminates via the FINAL/FINAL_VAR protocol when ready.
AgentRunner API
Configure max iterations, cost limits, token budgets, and timeout. The agent autonomously loops through observe-think-act cycles until it reaches a conclusion.
FINAL / FINAL_VAR Protocol
FINAL("answer") returns a natural language answer. FINAL_VAR("var") returns a computed REPL variable as the result. Graceful degradation forces FINAL when limits are hit.
Python API
```python
from rlm.agent import AgentRunner

runner = AgentRunner(
    model="gpt-4o",
    environment="docker",
    max_iterations=10,   # Max observe-think-act cycles (clamped to 50)
    cost_limit=2.0,      # Dollar cap (max $10)
    token_budget=50000,  # Token budget
)

result = await runner.run("Analyze all CSV files and generate a report")
print(result.response)
```

CLI Usage
rlm agent "Analyze all CSV files and generate a report"rlm agent --max-iterations 20 --cost-limit 5.0 "Complex multi-step task"Safety Limits
| Limit | Default | Hard Cap |
|---|---|---|
| Max iterations | 10 | 50 |
| Cost limit | $2.00 | $10.00 |
| Timeout | 300s | 600s |
| Recursion depth | 4 | 5 |
Sub-LLM Orchestration
Models can delegate focused sub-problems to fresh LLM calls with their own context window and budget. The parent model decides when delegation is beneficial.
rlm_sub_complete
Spawn a single sub-LLM call with its own context and budget. Supports auto-context injection via the context_query parameter.
rlm_batch_complete
Spawn parallel sub-LLM calls with a shared budget. Multiple focused queries execute concurrently for faster results, as the schematic below shows.
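Both tools are invoked by the model rather than by your code. Purely as an illustration (the argument schema below is an assumption, not the documented one), a batch delegation might look like:

```python
# Hypothetical tool call emitted by the parent model (argument names assumed):
batch_call = {
    "tool": "rlm_batch_complete",
    "arguments": {
        "prompts": [
            "Summarize the trade-offs of an in-process LRU cache",
            "Summarize the trade-offs of Redis as a shared cache",
            "Summarize the trade-offs of CDN edge caching",
        ],
    },
}
# Each prompt runs concurrently as a fresh sub-LLM call, and all
# the calls draw on one shared, inherited budget.
```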
Budget Inheritance
Sub-calls inherit min(requested, remaining * 0.5) of the parent's budget. Per-session dollar caps, max sub-calls per turn, and depth limits enforce cost guardrails automatically.
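A minimal sketch of the inheritance rule (the function name is illustrative, not part of the public API):

```python
def inherited_budget(requested: float, parent_remaining: float) -> float:
    # A sub-call gets what it asked for, capped at half of the
    # parent's remaining budget.
    return min(requested, parent_remaining * 0.5)

# Example: the parent has $1.20 left, so a sub-call asking for $1.00 gets $0.60.
assert inherited_budget(1.00, 1.20) == 0.60
```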
rlm = RLM( model="gpt-4o", environment="docker", max_depth=4, # Controls nesting depth for sub-calls)# Sub-calls are automatic - the model decides when to delegateresult = await rlm.completion("Research and compare 3 approaches to caching")CLI Flags
```bash
rlm run --sub-calls "Complex task requiring delegation"
rlm run --no-sub-calls "Simple task, no delegation"
rlm run --max-sub-calls 5 "Limited delegation"
```

MCP Server for Claude
RLM Runtime includes an MCP server that provides sandboxed Python execution to Claude Desktop and Claude Code. Zero API keys required — designed to work within Claude's billing.
Available MCP Tools
| Tool | Description |
|---|---|
| execute_python | Run Python code in a sandboxed environment |
| get_repl_context | Get current REPL context variables |
| set_repl_context | Set a variable in the REPL context |
| clear_repl_context | Clear all REPL context variables |
| rlm_agent_run | Start an autonomous agent that iteratively solves a task |
| rlm_agent_status | Check the status of an autonomous agent run |
| rlm_agent_cancel | Cancel a running autonomous agent |
Configuration
Add to your Claude Desktop or Claude Code configuration:
{ "mcpServers": { "rlm": { "command": "rlm", "args": ["mcp-serve"] } }}With Snipara (Optional)
For context retrieval alongside code execution, run snipara-mcp next to rlm-runtime:
{ "mcpServers": { "rlm": { "command": "rlm", "args": ["mcp-serve"] }, "snipara": { "command": "snipara-mcp-server" } }}Configuration
Environment Variables
```bash
# LLM Provider (choose one)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# RLM Settings
export RLM_MODEL="gpt-4o-mini"
export RLM_ENVIRONMENT="docker"

# Snipara (optional)
export SNIPARA_API_KEY="rlm_..."
export SNIPARA_PROJECT_SLUG="my-project"
```

Configuration File (rlm.toml)
```toml
[rlm]
backend = "litellm"
model = "gpt-4o-mini"
environment = "docker"  # "local", "docker", or "wasm"
max_depth = 4
max_subcalls = 12
token_budget = 8000
verbose = false

# Docker settings
docker_image = "python:3.11-slim"
docker_cpus = 1.0
docker_memory = "512m"

# Snipara (optional)
snipara_api_key = "rlm_..."
snipara_project_slug = "your-project"

# Agent memory (requires Snipara)
memory_enabled = false
```

CLI Commands
| Command | Description | Example |
|---|---|---|
| rlm init | Initialize project with config | rlm init |
| rlm run | Execute a prompt | rlm run "Parse the JSON files" |
| rlm agent | Run autonomous agent | rlm agent "Analyze CSV files" |
| rlm logs | View execution trajectory | rlm logs |
| rlm visualize | Launch trajectory dashboard | rlm visualize --port 8502 |
| rlm mcp-serve | Start MCP server for Claude | rlm mcp-serve |
| rlm doctor | Check system health | rlm doctor |
Execution Environments
Local
Fastest iteration with in-process execution. Uses RestrictedPython for sandboxing. Limited isolation. Suitable for development or trusted inputs only.
environment="local"Docker
Stronger isolation with container execution. Configurable resource limits (CPU, memory). Network disabled by default. Recommended for production.
environment="docker"WebAssembly
Browser-safe execution via Pyodide. Strongest isolation with no filesystem or network access. Suitable for web-based integrations.
environment="wasm"Advanced Features
Structured Outputs
JSON schema-constrained responses via the response_format parameter. Get structured, reliably parseable output from any completion.
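A minimal sketch, assuming response_format accepts an OpenAI-style JSON-schema dict (the exact value shape depends on your configured backend):

```python
import json

# The value shape below is an assumption (OpenAI-style json_schema format);
# adapt it to whatever your backend expects.
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}

result = await rlm.completion(
    "Extract the person mentioned in: 'Ada Lovelace, 36'",
    response_format=schema,
)
person = json.loads(result.response)  # parseable by construction
```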
Multi-Modal Support
Pass images and audio via list-based Message.content to run vision and audio tasks alongside code execution.
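A sketch under assumptions: that Message is importable from the top-level package, that completion accepts a messages list, and that content parts use the OpenAI-style dict format:

```python
from rlm import RLM, Message  # Message import path is an assumption

rlm = RLM(model="gpt-4o", environment="local")

# Assumed: completion() accepts a messages list, and content parts use
# the provider's dict format (OpenAI-style shown here).
result = await rlm.completion(messages=[
    Message(
        role="user",
        content=[
            {"type": "text", "text": "Describe the trend in this chart."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    )
])
print(result.response)
```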
Streaming
Real-time token streaming for simple completions via rlm.stream(). See tokens as they arrive.
```python
async for chunk in rlm.stream("Explain X"):
    print(chunk, end="")
```

Cost Tracking
Per-model pricing, cost budgets, and a per-call token breakdown. Token budgets are actively enforced, not just configured. Per-call cost is tracked in trajectory events.
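For example (assuming the RLM constructor accepts the token_budget key shown in rlm.toml above; the attribute printed below is confirmed elsewhere in these docs):

```python
rlm = RLM(
    model="gpt-4o-mini",
    environment="local",
    token_budget=8000,  # assumption: the constructor mirrors the rlm.toml key
)

result = await rlm.completion("Summarize data.csv")
print(result.total_tokens)  # usage is also logged per call in the trajectory
```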
Trajectory Visualization
Interactive Streamlit dashboard with execution tree, token charts, duration analysis, tool distribution, and cost breakdown.
```bash
rlm visualize --dir ./logs
```

Agent Memory
Persistent context via rlm_remember/rlm_recall. Gated by the memory_enabled config key and requires Snipara integration.
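A sketch assuming the constructor accepts memory_enabled, mirroring the rlm.toml key:

```python
rlm = RLM(
    model="gpt-4o",
    environment="docker",
    snipara_api_key="rlm_...",
    snipara_project_slug="my-app",
    memory_enabled=True,  # assumption: the constructor mirrors the rlm.toml key
)

# The model can now call rlm_remember / rlm_recall as tools.
result = await rlm.completion("Remember that we chose JWT for session auth")
```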
Snipara Tools
When Snipara is configured, these tools become available to the LLM:
| Tool | Description | Use Case |
|---|---|---|
| context_query | Semantic search for documentation | "How does authentication work?" |
| shared_context | Get team best practices | "What are our error handling conventions?" |
| decompose | Break complex queries into sub-queries | "Plan how to implement user permissions" |
| multi_query | Execute multiple queries efficiently | "Get info on auth, database, and API" |
| search | Regex pattern search | "Find all TODO comments" |
| sections | List all documentation sections | "What documentation is available?" |
| rlm_remember | Store a memory for later recall | "Remember this API decision" |
| rlm_recall | Semantically recall stored memories | "What did we decide about caching?" |
Example: Context-Aware Code Generation
```python
from rlm import RLM

rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    snipara_api_key="rlm_...",
    snipara_project_slug="my-app"
)

# The LLM will:
# 1. Call context_query to find auth patterns
# 2. Call shared_context to get coding standards
# 3. Execute code to explore existing files
# 4. Spawn sub-LLM calls for focused sub-tasks
# 5. Generate new code following conventions
result = await rlm.completion("""
    Add a password reset endpoint to our auth system.
    Follow our existing patterns and coding standards.
    Include error handling and tests.
""")

print(result.response)
print(f"Tool calls: {result.tool_calls}")
print(f"Total tokens: {result.total_tokens}")
```

With vs Without Snipara
| Feature | Without Snipara | With Snipara |
|---|---|---|
| File reading | Direct (full content) | Semantic (relevant only) |
| Token usage | High (500K+ possible) | Optimized (5K typical) |
| Search | Regex only | Hybrid (keyword + semantic) |
| Best practices | None | Shared team context |
| Summaries | None | Cached summaries |
| Agent memory | None | Persistent rlm_remember/recall |
Trajectory Logging
Every call emits JSONL events with:
- trajectory_id: Unique ID for the request
- call_id: ID of this specific call
- parent_call_id: ID of the parent call (for sub-calls)
- sub_call_type: Type of sub-call (for sub-LLM orchestration)
- depth: Recursion depth
- prompt / response: Input/output
- tool_calls / tool_results: Tool usage
- token_usage / duration_ms: Metrics
- cost: Per-call cost tracking
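Because events are plain JSONL, they are easy to post-process. A sketch that totals cost for one run (the log path is an assumption; point it at the directory you pass to rlm visualize --dir):

```python
import json
from pathlib import Path

# Hypothetical log location; adjust to your configured log directory.
events = [
    json.loads(line)
    for line in Path("logs/trajectory.jsonl").read_text().splitlines()
    if line.strip()
]

total_cost = sum(e.get("cost", 0.0) for e in events)
sub_calls = [e for e in events if e.get("parent_call_id")]
print(f"{len(events)} calls ({len(sub_calls)} sub-calls), total cost ${total_cost:.4f}")
```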
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| "No providers configured" | Missing API keys | Set OPENAI_API_KEY or ANTHROPIC_API_KEY |
| "Recursion depth exceeded" | Task too complex | Increase max_depth or reduce task scope |
| "Sandbox timeout" | Slow execution | Increase environment timeout |
| "Token budget exhausted" | Token limit hit | Increase token_budget or simplify task |
| "Cost limit exceeded" | Dollar cap reached | Increase cost_limit in agent config |
| "Agent iteration limit" | Too many cycles | Increase max_iterations (up to 50) |
| "Invalid API key" | Snipara key issue | Check key starts with rlm_ |
| "Project not found" | Wrong project slug | Verify slug in dashboard |