RLM Runtime v2.0.0

Python CLI and SDK for complex multi-step tasks. Combines sandboxed code execution, Snipara context optimization, autonomous agents, sub-LLM orchestration, and a WebAssembly REPL for task decomposition, code generation, and validation.

Open Source

RLM Runtime is open source and works with any LLM provider. Snipara integration is optional but recommended. Available on PyPI and GitHub.

What is RLM Runtime?

RLM (Recursive Language Models) Runtime is a framework that enables LLMs to:

  • Recursively decompose complex tasks into sub-tasks
  • Execute code in sandboxed environments (local, Docker, or WebAssembly)
  • Query documentation using Snipara's context optimization
  • Log execution traces for debugging and review
  • Orchestrate sub-LLM calls (rlm_sub_complete, rlm_batch_complete)
  • Run autonomous agents (observe → think → act → terminate loop)
  • Execute code in WebAssembly REPL (browser-safe, no Docker needed)
  • Run tool calls in parallel via asyncio
  • Produce structured outputs with JSON schema constraints
  • Handle multi-modal inputs (images, audio)

┌─────────────────────────────────────────────────────────────────┐
│  RLM Orchestrator                                               │
│  • Manages recursion depth and token budgets                    │
│  • Coordinates LLM calls and tool execution                     │
│  • Cost tracking and budget enforcement                         │
├─────────────────────────────────────────────────────────────────┤
│  AgentRunner                                                    │
│  • Autonomous observe → think → act → terminate loop            │
│  • FINAL/FINAL_VAR termination protocol                         │
│  • Iteration, cost, and timeout budgets                         │
├─────────────────────────────────────────────────────────────────┤
│  LLM Backends              │  REPL Environments                 │
│  • LiteLLM (default)       │  • Local (RestrictedPython)        │
│  • OpenAI                  │  • Docker (isolated)               │
│  • Anthropic               │  • WebAssembly (Pyodide)           │
├─────────────────────────────────────────────────────────────────┤
│  Tool Registry                                                  │
│  • Builtin: file_read, execute_code                             │
│  • Sub-LLM: rlm_sub_complete, rlm_batch_complete                │
│  • Agent: FINAL, FINAL_VAR                                       │
│  • Snipara: context_query, sections, search (optional)          │
│  • Memory: rlm_remember, rlm_recall (optional)                   │
│  • Custom: your own tools                                       │
├─────────────────────────────────────────────────────────────────┤
│  MCP Server (rlm mcp-serve)                                     │
│  • 7 tools for Claude Desktop / Claude Code                     │
│  • Zero API keys required                                       │
└─────────────────────────────────────────────────────────────────┘

Installation

# Standard installation
pip install rlm-runtime
# With Docker support (recommended for isolation)
pip install rlm-runtime[docker]
# With WebAssembly support (browser-safe execution)
pip install rlm-runtime[wasm]
# With MCP server for Claude Desktop/Code
pip install rlm-runtime[mcp]
# With Snipara context optimization
pip install rlm-runtime[snipara]
# With trajectory visualizer
pip install rlm-runtime[visualizer]
# Full installation with all features
pip install rlm-runtime[all]

Quick Start

CLI Quick Start

# Initialize configuration
rlm init
# Run a completion
rlm run "Summarize the authentication flow"
# Run with Docker isolation
rlm run --env docker "Parse and analyze logs"
# Run an autonomous agent
rlm agent "Analyze all CSV files and generate a report"
# Launch the visualization dashboard
rlm visualize

Basic Python Usage

import asyncio
from rlm import RLM
async def main():
    rlm = RLM(
        model="gpt-4o-mini",
        environment="local"
    )
    result = await rlm.completion("Count the lines in data.csv")
    print(result.response)
asyncio.run(main())

With Snipara Context Optimization

from rlm import RLM
rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    # Snipara integration (optional but recommended)
    snipara_api_key="rlm_your_api_key",
    snipara_project_slug="your-project"
)
# LLM can now query your documentation
result = await rlm.completion(
    "Implement a new API endpoint following our coding standards"
)

Autonomous Agent Runner

Full autonomous agent loop: observe → think → act → terminate. The model explores documentation, writes code, spawns sub-LLM calls, and terminates via the FINAL/FINAL_VAR protocol when ready.

AgentRunner API

Configure max iterations, cost limits, token budgets, and timeout. The agent autonomously loops through observe-think-act cycles until it reaches a conclusion.

FINAL / FINAL_VAR Protocol

FINAL("answer") returns a natural language answer. FINAL_VAR("var") returns a computed REPL variable as the result. Graceful degradation forces FINAL when limits are hit.
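
For a concrete feel, the two termination forms look roughly like this (the answer text and variable name are hypothetical; FINAL and FINAL_VAR are actions the agent emits at the end of a run, not functions you call yourself):

FINAL("The three CSV files contain roughly 12,000 rows in total; the largest is sales.csv.")
FINAL_VAR("report_df")  # hand back the REPL variable report_df as the run's result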

Python API

from rlm.agent import AgentRunner
runner = AgentRunner(
    model="gpt-4o",
    environment="docker",
    max_iterations=10,   # Max observe-think-act cycles (clamped to 50)
    cost_limit=2.0,      # Dollar cap (max $10)
    token_budget=50000,  # Token budget
)
result = await runner.run("Analyze all CSV files and generate a report")
print(result.response)

CLI Usage

rlm agent "Analyze all CSV files and generate a report"
rlm agent --max-iterations 20 --cost-limit 5.0 "Complex multi-step task"

Safety Limits

Limit             Default   Hard Cap
Max iterations    10        50
Cost limit        $2.00     $10.00
Timeout           300s      600s
Recursion depth   4         5

Sub-LLM Orchestration

Models can delegate focused sub-problems to fresh LLM calls with their own context window and budget. The parent model decides when delegation is beneficial.

rlm_sub_complete

Spawn a single sub-LLM call with its own context and budget. Supports auto-context injection via the context_query parameter.

rlm_batch_complete

Parallel sub-LLM calls with shared budget. Execute multiple focused queries concurrently for faster results.
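
Exactly how the model issues these calls depends on the backend's tool-calling format; purely as an illustration, a delegation might look like the sketch below (the call shapes and parameter names other than context_query are assumptions, not documented signatures):

# Hypothetical sketch of sub-LLM delegation as the model might issue it
summary = rlm_sub_complete(
    "Summarize the retry logic in the HTTP client module",
    context_query="http retry logic",  # auto-context injection described above
)
answers = rlm_batch_complete([
    "Pros and cons of write-through caching",
    "Pros and cons of write-back caching",
    "Pros and cons of cache-aside",
])  # parallel sub-calls under a shared budget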

Budget Inheritance

Sub-calls inherit min(requested, remaining * 0.5) of the parent's budget. Per-session dollar caps, max sub-calls per turn, and depth limits enforce cost guardrails automatically.
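
As a rough illustration of that rule (the numbers are made up):

requested = 4_000   # tokens the sub-call asks for
remaining = 6_000   # tokens left in the parent's budget
inherited = min(requested, remaining * 0.5)  # = 3_000: the sub-call gets half the remainder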

rlm = RLM(
    model="gpt-4o",
    environment="docker",
    max_depth=4,  # Controls nesting depth for sub-calls
)
# Sub-calls are automatic - the model decides when to delegate
result = await rlm.completion("Research and compare 3 approaches to caching")

CLI Flags

rlm run --sub-calls "Complex task requiring delegation"
rlm run --no-sub-calls "Simple task, no delegation"
rlm run --max-sub-calls 5 "Limited delegation"

MCP Server for Claude

RLM Runtime includes an MCP server that provides sandboxed Python execution to Claude Desktop and Claude Code. Zero API keys required — designed to work within Claude's billing.

Available MCP Tools

Tool                 Description
execute_python       Run Python code in a sandboxed environment
get_repl_context     Get current REPL context variables
set_repl_context     Set a variable in REPL context
clear_repl_context   Clear all REPL context
rlm_agent_run        Start an autonomous agent that iteratively solves a task
rlm_agent_status     Check the status of an autonomous agent run
rlm_agent_cancel     Cancel a running autonomous agent

Configuration

Add to your Claude Desktop or Claude Code configuration:

{
  "mcpServers": {
    "rlm": {
      "command": "rlm",
      "args": ["mcp-serve"]
    }
  }
}

With Snipara (Optional)

To combine context retrieval with code execution, run snipara-mcp alongside rlm-runtime:

{
  "mcpServers": {
    "rlm": {
      "command": "rlm",
      "args": ["mcp-serve"]
    },
    "snipara": {
      "command": "snipara-mcp-server"
    }
  }
}

Configuration

Environment Variables

# LLM Provider (choose one)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# RLM Settings
export RLM_MODEL="gpt-4o-mini"
export RLM_ENVIRONMENT="docker"
# Snipara (optional)
export SNIPARA_API_KEY="rlm_..."
export SNIPARA_PROJECT_SLUG="my-project"

Configuration File (rlm.toml)

[rlm]
backend = "litellm"
model = "gpt-4o-mini"
environment = "docker"   # "local", "docker", or "wasm"
max_depth = 4
max_subcalls = 12
token_budget = 8000
verbose = false
# Docker settings
docker_image = "python:3.11-slim"
docker_cpus = 1.0
docker_memory = "512m"
# Snipara (optional)
snipara_api_key = "rlm_..."
snipara_project_slug = "your-project"
# Agent memory (requires Snipara)
memory_enabled = false

CLI Commands

Command          Description                       Example
rlm init         Initialize project with config    rlm init
rlm run          Execute a prompt                  rlm run "Parse the JSON files"
rlm agent        Run autonomous agent              rlm agent "Analyze CSV files"
rlm logs         View execution trajectory         rlm logs
rlm visualize    Launch trajectory dashboard       rlm visualize --port 8502
rlm mcp-serve    Start MCP server for Claude       rlm mcp-serve
rlm doctor       Check system health               rlm doctor

Execution Environments

Local

Fastest iteration with in-process execution. Uses RestrictedPython for sandboxing. Limited isolation. Suitable for development or trusted inputs only.

environment="local"

Docker

Stronger isolation with container execution. Configurable resource limits (CPU, memory). Network disabled by default. Recommended for production.

environment="docker"

WebAssembly

Browser-safe execution via Pyodide. Strongest isolation with no filesystem or network access. Suitable for web-based integrations.

environment="wasm"

Advanced Features

Structured Outputs

JSON schema-constrained responses via the response_format parameter. Get deterministic, parseable output from any completion.
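
A minimal sketch, assuming response_format accepts an OpenAI-style JSON-schema payload (check the SDK reference for the exact shape):

from rlm import RLM

rlm = RLM(model="gpt-4o-mini", environment="local")
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "line_count",
        "schema": {
            "type": "object",
            "properties": {"file": {"type": "string"}, "lines": {"type": "integer"}},
            "required": ["file", "lines"],
        },
    },
}
result = await rlm.completion("Count the lines in data.csv", response_format=schema)
print(result.response)  # JSON matching the schema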

Multi-Modal Support

Image and audio inputs via list-based Message.content, enabling vision and audio tasks alongside code execution.
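
A minimal sketch, assuming Message is importable from rlm and that content parts follow the common {"type": "text"} / {"type": "image_url"} shape; both are assumptions, so check the SDK reference:

from rlm import RLM, Message  # Message import path is an assumption

rlm = RLM(model="gpt-4o", environment="local")
message = Message(
    role="user",
    content=[  # list-based content, as described above
        {"type": "text", "text": "Describe this chart and extract its data points."},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
)
result = await rlm.completion([message])  # passing a message list is also an assumption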

Streaming

Real-time token streaming for simple completions via rlm.stream(). See tokens as they arrive.

async for chunk in rlm.stream("Explain X"):
    print(chunk, end="")

Cost Tracking

Per-model pricing, cost budgets, and token breakdown. Token budgets are enforced at runtime, not merely configured, and per-call cost is recorded in trajectory events.

Trajectory Visualization

Interactive Streamlit dashboard with execution tree, token charts, duration analysis, tool distribution, and cost breakdown.

rlm visualize --dir ./logs

Agent Memory

Persistent context via Snipara rlm_remember/rlm_recall. Gated by memory_enabled config. Requires Snipara integration.
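
To enable it, set the flag in rlm.toml alongside your Snipara credentials (same keys as in the configuration section above):

[rlm]
snipara_api_key = "rlm_..."
snipara_project_slug = "your-project"
memory_enabled = true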

Snipara Tools

When Snipara is configured, these tools become available to the LLM:

Tool             Description                              Use Case
context_query    Semantic search for documentation        "How does authentication work?"
shared_context   Get team best practices                  "What are our error handling conventions?"
decompose        Break complex queries into sub-queries   "Plan how to implement user permissions"
multi_query      Execute multiple queries efficiently     "Get info on auth, database, and API"
search           Regex pattern search                     "Find all TODO comments"
sections         List all documentation sections          "What documentation is available?"
rlm_remember     Store a memory for later recall          "Remember this API decision"
rlm_recall       Semantically recall stored memories      "What did we decide about caching?"

Example: Context-Aware Code Generation

from rlm import RLM
rlm = RLM(
    model="claude-sonnet-4-20250514",
    environment="docker",
    snipara_api_key="rlm_...",
    snipara_project_slug="my-app"
)
# The LLM will:
# 1. Call context_query to find auth patterns
# 2. Call shared_context to get coding standards
# 3. Execute code to explore existing files
# 4. Spawn sub-LLM calls for focused sub-tasks
# 5. Generate new code following conventions
result = await rlm.completion("""
    Add a password reset endpoint to our auth system.
    Follow our existing patterns and coding standards.
    Include error handling and tests.
""")
print(result.response)
print(f"Tool calls: {result.tool_calls}")
print(f"Total tokens: {result.total_tokens}")

With vs Without Snipara

Feature           Without Snipara          With Snipara
File reading      Direct (full content)    Semantic (relevant only)
Token usage       High (500K+ possible)    Optimized (5K typical)
Search            Regex only               Hybrid (keyword + semantic)
Best practices    None                     Shared team context
Summaries         None                     Cached summaries
Agent memory      None                     Persistent rlm_remember/recall

Trajectory Logging

Every call emits JSONL events with:

  • trajectory_id - Unique ID for the request
  • call_id - ID of this specific call
  • parent_call_id - ID of parent call (for sub-calls)
  • sub_call_type - Type of sub-call (for sub-LLM orchestration)
  • depth - Recursion depth
  • prompt / response - Input/output
  • tool_calls / tool_results - Tool usage
  • token_usage / duration_ms - Metrics
  • cost - Per-call cost tracking
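
A single event might look roughly like this (one JSON object per line; the values, ID formats, and token_usage sub-fields are illustrative only):

{"trajectory_id": "tr-01", "call_id": "c-03", "parent_call_id": "c-01", "sub_call_type": "rlm_sub_complete", "depth": 1, "prompt": "...", "response": "...", "tool_calls": ["execute_code"], "tool_results": ["..."], "token_usage": {"prompt": 812, "completion": 164}, "duration_ms": 2140, "cost": 0.0031}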

Troubleshooting

Issue                        Cause                 Solution
"No providers configured"    Missing API keys      Set OPENAI_API_KEY or ANTHROPIC_API_KEY
"Recursion depth exceeded"   Task too complex      Increase max_depth or reduce task scope
"Sandbox timeout"            Slow execution        Increase environment timeout
"Token budget exhausted"     Token limit hit       Increase token_budget or simplify task
"Cost limit exceeded"        Dollar cap reached    Increase cost_limit in agent config
"Agent iteration limit"      Too many cycles       Increase max_iterations (up to 50)
"Invalid API key"            Snipara key issue     Check key starts with rlm_
"Project not found"          Wrong project slug    Verify slug in dashboard

Next Steps