Your agent sends 1,440 heartbeat checks per day to Claude in the cloud. Each one gets back "nothing to report." What if those checks ran on a local model—on your own hardware—for free?
## The Problem: Paying Cloud Prices for Simple Tasks
Not every request needs a frontier model. Heartbeat checks, simple acknowledgments, and basic status queries are trivially easy for even a small language model. Yet by default, OpenClaw routes everything to your configured cloud provider.
The math is harsh:
- 1,440 heartbeats/day × ~40 tokens each × 30 days ≈ 1.7M tokens/month; at $3/1M tokens, that's about $5.18/month on "nothing to report."
- Add simple queries and acknowledgments and you can easily spend $10–$20/month on tasks a 7B-parameter model handles perfectly.
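The figures above are easy to reproduce. A rough sketch in Python (the ~40-token heartbeat size and 30-day month are assumptions; plug in your own numbers):

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million: float, days: int = 30) -> float:
    """Estimated monthly spend for a fixed-size, fixed-rate request stream."""
    tokens_per_month = requests_per_day * tokens_per_request * days
    return tokens_per_month / 1_000_000 * price_per_million

# Assumption: a heartbeat round trip costs roughly 40 tokens; measure your own.
print(round(monthly_cost(1440, 40, 3.00), 2))  # ≈ 5.18
```

Even small per-request token counts add up once the request rate is one per minute, around the clock.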
## The Solution: Local Lightweight Routing (Coming Soon)
> **Note:** This feature is on ClawBridge's roadmap and will ship in a future release. This article explains the concept and how it will work.
ClawBridge's Diagnostic A08 will detect requests that can be handled locally and recommend deploying a lightweight model via Ollama on the same machine running your OpenClaw agent.
### How It Will Work
- Detection: ClawBridge analyzes your request history and identifies patterns of simple, repetitive requests (heartbeats, acknowledgments, status checks).
- Hardware check: Verifies your machine has sufficient resources for a local model (typically 8GB+ RAM for a 7B model).
- Recommendation: Suggests installing Ollama with a lightweight model (e.g., Llama 3.1 8B, Phi-3, or Gemma 2B).
- Routing configuration: Sets up OpenClaw to route simple requests to `localhost:11434` (Ollama's default endpoint) while keeping complex tasks on the cloud model.
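In practice, the routing step boils down to a classifier sitting in front of two endpoints. A minimal sketch, with the caveat that the pattern list, function names, and cloud URL are illustrative assumptions, not OpenClaw's actual configuration:

```python
import re

OLLAMA_URL = "http://localhost:11434"  # Ollama's default endpoint

# Assumption: simple requests are identifiable by shape; a real router
# would likely use richer signals than a handful of regexes.
SIMPLE_PATTERNS = [r"^heartbeat\b", r"^ack(nowledged)?\b", r"^status\b"]

def pick_endpoint(prompt: str, cloud_url: str = "https://cloud.example.com") -> str:
    """Send trivially simple requests to the local model; everything else to the cloud."""
    text = prompt.strip().lower()
    if any(re.match(p, text) for p in SIMPLE_PATTERNS):
        return OLLAMA_URL
    return cloud_url
```

The asymmetry of misclassification matters: a complex request routed locally degrades quality, while a simple one routed to the cloud merely costs a fraction of a cent, so a conservative classifier is the safer default.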
## Estimated Savings
| Request Type | Current (Cloud) | After (Local) | Savings |
|---|---|---|---|
| Heartbeats | $5.18/mo | $0.00 | 100% |
| Simple queries | $3–$8/mo | $0.00 | 100% |
| Acknowledgments | $2–$5/mo | $0.00 | 100% |
| Total | $10–$18/mo | Electricity only | ~$15/mo |
## Trade-offs
- Hardware requirements: Running a local model requires spare CPU/RAM. Not suitable for resource-constrained servers.
- Latency: Local inference on CPU is slower than cloud GPUs. For heartbeats and simple responses, the delay is typically acceptable (1–3 seconds).
- Quality ceiling: Local models (7B–13B) can handle simple tasks well but struggle with nuanced reasoning. The routing logic must correctly classify request complexity.
- macOS users: OpenClaw's Ollama integration has known limitations on macOS (hardcoded `127.0.0.1:11434`, LaunchAgent sandbox restrictions). Workarounds may require SSH tunnels.
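Because a local server can be down, still starting up, or unreachable under the macOS restrictions above, any routing setup needs a health probe with a cloud fallback. A sketch using only the standard library (a running Ollama server answers plain GET requests on its root path):

```python
import urllib.request
import urllib.error

def local_model_available(url: str = "http://127.0.0.1:11434",
                          timeout: float = 1.0) -> bool:
    """Return True if an HTTP server (e.g. Ollama) answers at the given address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: fall back to the cloud.
        return False
```

A short timeout is important here: a slow probe on every request would add more latency than routing locally saves.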
## Prepare Now
While this feature is in development, you can:
- Install Ollama and test a local model on your server.
- Familiarize yourself with the `openclaw models` CLI commands.
- Use A02: Heartbeat Optimization to reduce heartbeat frequency in the meantime.
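To smoke-test a local model before the routing feature lands, you can call Ollama's HTTP API directly. A minimal helper (the model name is an example; `/api/generate` with `"stream": false` is Ollama's standard non-streaming generation endpoint):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str,
                           url: str = "http://localhost:11434/api/generate"):
    """Build a non-streaming generation request for Ollama's HTTP API."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"})

# Example (requires a running Ollama server with the model pulled):
# req = build_generate_request("llama3.1:8b", "Say OK.")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.loads(resp.read())["response"])
```

If the response comes back in a second or two on your hardware, heartbeat-class traffic is well within a local model's reach.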
ClawBridge is free and open source (MIT License) — install it in seconds, own it forever. Get ClawBridge Free →