Every turn, your agent sends the same system prompt—thousands of tokens—to the API. Without caching, you pay full price for those same tokens every single time. With caching, subsequent turns cost 90% less.
## The Problem: Re-Processing the Same Prompt
Your OpenClaw agent's system prompt includes:
- SOUL.md personality directives
- Skill instructions for every installed skill
- Memory context
- User preferences
This can easily total 3,000–10,000 tokens. On every interaction, these tokens are sent to the API as input. Without prompt caching:
- Turn 1: 5,000 system tokens → full price
- Turn 2: Same 5,000 tokens + new message → full price again
- Turn 100: Still paying full price for the same 5,000 tokens
At $3/1M input tokens (Sonnet), 5,000 tokens × 300 turns/day = $4.50/day just for the system prompt.
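The arithmetic above can be checked in a couple of lines (prices and volumes are the source's example figures):

```python
PRICE_PER_MTOK = 3.00    # Claude Sonnet input price, $ per 1M tokens
SYSTEM_TOKENS = 5_000    # system prompt size
TURNS_PER_DAY = 300      # interactions per day

# Without caching, every turn pays full price for the system prompt.
daily_cost = TURNS_PER_DAY * SYSTEM_TOKENS * PRICE_PER_MTOK / 1_000_000
print(f"${daily_cost:.2f}/day")  # $4.50/day
```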
## How ClawBridge Detects This (Diagnostic A06)
The Cost Control Center checks:
- Is prompt caching enabled? — If not, immediate recommendation.
- Cache hit rate — If caching is enabled but the hit rate is below 50%, there's likely a configuration or prompt ordering issue.
- Potential savings — Calculates the difference between full-price and cached-price for your daily system prompt volume.
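ClawBridge's actual implementation isn't shown here, but the kind of check A06 performs can be sketched as follows (function and threshold names are hypothetical):

```python
def check_prompt_caching(caching_enabled: bool,
                         cache_hits: int,
                         cache_misses: int) -> list[str]:
    """Illustrative A06-style diagnostic, not ClawBridge's real code."""
    recommendations = []
    if not caching_enabled:
        # No caching at all: the highest-impact finding.
        recommendations.append("Enable prompt caching")
        return recommendations
    total = cache_hits + cache_misses
    hit_rate = cache_hits / total if total else 0.0
    if hit_rate < 0.5:
        # Caching is on but mostly missing: likely a config or ordering issue.
        recommendations.append(
            "Low hit rate: check prompt ordering and dynamic content")
    return recommendations
```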
## One-Tap Fix
Tap Apply to enable prompt caching in your OpenClaw configuration. ClawBridge sets the appropriate caching parameters for your provider (Anthropic, OpenAI, etc.).
## How Prompt Caching Works
When caching is enabled:
- The first turn sends the full system prompt at normal price.
- Subsequent turns reuse the cached version at a 90% discount on input tokens.
- The cache expires after a period of inactivity (typically ~5 minutes with no requests); a request that reads the cache refreshes that window.
The savings are immediate and automatic—no changes to your agent's behavior or output.
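With Anthropic, for example, caching is enabled by adding a `cache_control` marker to the system blocks of a Messages API request; everything up to the marker becomes the cached prefix. A sketch of the request shape (model id and placeholder text are illustrative):

```python
# Shape of an Anthropic Messages API request with prompt caching enabled.
request = {
    "model": "claude-sonnet-4-20250514",  # example model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<SOUL.md directives, skill instructions, memory context>",
            # Cache everything up to and including this block (~5 min TTL).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
```

Other providers expose caching differently (some apply it automatically), which is why ClawBridge sets provider-specific parameters for you.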
## Trade-offs
- Cache TTL: If your agent is idle for more than ~5 minutes between requests, the cache expires and the next turn pays full price. For sporadically-used agents, savings may be lower.
- Prompt changes invalidate cache: caching operates on a stable prefix, so if your system prompt changes frequently (e.g., dynamic memory injection on every turn), caching is less effective. Where possible, keep dynamic content after the stable portions of the prompt.
- Provider support: Not all providers support prompt caching equally. Anthropic supports it natively; other providers may have different mechanisms.
## Real Numbers
System prompt of 5,000 tokens, 300 interactions/day, Claude Sonnet:
| Scenario | Daily Input Cost | Monthly Cost |
|---|---|---|
| No caching | 300 × 5,000 × $3/1M = $4.50 | $135 |
| With caching (90% hit rate) | 30 full + 270 cached (90% off) = $0.86 | $25.80 |
| Savings | $3.64 | $109.20 |
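These figures can be reproduced directly, using the same simplified flat-discount model as the table:

```python
TOKENS = 5_000
TURNS = 300
PRICE = 3.00 / 1_000_000      # $/token, Sonnet input
CACHED_PRICE = PRICE * 0.10   # 90% discount on cache reads

no_cache_daily = TURNS * TOKENS * PRICE                 # $4.50/day
hits = int(TURNS * 0.9)                                 # 270 cached turns
cached_daily = ((TURNS - hits) * TOKENS * PRICE
                + hits * TOKENS * CACHED_PRICE)         # $0.855/day
monthly_savings = (no_cache_daily - cached_daily) * 30  # ~$109/month
# (The table shows $109.20 because it rounds the daily figure to $0.86 first.)
```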
Prompt caching is often the single highest-impact optimization in the entire Cost Control suite.
## FAQ
Q: Does caching affect response quality? A: No. The AI receives exactly the same information. Caching is a provider-side optimization—the model doesn't even know it's happening.
Q: What if I update SOUL.md? A: The cache automatically invalidates when the system prompt changes. The next turn pays full price, then subsequent turns are cached again.
ClawBridge is free and open source (MIT License) — install it in seconds, own it forever. Get ClawBridge Free →