This is a beta feature according to Algolia’s Terms of Service (“Beta Services”).
How caching works
When a user sends a message, Agent Studio checks the cache before querying the LLM:
- Agent Studio generates a cache key from the request (messages, agent configuration, and tools).
- If a cached response exists and hasn’t expired, it’s returned without calling the LLM.
- If no cache entry exists, the LLM generates a response, which is then cached for future requests.
Caching only applies to the first message in a conversation.
Follow-up messages in multi-turn conversations always call the LLM directly and don’t include cache headers.
When generating the cache key:
- Message content is trimmed (leading/trailing whitespace removed).
- Tool call IDs, names, arguments, and outputs are included in the key.
- Tool configurations are included in the key.
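The key derivation described above can be sketched as follows. This is only an illustration of the idea, not Agent Studio's actual implementation: the hash function (SHA-256), the field names (`role`, `content`, `toolCalls`), and the case folding (inferred from the behavior table) are all assumptions.

```javascript
const { createHash } = require('crypto');

// Illustrative only: the real key derivation isn't documented here.
// Field names (role, content, toolCalls) are hypothetical.
function normalizeMessage(message) {
  return {
    role: message.role,
    // Trim whitespace and fold case so trivially different inputs share a key.
    content: String(message.content ?? '').trim().toLowerCase(),
    // Tool call IDs, names, arguments, and outputs all feed into the key.
    toolCalls: message.toolCalls ?? [],
  };
}

function cacheKey({ messages, agentConfig, tools }) {
  // JSON.stringify follows property insertion order, so callers must build
  // agentConfig and tools consistently for identical requests to match.
  const payload = JSON.stringify({
    messages: messages.map(normalizeMessage),
    agentConfig,
    tools,
  });
  return createHash('sha256').update(payload).digest('hex');
}
```

Two requests that differ only in surrounding whitespace produce the same key, while any change to the agent configuration or tools produces a different one.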
Cache expiration
Cache entries expire based on your app’s data retention setting. The default is 90 days. For time-sensitive use cases, consider date awareness or invalidating a stale cache.
Cache headers
Responses include headers indicating cache status:
| Header | Description |
|---|---|
| X-Cache | HIT or MISS |
| Cache-Status | Compliant with RFC 9211. For example, AgentStudio; hit or AgentStudio; fwd=miss |
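As a rough sketch, a client could interpret these two headers as shown below. The function names are hypothetical, and the parser only handles a single Cache-Status entry in the two forms shown in the table, not the full RFC 9211 grammar.

```javascript
// Parse a single Cache-Status entry such as "AgentStudio; hit"
// or "AgentStudio; fwd=miss". Hypothetical helper, simplified grammar.
function parseCacheStatus(value) {
  const [cache, ...params] = value.split(';').map((part) => part.trim());
  const result = { cache, hit: false, fwd: null };
  for (const param of params) {
    const [name, paramValue] = param.split('=');
    if (name === 'hit') result.hit = true;
    if (name === 'fwd') result.fwd = paramValue ?? null;
  }
  return result;
}

// Returns true for a hit, false for a miss, and null when neither
// header is present (for example, on a follow-up message).
function isCacheHit({ xCache, cacheStatus }) {
  if (xCache) return xCache.toUpperCase() === 'HIT';
  if (cacheStatus) return parseCacheStatus(cacheStatus).hit;
  return null;
}
```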
Cache behavior
| Scenario | Behavior |
|---|---|
| Identical messages | Returns cached response |
| Different message casing | Normalized, returns cached response |
| Different agent configuration | Different cache key, fresh LLM call |
| Tool output changes | Different cache key, fresh LLM call |
| Multi-turn conversation (2+ messages) | Cache bypassed, always calls LLM, no cache headers |
Streaming and non-streaming
Streaming (stream=true) and non-streaming responses share the same cache.
A cached non-streaming response can serve a streaming request and vice versa.
Turn off cache
You can turn off caching at different levels depending on your needs:
- Per request (the cache=false query parameter)
- Per agent (the useCache setting in the agent configuration)
- Privacy mode (maxRetentionDays: 0)
Turn off caching for a single request by adding the cache=false query parameter. When cache=false, the request bypasses the cache entirely: it won’t check for cached responses or store new ones.
Playground testing: The Agent Studio playground (test agent) always bypasses cache to ensure you see fresh responses during development. Production agents use caching unless explicitly turned off.
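For example, a small helper can append the parameter to whatever request URL you already use. The helper name and the example.com endpoint below are placeholders for illustration, not the real Agent Studio URL.

```javascript
// Append cache=false to a request URL so the request bypasses the cache.
function withCacheDisabled(requestUrl) {
  const url = new URL(requestUrl);
  url.searchParams.set('cache', 'false');
  return url.toString();
}

// Hypothetical endpoint, used only to show the resulting query string.
console.log(
  withCacheDisabled('https://example.com/agents/my-agent/completions?stream=true')
);
// prints "https://example.com/agents/my-agent/completions?stream=true&cache=false"
```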
Date awareness and caching
Agents ignore the current date and time by default, maximizing cache efficiency. If you enable the date_aware or datetime_aware features,
the injected date and time becomes part of the system prompt,
which changes the cache key:
| Feature | Cache effect |
|---|---|
| None (default) | Cache persists until retention period expires |
| date_aware | Cache effectively resets once per day |
| datetime_aware | Cache effectively resets every minute |
Options
- Default: maximum cache hits, ideal for stable content.
- Date-aware: refresh once per day for news, events, and promotions.
- DateTime-aware: minute-level freshness for support at scale (still saves on repeated queries within 60 s).
- Client-side: full control. Inject time yourself in the first message.
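To see why the cache "effectively resets", it can help to think of the injected time as a bucket that becomes part of the key. The sketch below is a simplified model: the function name is hypothetical, and the use of UTC and these exact string formats are assumptions rather than documented behavior.

```javascript
// Model the time value that would be injected into the system prompt.
// Requests in the same bucket can share a cache entry.
function timeBucket(now, mode) {
  const iso = now.toISOString(); // e.g. 2025-12-01T09:30:15.000Z (UTC)
  if (mode === 'date_aware') return iso.slice(0, 10); // YYYY-MM-DD
  if (mode === 'datetime_aware') return iso.slice(0, 16); // YYYY-MM-DDTHH:MM
  return ''; // Default: nothing injected, so the key never changes with time.
}
```

With date_aware, two requests on the same day map to the same bucket. With datetime_aware, only requests within the same minute do.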
Cache invalidation
Clear the cache for a specific agent using the API. This requires an API key with the editSettings ACL.
The response indicates the number of cache entries deleted.
Partial invalidation
Invalidate only entries created before a specific date. For example, with a cutoff of 2025-12-01, entries created before 2025-12-01 are invalidated.
Entries from that date are kept.
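The boundary behavior can be modeled locally as follows. This sketch only illustrates which entries fall on each side of the cutoff. The `createdAt` field and the interpretation of the date as UTC midnight are assumptions, and the code does not call the invalidation API.

```javascript
// Split entries into those a cutoff would remove and those it would keep.
// Entries created exactly at or after the cutoff are kept.
function splitByCutoff(entries, before) {
  // Date-only ISO strings such as "2025-12-01" parse as UTC midnight.
  const cutoff = new Date(before).getTime();
  const createdAt = (entry) => new Date(entry.createdAt).getTime();
  return {
    removed: entries.filter((entry) => createdAt(entry) < cutoff),
    kept: entries.filter((entry) => createdAt(entry) >= cutoff),
  };
}
```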
Invalidate after index rebuild
Automatically invalidate stale cache when your search index updates.
Required ACLs
This pattern requires an API key with:
- listIndexes to read index metadata with listIndices().
- editSettings to invalidate the agent cache.
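One possible shape for this pattern is sketched below. Both dependencies are passed in as functions: `listIndices` is expected to resolve to `{ items: [{ name, updatedAt }] }` (an assumption about the response shape), and `invalidateCache` is a hypothetical stand-in for your cache invalidation request.

```javascript
// Invalidate the agent cache only when the index has changed since the
// last check. Returns the timestamp to remember for the next run.
async function invalidateIfIndexChanged({
  listIndices,
  invalidateCache,
  indexName,
  lastSeenUpdatedAt,
}) {
  const { items } = await listIndices();
  const index = items.find((item) => item.name === indexName);

  // Unknown index or no change: leave the cache alone.
  if (!index || index.updatedAt === lastSeenUpdatedAt) {
    return { invalidated: false, updatedAt: lastSeenUpdatedAt };
  }

  await invalidateCache();
  return { invalidated: true, updatedAt: index.updatedAt };
}
```

You would typically run this on a schedule or at the end of an indexing job, storing the returned `updatedAt` between runs.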
Error handling
If the cache service is temporarily unavailable, you receive a 503 Service Unavailable error. In that case:
- Your invalidation request couldn’t be processed
- Try again after a short delay
- Consider adding a retry mechanism with increasing wait times between attempts for production systems
Cache invalidation errors don’t affect normal agent operations. If the cache is unavailable, requests proceed without caching.
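A retry mechanism with increasing wait times could look like the following sketch. The function name, the default attempt count, and the delays are illustrative choices, and `sendRequest` is a hypothetical callback that performs your invalidation request and resolves to an object with a `status` property.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a request while it returns 503, doubling the wait each time.
async function retryOn503(sendRequest, { maxAttempts = 4, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const response = await sendRequest();
    if (response.status !== 503) return response;

    if (attempt < maxAttempts) {
      // Waits baseDelayMs, then 2x, then 4x, and so on.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw new Error(`Cache service still unavailable after ${maxAttempts} attempts`);
}
```

In production you may also want to add random jitter to the delay so that many clients don't retry at the same moment.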
How cache freshness works
Cache keys include your agent’s configuration, so responses are automatically fresh when you change:
- Agent instructions or system prompt
- Model or provider settings
- Tool configurations
When to manually invalidate
Manually invalidate the cache only when your Algolia index data changes (for example, new products or updated content). The cache key includes tool configuration but not tool output. When your search index updates, cached responses may still reference stale search results.
Monitoring
Track cache performance using the X-Cache response header.
A high hit rate means fewer requests to the LLM and lower costs.
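As a minimal sketch, you could count hits and misses from responses as they arrive. The tracker below is a hypothetical helper: it assumes each response exposes a `headers.get()` method, as `fetch` responses do.

```javascript
// Track the share of responses served from cache, based on X-Cache.
function createHitRateTracker() {
  let hits = 0;
  let misses = 0;
  return {
    record(response) {
      const value = response.headers.get('X-Cache');
      if (value === 'HIT') hits += 1;
      else if (value === 'MISS') misses += 1;
      // Responses without X-Cache (follow-up messages) are not counted.
    },
    hitRate() {
      const total = hits + misses;
      return total === 0 ? 0 : hits / total;
    },
  };
}
```

Call `record()` for each agent response and report `hitRate()` to your metrics system at whatever interval suits you.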
Troubleshooting
Verify caching
Check the response headers to confirm caching behavior:
- X-Cache: HIT. Response served from cache
- X-Cache: MISS. Fresh response from LLM
Debug all cache MISS responses
If you’re seeing only MISS responses when you expect cache hits:
- Check retention settings. Ensure maxRetentionDays > 0. Privacy mode (maxRetentionDays: 0) turns off caching.
- Verify useCache is enabled: check your agent configuration.
- Confirm you’re not in playground: test agents in the playground bypass cache.
- Check query parameters: ensure you’re not sending ?cache=false.
- Check whether this is a follow-up message: caching only applies to the first message in a conversation. Any request with prior conversation history (2 or more messages) bypasses the cache entirely and won’t include X-Cache or Cache-Status headers.
- Verify identical requests. Cache keys are sensitive to:
  - Message content (normalized for whitespace)
  - Agent configuration
  - Tool outputs
  - Date/time awareness settings
Handle cache service errors
If cache invalidation returns a 503 error, retry the request after a short delay, increasing the wait between attempts.
Fallback behavior: If the cache service is unavailable, Agent Studio continues processing requests normally without caching. Your users won’t experience interruptions, but responses won’t be cached until the service recovers.