This is a beta feature according to Algolia’s Terms of Service (“Beta Services”).
To reduce token costs and improve response time, Agent Studio caches LLM responses to identical requests by default.

How caching works

When a user sends a message, Agent Studio checks the cache before querying the LLM:
  1. Agent Studio generates a cache key from the request (messages, agent configuration, and tools).
  2. If a cached response exists and hasn’t expired, it’s returned without calling the LLM.
  3. If no cache entry exists, the LLM generates a response, which is then cached for future requests.
Caching only applies to the first message in a conversation. Follow-up messages in multi-turn conversations always call the LLM directly and don’t include cache headers.
Cache keys are normalized to ensure consistent matching:
  • Message content is trimmed (leading/trailing whitespace removed).
  • Tool call IDs, names, arguments, and outputs are included in the key.
  • Tool configurations are included in the key.
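To reason about whether two requests are likely to share a cache entry, the rules above can be approximated client-side. The following is a minimal sketch, not Agent Studio's actual key derivation: `isCacheEligible` and `approximateCacheKey` are hypothetical helper names, the request shape (`messages`, `agentConfig`, `tools`) is assumed for illustration, and only whitespace trimming is modeled.

```javascript
// Illustrative only: this is NOT how Agent Studio computes its cache key.
// Both function names and the request shape are assumptions for this sketch.

// Caching only applies to the first message in a conversation.
function isCacheEligible(messages) {
  return messages.length === 1;
}

// Build a comparable string from the parts of the request that the
// documentation lists as inputs to the cache key.
function approximateCacheKey({ messages, agentConfig = {}, tools = [] }) {
  const normalizedMessages = messages.map((message) => ({
    ...message,
    // Leading and trailing whitespace is removed before matching.
    content:
      typeof message.content === 'string'
        ? message.content.trim()
        : message.content,
  }));
  // JSON.stringify depends on property order, so build objects consistently.
  return JSON.stringify({ messages: normalizedMessages, agentConfig, tools });
}
```

Two requests that differ only in surrounding whitespace produce the same string here, while any change to the assumed `agentConfig` or `tools` values produces a different one.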

Cache expiration

Cache entries expire based on your app’s data retention setting. The default is 90 days. For time-sensitive use cases, consider enabling date awareness or invalidating the stale cache.

Cache headers

Responses include headers indicating cache status:
Header | Description
--- | ---
X-Cache | HIT or MISS
Cache-Status | Compliant with RFC 9211. For example, AgentStudio; hit or AgentStudio; fwd=miss
Use these headers to monitor cache performance in your app.
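As a sketch of how both headers might be interpreted together, the helper below returns a single result from either header. `readCacheStatus` is a hypothetical name, and the parsing covers only the two `Cache-Status` forms shown above rather than the full RFC 9211 grammar.

```javascript
// Hypothetical helper: interprets the cache headers on a response.
// Accepts any object with a get(name) method, such as fetch's Headers.
function readCacheStatus(headers) {
  const xCache = headers.get('X-Cache') ?? null; // "HIT", "MISS", or absent
  const cacheStatus = headers.get('Cache-Status') ?? null; // e.g. "AgentStudio; hit"

  // Requests that bypass the cache carry neither header.
  if (xCache === null && cacheStatus === null) {
    return { cached: null, source: null };
  }

  if (xCache !== null) {
    return { cached: xCache.toUpperCase() === 'HIT', source: 'X-Cache' };
  }

  // Cache-Status: the first segment names the cache, the rest are parameters.
  const params = cacheStatus
    .split(';')
    .slice(1)
    .map((param) => param.trim().toLowerCase());
  return { cached: params.includes('hit'), source: 'Cache-Status' };
}
```

A `cached` value of `null` indicates that neither header was present, which is what a follow-up message in a multi-turn conversation produces.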

Cache behavior

Scenario | Behavior
--- | ---
Identical messages | Returns cached response
Different message casing | Normalized, returns cached response
Different agent configuration | Different cache key, fresh LLM call
Tool output changes | Different cache key, fresh LLM call
Multi-turn conversation (2+ messages) | Cache bypassed, always calls LLM, no cache headers

Streaming and non-streaming

Streaming (stream=true) and non-streaming responses share the same cache. A cached non-streaming response can serve a streaming request and vice versa.

Turn off cache

You can turn off caching at different levels depending on your needs:
Turn off caching for a single request by adding the cache query parameter:
curl -X POST 'https://{APPLICATION_ID}.algolia.net/agent-studio/1/agents/{agentId}/completions?cache=false' \
  -H 'x-algolia-application-id: {APPLICATION_ID}' \
  -H 'x-algolia-api-key: {API_KEY}' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Your message here"
      }
    ]
  }'
When cache=false, the request bypasses the cache entirely—it won’t check for cached responses or store new ones.
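The same request can be sketched in JavaScript. `buildCompletionsUrl` and `sendMessage` are hypothetical helper names; the endpoint, headers, and body mirror the curl example above, and the raw `Response` is returned so that no assumption is made about the response format.

```javascript
// Hypothetical helper: builds the completions URL, adding ?cache=false
// only when caching should be bypassed for this request.
function buildCompletionsUrl(applicationId, agentId, { cache = true } = {}) {
  const base = `https://${applicationId}.algolia.net/agent-studio/1/agents/${encodeURIComponent(agentId)}/completions`;
  return cache ? base : `${base}?cache=false`;
}

// Hypothetical helper: sends a single user message to the agent.
async function sendMessage(applicationId, apiKey, agentId, content, options) {
  return fetch(buildCompletionsUrl(applicationId, agentId, options), {
    method: 'POST',
    headers: {
      'x-algolia-application-id': applicationId,
      'x-algolia-api-key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages: [{ role: 'user', content }] }),
  });
}
```

Calling `sendMessage(appId, apiKey, agentId, 'Your message here', { cache: false })` bypasses the cache for that request only; omitting the options leaves the default caching behavior in place.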
Playground testing: The Agent Studio playground (test agent) always bypasses cache to ensure you see fresh responses during development. Production agents use caching unless explicitly turned off.

Date awareness and caching

Agents ignore the current date and time by default, maximizing cache efficiency. If you enable the date_aware or datetime_aware features, the injected date and time become part of the system prompt, which changes the cache key:

Feature | Cache effect
--- | ---
None (default) | Cache persists until retention period expires
date_aware | Cache effectively resets once per day
datetime_aware | Cache effectively resets every minute
This is a tradeoff: time awareness gives fresher responses but reduces cache hits.

Options

  • Default: maximum cache hits, ideal for stable content.
  • Date-aware: refresh once per day for news, events, and promotions.
  • DateTime-aware: minute-level freshness for support at scale (still saves on repeated queries within 60 s).
  • Client-side: full control. Inject time yourself in the first message.
For configuration and a client-side code example, see Experimental features.
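To illustrate the client-side option, here is a minimal sketch that prefixes the first message with a timestamp truncated to a chosen granularity. It isn't the example from Experimental features: `withClientTime` is a hypothetical helper, and both the prefix wording and the use of UTC are assumptions. Because the message text is part of the cache key, the granularity determines how often the key changes.

```javascript
// Hypothetical helper: prefixes the message with a UTC timestamp truncated
// to the day (default) or to the minute.
function withClientTime(content, now = new Date(), granularity = 'day') {
  const iso = now.toISOString(); // e.g. "2025-12-01T10:15:30.000Z"
  if (granularity === 'minute') {
    return `[Current time (UTC): ${iso.slice(0, 16)}Z] ${content}`;
  }
  return `[Current date (UTC): ${iso.slice(0, 10)}] ${content}`;
}
```

With day granularity, every identical question asked on the same UTC day yields the same message text and can share a cache entry. With minute granularity, only repeats within the same minute do.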

Cache invalidation

Clear the cache for a specific agent using the API:
curl -X DELETE 'https://{APPLICATION_ID}.algolia.net/agent-studio/1/agents/{agentId}/cache' \
  -H 'x-algolia-application-id: {APPLICATION_ID}' \
  -H 'x-algolia-api-key: {API_KEY}'
This request requires an API key with the editSettings ACL. The response indicates the number of cache entries deleted:
JSON
{
  "deleted": 42
}

Partial invalidation

Invalidate only entries created before a specific date:
curl -X DELETE 'https://{APPLICATION_ID}.algolia.net/agent-studio/1/agents/{agentId}/cache?before=2025-12-01' \
  -H 'x-algolia-application-id: {APPLICATION_ID}' \
  -H 'x-algolia-api-key: {API_KEY}'
This deletes cache entries created before 2025-12-01. Entries created on or after that date are kept.
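A rolling cutoff, such as "everything older than seven days", can be computed rather than hard-coded. This is a sketch with hypothetical helper names (`beforeDaysAgo`, `cacheInvalidationUrl`); it assumes the `before` parameter takes a YYYY-MM-DD date as in the example above and computes that date in UTC, which may not match how the service interprets it.

```javascript
// Hypothetical helper: returns the date `days` days before `now`
// as YYYY-MM-DD (computed in UTC).
function beforeDaysAgo(days, now = new Date()) {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return cutoff.toISOString().slice(0, 10);
}

// Hypothetical helper: builds the invalidation URL, with or without a cutoff.
function cacheInvalidationUrl(applicationId, agentId, before) {
  const base = `https://${applicationId}.algolia.net/agent-studio/1/agents/${encodeURIComponent(agentId)}/cache`;
  return before ? `${base}?before=${before}` : base;
}
```

Sending a DELETE request to `cacheInvalidationUrl(appId, agentId, beforeDaysAgo(7))` with the same headers as the curl example would then remove entries created more than a week ago.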

Invalidate after index rebuild

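Automatically invalidate stale cache entries when your search index updates:
// Default import as in algoliasearch v4. In v5, use: import { algoliasearch } from 'algoliasearch';
import algoliasearch from 'algoliasearch';

// Read credentials once so the client and the fetch call use the same values
const ALGOLIA_APPLICATION_ID = process.env.ALGOLIA_APPLICATION_ID;
const ALGOLIA_API_KEY = process.env.ALGOLIA_API_KEY;

const client = algoliasearch(ALGOLIA_APPLICATION_ID, ALGOLIA_API_KEY);

async function invalidateCacheBeforeIndexUpdate(agentId, indexName) {
  // Retrieve index metadata, including the last update timestamp
  const { items } = await client.listIndices();
  const index = items.find(i => i.name === indexName);

  if (!index) throw new Error(`Index ${indexName} not found`);

  // Convert the ISO timestamp to YYYY-MM-DD format
  const before = index.updatedAt.split('T')[0];

  // Delete entries created before the index's last update date.
  // Entries created on or after that date are kept.
  const response = await fetch(
    `https://${ALGOLIA_APPLICATION_ID}.algolia.net/agent-studio/1/agents/${agentId}/cache?before=${before}`,
    {
      method: 'DELETE',
      headers: {
        'x-algolia-application-id': ALGOLIA_APPLICATION_ID,
        'x-algolia-api-key': ALGOLIA_API_KEY,
      },
    }
  );

  // Surface failures (for example, 503) instead of parsing an error body as a result
  if (!response.ok) {
    throw new Error(`Cache invalidation failed with status ${response.status}`);
  }

  return response.json(); // For example: { deleted: 42 }
}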

Required ACLs

This pattern requires an API key with:
  • listIndexes to read index metadata with listIndices().
  • editSettings to invalidate the agent cache.

Error handling

If the cache service is temporarily unavailable, you receive a 503 Service Unavailable error:
JSON
{
  "error": "Service temporarily unavailable",
  "status": 503
}
When this occurs:
  • Your invalidation request couldn’t be processed.
  • Try again after a short delay.
  • For production systems, consider adding a retry mechanism with increasing wait times between attempts (exponential backoff).
Cache invalidation errors don’t affect normal agent operations. If the cache is unavailable, requests proceed without caching.

How cache freshness works

Cache keys include your agent’s configuration, so responses are automatically fresh when you change:
  • Agent instructions or system prompt
  • Model or provider settings
  • Tool configurations
You don’t need to manually invalidate the cache for these changes. Each configuration automatically creates new cache entries.

When to manually invalidate

Manually invalidate the cache only when your Algolia index data changes (for example, new products or updated content). The cache key includes tool configuration but not tool output. When your search index updates, cached responses may still reference stale search results.

Monitoring

Track cache performance using the X-Cache response header. A high hit rate means fewer requests to the LLM and lower costs.
JavaScript
const response = await fetch(completionUrl, options);
const cacheStatus = response.headers.get('X-Cache');
console.log(`Cache: ${cacheStatus}`); // Values can be "HIT" or "MISS", depending on cache status
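To turn individual header values into a hit rate, a small in-memory counter is enough for a first look. This is a sketch only: `createCacheStats` is a hypothetical helper, and a production setup would more likely report these counts to an existing metrics system.

```javascript
// Hypothetical helper: counts X-Cache values and computes the hit rate.
function createCacheStats() {
  const counts = { HIT: 0, MISS: 0, BYPASS: 0 };
  return {
    // Pass the value of response.headers.get('X-Cache').
    // A missing header (null) is counted as a bypassed request.
    record(xCacheValue) {
      const key =
        xCacheValue === 'HIT' || xCacheValue === 'MISS' ? xCacheValue : 'BYPASS';
      counts[key] += 1;
    },
    // Share of cache-eligible requests that were served from cache.
    hitRate() {
      const eligible = counts.HIT + counts.MISS;
      return eligible === 0 ? 0 : counts.HIT / eligible;
    },
    // Return a copy so callers can't mutate the internal counters.
    counts() {
      return { ...counts };
    },
  };
}
```

Bypassed requests are tracked separately and excluded from the hit rate, since multi-turn follow-ups never reach the cache and would otherwise make the rate look lower than it is.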

Troubleshooting

Verify caching

Check the response headers to confirm caching behavior:
JavaScript
const response = await fetch(completionUrl, options);

// Check cache status
const cacheStatus = response.headers.get('X-Cache');
console.log(`Cache status: ${cacheStatus}`); // "HIT" or "MISS"

// For more detailed information
const cacheStatusRFC = response.headers.get('Cache-Status');
console.log(`Cache details: ${cacheStatusRFC}`); // For example, "AgentStudio; hit"
Expected values:
  • X-Cache: HIT. Response served from cache
  • X-Cache: MISS. Fresh response from LLM

Debug all cache MISS responses

If you’re seeing only MISS responses when you expect cache hits:
  1. Check retention settings. Ensure maxRetentionDays > 0
    • Privacy mode (maxRetentionDays: 0) turns off caching
  2. Verify useCache is enabled: check your agent configuration
    JSON
    {
      "config": {
        "useCache": true  // Should be true (default)
      }
    }
    
  3. Confirm you’re not in playground: test agents in the playground bypass cache
  4. Check query parameters: ensure you’re not sending ?cache=false
  5. Check whether this is a follow-up message: caching only applies to the first message in a conversation. Any request with prior conversation history (2 or more messages) bypasses the cache entirely and won’t include X-Cache or Cache-Status headers.
  6. Verify identical requests. Cache keys are sensitive to:
    • Message content (normalized for whitespace)
    • Agent configuration
    • Tool outputs
    • Date/time awareness settings
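Several of the checks above can be automated before sending a request. The following sketch uses a hypothetical helper, `findCacheBypassReasons`, and assumes you pass in the request URL, the messages array, the agent object (with the `config.useCache` shape shown above), and your retention setting. It can't detect playground usage or subtle differences between requests.

```javascript
// Hypothetical helper: lists the documented reasons a request would skip the cache.
function findCacheBypassReasons({ url, messages, agent = {}, maxRetentionDays }) {
  const reasons = [];

  // Privacy mode (maxRetentionDays: 0) turns off caching.
  if (maxRetentionDays === 0) {
    reasons.push('maxRetentionDays is 0 (privacy mode)');
  }
  // useCache defaults to true, so only an explicit false disables caching.
  if (agent.config && agent.config.useCache === false) {
    reasons.push('useCache is false in the agent configuration');
  }
  // The cache=false query parameter bypasses the cache for this request.
  if (/[?&]cache=false(?:&|#|$)/.test(url)) {
    reasons.push('request URL contains cache=false');
  }
  // Only the first message in a conversation is cacheable.
  if (messages.length > 1) {
    reasons.push('request contains prior conversation history');
  }

  return reasons;
}
```

An empty array means none of these checks explains the MISS, which points toward the remaining items: playground usage, or requests that aren't actually identical.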

Handle cache service errors

If cache invalidation returns a 503 error, retry the request with exponential backoff:
JavaScript
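// Credentials are read from the environment; adjust to match your setup
const APP_ID = process.env.ALGOLIA_APPLICATION_ID;
const API_KEY = process.env.ALGOLIA_API_KEY;

async function invalidateCacheWithRetry(agentId, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(
        `https://${APP_ID}.algolia.net/agent-studio/1/agents/${agentId}/cache`,
        {
          method: 'DELETE',
          headers: {
            'x-algolia-application-id': APP_ID,
            'x-algolia-api-key': API_KEY,
          },
        }
      );
    } catch (error) {
      // Network failure: give up on the last attempt, otherwise back off and retry
      if (attempt === maxRetries) throw error;
    }

    // Any status other than 503 is final: return the result or surface the error
    if (response && response.status !== 503) {
      if (!response.ok) {
        throw new Error(`Cache invalidation failed with status ${response.status}`);
      }
      return await response.json();
    }

    if (attempt === maxRetries) {
      throw new Error('Cache service unavailable after retries');
    }

    // Wait with exponential backoff (2 s, 4 s, ...) before the next attempt
    await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
  }
}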
Fallback behavior: If the cache service is unavailable, Agent Studio continues processing requests normally without caching. Your users won’t experience interruptions, but responses won’t be cached until the service recovers.

Last modified on March 5, 2026