Caching - Algolia

This is a beta feature according to Algolia’s Terms of Service (“Beta Services”).

To reduce token costs and improve response time, Agent Studio caches LLM responses by default and reuses them for identical requests.

How caching works

When a user sends a message, Agent Studio checks the cache before querying the LLM:

Agent Studio generates a cache key from the request (messages, agent configuration, and tools).
If a cached response exists and hasn’t expired, it’s returned without calling the LLM.
If no cache entry exists, the LLM generates a response, which is then cached for future requests.

Caching only applies to the first message in a conversation. Follow-up messages in multi-turn conversations always call the LLM directly and don’t include cache headers.

Cache keys are normalized to ensure consistent matching:

Message content is trimmed (leading/trailing whitespace removed).
Tool call IDs, names, arguments, and outputs are included in the key.
Tool configurations are included in the key.
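
The lookup and normalization steps above can be sketched as follows. This is an illustrative model only, not Agent Studio’s implementation: the names `buildCacheKey` and `getOrGenerate`, the in-memory `Map`, and the key layout are all hypothetical, and the sketch models only whitespace trimming.

```javascript
// Illustrative sketch only. All names and the key layout are hypothetical;
// Agent Studio's real key derivation may differ.

// Build a key from the parts of the request that the cache is keyed on.
// JSON.stringify depends on property order, so a real implementation
// would likely canonicalize the objects first.
function buildCacheKey({ messages, agentConfig, tools }) {
  const normalizedMessages = messages.map((message) => ({
    ...message,
    // Trim leading/trailing whitespace so "hi " and "hi" produce the same key.
    content:
      typeof message.content === 'string' ? message.content.trim() : message.content,
  }));
  return JSON.stringify({ messages: normalizedMessages, agentConfig, tools });
}

// Return a cached response if one exists and hasn't expired.
// Otherwise call the LLM and store the result for future requests.
async function getOrGenerate(cache, request, callLlm, ttlMs, now = Date.now()) {
  // Follow-up messages bypass the cache and carry no cache status.
  if (request.messages.length > 1) {
    return { response: await callLlm(request), cacheStatus: null };
  }
  const key = buildCacheKey(request);
  const entry = cache.get(key);
  if (entry && entry.expiresAt > now) {
    return { response: entry.response, cacheStatus: 'HIT' };
  }
  const response = await callLlm(request);
  cache.set(key, { response, expiresAt: now + ttlMs });
  return { response, cacheStatus: 'MISS' };
}
```

In this model, `cache` is a plain `Map` and `callLlm` is any async function that produces a response, so you can reason about hits and misses without calling a real model.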

Cache expiration

Cache entries expire based on your app’s data retention setting. The default is 90 days. For time-sensitive use cases, consider enabling date awareness or invalidating a stale cache.

Cache headers

Responses include headers indicating cache status:

| Header | Description |
| --- | --- |
| `X-Cache` | `HIT` or `MISS` |
| `Cache-Status` | Compliant with RFC 9211. For example, `AgentStudio; hit` or `AgentStudio; fwd=miss` |

Use these headers to monitor cache performance in your app.
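
If you want structured values instead of raw strings, a small parser for the `Cache-Status` header might look like the sketch below. The function name `parseCacheStatus` and the shape of the returned object are assumptions made for illustration. The parser handles only the single-cache forms shown in the table, not the full RFC 9211 grammar.

```javascript
// Minimal parser for a single-member Cache-Status value such as
// "AgentStudio; hit" or "AgentStudio; fwd=miss".
// Simplification: it ignores comma-separated lists and quoted strings.
function parseCacheStatus(headerValue) {
  // The header is absent when the cache was bypassed, for example on follow-up messages.
  if (!headerValue) return null;

  const [cacheName, ...params] = headerValue.split(';').map((part) => part.trim());
  const result = { cache: cacheName, hit: false, fwd: null };

  for (const param of params) {
    const [name, value] = param.split('=').map((part) => part.trim());
    if (name === 'hit') result.hit = true;
    if (name === 'fwd') result.fwd = value ?? null;
  }
  return result;
}
```

You could call it as `parseCacheStatus(response.headers.get('Cache-Status'))` after a completion request.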

Cache behavior

| Scenario | Behavior |
| --- | --- |
| Identical messages | Returns cached response |
| Different message casing | Normalized, returns cached response |
| Different agent configuration | Different cache key, fresh LLM call |
| Tool output changes | Different cache key, fresh LLM call |
| Multi-turn conversation (2+ messages) | Cache bypassed, always calls LLM, no cache headers |

Streaming and non-streaming

Streaming (stream=true) and non-streaming responses share the same cache. A cached non-streaming response can serve a streaming request and vice versa.

Turn off cache

You can turn off caching at different levels depending on your needs:

Per request
Per agent
Privacy mode

Turn off caching for a single request by adding the cache query parameter:

Command line

curl -X POST "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID/completions?cache=false" \
  -H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
  -H "x-algolia-api-key: $ALGOLIA_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Your message here"
      }
    ]
  }'

When cache=false, the request bypasses the cache entirely: it doesn’t check for cached responses or store new ones.
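
In JavaScript, you might build the same URL with a small helper such as the one below. `buildCompletionsUrl` is a hypothetical name, not part of any Algolia client. The endpoint path is copied from the curl example above.

```javascript
// Hypothetical helper: builds the completions URL from the curl example
// and appends cache=false only when caching should be bypassed.
function buildCompletionsUrl(appId, agentId, { cache = true } = {}) {
  const url = new URL(
    `https://${appId}.algolia.net/agent-studio/1/agents/${encodeURIComponent(agentId)}/completions`
  );
  if (!cache) url.searchParams.set('cache', 'false');
  // Note: the URL class lowercases the hostname, which is harmless for DNS.
  return url.toString();
}
```

Pass the returned URL to `fetch` with the same headers and JSON body as the curl request.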

Turn off caching for all requests to a specific agent using the useCache configuration.

Dashboard:

Go to your agent settings.
Under Performance settings, deselect Use cache.
Save your changes.

API:

JSON

{
  "id": "my-agent",
  "name": "My Agent",
  "config": {
    "useCache": false
  }
}

With useCache: false, all requests to this agent bypass caching.

When data retention is set to 0 (privacy mode), caching is automatically turned off:

JSON

{
  "id": "private-agent",
  "name": "Private Agent",
  "config": {
    "maxRetentionDays": 0
  }
}

In privacy mode (maxRetentionDays: 0), no data is stored or cached.

Playground testing: The Agent Studio playground (test agent) always bypasses cache to ensure you see fresh responses during development. Production agents use caching unless explicitly turned off.

Date awareness and caching

Agents ignore the current date and time by default, which maximizes cache efficiency. If you enable the date_aware or datetime_aware feature, the injected date and time become part of the system prompt, which changes the cache key:

| Feature | Cache effect |
| --- | --- |
| None (default) | Cache persists until retention period expires |
| `date_aware` | Cache effectively resets once per day |
| `datetime_aware` | Cache effectively resets every minute |

This is a tradeoff: time awareness gives fresher responses but reduces cache hits.
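
To see why the reset intervals in the table differ, the sketch below models the injected time as a string that changes once per day or once per minute. It’s a hypothetical model: the function name `injectedTimeBucket`, the string format, and the use of UTC are assumptions, not Agent Studio’s actual prompt format.

```javascript
// Hypothetical model of the time value injected into the system prompt.
// Two requests share a cache key only if this value is identical for both.
// Assumption: UTC and ISO 8601 formatting, chosen for illustration.
function injectedTimeBucket(mode, now = new Date()) {
  const iso = now.toISOString(); // For example, 2025-12-01T10:15:30.000Z
  if (mode === 'date_aware') return iso.slice(0, 10); // YYYY-MM-DD: changes daily
  if (mode === 'datetime_aware') return iso.slice(0, 16); // YYYY-MM-DDTHH:mm: changes every minute
  return ''; // Default: nothing injected, so the key never changes with time
}
```

Two requests sent 30 seconds apart within the same minute get the same value in either mode, which is why `datetime_aware` can still produce cache hits for bursts of repeated queries.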

Options

Default: maximum cache hits, ideal for stable content.
Date-aware: refresh once per day for news, events, and promotions.
DateTime-aware: minute-level freshness for support at scale (still saves on repeated queries within 60 s).
Client-side: full control. Inject time yourself in the first message.

For configuration and a client-side code example, see Experimental features.

Cache invalidation

Clear the cache for a specific agent using the API:

Command line

curl -X DELETE "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID/cache" \
  -H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
  -H "x-algolia-api-key: $ALGOLIA_API_KEY"

This request requires an API key with the editSettings ACL. The response indicates the number of cache entries deleted:

JSON

{
  "deleted": 42
}

Partial invalidation

Invalidate only entries created before a specific date:

Command line

curl -X DELETE "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID/cache?before=2025-12-01" \
  -H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
  -H "x-algolia-api-key: $ALGOLIA_API_KEY"

This deletes cache entries created before 2025-12-01. Entries created on or after that date are kept.
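
To compute the `before` value from a rolling window instead of hard-coding it, a helper along these lines may be useful. `cacheCutoffDate` is a hypothetical name, and the sketch assumes the API interprets the date in UTC, which the text above doesn’t confirm.

```javascript
// Hypothetical helper: returns a YYYY-MM-DD string for the `before` parameter
// so that roughly the last `daysToKeep` days of cache entries survive.
// Assumption: the date is interpreted in UTC.
function cacheCutoffDate(daysToKeep, now = new Date()) {
  const millisecondsPerDay = 24 * 60 * 60 * 1000;
  const cutoff = new Date(now.getTime() - daysToKeep * millisecondsPerDay);
  return cutoff.toISOString().slice(0, 10);
}
```

For example, calling `cacheCutoffDate(7)` and appending the result as `?before=...` to the cache endpoint would delete entries older than about a week.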

Invalidate after index rebuild

Automatically invalidate stale cache entries when your search index updates:

JavaScript

import algoliasearch from 'algoliasearch';

const appID = "ALGOLIA_APPLICATION_ID";
const apiKey = "ALGOLIA_API_KEY";

const client = algoliasearch(appID, apiKey);

async function invalidateCacheBeforeIndexUpdate(agentId, indexName) {
  // Retrieve index metadata and last update timestamp
  const { items } = await client.listIndices();
  const index = items.find(i => i.name === indexName);

  if (!index) throw new Error(`Index ${indexName} not found`);

  // Convert to YYYY-MM-DD format
  const before = index.updatedAt.split('T')[0];

  // Delete cache entries created before the index's last update date.
  // Entries created on or after that date are kept.
  const response = await fetch(
    `https://${appID}.algolia.net/agent-studio/1/agents/${agentId}/cache?before=${before}`,
    {
      method: 'DELETE',
      headers: {
        'x-algolia-application-id': appID,
        'x-algolia-api-key': apiKey,
      },
    }
  );

  return response.json(); // { deleted: 42 }
}

Required ACLs

This pattern requires an API key with:

listIndexes to read index metadata with listIndices().
editSettings to invalidate the agent cache.

Error handling

If the cache service is temporarily unavailable, you receive a 503 Service Unavailable error:

JSON

{
  "error": "Service temporarily unavailable",
  "status": 503
}

When this occurs:

Your invalidation request couldn’t be processed.
Try again after a short delay.
For production systems, consider adding a retry mechanism with increasing wait times between attempts.

Cache invalidation errors don’t affect normal agent operations. If the cache is unavailable, requests proceed without caching.

How cache freshness works

Cache keys include your agent’s configuration, so responses are automatically fresh when you change:

Agent instructions or system prompt
Model or provider settings
Tool configurations

You don’t need to manually invalidate the cache for these changes. Each configuration automatically creates new cache entries.

When to manually invalidate

Manually invalidate the cache only when your Algolia index data changes (for example, new products or updated content). The cache key includes tool configuration, but it doesn’t change when the records in your index change. After your search index updates, cached responses may still reference stale search results.

Monitoring

Track cache performance using the X-Cache response header. A high hit rate means fewer requests to the LLM and lower costs.

JavaScript

const response = await fetch(completionUrl, options);
const cacheStatus = response.headers.get('X-Cache');
console.log(`Cache: ${cacheStatus}`); // Values can be "HIT" or "MISS", depending on cache status
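
To turn individual header values into a hit rate, you could keep a small counter like the sketch below. `createCacheStats` and its method names are hypothetical. Requests without an `X-Cache` header (for example, follow-up messages) are counted separately and excluded from the rate.

```javascript
// Hypothetical in-memory counter for X-Cache header values.
function createCacheStats() {
  const counts = { hit: 0, miss: 0, uncached: 0 };
  return {
    // Record the value of response.headers.get('X-Cache'), which may be null.
    record(xCacheHeader) {
      if (xCacheHeader === 'HIT') counts.hit += 1;
      else if (xCacheHeader === 'MISS') counts.miss += 1;
      else counts.uncached += 1; // No header: the cache was bypassed
    },
    // Share of cache-eligible requests that were served from the cache.
    hitRate() {
      const eligible = counts.hit + counts.miss;
      return eligible === 0 ? 0 : counts.hit / eligible;
    },
    // Return a copy so callers can't mutate the internal counters.
    snapshot() {
      return { ...counts };
    },
  };
}
```

Call `stats.record(response.headers.get('X-Cache'))` after each completion request and report `stats.hitRate()` to your monitoring system at whatever interval suits your app.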

Troubleshooting

Verify caching

Check the response headers to confirm caching behavior:

JavaScript

const response = await fetch(completionUrl, options);

// Check cache status
const cacheStatus = response.headers.get('X-Cache');
console.log(`Cache status: ${cacheStatus}`); // "HIT" or "MISS"

// For more detailed information
const cacheStatusRFC = response.headers.get('Cache-Status');
console.log(`Cache details: ${cacheStatusRFC}`); // For example, "AgentStudio; hit"

Expected values:

X-Cache: HIT means the response was served from the cache.
X-Cache: MISS means the LLM generated a fresh response.

Debug all cache MISS responses

If you’re seeing only MISS responses when you expect cache hits:

Check retention settings. Ensure maxRetentionDays > 0
- Privacy mode (maxRetentionDays: 0) turns off caching

Verify useCache is enabled: check your agent configuration. The value should be true, which is the default.

JSON

{
  "config": {
    "useCache": true
  }
}

Confirm you’re not in playground: test agents in the playground bypass cache
Check query parameters: ensure you’re not sending ?cache=false
Check whether this is a follow-up message: caching only applies to the first message in a conversation. Any request with prior conversation history (2 or more messages) bypasses the cache entirely and won’t include X-Cache or Cache-Status headers.
Verify identical requests. Cache keys are sensitive to:
- Message content (normalized for whitespace)
- Agent configuration
- Tool outputs
- Date/time awareness settings
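
The first five checks in this list can be expressed as a small diagnostic function. This is a sketch for your own tooling: `findCacheBypassReasons` and its input shape are hypothetical, and it only mirrors the conditions described on this page. It doesn’t detect differences between requests, such as changed tool outputs or date awareness settings.

```javascript
// Hypothetical diagnostic: lists documented reasons why a request would bypass the cache.
// Input: the agent's config object, the value of the ?cache query parameter (if any),
// the number of messages in the request, and whether the request came from the playground.
function findCacheBypassReasons({ agentConfig = {}, cacheParam, messageCount, inPlayground = false }) {
  const reasons = [];
  if (agentConfig.maxRetentionDays === 0) {
    reasons.push('Privacy mode (maxRetentionDays: 0) turns off caching');
  }
  if (agentConfig.useCache === false) {
    reasons.push('useCache is false for this agent');
  }
  if (inPlayground) {
    reasons.push('The playground always bypasses the cache');
  }
  if (cacheParam === false || cacheParam === 'false') {
    reasons.push('The request was sent with ?cache=false');
  }
  if (messageCount > 1) {
    reasons.push('Follow-up messages bypass the cache');
  }
  return reasons;
}
```

An empty array means none of these settings explains the misses, so the next step is to compare the requests themselves for differences in content, configuration, or tool outputs.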

Handle cache service errors

If cache invalidation returns a 503 error:

JavaScript

async function invalidateCacheWithRetry(agentId, maxRetries = 3) {
  const appID = "ALGOLIA_APPLICATION_ID";
  const apiKey = "ALGOLIA_API_KEY";

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(
        `https://${appID}.algolia.net/agent-studio/1/agents/${agentId}/cache`,
        {
          method: 'DELETE',
          headers: {
            'x-algolia-application-id': appID,
            'x-algolia-api-key': apiKey,
          },
        }
      );

      if (response.status === 503) {
        if (attempt < maxRetries) {
          // Wait with exponential backoff
          await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
          continue;
        }
        throw new Error('Cache service unavailable after retries');
      }

      return await response.json();
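    } catch (error) {
      // Rethrow on the last attempt. Otherwise, back off before retrying.
      if (attempt === maxRetries) throw error;
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
    }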
  }
}

Fallback behavior: If the cache service is unavailable, Agent Studio continues processing requests normally without caching. Your users won’t experience interruptions, but responses won’t be cached until the service recovers.