You can configure Agent Studio at two levels: app-wide and per agent.
You update both through the API, and changes take effect immediately. Changes to an agent apply to its draft version until you publish it.
App settings
Configure app-wide behavior using the /configuration endpoint.
Data retention
Control how long Agent Studio retains your data:
curl -X PATCH "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/configuration" \
-H 'Content-Type: application/json' \
-H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
-H "x-algolia-api-key: $ALGOLIA_API_KEY" \
-d '{ "maxRetentionDays": 30 }'
This operation requires an API key with the logs ACL.
| Value | Effect |
|---|---|
| 90 (default) | Data retained for 90 days |
| 60 | Data retained for 60 days |
| 30 | Data retained for 30 days |
| 0 | Privacy mode (see below) |
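As a rough JavaScript counterpart to the curl call above, the sketch below builds the same PATCH request without sending it. The function name `buildRetentionRequest` is hypothetical, and treating the four values in the table as the only accepted ones is an assumption, not something the API is confirmed to enforce.

```javascript
// Retention values listed in the table above. Treating them as the only
// accepted values is an assumption; check the API reference.
const RETENTION_DAYS = [0, 30, 60, 90];

// Builds (but does not send) the PATCH request for the /configuration endpoint.
function buildRetentionRequest(appId, apiKey, maxRetentionDays) {
  if (!RETENTION_DAYS.includes(maxRetentionDays)) {
    throw new RangeError(
      `maxRetentionDays must be one of ${RETENTION_DAYS.join(", ")}`
    );
  }
  return {
    url: `https://${appId}.algolia.net/agent-studio/1/configuration`,
    options: {
      method: "PATCH",
      headers: {
        "Content-Type": "application/json",
        "x-algolia-application-id": appId,
        "x-algolia-api-key": apiKey, // key needs the logs ACL
      },
      body: JSON.stringify({ maxRetentionDays }),
    },
  };
}

// Usage:
// const { url, options } = buildRetentionRequest(appId, apiKey, 30);
// const response = await fetch(url, options);
```

Separating request construction from sending keeps the validation easy to test without network access.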
Data affected by retention settings
| Data | Behavior |
|---|---|
| Completion cache | Cached responses expire after the retention period |
| Conversations | Conversation history deleted after the retention period |
| Messages | Message content deleted after the retention period |
Privacy mode (maxRetentionDays: 0)
When set to 0, Agent Studio operates in privacy mode:
- Completion caching is turned off (every request calls the LLM)
- Conversation metadata is saved, but message content isn’t stored
- Ideal for strict data privacy requirements
In privacy mode, the agent only sees the messages your client sends in each completion request.
To preserve context across turns, include the full message history every time.
Sending only the latest user message makes each request stateless.
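Because privacy mode leaves the history with the client, one way to handle it is a small client-side thread object, sketched below. The helper `createLocalThread` is hypothetical, and the `{ messages: [{ role, content }] }` body shape is an assumption about the completion request format; check the API reference for the exact schema.

```javascript
// Keeps the conversation on the client, since privacy mode stores no message content.
function createLocalThread() {
  const messages = [];
  return {
    // Adds the user's message and returns a request body with the full history.
    // The { messages: [...] } shape with role/content fields is an assumption.
    nextRequestBody(userText) {
      messages.push({ role: "user", content: userText });
      return { messages: messages.map((m) => ({ ...m })) };
    },
    // Records the assistant's reply so the next request includes it.
    recordReply(assistantText) {
      messages.push({ role: "assistant", content: assistantText });
    },
  };
}
```

Each call to `nextRequestBody` returns a copy of the history, so a body that was already sent isn't changed by later turns.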
Conversation history
Conversations are stored automatically, according to your retention settings. Each conversation gets an auto-generated title based on its content.
What’s stored:
- Conversation metadata (ID, timestamps, user token)
- Message content (user queries, assistant responses, tool calls)
- Auto-generated titles for browsing
For GDPR compliance, users can export or delete their data with the
GET /user-data/{userToken} and DELETE /user-data/{userToken} endpoints.
For more information, see the API reference.
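A minimal sketch of building these two requests follows. It assumes the user-data endpoints share the same base URL as the other Agent Studio endpoints on this page; the function name `buildUserDataRequest` and the `action` parameter are hypothetical.

```javascript
// Builds (but does not send) an export or delete request for one user's data.
// Assumption: /user-data lives under the same base path as /configuration.
function buildUserDataRequest(appId, apiKey, userToken, action) {
  if (action !== "export" && action !== "delete") {
    throw new Error(`Unknown action: ${action}`);
  }
  // Encode the token so characters such as "/" can't change the path.
  const token = encodeURIComponent(userToken);
  return {
    url: `https://${appId}.algolia.net/agent-studio/1/user-data/${token}`,
    options: {
      method: action === "export" ? "GET" : "DELETE",
      headers: {
        "x-algolia-application-id": appId,
        "x-algolia-api-key": apiKey,
      },
    },
  };
}
```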
Agent settings
Configure individual agents using the /agents/{agentId} endpoint.
Agent properties
| Property | Type | Description |
|---|---|---|
| name | string | Display name (1-128 chars) |
| description | string | Optional description |
| providerId | UUID | LLM provider credentials |
| model | string | Model identifier. For example, gpt-5, gemini-2.5-pro |
| instructions | string | System prompt |
| config | object | Feature flags and settings |
| tools | array | Algolia search and custom tools |
Update agent settings
Update any property without affecting others:
curl -X PATCH "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID" \
-H 'Content-Type: application/json' \
-H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
-H "x-algolia-api-key: $ALGOLIA_API_KEY" \
-d '{ "instructions": "You are a helpful shopping assistant." }'
This operation requires an API key with the editSettings ACL.
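Since a PATCH only needs the properties you want to change, a small helper can guard against typos before the request goes out. The sketch below is hypothetical (`buildAgentPatch` isn't part of any SDK) and only checks the property names and the name length listed in the table above.

```javascript
// Property names from the agent properties table.
const AGENT_PROPERTIES = [
  "name", "description", "providerId", "model",
  "instructions", "config", "tools",
];

// Returns a PATCH body holding only the properties being changed.
function buildAgentPatch(changes) {
  const body = {};
  for (const [key, value] of Object.entries(changes)) {
    if (!AGENT_PROPERTIES.includes(key)) {
      throw new Error(`Unknown agent property: ${key}`);
    }
    body[key] = value;
  }
  // The table gives 1-128 characters for the display name.
  if ("name" in body) {
    const ok = typeof body.name === "string" &&
      body.name.length >= 1 && body.name.length <= 128;
    if (!ok) throw new RangeError("name must be a string of 1-128 characters");
  }
  return body;
}

// Usage: JSON.stringify(buildAgentPatch({ instructions: "..." })) as the PATCH body.
```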
Configuration options
The config object controls agent behavior:
| Option | Type | Default | Description |
|---|---|---|---|
| sendUsage | boolean | false | Include token usage in response |
| sendReasoning | boolean | false | Include model reasoning (if supported) |
| useCache | boolean | true | Enable response caching |
| features | array | [] | Experimental features |
| suggestions | object | null | Prompt suggestions (see below) |
| max_tokens | integer | 0 | Cap on output tokens per LLM call (see Cost control) |
| max_iterations | integer | 50 | Maximum tool or reasoning loops per request |
| thread_depth | object | null | Conversation length limit |
| rate_limit | object | null | Per-agent and per-IP request limits (see Rate limiting) |
Prompt suggestions
Generate contextual follow-up questions after each agent response. Suggestions help users discover capabilities and continue conversations naturally.
{
  "config": {
    "suggestions": {
      "enabled": true,
      "model": "gpt-5-mini"
    }
  }
}
When enabled, the agent streams a suggestions-chunk after the main response:
{
  "type": "suggestions-chunk",
  "suggestions": ["How do I filter by price?", "Show me trending products", "What categories are available?"]
}
Configuration options
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable prompt suggestions |
| model | string | Agent’s model | Model for generating suggestions |
| system_prompt | string | Built-in | Custom prompt for suggestion generation |
Generation settings (suggestions.generation):
| Option | Range | Default | Description |
|---|---|---|---|
| max_count | 1-5 | 3 | Number of suggestions |
| max_words | 5-15 | 8 | Max words per suggestion |
| timeout_seconds | 1-30 | 10 | Timeout for generation |
Context settings (suggestions.context):
| Option | Range | Default | Description |
|---|---|---|---|
| max_messages | 1-50 | 10 | Conversation history to include |
| include_tool_outputs | - | false | Include tool results in context |
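The ranges in the generation and context tables can be checked on the client before sending a PATCH. The sketch below is one way to do that; `validateSuggestions` is a hypothetical helper, and the tables don't say whether fractional values are accepted, so it only checks that each value is a number within range.

```javascript
// Allowed ranges, copied from the generation and context tables.
const SUGGESTION_RANGES = {
  generation: { max_count: [1, 5], max_words: [5, 15], timeout_seconds: [1, 30] },
  context: { max_messages: [1, 50] },
};

// Returns a list of error strings; an empty list means every range check passed.
function validateSuggestions(suggestions) {
  const errors = [];
  for (const [group, fields] of Object.entries(SUGGESTION_RANGES)) {
    for (const [field, [min, max]] of Object.entries(fields)) {
      const value = suggestions?.[group]?.[field];
      if (value === undefined) continue; // omitted fields fall back to defaults
      if (typeof value !== "number" || !(value >= min && value <= max)) {
        errors.push(`${group}.${field} must be a number from ${min} to ${max}`);
      }
    }
  }
  return errors;
}
```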
Client-side handling
With AI SDK:
import { useChat } from '@ai-sdk/react';

function Chat() {
  const { messages, data } = useChat({ /* ... */ });

  // Suggestions arrive in the data stream
  const suggestions = data?.find(d => d.type === 'suggestions-chunk')?.suggestions;

  return (
    <>
      {/* Chat messages */}
      {suggestions && (
        <div className="suggestions">
          {suggestions.map(s => <button key={s}>{s}</button>)}
        </div>
      )}
    </>
  );
}
Use a faster, cheaper model (like gpt-5-mini) for suggestions. They don’t need the same reasoning depth as the main response.
Cost control
Cost control settings limit these sources of token usage:
output per call, the number of reasoning or tool loops, and conversation history size.
{
  "config": {
    "max_tokens": 1500,
    "max_iterations": 20,
    "thread_depth": {
      "max_messages": 100
    }
  }
}
| Option | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | 0 | Maximum output tokens per LLM call. 0 uses the model or provider default. |
| max_iterations | integer | 50 | Maximum tool or reasoning loops per request. Each loop is a separate LLM call. |
| thread_depth.max_messages | integer | null | Maximum messages (user and assistant) in a conversation. The API rejects new requests when a conversation exceeds this limit. |
For example, to update an agent’s cost control settings:
curl -X PATCH "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID" \
-H 'Content-Type: application/json' \
-H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
-H "x-algolia-api-key: $ALGOLIA_API_KEY" \
-d '{ "config": { "max_tokens": 1500, "max_iterations": 20, "thread_depth": { "max_messages": 100 } } }'
Default values and no-limit behavior
Each cost control setting has either a default value or no limit:
- max_tokens: 0 or omitted uses the model or provider default.
- max_iterations: 0 or omitted uses the default of 50.
- thread_depth.max_messages: null, 0, or omitted means the conversation has no message limit.
Each iteration is billed as a separate LLM call.
Lower max_iterations if your agent doesn’t need long tool chains.
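To see how these defaults combine, the sketch below resolves a config object into effective limits and a rough upper bound on output tokens per request (max_tokens multiplied by max_iterations). `resolveCostControl` is a hypothetical helper that mirrors the rules above on the client; it isn't how the service itself is implemented, and the bound ignores input tokens.

```javascript
// Resolves cost control settings to their effective values.
// null means "no explicit limit" (model/provider default or unlimited).
function resolveCostControl(config = {}) {
  const maxTokens = config.max_tokens || null;       // 0 or omitted: provider default
  const maxIterations = config.max_iterations || 50; // 0 or omitted: default of 50
  const maxMessages = config.thread_depth?.max_messages || null; // null, 0, omitted: no limit
  return {
    maxTokens,
    maxIterations,
    maxMessages,
    // Upper bound on output tokens for one request if every loop hits the cap.
    worstCaseOutputTokens: maxTokens === null ? null : maxTokens * maxIterations,
  };
}
```

With the example config above (1500 tokens, 20 iterations), the bound is 30,000 output tokens per request; without max_tokens, no bound can be computed from the config alone.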
Rate limiting
Limit how often clients can call an agent’s /completions endpoint.
You can configure two independent rate limits:
- Per-agent: maximum requests an agent can receive within a time interval
- Per-IP: maximum requests a client IP can make to an agent within a time interval
When a limit is exceeded, the API returns a 429 response.
{
  "config": {
    "rate_limit": {
      "agent": {
        "enabled": true,
        "max_requests": 100,
        "window_seconds": 60
      },
      "ip": {
        "enabled": true,
        "max_requests": 300,
        "window_seconds": 60
      }
    }
  }
}
rate_limit.agent
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | If you set any fields on this layer, enabled defaults to true. If you set enabled: false, there’s no request limit. |
| max_requests | integer | none | Maximum requests allowed per time interval (minimum is 1). Required when this rate limit is enabled. |
| window_seconds | integer | 60 | Time interval in seconds. Must be 30 or 60. |
rate_limit.ip
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true (when any field is set) | If false, this layer is unlimited. |
| max_requests | integer | none | Maximum requests allowed per IP per time interval (minimum is 1). Required when this rate limit is enabled. |
| window_seconds | integer | 60 | Time interval in seconds. Must be 30 or 60. |
For example, to update an agent’s rate limit settings:
curl -X PATCH "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID" \
-H 'Content-Type: application/json' \
-H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
-H "x-algolia-api-key: $ALGOLIA_API_KEY" \
-d '{ "config": { "rate_limit": { "agent": { "max_requests": 50, "window_seconds": 60 } } } }'
Default behavior and when limits don’t apply
You can configure the agent and IP rate limits independently:
- If you omit rate_limit, the API doesn’t enforce agent or IP request limits.
- To turn off either the agent or IP rate limit, set enabled: false for that limit.
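The rules for each rate limit layer can be expressed as a small resolver, sketched below. The helpers `resolveRateLimitLayer` and `resolveRateLimit` are hypothetical and reflect one reading of the tables above; in particular, treating an empty layer object as "not enforced" is an assumption.

```javascript
// Resolves one layer (agent or ip) to its effective behavior.
function resolveRateLimitLayer(layer) {
  // Omitted layer: no limit. Treating an empty object the same way is an assumption.
  if (layer == null || Object.keys(layer).length === 0) return { enforced: false };
  const enabled = layer.enabled ?? true; // defaults to true once any field is set
  if (!enabled) return { enforced: false };
  if (!Number.isInteger(layer.max_requests) || layer.max_requests < 1) {
    throw new RangeError("max_requests is required and must be an integer of at least 1");
  }
  const windowSeconds = layer.window_seconds ?? 60;
  if (windowSeconds !== 30 && windowSeconds !== 60) {
    throw new RangeError("window_seconds must be 30 or 60");
  }
  return { enforced: true, maxRequests: layer.max_requests, windowSeconds };
}

// Resolves both layers independently.
function resolveRateLimit(rateLimit) {
  return {
    agent: resolveRateLimitLayer(rateLimit?.agent),
    ip: resolveRateLimitLayer(rateLimit?.ip),
  };
}
```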
429 response
When a limit is exceeded, the API returns:
{
  "error": "TOO_MANY_REQUESTS",
  "message": "Rate limit exceeded. Retry after 60 seconds."
}
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current time interval |
| X-RateLimit-Remaining | Remaining requests in the current time interval |
| Retry-After | Seconds until the current time interval resets |
On successful responses, X-RateLimit-Limit and X-RateLimit-Remaining reflect the configured per-agent limit.
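On the client, the Retry-After header can drive a simple backoff. The sketch below computes how long to wait after a 429; `retryDelayMs` is a hypothetical helper, and the 60-second fallback is an assumption based on the longest allowed time interval.

```javascript
// Returns the wait time in milliseconds after a 429, or null for other statuses.
// `headers` is any object with a get(name) method, such as a fetch Response's headers.
function retryDelayMs(status, headers, fallbackSeconds = 60) {
  if (status !== 429) return null;
  const raw = headers.get("retry-after");
  // A missing or empty header must not be read as "0 seconds".
  const seconds = raw == null || String(raw).trim() === "" ? NaN : Number(raw);
  const valid = Number.isFinite(seconds) && seconds >= 0;
  return (valid ? seconds : fallbackSeconds) * 1000;
}

// Usage:
// const delay = retryDelayMs(response.status, response.headers);
// if (delay !== null) await new Promise((resolve) => setTimeout(resolve, delay));
```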
Publish workflow
Agents have two states:
- Draft: test changes in preview.
- Published: live for API consumers.
curl -X POST "https://$ALGOLIA_APPLICATION_ID.algolia.net/agent-studio/1/agents/$AGENT_ID/publish" \
-H "x-algolia-application-id: $ALGOLIA_APPLICATION_ID" \
-H "x-algolia-api-key: $ALGOLIA_API_KEY"
When you make changes to an agent using the PATCH /agents/{agentId} endpoint,
you’re modifying the draft version of the agent.
These changes aren’t visible to API consumers until you publish the agent using the POST /agents/{agentId}/publish endpoint.
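Putting the two endpoints together, a typical flow is to PATCH the draft and then publish it. The sketch below builds both requests in order without sending them; `buildUpdateAndPublish` is a hypothetical helper, and error handling between the two calls is left to the caller.

```javascript
// Builds the PATCH (draft update) and POST (publish) requests, in order.
function buildUpdateAndPublish(appId, apiKey, agentId, changes) {
  const base =
    `https://${appId}.algolia.net/agent-studio/1/agents/${encodeURIComponent(agentId)}`;
  const auth = {
    "x-algolia-application-id": appId,
    "x-algolia-api-key": apiKey,
  };
  return [
    {
      // Step 1: modifies the draft only; API consumers don't see this yet.
      url: base,
      options: {
        method: "PATCH",
        headers: { ...auth, "Content-Type": "application/json" },
        body: JSON.stringify(changes),
      },
    },
    {
      // Step 2: makes the draft live.
      url: `${base}/publish`,
      options: { method: "POST", headers: auth },
    },
  ];
}

// Usage:
// for (const { url, options } of buildUpdateAndPublish(appId, apiKey, agentId, changes)) {
//   const response = await fetch(url, options);
//   if (!response.ok) throw new Error(`Request failed: ${response.status}`);
// }
```

Stopping on the first failed response avoids publishing a draft whose update didn't go through.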
See also