curl --request GET \
  --url https://crawler.algolia.com/api/1/crawlers/{id}/config/versions/{version} \
  --header 'Authorization: Basic <encoded-value>'

{
  "version": 2,
  "config": {
    "actions": [
      {
        "indexName": "algolia_website",
        "recordExtractor": {
          "__type": "function",
          "source": "<string>"
        },
        "autoGenerateObjectIDs": true,
        "cache": {
          "enabled": true
        },
        "discoveryPatterns": [
          "https://www.algolia.com/**"
        ],
        "fileTypesToMatch": [
          "html",
          "pdf"
        ],
        "hostnameAliases": {
          "dev.example.com": "example.com"
        },
        "name": "<string>",
        "pathAliases": {
          "example.com": {
            "/foo": "/bar"
          }
        },
        "pathsToMatch": [
          "https://www.algolia.com/**"
        ],
        "schedule": "<string>",
        "selectorsToMatch": [
          ".products",
          "!.featured"
        ]
      }
    ],
    "appId": "<string>",
    "rateLimit": 4,
    "apiKey": "<string>",
    "exclusionPatterns": [
      "https://www.example.com/excluded",
      "!https://www.example.com/this-one-url",
      "https://www.example.com/exclude/**"
    ],
    "externalData": [
      "testCSV"
    ],
    "extraUrls": [
      "<string>"
    ],
    "ignoreCanonicalTo": true,
    "ignoreNoFollowTo": true,
    "ignoreNoIndex": true,
    "ignorePaginationAttributes": true,
    "ignoreQueryParams": [
      "ref",
      "utm_*"
    ],
    "ignoreRobotsTxtRules": true,
    "indexPrefix": "crawler_",
    "initialIndexSettings": {},
    "linkExtractor": {
      "__type": "function",
      "source": "({ $, url, defaultExtractor }) => {\n if (/example.com\\/doc\\//.test(url.href)) {\n // For all pages under `/doc`, only extract the first found URL.\n return defaultExtractor().slice(0, 1)\n }\n // For all other pages, use the default.\n return defaultExtractor()\n}\n"
    },
    "login": {
      "url": "https://example.com/secure/login-with-post",
      "requestOptions": {
        "method": "POST",
        "headers": {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        "body": "id=my-id&password=my-password",
        "timeout": 5000
      }
    },
    "maxDepth": 5,
    "maxUrls": 250,
    "renderJavaScript": true,
    "requestOptions": {
      "proxy": "<string>",
      "timeout": 30000,
      "retries": 3,
      "headers": {
        "Accept-Language": "fr-FR",
        "Authorization": "Bearer Aerehdf==",
        "Cookie": "session=1234"
      }
    },
    "safetyChecks": {
      "beforeIndexPublishing": {
        "maxLostRecordsPercentage": 10,
        "maxFailedUrls": 123
      }
    },
    "saveBackup": true,
    "schedule": "every weekday at 12:00 pm",
    "sitemaps": [
      "https://example.com/sitemap.xyz"
    ],
    "startUrls": [
      "https://www.example.com"
    ]
  },
  "createdAt": "2023-07-04T12:49:15Z",
  "authorId": "7d79f0dd-2dab-4296-8098-957a1fdc0637"
}

Retrieves the specified version of the crawler configuration.
You can use this to restore a previous version of the configuration.
Basic authentication header of the form Basic <encoded-value>, where <encoded-value> is the base64-encoded string username:password.
Crawler ID. Universally unique identifier (UUID) of the crawler.
"e0f6db8a-24f5-4092-83a4-1b2c6cb6d809"
This crawler's version number.
OK
Version of the configuration. Version 1 is the initial configuration you used when creating the crawler.
x >= 1

Crawler configuration.
A list of actions.
1 - 30 elements
Reference to the index used to store the action's extracted records.
indexName is combined with the prefix you specified in indexPrefix.
Maximum length: 256
Example: "algolia_website"
Function for extracting information from a crawled page and transforming it into Algolia records for indexing.
The Crawler has an editor with autocomplete and validation to help you update the recordExtractor.
For details, see the recordExtractor documentation.
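As an illustration only, a minimal recordExtractor might look like the sketch below. At runtime the crawler supplies `url` and a Cheerio-like `$`; the mock `$` here exists only to make the sketch runnable and is not part of the API.

```javascript
// Hypothetical recordExtractor sketch: one Algolia record per crawled page.
// `url` and `$` are supplied by the crawler at runtime.
const recordExtractor = ({ url, $ }) => [
  {
    objectID: url.href, // stable, unique ID per page
    title: $("head > title").text(),
  },
];

// Mock stand-in for the crawler-provided, Cheerio-like `$`, just for this demo:
const mock$ = () => ({ text: () => "Example page" });
const records = recordExtractor({
  url: { href: "https://example.com/" },
  $: mock$,
});
console.log(records[0].objectID); // https://example.com/
```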
Whether to generate an objectID for records that don't have one.
Whether the crawler should cache crawled pages.
For more information, see Partial crawls with caching.
Whether the crawler cache is active.
Which intermediary web pages the crawler should visit.
Use discoveryPatterns to define pages that should be visited just for their links to other pages,
not their content.
It functions similarly to the pathsToMatch action but without record extraction.
Use micromatch for negation, wildcards, and more.
File types for crawling non-HTML documents.
Maximum: 100
For more information, see Extract data from non-HTML documents.
Available options: doc, email, html, odp, ods, odt, pdf, ppt, xls
Example: ["html", "pdf"]

Key-value pairs to replace matching hostnames found in a sitemap, on a page, in canonical links, or redirects.
During a crawl, this action maps one hostname to another whenever the crawler encounters specific URLs.
This helps with links to staging environments (like dev.example.com) or external hosting services (such as YouTube).
For example, with this hostnameAliases mapping:
{
hostnameAliases: {
'dev.example.com': 'example.com'
}
}

The crawler encounters https://dev.example.com/solutions/voice-search/.
hostnameAliases transforms the URL to https://example.com/solutions/voice-search/.
The crawler follows the transformed URL (not the original).
hostnameAliases only changes URLs, not page text. In the preceding example, if the extracted text contains the string dev.example.com, it remains unchanged.
The crawler can discover URLs in places such as:
Crawled pages
Sitemaps
Redirects.
However, hostnameAliases doesn't transform URLs you explicitly set in the startUrls or sitemaps parameters,
nor does it affect the pathsToMatch action or other configuration elements.
Hostname to use in the records instead of the matched hostname.
{ "dev.example.com": "example.com" }

Unique identifier for the action. This option is required if schedule is set.
Key-value pairs to replace matching paths with new values.
It doesn't replace:
startUrls, sitemaps, pathsToMatch, and other settings.
The crawl continues from the transformed URLs.
For example, if you create a mapping for { "dev.example.com": { '/foo': '/bar' } } and the crawler encounters https://dev.example.com/foo/hello/,
it’s transformed to https://dev.example.com/bar/hello/.
Compare with the hostnameAliases action.

{ "example.com": { "/foo": "/bar" } }

URLs to which this action should apply.
Uses micromatch for negation, wildcards, and more.
1 - 100 elements
Use micromatch for negation, wildcards, and more.
How often to perform a complete crawl for this action.
For more information, see the schedule parameter documentation.
DOM selectors for nodes that must be present on the page to be processed. If the page doesn't match any of the selectors, it's ignored.
Maximum: 100
Prefix a selector with ! to ignore matching pages.
[".products", "!.featured"]

Algolia application ID where the crawler creates and updates indices.
Determines the number of concurrent tasks per second that can run for this configuration.
A higher rate limit means more crawls per second. Algolia prevents system overload by ensuring the number of URLs added in the last second and the number of URLs being processed is less than the rate limit:
max(new_urls_added, active_urls_processing) <= rateLimit

Start with a low value (for example, 2) and increase it if you need faster crawling.
Be aware that a high rateLimit can have a huge impact on bandwidth cost and server resource consumption.
The number of pages processed per second depends on the average time it takes to fetch, process, and upload a URL.
For a given rateLimit, if fetching, processing, and uploading a URL takes on average:
One second, the crawler processes up to rateLimit pages per second.
Four seconds, the crawler processes up to rateLimit / 4 pages per second.
In the latter case, increasing rateLimit improves performance, up to a point.
However, if the processing time remains at four seconds, increasing rateLimit won't increase the number of pages processed per second.

1 <= x <= 100
Example: 4
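To make the relationship concrete, here's a back-of-the-envelope sketch (illustrative arithmetic only, not part of the API) of the throughput bound described above:

```javascript
// Rough estimate implied by the description: with up to rateLimit URLs in
// flight at once, throughput is capped both by the rate limit itself and by
// concurrency divided by the average seconds spent per URL.
function estimatedPagesPerSecond(rateLimit, avgSecondsPerUrl) {
  return Math.min(rateLimit, rateLimit / avgSecondsPerUrl);
}

console.log(estimatedPagesPerSecond(8, 4)); // 2: slow URLs quarter the throughput
console.log(estimatedPagesPerSecond(8, 0.5)); // 8: fast URLs are bounded by the rate limit
```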
The Algolia API key the crawler uses for indexing records. If you don't provide an API key, one will be generated by the Crawler when you create a configuration.
The API key must have:
These access control lists (ACLs): search, addObject, deleteObject, deleteIndex, settings, editSettings, listIndexes, browse.
Access to indices matching your indexPrefix. For example, if the prefix is crawler_, the API key must have access to crawler_*.
Don't use your Admin API key.
URLs to exclude from crawling.
Maximum: 100
Use micromatch for negation, wildcards, and more.
[
"https://www.example.com/excluded",
"!https://www.example.com/this-one-url",
"https://www.example.com/exclude/**"
]

References to external data sources for enriching the extracted records.
Maximum: 10
For more information, see Enrich extracted records with external data.
The Crawler treats extraUrls the same as startUrls.
Specify extraUrls if you want to differentiate URLs you manually added to fix site crawling from those you initially specified in startUrls.
Maximum: 9999

Determines if the crawler should extract records from a page with a canonical URL.
If ignoreCanonicalTo is true, all canonical URLs are ignored.

Determines if the crawler should follow links with a nofollow directive.
If true, the crawler will ignore the nofollow directive and crawl links on the page.
The crawler always ignores links that don't match your configuration settings.
ignoreNoFollowTo applies to:
Links on pages whose robots meta tag contains nofollow or none.
Links with a rel attribute containing the nofollow directive.

Whether to ignore the noindex robots meta tag.
If true, pages with this meta tag will be crawled.
Whether the crawler should follow rel="prev" and rel="next" pagination links in the <head> section of an HTML page.
If true, the crawler ignores the pagination links.
If false, the crawler follows the pagination links.

Query parameters to ignore while crawling.
All URLs with the matching query parameters are treated as identical. This prevents indexing URLs that just differ by their query parameters.
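As a sketch of that matching behavior (not the crawler's actual implementation), ignored parameters can be thought of as being stripped before URLs are compared; the helper name below is hypothetical:

```javascript
// Illustrative only: strip query parameters matching ignoreQueryParams-style
// patterns, where a trailing "*" acts as a wildcard.
function stripIgnoredParams(href, patterns) {
  const url = new URL(href);
  for (const name of [...url.searchParams.keys()]) {
    const ignored = patterns.some((p) =>
      p.endsWith("*") ? name.startsWith(p.slice(0, -1)) : name === p
    );
    if (ignored) url.searchParams.delete(name);
  }
  return url.toString();
}

console.log(
  stripIgnoredParams("https://example.com/page?id=1&ref=x&utm_source=y", ["ref", "utm_*"])
);
// https://example.com/page?id=1
```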
Maximum: 9999
Use wildcards to match multiple query parameters.
["ref", "utm_*"]

Whether to ignore rules defined in your robots.txt file.
A prefix for all indices created by this crawler. It's combined with the indexName for each action to form the complete index name.
Maximum length: 64
Example: "crawler_"
Crawler index settings.
These index settings are only applied during the first crawl of an index.
Any subsequent changes won't be applied to the index. Instead, make changes to your index settings in the Algolia dashboard.
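A sketch of what this object can look like, assuming it's keyed by the full index name; the index name and settings below are hypothetical:

```json
{
  "initialIndexSettings": {
    "crawler_algolia_website": {
      "searchableAttributes": ["title", "content"],
      "attributesForFaceting": ["category"]
    }
  }
}
```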
Index settings.
Attributes used for faceting.
Facets are attributes that let you categorize search results. They can be used for filtering search results. By default, no attribute is used for faceting. Attribute names are case-sensitive.
Modifiers
filterOnly("ATTRIBUTE").
Allows the attribute to be used as a filter but doesn't evaluate the facet values.
searchable("ATTRIBUTE").
Allows searching for facet values.
afterDistinct("ATTRIBUTE").
Evaluates the facet count after deduplication with distinct.
This ensures accurate facet counts.
You can apply this modifier to searchable facets: afterDistinct(searchable(ATTRIBUTE)).
[
"author",
"filterOnly(isbn)",
"searchable(edition)",
"afterDistinct(category)",
"afterDistinct(searchable(publisher))"
]

Creates replica indices.
Replicas are copies of a primary index with the same records but different settings, synonyms, or rules. If you want to offer a different ranking or sorting of your search results, you'll use replica indices. All index operations on a primary index are automatically forwarded to its replicas. To add a replica index, you must provide the complete set of replicas to this parameter. If you omit a replica from this list, the replica turns into a regular, standalone index that will no longer be synced with the primary index.
Modifier
virtual("REPLICA").
Create a virtual replica.
Virtual replicas don't increase the number of records and are optimized for Relevant sorting.

[
"virtual(prod_products_price_asc)",
"dev_products_replica"
]

Maximum number of search results that can be obtained through pagination.
Higher pagination limits might slow down your search. For pagination limits above 1,000, the sorting of results beyond the 1,000th hit can't be guaranteed.
x <= 20000
Example: 100
Attributes that can't be retrieved at query time.
This can be useful if you want to use an attribute for ranking or to restrict access, but don't want to include it in the search results. Attribute names are case-sensitive.
["total_sales"]

Creates a list of words which require exact matches. This also turns off word splitting and concatenation for the specified words.
["wheel", "1X2BCD"]

Attributes for which you want to support Japanese transliteration.
Transliteration supports searching in any of the Japanese writing systems. To support transliteration, you must set the indexing language to Japanese. Attribute names are case-sensitive.
["name", "description"]

Attributes for which to split camel case words. Attribute names are case-sensitive.
["description"]

Searchable attributes to which Algolia should apply word segmentation (decompounding). Attribute names are case-sensitive.
Compound words are formed by combining two or more individual words, and are particularly prevalent in Germanic languages—for example, "firefighter". With decompounding, the individual components are indexed separately.
You can specify different lists for different languages.
Decompounding is supported for these languages:
Dutch (nl), German (de), Finnish (fi), Danish (da), Swedish (sv), and Norwegian (no).
Decompounding doesn't work for words with non-spacing mark Unicode characters.
For example, Gartenstühle won't be decompounded if the ü consists of u (U+0075) and ◌̈ (U+0308).
{ "de": ["name"] }

Languages for language-specific processing steps, such as word detection and dictionary settings.
You should always specify an indexing language.
If you don't specify an indexing language, the search engine uses all supported languages,
or the languages you specified with the ignorePlurals or removeStopWords parameters.
This can lead to unexpected search results.
For more information, see Language-specific configuration.
ISO code for a supported language.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Example: ["ja"]

Searchable attributes for which you want to turn off prefix matching. Attribute names are case-sensitive.
["sku"]

Whether arrays with exclusively non-negative integers should be compressed for better performance. If true, the compressed arrays may be reordered.
Numeric attributes that can be used as numerical filters. Attribute names are case-sensitive.
By default, all numeric attributes are available as numerical filters. For faster indexing, reduce the number of numeric attributes.
To turn off filtering for all numeric attributes, specify an attribute that doesn't exist in your index, such as NO_NUMERIC_FILTERING.
Modifier
equalOnly("ATTRIBUTE").
Support only filtering based on equality comparisons = and !=.

["equalOnly(quantity)", "popularity"]

Control which non-alphanumeric characters are indexed.
By default, Algolia ignores non-alphanumeric characters like hyphen (-), plus (+), and parentheses ((,)).
To include such characters, define them with separatorsToIndex.
Separators are all non-letter characters except spaces and currency characters, such as $€£¥.
With separatorsToIndex, Algolia treats separator characters as separate words.
For example, in a search for "Disney+", Algolia considers "Disney" and "+" as two separate words.
"+#"
Attributes used for searching. Attribute names are case-sensitive.
By default, all attributes are searchable and the Attribute ranking criterion is turned off.
With a non-empty list, Algolia only returns results with matches in the selected attributes.
In addition, the Attribute ranking criterion is turned on: matches in attributes that are higher in the list of searchableAttributes rank first.
To make matches in two attributes rank equally, include them in a comma-separated string, such as "title,alternate_title".
Attributes with the same priority are always unordered.
For more information, see Searchable attributes.
Modifier
unordered("ATTRIBUTE").
Ignore the position of a match within the attribute.
Without a modifier, matches at the beginning of an attribute rank higher than matches at the end.
[
"title,alternative_title",
"author",
"unordered(text)",
"emails.personal"
]

An object with custom data.
You can store up to 32kB as custom data.
{
"settingID": "f2a7b51e3503acc6a39b3784ffb84300",
"pluginVersion": "1.6.0"
}

Characters and their normalized replacements. This overrides Algolia's default normalization.
{ "default": { "ä": "ae", "ü": "ue" } }

Attribute that should be used to establish groups of results. Attribute names are case-sensitive.
All records with the same value for this attribute are considered a group.
You can combine attributeForDistinct with the distinct search parameter to control
how many items per group are included in the search results.
If you want to use the same attribute also for faceting, use the afterDistinct modifier of the attributesForFaceting setting.
This applies faceting after deduplication, which will result in accurate facet counts.
"url"
Maximum number of facet values to return when searching for facet values.
x <= 100

Characters for which diacritics should be preserved.
By default, Algolia removes diacritics from letters.
For example, é becomes e. If this causes issues in your search,
you can specify characters that should keep their diacritics.
"øé"
Attributes to use as custom ranking. Attribute names are case-sensitive.
The custom ranking attributes decide which items are shown first if the other ranking criteria are equal.
Records with missing values for your selected custom ranking attributes are always sorted last. Boolean attributes are sorted based on their alphabetical order.
Modifiers
asc("ATTRIBUTE").
Sort the index by the values of an attribute, in ascending order.
desc("ATTRIBUTE").
Sort the index by the values of an attribute, in descending order.
If you use two or more custom ranking attributes, reduce the precision of your first attributes, or the other attributes will never be applied.
["desc(popularity)", "asc(price)"]

Attributes to include in the API response. To reduce the size of your response, you can retrieve only some of the attributes. Attribute names are case-sensitive.
* retrieves all attributes, except attributes included in the customRanking and unretrievableAttributes settings.
To retrieve all attributes except a specific one, combine it with *: ["*", "-ATTRIBUTE"].
The objectID attribute is always included.

["author", "title", "content"]

Determines the order in which Algolia returns your results.
By default, each entry corresponds to a ranking criterion. The tie-breaking algorithm sequentially applies each criterion in the order they're specified. If you configure a replica index for sorting by an attribute, put the sorting attribute at the top of the list.
Modifiers
asc("ATTRIBUTE").
Sort the index by the values of an attribute, in ascending order.
desc("ATTRIBUTE").
Sort the index by the values of an attribute, in descending order.

Before you modify the default setting, you should test your changes in the dashboard, and by A/B testing.
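For instance, a replica used for sorting by price could put the sort modifier first, followed by the default ranking criteria; the attribute name here is hypothetical:

```json
{
  "ranking": [
    "desc(price)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
}
```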
Relevancy threshold below which less relevant results aren't included in the results.
You can only set relevancyStrictness on virtual replica indices.
Use this setting to strike a balance between the relevance and number of returned results.
90
Attributes to highlight
By default, all searchable attributes are highlighted.
Use * to highlight all attributes or use an empty array [] to turn off highlighting.
Attribute names are case-sensitive.
With highlighting, strings that match the search query are surrounded by HTML tags defined by highlightPreTag and highlightPostTag.
You can use this to visually highlight matching parts of a search query in your UI.
For more information, see Highlighting and snippeting.
["author", "title", "content"]

Attributes for which to enable snippets.
Attribute names are case-sensitive.
Snippets provide additional context to matched words.
If you enable snippets, they include 10 words, including the matched word.
The matched word will also be wrapped by HTML tags for highlighting.
You can adjust the number of words with the following notation: ATTRIBUTE:NUMBER,
where NUMBER is the number of words to be extracted.
["content:80", "description"]

HTML tag to insert before the highlighted parts in all highlighted results and snippets.
HTML tag to insert after the highlighted parts in all highlighted results and snippets.
String used as an ellipsis indicator when a snippet is truncated.
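Combining the three settings above, a configuration fragment could look like this sketch (the values shown are common choices, not requirements):

```json
{
  "highlightPreTag": "<em>",
  "highlightPostTag": "</em>",
  "snippetEllipsisText": "…"
}
```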
Whether to restrict highlighting and snippeting to items that at least partially matched the search query. By default, all items are highlighted and snippeted.
Number of hits per page.
1 <= x <= 1000

Whether typo tolerance is enabled and how it is applied.
If typo tolerance is true, min, or strict, word splitting and concatenation are also active.
Whether to allow typos on numbers in the search query. Turn off this setting to reduce the number of irrelevant matches when searching in large sets of similar numbers.
Attributes for which you want to turn off typo tolerance. Attribute names are case-sensitive.
Returning only exact matches can help in some cases; consider using disableTypoToleranceOnWords or adding synonyms if your attributes have intentional unusual spellings that might look like typos.

["sku"]

Treat singular, plurals, and other forms of declensions as equivalent. You should only use this feature for the languages used in your index.
ISO code for a supported language.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Example: ["ca", "es"]

Removes stop words from the search query.
Stop words are common words like articles, conjunctions, prepositions, or pronouns that have little or no meaning on their own. In English, "the", "a", or "and" are stop words.
You should only use this feature for the languages used in your index.
ISO code for a supported language.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Example: ["ca", "es"]

Languages for language-specific query processing steps such as plurals, stop-word removal, and word-detection dictionaries.
This setting sets a default list of languages used by the removeStopWords and ignorePlurals settings.
This setting also sets a dictionary for word detection in the logogram-based CJK languages.
To support this, you must place the CJK language first.
You should always specify a query language.
If you don't specify an indexing language, the search engine uses all supported languages,
or the languages you specified with the ignorePlurals or removeStopWords parameters.
This can lead to unexpected search results.
For more information, see Language-specific configuration.
ISO code for a supported language.
Available options: af, ar, az, bg, bn, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, he, hi, hu, hy, id, is, it, ja, ka, kk, ko, ku, ky, lt, lv, mi, mn, mr, ms, mt, nb, nl, no, ns, pl, ps, pt, pt-br, qu, ro, ru, sk, sq, sv, sw, ta, te, th, tl, tn, tr, tt, uk, ur, uz, zh
Example: ["es"]

Whether to split compound words in the query into their building blocks.
For more information, see Word segmentation.
Word segmentation is supported for these languages: German, Dutch, Finnish, Swedish, and Norwegian.
Decompounding doesn't work for words with non-spacing mark Unicode characters.
For example, Gartenstühle won't be decompounded if the ü consists of u (U+0075) and ◌̈ (U+0308).
Whether to enable rules.
Whether to enable Personalization.
Determines if and how query words are interpreted as prefixes.
By default, only the last query word is treated as a prefix (prefixLast).
To turn off prefix search, use prefixNone.
Avoid prefixAll, which treats all query words as prefixes.
This might lead to counterintuitive results and makes your search slower.
For more information, see Prefix searching.
Available options: prefixLast, prefixAll, prefixNone

Strategy for removing words from the query when it doesn't return any results. This helps to avoid returning empty search results.
none.
No words are removed when a query doesn't return results.
lastWords.
Treat the last (then second to last, then third to last) word as optional,
until there are results or at most 5 words have been removed.
firstWords.
Treat the first (then second, then third) word as optional,
until there are results or at most 5 words have been removed.
allOptional.
Treat all words as optional.
For more information, see Remove words to improve results.
Available options: none, lastWords, firstWords, allOptional
Example: "firstWords"
Search mode the index will use to query for results.
This setting only applies to indices for which Algolia has enabled NeuralSearch for you.
Available options: neuralSearch, keywordSearch

Settings for the semantic search part of NeuralSearch.
Only used when mode is neuralSearch.
Show child attributes
Indices from which to collect click and conversion events.
If null, the current index and all its replicas are used.
Whether to support phrase matching and excluding words from search queries.
Use the advancedSyntaxFeatures parameter to control which feature is supported.
Words that should be considered optional when found in the query.
By default, records must match all words in the search query to be included in the search results. Adding optional words can help to increase the number of search results by running an additional search query that doesn't include the optional words. For example, if the search query is "action video" and "video" is an optional word, the search engine runs two queries. One for "action video" and one for "action". Records that match all words are ranked higher.
For a search query with 4 or more words where all words are optional, the number of matched words required for a record to be included in the search results increases for every 1,000 records:
If optionalWords has less than 10 words, the required number of matched words increases by 1: results 1 to 1,000 require 1 matched word, results 1,001 to 2,000 need 2 matched words.
If optionalWords has 10 or more words, the number of required matched words increases by the number of optional words divided by 5 (rounded down). For example, with 18 optional words: results 1 to 1,000 require 1 matched word, results 1,001 to 2,000 need 4 matched words.

For more information, see Optional words.
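The threshold rule above can be sketched as simple arithmetic (illustrative only; the function name is hypothetical, not an Algolia API):

```javascript
// Required matched words for a result at a given position, per the rule above:
// the step is 1 when there are fewer than 10 optional words, otherwise
// floor(optionalWordCount / 5), and it increases once per 1,000 results.
function requiredMatchedWords(optionalWordCount, resultPosition) {
  const step = optionalWordCount < 10 ? 1 : Math.floor(optionalWordCount / 5);
  return 1 + Math.floor((resultPosition - 1) / 1000) * step;
}

console.log(requiredMatchedWords(18, 500)); // 1
console.log(requiredMatchedWords(18, 1500)); // 4
```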
Searchable attributes for which you want to turn off the Exact ranking criterion. Attribute names are case-sensitive.
This can be useful for attributes with long values, where the likelihood of an exact match is high, such as product descriptions. Turning off the Exact ranking criterion for these attributes favors exact matching on other attributes. This reduces the impact of individual attributes with a lot of content on ranking.

["description"]

Determines how the Exact ranking criterion is computed when the search query has only one word.
attribute.
The Exact ranking criterion is 1 if the query word and attribute value are the same.
For example, a search for "road" will match the value "road", but not "road trip".
none.
The Exact ranking criterion is ignored on single-word searches.
word.
The Exact ranking criterion is 1 if the query word is found in the attribute value.
The query word must have at least 3 characters and must not be a stop word.
Only exact matches will be highlighted,
partial and prefix matches won't.
Available options: attribute, none, word

Determines which plurals and synonyms should be considered exact matches. By default, Algolia treats singular and plural forms of a word, and single-word synonyms, as exact matches when searching. For example:
ignorePlurals.
Plurals and similar declensions added by the ignorePlurals setting are considered exact matches.
singleWordSynonym.
Single-word synonyms, such as "NY" = "NYC", are considered exact matches.
multiWordsSynonym.
Multi-word synonyms, such as "NY" = "New York", are considered exact matches.

Available options: ignorePlurals, singleWordSynonym, multiWordsSynonym, ignoreConjugations

Advanced search syntax features you want to support.
exactPhrase.
Phrases in quotes must match exactly.
For example, sparkly blue "iPhone case" only returns records with the exact string "iPhone case".
excludeWords.
Query words prefixed with a - must not occur in a record.
For example, search -engine matches records that contain "search" but not "engine".
This setting only has an effect if advancedSyntax is true.

Available options: exactPhrase, excludeWords

Determines how many records of a group are included in the search results.
Records with the same value for the attributeForDistinct attribute are considered a group.
The distinct setting controls how many members of the group are returned.
This is useful for deduplication and grouping.
The distinct setting is ignored if attributeForDistinct is not set.
1
Whether to replace a highlighted word with the matched synonym.
By default, the original words are highlighted even if a synonym matches.
For example, with home as a synonym for house and a search for home,
records matching either "home" or "house" are included in the search results,
and either "home" or "house" is highlighted.
With replaceSynonymsInHighlight set to true, a search for home still matches the same records,
but all occurrences of "house" are replaced by "home" in the highlighted response.
Minimum proximity score for two matching words.
This adjusts the Proximity ranking criterion by equally scoring matches that are farther apart.
For example, if minProximity is 2, neighboring matches and matches with one word between them would have the same score.
1 <= x <= 7Properties to include in the API response of search and browse requests
By default, all response properties are included.
To reduce the response size, you can select which properties should be included.
An empty list may lead to an empty API response (except properties you can't exclude).
You can't exclude these properties:
message, warning, cursor, abTestVariantID,
or any property added by setting getRankingInfo to true
Your search depends on the hits field. If you omit this field, searches won't return any results.
Your UI might also depend on other properties, for example, for pagination.
Before restricting the response size, check the impact on your search experience.
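For example, a UI that only needs hits and pagination counters could restrict the response as follows (the exact property list is illustrative):

```json
{
  "responseFields": ["hits", "nbHits", "page", "hitsPerPage"]
}
```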
Maximum number of facet values to return for each facet.
Required range: x <= 1000
Order in which to retrieve facet values:
count. Facet values are retrieved by decreasing count. The count is the number of matching records containing this facet value.
alpha. Facet values are retrieved alphabetically.
This setting doesn't influence how facet values are displayed in your UI (see renderingContent).
For more information, see facet value display.
Whether the best matching attribute should be determined by minimum proximity.
This setting only affects ranking if the Attribute ranking criterion comes before Proximity in the ranking setting.
If true, the best matching attribute is selected based on the minimum proximity of multiple matches.
Otherwise, the best matching attribute is determined by the order in the searchableAttributes setting.
Extra data that can be used in the search UI.
You can use this to control aspects of your search UI, such as the order of facet names and values, without changing your frontend code.
Order of facet names and facet values in your UI.
Order of facet names.
Explicit order of facets or facet values.
This setting lets you always show specific facets or facet values at the top of the list.
Order of facet values. One object for each facet.
Explicit order of facets or facet values.
This setting lets you always show specific facets or facet values at the top of the list.
Order of facet values that aren't explicitly positioned with the order setting.
count.
Order remaining facet values by decreasing count.
The count is the number of matching records containing this facet value.
alpha.
Sort facet values alphabetically.
hidden.
Don't show facet values that aren't explicitly positioned.
Options: count, alpha, hidden
Hide facet values.
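Combining these options, a renderingContent object could look like the following sketch (the facet names brand and price and the pinned value Apple are placeholders):

```json
{
  "renderingContent": {
    "facetOrdering": {
      "facets": { "order": ["brand", "price"] },
      "values": {
        "brand": { "order": ["Apple"], "sortRemainingBy": "count" }
      }
    }
  }
}
```

Here, the brand facet is listed first, "Apple" is always shown at the top of its values, and the remaining brand values are ordered by decreasing count.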
Widgets returned from any rules that are applied to the current search.
Banners defined in the Merchandising Studio for a given search.
Image to show inside a banner.
Whether this search will use Dynamic Re-Ranking. This setting only has an effect if you activated Dynamic Re-Ranking for this index in the Algolia dashboard.
Restrict Dynamic Re-Ranking to records that match these filters.
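If this filter corresponds to the reRankingApplyFilter index setting (an assumption, since this page doesn't name the parameter), restricting re-ranking to one category could look like:

```json
{
  "reRankingApplyFilter": "category:books"
}
```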
Function for extracting URLs from links on crawled pages.
For more information, see the linkExtractor documentation.
({ $, url, defaultExtractor }) => {
  if (/example.com\/doc\//.test(url.href)) {
    // For all pages under `/doc`, only extract the first found URL.
    return defaultExtractor().slice(0, 1)
  }
  // For all other pages, use the default.
  return defaultExtractor()
}
Authorization method and credentials for crawling protected content.
The Crawler supports these authentication methods:
OAuth 2.0 (oauthRequest). The Crawler uses OAuth 2.0 client credentials to obtain an access token for authentication.
Basic authentication
The Crawler extracts the Set-Cookie response header from the login page, stores that cookie,
and sends it in the Cookie header when crawling all pages defined in the configuration.
This cookie is retrieved only at the start of each full crawl. If it expires, it isn't automatically renewed.
The Crawler can obtain the session cookie in one of two ways:
Direct request (fetchRequest). The Crawler sends a direct request with your credentials to the login endpoint, similar to a curl command.
Browser-based request (browserRequest). The Crawler emulates a web browser by loading the login page, entering the credentials, and submitting the login form as a real user would.
OAuth 2.0
The Crawler supports the OAuth 2.0 client credentials grant flow: it requests an access token and sends it in the Authorization header.
This token is only fetched at the beginning of each complete crawl. If it expires, it isn't automatically renewed.
Client authentication passes the credentials (client_id and client_secret) in the request body.
The Azure AD v1.0 provider is supported.
URL with your login form.
"https://example.com/login"
Options for the HTTP request for logging in.
HTTP method for sending the request.
"POST"
Headers to add to all requests.
Preferred natural language and locale.
"fr-FR"
Basic authentication header.
"Bearer Aerehdf=="
Cookie. The header will be replaced by the cookie retrieved when logging in.
"session=1234"
Form content.
"id=user&password=s3cr3t"
Timeout in milliseconds for the request.
{
"url": "https://example.com/secure/login-with-post",
"requestOptions": {
"method": "POST",
"headers": {
"Content-Type": "application/x-www-form-urlencoded"
},
"body": "id=my-id&password=my-password",
"timeout": 5000
}
}
Determines the maximum path depth of crawled URLs.
Path depth is calculated based on the number of slash characters (/) after the domain (starting at 1).
For example:
http://example.com: depth 1
http://example.com/: depth 1
http://example.com/foo: depth 1
http://example.com/foo/: depth 2
http://example.com/foo/bar: depth 2
http://example.com/foo/bar/: depth 3
URLs added with startUrls and sitemaps aren't checked for maxDepth.
Required range: 1 <= x <= 100
Default: 5
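The depth rule above can be sketched as a small helper (hypothetical, not part of the Crawler API) that counts slash characters after the domain:

```javascript
// Hypothetical helper illustrating the maxDepth rule:
// path depth = number of '/' characters after the domain, starting at 1.
function pathDepth(url) {
  const { pathname } = new URL(url); // pathname is "/" for a bare domain
  const slashes = (pathname.match(/\//g) || []).length;
  // A bare domain ("" or "/") still counts as depth 1.
  return Math.max(1, slashes);
}

console.log(pathDepth("http://example.com"));          // 1
console.log(pathDepth("http://example.com/foo/"));     // 2
console.log(pathDepth("http://example.com/foo/bar/")); // 3
```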
Limits the number of URLs your crawler processes.
Change it to a low value, such as 100, for quick crawling tests.
Change it to a higher explicit value for full crawls to prevent the crawler from getting "lost" in complex site structures.
Because the Crawler works on many pages simultaneously, maxUrls doesn't guarantee finding the same pages each time it runs.
Required range: 1 <= x <= 15000000
Default: 250
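For a quick test crawl, both limits can be set together; this fragment uses deliberately low, illustrative values:

```json
{
  "maxDepth": 3,
  "maxUrls": 100
}
```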
If true, use a Chrome headless browser to crawl pages.
Because crawling JavaScript-based web pages is slower than crawling regular HTML pages, you can apply this setting to a specific list of pages. Use micromatch to define URL patterns, including negations and wildcards.
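Assuming renderJavaScript also accepts a list of micromatch URL patterns (as the mention of patterns, negations, and wildcards suggests), limiting headless rendering to one section of a site might look like:

```json
{
  "renderJavaScript": [
    "https://www.example.com/app/**",
    "!https://www.example.com/app/static/**"
  ]
}
```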
Lets you add options to HTTP requests made by the crawler.
Proxy for all crawler requests.
Timeout in milliseconds for the crawl.
Maximum number of retries to crawl one URL.
Headers to add to all requests.
Preferred natural language and locale.
"fr-FR"
Basic authentication header.
"Bearer Aerehdf=="
Cookie. The header will be replaced by the cookie retrieved when logging in.
"session=1234"
Checks to ensure the crawl was successful.
For more information, see the Safety checks documentation.
Checks triggered after the crawl finishes but before the records are added to the Algolia index.
Maximum difference, in percent, between the number of records between crawls.
Required range: 1 <= x <= 100
Stops the crawler if a specified number of pages fail to crawl.
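A sketch of a safety-check configuration combining both thresholds (the nesting under beforeIndexPublishing and the property names maxLostRecordsPercentage and maxFailedUrls are assumptions based on the descriptions above; the values are illustrative):

```json
{
  "safetyChecks": {
    "beforeIndexPublishing": {
      "maxLostRecordsPercentage": 10,
      "maxFailedUrls": 15
    }
  }
}
```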
Whether to back up your index before the crawler overwrites it with new records.
Schedule for running the crawl.
Instead of manually starting a crawl each time, you can set up a schedule for automatic crawls.
Use the visual UI or add the schedule parameter to your configuration.
schedule uses Later.js syntax to specify when to crawl your site.
Here are some key things to keep in mind when using Later.js syntax with the Crawler:
"every weekday at 12:00 pm"
Sitemaps with URLs from where to start crawling.
Maximum length: 9999
URLs from where to start crawling.
Maximum length: 9999
Date and time when the object was created, in RFC 3339 format.
"2023-07-04T12:49:15Z"
Universally unique identifier (UUID) of the user who created this version of the configuration.
"7d79f0dd-2dab-4296-8098-957a1fdc0637"