1234-XYZ-B5
with any of these queries:
1234-XYZ-B5
1234XYZB5
1234XYZ-B5
1234-XYZ
XYZ-B5
- Formatting the numbers in your records
- Configuring the relevance
Formatting records
Formatting non-alphanumeric characters
Searching through hyphenated attributes is tricky, and it’s not just because users search with different formats. It’s also tricky because these attributes include non-alphanumeric characters, such as hyphens (-), pound signs (#), or plus signs (+).
By default, Algolia doesn’t index non-alphanumeric characters, or “separators”, so they aren’t searchable. They are, however, essential for tokenization.
For example, the string 1234-XYZ-B5 is tokenized as 1234, -, XYZ, -, B5, because the hyphen (-) is a separator, and all the other characters aren’t. Then, by default, Algolia only indexes the non-separator tokens 1234, XYZ, B5. The same is true for the string 1234 XYZ B5, since a space is also a separator. That’s why 1234-XYZ-B5 and 1234 XYZ B5 are functionally the same for the engine. Both would be a match, whether your user searches for 1234-XYZ-B5 or 1234 XYZ B5.
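The tokenization described above can be approximated in a few lines of Python. This is a simplified sketch for illustration only, not Algolia’s actual tokenizer: it assumes that every non-alphanumeric ASCII character is a separator, and the function names tokenize and indexed_tokens are hypothetical.

```python
import re

def tokenize(text):
    """Split text into runs of alphanumeric characters and individual separators."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", text)

def indexed_tokens(text):
    """Keep only the non-separator tokens, mimicking the default behavior."""
    return [token for token in tokenize(text) if token.isalnum()]

print(tokenize("1234-XYZ-B5"))        # ['1234', '-', 'XYZ', '-', 'B5']
print(indexed_tokens("1234-XYZ-B5"))  # ['1234', 'XYZ', 'B5']
print(indexed_tokens("1234 XYZ B5"))  # ['1234', 'XYZ', 'B5']
```

Under these assumptions, the hyphenated and space-separated strings yield the same indexed tokens, which is why the engine treats them alike.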
You only need to index separators if the results for 1234-XYZ-B5 must be different from the results for 1234 XYZ B5.
This is very rarely the case for ISBN, SKU, phone number, or serial number use cases, but could be true for other use cases, such as when searching for the programming languages C vs. C++.
Include all possible formats
Difficulties can arise when users search with a format that removes spaces or special characters, for example, 1234XYZB5. While Algolia handles some splitting and concatenation, there are special considerations when numbers are involved, and Algolia may not handle several concatenations at once. For that reason, index every format your users search with, ignoring variants that differ only in their separators.
For example, an indexing format can include all the different formats using only spaces, without any versions with hyphens.
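A spaces-only record of this kind could be built as follows. This is a minimal sketch: the helper space_only_variants and the attribute name serial_number_formats are hypothetical, and the objectID value is a placeholder.

```python
import re

def space_only_variants(variants):
    """Replace each run of separators with one space and drop duplicates, keeping order."""
    result = []
    for variant in variants:
        normalized = re.sub(r"[^A-Za-z0-9]+", " ", variant).strip()
        if normalized not in result:
            result.append(normalized)
    return result

record = {
    "objectID": "1",  # placeholder ID
    # Hypothetical attribute holding every format users might type, spaces only.
    "serial_number_formats": space_only_variants([
        "1234-XYZ-B5", "1234XYZB5", "1234XYZ-B5",
        "1234-XYZB5", "1234-XYZ", "XYZ-B5",
    ]),
}

print(record["serial_number_formats"])
# ['1234 XYZ B5', '1234XYZB5', '1234XYZ B5', '1234 XYZB5', '1234 XYZ', 'XYZ B5']
```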
Consider the full set of variants:
1234-XYZ-B5
1234 XYZ B5
1234XYZB5
1234XYZ-B5
1234XYZ B5
1234-XYZB5
1234 XYZB5
1234-XYZ
1234 XYZ
XYZ-B5
XYZ B5
Together, these variants produce only the unique tokens ["1234", "XYZ", "B5", "1234XYZB5", "1234XYZ", "XYZB5"], so these are the only ones that need indexing.
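The unique token set can be checked with a short script. As a simplifying assumption, tokens are taken to be runs of ASCII letters and digits, which is not necessarily how the engine tokenizes internally. The name unique_tokens is hypothetical.

```python
import re

VARIANTS = [
    "1234-XYZ-B5", "1234 XYZ B5", "1234XYZB5", "1234XYZ-B5", "1234XYZ B5",
    "1234-XYZB5", "1234 XYZB5", "1234-XYZ", "1234 XYZ", "XYZ-B5", "XYZ B5",
]

def unique_tokens(variants):
    """Collect the distinct alphanumeric tokens across all variants, in first-seen order."""
    tokens = []
    for variant in variants:
        for token in re.findall(r"[A-Za-z0-9]+", variant):
            if token not in tokens:
                tokens.append(token)
    return tokens

print(unique_tokens(VARIANTS))
# ['1234', 'XYZ', 'B5', '1234XYZB5', '1234XYZ', 'XYZB5']
```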
However, you may not want to undertake the work necessary to deduplicate redundant tokens.
Additionally, having all variants allows for a more accurate proximity score, if your users tend to search with tokens in the same order as the original version.
For example, while a user may search for 1234 XYZB5, they probably won’t search for XYZB5 1234.
Suffix search
Since Algolia doesn’t support infix or suffix matching, it can’t find substrings in the middle or at the end of a string. If you want users to be able to search on suffixes, you need to index them.
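One way to produce suffixes for indexing is to cut the value at each token boundary. The sketch below assumes token-level suffixes are enough for your use case. The helper token_suffixes and the attribute name serial_number_suffixes are hypothetical.

```python
import re

def token_suffixes(value):
    """Return every suffix that starts at a token boundary, excluding the full value."""
    # The end of each separator run marks the start of the next token.
    starts = [match.end() for match in re.finditer(r"[^A-Za-z0-9]+", value)]
    return [value[start:] for start in starts if start < len(value)]

record = {
    "serial_number": "1234-XYZ-B5",
    # Hypothetical attribute so that queries like "XYZ-B5" or "B5" can match.
    "serial_number_suffixes": token_suffixes("1234-XYZ-B5"),
}

print(record["serial_number_suffixes"])  # ['XYZ-B5', 'B5']
```

If users search on arbitrary character-level suffixes, the same idea applies with a cut at every character position, at the cost of a larger record.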
One format for search and another for display
If you want to display one particular format, you need to include another attribute for the display value. You can then control which attributes are returned with results using attributesToRetrieve.
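As an illustration, a record could keep the searched formats and the displayed format in separate attributes. The attribute names serial_number and serial_number_display are hypothetical, and retrieved_view is only a rough local approximation of the fields a hit would carry, not part of any Algolia client.

```python
record = {
    "objectID": "1",  # placeholder ID
    "serial_number": ["1234 XYZ B5", "1234XYZB5", "1234XYZ B5", "1234 XYZB5"],  # searched
    "serial_number_display": "1234-XYZ-B5",  # shown to users
}

settings = {
    "searchableAttributes": ["serial_number"],
    "attributesToRetrieve": ["serial_number_display"],
}

def retrieved_view(record, settings):
    """Approximate which fields come back: the retrievable ones plus objectID."""
    keep = set(settings["attributesToRetrieve"]) | {"objectID"}
    return {key: value for key, value in record.items() if key in keep}

print(retrieved_view(record, settings))
# {'objectID': '1', 'serial_number_display': '1234-XYZ-B5'}
```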
Configuring the relevance
Configuring searchableAttributes
To ensure that various formats are searchable, you first need to add the formatted attribute to the searchableAttributes. These attributes can be either strings or arrays.
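For instance, both of the following record shapes work with the same setting. The attribute name serial_number is hypothetical, and the dictionaries only illustrate the shape of the data. How you send them depends on your API client.

```python
records = [
    # The formatted attribute as a single string...
    {"objectID": "1", "serial_number": "1234 XYZ B5"},
    # ...or as an array of formats.
    {"objectID": "2", "serial_number": ["1234 XYZ B5", "1234XYZB5", "1234XYZ B5", "1234 XYZB5"]},
]

settings = {
    "searchableAttributes": ["serial_number"],
}
```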
Hyphenated attributes need to match exactly (without typos)
Often, when a user searches for a particular number, they’re looking for only that record and aren’t interested in close matches. In that case, you can turn off typo tolerance on the hyphenated number attribute using disableTypoToleranceOnAttributes.
Hyphenated attributes need to match entirely (without prefix)
Similarly, if you would like to display results only on complete (and not prefixed) matches, you can turn off prefix matching on the hyphenated attribute using disablePrefixOnAttributes.
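The two exact-matching settings, disableTypoToleranceOnAttributes and disablePrefixOnAttributes, can be combined in one settings object. This is a sketch of the settings payload only: the attribute name serial_number is hypothetical, and applying the settings is left to whichever API client you use.

```python
settings = {
    "searchableAttributes": ["serial_number"],
    # Require matches without typos on the hyphenated attribute.
    "disableTypoToleranceOnAttributes": ["serial_number"],
    # Require complete (non-prefix) matches on the hyphenated attribute.
    "disablePrefixOnAttributes": ["serial_number"],
}
```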
Handling non-alphanumeric characters
By default, the engine ignores non-alphanumeric characters such as hyphens (-), plus signs (+), and parentheses ((, )). Whether these characters are in the query or the index, Algolia won’t search for them.
This is by design: searching for +33 returns all records with an attribute starting with +33 or 33, because the engine ignores the plus sign (+). If you would like users to search with special characters, you must let the engine know to index these characters. You can do so with separatorsToIndex.
For example, if you include + in separatorsToIndex, searching for +33 will only return records containing both + and 33. Since adding separatorsToIndex can make a search more restrictive and complex, it’s generally not desirable to do so for these use cases.
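The effect of separatorsToIndex on tokens can be sketched as follows. This is an illustrative approximation, not the engine’s real tokenizer: it assumes every non-alphanumeric ASCII character is a separator, and the function tokens_for is hypothetical.

```python
import re

def tokens_for(text, separators_to_index=""):
    """Return alphanumeric runs plus any separators listed in separators_to_index."""
    tokens = []
    for token in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", text):
        if token.isalnum() or token in separators_to_index:
            tokens.append(token)
    return tokens

settings = {"separatorsToIndex": "+"}

print(tokens_for("+33"))                                 # ['33']
print(tokens_for("+33", settings["separatorsToIndex"]))  # ['+', '33']
```

With + indexed, a query for +33 carries two tokens that must both be present, which is what makes the search more restrictive.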
Don’t include a character in separatorsToIndex unless its presence distinguishes between different records, for example, if searches for 1234-XYZ-B5 and 1234 XYZ B5 should return different results. This is rarely true for SKU, ISBN, phone number, and serial number use cases.
Searching with removeWordsIfNoResults enabled
The removeWordsIfNoResults parameter helps you avoid showing empty results by removing less critical words when a query returns nothing.
It lets you trade precision for recall, depending on your use case.
Due to the treatment of separator characters,
this parameter might not work as expected when searching for hyphenated attributes.
For more information, see Non-alphanumeric characters.