`30 x 50` or `30mm x 50mm`, while your records may use a different format like `30x50mm`.
Algolia’s features support some formatting variations but may not handle all dimension formats reliably.
This is because:
- User query formats vary. A query like `30mm x 50mm` won’t necessarily match a record with `30x50mm`.
- Unit differences cause mismatches. Queries may include `"`, `inches`, `mm`, `cm`, or `ft`, while records might use only one format.
- Typo tolerance has limits. It can match slight variations such as `30by50` or `30 x 50`, but not different units or separators.
- AI doesn’t consistently interpret dimensions. NeuralSearch can identify dimensions, but not consistently, because the input is ambiguous.
Transform data into dimension-friendly formats
To address this issue, pre-process your data with a transformation function like the one below. For each record you pass to it, the `transform` function returns a transformed record or `undefined` if no dimensions are found.
To run this function, create a Push to Algolia connector, using the following transformation code.
JavaScript
Customization
You can customize the function to support other measurement units or non-standard formats. To add new units (for example, `yd`, `mil`, `µm`, `kg`) or handle alternative patterns (for example, `D30`, `H50`, `Ø20mm x 40mm`), update the following:
- `DIMENSIONS_RE`. Extend the regular expression to detect new unit symbols or structural patterns. Consider using AI-assisted tools to build and test regular expressions.
- `normalizeUnit(raw, fallback)`. Map any new unit symbol or abbreviation to a standard form. For example, ‘yard’ and ‘yd’ become ‘yd’.
- `dimensionKeywords()`. The function defaults to `mm` if it doesn’t find a unit. To change this default (for example, to `cm` or `in`), update the second argument in the `dimensionKeywords()` call.
- `unitForms(unit)`. Add alternative spellings and symbols for each unit. For example, `["yd", "yard", "yards"]`.
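One way to keep these pieces in sync is to derive them from a single unit table. The sketch below is illustrative only and isn't the connector's actual code: `UNITS`, `UNIT_PATTERN`, and `ALIAS_TO_UNIT` are hypothetical names, and `normalizeUnit` and `unitForms` only mirror the signatures mentioned above. It adds `yd` as a new unit.

```javascript
// Hypothetical single table of units: canonical symbol -> accepted spellings.
const UNITS = new Map([
  ["mm", ["mm"]],
  ["cm", ["cm"]],
  ["m", ["m"]],
  ["in", ["in", '"', "inch", "inches"]],
  ["ft", ["ft"]],
  ["yd", ["yd", "yard", "yards"]], // newly added unit
]);

// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Alternation for the unit part of the regex, longest spellings first
// so that "yards" is tried before "yard" and "yd".
const UNIT_PATTERN = [...UNITS.values()]
  .flat()
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

// Reverse lookup: any accepted spelling -> canonical symbol.
const ALIAS_TO_UNIT = new Map(
  [...UNITS].flatMap(([unit, forms]) => forms.map((form) => [form, unit]))
);

// Map a raw unit to its canonical symbol, or use the fallback if it is unknown.
function normalizeUnit(raw, fallback) {
  return ALIAS_TO_UNIT.get((raw ?? "").toLowerCase()) ?? fallback;
}

// List every spelling that should become a keyword for a canonical unit.
function unitForms(unit) {
  return UNITS.get(unit) ?? [unit];
}
```

With this layout, supporting another unit means adding one entry to the table, and `UNIT_PATTERN` can be spliced into the unit part of the dimension regular expression.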
How the transformation function works
The function improves search by extracting keyword variants from dimension patterns. It performs the following steps:
- Identify dimensions with a regular expression
- Extract numbers and units
- Generate variants
- Attach keywords to a new attribute
Identify dimensions
The function uses the `DIMENSIONS_RE` regular expression to detect one-part, two-part, or three-part dimensions, such as:
- One-part: `600mm`, `2.4m`
- Two-part: `30x50`, `3"x6"`, `20mm x 30mm`
- Three-part: `245x148x65mm`, `30mm x 50mm x 2m`
It accepts separators (`x`, `*`, `by`) and units (including `mm`, `"`, `inches`, and `ft`), with or without spaces.
The regular expression handles a wide range of edge cases,
but test it against your data to confirm it captures the formats you use.
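As a rough illustration of this detection step, the following sketch builds a much-simplified pattern. It's an assumption about how such an expression could look, not the actual `DIMENSIONS_RE`: the helper names (`NUM`, `UNIT`, `SEP`, `PART`, `findDimensions`) are hypothetical, and it covers only a handful of units.

```javascript
// Simplified, hypothetical stand-in for DIMENSIONS_RE (not the connector's real pattern).
const NUM = String.raw`(\d+(?:\.\d+)?)`;                // 30, 2.4
const UNIT = String.raw`(mm|cm|m|inches|inch|in|ft|")`; // longer spellings before shorter ones
const SEP = String.raw`\s*(?:x|\*|by)\s*`;              // x, *, by, with or without spaces
const PART = `${NUM}\\s*${UNIT}?`;                      // a number with an optional unit

// One part, optionally followed by a second and a third part.
// The trailing lookahead stops "30 m" from matching inside "30 microwaves".
const DIMENSIONS_RE = new RegExp(
  `\\b${PART}(?:${SEP}${PART})?(?:${SEP}${PART})?(?![a-z0-9])`,
  "gi"
);

// Return every dimension-like substring found in the text.
function findDimensions(text) {
  const found = [];
  for (const match of text.matchAll(DIMENSIONS_RE)) {
    const [, , unit1, number2] = match;
    // A lone number without a unit ("12 items") isn't treated as a dimension.
    if (number2 === undefined && unit1 === undefined) continue;
    found.push(match[0].trim());
  }
  return found;
}

console.log(findDimensions("Tile 30mm x 50mm, shelf 245x148x65mm, pipe 600mm, 12 items"));
// → [ '30mm x 50mm', '245x148x65mm', '600mm' ]
```

Requiring a unit on one-part matches is a design choice of this sketch to avoid treating every number in a record as a dimension.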
Extract numbers and units
For each match, the function extracts the numbers and their associated units. The function standardizes unit variants like `"`, `inch`, and `inches` to `in`.
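The extraction step could look roughly like the sketch below. It's an illustration under assumptions, not the connector's implementation: `UNIT_ALIASES`, `PART_RE`, and `extractParts` are hypothetical names, and the rule that a part without a unit borrows a unit stated elsewhere in the same dimension is an assumption of this sketch.

```javascript
// Hypothetical alias table: every accepted spelling maps to a canonical unit.
const UNIT_ALIASES = new Map([
  ['"', "in"], ["in", "in"], ["inch", "in"], ["inches", "in"],
  ["mm", "mm"], ["cm", "cm"], ["m", "m"], ["ft", "ft"],
]);

// Map a raw unit to its canonical form, or use the fallback if it is missing or unknown.
function normalizeUnit(raw, fallback) {
  if (raw === undefined) return fallback;
  return UNIT_ALIASES.get(raw.toLowerCase()) ?? fallback;
}

// A number followed by an optional unit.
const PART_RE = /(\d+(?:\.\d+)?)\s*(mm|cm|m|inches|inch|in|ft|")?/gi;

// Split a matched dimension such as "245x148x65mm" into numbers and units.
function extractParts(dimension, fallback = "mm") {
  const raw = Array.from(dimension.matchAll(PART_RE), ([, value, unit]) => ({ value, unit }));
  // Assumption: a part without its own unit borrows one stated elsewhere in the same dimension.
  const shared = raw.find((part) => part.unit !== undefined)?.unit;
  return raw.map(({ value, unit }) => ({
    value,
    unit: normalizeUnit(unit ?? shared, fallback),
  }));
}

console.log(extractParts('3" x 6inches'));
// → [ { value: '3', unit: 'in' }, { value: '6', unit: 'in' } ]
```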
Generate variants
Each detected dimension expands into these keyword-friendly formats:
- Bare numbers: `30`, `50`, `2`
- Normalized units: `30mm`, `2m`, `3in`
- Commonly accepted synonyms: `30in`, `30"`, `30inch`, `30inches`
- Joined forms:
  - Without spacing: `30mm50mm`
  - With separators: `30x50`, `30 mm by 50 mm`, `30mmx50mm`
For `30mm x 50mm`, the function generates:
JavaScript
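As a hedged illustration of this step, the sketch below expands already-extracted parts into the formats listed above. `dimensionVariants` and the `UNIT_FORMS` table are hypothetical, `unitForms` only mirrors the name used in this guide, and the connector's function may produce more or different variants.

```javascript
// Hypothetical table of spellings to emit for each canonical unit.
const UNIT_FORMS = new Map([
  ["mm", ["mm"]],
  ["cm", ["cm"]],
  ["m", ["m"]],
  ["in", ["in", '"', "inch", "inches"]],
  ["ft", ["ft"]],
]);

function unitForms(unit) {
  return UNIT_FORMS.get(unit) ?? [unit];
}

// Expand parts like [{ value: "30", unit: "mm" }, ...] into search keywords.
function dimensionVariants(parts) {
  const keywords = new Set(); // a Set removes duplicates and keeps insertion order
  for (const { value, unit } of parts) {
    keywords.add(value); // bare number: 30
    for (const form of unitForms(unit)) keywords.add(value + form); // 30mm, 3", 3inch
  }
  if (parts.length > 1) {
    const bare = parts.map((part) => part.value);
    const withUnit = parts.map((part) => part.value + part.unit);
    const spaced = parts.map((part) => `${part.value} ${part.unit}`);
    keywords.add(withUnit.join(""));    // 30mm50mm
    keywords.add(bare.join("x"));       // 30x50
    keywords.add(withUnit.join("x"));   // 30mmx50mm
    keywords.add(withUnit.join(" x ")); // 30mm x 50mm
    keywords.add(spaced.join(" by "));  // 30 mm by 50 mm
  }
  return [...keywords];
}

console.log(dimensionVariants([
  { value: "30", unit: "mm" },
  { value: "50", unit: "mm" },
]));
// → [ '30', '30mm', '50', '50mm', '30mm50mm', '30x50',
//     '30mmx50mm', '30mm x 50mm', '30 mm by 50 mm' ]
```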
Attach keywords to an attribute
The function adds a new attribute, `dimension_keywords`, to each record it processes.
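The sketch below shows one way this last step could be wired up. It's a minimal illustration, not the connector's code: `makeTransform` is a hypothetical helper, the keyword extractor is passed in as a parameter, and scanning every top-level string attribute of the record is an assumption.

```javascript
// Build a transform(record) function around any keyword extractor.
function makeTransform(extractKeywords) {
  return function transform(record) {
    // Assumption: look for dimensions in every top-level string attribute.
    const text = Object.values(record)
      .filter((value) => typeof value === "string")
      .join(" ");
    const keywords = extractKeywords(text);
    // No dimensions found: return undefined, as described for transform earlier in this guide.
    if (keywords.length === 0) return undefined;
    // Return a copy so the input record isn't mutated.
    return { ...record, dimension_keywords: keywords };
  };
}

// Usage with a deliberately naive stand-in extractor that only finds "<number>mm".
const transform = makeTransform((text) => text.match(/\d+mm/g) ?? []);

console.log(transform({ objectID: "1", name: "Tile 30mm x 50mm" }));
// → { objectID: '1', name: 'Tile 30mm x 50mm', dimension_keywords: [ '30mm', '50mm' ] }
console.log(transform({ objectID: "2", name: "Gift card" }));
// → undefined
```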
Add dimension keywords to your index
To use the generated keywords in Algolia:
- With the `taskID` generated by the Push to Algolia connector, send your records to Algolia with the Ingestion API `pushTask` method, making sure each record includes the `dimension_keywords` attribute.
- Configure `dimension_keywords` as a searchable attribute in your index settings.
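For the settings step, the sketch below prepares a settings object that includes `dimension_keywords` in `searchableAttributes`. `addDimensionKeywords` is a hypothetical helper and the `name` and `description` attributes are placeholders. It only builds the object: sending it with your API client's settings method is left out, so check the client documentation for the exact call.

```javascript
// Return settings whose searchableAttributes include dimension_keywords.
function addDimensionKeywords(settings) {
  const current = settings.searchableAttributes ?? [];
  // Already configured: nothing to change.
  if (current.includes("dimension_keywords")) return settings;
  // Append at the end. Attributes listed earlier rank higher,
  // so existing attributes keep their priority.
  return { ...settings, searchableAttributes: [...current, "dimension_keywords"] };
}

// "name" and "description" are placeholder attribute names.
const indexSettings = addDimensionKeywords({
  searchableAttributes: ["name", "description"],
});

console.log(indexSettings.searchableAttributes);
// → [ 'name', 'description', 'dimension_keywords' ]
```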