Prepare a spreadsheet in a crawler-compatible format
Your spreadsheet’s format must be compatible with the crawler:
- It must contain a header row.
- One of the header columns must be url.
- The url column should contain all URLs to which you want to add external data.
- The remaining columns are for data that you want to add to the record for each URL.

For an example, see this spreadsheet.
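The format rules above can be checked before publishing with a small script. The following is a minimal sketch, not part of the Crawler itself: the function name `checkCrawlerCsv`, the sample URLs, and the `category` and `pageviews` columns are all hypothetical, and the parsing assumes a simple CSV with no quoted commas or line breaks inside fields.

```javascript
// Minimal format check (hypothetical helper, not an Algolia API).
// Assumes a simple CSV: no quoted commas, no line breaks inside fields.
function checkCrawlerCsv(csvText) {
  // Split into lines and ignore blank ones.
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return { ok: false, reason: "The file is empty, so it has no header row." };
  }

  // The first non-empty line is treated as the header row.
  const headers = lines[0].split(",").map((header) => header.trim());
  if (!headers.includes("url")) {
    return { ok: false, reason: 'No header column is named "url".' };
  }

  // Every column other than "url" holds data to attach to the record.
  const dataColumns = headers.filter((header) => header !== "url");
  return { ok: true, dataColumns, rowCount: lines.length - 1 };
}

// Hypothetical sample data, for illustration only.
const sampleCsv = [
  "url,category,pageviews",
  "https://example.com/docs/getting-started,guides,1200",
  "https://example.com/docs/api,reference,860",
].join("\n");

console.log(checkCrawlerCsv(sampleCsv));
```

A file without a url header column, such as `"name,value\na,1"`, is reported as not compatible.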
Publish your spreadsheet online
Add your spreadsheet data to Google Sheets and publish it online.
- The spreadsheet’s sharing settings (Restrict access to the following) must allow the Crawler to access it.
- The spreadsheet must be published as a CSV file. Change the setting from Web page to Comma-separated values (.csv).
- By default, the crawler uses the latest data in your Google spreadsheet. To use only the data as it was when you first published it, clear the Automatically republish when changes are made setting.

The CSV data doesn’t have to be in Google Sheets but it must be available online.
Link the published spreadsheet in your crawler configuration
To link your spreadsheet, create an external data source and add it to the recordExtractor function in your crawler configuration.
Create an external data source
- From the Crawler page, select the External Data tab.
- Click Add External Data.
- As External Data type, select CSV, add your CSV file’s URL, and click Create.
- To test the data source, click Explore Data and then Refresh Data. It should extract the correct number of rows from your spreadsheet.
Add external data to extracted records
- Go to the Crawler page, select your crawler, and click Editor.
- Add the externalData parameter to your crawler configuration.
- Add the dataSources parameter to the recordExtractor function and reference the columns from your CSV data source.
External data is only added for URLs that match your crawler’s startUrls, extraUrls, and pathsToMatch parameters.
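The two configuration steps above might look like the following sketch. It makes several assumptions: the data source is named `myCSV`, the spreadsheet has `category` and `pageviews` columns, and the index name and URLs are placeholders. In the Crawler editor this object would be the crawler's configuration; here it is a plain object, with a simulated `dataSources` argument, so that it can run outside the editor.

```javascript
// Sketch of a crawler configuration that uses an external CSV data source.
// "myCSV", "example_index", and the column names are hypothetical.
const crawlerConfig = {
  // Step 1: declare the external data source by the name given in the External Data tab.
  externalData: ["myCSV"],
  startUrls: ["https://example.com/docs/"],
  actions: [
    {
      indexName: "example_index",
      pathsToMatch: ["https://example.com/docs/**"],
      // Step 2: accept dataSources and reference the CSV columns.
      recordExtractor: ({ url, dataSources }) => {
        // Assumption: pages without a matching CSV row may have no data,
        // so fall back to an empty object.
        const csvRow = (dataSources && dataSources.myCSV) || {};
        return [
          {
            objectID: url.href,
            category: csvRow.category,
            pageviews: csvRow.pageviews,
          },
        ];
      },
    },
  ],
};

// Simulated call: in a real crawl, the crawler supplies url and dataSources.
const [record] = crawlerConfig.actions[0].recordExtractor({
  url: new URL("https://example.com/docs/api"),
  dataSources: { myCSV: { category: "reference", pageviews: "860" } },
});

console.log(record);
```

The column names referenced in recordExtractor must match the header row of the published spreadsheet.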
Test the data
- In the URL Tester, enter the URL of a page with CSV data attached to it (one of those you added to your spreadsheet).
- Click Run Test.
- Confirm that the extracted records contain the data from your CSV.