Scheduled import of an externally hosted xlsx file

I am trying to ingest data from an xlsx file hosted on a public website (something like https://acme.org/downloads/my_excel_file.xlsx). What is the best way to do this?

I could not find a data connector that does this directly (did I miss one?), so I did the following:

  • run a script that fetches it on a scheduled basis and drops it in an S3 bucket
  • import it into Foundry using a data connection to this bucket

I now have a dataset containing xlsx files, but I don’t know how to parse their contents into a tabular dataset.

Any suggestions? Thanks :slight_smile:

A source-based external transform with either the “Rest API” or “Generic” source type (it doesn’t matter which) is the standard way to ingest an arbitrary file hosted on a public website over HTTP/S.
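
As a rough sketch of the fetch step itself, assuming plain Python with only the standard library: whatever runs the download essentially does an HTTP GET and a sanity check on the payload. The Foundry source configuration and transform decorators are deliberately left out here, and `fetch_xlsx`, `XLSX_URL`, and the `opener` parameter are hypothetical names for illustration, not part of any Foundry API.

```python
import urllib.request

# Placeholder URL taken from the question; replace with the real file location.
XLSX_URL = "https://acme.org/downloads/my_excel_file.xlsx"


def fetch_xlsx(url: str, timeout: float = 30.0, opener=urllib.request.urlopen) -> bytes:
    """Download the file at `url` and return its raw bytes.

    `opener` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        data = response.read()
    # An .xlsx file is a ZIP archive, and ZIP archives start with b"PK".
    # This catches the common case of getting an HTML error page instead.
    if not data.startswith(b"PK"):
        raise ValueError(f"{url} did not return an xlsx (zip) payload")
    return data
```

Calling `fetch_xlsx(XLSX_URL)` returns the bytes you would then write to the output dataset as a file. Checking the payload before writing it is a cheap way to avoid storing an error page under an `.xlsx` name.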

For the parsing, transforms-excel-parser and the Pipeline Builder “Extract rows from an Excel file” transform are both popular options.
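
To give an idea of what "extracting rows" involves, here is a minimal standard-library sketch. It is not how transforms-excel-parser or Pipeline Builder work internally, just an illustration of the file format: an .xlsx file is a ZIP archive of XML parts. The function names are hypothetical, and the sketch assumes the sheet you want lives at `xl/worksheets/sheet1.xml`, which is common but not guaranteed. Numbers, booleans, and dates come back as the raw text stored in the file, so for real pipelines one of the two options above is the better choice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Optional

# SpreadsheetML namespace used by the XML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def column_index(cell_ref: str) -> int:
    """Turn a cell reference such as 'B3' into a zero-based column index."""
    index = 0
    for ch in cell_ref:
        if ch.isalpha():
            index = index * 26 + (ord(ch.upper()) - ord("A") + 1)
    return index - 1


def read_sheet(xlsx_bytes: bytes,
               sheet_path: str = "xl/worksheets/sheet1.xml") -> List[List[Optional[str]]]:
    """Return the rows of one worksheet as lists of raw cell strings."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        shared: List[str] = []
        if "xl/sharedStrings.xml" in archive.namelist():
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in root.findall("m:si", NS):
                # Plain strings use <t>; rich text splits the string across <r><t> runs.
                parts = item.findall("m:t", NS) + item.findall("m:r/m:t", NS)
                shared.append("".join(p.text or "" for p in parts))
        sheet = ET.fromstring(archive.read(sheet_path))

    rows: List[List[Optional[str]]] = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        values: List[Optional[str]] = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(values)
            # Pad skipped (empty) cells with None so columns stay aligned.
            values.extend([None] * (col - len(values)))
            kind = cell.get("t")
            raw = cell.findtext("m:v", default=None, namespaces=NS)
            if kind == "s" and raw is not None:
                # Shared-string cells store an index into sharedStrings.xml.
                values.append(shared[int(raw)])
            elif kind == "inlineStr":
                values.append(cell.findtext("m:is/m:t", default=None, namespaces=NS))
            else:
                # Numbers, booleans and dates are kept as the stored text.
                values.append(raw)
        rows.append(values)
    return rows
```

With the rows in hand, the first one can serve as the header and the rest as records when building the tabular output.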
