Best practice for ingesting multiple Excel files added daily to a Foundry Compass folder via Pipeline Builder

Hi: We have an old-school set of Excel files that are submitted and made available in a Compass folder.

Note:

  • Each Excel file may have multiple tabs with different schemas
  • The plan is to eventually transform the data into a single ontology dataset

What’s the best-practice way of ingesting this growing number of Excel files using Pipeline Builder?

cc: @Xander

If all the Excel files are in the same dataset, then you can use the Parse Excel board to create a structured dataset. If they have different schemas, this becomes somewhat more difficult, but you could use the board multiple times with different schemas and then filter / coalesce the outputs.
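To make the "filter / coalesce" idea concrete, here is a minimal sketch in plain Python rather than Pipeline Builder. It assumes that each parse pass yields nulls for files that do not match its schema; the file names, column names, and the `coalesce` helper are all hypothetical and only illustrate the logic.

```python
from typing import Any, Optional


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical outputs of two parse passes over the same files, each with a
# different schema applied. A file that does not fit a schema yields None.
pass_a = [
    {"file": "jan.xlsx", "customer": "Acme"},
    {"file": "feb.xlsx", "customer": None},
]
pass_b = [
    {"file": "jan.xlsx", "client_name": None},
    {"file": "feb.xlsx", "client_name": "Globex"},
]

# Coalesce the two differently named columns into one.
merged = [
    {"file": a["file"], "customer": coalesce(a["customer"], b["client_name"])}
    for a, b in zip(pass_a, pass_b)
]

# Filter out rows that neither schema could parse.
merged = [row for row in merged if row["customer"] is not None]
```

In Pipeline Builder the same steps would be done with transform boards rather than code; the sketch only shows the intended result of combining the passes.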

If each Excel file is its own dataset, then this can’t be done in Pipeline Builder today, but people have used Logic Flows https://www.palantir.com/docs/foundry/building-pipelines/logic-flows-overview/ to do this kind of thing in the past.

The case of Excel files with different schemas is actually quite straightforward to handle now. As long as the tables in the various Excel sheets have headers and those headers are consistently on the same row, you can just use the “Treat first row (after skipping) as header” option and you will get a “wide union” that respects which fields are present in which sheets (and in what order).
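For anyone unsure what a “wide union” looks like, here is a rough sketch in plain Python. It is not how the Parse Excel board is implemented; the `wide_union` function, its `skip_rows` parameter, and the sample sheets are hypothetical and only illustrate the shape of the result: every header seen in any sheet becomes a column, in order of first appearance, and fields a sheet lacks are filled with nulls.

```python
from typing import Any


def wide_union(
    sheets: list[list[list[Any]]], skip_rows: int = 0
) -> tuple[list[str], list[dict[str, Any]]]:
    """Union tables with differing headers; missing fields become None."""
    columns: list[str] = []
    parsed: list[dict[str, Any]] = []
    for sheet in sheets:
        body = sheet[skip_rows:]  # drop any rows above the header
        if not body:
            continue
        header = [str(h) for h in body[0]]  # first remaining row is the header
        for name in header:
            if name not in columns:
                columns.append(name)  # keep order of first appearance
        for raw in body[1:]:
            parsed.append(dict(zip(header, raw)))
    # Pad every row out to the full column set.
    rows = [{c: row.get(c) for c in columns} for row in parsed]
    return columns, rows


# Two hypothetical sheets with overlapping but different headers.
sheet_a = [["id", "name"], [1, "alpha"], [2, "beta"]]
sheet_b = [["id", "amount"], [3, 9.5]]

columns, rows = wide_union([sheet_a, sheet_b])
```

Here `columns` comes out as `id`, `name`, `amount`, and the row from the second sheet has a null `name`, which is why the headers need to sit on the same row in every sheet.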