Getting Input File Metadata in Pipeline Builder for non-CSV Inputs

Context: As explained at https://www.palantir.com/docs/foundry/data-integration/csv-parsing/#textdataframereader-options, for CSV datasets it is possible to get input file metadata such as the file path and imported timestamp via schema options. Additionally, in a Code Repository, you can combine pyspark.sql.functions.input_file_name() with a join against the dataframe returned by the files() method of a FileSystem object to retrieve this information for an input dataset of any file type.
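For illustration, here is a minimal pure-Python stand-in for that join pattern (plain dicts, no Foundry or Spark dependency, and the field names `_input_file`, `path`, and `modified` are assumptions for this sketch). In a real Code Repository you would instead add a column via `F.input_file_name()` and join it against the dataframe from `filesystem.files()`:

```python
def attach_file_metadata(rows, file_metadata):
    """Join data rows to per-file metadata on the file path.

    Mirrors the Spark-side pattern of joining a dataframe (with an
    input_file_name() column) against the files() metadata dataframe.
    `rows` simulates data rows carrying their source path in
    "_input_file"; `file_metadata` simulates files() output with
    "path" and "modified" keys (hypothetical names for this sketch).
    """
    meta_by_path = {m["path"]: m for m in file_metadata}
    enriched_rows = []
    for row in rows:
        meta = meta_by_path.get(row["_input_file"], {})
        enriched = dict(row)
        # Left-join semantics: missing metadata yields None.
        enriched["modified"] = meta.get("modified")
        enriched_rows.append(enriched)
    return enriched_rows


# Example usage with made-up paths and timestamps:
rows = [
    {"id": 1, "_input_file": "spark/part-0001.parquet"},
    {"id": 2, "_input_file": "spark/part-0002.parquet"},
]
file_metadata = [
    {"path": "spark/part-0001.parquet", "modified": 1700000000000},
    {"path": "spark/part-0002.parquet", "modified": 1700000500000},
]
result = attach_file_metadata(rows, file_metadata)
```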

Question: Is there any way to get file metadata such as the file name and imported timestamp for a parquet input dataset in Pipeline Builder, or is it necessary to use the aforementioned Code Repository-based method?

Hey @sandpiper, in Pipeline Builder we actually just added the functionality to get the file path (you should see it in your environment in a few days), and we're currently working on adding the timestamp.
