I’m ingesting Parquet files into Foundry, and I’d like Foundry to infer their schema automatically when they land on the platform.
When a Parquet file is ingested through a sync in Foundry, schema inference is automated. For CSV files, it’s also possible to try inferring a schema from the dataset view.
Is there a setting I need to toggle on? I’ve ended up with a dataset full of Parquet files that should all share the same schema. My sync configuration is below.
```yaml
sourceId: {{SOURCE_RID}}
datasetId: {{DATASET_RID}}
branchName: master
extractName: enrollment
extractConfig:
  sourceAdapter:
    type: file-based-source-adapter
    processors:
      - type: fileChangedSinceLastUpload
        filterCriteria:
          fileProperties:
            - NAME
            - LAST_MODIFIED
            - SIZE
      - type: inLastNFiles
        numFilesToKeep: 10000
        from: START
    subfolder: {{REDACTED}}
  transforms: []
  completionStrategies: []
  outputOptions:
    transactionType: UPDATE
  maxAllowedDuration: null
  sparkProfiles: []
```
There isn’t a setting for this in the sync itself. The main options are either to infer the schema with a one-time click in the UI if the schema is static, or, if it’s dynamic, to write a downstream transform that parses the files and adjusts the schema, writing its output as a snapshot.
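Deciding between the one-time UI inference (static schema) and a downstream transform (dynamic schema), and the “adjust the schema” step of such a transform, both come down to reconciling the per-file schemas. Below is a minimal, stdlib-only sketch of that reconciliation. It assumes each file’s schema has already been read into a column-name-to-type mapping (reading the Parquet footers is out of scope here). `unify_schemas`, `is_static` and `SchemaConflict` are hypothetical names for illustration, not Foundry APIs.

```python
from typing import Dict, List

# Hypothetical representation: column name -> type name, e.g. {"id": "int64"}
Schema = Dict[str, str]


class SchemaConflict(ValueError):
    """Raised when two files declare the same column with different types."""


def is_static(schemas: List[Schema]) -> bool:
    """True when every file has exactly the same set of columns and types."""
    return all(s == schemas[0] for s in schemas[1:])


def unify_schemas(schemas: List[Schema]) -> Schema:
    """Merge per-file schemas into one union schema.

    A column missing from some files is kept (its values would be null for
    rows from those files); a column whose type differs between files is
    treated as a conflict rather than silently widened.
    """
    merged: Schema = {}
    for index, schema in enumerate(schemas):
        for column, dtype in schema.items():
            existing = merged.get(column)
            if existing is None:
                merged[column] = dtype
            elif existing != dtype:
                raise SchemaConflict(
                    f"column {column!r}: {existing} vs {dtype} (file #{index})"
                )
    return merged


if __name__ == "__main__":
    files = [
        {"id": "int64", "name": "string"},
        {"id": "int64", "name": "string", "enrolled_at": "timestamp"},
    ]
    print(is_static(files))      # False
    print(unify_schemas(files))  # {'id': 'int64', 'name': 'string', 'enrolled_at': 'timestamp'}
```

If `is_static` holds for every file, the one-time UI inference is enough. Otherwise, in a PySpark-based transform, Spark’s own Parquet reader can perform a similar union via `spark.read.option("mergeSchema", "true").parquet(path)`; the sketch above only makes the rules explicit (missing columns tolerated, conflicting types rejected).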