Can I auto-apply a schema on a file-based sync?

aszekely · November 13, 2024, 4:27pm

I’m ingesting Parquet files into Foundry, and I’d like Foundry to automatically infer a schema when these files land in the platform.

redboyben · November 14, 2024, 10:29am

When you ingest Parquet files through a sync in Foundry, schema inference is automatic. For CSV files, you can also try to infer a schema from the dataset view.
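Parquet files store their schema in the file footer, so column types can be read directly. CSV files contain only text, so types have to be guessed from the values. The snippet below is a rough illustration of what that guessing involves. It is not Foundry's inference logic, and the helper names, type names ("long", "double", "string") and the try-integer-then-float rule are assumptions chosen for the example:

```python
import csv
import io


def infer_column_type(values):
    """Guess one column's type from its raw string values (illustrative rules only)."""
    # Ignore empty cells and cells missing from short rows (DictReader yields None).
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"
    # Try the narrowest type first; fall back to the next one on the first failure.
    for type_name, parse in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                parse(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_csv_schema(text):
    """Return a {column: type} mapping for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    return {col: infer_column_type([row.get(col) for row in rows]) for col in fieldnames}
```

For example, `infer_csv_schema("id,score,name\n1,2.5,a\n2,3,b\n")` yields `{"id": "long", "score": "double", "name": "string"}`. Any value-based guess like this can be wrong on small samples, which is one reason a self-describing format such as Parquet avoids the problem.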

aszekely · November 14, 2024, 11:26am

Is there a setting I need to toggle on? I’ve ended up with a dataset full of Parquet files that should all have the same schema.

sourceId: {{SOURCE_RID}}
datasetId: {{DATASET_RID}}
branchName: master
extractName: enrollment
extractConfig:
  sourceAdapter:
    type: file-based-source-adapter
    processors:
      - type: fileChangedSinceLastUpload
        filterCriteria:
          fileProperties:
            - NAME
            - LAST_MODIFIED
            - SIZE
      - type: inLastNFiles
        numFilesToKeep: 10000
        from: START
    subfolder: {{REDACTED}}
  transforms: []
  completionStrategies: []
  outputOptions:
    transactionType: UPDATE
maxAllowedDuration: null
sparkProfiles: []

redboyben · November 14, 2024, 5:44pm

There isn’t a setting for this in the sync itself. The main options are: if the schema is static, infer it once with a one-time click in the UI; if it’s dynamic, write a downstream transform that parses the files and adjusts the schema, rewriting the output as a snapshot.
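The core of such a downstream transform is reconciling the per-file schemas into one output schema. Below is a minimal stdlib sketch of that step, under the assumption that each file's schema has already been read into a `{column: type}` dict (for Parquet this would come from the file footer). The function names are hypothetical, and the Foundry transform wiring (inputs, outputs, reading files) is deliberately left out. The rule shown is a column union that fails on type conflicts, which is comparable in spirit to Spark's `mergeSchema` option for Parquet reads:

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name


def merge_schemas(schemas: List[Schema]) -> Schema:
    """Union the columns of several per-file schemas, keeping first-seen order.

    Raises ValueError if the same column appears with two different types.
    """
    merged: Schema = {}
    for schema in schemas:
        for column, col_type in schema.items():
            existing = merged.get(column)
            if existing is None:
                merged[column] = col_type
            elif existing != col_type:
                raise ValueError(
                    f"column {column!r} has conflicting types {existing} and {col_type}"
                )
    return merged


def conform_row(row: Dict[str, object], schema: Schema) -> Dict[str, object]:
    """Project one row onto the merged schema, filling missing columns with None."""
    return {column: row.get(column) for column in schema}
```

Because the merged schema can change whenever a new file adds a column, rewriting the whole output each run (a snapshot) keeps every row consistent with the current schema, rather than appending rows that were written under an older one.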