Incremental pipelines on MongoDB result in inconsistent schemas

We ingest data from a MongoDB collection as JSON.
We would like this ingestion to be incremental.

The issue we’re facing is that the inferred schema is inconsistent when the new rows don’t have values for ALL of the columns that previously existed in the collection.

Using transforms, I can see how we could ensure stable schemas by looking at the previous version of the dataset, but I’m wondering how to do this at the ingest stage.

Can we hard-code a schema somewhere in the Data Connection config?

Thanks,

Julien

Have you tried configuring the option to disallow schema changes in the batch sync configuration? Does this have the desired effect?

The intent of this feature is to let you choose whether to fail the build of a tabular batch sync when the data coming from the external system doesn’t match the schema of the data you’ve already synced.

Hey @aczarnecki, I’ve seen that feature, but it doesn’t quite fit our use case.

Disallowing schema changes would fail the build, whereas I’m looking for a way to reconcile the schemas of old and new rows at the ingestion stage without failing the build when the schema changes. This matters especially with JSON, where the schema is dynamically inferred from the available data.

We ended up managing this in a Code Repository, and it works well; the gist of the approach is below.
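For anyone who finds this later, here is a minimal PySpark sketch of the reconciliation we do in the transform. The `reconcile_to_schema` helper and the column names are made up for illustration; inside a Foundry transform you would pass it the previous version of the output dataset and the newly ingested batch.

```python
from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F


def reconcile_to_schema(new_rows: DataFrame, reference: DataFrame) -> DataFrame:
    """Align new_rows with the reference schema.

    Columns missing from the new batch are added as typed nulls, so the
    inferred schema stays stable across incremental builds; shared columns
    are cast to the reference type.
    """
    out = new_rows
    for field in reference.schema.fields:
        if field.name not in out.columns:
            # Column absent from this batch: add it as a typed null.
            out = out.withColumn(field.name, F.lit(None).cast(field.dataType))
        else:
            out = out.withColumn(field.name, F.col(field.name).cast(field.dataType))
    return out


# Hypothetical example: the new batch is missing the "score" column.
spark = SparkSession.builder.getOrCreate()
previous = spark.createDataFrame([(1, "a", 3.5)], ["id", "name", "score"])
batch = spark.createDataFrame([(2, "b")], ["id", "name"])

stable = reconcile_to_schema(batch, previous)
previous.unionByName(stable).show()
```

Note that `unionByName` will still fail if a batch introduces brand-new columns; from Spark 3.1 onwards you can pass `allowMissingColumns=True` to tolerate schema drift in that direction as well.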

Thanks!