Best approach to migrate data pipelines from Palantir to Databricks?

I’m looking for guidance on replatforming a data pipeline from Palantir to Databricks, with an additional requirement to keep both systems in sync during the transition.

Currently, the pipeline consists of multiple source tables that are cleaned and then unioned with other tables to produce a set of downstream datasets. These final datasets are consumed by a Workshop application.

A key requirement is to keep the data in sync with Databricks: whenever new data arrives in a Palantir dataset, it should also be propagated to Databricks (ideally in near real time or via incremental updates).

I’m trying to understand the best approach to both sync and migrate this pipeline efficiently and reliably. Some specific questions I have:

  • What’s the recommended strategy for syncing new/updated data from Palantir into Databricks?

  • Should this be handled via batch jobs, streaming, or CDC (change data capture)?

  • What’s the best way to translate transformations into Databricks (e.g., Spark/Delta)?

  • Are there best practices for handling table dependencies and unions during migration?

  • How should I validate that the migrated pipeline produces consistent results?

  • Any tooling or frameworks that can help automate parts of this process?

Would appreciate any advice, patterns, or lessons learned from similar migrations.
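For concreteness, the incremental pattern I have in mind for the sync is a simple high-watermark filter: track the latest change timestamp that has been propagated, and on each run ship only rows newer than it. This is an engine-agnostic sketch in plain Python (the `updated_at` column name is just a placeholder for whatever change-tracking column the datasets carry), not actual Foundry or Databricks code:

```python
from datetime import datetime

def select_incremental(rows, watermark):
    """Return rows changed after the watermark, plus the advanced watermark.

    rows: iterable of dicts, each carrying an 'updated_at' datetime.
    watermark: the latest 'updated_at' already propagated downstream.
    """
    fresh = [r for r in rows if r["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_watermark

# Example: only the two rows newer than the last sync point are selected.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]
fresh, wm = select_incremental(rows, datetime(2024, 1, 2))
```

In a real pipeline the selected rows would then be upserted into a Delta table (e.g. via `MERGE INTO` keyed on `id`) and the watermark persisted, but I'd like to know whether this batch-incremental style or a streaming/CDC feed is the better fit here.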

Hey @494bbf6f8e78866c98a3, thanks for getting in touch!

We have good support for cross-integration with Databricks using virtual tables and compute pushdown. See our docs here for virtual tables:

https://www.palantir.com/docs/foundry/data-integration/virtual-tables

And here for Databricks specific connectivity:

https://www.palantir.com/docs/foundry/available-connectors/databricks
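On your validation question specifically: one lightweight pattern is to compare row counts plus an order-insensitive row fingerprint on both sides of the sync. Here is a minimal, engine-agnostic Python sketch of the idea; the helper and column names are illustrative, not a Foundry or Databricks API:

```python
import hashlib

def dataset_fingerprint(rows, key_columns):
    """Order-insensitive fingerprint: XOR of per-row hashes over key columns.

    Because XOR is commutative, the same set of rows yields the same
    fingerprint regardless of row order, so both sides can be compared
    without sorting.
    """
    acc = 0
    for r in rows:
        digest = hashlib.sha256(
            "|".join(str(r[c]) for c in key_columns).encode()
        ).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return len(rows), acc

# Same rows in a different order produce an identical (count, fingerprint).
source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
target = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]
```

In practice you would compute the same `(count, fingerprint)` pair with Spark on the Databricks side and with a transform on the Foundry side, then alert on any mismatch.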

Let us know if that does not cover what you need for any reason!