Databricks and Foundry

Hi,

I’m doing some research to understand how other customers who have both Databricks and Foundry use the two environments together.

If you are using both Foundry and Databricks, could you let me know:

1- Whether you have been able to successfully virtualize data in both directions
2- If so, what technology and methodology you use

What we struggle with:

1- Virtualizing Foundry on Databricks: Foundry's S3-compatible API isn't suitable for Databricks to federate via external tables in Unity Catalog. There is a way to do it through the legacy Hive metastore, but not through Unity Catalog (see the first sketch below this list).

2- Virtualizing Databricks on Foundry: the Databricks connector doesn't support virtualization, so we use the S3 and ADLS Gen2 options instead.
Both work fine. S3 can also give Unity Catalog browsability, but only for Iceberg, and unfortunately our version of Databricks doesn't support Iceberg.
The problem with virtualizing without a catalog is having to maintain a mapper table from Unity Catalog identifiers to the underlying tables, which is a bit cumbersome (see the second sketch below this list).
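
For reference, the Hive-metastore workaround from point 1 looks roughly like the sketch below. The endpoint URL, credentials, bucket and path are placeholders for whatever your Foundry S3-compatible API exposes, so treat this as an illustration rather than a recipe:

```python
# Rough sketch of the Hive-metastore route from point 1: read a Foundry dataset
# through an S3-compatible endpoint and register it as an external table in the
# legacy hive_metastore, since Unity Catalog external locations won't take a
# custom endpoint. Endpoint, credentials, bucket and path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Standard Hadoop S3A options pointed at the Foundry S3-compatible API;
# these can also be set on the cluster as spark.hadoop.fs.s3a.* configs.
spark.conf.set("fs.s3a.endpoint", "https://<foundry-host>/<s3-api-path>")
spark.conf.set("fs.s3a.path.style.access", "true")
spark.conf.set("fs.s3a.access.key", "<access-key>")
spark.conf.set("fs.s3a.secret.key", "<secret-key>")

# Register in the legacy Hive metastore rather than Unity Catalog.
# Adjust USING to match the file format the dataset is stored in.
spark.sql("""
    CREATE TABLE IF NOT EXISTS hive_metastore.foundry.my_dataset
    USING PARQUET
    LOCATION 's3a://<foundry-bucket>/<dataset-path>/'
""")
```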
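
And the mapper workaround from point 2 is essentially a lookup from Unity Catalog table name to the underlying storage location, so the Foundry-side registration can point at the right ABFS/S3 path. Roughly (catalog and schema names are placeholders, and it assumes Delta tables):

```python
# Rough sketch of the mapper workaround from point 2: build a lookup from
# Unity Catalog table name to the underlying storage location, so Foundry-side
# virtual table registration can point at the right ABFS/S3 path.
# Catalog/schema names are placeholders; assumes Delta tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

catalog, schema = "main", "analytics"  # placeholder names

table_names = [
    row.tableName
    for row in spark.sql(f"SHOW TABLES IN {catalog}.{schema}").collect()
]

mapping = []
for name in table_names:
    # DESCRIBE DETAIL returns one row per Delta table, including its storage location.
    detail = spark.sql(f"DESCRIBE DETAIL {catalog}.{schema}.{name}").collect()[0]
    mapping.append((f"{catalog}.{schema}.{name}", detail["location"]))

# Persist the lookup so it can be referenced when registering tables in Foundry.
spark.createDataFrame(mapping, ["uc_table", "storage_location"]) \
    .write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.uc_table_location_map")
```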

We currently use JDBC in both directions, which isn't really ideal as it involves too much data movement.

I'd like to hear fellow Databricks users' opinions and experiences on how they operate across the two data platforms.

Hi, jumping in on this question to surface a few integration points in the context of our recently announced Palantir/Databricks partnership.

There are a few focal areas where we’re working to make interoperating these technologies as seamless as possible, including deepening integration points around:

  • Data virtualization
  • Governance
  • Cross-platform compute orchestration
  • Modeling/AI/workflows

Specifically to the question on data virtualization:

  • You can virtualize Databricks data in Palantir today via Palantir virtual tables, using either:
    • The Databricks JDBC connector (virtual tables support in Beta status)
    • Or by connecting at the underlying ABFS or S3 storage layer for Databricks external tables (virtual tables support in GA status)
    • With either route, you can leverage connectivity to Unity Catalog for table metadata and registration (via Delta UniForm); a short sketch of enabling UniForm follows after this list.
  • For virtualizing Palantir data in Databricks:
    • We have a couple of new features currently in Beta/Limited Beta around virtual table outputs for Pipeline Builder and Python Transforms, which allow you to write the output of your Palantir pipeline into Databricks storage (rather than Palantir storage) and register the table with Unity Catalog.
    • Looking forward, we are exploring with Databricks the potential to leverage their foreign Iceberg catalogs for registering Palantir Iceberg tables directly in Unity Catalog.
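
For reference on the Delta UniForm point above, enabling UniForm Iceberg metadata on an existing Delta table on the Databricks side looks roughly like this (the table name is a placeholder, and it requires a Databricks Runtime version that supports UniForm):

```python
# Rough sketch: enable Delta UniForm (Iceberg metadata generation) on an
# existing Databricks Delta table so Iceberg-aware integrations can read it.
# Table name is a placeholder; requires a Databricks Runtime with UniForm support.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE main.analytics.orders SET TBLPROPERTIES (
      'delta.columnMapping.mode' = 'name',
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```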

Happy to share more preview information live together with Databricks if you reach out via your Palantir rep. We’re always keen to hear from customers who are using these technologies together!

Also keep an eye on our release notes in the coming weeks as we’re planning a number of new developments in the space.
