We’re attempting to sync Databricks data via ABFS file sync, partitioned by year, and would like to leverage the Views capability to get a single View across the years.
We’ve been struggling a bit with Agent performance and the Databricks connector and are thinking this may be a way to solve for that.
We know we can run a Pipeline to Union them all, but that takes an extremely long time with 100M+ rows.
I also tried the Union Files approach in Pipeline Builder…and you guessed it, it failed with a Memory Error.
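For reference, the union step we’re describing follows a simple reduce pattern. A minimal sketch below, with plain Python lists standing in for the per-year Spark DataFrames (the `tables_by_year` datasets are hypothetical); in the actual pipeline each value would be a DataFrame and the combiner would be `a.unionByName(b, allowMissingColumns=True)` rather than list concatenation:

```python
from functools import reduce

# Hypothetical stand-ins for the year-partitioned syncs; in a real
# transform these would be Spark DataFrames read from each year's dataset.
tables_by_year = {
    2022: [("sku-1", 10), ("sku-2", 4)],
    2023: [("sku-1", 7)],
}

# Fold all years into one table. With Spark DataFrames this same reduce
# would use unionByName, which tolerates column-order drift between syncs.
all_years = reduce(lambda a, b: a + b, tables_by_year.values())
print(len(all_years))  # 3 rows across both years
```

At 100M+ rows per year, even this straightforward union is where the long runtimes show up.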
We’re a fairly large retailer, but I think we’re really pushing Foundry a bit given our infrastructure (Databricks with Delta tables). I REALLY wish Foundry could mimic the Delta functionality/transaction-log JSONs used in Databricks.
Would you be able to use a “direct connection” rather than an “agent-based” connection?
If so, I would recommend using the “virtual tables” feature to register your Databricks Delta tables stored in ABFS. This avoids the need for file-based syncing and unioning altogether. See the documentation here.
Alternatively, if you want to do data ingests (syncs) rather than virtualization, you can reach out to your Palantir admin for help enabling “Cloud Fetch” on your Databricks (JDBC) connector. This should help with the performance issues you were experiencing. This approach is supported for both direct connections and agent-based connections.
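If you go the Cloud Fetch route, it is typically toggled via a JDBC connection property on the Databricks driver. A sketch of what the connection string can look like, with placeholder host/path values; the exact property name and URL scheme vary by driver version, so treat this as an assumption and confirm against your Databricks JDBC driver documentation:

```
jdbc:databricks://<your-host>:443/default;transportMode=http;ssl=1;httpPath=<your-http-path>;EnableQueryResultDownload=1
```

`EnableQueryResultDownload=1` is the Cloud Fetch toggle in recent Simba/Databricks drivers; on Foundry’s side, your Palantir admin may need to enable it on the connector rather than in the URL.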
Note: our product team is exploring enabling Cloud Fetch by default, which would make this easier in the future.
+1 for using virtual tables, it’s working extremely well!
We have had a lot of success with Virtual Tables, but our scale of data in some of these tables is a challenge even with Virtual.
We’ve reached out re: Cloud Fetch, and we’re still digging into Views (assuming that capability isn’t deprecated).