Hi there,
We want to get data from Foundry into Databricks. There seem to be two options: either use the “read table” API or the JDBC connector.
Is there any recommendation on the best method, or is there a better solution?
Many thanks.
Do you want to push the data out, or have Databricks pull it?
The most efficient way is to use a Delta virtual table that you write into, i.e. an S3 source configured with Unity Catalog.
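As a rough sketch of the Databricks side, once Foundry is writing Delta files to a bucket, you can surface them as an external table in Unity Catalog. The catalog, schema, table, and bucket names below are placeholders, and this assumes an external location / storage credential for the bucket is already set up:

```python
# Sketch: register a Delta location that Foundry writes to as an external
# table in Unity Catalog. All names and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS main.foundry_exports.my_dataset
    USING DELTA
    LOCATION 's3://my-export-bucket/foundry/my_dataset'
""")

# Once registered, it behaves like any other Unity Catalog table.
spark.table("main.foundry_exports.my_dataset").show(5)
```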
Pulling from the read table API or using the JDBC connector will be slow for large datasets; you will be better off downloading the dataset's Parquet files in parallel and importing them into Databricks.
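For the import step, Spark already reads a directory of Parquet files in parallel, so once the files are somewhere Databricks can reach (DBFS, S3, ADLS, ...) the rest is a couple of lines. The paths and table name here are placeholders:

```python
# Sketch: read the copied Parquet files in parallel and persist them as a
# Delta table so downstream jobs don't re-read the raw files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark parallelises the read across all files under this directory.
df = spark.read.parquet("dbfs:/tmp/foundry/my_dataset/")

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("main.foundry_imports.my_dataset"))
```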
Thank you for your answer.
The idea is to pull the data, ideally without using an intermediate storage system.
We are also exploring the other options, but we are not sure about the pros and cons of the API versus JDBC.
https://palantir.com/docs/foundry/data-integration/foundry-s3-api//
I would recommend this API; there is an example for Spark on that page.
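For orientation only, the general shape of using an S3-compatible API from Spark is to point the S3A connector at the non-AWS endpoint and read the dataset's files directly. The endpoint, bucket/path layout, and credential flow are all specific to Foundry and are described on the linked page; everything below is a placeholder, not the actual values:

```python
# Rough sketch of reading a Foundry dataset through an S3-compatible API
# from Spark. Endpoint, credentials, and paths are placeholders -- take the
# real values from the linked documentation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

# Point the S3A connector at the Foundry endpoint instead of AWS.
hconf.set("fs.s3a.endpoint", "https://<your-foundry-host>/<s3-api-path>")  # placeholder
hconf.set("fs.s3a.path.style.access", "true")
hconf.set("fs.s3a.access.key", "<access-key-from-docs-flow>")              # placeholder
hconf.set("fs.s3a.secret.key", "<secret-key-from-docs-flow>")              # placeholder

# The s3a:// bucket and dataset path also come from the docs page.
df = spark.read.parquet("s3a://<bucket>/<dataset-path>/")
df.show(5)
```

The advantage over the read table API or JDBC is that Spark fetches the underlying files in parallel across the cluster, with no intermediate copy of the data.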