Does the Pipeline Builder filter transform on virtual tables fetch only the required data, or does it read all the data?

Hi, is Pipeline Builder's filter rows transform on a virtual table automatically converted into a query condition so that only the required data is fetched? Or will it fetch all the data and then filter it in Spark?

We are using a Databricks virtual table.

Hey! For Databricks tables, this depends on whether you have external access enabled.
See: https://www.palantir.com/docs/foundry/available-connectors/databricks#external-access-to-storage-locations-virtual-tables-only

If external access is enabled, we delegate to the Delta/Iceberg Spark connectors, which will attempt to prune files based on table metadata. If the data is partitioned appropriately for the filter, this can lead to less data being fetched.
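
To make the file pruning idea concrete, here is a minimal plain-Python sketch. It is not the actual Delta/Iceberg connector code; `FileStats`, `prune_files`, and the file names are hypothetical, and it assumes the table metadata records a min/max date per data file.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file metadata: path plus min/max of a date column."""
    path: str
    min_date: str  # ISO dates compare correctly as strings
    max_date: str


def prune_files(files, lo, hi):
    """Keep only files whose [min_date, max_date] range overlaps [lo, hi]."""
    return [f for f in files if f.max_date >= lo and f.min_date <= hi]


# Illustrative table laid out as one file per month.
files = [
    FileStats("part-0.parquet", "2024-01-01", "2024-01-31"),
    FileStats("part-1.parquet", "2024-02-01", "2024-02-29"),
    FileStats("part-2.parquet", "2024-03-01", "2024-03-31"),
]

# A filter on 2024-02-10..2024-02-20 only needs to read one of the three files.
selected = prune_files(files, "2024-02-10", "2024-02-20")
print([f.path for f in selected])
```

If the data were not organized by date, every file's range would likely overlap the filter and nothing could be skipped, which is why the partitioning matters here.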

If external access is disabled, we rely on Databricks JDBC, which should push the filters down to Databricks.
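
As a rough illustration of what filter pushdown means, the sketch below uses Python's built-in `sqlite3` as a stand-in for the remote source. This is only an analogy: it does not use Databricks JDBC, and the `events` table and its columns are made up for the example.

```python
import sqlite3

# In-memory database standing in for the remote source (assumption: not Databricks).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "eu"), (2, "us"), (3, "eu"), (4, "apac")],
)

# Without pushdown: every row crosses the connection, then is filtered locally.
all_rows = conn.execute("SELECT id, region FROM events ORDER BY id").fetchall()
filtered_locally = [row for row in all_rows if row[1] == "eu"]

# With pushdown: the filter travels as a WHERE clause, so only matches come back.
pushed_down = conn.execute(
    "SELECT id, region FROM events WHERE region = ? ORDER BY id", ("eu",)
).fetchall()

print(len(all_rows), "rows transferred without pushdown")
print(len(pushed_down), "rows transferred with pushdown")
conn.close()
```

Both approaches produce the same result set; the difference is how many rows leave the source.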
