If your pipeline is append-only, a projection would be the best way to improve query/filtering performance when reading this dataset. Projections can compact files, so performance holds up even after a large number of incremental transactions.
The docs for projections are here, and specifically for incremental pipelines here.
Thanks for your reply. We’re indeed aware of projections, but we can’t use them for now: the schema of our incremental datasets is evolving, and we need to keep enough flexibility to delete files/transactions when necessary.
That’s why we’re experimenting with hive partitioning and bucketing to avoid any blocking.
Related to my question: would you know if bucketing is currently possible on incremental datasets?
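
To make the layout concrete, here is a minimal sketch of what we mean by hive partitioning combined with bucketing. It is plain Python, not Spark or any Foundry API. The column names (`event_date`, `customer_id`), the bucket count, the file naming, and the CRC32 hash are all assumptions for illustration only; Spark's real bucketing uses a different hash function.

```python
import zlib

NUM_BUCKETS = 4  # assumed bucket count, illustration only


def bucket_id(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a stable bucket index (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def target_path(row: dict) -> str:
    """Build a hive-style path: partition directory first, then bucket file."""
    partition_dir = f"event_date={row['event_date']}"  # hypothetical partition column
    bucket = bucket_id(row["customer_id"])  # hypothetical bucketing column
    return f"{partition_dir}/bucket-{bucket:05d}.parquet"


# Hypothetical rows, as they might arrive across incremental transactions.
rows = [
    {"event_date": "2024-01-01", "customer_id": "c1"},
    {"event_date": "2024-01-01", "customer_id": "c2"},
    {"event_date": "2024-01-02", "customer_id": "c1"},
]
paths = [target_path(r) for r in rows]
```

In this sketch the same `customer_id` always maps to the same bucket index, while `event_date` decides the directory. Each append would add further files under those directories, which is the file growth that compaction is meant to address.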