How to only read latest transaction in an incremental build?

sandpiper · December 2, 2024, 7:53am

Are you by any chance referring to the following UI element in the history tab for the transaction/build?

If so, you may be misinterpreting it. It only tells you the latest dataset view that the build used, not the data that was actually “read in.”

A lot happens between the start of the timeline and the first Spark task execution stages (query plan generation, for example), so you can’t assume that time is spent just reading files. In fact, the actual reading of files should happen in the subsequent Spark tasks, unless you’re doing something custom with the filesystem API.
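
To make the planning vs. reading distinction concrete, here is a toy sketch in plain Python. It is not Spark or Foundry code: `LazyDataset`, `read_log`, and the file names are all made up for illustration, and it only assumes the general lazy-evaluation behavior described above (building the plan touches no files; the read happens when the result is actually requested).

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark or Foundry class)."""

    def __init__(self, files, read_log, steps=()):
        self.files = files        # file name -> rows, standing in for the resolved dataset view
        self.read_log = read_log  # shared list recording which files were actually read
        self.steps = steps        # deferred transformations, i.e. the "plan"

    def map(self, fn):
        # Planning only: record the step and return a new dataset; no file is touched.
        return LazyDataset(self.files, self.read_log, self.steps + (fn,))

    def collect(self):
        # Execution: only now is each file "read" and the recorded plan applied.
        rows = []
        for name, contents in self.files.items():
            self.read_log.append(name)
            for row in contents:
                for fn in self.steps:
                    row = fn(row)
                rows.append(row)
        return rows


read_log = []
view = {"part-0": [1, 2], "part-1": [3]}

planned = LazyDataset(view, read_log).map(lambda x: x * 10)
reads_after_planning = list(read_log)   # snapshot taken before any execution

result = planned.collect()
reads_after_execution = list(read_log)  # snapshot taken after execution
```

In this sketch the view is known from the moment `LazyDataset` is constructed, but `read_log` stays empty until `collect()` runs. That is the same gap as above: knowing which view a build used says nothing about when, or how much of, the data was read.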