Handle both incremental updates and historical data retrievals

wrose · August 15, 2024, 2:15pm

I’m working on integrating an API where I initially need to fetch a large amount of historical data. After this initial load, I want to handle only incremental changes, appending new data as it arrives. However, there may be situations where I need to fetch additional historical data as well.

I’m trying to avoid creating snapshots and overwriting the existing dataset whenever possible. What would be the best approach to manage this scenario, ensuring that I can efficiently handle both incremental updates and occasional historical data retrievals without overwriting the entire dataset?

Any suggestions or best practices would be greatly appreciated!

Thanks in advance!

rishir · August 16, 2024, 2:23am

Hey @wrose

You can use a state file to track the incremental behavior of your external transforms. This is explained here.

As long as you only need to append the historical data to the output, I believe this should work. Let me know if this does not satisfy your use case.
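
To make the state-file idea concrete, here is a rough sketch in plain Python. It is not the external transforms API: the state file layout, the `fetch(start, end)` callable, and the function names `load_state`, `plan_fetches`, and `run` are all made up for illustration, and dates are assumed to be ISO strings. The state records the window already fetched, so each run only requests what is missing on either side (older history or newer data) and appends it to the output.

```python
import json
from pathlib import Path


def load_state(path):
    """Return the saved state, or an empty state on the first run."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {"earliest": None, "latest": None}


def plan_fetches(state, want_from, want_to):
    """Return the (start, end) ranges not yet covered by the state.

    Ranges are half-open [start, end) over ISO date strings, which
    compare correctly as plain strings.
    """
    if state["earliest"] is None:
        return [(want_from, want_to)]  # initial historical load
    ranges = []
    if want_from < state["earliest"]:
        ranges.append((want_from, state["earliest"]))  # historical backfill
    if want_to > state["latest"]:
        ranges.append((state["latest"], want_to))  # incremental update
    return ranges


def run(state_path, output_path, fetch, want_from, want_to):
    """Fetch only the missing ranges and append them to the output.

    `fetch` is a hypothetical callable (start, end) -> list of dicts
    standing in for the real API client.
    """
    state = load_state(state_path)
    for start, end in plan_fetches(state, want_from, want_to):
        rows = fetch(start, end)
        with open(output_path, "a") as out:  # append, never overwrite
            for row in rows:
                out.write(json.dumps(row) + "\n")
        # Persist the widened window only after the rows are written.
        if state["earliest"] is None:
            state["earliest"], state["latest"] = start, end
        else:
            state["earliest"] = min(state["earliest"], start)
            state["latest"] = max(state["latest"], end)
        Path(state_path).write_text(json.dumps(state))
    return state
```

A regular run would call `run(...)` with the same start date and a later end date, while a backfill would pass an earlier start date. Because the sketch keeps the covered window contiguous, a request that falls entirely before the window is extended up to the existing earliest date. If the API can return corrected versions of rows you already have, appending alone will not be enough and you would need some form of deduplication downstream.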