I'm currently working on an incremental transaction that reads incrementally from an input and appends to the output. The data is huge, so recomputing or rewriting the output as a “snapshot”/“replace” is not desirable. My question is:
Is there any way to add a column to the previous output by modifying only the schema (accepting that every existing row will have null in that column), and then start populating the new column incrementally? (Presumably this won't fail, since the schemas will match.)
Example:
Iteration 1:
- Input: 10 columns
- Transform selects 5 columns
- Output (append mode) writes 5 columns
Iteration 2:
- Input: 10 columns
- Transform selects 6 columns
- <somehow I make the previous output have those 6 columns, with the new one holding null values>
- Output (append mode) writes 6 columns
The idea is to avoid reading the previous output, unioning it with the new incoming data, and writing the result back as a replace (unless that is the most efficient way of doing it).
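
To make the behaviour I'm after concrete, here is a minimal plain-Python sketch of the two iterations. It is only an illustration of the semantics, not the API of any particular platform: `read_with_schema`, the column names `c1`..`c6`, and the list-of-batches "output" are all hypothetical names I made up for this example. The point is that the old batch is never rewritten; only the schema used at read time changes, and the missing column comes back as null (`None`).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def read_with_schema(batches: List[List[Row]], schema: List[str]) -> List[Row]:
    """Read every appended batch under one schema.

    Columns that a row does not have are returned as None (null),
    so older batches never need to be rewritten.
    (Hypothetical helper, for illustration only.)
    """
    return [
        {col: row.get(col) for col in schema}
        for batch in batches
        for row in batch
    ]


# The "output dataset" is modelled as a list of appended batches.
output_batches: List[List[Row]] = []

# Iteration 1: the transform selects 5 columns and appends them.
batch_1 = [{"c1": 1, "c2": 2, "c3": 3, "c4": 4, "c5": 5}]
output_batches.append(batch_1)

# Iteration 2: the transform selects 6 columns and appends them.
# Only the schema is updated; batch_1 is left untouched.
schema_v2 = ["c1", "c2", "c3", "c4", "c5", "c6"]
batch_2 = [{"c1": 10, "c2": 20, "c3": 30, "c4": 40, "c5": 50, "c6": 60}]
output_batches.append(batch_2)

# Reading under the new schema: old rows get None for the new column.
rows = read_with_schema(output_batches, schema_v2)
```

Whether the real output can be made to behave like this (schema-only change, old files untouched) is exactly what I'm asking.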