In an incremental Python transform, if I want to read the state of the output as it was before any processing in the current run has started, should I use mode='previous' or mode='current'?
What would be the difference between these two options in this example?
from transforms.api import transform, incremental, Input, Output

@incremental(
    semantic_version=1
)
@transform(
    output_sub_alerts=Output("ri.foundry.main.dataset.eef67890-xxxxxx"),
    scenario_hits=Input("ri.foundry.main.dataset.b7c12345-xxxxxx"),
)
def compute(output_sub_alerts, scenario_hits):
    # Option 1
    previous_output_sub_alerts = output_sub_alerts.dataframe(mode='previous')
    # Option 2
    current_output_sub_alerts = output_sub_alerts.dataframe(mode='current')
    # .... stuff to be done here
The two modes are very similar, and they are described in the documentation here: https://www.palantir.com/docs/foundry/transforms-python/incremental-reference/#reading-data-from-the-previous-run-valid-combinations
I add emphasis below:
Although default read mode is current, in most cases you actually want to use previous. Other read modes should be used to read dataset after writing to it.
When using current to get the previous dataframe you don't have to provide schema. This is because current uses the schema of the output which must already have been built. However current mode is more fragile than previous. The current mode will fail if:
- the transform is run non-incrementally and you don't override the write_mode to modify before calling dataframe on the output
- the transform has never been computed before, so it's not possible to construct an empty DataFrame because the schema is not known.
So the difference lies in when each mode fails and when it returns valid data. You should probably use previous.
The previous mode always returns the data from the previous transaction. The current mode depends on whether you have already written to the output during the run, and it does not require you to pass a schema.
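To make the distinction concrete, here is a small toy model in plain Python. It is only a sketch of the behaviour described above, not the Foundry transforms API: the ToyOutput class, its write and read methods, and the use of plain lists in place of DataFrames are all hypothetical. It also assumes, based on my reading of the quoted documentation, that previous needs a schema and returns an empty result when there is nothing usable from an earlier run.

```python
from typing import List, Optional


class ToyOutput:
    """Toy stand-in for an incremental output; NOT the Foundry transforms API."""

    def __init__(self, committed: Optional[List[int]] = None, write_mode: str = "modify"):
        # committed is None when the output has never been built.
        self.committed = committed
        # "modify" models an incremental run, "replace" a non-incremental one.
        self.write_mode = write_mode
        # Rows written so far during the current run.
        self.written: List[int] = []

    def write(self, rows: List[int]) -> None:
        self.written.extend(rows)

    def read(self, mode: str, schema: Optional[List[str]] = None) -> List[int]:
        if mode == "previous":
            # Assumption: a schema is needed so an empty result can be built.
            if schema is None:
                raise ValueError("previous mode needs a schema")
            # Assumption: nothing usable from an earlier run gives an empty result.
            if self.committed is None or self.write_mode == "replace":
                return []
            return list(self.committed)
        if mode == "current":
            # The two failure cases listed in the quoted documentation.
            if self.committed is None:
                raise RuntimeError("never built, so the schema is unknown")
            if self.write_mode == "replace":
                raise RuntimeError("non-incremental run without modify write mode")
            return self.committed + self.written
        raise ValueError(f"unknown mode: {mode}")


out = ToyOutput(committed=[1, 2])
out.write([3])
print(out.read("previous", schema=["id"]))  # rows from the last run only
print(out.read("current"))  # last run plus rows written in this run
```

In this model, previous gives the same answer no matter when it is called during the run, while current changes as soon as something is written and raises on an output that has never been built. That matches the recommendation to prefer previous when you want the state before processing started.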