Workaround for template not supporting transform semantic version

egan · December 2, 2024, 10:56pm

Hey team!

Templates do not support semantic versioning, so a transform with a semantic version > 1 is forced into snapshot mode after a template deployment. As a workaround, I have been trying to read the existing output in snapshot mode, union it with the input, and replace the output dataset with the result of the union. However, it looks like reading from the output in snapshot mode is not supported. I have tried both of the following:

previous_output_df = output_df.dataframe('current', schema=schema_test)
previous_output_df = output_df.dataframe('previous', schema=schema_test)

In both cases previous_output_df.isEmpty() is always True. Does anyone who uses templates with semantic versioning in their transforms have a good way around this? It seems that reading from the output is allowed in a Java transform but not in a Python transform.
Thanks!!

Ben · December 3, 2024, 10:26am

It’s generally preferred to use Marketplace if possible, but I assume it’s not compatible with your use case.

It is generally better to avoid using the semantic version to handle snapshots in a templated pipeline. Instead, try adding an empty "snapshot trigger" dataset as an input, which you can build whenever you want to force a snapshot downstream. This avoids the template limitation without encoding complex logic into the pipeline. It also lets you snapshot each installation independently instead of having to cut a new version. You should restrict permissions on this dataset so that it isn’t accidentally built on a regular basis.
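
To make the reasoning behind the trigger dataset a bit more concrete, here is a small plain-Python model of the idea. It is only a sketch under simplifying assumptions, not the Foundry transforms API: InputState and build_mode are hypothetical names, and the model assumes that a transform can stay incremental only while every new transaction on its inputs is an append, and that building the trigger dataset commits a snapshot transaction on it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model. None of these names belong to the
# Foundry transforms API; they only illustrate the decision being made.


@dataclass
class InputState:
    name: str
    # Transaction types committed on this input since the transform last ran.
    new_transactions: List[str] = field(default_factory=list)


def build_mode(inputs: List[InputState]) -> str:
    """Return "incremental" if every new input transaction is an APPEND,
    otherwise "snapshot" (assumed rule, see the lead-in)."""
    for inp in inputs:
        if any(tx != "APPEND" for tx in inp.new_transactions):
            return "snapshot"
    return "incremental"


source = InputState("source_data", ["APPEND", "APPEND"])
trigger = InputState("snapshot_trigger")  # empty dataset, normally never built

# Day-to-day run: the trigger has no new transactions, so nothing changes.
normal_run = build_mode([source, trigger])

# Assumption: building the trigger dataset commits a SNAPSHOT transaction on it,
# which forces the downstream transform out of incremental mode once.
trigger.new_transactions.append("SNAPSHOT")
forced_run = build_mode([source, trigger])

print(normal_run, forced_run)
```

In this model the trigger dataset carries no data at all. Its only job is to give each installation a switch that forces a one-off snapshot without touching the transform's semantic version.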
