I want to dynamically create a new output dataset

azu · November 7, 2024, 7:40am

Hello,
I would like to implement the following process.

I want to create “datasetB_yyyymmdd” as a backup of “datasetA”.
※yyyymmdd will be the date the process is executed.
I want the process to be executed once a month.
(One dataset will be created each month,
such as
datasetB_20250101,
datasetB_20250201,
datasetB_20250301, etc.)

【Question】

1. Is it possible to output a new dataset every time the process is executed?
2. (If 1 is possible,) can the process still be run on a schedule even though the output dataset is dynamic?

comew · November 7, 2024, 11:13am

Adding a new input or output to a transform requires CI checks to run, so a transform cannot create a new output dataset dynamically at build time.

However, you could create N empty datasets for the next N months in a multi-output transform, and have code to choose and write to the output corresponding to the desired month. Something like this for instance:

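from datetime import datetime

from transforms.api import transform, Input, Output


# One Output must be declared up front for every month the transform may write to.
@transform(
    my_input=Input("ri.foundry.main.dataset.XXX"),
    **{
        month: Output(
            f"folder_path/datasetB_{month}"
        )
        for month in ["202411", "202412"]
    },
)
def my_transform(my_input, **outputs):
    # Key of the month in which the build runs, e.g. "202411".
    current_month = datetime.now().strftime("%Y%m")

    if current_month not in outputs:
        raise ValueError("Current month has no corresponding output")

    # Copy the input only into the output for the current month.
    outputs[current_month].write_dataframe(my_input.dataframe())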

And you can indeed set up a monthly schedule with a time-based trigger.
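
If N is large, the hard-coded month list in the snippet above could be generated instead of typed by hand. The following is only a minimal sketch in plain Python: `month_keys` and `pick_output` are hypothetical helper names, not part of any Foundry API. It assumes the start date is a fixed constant in the repository (not today's date), because the set of declared outputs has to be the same when CI checks run and when the build runs.

```python
from datetime import date


def month_keys(start: date, count: int) -> list[str]:
    """Return `count` consecutive YYYYMM strings, beginning with the month of `start`."""
    keys = []
    year, month = start.year, start.month
    for _ in range(count):
        keys.append(f"{year:04d}{month:02d}")
        # Advance one month, rolling over to January after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def pick_output(outputs: dict, today: date):
    """Return the entry of `outputs` keyed by the YYYYMM of `today`, or raise if absent."""
    key = today.strftime("%Y%m")
    if key not in outputs:
        raise ValueError(f"No output declared for month {key}")
    return outputs[key]
```

With these helpers, the list `["202411", "202412"]` could be replaced by something like `month_keys(date(2024, 11, 1), 12)`. Once the last declared month has passed, the list has to be extended and CI checks run again, otherwise the build fails with the `ValueError`.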

azu · November 18, 2024, 11:02am

I understand. Thank you.