Batch Typecasting

Does anyone know if there is a way to batch typecast a dataset created as the output of a code repository? If I imported the data to Foundry as a CSV, I could easily go in and click “Edit Schema,” but that option does not seem to exist for datasets created in code repositories. I understand that typecasting isn’t difficult in PySpark, but when working with datasets with 300+ columns, I would like a way to typecast the columns automatically, or at least to click through them and assign the correct types.
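For context, this is roughly what I do today in PySpark inside a code repository transform. It is only a minimal sketch: the dataset paths and the `TYPE_MAP` dictionary are placeholders I made up, and the real dataset has far more columns.

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output

# Hypothetical mapping of column names to target Spark SQL types.
# In practice this would be much longer (or loaded from a reference file).
TYPE_MAP = {
    "order_id": "long",
    "order_date": "date",
    "amount": "double",
}

@transform_df(
    Output("/path/to/typed_dataset"),   # placeholder output path
    source=Input("/path/to/raw_dataset"),  # placeholder input path
)
def compute(source):
    # Cast only the columns listed in TYPE_MAP; leave every other column untouched.
    return source.select(
        *[
            F.col(c).cast(TYPE_MAP[c]).alias(c) if c in TYPE_MAP else F.col(c)
            for c in source.columns
        ]
    )
```

Maintaining that mapping by hand for hundreds of columns is exactly the part I’d like to avoid.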

I think Pipeline Builder should include the ability to “batch cast” by selecting more than one column, instead of only being able to select one at a time. Does this functionality already exist?

Hi @guyhartstein, I use the “Apply to multiple columns” expression when I have a similar need. For example, the screenshot below casts all decimal types to doubles:
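(Screenshot not reproduced here.) For anyone working in a code repository rather than Pipeline Builder, a rough PySpark equivalent of that “cast all decimals to doubles” step might look like the sketch below; it is only illustrative, and the helper name is my own.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

def decimals_to_doubles(df):
    # Cast every DecimalType column to double, leaving other columns as-is.
    return df.select(
        *[
            F.col(field.name).cast("double").alias(field.name)
            if isinstance(field.dataType, DecimalType)
            else F.col(field.name)
            for field in df.schema.fields
        ]
    )
```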

This is fantastic, thank you!

