Is there a way to create new columns from existing column values w/ Pipeline Builder transforms?

elim · May 10, 2024, 3:40pm

I would like to create new columns in a dataset (or a new dataset) based on existing column values. If an existing column of interest has 3 unique values, I would like to create 3 new columns whose names correspond to those 3 unique values. Each new column’s values would then come from another existing column in the same row.

In the example below (the data is notional, from dev-stack), I would like to create new columns whose names are the values in p__type and whose values are the corresponding p_propertValue_... values.

So after transforming this dataset, ideally there would be new columns that look like:

col_name: com.palantir.property.Name w/ values oakley, elijah, baker, and aaa
col_name: com.palantir.property.Language w/ values house, above, itself
etc.
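What is being described is a long-to-wide pivot. It needs three things: a column that says which rows belong together in one output row, the column whose values become new column names (p__type), and the column that supplies the cell values. The minimal pure-Python sketch below shows the idea. It assumes a hypothetical grouping column `entity_id` and uses a hypothetical `value` column as a stand-in for p_propertValue_...; the pairing of the sample values is made up for illustration. Cells are filled with a "first value wins" rule. This is only an illustration of the semantics, not how Pipeline Builder implements its pivot.

```python
# Hypothetical long-format rows. "entity_id" is an assumed grouping key,
# "value" is an assumed stand-in for the p_propertValue_... column.
rows = [
    {"entity_id": 1, "p__type": "com.palantir.property.Name", "value": "oakley"},
    {"entity_id": 1, "p__type": "com.palantir.property.Language", "value": "house"},
    {"entity_id": 2, "p__type": "com.palantir.property.Name", "value": "elijah"},
    {"entity_id": 2, "p__type": "com.palantir.property.Language", "value": "above"},
    {"entity_id": 3, "p__type": "com.palantir.property.Name", "value": "baker"},
]


def pivot_first(rows, key, pivot_col, value_col):
    """Long-to-wide pivot: one output row per distinct `key`, one column per
    distinct `pivot_col` value, keeping the first `value_col` seen per cell."""
    columns = []  # distinct pivot values, in first-seen order
    wide = {}     # key -> {new column name: value}
    for row in rows:
        col = row[pivot_col]
        if col not in columns:
            columns.append(col)
        cells = wide.setdefault(row[key], {})
        # "First" aggregation: later values for the same cell are ignored.
        cells.setdefault(col, row[value_col])
    # Cells that never received a value become None (a null).
    return [
        {key: k, **{c: cells.get(c) for c in columns}}
        for k, cells in wide.items()
    ]


for r in pivot_first(rows, "entity_id", "p__type", "value"):
    print(r)
```

Without a grouping column, every input row would become its own output row with only one non-null pivoted column, so choosing the row key is the main design decision.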

I’m wondering if this is possible using built-in Pipeline Builder transforms or if I’ll need to write a UDF / External Transform.

helenq · May 10, 2024, 5:08pm

Hey @elim, try using the pivot board in Pipeline Builder! If you run into any trouble or confusion, let us know.

elim · May 10, 2024, 6:23pm

Thanks @helenq! Is there a way to avoid doing any aggregation? I would just like to pivot the string values into the new columns as-is, instead of applying an expression and using its result as the values.

helenq · May 10, 2024, 6:56pm

Hey @elim, can you try “First” in the aggregation field? Or use “collect array” and then explode the array after the aggregation.
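The difference between the two suggestions only shows up when one row key has several values of the same type. “First” keeps just one of them, while “collect array” keeps all of them as a list that can then be exploded back into one row per value. The pure-Python sketch below illustrates that behaviour under the same assumptions as before: `entity_id` and `value` are hypothetical column names, and the sample rows are invented. The helpers `pivot_collect` and `explode` are illustrative only, not Pipeline Builder functions.

```python
# Hypothetical rows where one entity has two Name values.
rows = [
    {"entity_id": 1, "p__type": "com.palantir.property.Name", "value": "oakley"},
    {"entity_id": 1, "p__type": "com.palantir.property.Name", "value": "aaa"},
    {"entity_id": 1, "p__type": "com.palantir.property.Language", "value": "house"},
]


def pivot_collect(rows, key, pivot_col, value_col):
    """Pivot with a "collect array"-style aggregation: each cell holds a list
    of every value seen. Types a key never has are simply absent."""
    wide = {}
    for row in rows:
        cells = wide.setdefault(row[key], {key: row[key]})
        cells.setdefault(row[pivot_col], []).append(row[value_col])
    return list(wide.values())


def explode(rows, col):
    """Emit one row per element of the list in `col`, copying the other
    columns. Missing or empty lists yield a single row with None."""
    out = []
    for row in rows:
        for item in row.get(col) or [None]:
            out.append({**row, col: item})
    return out


collected = pivot_collect(rows, "entity_id", "p__type", "value")
print(collected)
for r in explode(collected, "com.palantir.property.Name"):
    print(r)
```

Note that exploding one column leaves the other pivoted columns as lists, and exploding several list columns one after another produces every combination of their elements, so “First” is usually simpler when each key has at most one value per type.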