How do I create a UID in Pipeline Builder?

I'm new to Pipeline Builder and have loaded a dataset that doesn’t have a unique ID. Is there an expression or an easy way to add a “uid” column to my dataset as an early transform? I’ll need a UID to make the data useful later in the Ontology. Thanks!

Yes, you can do this using our create UUID expression, which you can find by searching for “uuid” in the expressions list! Please note the expression is not deterministic and the UUID will change each time the pipeline is run. If you need deterministic UUIDs, I recommend running an early pipeline which outputs a dataset with the UUID, and then having that dataset as an input in your current pipeline.
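
To illustrate what "not deterministic" means here, a minimal sketch in plain Python follows. It uses the standard `uuid` module rather than the Pipeline Builder expression itself, and `random_uid` is a hypothetical helper name chosen for this example.

```python
import uuid


def random_uid() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


# Two calls stand in for two separate pipeline runs over the same row.
first = random_uid()
second = random_uid()

# The values differ, so a row would get a new ID on every run.
print(first)
print(second)
print(first != second)
```

Because the value is regenerated on each call, anything downstream that stored the earlier ID would no longer match the row.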

A good way to create a deterministic Unique ID is to use the “Concatenate strings” transform to join several column values into a string that is unique per row and identical every time you build the pipeline, then apply the “Hash sha256” transform to that string so that it becomes a non-reversible hex value.

For example, if you needed to create a unique identifier for a dataset of employees, you could concatenate their first name, middle initial, surname, hiring date and branch office location together. This of course assumes that you’d never hire someone with the exact same name on the same day at the same place!

So “John B. Smith” hired on 2 September 2025 in London might concatenate to JohnBSmith20250902London. After applying “Hash sha256”, their Unique ID would be 794d15d7d31c04a612e000ac5483c9fb48d8d7b0ef6864ccf8a1d23a8ce51f05.

Note that this approach only works if you have values in the dataset that are guaranteed to be unique when combined.
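
The concatenate-then-hash idea can be sketched in plain Python as below. This is an illustration using the standard `hashlib` module, not the Pipeline Builder transforms themselves; `make_uid` and its `sep` parameter are hypothetical names introduced for this example.

```python
import hashlib


def make_uid(*parts: str, sep: str = "") -> str:
    """Join the parts into one string and return its SHA-256 hex digest."""
    joined = sep.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Same inputs always give the same 64-character hex ID.
uid = make_uid("John", "B", "Smith", "20250902", "London")
assert uid == make_uid("John", "B", "Smith", "20250902", "London")
print(uid)

# Assumption for illustration: with no separator, different column values
# can concatenate to the same string and therefore the same ID.
print(make_uid("ab", "c") == make_uid("a", "bc"))                      # True
print(make_uid("ab", "c", sep="|") == make_uid("a", "bc", sep="|"))    # False
```

If your columns could run together ambiguously like this, adding a separator character that never appears in the data between the concatenated values is one way to keep the combined string unique.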
