Consolidating Row Pairs using Activate / Deactivate Flags

neerajsk · August 26, 2024, 8:08pm

We have fault data coming with with the IDs for machines and fault types along with timestamps and active flags.
A activate/deactivate value of 1 corresponds to the fault turning active and 0 corresponds to the fault ending.
I’d like to to implement something that works to take something like the left blue table to the right brat green table
I’ve been trying to use windows and group bys but can’t figure out how to implement this logic using pipeline builder transforms.
Thanks!!

helenq · August 26, 2024, 8:34pm

Hey @neerajsk you could try the pivot board where the pivot by column is the activate/deactivate column (the values are 0 and 1) and the group by column (where it says “Rows”) is Machine id

For the aggregation you can use First and under expression choose your Timestamp column

Let me know if you have any questions!

neerajsk · August 26, 2024, 10:04pm

Correct me if I’m wrong, but I don’t think that approach would work if the same machine can report the same fault ID further down in the data, which is the case.
With the data sorted according to timestamp, the 1 row (activate) for a given machine and fault has to be paired to the next soonest 0 row (deactivate) of the same machine and fault code to be reported as one fault occurrence.
We might, for example, have something like this.

helenq · August 27, 2024, 7:29pm

Hi @neerajsk yes my solution above only works with the first example. If you want to take into account repeated activations you should use the window function to get another column that ties together each start/end row pair.

You can try out having a sorted window by id (sorted by time) and use the row number expression.

After you have the row number, case based on the activate/deactivate being 1 (if it’s 0, make the row number null). This way, you can do a second window partitioned by id and sorted by time, where the expression takes the last of the row number (hence tying together the start/end pairs based on id and time). You can then use this new column combined with the machine id as a key to self join the dataset to get the start/end pairs

For the self join, I would filter your dataset based on the activate/deactive values so you have two datasets one of all the active and one with all the deactive and then join based on the new column above and the original id