Union Between datasets, but prefer left dataset for overlapping PK's

I want to do a union between datasets A and B. They have the exact same columns and I’d like to keep every single row in dataset A which is my “truest” dataset. I only want to add rows from B to A that don’t have a pk that is already in A. What’s this join / union called or is it a chaining of operations?

I see an “anti join” but no “left anti join” join which would be useful for getting all of the rows in B that are not in A.

Hey! You should be able to use the anti join. You would do an anti join to only get the rows in B that don’t exist on A and then union that result with dataset A using B as the left side of the anti join and A as the right side.

Another way you could do this is union both datasets first and then deduplicate using a window + filter so that you always take the row from dataset A before the row from dataset B

Let me know if you’re running into specific problems and we can iterate here

2 Likes

I see! So the “Anti Join” Join operation acts as a “Left Anti Join”. Makes sense.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.