I want to do a union between datasets A and B. They have the exact same columns and I’d like to keep every single row in dataset A which is my “truest” dataset. I only want to add rows from B to A that don’t have a pk that is already in A. What’s this join / union called or is it a chaining of operations?
I see an “anti join” but no “left anti join” join which would be useful for getting all of the rows in B that are not in A.
Hey! You should be able to use the anti join. You would do an anti join to only get the rows in B that don’t exist on A and then union that result with dataset A using B as the left side of the anti join and A as the right side.
Another way you could do this is union both datasets first and then deduplicate using a window + filter so that you always take the row from dataset A before the row from dataset B
Let me know if you’re running into specific problems and we can iterate here