Is there a recommended way to perform a null-safe equality check in a pipeline builder join similar to PySpark’s null safe equality check( pyspark.sql.Column.eqNullSafe — PySpark 3.5.3 documentation)? For example, suppose I have two tables for customers and transactions where both tables identify customers by “ID” (never null), and “sub ID” (not null only for non-unique IDs).
I’d like to join on “ID” = “ID” and “sub ID” <=> “sub ID” preferably without coalescing literal “NULL” strings into null sub IDs. When I try setting this up to join when sub IDs are equal or both null (shown below), Pipeline Builder warns to not do that because “Using non-equality conditions within a join can be computationally expensive and may result in slower previews and builds.”