Is it possible to do a KNN join in Pipeline Builder using the cosine similarity of two embeddings (in two tables)? I essentially want to replicate the KNN join with Levenshtein distance, but using embedding cosine similarity instead.
Hey @acapras, Pipeline Builder does not currently support KNN joins using cosine similarity out of the box. However, this is actively being worked on, and we can update this thread when it's ready for use! In the meantime, you can compute cosine similarity manually with the transforms already available in Pipeline Builder:
Sample pipeline with screenshots (shoutout to @david for the example!)
First, make sure to run text to embeddings on the two columns in question
Then cross join the two datasets
Cosine similarity logic:
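For anyone who wants to sanity-check the approach outside Pipeline Builder, here is a minimal Python sketch of the same idea: cross join the two sets of embeddings, score every pair by cosine similarity (dot product divided by the product of the two vector norms), and keep the top k matches per left row. This is not the Pipeline Builder expression from the example; the function names `cosine_similarity` and `knn_join`, the `(id, embedding)` row shape, and the choice to return 0.0 for zero-length vectors are all assumptions made for illustration.

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_join(left, right, k=1):
    """Hypothetical helper: left and right are lists of (id, embedding) rows.

    Returns {left_id: [(right_id, similarity), ...]} holding the k most
    similar right rows for each left row, best match first.
    """
    scored = {}
    # product(...) plays the role of the cross join: every left/right pair.
    for (left_id, left_vec), (right_id, right_vec) in product(left, right):
        score = cosine_similarity(left_vec, right_vec)
        scored.setdefault(left_id, []).append((right_id, score))
    return {
        left_id: sorted(matches, key=lambda m: m[1], reverse=True)[:k]
        for left_id, matches in scored.items()
    }


if __name__ == "__main__":
    left_rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]
    right_rows = [("x", [1.0, 0.1]), ("y", [0.1, 1.0]), ("z", [-1.0, 0.0])]
    for left_id, matches in knn_join(left_rows, right_rows, k=1).items():
        print(left_id, matches)
```

Keep in mind that the cross join produces one row per left/right pair, so the intermediate result grows with the product of the two table sizes; that is the main cost of doing this manually.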