Semantic Search Between Two Objects with Many-to-Many Relationship in Foundry

I have two objects with embeddings and need to establish a many-to-many relationship based on semantic similarity:

  • Object1: Contains id1 (primary key) and embedding1 (vector)
  • Object2: Contains id2 (primary key) and embedding2 (vector)
  • Requirement: For each id1, I need to find and assign the two most semantically similar id2 values based on comparing embedding1 and embedding2

Example Data:

Desired Output: What I need is to re-establish this mapping based on semantic similarity between embeddings, so for each id1, I’d have the two most similar id2 values based on cosine similarity (or other appropriate vector similarity measure) between their respective embeddings.

My Questions:

  1. What’s the most efficient way to implement this semantic search in Foundry?
  2. Should I use Foundry’s vector functions, or do I need to implement a custom transform?
  3. Is there a best practice pattern for handling this type of many-to-many relationship based on vector similarity?

Any code snippets or transform examples would be greatly appreciated!

1 Like

A few questions:

  • Where are you showing the result?
  • Do you need to store this information?
  • If you need to store this, how often does the information change?

If you want to bring up these two objects in a Workshop application, it’s relatively simple to do so:

  • If your workflow is one where the user selects single objects to look at their properties/links, you can create a variable based on the selected object, where the vector of Object1 is compared to the vectors of Object2 and the k=2 nearest objects are selected.
  • If you need to show this for a large number of objects, a TypeScript function would be relatively performant:
    @Function()
    private async vectorCosineRetrieval(
        query_vector: string,
        k: Integer,
        obj2: ObjectSet<Object2>
    ): Promise<Object2[]> {
        return results
            .nearestNeighbors((obj2) => obj2.embeddings.near(query_vector, kValue: k))
            .orderByRelevance()
            .take(k);
    }

This code returns an array of k objects of Object2, but can be refactored to return specific properties from or aggregations on these objects.

If you want to use this to create a mapping table, it might be computationally expensive depending on your scale and use case: a single added object could theoretically change the entire mapping, so you would need to re-run the transform every time an object is added to Object1 or Object2.

If the scale is small or the updates are very infrequent (and the cost is worth it), you can do a simple transform in Pipeline Builder (look up KNN-join for a simple example).
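
For intuition, here is roughly what a KNN join computes, as a minimal brute-force sketch in plain TypeScript. This is not Pipeline Builder or Foundry API code; the Embedded shape, the function names, and the toy vectors are all made up for illustration.

```typescript
// Hypothetical record shape; field names are illustrative, not Foundry API names.
interface Embedded {
  id: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// For each left record, return the ids of the k most similar right records.
// Brute force: |left| * |right| similarity computations.
function knnJoin(left: Embedded[], right: Embedded[], k: number): { [id: string]: string[] } {
  const result: { [id: string]: string[] } = {};
  for (const l of left) {
    result[l.id] = right
      .map((r) => ({ id: r.id, score: cosineSimilarity(l.embedding, r.embedding) }))
      .sort((x, y) => y.score - x.score) // highest similarity first
      .slice(0, k)
      .map((x) => x.id);
  }
  return result;
}

// Toy data: two customers, three offers, two-dimensional embeddings.
const customers: Embedded[] = [
  { id: "c1", embedding: [1, 0] },
  { id: "c2", embedding: [0, 1] },
];
const offers: Embedded[] = [
  { id: "o1", embedding: [1, 0.1] },
  { id: "o2", embedding: [0.1, 1] },
  { id: "o3", embedding: [1, 1] },
];
const mapping = knnJoin(customers, offers, 2);
```

Every left-hand record is compared against every right-hand record, which is why the cost grows with the product of the two table sizes and why re-running it on every change gets expensive.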

Let me know if any of this works for your use case. If not, feel free to explain it further.

3 Likes

Thanks for your response! Yes, I need to store these selections permanently. To clarify:

  • This is for a customer-offer matching system where id1 = customers (~2M+) and id2 = offers
  • The top 2 semantically similar offers need to be assigned to each customer and stored for future actions like email, sms, sales rep interactions etc.
  • These assignments will be accessed by sales reps when they pull up customer profiles in their applications
  • We need this to scale to 2M+ customers efficiently and frequently based on customer actions.

I am new to Foundry. Any guidance/suggestions will be greatly appreciated.
Let me try your recommendations and get back to you with more questions.
Appreciate your help.

1 Like

Exciting problem!

There’s one major design decision here that I think is worth spending a few cycles on: whether you actually need to store these matches, or whether it’s better to calculate them on the fly.

Points to consider:

  • Will these offers change a lot? If so, storing this information will require rerunning the matching pipeline over potentially all customers.
  • Will all customer profiles be looked at by a sales rep, or receive an email or an sms, at the same time? If not, then the stored matches need to be up to date at each of those points in time.
  • A scenario where <5% of your data is used, but the remaining >95% is still updated can be computationally expensive and wasteful.
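
To put rough numbers on that last point, here is a back-of-the-envelope sketch. The 2M customers come from this thread; the offer count and the 5% figure are assumptions picked purely for illustration, and the count is for brute-force comparisons only (an indexed nearest-neighbour search costs less per lookup, but the ratio between the two approaches stays the same).

```typescript
// Number of vector comparisons when every query is scored against every candidate.
function bruteForceComparisons(queries: number, candidates: number): number {
  return queries * candidates;
}

const customerCount = 2000000; // from the thread
const offerCount = 1000;       // assumption: not stated in the thread
const viewedPercent = 5;       // assumption: share of customers actually looked at

// Stored mapping: every customer is re-matched on each refresh.
const storedRefresh = bruteForceComparisons(customerCount, offerCount);

// On the fly: only customers that are actually opened are matched.
const viewedCustomers = (customerCount * viewedPercent) / 100;
const onDemand = bruteForceComparisons(viewedCustomers, offerCount);

// How many times more work the stored mapping does per refresh.
const wasteFactor = storedRefresh / onDemand;
```

Under these assumptions the stored mapping does 20 times the work per refresh, and that multiplies again with every refresh that happens between views.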

I will stress that you can build something meaningful, even if you are not an experienced Foundry user!

I would suggest starting with a version of the code I posted previously and adapting it to your object property names. I’ve added an example below; the only changes are setting the scope to public (so it’s available to your applications) and adjusting property names to make it easier to read.

You pass it an Object1, the number of results you want (2), and the ObjectSet of Object2, and it will return the two semantically nearest offers to that customer object’s embeddings. You can also hardcode Object2 into the function, but passing it as a parameter allows you to do some filtering beforehand and pass a subset.

If you’re building this in Workshop, you can use any single object output from a widget (e.g. Object List, Object Table, etc.) as the input, and then get it to return the two offers as its output, which you can then display in the detailed view.

Are you building a proof-of-concept right now, or are you working on the production application?

    @Function()
    public async vectorCosineRetrieval(
        obj1: Object1,
        k: Integer,
        obj2: ObjectSet<Object2>
    ): Promise<Object2[]> {
        const id1Embeddings = obj1.embedding1;
        return results
            .nearestNeighbors((obj2) => obj2.embedding2.near(id1Embeddings, kValue: k))
            .orderByRelevance()
            .take(k);
    }

2 Likes

Working on a POC. Hope to convince enough Executives so that eventually it will lead to a real boot camp. Thanks.

1 Like

Hey Mathewanand,

Very cool use case, I hope you get to building this out at your organization.
Have you tried mapping this out in AIP’s Solution Designer? The AIP Architect can give you and executives a single pane of glass to view your process and a great way to collaborate on an Ontology.

best of luck,
Maverick

1 Like

Loved reading through this thread! Mighty fist bump 🤜🤛 to @Bellerophon for sharing your exciting challenge (it immediately sparked my curiosity to hear more, so please keep us updated) and 🤜🤛 to @jakehop for your excellent ideas and support.

2 Likes

bookmarked it! hope we get an update in a few weeks..

1 Like

Super cool use case @Bellerophon !

I would argue that the dynamic calculation might be the best solution for your use case, rather than the mapping table.

Here’s a quick step-by-step to get started:

(Step 0: Create a TypeScript code repository, import your two object types, and tailor the code above to return two offer objects. Give your function a more descriptive name (e.g. matchSimilarOffersToCustomer) and make sure you tag and release it. The tutorials and docs explain this well. Also make sure that you have created embeddings for all of the objects; use Pipeline Builder for that, and let me suggest Text Embeddings 3 Small as the model. Don’t start with too much data, as your dev work could get expensive.)

  1. Create your Workshop application and import both object types.
  2. Add an Object List widget and input your customer objects. This list will output a subset that contains the selected object. Remove object auto-selection for the next part to work. Select one customer object and proceed to the next step.
  3. Add a new Object Set variable, and use the function you created. As its input, pick the selected object variable that the list above outputs and set k=2. If you didn’t hardcode the offer object type in the function, create a new Object Set variable containing all objects of the Offer object type, and use that as the last function input. Make sure the value is set to automatically recalculate.
  4. Create an overlay in Workshop. Add an Object Property widget at the top with the selected Customer object as the input and, as a proof of concept, add an Object Table below it that uses the Object Set you just created from the function as its input. You should now see two (or whatever your k-value is set to) offer objects in that table.
  5. Close the overlay, click on the Object List again, and where it says “On Object Selection”, add an event and pick “Open [whatever you called your overlay]”.
  6. Now, when you click on a customer in the list, it should open the overlay, showing the customer details, and the object table below will show the two offer-objects with the nearest embeddings to the customer’s embeddings.

This is very rudimentary, but it will be a good way for you to see how things are connected in Foundry. Understanding the whole “lots of objects => drill down => find links / related objects => show to user”-workflow is helpful, as it’s a very common pattern.

With the approach above, you will always have the freshest data, and you can tell the Executives that it literally calculates this on the fly, rather than having a set update schedule.

Embeddings for new Customer/Offer objects can be automated as part of the ingestion pipeline, which only requires calculating the embedding for that specific entry, rather than recomputing everything as would be the case with a mapping table.
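
The incremental idea can be sketched in a few lines of plain TypeScript. This is only an illustration of the pattern, not how a Foundry pipeline is configured: the TextRecord and EmbeddingStore types, the embedNewRecords helper, and the fake embedding function are all hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface TextRecord {
  id: string;
  text: string;
}
type EmbeddingStore = { [id: string]: number[] };

// Embed only records whose id is not in the store yet; returns the ids that were added.
// Note: a record whose text changed but whose id already exists is NOT re-embedded here.
function embedNewRecords(
  store: EmbeddingStore,
  incoming: TextRecord[],
  embed: (text: string) => number[]
): string[] {
  const added: string[] = [];
  for (const record of incoming) {
    if (!Object.prototype.hasOwnProperty.call(store, record.id)) {
      store[record.id] = embed(record.text);
      added.push(record.id);
    }
  }
  return added;
}

// Stand-in for a real embedding model call; it just counts how often it is invoked.
let embedCalls = 0;
const fakeEmbed = (text: string): number[] => {
  embedCalls += 1;
  return [text.length, 1];
};

const embeddingStore: EmbeddingStore = { c1: [0.2, 0.8] };
const newlyEmbedded = embedNewRecords(
  embeddingStore,
  [
    { id: "c1", text: "already embedded" },
    { id: "c2", text: "new customer" },
    { id: "c3", text: "another new customer" },
  ],
  fakeEmbed
);
```

Only the two unseen records trigger an embedding call; the existing vector for c1 is left untouched, which is the saving compared with recomputing a full mapping table.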

Lastly, for larger comparisons (e.g. for generating email/sms pairs of customers and offers), you can do that in a Pipeline, where the output can be used to feed an email service with customer information plus the most relevant offers. That way your company only pays for the work that’s needed to generate value.

Hope this helps, and let us know if you have more questions.

2 Likes

Hi Jakehop, I appreciate all the detailed explanation. I am unable to get this TypeScript working. Could you help me a bit more? Here is what I have done.

Created the datasets and Ontology Objects.

Imported Objects and started your code.

I am getting the following compilation errors.

src/index.ts(12,15): error TS2552: Cannot find name 'ObjectSet'. Did you mean 'Object'?

src/index.ts(13,16): error TS2552: Cannot find name 'Object2'. Did you mean 'Object'?

src/index.ts(14,36): error TS2551: Property 'Desired_Customer_Action_Embedding' does not exist on type 'ToyCustomer'. Did you mean 'desiredCustomerActionEmbedding'?

src/index.ts(15,16): error TS2304: Cannot find name 'results'.

src/index.ts(16,28): error TS7006: Parameter 'obj2' implicitly has an 'any' type.

src/index.ts(16,83): error TS2304: Cannot find name 'kValue'.

src/index.ts(16,89): error TS1005: ',' expected.

src/index.ts(21,1): error TS1005: '}' expected.

I really appreciate your help.

Thanks

1 Like

Hi @Bellerophon,

You need to customise the code so that it fetches the right objects and properties on your stack.

From your screenshots, it looks like you need to do the following:

  1. Import ObjectSet from @foundry/ontology-api.
  2. Change the Promise type in line 13. An async function returns a Promise, and the type inside the angle brackets declares what that Promise resolves to. We are returning an array of ToyOffer objects, so change Promise<Object2[]> to Promise<ToyOffer[]>.
  3. The property you are calling in line 14 does not have the right name. There are usually two names for a property: what it is called in the dataset (which is what you’ve entered) and what it is called in the API. The latter is usually in camelCase, and according to the compiler’s suggestion it is desiredCustomerActionEmbedding. You can see the API names in the Ontology Manager, and Code Repositories should also suggest them when you type the variable name (obj1) followed by a dot.
  4. Line 16 has the same issue as above: the property name needs to be the correct API name.
  5. I made a mistake (oops!) in line 16: it shouldn’t say “kValue: k”, but rather “{ kValue: k }” (without the quotation marks).

Let me know if this fixes it.

1 Like

Thanks Jakehop.
It helped a lot. Still facing some compilation challenges.

I am getting these errors:
src/index.ts(15,16): error TS2304: Cannot find name 'results'.
src/index.ts(16,28): error TS7006: Parameter 'obj2' implicitly has an 'any' type.

Could you please take a look?
Thanks.

Ah, that one’s easy: if you replace results with obj2, it should work.
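
With all of the fixes from this thread applied, the function would look roughly like the sketch below. Treat it as a starting point rather than verified code: ToyCustomer, ToyOffer and desiredCustomerActionEmbedding are taken from your error messages, but the class name OfferMatching and the offerEmbedding property are placeholders I made up (swap in the real API name of the vector property on your offer object type), and the undefined check assumes the embedding property can be empty on some customers.

```typescript
import { Function, Integer } from "@foundry/functions-api";
import { ObjectSet, ToyCustomer, ToyOffer } from "@foundry/ontology-api";

// Class name is a placeholder.
export class OfferMatching {
    @Function()
    public async vectorCosineRetrieval(
        customer: ToyCustomer,
        k: Integer,
        offers: ObjectSet<ToyOffer>
    ): Promise<ToyOffer[]> {
        const customerEmbedding = customer.desiredCustomerActionEmbedding;
        if (customerEmbedding === undefined) {
            // Assumption: customers without an embedding simply get no matches.
            return [];
        }
        return offers
            // `offerEmbedding` is a placeholder for the API name of the vector property on ToyOffer.
            .nearestNeighbors((offer) => offer.offerEmbedding.near(customerEmbedding, { kValue: k }))
            .orderByRelevance()
            .take(k);
    }
}
```

The search now runs on the ObjectSet that is passed in (previously the undefined name results), and kValue is wrapped in an options object.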

Thanks Jakehop. Working on it.

1 Like