Best Approach to Match a Single Input Sentence Against Workshop Object Set Properties

Hi,

I’m trying to solve the following problem:
Given a single input sentence, I want to compare it against the properties of an object set retrieved from a workshop and return the top 5 most similar objects.

I’ve already tried embeddings, but the results were poor. I also tested using AIP Logic with the entire object type loaded into a tool — this gave decent results — but I couldn’t figure out a way to load the workshop’s dynamic object set into the tool, so that approach failed.

Has anyone implemented a method that works well for this type of sentence-to-object matching without relying on embeddings?
Any suggestions for robust alternatives or practical implementation tips would be appreciated.

Hi Jacob,

Why haven’t embeddings worked for you?

Imprecise results are sometimes due to the embeddings covering too much text, so one approach would be to change your chunking strategy, e.g. from whole documents or pages to a smaller unit.
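To make that concrete, here's a minimal chunking sketch. The splitting rules (sentence boundaries and newlines, ~200 characters per chunk) are assumptions you'd tune for your own data, not a recommendation of specific values:

```typescript
// Split a document into smaller chunks before embedding.
// Chunking at sentence/line level keeps each embedding focused
// on one item description instead of a whole page.
function chunkText(doc: string, maxChars = 200): string[] {
  // Simplistic split on sentence boundaries and newlines; adjust for your data.
  const sentences = doc
    .split(/(?<=[.!?])\s+|\n+/)
    .filter(s => s.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    // Start a new chunk once the current one would exceed the size budget.
    if (current && current.length + s.length + 1 > maxChars) {
      chunks.push(current);
      current = s;
    } else {
      current = current ? current + " " + s : s;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk then gets its own embedding, so a query about one item isn't diluted by the rest of the page.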

Alternatively, you could try a reranking function, or look into different search methods.

There’s a great project in the Build with AIP stack: Evaluate Retrieval-Augmented Generation (RAG) methods

You can use that to try and build an evaluation suite for your current use case – or you could just lift the method and try it out directly.

I’ve found that a combination of HyDE and augmented keyword search works well in a lot of cases.
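To illustrate the "augmented keyword" half of that combination (HyDE itself needs an LLM call to generate a hypothetical document first, which I've left out): expand the query with domain synonyms, then score candidates by token overlap. The synonym table and function names below are made up for the example, so build yours from your own product taxonomy:

```typescript
// Stand-in synonym table; a real one would come from your product taxonomy.
const SYNONYMS: Record<string, string[]> = {
  bar: ["rod"],
  dia: ["diameter"],
};

// Tokenize and expand the query with known synonyms.
function augmentQuery(query: string): string[] {
  const tokens = query.toLowerCase().split(/[\s;,*]+/).filter(Boolean);
  const augmented = new Set(tokens);
  for (const t of tokens) {
    for (const syn of SYNONYMS[t] ?? []) augmented.add(syn);
  }
  return [...augmented];
}

// Score a candidate by the fraction of augmented query tokens it contains.
function keywordScore(queryTokens: string[], candidate: string): number {
  const candTokens = new Set(
    candidate.toLowerCase().split(/[\s;,*]+/).filter(Boolean)
  );
  const hits = queryTokens.filter(t => candTokens.has(t)).length;
  return hits / queryTokens.length;
}
```

In practice you'd blend this keyword score with the embedding similarity (e.g. a weighted sum) rather than use either alone.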

Thank you for your answer, but I’m talking about search results rather than RAG.

I’ll try HyDE + augmented keyword search option as well!

A simple embedding search returns results like the ones below.

Search items: Round bar GB/T 699-2015 Grade 45 75*6000 mm

Return Result:

45 ELBOW; 4"; A234 GR.WPB; SMLS; BW; SCH40; HOT-DIP GALV.; N8576

Dia 450 - Elbow 22.5 Deg TB-TB GRE 10 Bar 83°C

45 ELBOW; 18"; A420 GR.WPL6; SMLS; BW; XS

45 ELBOW; 16"; A420 GR.WPL6; SMLS; BW; STD

Thanks for sharing this example!

I think your issue might be that embeddings (especially the common text-embedding models) aren’t a great fit for your use case, since it involves a lot of acronyms, shorthand, etc., whose semantic context is difficult to infer.

A few ideas on how to move forward:

  1. Test your AIP Logic function again. You can pass an input variable to the function that takes an object set of object type A, which you can then pass from Workshop along with the query string for your search.
  2. Try writing a filtering function, using fuzzy search. On the results, you can run orderByRelevance() and then take(5) to get the 5 most relevant results. This might be more suited to your data, so let me know how that works out for you. You can find a couple of code snippets in the Ontology API docs.
  3. Use the sample search project I recommended earlier, implement fuzzy search there, and combine it with semantic search to see if the results improve.
  4. If you are still not quite getting the results you’d want, there are more approaches to try, e.g. classifying your data, or using a cheap, small, fast LLM to pick out the most relevant items based on logic you provide in the prompt. That prompt should explain the logic behind the naming convention you use for your products, so the LLM can spot these patterns and apply them at scale.
  5. Once this prompt is sorted and the results are great, you can reuse the prompt + some example search and returns to have e.g. Claude Code build this into a ruleset, that can be used with some of the previous search paradigms, potentially skipping the LLM step.
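As a standalone illustration of the fuzzy-filter idea in step 2: the actual Ontology API calls look different, but the pattern is the same — score every candidate against the query, sort by relevance, take the top 5. The bigram similarity here is my own stand-in for the platform's relevance scoring, not its real algorithm:

```typescript
// Character-bigram set: tolerant of abbreviations, punctuation, and reordering.
function bigrams(s: string): Set<string> {
  const t = s.toLowerCase().replace(/\s+/g, " ");
  const grams = new Set<string>();
  for (let i = 0; i < t.length - 1; i++) grams.add(t.slice(i, i + 2));
  return grams;
}

// Dice coefficient over shared bigrams: 0 (no overlap) to 1 (identical).
function diceSimilarity(a: string, b: string): number {
  const ga = bigrams(a);
  const gb = bigrams(b);
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  for (const g of ga) if (gb.has(g)) shared++;
  return (2 * shared) / (ga.size + gb.size);
}

// Score, sort, and keep the top 5 -- the same shape as a fuzzy filter
// followed by orderByRelevance() and take(5).
function topFive(query: string, names: string[]): string[] {
  return names
    .map(n => ({ n, score: diceSimilarity(query, n) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map(x => x.n);
}
```

With the round-bar query from earlier, this kind of scoring would rank an actual round-bar item above the "45 ELBOW" hits that token-level embedding similarity pulled in.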

Let me know what works for you — and feel free to share more examples of the logic behind a ‘good result’ for a specific search term, and we can take a deep dive into this if none of these suggestions are hitting the spot.