Hey Sam,
Glad to hear you’re exploring building this in AIP, and happy to help! I’m going to start from the top and try to make this answer as comprehensive as possible. I suspect some of the info at the beginning will be familiar to you as an attendee of a couple of past AIP events, but I want to make sure this helps anyone else who finds the thread.
First, I’ll call out that you’re right to explore RAG. It’s almost certainly the right way to solve this problem. I’ll start by sharing how to implement RAG in AIP, and then we can explore some techniques to make it better.
Implementing RAG in AIP
In Foundry/AIP, we ship a product called “Build with AIP”. BwAIP is a library of reference examples for building common workflows in the platform. We have a reference example called “Semantic Search with Palantir-Provided Models”, which also includes a RAG example. I think we could clean up the name of this example to make this more obvious. You can find the public-facing version of this example here. Even better, you can find it in your own AIP instance. Log in, open the application search in the sidebar, and type “Build with AIP”. Once the BwAIP app loads, there is a search bar at the top: type “Semantic Search”. Click into the reference example, hit install, and wait ~5 minutes for installation to complete.
Installing the reference example deploys an end-to-end working example with Pipeline Builder, Ontology Objects, and a Workshop App. Inside the Workshop App, you can ask a question, see semantic search retrieve a set of 3 chunks from documents, and then see below a generated summarization for a full RAG implementation. The Workshop App effectively serves two purposes: (1) as a guide for how to build your own semantic search or RAG implementation, and (2) as a UI to visualize the process. An important point to remember on (2) is that virtually any workflow you set up using Workshop can also be set up outside of Foundry/AIP using our Python, TypeScript, and Java OSDKs.
The reference example is fully editable, so you can swap in your own media set full of documents, regenerate the embeddings by running the pipeline, and ask whatever questions those documents can answer.
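To make that flow concrete, here’s a minimal sketch of the same retrieve-then-summarize pattern outside the platform. It assumes the open-source sentence-transformers library for embeddings, and `call_llm` is a placeholder for whatever model endpoint you use (in AIP, a Palantir-provided model would fill that role):

```python
# Minimal RAG sketch: embed chunks, retrieve the top 3 by similarity,
# then ask an LLM to answer from those chunks. Purely illustrative; the
# reference example implements the same flow with platform primitives.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "To reset the device, hold the power button for 10 seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Firmware updates are installed automatically over Wi-Fi.",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q  # cosine similarity (vectors are normalized)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder: swap in your LLM endpoint")

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```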
Enhancing RAG in AIP
There are two primary focus areas for enhancing a RAG pipeline: chunking and retrieval.
Chunking
You’re already on the right path with using the table of contents to improve how you’re indexing data. Generally, the goal of chunking is to maximize the semantic context of each individual chunk. The table of contents is one way to do this via a guaranteed hierarchy. Other techniques you could try in the same domain are to chunk based on paragraphs and font size. Document authors tend to use features like these to segment their thinking (kind of like how I’ve segmented this document…). If it helps provide humans context, it probably helps provide machines context too.
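For illustration, here’s one simple way a paragraph-based chunker might look (plain Python, no platform dependencies; font-size-based chunking would additionally need a parser that exposes font metadata):

```python
# Illustrative paragraph-based chunker: split on blank lines, then merge
# short paragraphs so each chunk carries enough context on its own.
def chunk_by_paragraph(text: str, min_chars: int = 200) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        current = f"{current}\n\n{p}".strip() if current else p
        if len(current) >= min_chars:
            chunks.append(current)
            current = ""
    if current:  # don't drop a trailing short chunk
        chunks.append(current)
    return chunks
```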
Second, chunking often causes context to be lost. As an example, refer back to the three paragraphs I wrote in the “Implementing RAG in AIP” section. In the first, I referred to the reference example as “Semantic Search with Palantir-Provided Models”. In the subsequent paragraphs, I did not use this title again; instead, I referred to it just as “the reference example”. If you were to chunk this document by paragraph, you would lose a lot of semantic context in those latter two paragraphs. Pre-processing the data with a coreference resolution model is likely a good way to improve the semantic content of chunks. This is one of the places where building in AIP becomes really powerful. Using the Foundry Modeling Suite, which comes out of the box for both AIP and Foundry deployments, you can build traditional models and integrate those models into your pipeline. Check out the docs on the modeling suite broadly here, and then the batch deployment docs for how to integrate a model into your pipeline here.
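As a toy illustration of what that pre-processing step does: a real pipeline would use a trained coreference model deployed through the Modeling Suite, but the hand-written alias map below stands in for the model’s output.

```python
# Toy stand-in for a coreference model: replace aliases with antecedents
# before chunking, so each chunk is semantically self-contained.
ALIASES = {
    "the reference example": "the “Semantic Search with Palantir-Provided Models” example",
}

def resolve_coreferences(text: str) -> str:
    for alias, antecedent in ALIASES.items():
        text = text.replace(alias, antecedent)
    return text

chunk = "You can install the reference example from Build with AIP."
print(resolve_coreferences(chunk))
# -> You can install the “Semantic Search with Palantir-Provided Models”
#    example from Build with AIP.
```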
Retrieval
For RAG pipelines, retrieval can be generically interpreted as “What set of chunks most likely contains the answer to my question?” In many traditional RAG workflows, the actual implementation of this generic question is semantic search. In the reference example, we perform semantic search across all chunks using a K-Nearest Neighbors (KNN) algorithm. Depending on the use case, this is sometimes sufficient, but usually not. The power of the Ontology really becomes apparent when we combine semantic search with other types of search.
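If you want to experiment with KNN retrieval outside the platform, an off-the-shelf implementation like scikit-learn’s works the same way conceptually (illustrative only; the vectors here are random stand-ins for real embeddings):

```python
# KNN over chunk embeddings: find the 3 nearest chunks to the question.
import numpy as np
from sklearn.neighbors import NearestNeighbors

chunk_vectors = np.random.rand(100, 384)  # stand-in for real chunk embeddings
knn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(chunk_vectors)

question_vector = np.random.rand(1, 384)  # stand-in for the embedded question
distances, indices = knn.kneighbors(question_vector)
print(indices[0])  # positions of the 3 chunks most likely to hold the answer
```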
Ontology/Knowledge Graph Search
I’ll use your specific example and assume you have an ontology that includes at least four objects: Customers, Sales, Devices and Manuals. A few more assumptions:
- The content sets are manuals for devices
- Customer support requests can be solved by reading the manuals, but nobody reads the manuals because they are way too long and technical.
- With the Sales object, we know which customers have purchased which devices
- Manuals (PDF form or similar) are linked to Devices
Using the ontology and its built-in semantic graph, we can pre-filter the documents we search for answers. This effectively layers knowledge graph search on top of semantic search. By traversing the links between customer ↔ sale ↔ device ↔ manual, we can filter down to only the documents corresponding to devices that were actually purchased by the customer filing the support request. This alone should dramatically reduce hallucinations. The same logic can be applied to past support requests: what tickets has this customer previously filed? What problems have we historically encountered with the set of devices they have purchased?
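Here’s a sketch of that pre-filtering idea, with toy in-memory link tables standing in for Ontology links (the object and field names are made up for illustration):

```python
# Traverse customer -> sale -> device -> manual to scope the search space
# before running semantic search. In AIP, these would be Ontology links.
sales = [{"customer": "C1", "device": "D42"}, {"customer": "C2", "device": "D7"}]
manuals = {"D42": ["manual_d42.pdf"], "D7": ["manual_d7.pdf"]}

def manuals_for_customer(customer_id: str) -> set[str]:
    """Follow the link graph to the manuals this customer could need."""
    devices = {s["device"] for s in sales if s["customer"] == customer_id}
    return {m for d in devices for m in manuals.get(d, [])}

def scoped_candidates(customer_id: str, chunks: list[dict]) -> list[dict]:
    allowed = manuals_for_customer(customer_id)
    # Pre-filter: only chunks from manuals of devices this customer owns.
    # KNN semantic search then runs over this much smaller subset.
    return [c for c in chunks if c["source"] in allowed]
```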
Keyword Search
The Ontology comes with keyword search out of the box. In some cases, augmenting the semantic search results with plain old keyword search is a powerful tool. For example, if a customer types their specific device name in their support request, you could search for all documents that contain that keyword. Of course, this loses the semantic meaning of their search, so it should be used to augment, rather than replace, the semantic search. There are several ways to combine the search algorithms, but the most popular tend to be reranking algorithms, where a document’s or chunk’s ranking is based on how high it scores on the keyword search (# of occurrences) and how high it scores on the KNN similarity. Alternatively, keywords can be used to pre-filter search results. What works best typically depends on the data asset and requires a bit of tinkering to find the right match.
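As a sketch of the reranking idea: the weighting and normalization below are arbitrary knobs, and it assumes non-negative similarity scores; this is exactly the kind of thing that needs tinkering per data asset.

```python
# Rerank by a weighted sum of normalized keyword counts and KNN similarity.
def keyword_score(chunk: str, query: str) -> float:
    """Count occurrences of each query term in the chunk."""
    words = chunk.lower().split()
    return float(sum(words.count(t) for t in query.lower().split()))

def rerank(chunks: list[str], query: str, knn_scores: list[float],
           alpha: float = 0.5) -> list[tuple[str, float]]:
    kw = [keyword_score(c, query) for c in chunks]
    max_kw = max(kw) or 1.0      # avoid division by zero
    max_knn = max(knn_scores) or 1.0
    combined = [
        alpha * (k / max_kw) + (1 - alpha) * (s / max_knn)
        for k, s in zip(kw, knn_scores)
    ]
    return sorted(zip(chunks, combined), key=lambda p: p[1], reverse=True)
```

Shifting `alpha` toward 1 favors exact-term matches (good for device names and part numbers); shifting it toward 0 favors semantic similarity.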
So How to Do it in AIP?
The answer to this is usually going to be AIP Logic. Logic is our tool for writing no-code functions against the Ontology. These can be purely deterministic functions like filters and unions, or they can leverage AI capabilities like semantic search with LLMs. Your best bet for learning AIP Logic is to deploy more of the Build with AIP reference examples. I’d suggest “Building your AIP intuition: AI assisted cricket” and “Leveraging feedback loops in AIP Logic” to get started.
In “AI Assisted Cricket”, you’ll learn how to use Tools in Logic to query the Ontology instead of relying on context in the prompt. This translates directly to how you identify which manuals/documents to pass into the semantic search Logic board.
In “Leveraging feedback loops”, you’ll learn how to incorporate outcomes back into the AIP Logic function. This could be translated to a customer satisfaction metric (or a CS representative’s approval/disapproval) for the generated answer.