Replicating a RAG CoPilot for use with Customer Service Engine

Hey folks

I work in Service Operations for a medical device company and have been playing around in Azure with GPT-3.5/4/4o and RAG, building CoPilots using our product manuals.

We’ve been seeing approx. 70% response accuracy and are working to improve that. We count a response as accurate if it is technically correct and would need at most formatting changes before being sent out.

Some colleagues and I attended the AIP Expo in Palantir’s London office in June and saw the Customer Service Engine which was impressive.

I then engaged them for a Bootcamp to see if CSE would be something we could use in our service desk.

We wanted to demonstrate the automation capabilities of ingesting and responding to customer cases from our CRM. Essentially, the idea is to replicate the RAG-based CoPilots we’re already trialing and inject the technical responses into the CSE-generated emails back to customers.

I built out a content pipeline using one of the same content sets as the Azure CoPilots (approx. 7,000 pages), but the system was returning far too many chunks, and the responses were either unusable or failed outright because there were too many chunks to summarise. I then stripped it back to one document and used the table of contents to index the content more accurately, which improved things, but it’s still nowhere near as good as the RAG-based CoPilots.

Can anyone recommend strategies to more closely replicate a RAG model in Foundry/AIP with Pipeline Builder? Is this even the best way to do it?

Thanks

Sam

Hey Sam,

Glad to hear you’re exploring building this in AIP, and happy to help! I’m going to start from the top and try to make this answer as comprehensive as possible. I suspect some of the info at the beginning will be familiar to you as an attendee of a couple of past AIP events, but I want to make sure this helps anyone else who finds the thread.

First, I’ll call out that you’re right in exploring RAG. It’s almost certainly the right way to solve this problem. I’ll start by sharing how to implement RAG in AIP, and then we can explore some techniques to make it better.

Implementing RAG in AIP

In Foundry/AIP, we ship a product called “Build with AIP”. BwAIP is a library of reference examples for building common workflows in the platform. We have a reference example called “Semantic Search with Palantir-Provided Models”, which also includes a RAG example. (I think we could clean up the name of this example to make that more obvious.) You can find the public-facing version of this example here. Even better, you can find it in your own AIP instance: log in, open the application search in the sidebar and type “Build with AIP”. Once the BwAIP app loads, there is a search bar at the top: type “Semantic Search”. Click into the reference example, hit install and wait ~5 minutes for installation to complete.

Installing the reference example deploys an end-to-end working example with Pipeline Builder, Ontology Objects and a Workshop App. Inside the Workshop App, you can ask a question, see semantic search retrieve a set of 3 chunks from documents, and then, below that, see a generated summarization for a full RAG implementation. The Workshop App effectively serves two purposes: (1) as a guide for how to build your own semantic search or RAG implementation, and (2) as a UI to visualize the process. An important point to remember on (2) is that virtually any workflow you set up using Workshop can also be set up outside of Foundry/AIP using our Python, TypeScript and Java OSDKs.

The reference example is fully editable, so you can swap in your own media set full of documents, regenerate the embeddings by running the pipeline, and ask whatever questions those documents can answer.

Enhancing RAG in AIP

There are broadly two primary focus areas for enhancing a RAG pipeline: Chunking and Retrieval.

Chunking

You’re already on the right path with using the table of contents to improve how you’re indexing data. Generally, the goal of chunking should be to maximize the semantic context of each individual chunk. The table of contents is one way to do this via a guaranteed hierarchy. Other techniques you could try in the same vein are to chunk based on paragraphs and font size. Document authors tend to use features like these to segment their thinking (kind of like how I’ve segmented this document…). If it helps provide humans context, it probably helps provide machines context too.
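As a minimal, non-Foundry-specific sketch of what paragraph-based chunking looks like (the size budget and the blank-line splitting rule are assumptions you would tune for your own manuals):

```python
# Minimal sketch of paragraph-based chunking (plain Python, not Foundry-specific).
# Splits on blank lines, then packs whole paragraphs into chunks under a size
# budget so a chunk never cuts a paragraph mid-thought.

def chunk_by_paragraph(text: str, max_chars: int = 2000) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```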

Second, chunking often causes context to be lost. As an example, refer back to the three paragraphs I wrote in the “Implementing RAG in AIP” section. In the first, I referred to the reference example as “Semantic Search with Palantir-Provided Models”. In the subsequent paragraphs, I did not use this title again; instead, I referred to it just as “the reference example”. If you were to chunk this document by paragraph, you would lose a lot of semantic context in those second two paragraphs. Pre-processing the data with a coreference resolution model is likely a good way to improve the semantic content of chunks. This is one of the places where building in AIP becomes really powerful. Using the Foundry Modeling Suite, which comes out of the box for both AIP and Foundry deployments, you can build traditional models and integrate those models into your pipeline. Check out the docs on the modeling suite broadly here, and the batch deployment docs for how to integrate a model into your pipeline here.
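Here’s a rough sketch of where that pre-processing step would sit. `resolve_coreferences` is a hypothetical stand-in for whichever coreference model you deploy via the Modeling Suite; the actual call will depend on the model you pick:

```python
# Sketch of coreference-resolution pre-processing before chunking.
# `resolve_coreferences` is a hypothetical placeholder for a deployed coref model;
# the real API depends on the model you choose.

def resolve_coreferences(text: str) -> str:
    # Hypothetical: replace pronouns/aliases with their referents, e.g.
    # "the reference example" -> "Semantic Search with Palantir-Provided Models".
    raise NotImplementedError("swap in your deployed coref model here")

def preprocess_then_chunk(text: str, chunker) -> list[str]:
    resolved = resolve_coreferences(text)  # restore referents before splitting
    return chunker(resolved)               # e.g. the paragraph chunker sketched earlier
```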

Retrieval

For RAG pipelines, retrieval can be generically interpreted as “what set of chunks most likely contains the answer to my question?” In many traditional RAG workflows, the actual implementation of this generic question is semantic search. In the reference example, we perform semantic search across all chunks using a K-Nearest Neighbors (KNN) algorithm. Depending on the use case, this is sometimes sufficient, but usually not. The power of the Ontology really becomes apparent when we combine semantic search with other types of search.
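To make “semantic search via KNN” concrete, here is an illustrative cosine-similarity retrieval in plain Python/numpy. The reference example does the equivalent for you inside the Ontology, so treat this as a mental model rather than the platform implementation:

```python
# Illustrative k-nearest-neighbour retrieval over chunk embeddings using
# cosine similarity.
import numpy as np

def knn_retrieve(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int = 3) -> list[int]:
    # Normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)[:k].tolist()  # indices of the k most similar chunks
```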

Ontology/Knowledge Graph Search

I’ll use your specific example and assume you have an ontology that includes at least four objects: Customers, Sales, Devices and Manuals. A few more assumptions:

  1. The content sets are manuals for devices
  2. Customer support requests can be solved by reading the manuals, but nobody reads the manuals because they are way too long and technical.
  3. With the Sales object, we know which customers have purchased which devices
  4. Manuals (PDF form or similar) are linked to Devices

Using the ontology and its built-in semantic graph, we can pre-filter the documents we search for answers in. This is effectively layering knowledge-graph search on top of semantic search. By traversing the links between customer ↔ sale ↔ device ↔ manual, we can filter down to only the documents corresponding to devices that were actually purchased by the customer filing the support request. This alone should dramatically reduce the number of hallucinations. The same logic can be applied to past support requests: what tickets has this customer previously filed? What problems have we historically encountered with the set of devices they have purchased?
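A sketch of that pre-filtering step, using the four assumed objects above. In AIP you would express this as Ontology link traversals (or a deterministic filter in AIP Logic) rather than plain Python dicts; the field names here are assumptions for illustration:

```python
# Sketch of knowledge-graph pre-filtering before semantic search:
# customer -> sales -> devices -> manuals, mirroring the assumptions above.

def manuals_for_customer(customer_id: str,
                         sales: list[dict],          # each: {"customer_id", "device_id"}
                         manuals: list[dict]) -> list[dict]:  # each: {"device_id", "manual_id", ...}
    purchased_devices = {s["device_id"] for s in sales if s["customer_id"] == customer_id}
    return [m for m in manuals if m["device_id"] in purchased_devices]

# Semantic search then runs only over chunks belonging to these manuals,
# instead of across every manual in the content set.
```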

Keyword Search

The Ontology comes with keyword search out of the box. In some cases, augmenting the semantic search results with plain old keyword search is a powerful tool. For example, if a customer types their specific device name in their support request, you could search for all documents that contain that keyword. Of course, this loses the semantic meaning of their search, so it should be used to augment, rather than replace, the semantic search. There are several ways to combine the search algorithms, but the most popular tend to be reranking algorithms, where a document’s or chunk’s ranking is based on how high it scores on the keyword search (# of occurrences) and how high it scores on the KNN similarity. Alternatively, keywords can be used to pre-filter search results. What works best typically depends on the data asset and requires a bit of tinkering to find the right match.
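For illustration, a toy version of such a reranker; the normalisation and the 0.3/0.7 weight split are arbitrary assumptions you would tune against your own data:

```python
# Toy hybrid re-ranking: combine a keyword score (# of occurrences) with the
# semantic similarity score from KNN, then sort chunks by the blended score.

def hybrid_rank(chunks: list[str], keyword: str, knn_scores: list[float],
                kw_weight: float = 0.3, knn_weight: float = 0.7) -> list[int]:
    kw_counts = [chunk.lower().count(keyword.lower()) for chunk in chunks]
    max_kw = max(kw_counts) or 1  # avoid division by zero when the keyword never appears
    combined = [
        kw_weight * (count / max_kw) + knn_weight * sim
        for count, sim in zip(kw_counts, knn_scores)
    ]
    return sorted(range(len(chunks)), key=lambda i: -combined[i])  # best chunk first
```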

So How to Do it in AIP?

The answer to this is usually going to be AIP Logic. Logic is our tool for writing no-code functions against the ontology. These can be purely deterministic functions like filters and unions, or they can leverage AI capabilities like semantic search with LLMs. Your best bet for learning AIP Logic is to deploy more of the Build with AIP reference examples. I’d suggest Building your AIP intuition: AI assisted cricket and Leveraging feedback loops in AIP Logic to get started.

In AI Assisted Cricket, you’ll learn how to use Tools in Logic to query the Ontology instead of relying on context in the prompt. This can be directly translated to how you identify which manuals/documents to pass into the semantic search logic board.

In Leveraging feedback loops, you’ll learn how to incorporate outcomes back into the AIP Logic function. This could be translated to a customer satisfaction metric (or a CS representative’s approval/disapproval) for the generated answer.


Hi Sam!

George’s answer is indeed comprehensive and offers a lot of valuable insights. I’d like to expand on a few points, particularly regarding the implementation of the Customer Service Engine (CSE) and how you can embed this functionality to replicate a RAG-based model. At a high level, the pipeline involves four steps:

  • Chunking: Divide your documents into manageable chunks.
  • Embedding: Convert the content of each chunk into a numerical representation (vector).
  • Indexing: Store these embeddings into the Ontology for efficient retrieval.
  • Retrieval: Use Semantic Search to find relevant chunks to answer customer queries.

Chunking

This has already been covered well by George above.

Embedding

Once you’ve chunked your documentation, transform the text in each chunk into a numerical representation (vector). You can achieve this using the Pipeline Builder expression Text to Embedding.
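In Pipeline Builder the expression handles this for you, but as a rough mental model of the text → vector step, here is roughly what it amounts to, shown with the OpenAI Python client and the same text-embedding-ada-002 model used in the Object Type below (the client setup is an assumption purely for illustration):

```python
# Rough, non-Foundry illustration of the "text -> vector" step that the
# Text to Embedding expression performs in-platform.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    response = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    return [item.embedding for item in response.data]
```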

Indexing

With your chunks transformed into embeddings, create a new Object Type in the Ontology.

Here’s an example of what the [Customer Service] Doc Object Type might look like:

  • content: The text content of the chunk
  • content_embedding: The embedding of the text, defined as a Vector type with the following Embedding Model:
    • Language Modeling Service Model
    • OpenAI’s text-embedding-ada-002
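As a purely illustrative mental model of a single object of that type (not a Foundry API; the chunk_id property is an assumption, but a stable identifier like it is what you would typically want as the object’s primary key rather than the embedding):

```python
# Illustrative shape of a single [Customer Service] Doc object (not a Foundry API).
from dataclasses import dataclass

@dataclass
class CustomerServiceDocChunk:
    chunk_id: str                   # assumed stable identifier, e.g. "<manual_id>-<chunk_index>"
    content: str                    # the text content of the chunk
    content_embedding: list[float]  # ada-002 vector for the chunk text
```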

Retrieval

Indexing your documentation makes it ready for efficient retrieval. To enhance the generation of emails with relevant context, modify the Generate Response for Customer Service Alert AIP Logic file.

The retrieval block is the most critical operation in enhancing the generation of responses. As George mentioned, this block is responsible for fetching the most relevant chunks of documentation based on the customer’s query. By default, I set it to return the top 50 relevant chunks, but you can adjust this number depending on the capabilities of the LLM used to generate the answer.

Once you have the variable Most Relevant Documentation populated with these relevant chunks, you can pass it as an input to the reply_to_customer block. This integration ensures that the generated response is enriched with precise and contextually appropriate information, thereby improving the overall response accuracy.
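If it helps to picture the hand-off outside of Logic, here is a plain-Python sketch of passing the Most Relevant Documentation chunks into the reply step; the variable and block names simply mirror the ones above, and the prompt wording is an assumption:

```python
# Sketch of the hand-off: take the top-k retrieved chunks ("Most Relevant
# Documentation") and provide them as context to the reply-generation step.

def build_reply_prompt(customer_query: str, most_relevant_documentation: list[str]) -> str:
    context = "\n\n---\n\n".join(most_relevant_documentation)
    return (
        "You are drafting a reply to a customer support request.\n"
        f"Customer query:\n{customer_query}\n\n"
        f"Relevant documentation:\n{context}\n\n"
        "Answer using only the documentation above; say so if it is not covered."
    )
```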

Enhancing Retrieval with Ontology

The example provided is a basic version of semantic search, where embeddings are used to find the most relevant chunks of text. However, you can significantly improve the performance and relevance of the retrieved documents by leveraging the Ontology, as suggested by George. This can involve setting up deterministic filters in AIP Logic to be coupled with the AI semantic search capability presented above.

For example, if you have ontology objects for products, you can link the subset of docs related to each product in the ontology. Leveraging this sort of ontologized document structure while looking up information in your Logic function will make retrieval significantly more performant and accurate.

I hope this adds clarity and provides a structured approach to implementing and enhancing your Customer Service Engine. Feel free to reach out if you have any more questions or need further assistance!

Best, Jacopo

@george @jdisimone

Thanks for the replies so far. The pipeline I put together for the CSE build-out was essentially a 1:1 copy of the Semantic Search example.

I’ll do some more work on the pipeline and config based on your replies and update as I progress. I’m currently editing the Semantic Search demo with my entire document set and some different chunk sizes/overlaps to see if that provides some better responses.

One thing - my original reply keeps repeating in the thread, not sure why that is.

I delete it, then it gets reposted. Not sure if I’m triggering something to make that happen by deleting it!

So I’ve generated a new set of document chunks from the full document set and am trying to save the new chunks.

I’m getting an error about the primary key when I try to save the chunks; for some reason the embeddings are being selected as the primary key.

I understand the error, but why would the embeddings be the primary key?

The chunk ID would make more sense; however, I don’t seem to be able to change the primary key in the ontology object creation step or within the chunk object itself.

The Primary Key switch is greyed out in the ontology object creator, and I don’t see anything obvious.

I’m probably missing something obvious, can you point me in the right direction?

thanks!

Never mind, resolved it: you have to set another field as the primary key to remove it from the field that’s already acting as the key!