Does AIP Agent Studio support Scanned document?

Hello There,

I was wondering if it’s possible in AIP Agent Studio to limit the number of input and output tokens. The main reason I want this is to reduce costs and to shorten the responses to include only the important information.

Additionally, I have another question: is it feasible to provide a document containing images as input to the AIP Agent? I tested this, but when I asked the LLM to answer questions based on a document whose content is stored as images, it couldn’t provide an answer because it couldn’t read information from the images.

Finally, does AIP Agent Studio support only documents in PDF format when you upload a document within Retrieve context? Can you please help me with this?

Hi @Mouhcin,

(1) Currently you cannot set input/output token limits in Agent Studio. We are tracking the feature request.

(2) For the “Document-context” mode of Agent Studio, we do text extraction and do not support extracting images within the document. This is also a feature request we are tracking.

As an alternative, if your document has images you would like passed to an LLM, you could build a pipeline that does the preprocessing you need (OCR, or extracting each page as an image) and then use a custom Function to do the semantic search. This can be used in Agent Studio’s new “Function-backed context” or via a Function tool.

(3) We natively support PDF, but if your Media Set has a secondary type associated with it, you can upload that type. For example, in Agent Studio, I can go to Document-context, Upload Documents, Create a new media set, and add a .docx file.

Hey Narmburst,

Thank you for the detailed response!

Regarding the second point, is there any example, tutorial, or guide available that can help me achieve this setup?
Any pointers, documentation, or best practices would be greatly appreciated.

Thanks!

Hi @Mouhcin,

A guide to parsing documents in Pipeline Builder exists here.

Our Function-backed Retrieval Context was just released, so the documentation is coming ASAP.

But to show some of that here so you can get started:

Writing a Context Retrieval Function (RAG Function)

  1. In a TypeScript code repo, import the AipAgentsContextRetrieval function interface.

  2. Import the types into your TS code
    a. The Get Started panel under the imported function interface is very helpful
    b. It should take a minute or so for the TypeScript server to start

  3. Write your function!

  4. Commit and tag a release for your function
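To make the steps above concrete, here is a minimal, self-contained sketch of what such a retrieval function might look like. The type names (`RetrievedDocument`, `ContextRetrievalResponse`) and the keyword scoring are illustrative stand-ins only; in a real repo the types come from the imported AipAgentsContextRetrieval function interface, and the scoring would be real semantic search:

```typescript
// Hypothetical sketch: the type names below are illustrative stand-ins,
// NOT the actual types from the AipAgentsContextRetrieval interface.
interface RetrievedDocument {
  title: string;
  content: string;
}

interface ContextRetrievalResponse {
  documents: RetrievedDocument[];
}

// Toy in-memory corpus standing in for your document chunks.
const CHUNKS: RetrievedDocument[] = [
  { title: "invoice-p1", content: "Total amount due: $420" },
  { title: "manual-p3", content: "Press the reset button for five seconds" },
];

// Naive keyword overlap standing in for real semantic search:
// counts how many query terms appear in the chunk text.
function score(query: string, doc: RetrievedDocument): number {
  const terms = query.toLowerCase().split(/\s+/);
  const text = doc.content.toLowerCase();
  return terms.filter((t) => text.includes(t)).length;
}

// Rank chunks by score and return the top-k as retrieved context.
export function retrieveContext(query: string, topK = 1): ContextRetrievalResponse {
  const ranked = [...CHUNKS]
    .map((d) => ({ d, s: score(query, d) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.d);
  return { documents: ranked };
}
```

The real function would swap the toy corpus and keyword scoring for an embedding-based search over your preprocessed documents.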

Giving your Agent a RAG Function

  1. In the Context tab, select Function-backed content

  2. Select your function

  3. Use your Agent; you should see your Function RAG content in the view reasoning panel and in the rendered prompt!

Level Up: Map Application Variables to Function Inputs
You may add optional string and object set inputs to your RAG function and fill those inputs with Agent Studio application variables!

  1. Add inputs to your function
    a. Below is a contrived example
    b. Note that in order to query the ontology, we have to use async and a Promise return type in our function signature

  2. Tag and release your function

  3. In Agent Studio, in the Function-backed context configuration, optionally map Application Variables to these inputs

  4. Use your Agent; you should see the current values of your mapping in the reasoning panel and you should see the output of your function in the raw prompt!
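Since the contrived example referenced in step 1 isn't reproduced here, the sketch below shows the shape it describes: a RAG function with an extra string input that Agent Studio could fill from an application variable, and an async/Promise signature as required when querying the ontology. Everything here (`region`, `fakeOntologyQuery`, the `Chunk` type) is a made-up stand-in, not a real SDK API:

```typescript
// Contrived, self-contained sketch: "region" is a hypothetical application
// variable mapped to a function input; fakeOntologyQuery stands in for a
// real Ontology SDK query in your Foundry code repo.
interface Chunk {
  id: string;
  region: string;
  content: string;
}

const ONTOLOGY: Chunk[] = [
  { id: "c1", region: "emea", content: "EMEA refund policy" },
  { id: "c2", region: "amer", content: "AMER refund policy" },
];

// Stand-in for an ontology object-set query, filtered by the input.
async function fakeOntologyQuery(region: string): Promise<Chunk[]> {
  return ONTOLOGY.filter((c) => c.region === region);
}

// Async + Promise return type, as required when the function queries the ontology.
export async function retrieveWithRegion(query: string, region: string): Promise<string[]> {
  const chunks = await fakeOntologyQuery(region);
  return chunks.map((c) => c.content);
}
```

Mapping an application variable to the `region` input would let the agent scope retrieval per session without changing the function.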

Hello @narmbrust
Thanks for your reply.

I was wondering how I can meet my requirements, as I am still new to Palantir Foundry. Could you please help me with an example using an image or scanned document? Based on what you showed me, could you implement a solution where an AIP Agent can answer user questions from that image or scanned document, using a pipeline with a custom Function imported into the Function-backed context?

Could you help me step by step? I really lack this information and couldn’t find it anywhere.

I would really appreciate it.

You can follow the tutorial for parsing documents I linked above, which is a step-by-step guide for doing this in Pipeline Builder.

The next step would be to chunk & embed those documents; this is a great example of how to do that in Pipeline Builder.
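As a concept illustration of the chunking step (the actual work happens in Pipeline Builder, not code), here is a minimal sketch of fixed-size chunking with overlap; the sizes are arbitrary, and real pipelines typically chunk by tokens rather than characters:

```typescript
// Illustrative only: split extracted text into fixed-size, overlapping
// character chunks before embedding. Overlap preserves context that
// would otherwise be cut at chunk boundaries.
export function chunkText(text: string, size = 20, overlap = 5): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```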

Next step would be to pull those embeddings into the Ontology via a new Object Type, Documents & Document Chunks.

Then in Agent Studio you can semantically search the Document Chunk object type by adding Ontology Context to the agent, see docs here. The agent will embed the user query with the same embedding model used in the Document Chunk object type and semantically search that object type to give context to the agent on every message sent.
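To illustrate what the semantic search over the Document Chunk object type is doing under the hood, here is a toy sketch of nearest-chunk lookup by cosine similarity. The 3-dimensional vectors are toys; real embeddings have hundreds of dimensions, and in practice the platform handles this for you:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (na * nb);
}

// Return the id of the chunk whose embedding is closest to the query embedding.
export function nearestChunk(
  query: number[],
  chunks: { id: string; vec: number[] }[]
): string {
  return chunks.reduce((best, c) =>
    cosine(query, c.vec) > cosine(query, best.vec) ? c : best
  ).id;
}
```

This is why the same embedding model must be used for both the user query and the Document Chunk object type: the similarity comparison is only meaningful within one embedding space.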

If a more granular hands-on walkthrough would help, I suggest reaching out to your Palantir representative!