Does AIP Agent Studio support Scanned document?

Hello There,

I was wondering if it’s possible in AIP Agent Studio to limit the number of input and output tokens. The main reason I want this is to reduce costs and to shorten the responses to include only the important information.

Additionally, I have another question: is it feasible to provide a document containing images as input to the AIP Agent? I tested this, but when I asked the LLM to answer questions based on a document whose content is stored as images, it couldn’t provide an answer because it couldn’t read information from the images.

Finally, does AIP Agent Studio support only documents in PDF format when you upload a document within Retrieve context? Can you please help me with this?

Hi @Mouhcin,

(1) Currently you cannot set input/output token limits in Agent Studio. We are tracking the feature request.

(2) For the “Document-context” mode of Agent Studio, we do text extraction and do not support extracting images within the document. This is also a feature request we are tracking.

As an alternative, if your document has images you would like passed to an LLM, you could build a pipeline that does the preprocessing you need (OCR, or extracting each page as an image) and then use a custom Function to do the semantic search. This can be used in Agent Studio’s new “Function-backed context” or via a Function tool.

(3) We natively support PDF, but if your Media Set has a secondary type associated with it, you can upload that type. For example, in Agent Studio, I can go to Document-context, Upload Documents, Create a new media set, and add a .docx file.

Hey Narmburst,

Thank you for the detailed response!

Regarding the second point, is there any example, tutorial, or guide available that can help me achieve this setup?
Any pointers, documentation, or best practices would be greatly appreciated.

Thanks!

Hi @Mouhcin,

A guide to parsing documents in Pipeline Builder exists here.

Our Function-backed Retrieval Context was just released, so the documentation is coming ASAP.

But to show some of that here so you can get started:

Writing a Context Retrieval Function (RAG Function)

  1. In a TypeScript code repo, import the AipAgentsContextRetrieval function interface.

  2. Import the types into your TS code
    a. The Get Started panel under the imported function interface is very helpful
    b. It should take a minute or so for the TypeScript server to start

  3. Write your function!

  4. Commit and tag a release for your function
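To make the steps above concrete, here is a minimal, self-contained sketch of what such a retrieval function might look like. The type names (`RetrievedDocument`, `ContextRetrievalResponse`) and the keyword scoring are illustrative stand-ins only; in a real repo the types come from the imported AipAgentsContextRetrieval function interface, and the scoring would be real semantic search:

```typescript
// Hypothetical sketch: the type names below are illustrative stand-ins,
// NOT the actual types from the AipAgentsContextRetrieval interface.
interface RetrievedDocument {
  title: string;
  content: string;
}

interface ContextRetrievalResponse {
  documents: RetrievedDocument[];
}

// Toy in-memory corpus standing in for your document chunks.
const CHUNKS: RetrievedDocument[] = [
  { title: "invoice-p1", content: "Total amount due: $420" },
  { title: "manual-p3", content: "Press the reset button for five seconds" },
];

// Naive keyword overlap standing in for real semantic search:
// counts how many query terms appear in the chunk text.
function score(query: string, doc: RetrievedDocument): number {
  const terms = query.toLowerCase().split(/\s+/);
  const text = doc.content.toLowerCase();
  return terms.filter((t) => text.includes(t)).length;
}

// Rank chunks by score and return the top-k as retrieved context.
export function retrieveContext(query: string, topK = 1): ContextRetrievalResponse {
  const ranked = [...CHUNKS]
    .map((d) => ({ d, s: score(query, d) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.d);
  return { documents: ranked };
}
```

The real function would swap the toy corpus and keyword scoring for an embedding-based search over your preprocessed documents.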

Giving your Agent a RAG Function

  1. In the Context tab, select Function-backed content

  2. Select your function

  3. Use your Agent; you should see your Function RAG content in the view reasoning panel and in the rendered prompt!

Level Up: Map Application Variables to Function Inputs
You may add optional string and object set inputs to your RAG function and fill those inputs with Agent Studio application variables!

  1. Add inputs to your function
    a. Below is a contrived example
    b. Note that in order to query the ontology, we have to use async and a Promise return type in our function signature

  2. Tag and release your function

  3. In Agent Studio, in the Function-backed context configuration, optionally map Application Variables to these inputs

  4. Use your Agent; you should see the current values of your mapping in the reasoning panel and you should see the output of your function in the raw prompt!
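Since the contrived example referenced in step 1 isn't reproduced here, the sketch below shows the shape it describes: a RAG function with an extra string input that Agent Studio could fill from an application variable, and an async/Promise signature as required when querying the ontology. Everything here (`region`, `fakeOntologyQuery`, the `Chunk` type) is a made-up stand-in, not a real SDK API:

```typescript
// Contrived, self-contained sketch: "region" is a hypothetical application
// variable mapped to a function input; fakeOntologyQuery stands in for a
// real Ontology SDK query in your Foundry code repo.
interface Chunk {
  id: string;
  region: string;
  content: string;
}

const ONTOLOGY: Chunk[] = [
  { id: "c1", region: "emea", content: "EMEA refund policy" },
  { id: "c2", region: "amer", content: "AMER refund policy" },
];

// Stand-in for an ontology object-set query, filtered by the input.
async function fakeOntologyQuery(region: string): Promise<Chunk[]> {
  return ONTOLOGY.filter((c) => c.region === region);
}

// Async + Promise return type, as required when the function queries the ontology.
export async function retrieveWithRegion(query: string, region: string): Promise<string[]> {
  const chunks = await fakeOntologyQuery(region);
  return chunks.map((c) => c.content);
}
```

Mapping an application variable to the `region` input would let the agent scope retrieval per session without changing the function.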

Hello @narmbrust
Thanks for your reply.

I was wondering how I can meet my requirements, as I am still new to Palantir Foundry. Could you please help me with an example using an image or scanned document? Based on what you showed me, could you implement a solution where an AIP Agent can answer user questions from that image or scanned document, using a pipeline with a custom Function imported into the Function-backed context?

Could you help me step by step? I really lack this information and couldn’t find it anywhere.

I would really appreciate it.

You can follow the tutorial for parsing documents I linked above, which is a step-by-step guide for doing this in Pipeline Builder.

The next step would be to chunk & embed those documents; this is a great example of how to do that in Pipeline Builder.
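As a concept illustration of the chunking step (the actual work happens in Pipeline Builder, not code), here is a minimal sketch of fixed-size chunking with overlap; the sizes are arbitrary, and real pipelines typically chunk by tokens rather than characters:

```typescript
// Illustrative only: split extracted text into fixed-size, overlapping
// character chunks before embedding. Overlap preserves context that
// would otherwise be cut at chunk boundaries.
export function chunkText(text: string, size = 20, overlap = 5): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap; // step forward, keeping `overlap` chars of context
  }
  return chunks;
}
```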

Next step would be to pull those embeddings into the Ontology via a new Object Type, Documents & Document Chunks.

Then in Agent Studio you can semantically search the Document Chunk object type by adding Ontology Context to the agent, see docs here. The agent will embed the user query with the same embedding model used in the Document Chunk object type and semantically search that object type to give context to the agent on every message sent.
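To illustrate what the semantic search over the Document Chunk object type is doing under the hood, here is a toy sketch of nearest-chunk lookup by cosine similarity. The 3-dimensional vectors are toys; real embeddings have hundreds of dimensions, and in practice the platform handles this for you:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (na * nb);
}

// Return the id of the chunk whose embedding is closest to the query embedding.
export function nearestChunk(
  query: number[],
  chunks: { id: string; vec: number[] }[]
): string {
  return chunks.reduce((best, c) =>
    cosine(query, c.vec) > cosine(query, best.vec) ? c : best
  ).id;
}
```

This is why the same embedding model must be used for both the user query and the Document Chunk object type: the similarity comparison is only meaningful within one embedding space.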

If a more granular hands-on walkthrough would help, I suggest reaching out to your Palantir representative!