I was wondering if it’s possible in AIP Agent Studio to limit the number of input and output tokens. The main reason I want this is to reduce costs and to shorten the responses to include only the important information.
Additionally, I have another question: is it feasible to provide a document containing images as input to an AIP Agent? I tested this, but when I asked the LLM to answer questions based on a document whose content is stored as images, it couldn’t provide an answer because it couldn’t read the information in the images.
Finally, does AIP Agent Studio support only documents in PDF format when you upload a document within Retrieve context? Can you please help me with this?
(1) Currently you cannot set input/output token limits in Agent Studio. We are tracking the feature request.
(2) For the “Document-context” mode of Agent Studio, we do text extraction and do not support extraction of images within the document. It is also a feature request we are tracking.
As an alternative, if your document contains images you would like to pass to an LLM, you could build a pipeline that does the preprocessing you need (e.g. OCR, or extracting each page as an image) and then use a custom Function to do the semantic search. This Function can be used in Agent Studio’s new “Function-backed context” or via a Function tool.
(3) We natively support PDF, but if your Media Set has a secondary type associated with it, you can upload that type as well. For example, in Agent Studio I can go to Document-context, Upload Documents, Create a new media set, and add a .docx file.
Regarding the second point, is there any example, tutorial, or guide available that can help me achieve this setup?
Any pointers, documentation, or best practices would be greatly appreciated.
Import the types into your TS code
a. The Get Started panel under the imported function interface is very helpful
b. It should take a minute or so to start the TypeScript server
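As a rough illustration of this import step (the object type name DocumentChunk is a hypothetical placeholder for whatever types you imported into your repository):

```typescript
// "DocumentChunk" is a placeholder for whichever Ontology object types
// you imported into the Functions repository.
import { Function } from "@foundry/functions-api";
import { DocumentChunk, ObjectSet } from "@foundry/ontology-api";
```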
Level Up: Map Application Variables to Function Inputs
You may add optional string and object set inputs to your RAG function and fill those inputs with Agent Studio application variables!
Add inputs to your function
a. Below is a contrived example
b. Note that in order to query the ontology, we have to use async and Promise on our function signature
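Since the contrived example itself isn’t reproduced here, the following is a minimal sketch of what such a function could look like. The DocumentChunk object type and its title/chunkText properties are hypothetical, and the exact query methods (e.g. allAsync) may differ slightly depending on your Functions API version:

```typescript
import { Function } from "@foundry/functions-api";
import { DocumentChunk, ObjectSet } from "@foundry/ontology-api";

export class RagFunctions {
    // Contrived example: a string input and an object set input, both of which
    // can be filled from Agent Studio application variables. Because we query
    // the Ontology, the method is async and its signature returns a Promise.
    @Function()
    public async retrieveChunks(
        documentTitle: string,
        chunks: ObjectSet<DocumentChunk>
    ): Promise<string> {
        // "title" and "chunkText" are hypothetical property API names;
        // replace them with the properties on your own object type.
        const matching = await chunks
            .filter(chunk => chunk.title.exactMatch(documentTitle))
            .allAsync();

        // Join the chunk text so the agent receives plain-text context.
        return matching.map(chunk => chunk.chunkText ?? "").join("\n---\n");
    }
}
```

Both inputs can then be filled with Agent Studio application variables, as described above.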
Use your Agent; you should see the current values of your mapping in the reasoning panel and you should see the output of your function in the raw prompt!
I was wondering how I can meet my requirements, as I am still new to Palantir Foundry. Could you please help me by providing an example that uses an image or scanned document? Based on what you showed me, can you implement a solution where an AIP Agent can answer user questions from that image or scanned document? Please implement the solution you suggested by building a pipeline with a custom Function and importing that Function into the Function-backed context.
Could you help me step-by-step? I really lack this information and couldn’t find it anywhere.
The next step would be to pull those embeddings into the Ontology via new Object Types: Documents and Document Chunks.
Then, in Agent Studio, you can semantically search the Document Chunk object type by adding Ontology Context to the agent (see the docs here). The agent will embed the user query with the same embedding model used in the Document Chunk object type and semantically search that object type to gather context on every message sent.
If a more granular, hands-on walkthrough would help, I suggest reaching out to your Palantir representative!