I’m looking to build an AI agent using Palantir AIP to analyze PDFs, images, and documents—extracting insights, summarizing key data, and answering follow-up questions.
I’d appreciate any guidance on best practices, tools/frameworks within AIP, integration tips, or relevant documentation/examples to get started.
To provide an AIP Agent with document text as context, you can upload PDFs using the Document context retrieval option (documentation available here).
You can then give your Agent custom prompts describing how to analyze the document, and ask follow-up questions in the conversation.
For documents that exceed the context limit of your Agent's model, or to also extract images from your documents, you could build a pipeline that handles the extraction and stores the extracted document chunks in the Ontology.
The examples in the related question here, and the example for parsing PDFs with Pipeline Builder here, may be helpful starting points.
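For the pipeline approach above, the chunking step can be sketched in plain Python. This is a generic illustration, not a Foundry-specific API: the function names and chunk sizes are my own assumptions, and in practice you would run logic like this inside your pipeline before writing the chunks to an Ontology-backed dataset.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split extracted document text into overlapping chunks so that each
    chunk fits comfortably within the model's context limit.

    The overlap keeps sentences that straddle a boundary available in
    both neighboring chunks. Sizes here are illustrative, not prescriptive.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Example: a 2,500-character document split into overlapping chunks.
document_text = "x" * 2500
chunks = chunk_text(document_text)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk could then be stored as its own object (with a back-link to the source document), which is what lets the Agent retrieve only the relevant pieces at question-answering time rather than the whole file.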
I want the agent to allow users to upload PDFs dynamically, summarize the content, and answer any follow-up questions based on the same document. The goal is for users to interact with the document seamlessly without manually extracting information.
Currently, AIP only supports image input for users—there’s no direct option for PDFs or other document types. Is there a recommended approach for handling this?
One way to achieve this is to build a Workshop application that lets users upload PDF documents with the Media Uploader widget.
You could then trigger an Action to save the document to an object, and pass that object as an application variable to an AIP Agent embedded in your Workshop application via the AIP Interactive Widget.