PDFs within AIP Logic Functions

ianferre · February 17, 2025, 12:25pm

I often load a PDF into a Mediaset and then need to parse it. I have to run OCR before passing the result to an LLM, which feels like an unnecessary step.

I would like to be able to use a PDF Mediaset directly in AIP Logic, just like I can with an image Mediaset.

aash · February 19, 2025, 6:56pm

Hi Ian, thanks for asking!

The way we currently have it set up in AIP Logic is meant to give users maximal flexibility in how PDF documents are passed to LLMs. LLMs generally don’t accept PDF documents directly; they accept only plain text or images.

There are a bunch of choices to make in how to pass the PDF to the LLM in different scenarios:

OCR to get plaintext? Or render as an image? Or both?
For plaintext, you might want structured text with markdown-style headers or HTML/XML structure.
How should any tables/images/figures/charts be passed?
For rendering as an image:
1. What image format to use? JPG, PNG, TIFF, etc. have different sweet spots.
2. What dots per inch (DPI) to use? PDF documents have metadata about DPI, but we’ve seen it be incorrect: too high or too low for how much detail is in the document.
3. How should the resulting image be downscaled? LLMs have a maximum image resolution they support.
4. Should the document be one image per page, or one image for each visual “thing” on the page? The latter requires running a model for image detection.
For OCR, there are different models with different configuration settings you might want to use. Some are optimized for certain languages/scripts, some do better with structured layouts vs top-to-bottom text, etc.
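
To make the DPI and downscaling questions above concrete, here is a minimal Python sketch of the arithmetic involved. It is not how AIP Logic does it; the function name `render_size` and the `max_side_px` limit are hypothetical, and the only fixed fact it relies on is that a PDF point is 1/72 inch by default.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def render_size(width_pt: float, height_pt: float, dpi: int, max_side_px: int) -> tuple[int, int]:
    """Pixel size of a page rendered at `dpi`, shrunk (never enlarged) so that
    the longer side does not exceed `max_side_px` (a hypothetical model limit)."""
    width_px = width_pt / POINTS_PER_INCH * dpi
    height_px = height_pt / POINTS_PER_INCH * dpi
    # Scale factor is capped at 1.0 so small pages are left at their rendered size.
    scale = min(1.0, max_side_px / max(width_px, height_px))
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# US Letter is 612 x 792 points (8.5 x 11 inches).
letter_150 = render_size(612, 792, dpi=150, max_side_px=2000)  # fits, no downscale
letter_300 = render_size(612, 792, dpi=300, max_side_px=2000)  # longer side clamped
```

In this sketch, rendering at a higher DPI than the limit allows just gets scaled back down, so the useful DPI depends on both the page size and whatever resolution cap the target model has.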

Hope that helps with why we have the extra step for how documents are passed into LLMs, as compared to images which can be fed directly in.

That said, we’re always looking for ways to make workflows cleaner and simpler. So if you have any ideas on how to make this better, please let us know!

For example, maybe some sort of “Format document for LLM” board with toggles/options for the above choices that gets inserted when referencing a document in the Use LLM board’s prompt field. I think we do something similar for formatting object sets.
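
As a rough illustration of what the toggles on such a board could capture, here is a Python sketch. Everything in it is hypothetical: `DocumentFormatOptions`, its fields, and its defaults are invented for illustration and are not an AIP Logic API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    TEXT = "text"    # OCR / text extraction only
    IMAGE = "image"  # render pages as images only
    BOTH = "both"    # pass text and images together


@dataclass(frozen=True)
class DocumentFormatOptions:
    """Hypothetical bundle of the choices listed earlier in this thread."""
    mode: Mode = Mode.TEXT
    text_structure: str = "markdown"  # e.g. "plain", "markdown", "html"
    ocr_language: str = "en"
    image_format: str = "png"         # e.g. "jpg", "png", "tiff"
    dpi: int = 150
    one_image_per_page: bool = True

    def steps(self) -> list[str]:
        """Describe the processing steps these options imply, in order."""
        steps = []
        if self.mode in (Mode.TEXT, Mode.BOTH):
            steps.append(f"ocr lang={self.ocr_language} structure={self.text_structure}")
        if self.mode in (Mode.IMAGE, Mode.BOTH):
            unit = "page" if self.one_image_per_page else "region"
            steps.append(f"render format={self.image_format} dpi={self.dpi} per={unit}")
        return steps


default_steps = DocumentFormatOptions().steps()
both_steps = DocumentFormatOptions(mode=Mode.BOTH, dpi=200).steps()
```

Keeping the options in one immutable object would let a sensible default cover the simple case while still exposing every choice for documents that need it.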

bkaplan · February 20, 2025, 3:24pm

One thing you can use is “convert document page to image”. This turns a PDF page into an image, which can then be passed to LLM vision models.
