I’ve done the AIP tutorial, but I want to incorporate the structured visual data that’s also in the PDFs — things like graphs and tables of values. My issue is that “Use LLM” transforms cannot read the PDFs directly; the vision models can only read image media sets.
Is there a no-code way to convert the PDFs into images within Pipeline Builder, or is this something I have to code manually?
My plan:
1. Use an LLM to scan the PDF and identify the specific page(s) where structured visual data is located, outputting a nested array (or similar structure) to handle cases where the visual data spans multiple pages.
2. Apply hOCR to extract both the positional data and the text for each identified section.
3. Have the LLM generate a text summary of the hOCR-extracted data and a summary of the structured visual content, in order to create the relevant entities.
4. Embed the summarized information into a vector, similar to the approach the AIP tutorial takes for text data in PDFs.
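To make the nested-array idea concrete, here’s the kind of grouping logic I have in mind for the page-identification step (plain Python; the function name and output shape are my own illustration, not anything from AIP):

```python
def group_visual_pages(flagged_pages):
    """Group LLM-flagged page numbers into contiguous spans, so a chart
    that spans pages 4-6 becomes a single nested entry [4, 5, 6]."""
    spans = []
    for page in sorted(set(flagged_pages)):
        if spans and page == spans[-1][-1] + 1:
            # Consecutive page: extend the current span.
            spans[-1].append(page)
        else:
            # Gap in page numbers: start a new span.
            spans.append([page])
    return spans

# e.g. group_visual_pages([3, 5, 4, 9]) → [[3, 4, 5], [9]]
```

Each inner list would then be handed to the hOCR step as one multi-page visual section.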