Extract Text from PDF in Pipeline Builder

frwu93 · July 18, 2024, 2:13pm

Is the PDF Text Extraction board backed by GPT? If so, does it also work with self-hosted models?

tiffany · July 18, 2024, 2:36pm

Hi, the PDF text extraction board is not backed by GPT. It is backed by text extraction models that specialize in reading and parsing PDFs. You cannot use self-hosted models for the PDF parsing itself. If you want to use a self-hosted model, you can import a UDF that uses the model directly into your pipeline.

frwu93 · July 18, 2024, 5:33pm

Thank you! Which model exactly does the board use?