Hello,
I have been working for a while on a versatile intelligent document processing (IDP) pipeline that ingests large volumes of unstructured documents and turns them into objects, and I would like to exchange ideas with others in the community who have similar use cases.
AIP tools are of course one option, but the processing cost can get high at this volume, so I will not go into that option below.
A lower-compute alternative to AIP would be to build on a project like Docling or on a small VLM (Qwen 2B, InternVL, SmolVLM).
My pipeline currently has two inputs:
- A batch part, where documents are ingested, rotated and clustered by similarity (purchase orders, goods receipts, invoices, etc.)
- A template declaration part, where the user declares the set of document types they want to ingest. This includes creating a JSON schema for each template and assigning the template to the right cluster.
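To make the template declaration part more concrete, here is a minimal sketch of how a template and its cluster assignment could be represented. All names (`Template`, `TemplateRegistry`, `cluster_id`, the invoice fields) are hypothetical and only for illustration, not what my pipeline actually uses.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """A user-declared document template (hypothetical structure)."""
    name: str
    cluster_id: int  # id assumed to come from the batch clustering step
    schema: dict     # JSON schema describing the fields to extract


class TemplateRegistry:
    """Maps each cluster to at most one template."""

    def __init__(self):
        self._by_cluster = {}

    def register(self, template):
        # Refuse to silently overwrite an existing assignment.
        if template.cluster_id in self._by_cluster:
            raise ValueError(
                f"cluster {template.cluster_id} already has a template"
            )
        self._by_cluster[template.cluster_id] = template

    def for_cluster(self, cluster_id):
        # Returns None when no template was declared for this cluster.
        return self._by_cluster.get(cluster_id)


# Illustrative schema; the field names are made up for the example.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

registry = TemplateRegistry()
registry.register(Template(name="invoice", cluster_id=2, schema=INVOICE_SCHEMA))
```

Documents in a cluster without a declared template get `None` back from `for_cluster`, so they can simply be skipped or queued for a later template.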
After that, the expected schema and the individual document are both given to the model, which performs key-value pair extraction / KIE (key information extraction). From my point of view, this gives a low-compute tool that is versatile across document types and produces consistent output that can be turned into objects.
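As a rough sketch of that step, the schema can be embedded in the prompt and the model's answer checked against it before anything becomes an object. The function names and the prompt wording are assumptions for illustration, the model call itself is left out, and the check below covers only required keys and basic types, not the full JSON Schema specification.

```python
import json

# Python types accepted for each JSON schema type (simplified subset).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def build_prompt(schema):
    """Build the text prompt sent to the VLM together with the page image."""
    return (
        "Extract the fields described by this JSON schema from the document. "
        "Answer with a single JSON object and nothing else.\n"
        + json.dumps(schema, indent=2)
    )


def parse_extraction(raw, schema):
    """Parse the model's raw answer and keep only schema-conformant fields.

    Raises ValueError on invalid JSON, missing required keys or wrong types.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("model output is not a JSON object")

    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    clean = {}
    for key, spec in schema.get("properties", {}).items():
        if key not in data or data[key] is None:
            continue  # optional field not found in the document
        value = data[key]
        json_type = spec.get("type")
        expected = _JSON_TYPES.get(json_type)
        if expected is not None:
            # bool is a subclass of int in Python, so reject it explicitly
            # unless the schema really asks for a boolean.
            is_stray_bool = isinstance(value, bool) and json_type != "boolean"
            if is_stray_bool or not isinstance(value, expected):
                raise ValueError(f"field {key!r} is not of type {json_type}")
        clean[key] = value  # keys outside the schema are dropped
    return clean
```

Rejecting a non-conformant answer at this point is what keeps the output consistent across document types: only dictionaries that match the declared template reach the object creation step.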
For now I am only using a small VLM (in a model asset), but Docling could be an interesting way forward thanks to its DocTags output format.
I would be glad to hear from anyone working on a similar idea.
Note: I could not find a suitable tag.