Hi everyone,
I uploaded a Docker image containing the docling library to Palantir Foundry using the artifact feature. The image was pulled and uploaded successfully, and I can confirm that the library is present inside the container when I test it locally.
However, I’m not sure how to actually use or import this library in a Python Code Repository in Foundry. I just don’t know the correct approach or configuration to make Python recognize and use the library from the Docker artifact.
I really want to be able to import libraries in Palantir.
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
Check out the "Bring your own container" section:
https://palantir.com/docs/foundry/transforms-python/lightweight-examples/
I don't think you can use the Python library from the image directly, but you could call its CLI.
Looking at docling, why don't you add it as a PyPI dependency and use the container only to bring in the previously downloaded offline models?
See the section "Model prefetching and offline usage":
https://docling-project.github.io/docling/usage/
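To illustrate the suggestion above: a minimal sketch of pointing docling at prefetched models via `artifacts_path`, following the offline-usage section of the docling docs. The model path here is an assumption; adjust it to wherever your Foundry container or dataset exposes the downloaded models.

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Assumed path: wherever the prefetched docling models are mounted
# inside your container/environment (example value, not a given).
artifacts_path = "/models/docling"

# Tell the PDF pipeline to load models from the local path instead of
# downloading them at runtime, so no egress is needed at convert time.
pipeline_options = PdfPipelineOptions(artifacts_path=artifacts_path)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
```

With this configuration, `converter.convert(source)` should run without attempting any model downloads, which is what you need inside a locked-down Foundry environment.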
Thank you for your answer.
I’ve already tried installing Docling via PyPI in both the Code Repository and VS Code environments. Although the installation technically succeeded, I encountered unknown errors that prevented proper usage.
This question is specifically about the "Model prefetching and offline usage" option mentioned in the GitHub documentation.
The documentation recommends running the following command to prefetch models:
$ docling-tools models download
Downloading layout model...
Downloading tableformer model...
Downloading picture classifier model...
Downloading code formula model...
Downloading easyocr models...
Models downloaded into $HOME/.cache/docling/models.
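As a quick sanity check of where those files end up, here is a small stdlib helper that resolves the expected models directory. It mirrors the default cache location printed above; the `DOCLING_ARTIFACTS_PATH` override is taken from the docling docs, but treat the variable name as an assumption and verify it against your docling version.

```python
import os
from pathlib import Path


def docling_models_dir() -> Path:
    """Return the directory where docling model files are expected.

    Default is $HOME/.cache/docling/models, as printed by
    `docling-tools models download`. The DOCLING_ARTIFACTS_PATH
    environment variable (name assumed from the docling docs)
    overrides the default.
    """
    override = os.environ.get("DOCLING_ARTIFACTS_PATH")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "docling" / "models"


print(docling_models_dir())
```

Checking whether this directory exists and is populated after running the download command tells you whether the prefetch actually worked in a given environment.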
My question is: where exactly can I run this command in the context of Foundry? Is it possible to execute it from the VS Code terminal connected to the Code Repository, or does it need to be run in a different environment outside of Foundry?
If you need to stay within Foundry, you could run this in VS Code with two egress policies: one to your own stack and one to the hostname where the models are stored.
I'm not sure how your stack is set up, but at least in our case it's significantly easier to run the model download on a local machine and then upload the files to a dataset or package them as a Docker container.
Thank you for your reply.
Could you point me to the relevant documentation?
I've used SAM (an image recognition model) as a dataset before, but I've never used a Docker image.
Alternatively, to solve the library issue, would using local VS Code with the Palantir Foundry extension be one way to go?