Accessing PDF via URL in Foundry (works in browser, fails in webhook/pipeline)

Hi everyone,

I’m trying to retrieve a PDF from an internal endpoint:

Current behavior

  • Works in the browser (on VPN) and via a button in Workshop

  • Each row in the object table has its own PDF, which opens through a URL

  • Fails when called from Foundry (webhook/pipeline) → 400 / 503 errors

Goal

  • Fetch the PDF

  • Store it in Foundry (media set)

  • Extract text using pdfTextExtractionV1 / pdfOcrV1

What I tried

  • Webhook with OAuth2 → failed (auth / unreachable)

  • Pipeline ingestion → no HTTP/REST connector available

Question

Is this expected due to network/auth restrictions (browser vs backend)?
What’s the recommended way to ingest and process PDFs from such endpoints?

Hello, you might want to try using a third-party application registered in Developer Console and then calling the public API endpoints directly, in this order.
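
All of the endpoints below expect an OAuth2 bearer token. A third-party application created in Developer Console can typically obtain one with the client-credentials grant. A minimal sketch using only the Python standard library; the token path `/multipass/api/oauth2/token` is an assumption about your Foundry stack, and the hostname, client id, secret and optional scopes are placeholders for your own values.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# ASSUMPTION: Foundry's OAuth2 token endpoint path; confirm it for your stack.
TOKEN_PATH = "/multipass/api/oauth2/token"


def build_token_request(
    hostname: str, client_id: str, client_secret: str, scope: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:  # space-separated scopes, only if your application requires them
        form["scope"] = scope
    return urllib.request.Request(
        url=f"https://{hostname}{TOKEN_PATH}",
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(
    hostname: str, client_id: str, client_secret: str, scope: Optional[str] = None
) -> str:
    """Send the token request and return `access_token` from the JSON response."""
    req = build_token_request(hostname, client_id, client_secret, scope)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]
```

Keeping request construction separate from sending makes it easy to inspect the request before pointing it at a real stack.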

Add a media item to a media set using putMediaItem:

https://www.palantir.com/docs/foundry/api/v2/media-sets-v2-resources/media-sets/put-media-item/
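
Presumably this works where the webhook does not because the application runs somewhere that can reach the internal endpoint (for example a machine on the VPN), downloads each row's PDF there, and pushes the bytes into Foundry. A minimal sketch under that assumption: the HTTP method, path and `mediaItemPath` query parameter are assumptions to check against the putMediaItem reference, and `fetch_pdf` / `build_put_request` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: HTTP method and path for putMediaItem; copy the exact values
# (and any extra required query parameters) from the reference page.
PUT_METHOD = "POST"
PUT_PATH = "/api/v2/mediasets/{media_set_rid}/items"

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this header


def fetch_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Download one row's PDF; run this where the internal URL is reachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if not data.startswith(PDF_MAGIC):
        # An HTML login or error page here often points to an auth/network problem.
        raise ValueError(f"{url} did not return a PDF")
    return data


def build_put_request(
    hostname: str, token: str, media_set_rid: str, item_path: str, pdf_bytes: bytes
) -> urllib.request.Request:
    """Build (but do not send) an upload of pdf_bytes as a media item at item_path."""
    path = PUT_PATH.format(media_set_rid=urllib.parse.quote(media_set_rid, safe=""))
    query = urllib.parse.urlencode({"mediaItemPath": item_path})
    return urllib.request.Request(
        url=f"https://{hostname}{path}?{query}",
        data=pdf_bytes,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
        method=PUT_METHOD,
    )
```

Per row, this would be `data = fetch_pdf(row_url)` followed by `urllib.request.urlopen(build_put_request(host, token, media_set_rid, f"{row_id}.pdf", data))`, where `row_url` and `row_id` come from your object table.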

Read the media item using readMediaItem:

https://www.palantir.com/docs/foundry/api/v2/media-sets-v2-resources/media-sets/read-media-item/
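
Reading the item back is a cheap way to confirm the upload stored a real PDF before starting a transformation. A sketch, assuming the path below (confirm it on the readMediaItem reference); `build_read_request` and `read_media_item` are hypothetical helpers.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: path for readMediaItem; confirm it on the reference page.
READ_PATH = "/api/v2/mediasets/{media_set_rid}/items/{media_item_rid}"


def build_read_request(
    hostname: str, token: str, media_set_rid: str, media_item_rid: str
) -> urllib.request.Request:
    """Build (but do not send) a GET for the stored media item's bytes."""
    path = READ_PATH.format(
        media_set_rid=urllib.parse.quote(media_set_rid, safe=""),
        media_item_rid=urllib.parse.quote(media_item_rid, safe=""),
    )
    return urllib.request.Request(
        url=f"https://{hostname}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def read_media_item(
    hostname: str, token: str, media_set_rid: str, media_item_rid: str
) -> bytes:
    """Download the item and check that what was stored still looks like a PDF."""
    req = build_read_request(hostname, token, media_set_rid, media_item_rid)
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = resp.read()
    if not data.startswith(b"%PDF-"):
        raise ValueError("stored media item does not look like a PDF")
    return data
```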

Transform the media item (text extraction) using transformMediaItem:

https://www.palantir.com/docs/foundry/api/v2/media-sets-v2-resources/media-sets/transform-media-item/
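
Starting the transformation is a JSON POST. The request body that selects text extraction or OCR is deliberately left to the caller here: copy it from the transformMediaItem reference rather than guessing its schema. The path is an assumption, and `build_transform_request` / `start_transform` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# ASSUMPTION: path for transformMediaItem; confirm it on the reference page.
TRANSFORM_PATH = "/api/v2/mediasets/{media_set_rid}/items/{media_item_rid}/transform"


def build_transform_request(
    hostname: str,
    token: str,
    media_set_rid: str,
    media_item_rid: str,
    transformation: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) a POST that asks Foundry to transform one media item.

    `transformation` is sent unchanged as the JSON body; its schema comes from
    the reference page, not from this sketch.
    """
    path = TRANSFORM_PATH.format(
        media_set_rid=urllib.parse.quote(media_set_rid, safe=""),
        media_item_rid=urllib.parse.quote(media_item_rid, safe=""),
    )
    return urllib.request.Request(
        url=f"https://{hostname}{path}",
        data=json.dumps(transformation).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_transform(
    hostname: str,
    token: str,
    media_set_rid: str,
    media_item_rid: str,
    transformation: Dict[str, Any],
) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response.

    The response presumably carries the job identifier that the polling step needs.
    """
    req = build_transform_request(
        hostname, token, media_set_rid, media_item_rid, transformation
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```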

Poll for the transformation status using getTransformationJobStatus:

https://www.palantir.com/docs/foundry/api/v2/media-sets-v2-resources/media-sets/get-transformation-job-status/
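
The polling loop itself does not depend on the exact API shape, so it can be sketched independently: call a status function with exponential backoff until the job succeeds, fails, or a deadline passes. The status strings `SUCCESSFUL` and `FAILED` are assumptions (take the real values from the getTransformationJobStatus reference), and `get_status` is a hypothetical callable you would wire to that endpoint.

```python
import time
from typing import Callable

# ASSUMPTION: terminal status values; check the getTransformationJobStatus
# reference for the strings your stack actually returns.
SUCCESS_STATES = frozenset({"SUCCESSFUL"})
FAILURE_STATES = frozenset({"FAILED"})


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() with exponential backoff until a terminal state or timeout."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"transformation job failed with status {status!r}")
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but cap the wait
```

`sleep` and `clock` are injectable so the loop can be tested without real waiting; in production the defaults are used.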

Then retrieve the transformation result using getTransformationResult:

https://www.palantir.com/docs/foundry/api/v2/media-sets-v2-resources/media-sets/get-transformation-job-result/
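
Once the job has finished, the result is a plain GET. A sketch, assuming the path below (confirm it on the reference page) and assuming the text-extraction result comes back as UTF-8 text; OCR output may differ, so check what your chosen transformation returns. `build_result_request`, `decode_extracted_text` and `fetch_extracted_text` are hypothetical helpers.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: path for the transformation-result endpoint; confirm it on the
# reference page. `job_id` is whatever identifier the transform call returned.
RESULT_PATH = (
    "/api/v2/mediasets/{media_set_rid}/items/{media_item_rid}"
    "/transform/jobs/{job_id}/result"
)


def build_result_request(
    hostname: str, token: str, media_set_rid: str, media_item_rid: str, job_id: str
) -> urllib.request.Request:
    """Build (but do not send) a GET for a finished transformation's output."""
    path = RESULT_PATH.format(
        media_set_rid=urllib.parse.quote(media_set_rid, safe=""),
        media_item_rid=urllib.parse.quote(media_item_rid, safe=""),
        job_id=urllib.parse.quote(job_id, safe=""),
    )
    return urllib.request.Request(
        url=f"https://{hostname}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def decode_extracted_text(raw: bytes) -> str:
    """ASSUMPTION: the body is UTF-8 text; 'utf-8-sig' also drops a leading BOM."""
    return raw.decode("utf-8-sig")


def fetch_extracted_text(
    hostname: str, token: str, media_set_rid: str, media_item_rid: str, job_id: str
) -> str:
    """Download the result and return it as a string."""
    req = build_result_request(hostname, token, media_set_rid, media_item_rid, job_id)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return decode_extracted_text(resp.read())
```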

Hope that helps.