How do I access Attachments (from Objects) in a pipeline?

I’ve got an object in the Ontology with an attachment property, and I upload files to it via Workshop. The attachment property materialises as a column in the writeback dataset containing a string like “ri.foundry.main.attachment…”

I want to read these files in a Python transform. Is that possible, and if so, how?

Hi @bagonelli,

Welcome to the Developer Community! What kind of files are you trying to read? For example, the below links provide resources for parsing Excel files uploaded to a dataset:

Microsoft Excel • Transforms Excel Parser • Palantir

Code examples • Raw file parsing • Transforms • Palantir

It’s only possible if you register a third-party application (TPA), use its service user, and make an API call to retrieve the bytes of the attachment. That is quite a workaround: you have to manage the TPA and its permissions, store its credentials as a secret, add the stack URL as an egress policy, and so on.

We have had a feature request open for a while to materialize the attachment bytes in the dataset, but it does not seem to have been prioritized yet.
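For illustration, here is a minimal sketch of the API call part of that workaround. It assumes the stack exposes the attachment content endpoint at `/api/v2/ontologies/attachments/{attachmentRid}/content` (check this against your stack's API documentation) and that you already hold a bearer token for the TPA's service user; how that token is obtained and stored is not shown. The function names and the `stack_host` parameter are hypothetical, not part of any Palantir library.

```python
import urllib.parse
import urllib.request


def attachment_content_url(stack_host: str, attachment_rid: str) -> str:
    """Build the download URL for one attachment.

    ASSUMPTION: the endpoint path below matches your stack's API docs.
    """
    # Percent-encode the RID so it is safe to embed as a single path segment.
    rid = urllib.parse.quote(attachment_rid, safe="")
    return f"https://{stack_host}/api/v2/ontologies/attachments/{rid}/content"


def fetch_attachment_bytes(stack_host: str, attachment_rid: str, token: str) -> bytes:
    """Download the raw bytes of an attachment using a bearer token.

    ASSUMPTION: `token` belongs to a service user that is allowed to read
    the object the attachment is stored on.
    """
    request = urllib.request.Request(
        attachment_content_url(stack_host, attachment_rid),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Inside a transform you would call `fetch_attachment_bytes` once per RID read from the writeback dataset column. The egress policy and the secret mentioned above still have to be configured for the call to leave the build environment.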

I second that feature request!

A side note: I understand this is not a direct answer to the question, but it is still relevant.

Stating something potentially obvious first:
Attachments =/= media and media sets

Media in a media set have several features that allow workflows similar to attachment-based ones.
Namely:

  • You can upload media to a media set from a media uploader widget in Workshop
  • You can trigger an action on upload in the media uploader widget, which means you can store the media reference in an object property after the upload
  • You can upload media from Action Forms (in that case, the media is uploaded to the backing media set and you can store the media reference in an object via a function; see the example below)
  • You can process media via a pipeline
  • You can process/read/write media via a function (see https://www.palantir.com/docs/foundry/functions/api-media )

Function example

import { Function, OntologyEditFunction, Edits, Integer, MediaItem } from "@foundry/functions-api";
import { Objects, AllMedia } from "@foundry/ontology-api";

//...

    @OntologyEditFunction()
    @Edits(AllMedia)
    public async exampleCreateMediaObject(exampleMedia: MediaItem): Promise<void> {
        if (MediaItem.isDocument(exampleMedia)) {
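            const metadata = await exampleMedia.getMetadataAsync();

            // Fall back to an empty string when the media item has no path.
            const path = metadata.path ? metadata.path : "";
            // Use the upload timestamp plus the path as a simple primary key.
            const newMediaObject = Objects.create().allMedia(Date.now().toString().concat(path));
            // Illustration only: a real "path" property would hold just metadata.path.
            // Title and author are packed in here to show what other metadata is available.
            newMediaObject.path = metadata.path + " __ " + metadata.title + " __ " + metadata.author;
            // Store the media reference on the object so it can be displayed later.
            newMediaObject.mediaReference = exampleMedia;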
        }
    }

So potentially, a typical workflow might be:

  1. You have an Action that lets users upload media (to the media set, storing the media set RID)
  2. The media is uploaded to the media set and a reference to the media is stored in the object
  3. A pipeline kicks off and processes the media (e.g. a RAG pipeline: splitting PDFs into pages, then into chunks, then doing some extraction, etc.)
  4. The output is synced to the Ontology (maybe as a “chunk” object or similar)
  5. From Workshop you can now access both the base document that was uploaded (because it is media and you have the reference) and the chunks (because they were processed by the pipeline, so you can run semantic search, etc.)
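As a rough illustration of the chunking part of step 3, here is a small sketch that splits already-extracted page text into overlapping fixed-size chunks. It is plain Python with no Foundry APIs; the function name, the default sizes and the character-based splitting are all assumptions chosen for the example, and a real pipeline might split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears in full in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk could then become one row in the pipeline output, keyed by the source document and chunk index, and be synced to the “chunk” object from step 4.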