How do I access Attachments (from Objects) in a pipeline?

bagonelli · December 10, 2024, 3:49pm

So I’ve got an object in the Ontology set up with an attachment field, and I upload some files to it via a Workshop application. This attachment field materialises as a column in the writeback dataset containing a string like “ri.foundry.main.attachment…”

I want to read these files in a Python transform. How do I do that? Is it possible?

Joel · December 10, 2024, 6:20pm

Hi @bagonelli,

Welcome to the Developer Community! What kind of files are you trying to read? For example, the below links provide resources for parsing Excel files uploaded to a dataset:

Microsoft Excel • Transforms Excel Parser • Palantir

Code examples • Raw file parsing • Transforms • Palantir

nicornk · December 10, 2024, 8:06pm

It’s only possible if you use a third-party application (TPA) as a service user and make an API call to retrieve the bytes of the attachment. That is quite a workaround: you have to manage the TPA and its permissions, add it as a secret, add the stack URL as an egress policy, and so on.

We have had a feature request open for a while to materialize the bytes in the dataset, but it seems it has not been prioritized yet.
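
For illustration, here is a minimal sketch of the API call described above. It is written in TypeScript only to match the other example in this thread; the same HTTP request could be issued from a Python transform once the egress policy and secret are in place. The endpoint path (`/api/v2/ontologies/attachments/{attachmentRid}/content`) is an assumption that should be checked against the API documentation for your stack, and the function names, stack URL, and token handling are hypothetical.

```typescript
// Builds the URL for downloading an attachment's raw content.
// ASSUMPTION: the endpoint path below is not confirmed in this thread;
// verify it against your stack's API documentation before relying on it.
function attachmentContentUrl(stackUrl: string, attachmentRid: string): string {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = stackUrl.replace(/\/+$/, "");
  return `${base}/api/v2/ontologies/attachments/${encodeURIComponent(attachmentRid)}/content`;
}

// Downloads the attachment bytes using a bearer token issued to the
// third-party application's service user (hypothetical helper).
async function fetchAttachmentBytes(
  stackUrl: string,
  attachmentRid: string,
  token: string,
): Promise<Uint8Array> {
  const response = await fetch(attachmentContentUrl(stackUrl, attachmentRid), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Attachment download failed with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

The attachment RID passed in would be the string stored in the writeback dataset column, and the token would come from the secret configured for the TPA.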

Flackermann · December 10, 2024, 8:54pm

I second that feature request!

VincentF · December 11, 2024, 8:38am

As a side note: I understand this is not a direct answer to the question, but it is still relevant.

Stating something potentially obvious first:
Attachments =/= media and mediasets

Media in a mediaset has several features that should allow for workflows similar to attachment-based ones.
Namely:

You can upload media to a mediaset from a media uploader widget in Workshop
You can trigger an action on upload in the media uploader widget, which means you can store the media reference in an object property after the upload
You can upload media from Action Forms (in that case, the media will be uploaded to the backing mediaset, and you can store the media reference in an object via a function; see the example below)
You can process media via a pipeline
You can process/read/write media via a function (see https://www.palantir.com/docs/foundry/functions/api-media)

Function example

import { Function, OntologyEditFunction, Edits, Integer, MediaItem } from "@foundry/functions-api";
import { Objects, AllMedia } from "@foundry/ontology-api";

//...

    @OntologyEditFunction()
    @Edits(AllMedia)
    public async exampleCreateMediaObject(exampleMedia: MediaItem): Promise<void> {
        // Only handle media items that are documents (e.g. PDFs).
        if (MediaItem.isDocument(exampleMedia)) {
            const metadata = await exampleMedia.getMetadataAsync();

            // Fall back to an empty string when the document has no path.
            const path = metadata.path ? metadata.path : "";
            // Primary key: upload timestamp concatenated with the path.
            const newMediaObject = Objects.create().allMedia(Date.now().toString().concat(path));
            // Example only: extra metadata is packed into "path" to show what is available.
            // Normally you would store just metadata.path here.
            newMediaObject.path = metadata.path + " __ " + metadata.title + " __ " + metadata.author;
            newMediaObject.mediaReference = exampleMedia;
        }
    }

So potentially, a typical workflow might be:

You have an Action that lets users upload media (to a mediaset, storing the mediaset RID)
The media is uploaded to the mediaset and a reference to the media is stored in the object
A pipeline kicks off and processes the media (e.g. a RAG pipeline: splitting PDFs into pages, then into chunks, then doing some extraction, etc.)
The result gets synced to the Ontology (maybe as a “chunk” object or similar)
Now you can access from Workshop: the base document that was uploaded (because it’s media and you have the reference), and the chunks (because they were processed by the pipeline, so you can run semantic search, etc.)
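
To make the "then into chunks" step of the workflow above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is only an illustration of the idea, not how any particular pipeline implements it: the function name `chunkText` and its parameters are hypothetical, and it assumes the page text has already been extracted from the PDF.

```typescript
// Splits text into fixed-size chunks, where each chunk repeats the last
// `overlap` characters of the previous one so context is not lost at
// chunk boundaries. Hypothetical helper for illustration only.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be between 0 and chunkSize - 1");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text, so no
    // trailing chunk made up purely of overlap is emitted.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}

// Usage: 4-character chunks with 1 character of overlap.
const exampleChunks = chunkText("abcdefghij", 4, 1);
console.log(exampleChunks); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk could then become one row in the pipeline output and, after syncing, one “chunk” object linked back to the original media reference.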
