So I’ve got an object in the Ontology set up with an attachment field, and I upload some files to it via Workshop. This attachment field materialises as a column in the writeback dataset containing a string like “ri.foundry.main.attachment…”
I want to read these files in a Python transform. How do I do that? Is it possible?
Welcome to the Developer Community! What kind of files are you trying to read? For example, the below links provide resources for parsing Excel files uploaded to a dataset:
It’s only possible if you use a third-party application (TPA) as a service user and make an API call to retrieve the bytes of the attachment. That is quite a workaround: you have to manage the TPA, manage the permissions of the TPA, add it as a secret, add the stack URL as an egress policy…
We have had a feature request open for a while to materialize the bytes in the dataset, but it seems it has not been prioritized yet…
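To make the API-call workaround above more concrete, here is a minimal TypeScript sketch of the request itself. The endpoint path (`/api/v1/attachments/{rid}/content`) is my assumption based on the public Foundry API documentation, so please verify it against your stack. The names `buildAttachmentContentRequest`, `fetchAttachmentBytes`, and `HttpGet` are hypothetical helpers, not part of any Foundry library. The HTTP client is passed in as a parameter so the snippet does not depend on a particular runtime.

```typescript
// Sketch only. ASSUMPTION: the endpoint path below follows the public
// Foundry API v1 docs; confirm it for your stack before relying on it.

interface AttachmentRequest {
  url: string;
  headers: Record<string, string>;
}

// The subset of a Fetch-style response that this sketch relies on.
interface HttpResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical injectable HTTP client type.
type HttpGet = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<HttpResponse>;

// Builds the URL and auth header for downloading one attachment's content.
function buildAttachmentContentRequest(
  stackUrl: string,
  attachmentRid: string,
  token: string
): AttachmentRequest {
  const base = stackUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/api/v1/attachments/${encodeURIComponent(attachmentRid)}/content`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Downloads the attachment and returns its raw bytes.
async function fetchAttachmentBytes(
  httpGet: HttpGet,
  stackUrl: string,
  attachmentRid: string,
  token: string
): Promise<Uint8Array> {
  const { url, headers } = buildAttachmentContentRequest(stackUrl, attachmentRid, token);
  const response = await httpGet(url, { headers });
  if (!response.ok) {
    throw new Error(`Attachment request failed with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

On a runtime with a global Fetch API you could try passing `fetch` as the `httpGet` argument. The token would be the one obtained for the TPA service user, and the same request shape applies if you make the call from a Python transform instead.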
As a side note: I understand this is not a direct answer to the question, but it is still relevant.
Stating something potentially obvious first: attachments =/= media and mediasets
Media in a mediaset come with multiple features that should allow for workflows similar to attachment-based ones.
Namely:
You can upload media to a mediaset from a media uploader widget in Workshop
You can trigger an action on upload in the media uploader widget, which means you can store the media reference in an object property after the upload
You can upload media from Action Forms (in that case, the media will be uploaded to the backing mediaset, and you can store the media reference in an object via a function; see the example below)
You can process media via a pipeline
You can process/read/write media via functions (see https://www.palantir.com/docs/foundry/functions/api-media )
Function example
import { Function, OntologyEditFunction, Edits, Integer, MediaItem } from "@foundry/functions-api";
import { Objects, AllMedia } from "@foundry/ontology-api";
//...
@OntologyEditFunction()
@Edits(AllMedia)
public async exampleCreateMediaObject(exampleMedia: MediaItem): Promise<void> {
    // Only handle document-type media items
    if (MediaItem.isDocument(exampleMedia)) {
        const metadata = await exampleMedia.getMetadataAsync();
        const path = metadata.path ? metadata.path : "";
        // Use the current timestamp plus the path as the primary key of the new object
        const newMediaObject = Objects.create().allMedia(Date.now().toString().concat(path));
        // Just an example here. Of course the path is only the ".path" property. I was just trying to store more info as an example
        newMediaObject.path = metadata.path + " __ " + metadata.title + " __ " + metadata.author;
        // Store the media reference on the object
        newMediaObject.mediaReference = exampleMedia;
    }
}
So potentially, a typical workflow might be:
You have an Action that lets users upload media (to a mediaset, storing the mediaset rid)
The media is uploaded to the mediaset and a reference to the media is stored in the object
A pipeline kicks off and processes the media (e.g. a RAG pipeline: splitting PDFs into pages, then into chunks, then doing some extraction, etc.)
This gets synced to the Ontology (maybe as a “chunk” object or so)
Now you can access from Workshop both the base document that was uploaded (because it’s a media item and you have the reference) and the chunks (because they were processed by the pipeline, so you can run semantic search, etc.)
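To illustrate the "pages, then chunks" step of the workflow above, here is a minimal TypeScript sketch of fixed-size chunking with overlap. It assumes the page text has already been extracted from the PDF; `Chunk`, `chunkText`, and `chunkPages` are hypothetical names for illustration, not Foundry APIs. In practice this step would live in your pipeline tool of choice, and the chunk size and overlap are parameters you would tune.

```typescript
// One chunk of text, traceable back to its source page.
interface Chunk {
  pageNumber: number; // 1-based page the chunk came from
  index: number;      // 0-based position of the chunk within its page
  text: string;
}

// Splits a string into windows of `chunkSize` characters, where each window
// starts `chunkSize - overlap` characters after the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}

// Chunks every page and tags each chunk with its page number and index.
function chunkPages(pages: string[], chunkSize: number, overlap: number): Chunk[] {
  const result: Chunk[] = [];
  pages.forEach((pageText, pageIdx) => {
    chunkText(pageText, chunkSize, overlap).forEach((text, index) => {
      result.push({ pageNumber: pageIdx + 1, index, text });
    });
  });
  return result;
}
```

Each resulting `Chunk` maps naturally onto a “chunk” object in the Ontology, with `pageNumber` and `index` letting you link it back to the base document.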