In this example from the documentation, I can use an AIP orchestrator to call a GPT Vision model at a high rate, for instance to query pictures stored in a MediaSet.
However, I don't have a MediaSet in my case; instead I have b64 strings representing encoded pictures, stored as a column of my input dataset.
Is it possible to tweak this
answered = completions.withColumn(
ctx,
df,
[
MultimediaPromptComponent(["system_prompt"], ChatMessageRole.SYSTEM),
MultimediaPromptComponent([ImagePromptComponent(pngs, "mediaItemRid")]),
],
"llm_answer",
)
To something like
# ...
MultimediaPromptComponent([ImagePromptComponent(myb64col, "base64encoded")]),
# ...
What is the valid code to pass my b64-encoded images to the orchestrator?
Hey Vincent,
Unfortunately, passing in the b64 string isn't available at the moment. Is there a reason you don't have the images in a MediaSet? I'd be curious to understand your use case a bit better.
Adrian
Those pictures are ingested as b64 strings, and I could see other use cases: pictures edited live in the code, pictures extracted from PDFs, …
They are ingested into a dataset? I think in most cases it makes the most sense to store any images in Foundry in a MediaSet primitive, which is what the orchestrator supports. In the case of base64 images that are ingested (presumably PNG/JPG; the vision models usually support only a limited number of file types from what I have seen), it is pretty simple to ingest those directly as MediaSets or to convert the ingested files into a MediaSet, for example with a small transform like the sketch below.
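To illustrate, here is a rough sketch of what such a conversion transform could look like. The dataset path, MediaSet path, column names (image_id, b64_image), and the MediaSetOutput.put_media_item signature are assumptions on my side, so please check them against the media set transforms documentation for your enrollment:

# Rough sketch: decode a base64 image column into items of a MediaSet.
# Assumptions (not from this thread): the input column "b64_image" holds
# base64-encoded PNGs, "image_id" uniquely names each picture, and
# transforms.mediasets exposes MediaSetOutput with a put_media_item method
# taking a file-like object and a path. Verify against the Foundry docs.
import base64
import io

from transforms.api import transform, Input
from transforms.mediasets import MediaSetOutput


@transform(
    images=MediaSetOutput("/path/to/output/media_set"),  # hypothetical path
    source=Input("/path/to/input/dataset"),              # hypothetical path
)
def compute(images, source):
    # Collecting to the driver keeps the sketch simple; fine for small datasets.
    rows = source.dataframe().select("image_id", "b64_image").collect()
    for row in rows:
        # Decode the base64 string back into raw PNG bytes.
        png_bytes = base64.b64decode(row["b64_image"])
        # Upload the decoded bytes as a media item (assumed signature:
        # put_media_item(file_like, path)).
        images.put_media_item(io.BytesIO(png_bytes), f"{row['image_id']}.png")

Once the images live in a MediaSet, the orchestrator snippet from the documentation should work against it unchanged. For larger datasets you would want to distribute the decoding rather than collect everything to the driver.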
In the case of live interactions, transforms tend not to be the place where those happen; that would be more of a Functions context.