In this example from the documentation, I can use an AIP orchestrator to call a GPT Vision model at a high rate, for instance to query pictures stored in a MediaSet.
However, I don't have a MediaSet in my case; instead I have b64 strings representing encoded pictures, stored as a column of my input dataset.
Is it possible to tweak this
answered = completions.withColumn(
ctx,
df,
[
MultimediaPromptComponent(["system_prompt"], ChatMessageRole.SYSTEM),
MultimediaPromptComponent([ImagePromptComponent(pngs, "mediaItemRid")]),
],
"llm_answer",
)
To something like
# ...
MultimediaPromptComponent([ImagePromptComponent(myb64col, "base64encoded")]),
# ...
What is the valid code to pass my b64-encoded images to the orchestrator?
Hey Vincent,
Unfortunately, passing in the b64 string isn't available at the moment. Is there a reason you don't have the images in a MediaSet? I'd be curious to understand your use case a bit better.
Adrian
Those pictures are ingested as b64 strings, and I could see other use cases: pictures edited live in the code, pictures extracted from PDFs, …
They are ingested into a dataset? I think in most cases it makes the most sense to store any images in Foundry in a MediaSet primitive, which is what the orchestrator supports. In the case of base64 images that are ingested (presumably PNG/JPG; the vision models usually support only a limited number of file types from what I have seen), it is pretty simple to ingest those directly as MediaSets or to convert the ingested files into a MediaSet, for example with a small transform like the sketch below.
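To illustrate, here is a rough sketch of what such a conversion transform could look like. The dataset path, MediaSet path, column names (image_id, b64_image), and the MediaSetOutput.put_media_item signature are assumptions on my side, so please check them against the media set transforms documentation for your enrollment:

# Rough sketch: decode a base64 image column into items of a MediaSet.
# Assumptions (not from this thread): the input column "b64_image" holds
# base64-encoded PNGs, "image_id" uniquely names each picture, and
# transforms.mediasets exposes MediaSetOutput with a put_media_item method
# taking a file-like object and a path. Verify against the Foundry docs.
import base64
import io

from transforms.api import transform, Input
from transforms.mediasets import MediaSetOutput


@transform(
    images=MediaSetOutput("/path/to/output/media_set"),  # hypothetical path
    source=Input("/path/to/input/dataset"),              # hypothetical path
)
def compute(images, source):
    # Collecting to the driver keeps the sketch simple; fine for small datasets.
    rows = source.dataframe().select("image_id", "b64_image").collect()
    for row in rows:
        # Decode the base64 string back into raw PNG bytes.
        png_bytes = base64.b64decode(row["b64_image"])
        # Upload the decoded bytes as a media item (assumed signature:
        # put_media_item(file_like, path)).
        images.put_media_item(io.BytesIO(png_bytes), f"{row['image_id']}.png")

Once the images live in a MediaSet, the orchestrator snippet from the documentation should work against it unchanged. For larger datasets you would want to distribute the decoding rather than collect everything to the driver.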
In the case of live interactions, transforms tend not to be the place where those happen; that would be more of a Functions context.