Model Catalog should have code examples for text completion along with the Vision examples

Model Catalog is a good app where we can get code examples showing how to use LLMs in TypeScript functions or Python transforms.

One of the challenges is that for vision-enabled models there are no code samples showing how to use text completion; there are only examples for the Vision capability.
This might be an easy change for the Palantir team and would help the developer community a lot.

Hey!
Actually, the code snippets you can find in Model Catalog work for both text AND vision!
You can simply remove the vision part, and it will operate on text as well.

Are you sure? Here is what I found for Claude 3.5:

from transforms.api import transform, Output
from palantir_models.transforms import GenericVisionCompletionLanguageModelInput
from palantir_models.models import GenericVisionCompletionLanguageModel
from transforms.mediasets import MediaSetInput
from language_model_service_api.languagemodelservice_api_completion_v3 import (
    GenericVisionCompletionRequest,
    GenericChatCompletionResponse
)
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
from language_model_service_api.languagemodelservice_api import (
    ChatMessageRole,
    GenericMessageContent,
    GenericMessage,
    MediaSetReference,
    MediaTransformation
)


@transform(
    image_input=MediaSetInput("Media set rid or path"),
    model=GenericVisionCompletionLanguageModelInput("ri.language-model-service..language-model.anthropic-claude-3-5-sonnet"),
    output=Output("Output dataset rid or path"),
)
def compute_generic(ctx, image_input, model: GenericVisionCompletionLanguageModel, output):
    media_set_rid = image_input.get_media_set_rid()
    image_references = image_input.list_media_items_by_path_with_media_reference(ctx)
    prompt = "Describe this image for me."

    # Build a request that pairs the text prompt with one media item from the media set
    def get_llm_response(media_item_rid):
        prompt_content = GenericMessageContent(text=prompt)
        request: GenericVisionCompletionRequest = GenericVisionCompletionRequest([
            GenericMessage(contents=[
                prompt_content,
                GenericMessageContent(
                        media_set_reference=MediaSetReference(
                            media_set_rid=media_set_rid,
                            media_item_rid=media_item_rid,
                            transformation=MediaTransformation.IMAGE_TO_BASE64_STRING
                        )
                    )
            ], role=ChatMessageRole.USER),
        ], max_tokens=200, temperature=0.8)
        response: GenericChatCompletionResponse = model.create_vision_completion(request)
        return response.completion

    # Wrap the model call in a Spark UDF so it runs once per media item row
    get_llm_response_udf = F.udf(get_llm_response, StringType())

    output_df = image_references.withColumn('llm_response', get_llm_response_udf(F.col('mediaItemRid')))

    # Tag the media reference column with the media_reference typeclass so it renders as a media reference downstream
    column_typeclasses = {'mediaReference': [{'kind': 'reference', 'name': 'media_reference'}]}
    output.write_dataframe(output_df, column_typeclasses=column_typeclasses)

Not sure where the text completion part is in this code.

response: GenericChatCompletionResponse = model.create_vision_completion(request)

Hi @maddyAWS! Following up: the call to create_vision_completion does not need an image input to execute successfully (i.e. the function can be used for text completion alone), given that the model is a vision model. create_vision_completion supports image inputs in addition to its text completion capabilities.
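For example, here is a minimal text-only sketch reusing the classes from your snippet above (the prompt string is just illustrative); the only change is that no MediaSetReference content is attached to the message:

prompt_content = GenericMessageContent(text="Summarize the incident report in one sentence.")
request = GenericVisionCompletionRequest([
    GenericMessage(contents=[prompt_content], role=ChatMessageRole.USER),
], max_tokens=200, temperature=0.8)
# Text-only call: the vision-capable endpoint accepts a message with no media content
response = model.create_vision_completion(request)
print(response.completion)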

For models with only text completion capability, such as Llama 3.1 70b Instruct, you will see the code snippet generated uses a different function - a call to create_completion.
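As a rough sketch of that text-only pattern (the GenericCompletionRequest class name and its parameters here are assumed by analogy with the vision request above, and model is assumed to be bound through the corresponding text completion input class; the generated snippet in Model Catalog is the source of truth):

# Assumed request class, mirroring GenericVisionCompletionRequest above
request = GenericCompletionRequest(prompt="Describe the Model Catalog app in one sentence.", max_tokens=200, temperature=0.8)
# create_completion is the text-only entry point for models without vision capability
response = model.create_completion(request)
print(response.completion)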

For vision models such as Claude 3.5, if you are using them for just text completion, you can technically use either create_vision_completion or create_completion. We have made some modifications to conditionally render the recommended function based on the use case’s needs!

We’ve modified the examples on Model Catalog to make this flow more intuitive and better distinguish between these two use cases of “Text Completion” and “Vision”. Users can now toggle between the modes Text and Vision for both Transforms and Functions and receive the relevant code snippet to get started.



Thank you for taking the feedback and making the change.