Strange output from Ollama LLM

Hi everyone,

I am building an application that uses the Llama 3.3 70B Instruct model. However, I am getting strange output when I run it in a Jupyter code workspace and in a Python transforms repository. Here is my Python Jupyter notebook code:

from palantir_models.models import GenericCompletionLanguageModel
from language_model_service_api.languagemodelservice_api_completion_v3 import GenericCompletionRequest

model = GenericCompletionLanguageModel.get("Llama_3_3_70b_Instruct")
prompt = "Why is the sky blue?"
request = GenericCompletionRequest(prompt)
llama_response = model.create_completion(request)
print(f"This is request: {request}")
print()
print(f"This is the response: {llama_response}")

Here is the output:

This is the response: GenericCompletionResponse(completion=’ Why do birds sing? Why do we dream? Why do we have to die? Why do we have to grow old? Why do we have to be born? Why do we have to live? Why do we have to die? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live again? Why do we have to die again? Why do we have to be born again? Why do we have to live’, token_usage=TokenUsage(prompt_tokens=6, completion_tokens=512, max_tokens=56000))

I have tried adjusting the model temperature and various other things, but I keep getting gibberish.

When I run the same query in the Model Catalog, it performs well, so I'm confused about what is happening.

Any help diagnosing the problem would be great.

Thanks,
Jenny

Hi Jenny :waving_hand:

Wow, those are some serious questions the model is asking! Nothing looks incorrect with your request from a format standpoint; the request is succeeding, but the output is heavily degenerate. I think you are on the right track with temperature adjustment: for reference, the Model Catalog uses a temperature of zero and caps maximum tokens at 200. Matching those settings would be a good first step toward improving the outputs in transforms.
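
If it helps, here is a minimal sketch of what that could look like. Note that the temperature and max_tokens keyword arguments are my assumption about GenericCompletionRequest; please check the actual class signature in your environment before relying on them.

from palantir_models.models import GenericCompletionLanguageModel
from language_model_service_api.languagemodelservice_api_completion_v3 import GenericCompletionRequest

model = GenericCompletionLanguageModel.get("Llama_3_3_70b_Instruct")

# Mirror the Model Catalog defaults: temperature 0 and a tighter output budget.
# NOTE: the keyword names below are assumptions; adjust them to the real request signature.
request = GenericCompletionRequest(
    "Why is the sky blue?",
    temperature=0.0,  # Model Catalog uses a temperature of zero
    max_tokens=200,   # Model Catalog caps output at 200 tokens
)

llama_response = model.create_completion(request)
print(llama_response.completion)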

Happy to help if you try that and still get similar outputs.

Hi Jim,

Thank you for your response. I tried adjusting the model temperature and max tokens, but the output is still very repetitive and sometimes makes no sense. Here are some screenshots.

The GUI also produces gibberish when the model temperature is 0, which is surprising: at temperature 0 I would expect the output to be deterministic rather than this random.
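
At temperature 0, two identical requests should also return identical text if the setting is actually reaching the model. Something like the sketch below should print True in that case; the temperature and max_tokens keyword names are my guess at the request signature, so adjust them as needed.

from palantir_models.models import GenericCompletionLanguageModel
from language_model_service_api.languagemodelservice_api_completion_v3 import GenericCompletionRequest

model = GenericCompletionLanguageModel.get("Llama_3_3_70b_Instruct")

# NOTE: keyword names are an assumption; adjust to the actual GenericCompletionRequest signature.
request = GenericCompletionRequest("Why is the sky blue?", temperature=0.0, max_tokens=200)

# With temperature 0, two calls with the same request should give identical completions
# if the temperature parameter is really being applied.
first = model.create_completion(request).completion
second = model.create_completion(request).completion
print("Deterministic:", first == second)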

Is it possible that there is a problem with the source code that wraps these models? Here is what I found when I searched Google for 'ollama model giving me gibberish':

  • https://github.com/ggml-org/llama.cpp/discussions/4058
  • https://github.com/ollama/ollama/issues/3819