Incremental with Use LLM Block

When skip recomputing is enabled on a Use LLM block, does that mean that if there is no prompt change and the same input is received as a snapshot, the same rows will not recompute?
If that is the case, how does setting the input to be incremental change the nature of the build?

For your first question, yes: if skip recomputing is enabled on the Use LLM block, it'll compute the output once and, as long as that output isn't null or an error, we will cache it for you. Then, when your build runs in the future, any row with the same input column values/parameters will just use the output value from the cache instead of having the LLM recompute it.

From the docs:
When Skip recomputing rows is enabled, rows will be compared with previously processed rows based on the columns and parameters passed into the input prompt. Matching rows with the same column and parameter values will get the cached output value without reprocessing in future deployments.

One thing to note is that even if you change the prompt wording, you can still keep the cache. If you change the input columns, however, that will reset the cache.
https://www.palantir.com/docs/foundry/pipeline-builder/pipeline-builder-llm/#skip-computing-already-processed-rows
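In rough pseudocode, the behavior described above looks something like this (a minimal sketch only, not Pipeline Builder's actual implementation; the function and variable names are hypothetical):

```python
# Sketch of the skip-recomputing cache: rows are keyed by the input column
# values and parameters that feed the prompt, not by the prompt text itself.

def cache_key(row: dict, input_columns: list[str], parameters: dict) -> tuple:
    """Key a row by the values of the columns/parameters passed into the prompt."""
    return (
        tuple(row[col] for col in input_columns),
        tuple(sorted(parameters.items())),
    )

def run_llm_block(rows, input_columns, parameters, call_llm, cache):
    """Return one output per row, reusing cached outputs for matching rows."""
    outputs = []
    for row in rows:
        key = cache_key(row, input_columns, parameters)
        if key in cache:
            outputs.append(cache[key])      # skip recomputing this row
        else:
            result = call_llm(row)          # only new/changed rows hit the LLM
            if result is not None:          # null/error outputs are not cached
                cache[key] = result
            outputs.append(result)
    return outputs
```

Note that the prompt text is deliberately not part of the key, which is why rewording the prompt keeps the cache while changing the input columns effectively resets it.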

The difference between just using the cache and using incremental is that our cache doesn't save any null/error outputs, whereas incremental builds will write those null outputs to your output dataset. This could lead to some inconsistency if you're expecting repeated input rows, because the LLM could output a null value first and then a non-null value, and only the non-null value would get cached and used for future builds.
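A small sketch of that null-handling difference (again hypothetical, not Foundry internals): the skip-recomputing cache simply refuses to remember nulls, so those rows are retried on the next build, while an incremental build appends whatever the LLM returned, nulls included, to the output dataset.

```python
def cache_result(cache: dict, key, result):
    """Skip-recomputing style: only successful (non-null) outputs are kept."""
    if result is not None:
        cache[key] = result     # null/error rows stay uncached and re-run next build
    return result

def append_incremental(output_dataset: list, row: dict, result):
    """Incremental style: the result is persisted even when it is null."""
    output_dataset.append({**row, "llm_output": result})
```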

Let me know if that all makes sense or if you have any questions!

Hi Helen,

Thank you, that makes a lot of sense! For my specific use case, I want to make sure only successful calls are skipped on re-runs (since I am using a custom Compute profile and don't want to waste resources and build time). I want null/error rows and any new rows to be run when the output dataset is built as part of a build schedule.

From reading the above, it sounds like this behavior will be achieved if Skip recomputing rows is turned on but each input is a snapshot (so no incremental inputs)?

Yup, you should just use the Skip recomputing rows option then.
