We are calling a Python UDF from a streaming pipeline. The pipeline's output is a JSON string of approximately 18,000 characters, which is passed to the UDF and parsed with the standard json.loads(data).
The UDF fails because it does not receive the full 18,000 characters: it receives only 10,000 characters with three trailing dots appended (i.e., the input is trimmed).
Because of this trimming the JSON is invalid, and json.loads fails in the preview itself.
Error:
{"success": false, "message_id": "f107-4a4e-8bcc-xyz", "errors": "Unable to enrich data with correlation id f107-xyz error: Expecting value: line 1 column 10001 (char 10000)"}
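For reference, here is a minimal sketch that reproduces the failure locally, assuming the preview keeps roughly 10,000 characters and appends three dots as described above (the payload shape here is hypothetical):

    import json

    # Hypothetical payload: a valid JSON string of roughly 18,000 characters.
    full_payload = json.dumps({"correlation_id": "f107-xyz", "data": "x" * 18000})

    # Simulate the trimming observed in preview: keep 10,000 chars, append "...".
    trimmed_payload = full_payload[:10000] + "..."

    try:
        json.loads(trimmed_payload)
    except json.JSONDecodeError as exc:
        # The truncated string is no longer valid JSON, so the parse fails
        # (the exact message depends on where the cut lands).
        print(f"Parse failed: {exc}")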
Hello! I'd like to confirm your setup here: you have a streaming pipeline and a Python UDF that you are using as a transform in that pipeline. The input to the Python UDF is arriving as a truncated JSON string rather than the full string, so of course the UDF cannot parse the JSON. Is that accurate?
If so, is this happening for previews as well as builds, or just one or the other?
Yes, you got it right. I can confirm that the build works fine, but the preview fails to load any data successfully because of this limitation. It only happens on the streaming pipeline that calls the UDF to parse the JSON, where the JSON is about 18,000 characters.
When the preview delegates the call to the UDF, it sends the trimmed JSON, and the UDF therefore produces the wrong output.
The same preview works fine when run from a pipeline with a batch dataset, which adds to the confusion about whether this could be a build problem.
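As a stopgap, I am considering a defensive guard inside the UDF; a minimal sketch, assuming the trimmed input always ends with the three-dot marker (the safe_loads name and the 10,000-character threshold are my own, not a platform API):

    import json

    TRUNCATION_MARKER = "..."

    def safe_loads(data: str) -> dict:
        """Parse JSON, surfacing a clearer error if preview trimmed the input."""
        # Heuristic based on the observed ~10k trim with "..." appended.
        if len(data) >= 10000 and data.endswith(TRUNCATION_MARKER):
            raise ValueError(
                f"Input appears truncated at {len(data)} chars; "
                "the preview may have trimmed the payload."
            )
        try:
            return json.loads(data)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON ({len(data)} chars): {exc}") from exc

This does not fix the preview, but it makes the failure mode explicit in the logs instead of surfacing as a generic JSONDecodeError.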