I want to execute an API call (not AIP/not an LLM/not a model) as part of a streaming pipeline.
For example, this could be a call to a third-party service that resolves an address to a geospatial point.
How can I do this? Is it possible?
Did you ever figure this out @VincentF ?
Yes, you can use a UDF to call an external API as part of a stream.
But this must be in Java? And can it use the ExternalSource?
Currently, you can only write streaming UDFs in Java. You can also execute Python functions from a streaming pipeline; however, queries are not yet supported in Python functions.
I believe external sources are supported in streaming UDFs.
Do you have an example of how to do that? This is my use case / problem:
Do you think this is the right way to solve this problem?
The biggest question here is what the latency requirements are. If the acceptable latency is upwards of 10 minutes, it might make more sense to do this as an incremental batch transform (Python or Java).
For other tradeoff considerations, I’d take a look at the docs on streaming vs batch.
Fair question, but I would like the latency to be <1 min if possible. The faster we get data, the better our adoption and usage metrics.
Do you have an example of what this might look like? Perhaps @jeg has one.
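Not an official example, but here is a minimal sketch of the per-record lookup logic that a Java streaming UDF could wrap. Everything named here is an assumption for illustration: the class `AddressGeocoder`, the endpoint `https://geocoder.example.com/v1/resolve`, and the `{"lat": ..., "lon": ...}` response shape are hypothetical, and the sketch does not show the platform's UDF interface or how an external source supplies the URL and credentials. It uses only the JDK (`java.net.http`, Java 16+ for records). The HTTP call is passed in as a function so the logic can be exercised without network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-record geocoding step; not a platform API. */
final class AddressGeocoder {
    record GeoPoint(double lat, double lon) {}

    // Hypothetical third-party endpoint; replace with the real service URL.
    private static final String BASE_URL = "https://geocoder.example.com/v1/resolve?q=";

    // Assumed response shape: {"lat": 48.8566, "lon": 2.3522}
    private static final Pattern LAT = Pattern.compile("\"lat\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");
    private static final Pattern LON = Pattern.compile("\"lon\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");

    private final Function<String, String> fetch;            // URL -> response body
    private final Map<String, GeoPoint> cache = new ConcurrentHashMap<>(); // unbounded, sketch only

    AddressGeocoder(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    /** Builds a geocoder backed by a real HTTP client with short timeouts. */
    static AddressGeocoder overHttp() {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        return new AddressGeocoder(url -> {
            try {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                        .timeout(Duration.ofSeconds(5))
                        .GET()
                        .build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                return response.statusCode() == 200 ? response.body() : "";
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                }
                return ""; // treat any failure as "no result" so the stream keeps moving
            }
        });
    }

    /** Resolves one address; returns empty if the input is blank or the lookup fails. */
    Optional<GeoPoint> resolve(String address) {
        if (address == null || address.isBlank()) {
            return Optional.empty();
        }
        String key = address.trim();
        GeoPoint point = cache.get(key);
        if (point == null) {
            point = lookup(key);
            if (point != null) {
                cache.put(key, point); // only successful lookups are cached
            }
        }
        return Optional.ofNullable(point);
    }

    private GeoPoint lookup(String address) {
        String body = fetch.apply(BASE_URL + URLEncoder.encode(address, StandardCharsets.UTF_8));
        if (body == null) {
            return null;
        }
        Matcher lat = LAT.matcher(body);
        Matcher lon = LON.matcher(body);
        if (!lat.find() || !lon.find()) {
            return null;
        }
        return new GeoPoint(Double.parseDouble(lat.group(1)), Double.parseDouble(lon.group(1)));
    }
}
```

Inside the UDF you would create one `AddressGeocoder.overHttp()` instance and call `resolve(address)` for each incoming record. The short timeouts and the cache of successful lookups are design choices aimed at the sub-minute latency target; tune or drop them based on the third-party service's limits.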