I have a somewhat convoluted question about streaming data being fed through a data transformation in a code repository, and the latency of that data. I will lay out my current prototype project structure and then follow up with my questions.

I am putting a practice project together that ingests MQTT data from a broker into a stream, which in turn is the input for a data transformation in a code repository, which ultimately outputs a clean parsed dataset. (In the transformation, I have the input set to the main RID for the stream.) This cleaned dataset is the backing dataset for a telemetry device object in the ontology. I do not want to use the form-based Pipeline Builder, but in the case of the stream, Pipeline Builder gives you the option of running a streaming pipeline so that you get live or near-live transformation of the input stream. I am looking for roughly minute-level latency on my data, so I will be scheduling my data transformation to build once a minute.

The problem I seem to be encountering is that the latency from the stream to the data transformation is much higher than that, as if the transform is reading the stream's archival dataset rather than the live stream data.
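For context, my transform is set up along these lines; the RID, output path, and column names below are placeholders and the parsing is simplified:

```python
from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Practice Project/datasets/parsed_telemetry"),  # clean parsed dataset (placeholder path)
    raw=Input("ri.foundry.main.dataset.<stream-rid>"),      # main RID of the stream (placeholder)
)
def compute(raw):
    # Parse the raw MQTT payload into typed columns for the clean dataset
    return (
        raw
        .withColumn("payload", F.col("value").cast("string"))
        .withColumn("received_at", F.current_timestamp())
    )
```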
- Is this the correct way to go about creating a "streaming-ish" pipeline using a code repository as opposed to Pipeline Builder?
- What is the expected latency between a stream and the data transform? Should I be using the "RID" or the "View RID" as the input for the data transformation?
- If I want better latency than building the dataset once per minute, how do I go about creating a live streaming pipeline with a repository, rather than Pipeline Builder?
- Also, is there documentation you can point me towards on how to process only the latest streaming data and append it to the dataset, so that I am not reprocessing all past entries every time the build runs? (A rough sketch of what I mean is below.)
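For reference, what I am imagining is something along the lines of the incremental transforms decorator, so that each scheduled build only reads the rows added since the previous build and appends them, rather than snapshotting everything. I have not confirmed how this behaves with a stream as the input, so treat this as a sketch of the intent (RIDs, paths, and columns are placeholders):

```python
from pyspark.sql import functions as F
from transforms.api import transform, incremental, Input, Output


@incremental()  # on incremental runs, the input should only expose rows added since the last build
@transform(
    out=Output("/Practice Project/datasets/parsed_telemetry"),  # placeholder path
    raw=Input("ri.foundry.main.dataset.<stream-rid>"),          # placeholder stream RID
)
def compute(raw, out):
    new_rows = raw.dataframe()  # just the unprocessed rows when running incrementally
    parsed = new_rows.withColumn("payload", F.col("value").cast("string"))
    out.write_dataframe(parsed)  # should append to the existing output rather than rewriting it
```

Is that the right mechanism here, or is there a different recommended pattern when the input is a stream?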
I hope this is relatively understandable, and I appreciate all of the community's help in advance.