Do streaming pipelines support sliding windows based on time instead of event count?

Currently, sliding windows in a streaming pipeline assign elements to fixed windows based on a static count. How can they configured to assign elements to overlapping windows based on a period of time, for example, 10 minutes?

1 Like

Currently there is no built-in window type that orchestrates a sliding event time window, where rows expire one at a time as the watermark (operator’s sense of the current event time) passes the row’s event time plus the 10 minute expiry.

You may be able to achieve the desired behavior using a session window. By setting a session gap equal to your desired retention length 10 minutes, you are guaranteed each session window contains at least all the rows for each key that occurred within the last 10 minutes of event time, and possibly some rows older than your desired retention. Then you can filter / modify the aggregated output to only contain the parts that occurred within the last 10 minutes of event time.

This solution involves storing more state than you require, which can have performance implications. It is important to keep state size in mind, especially if you anticipate rows with the same key will rarely have a gap larger than 10 minutes, in which case session windows will rarely close and will continue building larger and larger state.

2 Likes