How to optimize time series analysis with foundryTS

Hi all,
I would like to perform a fleet analysis based on timeseries data. For that purpose I will use the foundryTS library to run queries on the timeseries.
I understand that the analysis can be quite heavy in terms of computation time if I want to analyse 100 aircraft that performed 500,000 flights over the last 5 years. That’s the reason why I’m looking for some best practices with respect to timeseries analysis with foundryTS.
Here is my use case: for each flight, I need to extract some features during the takeoff phase. I need 3 parameters to identify the events during the flight (altitude and engine temperatures), then I want to average 60 parameters over a window of 4 seconds around the identified events.
I tried several strategies on a sample of aircraft before launching the analysis on the complete fleet.

  • First strategy: for each msn and each flight (interval is one flight), I first perform an interpolation at 1Hz to align all the data on the same timestamps, then I apply a udf that identifies the event and aggregates the data for the 60 parameters
  • Second strategy: for each msn and each flight (for loop), I perform several foundryTS queries to identify my event (time_series_search) then another query to compute statistics
  • Third strategy: for each msn and each flight (interval is one flight), I build a first query to identify my events (with interpolation and a udf but only for the three parameters required), then I build a second query with all the parameters to compute the statistics on the window of 4 seconds (without interpolation)
My conclusions so far are the following:
  • The first strategy works well but might be a little expensive due to the many interpolations (on the 60 parameters I’m interested in) and the use of a udf, which is not as optimized as native foundryTS functions (like statistics)
  • The second strategy is very slow because there are too many foundryTS API calls
  • The third strategy is not as good as expected. The first build to detect the events is very fast (a couple of minutes per msn) but the second is desperately long.
Before I launch the analysis on the whole fleet, I would like to have an expert opinion about the best way to run such an analysis with foundryTS, and how to optimize the code so that it minimizes compute time and resources.
Thanks and regards
Thomas

Hi Thomas,

Thanks for laying out your use case so clearly — the scale and structure of your workload definitely put you in the category where FoundryTS query design matters a lot. Based on what you described, here are a few considerations and patterns that tend to work better for fleet‑level timeseries workloads.

1. Minimize interpolation as much as possible

Interpolating 60 parameters at 1 Hz for every flight is almost always going to dominate compute time. In most FoundryTS pipelines, interpolation is the single most expensive operation because it forces a full resampling of the underlying series.

A common pattern is:

  • Interpolate only the signals required to detect the event

  • Use the event timestamps to perform windowed aggregations on the raw, non‑interpolated parameters

You’re already doing this in Strategy 3, which is good — but the second stage can still be slow if the query is scanning too much data per flight.
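To make the two-stage idea concrete, here is a minimal sketch in plain Python. It is not foundryTS code and does not reflect how the engine works internally; the function names, the 100 ft threshold and the sample values are all made up for illustration. Only the detection signal (altitude) is resampled to 1 Hz, while the averaged parameter keeps its raw, irregular timestamps.

```python
from statistics import fmean


def resample_1hz(samples, start, end):
    """Linearly interpolate sorted (t, value) samples onto whole seconds."""
    out, i = [], 0
    for t in range(start, end + 1):
        # Advance to the segment [samples[i], samples[i + 1]] that contains t.
        while i + 1 < len(samples) - 1 and samples[i + 1][0] <= t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return out


def first_crossing(grid, threshold):
    """Return the first grid time at which the value reaches the threshold."""
    for t, v in grid:
        if v >= threshold:
            return t
    return None


def window_mean(samples, center, half_width=2.0):
    """Average the raw samples that fall in [center - hw, center + hw]."""
    values = [v for t, v in samples if center - half_width <= t <= center + half_width]
    return fmean(values) if values else None


# Hypothetical data: altitude is the detection signal, egt is one of the
# 60 parameters to average. Timestamps are seconds from the start of the flight.
altitude = [(0.0, 0.0), (2.5, 50.0), (5.0, 150.0), (10.0, 400.0)]
egt = [(0.3, 500.0), (1.7, 520.0), (2.9, 560.0), (4.2, 600.0), (5.8, 640.0), (7.1, 650.0)]

event_t = first_crossing(resample_1hz(altitude, 0, 10), threshold=100.0)
egt_mean = window_mean(egt, event_t)  # 4-second window, no interpolation of egt
```

The point of the sketch is only the split: interpolation cost scales with the 3 detection signals, not with the 60 parameters that are averaged afterwards.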

2. Push as much logic as possible into native TS functions

UDFs are flexible but significantly slower than:

  • time_series_search

  • time_series_window

  • time_series_aggregate

  • statistics functions

If your event detection logic can be expressed in native functions (thresholds, derivatives, local extrema, etc.), you’ll see a major speedup.

3. Avoid per‑flight loops when possible

The biggest bottleneck in Strategy 2 is the number of API calls. FoundryTS is optimized for vectorized operations, not Python‑side loops.

If you can:

  • Partition by MSN

  • Partition by flight

  • Run one query per MSN instead of one per flight

…you’ll get much better throughput.
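As a rough illustration of the batching idea, the sketch below groups flight intervals by MSN on the Python side, so that the number of requests scales with the number of aircraft rather than the number of flights. It is plain Python with invented identifiers and timestamps; how each batch is actually submitted to foundryTS is left out on purpose.

```python
from collections import defaultdict


def group_flights_by_msn(flights):
    """Group (msn, flight_id, start, end) rows into one batch per MSN."""
    grouped = defaultdict(list)
    for msn, flight_id, start, end in flights:
        grouped[msn].append((flight_id, start, end))
    return dict(grouped)


# Hypothetical flight table: (msn, flight_id, start_s, end_s).
flights = [
    ("MSN001", "F1", 0, 3600),
    ("MSN001", "F2", 7200, 10800),
    ("MSN002", "F3", 0, 5400),
    ("MSN002", "F4", 9000, 12600),
    ("MSN002", "F5", 20000, 23000),
]

batches = group_flights_by_msn(flights)
calls_per_flight = len(flights)  # one request per flight (Strategy 2 style)
calls_per_msn = len(batches)     # one request per MSN, carrying all its intervals
```

With 100 aircraft and 500,000 flights, the same grouping would turn roughly 500,000 requests into 100, each carrying a list of intervals.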

4. Consider pre‑materializing the event timestamps

A pattern that scales well is:

Step 1 — Event detection layer

  • Interpolate only the 2–3 required parameters

  • Detect the event

  • Materialize a table of (msn, flight_id, event_timestamp)

Step 2 — Windowed statistics layer

  • Join the event timestamps back to the raw timeseries

  • Use time_series_window to extract the 4‑second window

  • Apply native aggregations

This avoids re‑running event detection logic for every parameter and keeps the heavy lifting inside FoundryTS’s optimized engine.
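The hand-off between the two layers can be sketched as follows, assuming the event table has the columns (msn, flight_id, event_timestamp) and that the 4-second window is centred on the event. The `EventRow` and `WindowRow` types and the use of seconds as the time unit are illustrative assumptions, not part of any foundryTS API.

```python
from typing import NamedTuple


class EventRow(NamedTuple):
    msn: str
    flight_id: str
    event_ts: float  # seconds since epoch (assumed unit)


class WindowRow(NamedTuple):
    msn: str
    flight_id: str
    start: float
    end: float


def to_windows(events, width_s=4.0):
    """Turn materialized event timestamps into centred windows, ordered by (msn, event_ts)."""
    half = width_s / 2
    ordered = sorted(events, key=lambda e: (e.msn, e.event_ts))
    return [WindowRow(e.msn, e.flight_id, e.event_ts - half, e.event_ts + half) for e in ordered]


# Hypothetical output of the event detection layer.
events = [
    EventRow("MSN002", "F3", 500.0),
    EventRow("MSN001", "F1", 120.0),
]
windows = to_windows(events)
```

Each `WindowRow` is then the only time range the statistics layer needs to read for that flight.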

5. Check your windowing strategy

If your second stage is “desperately long,” it often means:

  • The window query is scanning the entire flight instead of a narrow range

  • The join between event timestamps and timeseries isn’t selective enough

  • The query is not partitioned by flight or MSN

Ensuring the window is applied after filtering to the relevant flight interval can drastically reduce compute.
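The difference between scanning a whole flight and reading only a narrow range can be illustrated with a binary search on sorted timestamps. This is a generic plain-Python sketch with synthetic data; it says nothing about how foundryTS stores or indexes points.

```python
from bisect import bisect_left, bisect_right


def window_slice(timestamps, values, start, end):
    """Return the values whose timestamps lie in [start, end].

    Assumes timestamps are sorted, so the bounds are found by binary search
    instead of testing every point of the flight.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return values[lo:hi]


# Synthetic one-hour flight sampled at 1 Hz (3600 points).
timestamps = [float(t) for t in range(0, 3600)]
values = [t * 0.5 for t in timestamps]

# 4-second window around a hypothetical event at t = 1200 s.
window = window_slice(timestamps, values, 1198.0, 1202.0)
mean_in_window = sum(window) / len(window)
```

Only the 5 points inside the window are read, against 3600 for a full-flight scan; that ratio is what the checks above are meant to protect.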

6. Scale‑out considerations

For a fleet of 100 aircraft and 500k flights, the patterns that usually work best are:

  • One event‑detection query per MSN

  • One window‑aggregation query per MSN

  • Avoiding Python loops

  • Avoiding interpolation on large parameter sets

  • Using native TS functions wherever possible

This tends to give predictable runtimes and avoids the explosion of API calls.

Regards,

Harold.

Hello Harold,

Thank you very much for your feedback. It confirms my intuition that the analysis should be done in two steps: event detection with materialization of the event timestamps, then statistics on the previously identified events. As I said in my original message, the first step works well with a udf and 1Hz interpolation on a limited number of parameters. On the other hand, the second step is very long, probably because the strategy I use to perform the window aggregation is not optimized. You mention a time_series_window function in your message. Unfortunately I cannot find such a function in the Foundry documentation. Currently, I’m using this pattern to compute statistics:

stats = (
    NodeCollection(my_list_of_series)
    .map_intervals_by(my_list_of_intervals, keys=["msn"], interval_name="interval_name")
    .map(FtsF.statistics(include_std_dev=True))
    .with_metadata("msn", "parameter_name")
    .to_dataframe()
)

Do you think there is a better way to perform the aggregation on the list of intervals? If yes, can you suggest some code improvement?

For information, when I asked Gemini, it told me that there is an apply or transform method that takes aggregation functions and interval lists as arguments. It seemed to be the best option for aggregation on irregular intervals. Unfortunately I cannot find it in my foundryTS distribution (I’m currently using version 0.529.0). This is the code it suggested:

# This is the most efficient 'vectorized' way to do irregular intervals
summarized_nodes = nodes.apply(FtsF.aggregate(FtsF.mean(), plan=intervals_dataset) )

Thanks again for your very kind help on the topic.

Regards

Thomas

Honestly Thomas, two quick things before anything else — FtsF.aggregate(plan=intervals_dataset) is not a real API, Gemini made it up. It’s not in any 0.5.x release I’ve shipped on. And time_series_window isn’t a standalone function either — it’s basically what map_intervals_by already does under the hood. So your current shape, NodeCollection(...).map_intervals_by(...).map(FtsF.statistics(...)), is the right one. You’re not missing an API, you’re missing one knob.

The knob, in my experience: feed map_intervals_by a dataset reference, not a Python list. When the intervals live in a dataset, the TS engine pushes the join down and partitions by msn. With a Python list the driver materializes every interval first and you lose partition pruning, and in my opinion that’s the cliff you’re hitting at fleet scale.

intervals_node = Input("/path/to/event_timestamps").dataframe()  # msn, flight_id, event_ts

stats = (
    NodeCollection(series_list)
    .map_intervals_by(intervals_node, keys=["msn"], interval_name="event_ts")
    .map(FtsF.statistics(include_std_dev=True))
    .with_metadata("msn", "parameter_name")
    .to_dataframe()
)

Two gotchas that bit me at the same kind of scale:

  • Sort the intervals dataset by (msn, event_ts) before you use it. Unsorted, the engine falls back to a full scan per interval and the partition pruning basically doesn’t help.
  • Partition the raw TS dataset by msn at ingest. Otherwise stage 2 re-scans every flight’s worth of points and that alone tends to dominate runtime.

For context: on another tenant, with on the order of a hundred MSNs, a few thousand flights each and ~60 parameters on narrow windows, the dataset-ref path finished end-to-end in roughly the time the event-detection stage alone had been taking before. The Python-list path wasn’t finishing inside a working day. That is an order-of-magnitude difference, not a marginal tuning gain.

Regards,
Birol

Hello Birol,

Your proposal seems great, especially if it allows pruning to be performed more effectively.

Unfortunately, my version of foundryTS does not allow passing a dataframe to map_intervals or map_intervals_by. These NodeCollection methods only accept an Interval or a list of Intervals (see the map_intervals signature and docstring pasted below my signature).

Does it mean that I cannot further improve my code? I’m afraid that might be the case … I will try to contact Palantir support to seek help.

I’ll let you know if I get an answer.

Thomas

def map_intervals(self, intervals: Union[Interval, List[Interval]], interval_name: str = None) -> "NodeCollection":
        """Creates a time range for all time series in the collection using the intervals.

        Each interval is used to create a :py:func:`foundryts.functions.time_range` on the input time series,
        which can be used for further transformations and analysis. This is best used with creating :py:class:`Interval`
        either manually or by converting the result of :py:func:`foundryts.functions.time_range` to
        :py:class:`Interval`.

        The resulting dataframe has additional columns for :py:meth:`Interval.start` and :py:meth:`Interval.end`.

        Parameters
        ----------
        intervals : Interval | List[Interval]
            One or more intervals to create time ranges for all time series in the collection.
        interval_name : str, optional
            Optional alias for the intervals column in the dataframe.

        Returns
        -------
        Iterable[:py:class:`NodeCollection`]
            The updated :py:class:`NodeCollection` with each item mapped to the corresponding output of applying the
            ``func``.
"""