Zero-shot TimesFM on Foundry: Model Adapter + Live Inference pattern — and three questions

Shipped this last week and figured I’d share the shape, because honestly I couldn’t find anyone on here who had done TimesFM on Foundry yet and I’d rather compare notes than keep guessing alone. If someone has a cleaner version of this I’d genuinely like to hear it.

One note before the code — yes, Model Studio already has a native time-series forecasting trainer (AutoGluon under the hood, ensembles AutoETS, DeepAR, PatchTST, TFT, TiDE, Theta). If you have labeled history per series, start there. This post is for the neighbouring case: cold-start series, brand-new sources, no labeled history yet — where supervised per-series training doesn’t really apply and a zero-shot foundation model can carry you until you’ve accumulated enough training data to go back to Model Studio.

The shape, short version: checkpoint as a Foundry dataset → Model Adapter → Live Deployment → named Function → Workshop widget. No fine-tuning. In my opinion TimesFM is good enough zero-shot for daily-granularity forecasting at the kind of scale I was handing it.

On air-gapped tenants you cannot load the checkpoint directly from google/timesfm-1.0-200m-pytorch on HuggingFace — no egress. The sanctioned path is to pull the weights externally, ingest them as a schema-less dataset, and copy them into the driver inside the adapter. That way the adapter has no outbound network dependency at live-deployment time:

import palantir_models as pm
from timesfm import TimesFm, TimesFmHparams
from huggingface_adapters.utils import copy_model_to_driver

class TimesFMAdapter(pm.ModelAdapter):
    @pm.auto_serialize(
        weights=pm.ModelInput.Dataset("timesfm_1_0_200m_pytorch"),
    )
    def __init__(self, weights, hparams: TimesFmHparams):
        local_ckpt = copy_model_to_driver(weights)
        self._model = TimesFm(hparams=hparams)
        self._model.load_from_checkpoint(local_ckpt)

    @pm.api(input=pm.Pandas(columns=["series_id", "ds", "y"]),
            output=pm.Pandas(columns=["series_id", "ds", "yhat"]))
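    def predict(self, df, *, horizon: int = 28, freq: int = 0):
        # freq is TimesFM's frequency bucket: 0 = high frequency (up to daily),
        # 1 = weekly/monthly, 2 = quarterly/yearly. Daily series belong in 0.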
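        # Sort each series by date so the context is in time order.
        groups = [g.sort_values("ds") for _, g in df.groupby("series_id")]
        # As far as I know, forecast() in timesfm 1.x takes no horizon argument
        # (the horizon comes from hparams.horizon_len), so slice afterwards.
        forecasts, _ = self._model.forecast(
            inputs=[g["y"].tolist() for g in groups],
            freq=[freq] * len(groups),
        )
        forecasts = forecasts[:, :horizon]  # assumes hparams.horizon_len >= horizon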
        # flatten to the output schema; elided for brevity
        ...
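
For anyone who wants the flatten step spelled out, here is a minimal sketch of what it could look like. It is an illustration under assumptions, not the code from the adapter above: `flatten_forecasts` is a hypothetical helper, it assumes daily spacing, and it assumes the forecast rows come back in the same order as the series ids. It uses only the standard library so it runs anywhere.

```python
from datetime import date, timedelta


def flatten_forecasts(series_ids, last_dates, point_forecasts, horizon):
    """Turn per-series forecast rows into (series_id, ds, yhat) records.

    Element i of each argument must belong to the same series.
    Daily spacing is assumed (hypothetical helper, not a Foundry API).
    """
    if not (len(series_ids) == len(last_dates) == len(point_forecasts)):
        raise ValueError("series_ids, last_dates and point_forecasts must align")
    rows = []
    for sid, last_ds, values in zip(series_ids, last_dates, point_forecasts):
        # Keep only the requested horizon; the model may return more steps.
        for step, yhat in enumerate(list(values)[:horizon], start=1):
            rows.append({
                "series_id": sid,
                "ds": last_ds + timedelta(days=step),
                "yhat": float(yhat),
            })
    return rows


rows = flatten_forecasts(
    series_ids=["a", "b"],
    last_dates=[date(2024, 1, 31), date(2024, 2, 28)],
    point_forecasts=[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    horizon=2,
)
```

Passing `rows` to `pandas.DataFrame` gives a frame with the three output columns.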

The publishing transform on top is a small ModelOutput write, and the named Function wrapper is a few lines — takes (series_id, horizon) so Workshop or an AIP Agent can call it without knowing the adapter is under there.
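
The wrapper idea, sketched in plain Python rather than in any Foundry Functions syntax (which I am deliberately not reproducing here). `forecast_series`, `load_history`, `predict` and the `max_horizon` cap of 128 are all hypothetical names and values for illustration: the caller only ever sees (series_id, horizon), and the adapter stays hidden behind two injected callables.

```python
def forecast_series(series_id, horizon, load_history, predict, max_horizon=128):
    """Forecast one series by id, hiding the adapter behind two callables.

    load_history(series_id) -> iterable of (ds, y) pairs            (hypothetical)
    predict(rows, horizon)  -> list of {"series_id", "ds", "yhat"}  (hypothetical)
    """
    if not isinstance(horizon, int) or not 1 <= horizon <= max_horizon:
        raise ValueError(f"horizon must be an int in [1, {max_horizon}]")
    # Sort by timestamp so the model always sees the context in time order.
    history = sorted(load_history(series_id), key=lambda pair: pair[0])
    if not history:
        raise LookupError(f"no history for series {series_id!r}")
    rows = [{"series_id": series_id, "ds": ds, "y": y} for ds, y in history]
    return predict(rows, horizon)


# Stand-ins so the sketch runs without a deployment behind it.
def fake_history(series_id):
    return [(2, 5.0), (1, 4.0)]  # deliberately out of order


def fake_predict(rows, horizon):
    last = rows[-1]
    return [
        {"series_id": last["series_id"], "ds": last["ds"] + h, "yhat": last["y"]}
        for h in range(1, horizon + 1)
    ]


result = forecast_series("sensor-1", 2, fake_history, fake_predict)
```

Keeping validation in the wrapper means Workshop or an agent gets a clear error for a bad horizon instead of a failure from inside the model call.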

For rough scale — on a modest CPU profile (single-digit vCPUs, 16 GB memory, on-prem), a single 512-point daily series at horizon 28 comes back in the low-seconds range, and cold start lands somewhere in the low double-digits of seconds. GPU profile I haven’t tried yet, which is actually my first question back.
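
If anyone wants to compare numbers on the same footing, here is a minimal sketch of how first-call and warm latency could be timed from the client side. `measure_latency` and `call` are placeholders: `call` stands for whatever invokes the deployment, and nothing here is Foundry-specific. The first call only reflects a true cold start if the deployment was idle or freshly started beforehand.

```python
import statistics
import time


def measure_latency(call, warm_runs=5):
    """Return (first_call_seconds, median_warm_seconds) for a zero-arg callable."""
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")
    start = time.perf_counter()
    call()
    first = time.perf_counter() - start

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        call()
        warm.append(time.perf_counter() - start)
    # Median rather than mean, so one slow outlier does not dominate.
    return first, statistics.median(warm)


# Stand-in workload; replace with the real request to the deployment.
first_s, warm_s = measure_latency(lambda: sum(range(10_000)), warm_runs=5)
```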

So, three things I’d really like the community’s take on:

  1. GPU profile: has anyone published TimesFM on a GPU live deployment? What cold-start vs. warm-latency shape are you seeing there? I suspect the GPU makes a big difference for this model, but I haven't seen anyone who has actually measured it.
  2. Chaining adapters: is there a clean way to chain a TimesFMAdapter with, say, a downstream anomaly-scoring adapter inside a single Live Deployment? Or is the accepted idiom two deployments wired through a Function?
  3. When zero-shot stops being enough: on long-tail or highly seasonal series, TimesFM drifts a bit for me. At what point would you cut over from zero-shot TimesFM to a supervised Model Studio run, assuming you've accumulated enough labeled history by then? I'd like to hear where people draw that line in practice.
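
To make question 3 concrete, one candidate rule (a sketch I have not validated, not a recommendation) is to backtest the zero-shot forecasts against a seasonal-naive baseline using MASE, and cut over once there is enough history and MASE stays above 1. The function names, the weekly season of 7, the `min_seasons` default and the threshold are all placeholder assumptions.

```python
def mase(history, actuals, forecasts, season=7):
    """Mean absolute scaled error against an in-sample seasonal-naive baseline."""
    if len(history) <= season:
        raise ValueError("history must be longer than one season")
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must be non-empty and aligned")
    naive_errors = [
        abs(history[i] - history[i - season]) for i in range(season, len(history))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal-naive error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
    return mae / scale


def should_cut_over(history, actuals, forecasts, season=7, min_seasons=8,
                    threshold=1.0):
    """True when history is long enough and zero-shot loses to seasonal naive."""
    enough_history = len(history) >= min_seasons * season
    return enough_history and mase(history, actuals, forecasts, season) > threshold


# Toy numbers with a season of 2, purely to exercise the arithmetic.
history = [1, 3, 2, 4, 3, 5]
score = mase(history, actuals=[4, 6], forecasts=[3, 8], season=2)
decision = should_cut_over(history, [4, 6], [3, 8], season=2, min_seasons=3)
```

A MASE above 1 means the forecast did worse on the holdout than the seasonal-naive method did in-sample, which is why 1.0 is the natural starting threshold.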

Happy to paste the full adapter + publish transform if it’d help anyone.

Regards,
Birol
