Best practices for managing thousands of entity-specific models in Palantir Foundry

I’m training machine learning models segmented by an entity key (id).
At the moment the scale is manageable, but the long-term plan would result in thousands of entity-specific models, and potentially more.

My current approach involves:

  • Training models per id

  • Using MLflow for experiment tracking

  • Publishing models via palantir_models (ModelOutput.publish)

  • Running this logic inside a Spark groupBy().applyInPandas() workflow
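For context, here is a simplified, Foundry-agnostic sketch of the per-id training step. The data, the `train_one` helper, and the least-squares "model" are all illustrative stand-ins; in the real pipeline this body runs inside `groupBy("id").applyInPandas(...)` and the fitted model goes to `palantir_models` rather than a dict:

```python
import numpy as np
import pandas as pd

# Toy data: one small series per entity id (illustrative only).
df = pd.DataFrame({
    "id": ["a"] * 5 + ["b"] * 5,
    "x": list(range(5)) * 2,
    "y": [1, 2, 3, 4, 5, 2, 4, 6, 8, 10],
})

def train_one(group: pd.DataFrame) -> np.ndarray:
    # Stand-in for the real per-entity model: a 1-D least-squares fit.
    return np.polyfit(group["x"], group["y"], deg=1)

# In Spark this grouping runs via df.groupBy("id").applyInPandas(...);
# a plain pandas groupby loop keeps the example self-contained here.
models = {entity_id: train_one(g) for entity_id, g in df.groupby("id")}
```

The key point is that each group is fitted completely independently, which is what pushes the model count toward one published model per id.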

During implementation, I’ve encountered several limitations:

  • ModelOutput.create_experiment() and publish() must run on the driver, not on the executors

  • A transform preview may succeed while the full build fails due to transactional or API constraints
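The workaround I have been experimenting with is to serialize each fitted model on the executors and do every publish call in a driver-side loop. A minimal sketch of that pattern, with `publish_on_driver` as a hypothetical stand-in for the `ModelOutput.publish` call (which, per the limitation above, only works on the driver):

```python
import pickle
import numpy as np

# Executor side (inside applyInPandas in the real job): train, then
# return the fitted model as bytes instead of publishing directly.
def train_and_serialize(xs, ys):
    coeffs = np.polyfit(xs, ys, deg=1)  # stand-in model
    return pickle.dumps(coeffs)

# Driver side: collect the serialized artifacts, then publish in a loop.
# publish_on_driver is hypothetical; in Foundry this would wrap the
# ModelOutput.publish call, which must run on the driver.
published = {}

def publish_on_driver(entity_id, blob):
    published[entity_id] = pickle.loads(blob)

artifacts = {
    "a": train_and_serialize([0, 1, 2], [1, 2, 3]),
    "b": train_and_serialize([0, 1, 2], [0, 2, 4]),
}
for entity_id, blob in artifacts.items():
    publish_on_driver(entity_id, blob)
```

This keeps executor code free of Foundry publish calls, but it still produces one published model per id, which is exactly the scaling question below.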

  1. What is the practical upper bound (order of magnitude) for independently published models in Foundry that you’ve seen work reliably?
  2. Are there recommended architectural patterns for handling thousands of heterogeneous entities without publishing thousands of models?
  3. Are there Foundry-native patterns or references that help decide when to split entities into separate models vs. keeping them unified?

Thank you!
