I’m training machine learning models segmented by an entity key (id).
At the moment the scale is manageable, but the long-term plan would result in thousands of entity-specific models, and potentially more.
My current approach involves:

- Training models per `id`
- Using MLflow for experiment tracking
- Publishing models via `palantir_models` (`ModelOutput.publish`)
- Running this logic inside a Spark `groupBy().applyInPandas()` workflow
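For context, a simplified sketch of the per-`id` training step (the function name, columns, and the toy `np.polyfit` model are illustrative stand-ins, not my actual code):

```python
import numpy as np
import pandas as pd

# Columns of the per-entity result each group returns (illustrative).
RESULT_COLUMNS = ["id", "coef", "intercept", "n_rows"]

def train_one_entity(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit a toy linear model y ~ x for a single entity's rows.

    In the real pipeline this runs inside
    df.groupBy("id").applyInPandas(train_one_entity, schema=...),
    so it receives all rows for one id as a pandas DataFrame.
    """
    coef, intercept = np.polyfit(pdf["x"], pdf["y"], deg=1)
    return pd.DataFrame(
        [[pdf["id"].iloc[0], float(coef), float(intercept), len(pdf)]],
        columns=RESULT_COLUMNS,
    )

# Spark wiring (requires an active SparkSession; shown for context only):
# result = df.groupBy("id").applyInPandas(
#     train_one_entity,
#     schema="id string, coef double, intercept double, n_rows long",
# )
```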
During implementation, I’ve encountered several limitations:

- `ModelOutput.create_experiment()` and `publish()` must run on the driver, not on executors
- Preview may succeed, but builds fail due to transactional or API constraints
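The workaround I’ve been considering for the driver-only constraint is to train on executors, serialize each fitted model into a binary column, collect the results, and publish sequentially on the driver. A rough sketch, where a plain dict stands in for a real fitted model and the actual `palantir_models` publish call is left as a commented placeholder:

```python
import pickle

import pandas as pd

def train_and_serialize(pdf: pd.DataFrame) -> pd.DataFrame:
    """Executor-side: fit a per-id model and return it as bytes.

    A dict of summary statistics stands in for a real fitted model here;
    in practice this would run inside groupBy("id").applyInPandas(...).
    """
    model = {"id": pdf["id"].iloc[0], "mean_y": float(pdf["y"].mean())}
    return pd.DataFrame(
        {"id": [model["id"]], "model_bytes": [pickle.dumps(model)]}
    )

def publish_all(collected: pd.DataFrame) -> list:
    """Driver-side: deserialize each model and publish one at a time.

    Since publish() is driver-only, this loop must live on the driver,
    after collecting the serialized models, not inside applyInPandas.
    """
    published = []
    for row in collected.itertuples():
        model = pickle.loads(row.model_bytes)
        # model_output.publish(...)  # driver-only API call would go here
        published.append(model["id"])
    return published
```

This keeps the expensive training distributed while confining the publish loop to the driver, but it serializes the publish step, which is exactly why I’m asking about practical upper bounds below.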
- What is the practical upper bound (order of magnitude) for independently published models in Foundry that you’ve seen work reliably?
- Are there recommended architectural patterns for handling thousands of heterogeneous entities without publishing thousands of models?
- Are there Foundry-native patterns or references that help decide when to split entities into separate models vs. keeping them unified?
Thank you!