Best Practices for Managing Many Datasets in a Marketplace Product

Hi everyone,

I’m facing an issue when trying to publish a Marketplace product that includes a Code Repository and a large number of related Datasets (over 300). I want to package the Code Repository with all its related datasets so that users can easily use the code with the data it was designed for. However, when I select the Code Repository and all associated Datasets for inclusion, I receive an error message stating that the “Marketplace Product exceeds the size limit.”

Has anyone else encountered this issue? If so, what strategies did you use to resolve it?

Specifically, I’m wondering if I’m forced to split the Code Repository and Datasets into separate Marketplace products. If so, what’s the most efficient way to handle this, given the large number of Datasets? Manually splitting them would be very time-consuming and undesirable.

Are there any best practices or alternative approaches for managing large numbers of related Datasets within a Marketplace product? For example:

  • Is there a way to increase the size limit for Marketplace products (even temporarily)?
  • Are there any automated tools or scripts to help split and manage the Datasets?
  • Can I package the datasets in a different way to reduce the overall size?
  • Could I use a manifest file or similar to reference the datasets instead of including them directly?
  • If the Code Repository and Datasets are separate products, what’s the best way to communicate the relationship between them to users?

Any advice would be greatly appreciated!

Thanks in advance.


Hi

Best practice here would be to split up the product. You could do this by packaging the repository twice and selecting a different subset of datasets in each product, but it’d be much simpler to separate the repo out.

Unfortunately, there isn’t an easy way around this other than splitting the product or reducing the number of datasets that are included as content.
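
If you do end up splitting, planning the groups up front can take some of the pain out of doing it by hand. Below is a rough sketch in plain Python that batches a list of dataset names into groups of a fixed maximum size. It does not call any Marketplace API; the dataset names and the `max_per_product` value are placeholders I made up, since the actual limit isn’t stated in this thread and may not be a simple count. You would still select each group when packaging.

```python
from typing import Iterable, List


def chunk_datasets(dataset_ids: Iterable[str], max_per_product: int) -> List[List[str]]:
    """Split dataset identifiers into consecutive groups of at most max_per_product."""
    if max_per_product < 1:
        raise ValueError("max_per_product must be at least 1")
    ids = list(dataset_ids)
    # Step through the list in strides of max_per_product; the last group may be smaller.
    return [ids[i:i + max_per_product] for i in range(0, len(ids), max_per_product)]


# Hypothetical input: 300 placeholder names and an assumed cap of 120 per product.
datasets = [f"dataset_{n:03d}" for n in range(300)]
groups = chunk_datasets(datasets, max_per_product=120)

for index, group in enumerate(groups, start=1):
    print(f"Product {index}: {len(group)} datasets ({group[0]} .. {group[-1]})")
```

Keeping the groups in list order means datasets that sit next to each other (for example, sorted by folder path) stay in the same product, which should make the split easier to explain to users.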

As for communicating the relationship between them, I’m assuming some datasets in one repository are inputs to datasets in another repository. When you split the product, it will likely create a linked product; see the docs here: https://www.palantir.com/docs/foundry/marketplace/linked-products