Hitting Marketplace Limits

Hi everyone,

Has anyone tried to package repos/projects that go way beyond the existing Marketplace limits, or does anyone have experience slicing up their workflows so they're packageable? Currently, we have a goliath workflow that is hitting Marketplace limits beyond what the product is capable of supporting in the short term. Please let me know, and thank you!

Yes! We did face this issue. We ended up splitting it into a data app, pipeline app, ontology app, code repos app, and AIP agents app. Linked products work really well here. Even though they span multiple products, they’re essentially stacked on top of each other.

Oh, how about slicing a Code Repo? Say I have a repo with a lot of datasets (250+) and I want to create two packages of roughly 125 each.

We should segregate the products in such a way that linked dependencies are installed first, followed by the actual products.

I suggest splitting them into Product 1 (with 125 datasets) and Product 2 (with the remaining 125 datasets). Once these two products are installed, they will function as linked products for Product 3 (code repos). During the installation of Product 3, Products 1 and 2 will supply the required inputs.

I agree with that. The problem I'm running into right now is that when packaging Product 3 (the Code Repo) in this scenario, I'm hitting the Marketplace size limitation.

For example, let's assume I'm working in a vacuum: Product 3 (the Code Repo) consists of 300 output datasets and has no further linked products. It also ingests 500 input datasets. I hit the input shape limit if I try to package this entire Code Repo at once. The current input shape limit is 10k and the output shape limit is 5k, where shape is a combination of dataset columns, monitors, schedules, etc.
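For planning a split, a quick back-of-the-envelope check against those limits helps before attempting a package. Below is a minimal sketch of that idea; it assumes shape is simply an additive count of columns, monitors, and schedules per dataset, which is only an approximation of however Marketplace actually computes it, and none of the names here are real Foundry APIs.

```python
from dataclasses import dataclass

# Limits quoted above, as of this writing.
INPUT_SHAPE_LIMIT = 10_000
OUTPUT_SHAPE_LIMIT = 5_000

@dataclass
class DatasetShape:
    """Hypothetical per-dataset tally of shape contributors."""
    name: str
    columns: int
    monitors: int = 0
    schedules: int = 0

    def shape(self) -> int:
        # Simplifying assumption: contributions just add up.
        return self.columns + self.monitors + self.schedules

def fits(inputs, outputs) -> bool:
    """Rough check that one packaging attempt stays under both limits."""
    input_shape = sum(d.shape() for d in inputs)
    output_shape = sum(d.shape() for d in outputs)
    print(f"input shape ~{input_shape}/{INPUT_SHAPE_LIMIT}, "
          f"output shape ~{output_shape}/{OUTPUT_SHAPE_LIMIT}")
    return input_shape <= INPUT_SHAPE_LIMIT and output_shape <= OUTPUT_SHAPE_LIMIT

# 500 inputs and 300 outputs at ~25 columns each blows both budgets,
# which is roughly the situation described above.
inputs = [DatasetShape(f"input_{i}", columns=25) for i in range(500)]
outputs = [DatasetShape(f"output_{i}", columns=25, schedules=1) for i in range(300)]
fits(inputs, outputs)
```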

So then I need to splice Product 3 into two pieces, 3a and 3b, each consisting of 150 of the output datasets and the roughly 250 input datasets associated with them. Each piece now packages successfully.

Curious whether anyone has experience with:
1.) How did you do the initial splicing of the Code Repo? Currently, the only supported approach is to click through individual output datasets and watch DevOps fetch the list of input datasets needed. Can we make this easier? Say, select individual folders inside the Code Repo, which would then add all the output datasets of the transforms within that folder? (One rough way to approximate this outside the UI is sketched after this list.)

2.) Once we splice it, how do we piece it back together? In a perfect world, we splice it successfully and there are no dependencies between the datasets within the same repo. That isn't always true, though.
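On 1.), since the repo itself already knows which transforms live in which folder, one option outside the UI is to scan the source tree and tally the Output(...)/Input(...) dataset paths per folder, then use those lists to drive the package selection. The sketch below is an unofficial take on that; it assumes a transforms-python style repo where dataset paths are passed as string literals, and the source path is just a placeholder. The shared-input report at the end is also a quick way to see where two slices would still depend on the same upstream datasets, which speaks to point 2.).

```python
import ast
from collections import defaultdict
from pathlib import Path

def dataset_refs(py_file: Path):
    """Collect Output(...) / Input(...) dataset paths from one transform file.

    Assumes the common transforms-python style where paths are string
    literals passed directly to Input() and Output().
    """
    outputs, inputs = set(), set()
    tree = ast.parse(py_file.read_text())
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in ("Output", "Input") and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                (outputs if node.func.id == "Output" else inputs).add(arg.value)
    return outputs, inputs

def refs_by_folder(src_root: str):
    """Group output/input dataset paths by the folder their transforms live in."""
    by_folder = defaultdict(lambda: (set(), set()))
    for py_file in Path(src_root).rglob("*.py"):
        outs, ins = dataset_refs(py_file)
        by_folder[str(py_file.parent)][0].update(outs)
        by_folder[str(py_file.parent)][1].update(ins)
    return by_folder

if __name__ == "__main__":
    folders = refs_by_folder("transforms-python/src")  # placeholder repo layout
    for folder, (outs, ins) in sorted(folders.items()):
        print(f"{folder}: {len(outs)} outputs, {len(ins)} inputs")

    # Inputs referenced by more than one folder are the cross-slice
    # dependencies that make re-assembly awkward.
    seen, shared = set(), set()
    for _, ins in folders.values():
        shared |= seen & ins
        seen |= ins
    print(f"{len(shared)} input datasets are shared across folders")
```

From there, grouping folders into candidate packages and running a shape check like the earlier sketch gives a rough sense of whether a slice will fit before trying it in Marketplace.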