One of our key object sets is going to be in the millions of objects and we’re running into the search-around limits in the Ontology. What are the recommended workarounds these days? In the past I would either make hidden helper objects or enrich more properties than strictly necessary on the high-volume objects, so you don’t need to search past them to find things.
Hi @theo_bell,
May I ask how big your expected query result size is?
Our search-around limit should be at 10M objects in the result set, although this is currently still somewhat dependent on query complexity (e.g. multiple large search-arounds in a single query will eventually hit some limitations too).
If your search-around result is expected to exceed 10M objects, then the most direct approach is to reduce the result size by applying some filters.
> In the past I would either make hidden helper objects or enrich more properties than strictly necessary on the high-volume objects, so you don’t need to search past them to find things.
This is certainly a common last resort if you can’t refactor your query such that it succeeds.
Additionally, these limits don’t exist in long-running Spark builds, so it’s possible to materialize an Object Type into a dataset and use Spark as an escape hatch.
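As a minimal sketch of that escape hatch using Foundry’s Python transforms API (the dataset paths and the `region_id` join key are assumptions; point them at your own backing datasets):

```python
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Project/derived/region_transactions"),       # hypothetical output path
    transactions=Input("/Project/backing/transactions"),  # backing dataset of the Transaction object type
    regions=Input("/Project/backing/regions"),            # backing dataset of the Region object type
)
def region_transactions(transactions, regions):
    # A plain Spark join replaces the search-around, so the 10M-object
    # result-set limit never comes into play.
    return transactions.join(regions, on="region_id", how="inner")
```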
There’s an old trick you can use in Workshop, but it’s a bit hacky. Say you have 80 million ‘Transaction’ objects linked to 10 ‘Region’ objects. For each ‘Region’ object you can hard-code a search-around to its ‘Transactions’, and hopefully each individual result set is under 10 million (maybe 8 million, for example). Then union all 10 Region-to-Transactions sets back together to effectively get a search-around from ‘Region’ to ‘Transactions’ that covers more than 10 million objects.
This trick also works with the Ontology API, and even better, there you can iterate through all objects in set A rather than hard-coding them. I haven’t tried this trick in TypeScript, but I think it would work the same there too.
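A rough sketch of that iteration in Python. The `list_objects` and `list_linked_objects` helpers are hypothetical stand-ins for whichever list-objects / list-linked-objects calls your Ontology API client actually exposes, and the object/link type names come from the Region/Transaction example above:

```python
from typing import Iterable


def list_objects(object_type: str) -> Iterable[dict]:
    """Hypothetical wrapper around your Ontology API client's list-objects call."""
    raise NotImplementedError("wire this up to your Ontology API client")


def list_linked_objects(object_type: str, primary_key: str, link_type: str) -> Iterable[dict]:
    """Hypothetical wrapper around your client's list-linked-objects call."""
    raise NotImplementedError("wire this up to your Ontology API client")


# Iterate every Region rather than hard-coding them, pull each Region's
# linked Transactions (each result set safely under the 10M limit), and
# union the results client-side.
all_transactions: dict = {}
for region in list_objects("Region"):
    for txn in list_linked_objects("Region", region["primaryKey"], "transactions"):
        # Keying by primary key de-duplicates the union.
        all_transactions[txn["primaryKey"]] = txn

print(f"Collected {len(all_transactions)} Transaction objects")
```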
I guess the design intention is that huge object sets put a huge load on Workshop, for instance, when it has to dynamically render 10 million+ objects and aggregate their average across some variable like date. You called them hidden helper objects, but they can be meaningful divisions of extremely similar-looking objects with different impacts. For instance, you can have 500 million ‘Transactions’, but some are credit card, some are invoice, etc. You could (carefully) create helper object types such as ‘Income’ or ‘Credit Card Transactions’, each under 10 million alone. Also, you don’t have to delete the original ‘Transactions’ set. It’s kind of like masking, where one object can really exist in multiple Object Sets, but each set is intended to be smallish and targeted to a use case.
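As a sketch of that kind of split, again in Python transforms (the paths, the `payment_method` column, and its values are assumptions):

```python
from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F


@transform_df(
    Output("/Project/backing/credit_card_transactions"),  # hypothetical backing dataset for the helper object type
    transactions=Input("/Project/backing/transactions"),
)
def credit_card_transactions(transactions):
    # Same rows as the original 500M-row 'Transactions' dataset, just
    # filtered down to one meaningful slice that stays under the limit.
    return transactions.filter(F.col("payment_method") == "credit_card")
```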
Another approach (loved by some, hated by others) is to dynamically compute summaries along some meaningful dimension. For instance, summarize ‘Transactions’ for each customer each month and resolve them to a ‘Customer Monthly Summary’ object. Keep some fields as arrays or even dicts if you don’t want to lose data. Again, keep ‘Transactions’ somewhere in the background and link one to the other if you like.
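A minimal sketch of that summary, once more in Python transforms; the dataset paths and the `customer_id` / `ts` / `amount` / `transaction_id` columns are assumptions to make it concrete:

```python
from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F


@transform_df(
    Output("/Project/backing/customer_monthly_summary"),  # hypothetical path
    transactions=Input("/Project/backing/transactions"),
)
def customer_monthly_summary(transactions):
    # One row per customer per month; arrays keep the detail around
    # without keeping one object per transaction.
    return transactions.groupBy(
        "customer_id",
        F.date_trunc("month", F.col("ts")).alias("month"),
    ).agg(
        F.count("*").alias("txn_count"),
        F.sum("amount").alias("total_amount"),
        F.collect_list("transaction_id").alias("transaction_ids"),
    )
```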