We are going through cost-optimization activities in some of our applications, and I would like expert opinions on the pros and cons of using search-arounds vs. applying filters.
Background:
We have an object type (let’s call it “main_table” for simplicity) that consolidates a ton of aggregated information for a set of specific hierarchies. We use this main table in a pivot to let users drill into their numbers and “click where they want to do something”. The size per cycle is ~100k objects.
In addition, we have many linked object types that hold the raw values used to build Value A, Value B, etc. They all follow a similar structure: each holds its own individual pk, plus a dimension_key that finds its match in the main table. Their size varies, but can go up into the millions.
Since we have a many-to-many relationship between main_table and all other OTs, we have a link table called “dimensions” in between that holds approx. ~30k dimension keys.
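To make the shape concrete, here is roughly how I picture the three object types in TypeScript; all property names here are made up for illustration, not our real schema:

```typescript
// Rough shape of the data model (all names illustrative)
interface MainTable {
    pk: string;            // ~100k rows per cycle
    dimensionKey: string;  // matches into the dimensions link table
    valueA: number;        // aggregated from the linked OTs
    valueB: number;
}

interface Dimension {
    dimensionKey: string;  // link table, ~30k keys in total
}

interface LinkedOt {
    pk: string;            // individual primary key
    dimensionKey: string;  // always present, matching the main table
    rawValue: number;      // raw input behind Value A / Value B; up to millions of rows
}
```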
Now to my question: how would you design the variable flow from main_table to any other OT to be most cost-efficient? The user flow is the following:
1. Click in the main_table-backed pivot on a set of dimensions they want to analyze or perform an action on.
2. See details from various linked OTs, as per the selected dimensions, in deep-dive sections.
3. Perform their action, which re-calculates the main_table to immediately show the impact of their change.
I see two options (with two sub-options in 2); a sketch of both flows follows the list:
1. Use search-arounds, always doing two hops (main_table → dimensions → any other OT).
2. Get the unique list of selected dimension keys and filter any other OT directly on the dimension key. I would retrieve the list of dimension keys either through:
a. a distinct-values object set aggregation on the per-click filtered object set, or
b. embedding an array variable in the main_table filter variable to read the selected dimension key values directly.
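For illustration, here is a minimal sketch of both flows as Foundry TypeScript Functions. All generated API names in it (mainTable, linkedOt, searchAroundDimensions, searchAroundLinkedOt, hierarchyId, dimensionKey) are assumptions for this example, not our real ontology:

```typescript
import { Function, Filters } from "@foundry/functions-api";
import { Objects, LinkedOt } from "@foundry/ontology-api";

export class DimensionFlows {

    // Option 1: two-hop search-around (main_table -> dimensions -> linked OT).
    // The intermediate sets stay queries; nothing is loaded until .all().
    @Function()
    public viaSearchAround(hierarchyId: string): LinkedOt[] {
        return Objects.search()
            .mainTable()
            .filter(m => m.hierarchyId.exactMatch(hierarchyId)) // the per-click selection
            .searchAroundDimensions()   // hop 1
            .searchAroundLinkedOt()     // hop 2
            .all();
    }

    // Option 2: filter the linked OT directly on the selected dimension keys,
    // resolved upstream via 2a (aggregation) or 2b (embedded array variable).
    @Function()
    public viaKeyFilter(dimensionKeys: string[]): LinkedOt[] {
        return Objects.search()
            .linkedOt()
            .filter(o => Filters.or(
                ...dimensionKeys.map(k => o.dimensionKey.exactMatch(k))
            ))
            .all();
    }
}
```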
Since the users perform this click-and-analyze flow multiple thousand times a day, I am wondering what the community’s thoughts are on the pros and cons.
One concrete question that came to my mind is scalability. Option 2a, retrieving the unique values array via the distinct-values object set aggregation, definitely has a scalability limit of max 10k values. My gut feeling is that it might be pretty costly as well.
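In function terms, I picture 2a roughly like this (same assumed names as in the sketch above; how exactly the 10k limit maps onto this aggregation is part of what I am asking):

```typescript
import { Function } from "@foundry/functions-api";
import { Objects } from "@foundry/ontology-api";

export class DistinctKeys {

    // Option 2a: aggregate the per-click filtered main_table set down to its
    // distinct dimension keys; this is the step with the ~10k scalability limit.
    @Function()
    public selectedDimensionKeys(hierarchyId: string): string[] {
        const byKey = Objects.search()
            .mainTable()
            .filter(m => m.hierarchyId.exactMatch(hierarchyId))
            .groupBy(m => m.dimensionKey.topValues())
            .count();
        return byKey.buckets.map(bucket => bucket.key);
    }
}
```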
But what about 2b, where you embed the array variable within the object set filter variable? Does this generate the list of selected values in a more performant and scalable way?
I’ll definitely check whether there is a scalability limit and run an experiment comparing execution times in the performance profiler. But I would still love to hear thoughts from the experts!
I can’t comment from a cost perspective, but I have to believe that doing the ‘filter for pk to search in the next ontology object as foreign key’ defeats the purpose of the search-around method. I would assume that method has optimizations that are abstracted away; otherwise, why implement search-around in the first place?
That being said, assuming a TypeScript function is doing this for something like Workshop, I have struggled a bit with filtering. On your option #2, from my experience:
I find that with filtering, while it loads fewer objects overall, you still pull the initial objects into memory. With the search-around, that first object set never gets pulled into memory; the ‘final object’ is the only thing that is actually loaded into the function. That makes it “feel more efficient” to me, but this is a totally vibes-based opinion.
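To make the vibe a bit more concrete, here is what I picture, inside a @Function and using the same made-up names from the question:

```typescript
// Filter flavor: reading the keys forces the initial main_table objects
// into the function's memory first.
const mains = Objects.search()
    .mainTable()
    .filter(m => m.hierarchyId.exactMatch(hierarchyId))
    .all();                                   // initial objects materialized
const keys = [...new Set(mains.map(m => m.dimensionKey))];

// Search-around flavor: the set stays a query end to end, and only the
// final linked objects ever land in memory.
const linked = Objects.search()
    .mainTable()
    .filter(m => m.hierarchyId.exactMatch(hierarchyId))
    .searchAroundDimensions()
    .searchAroundLinkedOt()
    .all();                                   // only the final OT materialized
```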
Hi @chlor8. I 100% agree with your point that filtering defeats the purpose/vision behind the ontology. To me, search-arounds/Elasticsearch are a beautiful concept that I would never again want to miss, as they are very convenient when building applications.
The point I struggle with is that I see the costs our applications generate during runtime. And I can only imagine that a powerful engine like Elasticsearch comes with a price, especially when you have to do more hops than theoretically necessary. Since the Resource Manager only allocates the costs to the Workshop app itself, we have to get very creative with trial-and-error experiments to understand where the costs come from. Hence my hope that we have some experts here in the community who can enlighten us all a bit from the theory side of things.
Back to the original topic: on proposal 2b I had new learnings that updating embedded variables is actually not supported with filter variables used in pivot widgets, which I find very unfortunate.
So it comes down to the two-hop search-around (option 1) vs. the array object set aggregation + filter (option 2a). Since the object set aggregation has its scalability limit, it is not applicable across the board anyway. But there may still be cases where one approach or the other is the better fit.