I noticed that the materialized dataset of an ontology gets built even when there are no changes to the ontology objects. This is happening across all of our ontologies. It would affect downstream builds if schedules are set to trigger on materialized dataset updates. Can someone explain why this is happening?
Hi! We sometimes perform maintenance tasks that can involve replacing pipelines, upon which the materialization rebuilds even though there is no data change. That said, we try not to rebuild whenever we can safely detect that there is no update, so I wouldn’t expect this to happen too often. How frequently are you seeing it?
Each time I create an object, the materialized dataset builds (which is expected). Some time later it builds again, even though there are no updates. Previously, the __patch_offset used to change from a random number to -1 during this unexpected build. Now the patch offset doesn’t seem to change, but the unnecessary build still follows each expected build.
Update: I compared the materialized dataset before and after this unexpected build using Contour, and I don’t see any values changing. This unexpected build happens every time a user creates an ontology object, and it causes downstream workflows to trigger even when there is no user request from the UI. I’d like to understand more about this.
Does the user create this object using edits? If a new object is created via edits, then it’s expected for this dataset to rebuild, because the materialization also reflects data from edits.
I understand that this dataset gets built each time there is an edit. The problem is the builds that happen after each edit, which don’t correspond to any change in the objects.
Can you clarify what you mean by this? What do these edits do if they don’t change the objects?
If you mean that these edits only create new objects rather than changing existing ones, then that is still a change to the object type, and the materialization needs to rebuild to reflect the new objects.
Sorry for the confusion. When I say edits, that includes:
- A user creating an object using an action
- A user modifying existing objects using an action
- A user deleting objects using an action
I expect the materialized dataset to update for any of the above, plus any changes in the backing dataset, because this dataset represents the current state of the objects. The problem is that I also see some unexpected builds, and I don’t see any difference before and after such a build when I analyze it in Contour.
To replicate this, create an ontology object using an action. You will see the materialized dataset build within a few minutes (which is expected). But you will also see another build a few minutes or hours later, which is unexpected. When you compare the state of the dataset after each build, you won’t see any difference.
I see. I suspect this is because we download new edits into the “Merge changes” dataset in the OSv2 pipeline. When you submit an edit, it stays in a service database so that the change can be reflected immediately (for example, in the materialized dataset). Eventually, when enough edits have accumulated or when enough time has passed (the default is 6 hours), we rebuild the “Merge changes” dataset to download the patches and clear them from the service database. When this happens, the materialized dataset is rebuilt even though there is no data change.