OSDK API efficiency advice needed

I am trying to create a large number of Ontology objects (>50,000) from an OSDK application. In my experience, it takes ~5 minutes to create 5000 objects. I am batching my requests, but it is still very slow. Does anybody have any advice on how to make my workflow more efficient?

Are you making 5000 individual requests, with one object created per action call? If so, you could modify your action to create multiple objects, so that all 5000 objects are created from a single action call. Let me know if you need some help setting this up.

No. I create 5000 objects in one batch call.

Is the action configured to only write one object at a time, and you are making a batch apply action request?

I am not sure how I can be clearer. One network request contains 5000 objects using batch update.

No I am not creating the objects one by one.

I am creating an array of 5000 objects as the payload of a batch operation, and I repeat this operation as many times as necessary. I have 50,000 objects, so I need to make 10 batch calls.
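For readers following along, the client-side batching described here can be sketched roughly as below. This is only an illustration, not the poster's actual code: `chunk` and `NewObject` are hypothetical names, and the fields are placeholders.

```typescript
// Hypothetical shape of one object to create; the real properties will differ.
interface NewObject {
  name: string;
  count: number;
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 50,000 placeholder objects split into batches of 5000.
const objects: NewObject[] = Array.from({ length: 50000 }, (_, i) => ({
  name: `object-${i}`,
  count: i,
}));
const batches = chunk(objects, 5000);
console.log(batches.length); // 10
```

Each entry of `batches` would then be sent as the payload of one batch call.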

Sorry for asking this again, but this distinction can significantly impact performance optimization, so I appreciate you taking the time to clarify.

To help provide the most accurate guidance for optimizing your calls, could you help me understand your current setup?

Setup A

  • You have an action that is configured in Ontology Manager to write a single object
  • From your code, you are using the batch endpoints to apply this action in a batch size of 5000.
    • Apply Action Batch endpoint
    • You would see this in the OSDK as the following for different languages:
      • Typescript: batchApplyAction()
      • Python: client.ontology.batch_action
      • Java: applyBatch

Setup B

  • You have an action that is configured in Ontology Manager to write multiple objects
    • Receives an ObjectSet or List of some kind and iterates over it to create an object for each record in a single action call.
  • From your code, you are NOT using the batch endpoints (listed above). Instead, you are using the normal apply action endpoint.
    • Apply Action Endpoint
    • You would see this in the OSDK as the following for different languages:
      • Typescript: applyAction()
      • Python: client.ontology.actions
      • Java: apply

Could you let me know which setup you are currently using? If it is Setup A, I would suggest switching to Setup B for your use case. With Setup A, your code sends a single request per batch, but behind the scenes 5000 actions are still applied, which takes time. With Setup B, each batch (or the whole dataset) results in one network call and a single action application, which should be considerably faster.
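To make the distinction concrete, here is a toy model of the two setups. It does not call the OSDK or any Foundry API: `ToyActionService`, `RowParams`, and both method names are hypothetical stand-ins that only count how many requests and action applications each setup implies.

```typescript
// Toy model only: hypothetical stand-ins, not the OSDK or Foundry API.
interface RowParams {
  name: string;
  count: number;
}

class ToyActionService {
  requests = 0;
  actionsApplied = 0;
  objectsCreated = 0;

  // Setup A: one request, but the single-object action runs once per entry.
  batchApplySingleObjectAction(batch: RowParams[]): void {
    this.requests += 1;
    this.actionsApplied += batch.length;
    this.objectsCreated += batch.length;
  }

  // Setup B: one request and one action application; the backing function
  // iterates over the array and creates every object itself.
  applyMultiObjectAction(params: { rows: RowParams[] }): void {
    this.requests += 1;
    this.actionsApplied += 1;
    this.objectsCreated += params.rows.length;
  }
}

const rows: RowParams[] = Array.from({ length: 5000 }, (_, i) => ({
  name: `row-${i}`,
  count: i,
}));

const setupA = new ToyActionService();
setupA.batchApplySingleObjectAction(rows);

const setupB = new ToyActionService();
setupB.applyMultiObjectAction({ rows });

console.log(setupA.actionsApplied, setupB.actionsApplied); // 5000 1
```

Both setups send one request and create 5000 objects; what differs is the number of action applications on the server side.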

I looked into the documentation that you linked in your detailed response. I think the main question I have is how do you define a “Create Action” that accepts an array of objects.

Thank you for your patience, Rob. I understand your question now. I have Option A implemented. I did not even know that Option B was possible, hence my question here, so that I can learn. I would appreciate it if you could direct me to some documentation on how Option B is done.

Thank you

You will need to make your action be backed by a function (Docs) (Docs).

The function would look something like this:

import { OntologyEditFunction, Integer, Timestamp } from "@foundry/functions-api";
import { Uuid } from "@foundry/functions-utils";

import { Objects } from "@foundry/ontology-api";

// Shape of one entry in the array parameter; one object is created per entry.
interface StructFields {
    a: Integer;
    b: string;
    c: Timestamp;
}

export class MyFunctions {
    @OntologyEditFunction()
    public async exampleFunction(array: StructFields[]): Promise<void> {
        // All objects are created within this single function execution.
        array.forEach((element) => {
            const pk = Uuid.random();
            const obj = Objects.create().rcrespoCrop(pk);
            obj.createdAt = element.c;
            obj.optimalPhMax = element.a;
            // ...
        });
    }
}

Then the corresponding action setup in Ontology Manager would look something like this:

Here are some other relevant docs with limits around this.

Rob, I do appreciate you trying to help me!!!

But, this is incredibly confusing.

  • The applyBatch endpoint accepts an array and makes one single network call with a large payload.
  • The apply endpoint only accepts a single object as a payload, so it takes thousands of network calls.

This is exactly the opposite of how you described Setup A and B.

In any case, this discussion is moot, since the limits on creating Ontology objects with these methods are very restrictive. We need to create tens of thousands of Ontology objects in real time while the user is using our applications.

If anyone has any suggestions on how to achieve that in a reasonable time, please let me know. Pretty desperate here.

Hello!
In these cases, the optimization should start with the action and not the OSDK Application.

As Rob mentioned, it is key to have defined an action that accepts an array of objects as an input. This is done by a function like the one shown in the previous message. You can also look at the batch execution Docs.

The batch size limit is 10,000 (I think). A normal function that creates objects, even a lot of them, should take ~2-3s. Hence, even for 50,000 objects, we are still talking about seconds and not minutes.
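As a rough back-of-the-envelope check of those figures (both were given tentatively, and this assumes the calls run one after another with no other overhead):

```typescript
// Figures quoted above; the limit and the timings are approximate.
const totalObjects = 50000;
const maxObjectsPerCall = 10000;
const secondsPerCall = { low: 2, high: 3 };

// Number of calls needed if every call is filled to the limit.
const calls = Math.ceil(totalObjects / maxObjectsPerCall);

// Sequential estimate: calls multiplied by the per-call duration.
const estimateSeconds = {
  low: calls * secondsPerCall.low,
  high: calls * secondsPerCall.high,
};

console.log(calls, estimateSeconds.low, estimateSeconds.high); // 5 10 15
```

That is about 10 to 15 seconds in total, compared with the ~5 minutes per 5000 objects reported at the start of the thread.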

Thus, the first step is configuring a function-backed action that accepts a batch as an input. Do we have this?

There’s another issue: how to actually call this Action from a Workshop module. I cannot call the actual batch function from a button click, because I have to prepare my batches of 10,000 objects first, and that function then needs to call the batch function multiple times in order to create more than 10,000 objects. This didn’t seem to work.

Thanks, Aarenas. I do have an issue with the batch submission. Here is how my function-backed action is defined:

As soon as I click the Batch execution switch, the input arguments change to the individual parameters.

What am I doing wrong?

You’re doing nothing wrong!
Can you try now to apply your action in batch, as you were doing at the beginning? You should see a great performance improvement.

That is a bold claim. Do you have any examples of that to back it up?

aarenas,

I cannot perform the action in batch mode now, because the action needs three parameters (name, count, created) instead of the payload array.
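One possible reading, offered as an assumption rather than confirmed behavior: if the batch endpoint takes one parameter set per object (as in Setup A earlier in the thread), the existing payload array could be reshaped into a list of { name, count, created } entries. In the sketch below, `PayloadRow`, `ActionParams`, and `toBatchParams` are hypothetical names, and the property types are guesses.

```typescript
// Hypothetical shape of the rows the application already holds.
interface PayloadRow {
  name: string;
  count: number;
  created: Date;
}

// Hypothetical per-object parameter set matching the three action parameters.
// Sending the timestamp as an ISO 8601 string is an assumption.
interface ActionParams {
  name: string;
  count: number;
  created: string;
}

// Turn the payload array into one parameter set per object.
function toBatchParams(rows: PayloadRow[]): ActionParams[] {
  return rows.map((row) => ({
    name: row.name,
    count: row.count,
    created: row.created.toISOString(),
  }));
}

const sample: PayloadRow[] = [
  { name: "first", count: 1, created: new Date("2024-01-01T00:00:00Z") },
  { name: "second", count: 2, created: new Date("2024-01-02T00:00:00Z") },
];
const batchParams = toBatchParams(sample);
console.log(batchParams.length); // 2
```

The resulting array would be what gets passed to the batch apply call, instead of a single parameter holding the whole array.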