How do I make sure my primary keys are unique?

When creating an object, I want to make sure any further data added to this object type meets the primary key uniqueness criteria. How do I set this up so that any added object meets the PK condition?

Hey Rebecca,

Let’s break this down by the different ways to add new objects:

  1. Adding a new object using an Ontology Action

    • In this case, an ObjectsAlreadyExist error will be thrown if you try to create a new object with a duplicate primary key. So we take care of preventing duplicates for you here.
  2. Adding a new object from a dataset build

    • If you are using OSv2 (Object Storage v2) and new rows added to your dataset result in duplicate values for the column used as the primary key, this will cause an indexing failure, so the duplicate objects will not show up. However, this also means that other valid dataset updates will not show up either, which you probably do not want. If you are using OSv1, no indexing failure will occur; instead, a random row among the duplicates will be picked, and this might change between syncs. Again, you probably don’t want this.
    • To prevent duplicate primary keys during builds, regardless of the object storage version being used, you can set up a data expectation that fails the build if there are duplicate primary keys in your dataset. You can do this in Pipeline Builder or in Code Repositories, depending on where you build your object-backing dataset.
  3. Forcing Uniqueness

    • If what you want is to force that every new row coming from your data source and every new object created by an action has a unique primary key, one way to do this is to use a UUID (Universally Unique Identifier). You can use an automatically generated UUID as the primary key value for every new object created, or for every new row of data coming in from your data source (see the sketch below). A UUID is not guaranteed to be unique, but the probability of a duplicate is sufficiently low for most purposes. However, be cautious when using these in dataset builds, because the UUIDs might change between builds.
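
For the Action path specifically, here is a minimal sketch of what UUID-based creation could look like in a TypeScript function. It assumes the uuid npm package has been added to your repository’s dependencies; RelevantObjectTypeApiName and createdBy are placeholders for your object type’s API name and one of its properties.

import { v4 as uuidv4 } from "uuid"; // assumes the "uuid" package is declared as a dependency
import { Edits, OntologyEditFunction } from "@foundry/functions-api";
import { Objects, RelevantObjectTypeApiName } from "@foundry/ontology-api";

export class UuidFunctions {

    @Edits(RelevantObjectTypeApiName)
    @OntologyEditFunction()
    public addRecordWithUuid(creatorMpId: string): void {
        // Generate a random v4 UUID as the primary key; a collision is
        // astronomically unlikely, so no existence check is performed here
        const newObject = Objects.create().relevantObjectTypeApiName(uuidv4());
        newObject.createdBy = creatorMpId; // placeholder property
    }

}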

I’m also facing this issue, and it is a definite requirement for us to have integer PKs. When an Action submission to the Ontology that is powered by a Function fails because of the ObjectsAlreadyExist error, does the Ontology return this error to the Action that performed it, so we can retry in the code by adding a random number to the PK? We want to know if the Action can pick up the error returned by the Ontology after the submission.

Hello @Akash, what do you mean by “definite requirement that the PK is an integer”? Where is this requirement coming from on your end?
To @tgobindram’s point 3, forcing uniqueness: in my opinion, primary keys should be built meaningfully wherever possible. Meaningful could be a string concatenation of multiple fields, like material_id | sales_doc_no | salesdocitem, etc. By building a thought-through unique PK, you also ensure that you did some level of exploring and sanity-checking of the data at hand.
There are good reasons to use auto/randomly generated UUIDs, but they can also be dangerous and costly and should only be used if really necessary (especially in large dataset builds through pipelines). A few watchouts they come with:

  • Random IDs change with every pipeline build. This will also retrigger unnecessary re-indexing of your object types (trust me, I once almost fell off my chair realizing how much compute some object types with “lazy” PKs produced).
  • You make your life easier linking object types with thought-through PK-FK pairs.

@Phil-M the application we are working on is a legacy app and we are migrating it to Foundry. It was an OLTP system before, and the PO requires it to stay the same in Foundry. They don’t want to change the PK, as there are too many downstream implications.

Were the PKs in the legacy system monotonically increasing? I.e., the very first one was 00001 (or similar), then 00002, … etc.?

Yes, it was monotonically increasing. It started with 1 and we’re now at 2.6MM. Every year we get about 150K new records, and based on past trends, 400 to 500 new objects would be created every day on our new Foundry platform. These need to be monotonically increasing too.

The monotonically increasing bit can be done via a TypeScript function, and I’ve pasted an example implementation below, but there’s a very important caveat to keep in mind. There are only two ways new objects can be added to an existing Object Type (akin to new rows showing up in the target table):

  1. Upstream data from the external system has new rows, so they flow into Foundry, and once they make it to the backing dataset, they become new objects
  2. via “Create” actions on the Ontology that are executed by users

The below function code (after you swap the placeholders for the correct API names) will work if, in your migrated workflow, there is NO new data coming from method 1 (i.e., no more new rows are added in the external system).

You can use the below code to author and publish a TypeScript function. Then you can create a “Create” Action Type in the Ontology that is backed by said published TS function.

import { Edits, OntologyEditFunction } from "@foundry/functions-api";
import { Objects, RelevantObjectTypeApiName } from "@foundry/ontology-api";

export class MyFunctions {

    @Edits(RelevantObjectTypeApiName)
    @OntologyEditFunction()
    public addRecord(
        // list any relevant input arguments as parameters, example below
        creatorMpId: string,
    ): void {
        // Step 1: Figure out total number of records that exist -- this assumes if there are 2,600,000 records, the most recent record has an ID of 2600000
        const currentCount = Objects.search().relevantObjectTypeApiName().all().length;
        const newRecordId = currentCount + 1;

        // Step 2: Create/instantiate new object
        const newObject = Objects.create().relevantObjectTypeApiName(newRecordId);

        // Step 3: Provide initial non-null values for desired properties
        // Example -- assumes existence of "createdBy" property
        newObject.createdBy = creatorMpId;
    }

}

Hey Josh, we’re already using code pretty similar to this. When multiple users create objects at the same time, we encounter the ObjectsAlreadyExist error, because concurrent submissions generate the same PK if we do this.

Hey,

That makes sense – the Ontology doesn’t apply any mutex lock mechanism or queueing system that prevents this from happening.

While not a perfect solution, a workaround that may help is to refactor the code so that the object creation and property initialization steps (Steps 2-3 in the above code snippet) are wrapped in a try-catch block. Inside the catch block, you can simply call the same function recursively. If the code ever enters that block, the next time it calculates the currentCount variable, it should reflect the new object(s) that were created concurrently by other users. Naturally, you’ll need to refactor the logic so that this doesn’t run in an infinite loop: keep a counter variable that starts at 0 and increments whenever the catch block is reached, plus a guard clause that throws a user-facing error once the maximum number of retries has been attempted.
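
Here is a minimal sketch of that refactor, reusing the placeholder names from the earlier snippet. It assumes the duplicate-key failure actually surfaces as a catchable error inside the function at runtime; if the error is only raised at submission time, the catch block will never fire, so verify this behavior in your environment.

import { Edits, OntologyEditFunction, UserFacingError } from "@foundry/functions-api";
import { Objects, RelevantObjectTypeApiName } from "@foundry/ontology-api";

const MAX_RETRIES = 3; // hypothetical retry budget

export class MyFunctions {

    @Edits(RelevantObjectTypeApiName)
    @OntologyEditFunction()
    public addRecord(creatorMpId: string): void {
        this.addRecordWithRetry(creatorMpId, 0);
    }

    private addRecordWithRetry(creatorMpId: string, attempt: number): void {
        // Guard clause: give up with a user-facing error once max retries are exhausted
        if (attempt >= MAX_RETRIES) {
            throw new UserFacingError(`Could not allocate a unique ID after ${MAX_RETRIES} attempts; please try again.`);
        }
        // Step 1: Recompute the count on every attempt so concurrent creations are picked up
        const currentCount = Objects.search().relevantObjectTypeApiName().all().length;
        try {
            // Steps 2-3: Create the new object and initialize its properties
            const newObject = Objects.create().relevantObjectTypeApiName(currentCount + 1);
            newObject.createdBy = creatorMpId;
        } catch (e) {
            // Another submission likely claimed this ID concurrently; retry with a fresh count
            this.addRecordWithRetry(creatorMpId, attempt + 1);
        }
    }

}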

Otherwise, I don’t think there’s any way to guarantee unique primary keys given the constraint of keeping monotonically increasing primary key values within OSv2 (if someone else knows, I’d love to know how!).


Two questions:

  • Is the requirement that the incremental key be present instantly, or would a short delay of 10-60s be allowed?
  • Does the integer key truly have to be the primary key of your object type, or would it be sufficient to just have a property on your OT representing a unique, incrementally increasing value?

I am thinking in the direction of automation. Basically, my idea would be:

  • Your action creates a new object with a random UUID as the primary key.
  • The create event triggers an automation “calculate incremental key”. In there, you run your TS function to derive the incremental value, not as the PK, just as a property (see the sketch after this list), and potentially create your new links for any link table, since your object PK is actually not the incremental key, which might be the key in other tables.
  • AFAIK you can set executions to be batched based on a specific duration of objects being created in that time range (not 100% sure here).
  • On automations you can set failure retries, so even if you had an unlucky event, the automation would retry N times.
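
A minimal sketch of what the automation-triggered function could look like, assuming a hypothetical integer property incrementalId on the object type and reusing the placeholder names from the earlier snippets:

import { Edits, OntologyEditFunction } from "@foundry/functions-api";
import { Objects, RelevantObjectTypeApiName } from "@foundry/ontology-api";

export class IncrementalKeyFunctions {

    @Edits(RelevantObjectTypeApiName)
    @OntologyEditFunction()
    public calculateIncrementalKey(created: RelevantObjectTypeApiName): void {
        // Derive the next value from the current maximum of the property (rather
        // than the object count) so that gaps cannot lead to duplicate values
        const maxSoFar = Objects.search()
            .relevantObjectTypeApiName()
            .all()
            .reduce((max, obj) => Math.max(max, obj.incrementalId ?? 0), 0);
        created.incrementalId = maxSoFar + 1; // hypothetical property, not the PK
    }

}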

In my understanding, a function-backed Action picks up all the edits from the runtime and sends them to the Ontology. The Ontology then submits this data to OSv2. I just need confirmation that the Ontology returns such errors to the function that calls it. I’m asking because I tried creating objects with a duplicate PK when testing from the Code Repository, and it didn’t return any error.

Hey, sorry, I’m a bit confused here – when you test a function you’ve authored in a TypeScript Code Repository using the Functions > Live Preview or Published tabs at the bottom, the “results” that get returned are just simulations of the Ontology edits that would occur if the function were used in a function-backed Action (actual objects are not created, modified, or deleted). When you hit “Test”, it executes your function code against the state of the Ontology at that exact moment.

If your code includes a try-catch block around the object creation code (i.e., Objects.create().objectTypeApiName(primary_key_value), etc.), and you throw a UserFacingError in the catch block, then the function testing feature at the bottom will also show that user-facing error.