Limits on Vector Embeddings in Ontology

Hey Folks

I’m trying to create a vector DB of all our historic support tickets by concatenating all the text from each case and then embedding the concatenated text.

The total dataset is around 1.6m rows and approx. 20 GB (uncompressed) with embeddings.
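For what it’s worth, that ~20 GB figure lines up with the embeddings alone, assuming 1536-dimensional vectors (the output size of text-embedding-ada-002, mentioned further down) stored as 64-bit floats:

```python
# Back-of-the-envelope size of the embedding column alone, assuming
# 1536-dimensional float64 vectors (text-embedding-ada-002's output size).
ROWS = 1_600_000
DIMS = 1536
BYTES_PER_FLOAT = 8  # float64

embedding_bytes = ROWS * DIMS * BYTES_PER_FLOAT
print(f"{embedding_bytes / 1e9:.1f} GB")  # ~19.7 GB, matching the observed ~20 GB
```

So nearly all of the dataset’s weight is the vectors themselves, not the ticket text.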

I can ontologise the dataset without the embeddings just fine, but when adding in the embeddings, the build silently cancels with no log output.

I’m on Dev Tier - am I hitting a limit here? Is there a way I can see what’s failing?

I’ve been trying some strategies to get around this: creating an object that contains just the ticket UUID + embeddings and then linking it back, but that smaller dataset also fails to build.
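A minimal sketch of that workaround, with hypothetical column names (`ticket_uuid`, `case_text`, `embedding` are guesses, not the real schema):

```python
import pandas as pd

# Hypothetical schema: 'ticket_uuid' is the primary key, 'embedding' the vector.
tickets = pd.DataFrame({
    "ticket_uuid": ["a1", "b2", "c3"],
    "case_text": ["printer issue", "login loop", "slow dashboard"],
    "embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # truncated toy vectors
})

# Slim dataset to back the embeddings-only object; the full-text object
# would link back to it on ticket_uuid.
embeddings_only = tickets[["ticket_uuid", "embedding"]]
text_only = tickets.drop(columns=["embedding"])
```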

Next I’ll try splitting the dataset by product, which would cut it down to 6 or so datasets, each with no more than 500k rows.
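Roughly, the split would look like this (the `product` column name is an assumption; in Pipeline Builder this would be one filter transform per product, each writing its own dataset):

```python
import pandas as pd

# Hypothetical 'product' column driving the split.
tickets = pd.DataFrame({
    "ticket_uuid": ["a1", "b2", "c3", "d4"],
    "product": ["alpha", "beta", "alpha", "gamma"],
})

# One output dataset per product line.
per_product = {name: group for name, group in tickets.groupby("product")}
```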

Hey just a few clarifying questions:

  1. Are you creating this object directly in Pipeline Builder?

  2. What model are you using to create your embeddings? If you’re using user-defined embeddings from the Text to Embeddings board, those currently aren’t supported in the Ontology, even though you’ll be able to build a dataset with those embeddings.

Hey Helen,

I’m using Pipeline Builder to build the dataset, then trying to create the object in Ontology Manager.

The embedding model I’m using is OpenAI’s text-embedding-ada-002, one of the default options on the platform.

Gotcha, and it sounds like the dataset is failing to build at all (from Pipeline Builder)? Does it work when you filter it down to a few rows, or does it fail with any embeddings at all?

How long does the dataset build run, and are you getting any build failures? And to sanity-check: are you deploying and building (not just deploying without a build)?

I had to jump through some hoops to get the final dataset with embeddings out of Pipeline Builder.

I wanted to do the case text concatenation and embeddings of that text all in a single pipeline, but it’d fail with OOM or Module Unreachable errors even with the larger compute profiles.

So I built it without embeddings, then split the output by year, giving me 10 files.

I then generated embeddings for each year’s worth of data, each one taking 5-7 hours to build with large and native-acceleration compute profiles applied. I then unioned the 10 outputs into a table of 1.6m rows with embeddings before trying, and failing, to use it to back an object in Ontology Manager.
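The per-year split-and-union shape, sketched with toy data (the real version is Pipeline Builder transforms, and the year range here is a guess):

```python
import pandas as pd

# Ten hypothetical yearly slices, each embedded separately, then unioned.
per_year = [
    pd.DataFrame({"ticket_uuid": [f"{year}-{i}" for i in range(3)],
                  "year": year})
    for year in range(2014, 2024)  # 10 years of history
]

combined = pd.concat(per_year, ignore_index=True)  # union of all slices
```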

This morning I’ve split that table into datasets for each of our product lines. The largest of these is ~750k rows, and Ontology Manager has no issues with it: I can create an object no problem, and it takes around 20 minutes to build.

When I try creating an object in Ontology Manager with the total dataset as the backing data, it fails either at the changelog step or the indexing step. Diagnostic logs and screenshots:

{"objectType":{"currentState":"funnel-only-indexing","uid":"fe2beba1-246d-4299-bfcf-b827a03b953c","rid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","ontologyBranchRid":"ri.ontology.main.branch.98d10c18-87f0-466b-b473-61a5381e7ae0","defaultBranch":true,"definition":{"rid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","ontologyVersion":"00000014-a661-caf0-8a80-796a9a415190","datasources":{"ri.ontology.main.datasource.0c5134de-151a-48c6-88d0-103c19f688b8":"ri.foundry.main.dataset.c5d6c911-5cc2-4fb1-ab1c-9608b3644cb9"},"acceptsPatches":false,"dbs":{"ri.highbury.main.cluster.1":"highbury"},"migrations":{"transitions":[],"target":1},"indexingConfig":"FunnelOnly"},"activePipeline":{"type":"batch","batch":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b"},"replacementPipeline":{"type":"batch","batch":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3"},"v2":true},"batchPipelines":[{"currentState":"await-datasource-changelogs-ready","pipelineId":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogs":["a0a66503-568a-4386-afba-46d2a98cf5f0"],"ontologyVersion":"00000014-a622-6d4a-af73-6af87c9b1e61"},{"currentState":"await-page-ready-v2","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogs":["b9dcf3c3-7f8c-4481-886d-300e62714532"],"ontologyVersion":"00000014-a661-caf0-8a80-796a9a415190","page":"b227cfc4-6f82-4224-9179-0d2eda638ff2"}],"changelogs":[{"currentState":"up-to-date","changelogId":"b9dcf3c3-7f8c-4481-886d-300e62714532","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","datasourceRid":"ri.ontology.main.datasource.0c5134de-151a-48c6-88d0-103c19f688b8","datasourceLocatorRid":"ri.foundry.main.dataset.c5d6c911-5cc2-4fb1-ab1c-9608b3644cb9","changelog":"ri.foundry.main.dataset.1247e33d-19aa-45d0-bf1c-0141276f0d6b","snapshot":"ri.foundry.main.dataset.08b32043-b1ae-48c5-8137-f90e592dd0ee","awaitingJobRid":null,"transformResourcesDiagnostic":{"inferredTransformProfile":"SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}},{"currentState":"build-failure-backoff","changelogId":"a0a66503-568a-4386-afba-46d2a98cf5f0","pipelineId":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","datasourceRid":"ri.ontology.main.datasource.ed0c2bdc-72d2-4e4d-a648-d8d68f1c4d19","datasourceLocatorRid":"ri.foundry.main.dataset.09981206-63a8-4129-a8a4-74379d46f011","changelog":"ri.foundry.main.dataset.707f83f0-1f05-48d0-9d77-c237eceb011e","snapshot":"ri.foundry.main.dataset.e9f2c67a-5157-4fa8-a554-b083f830d6e6","awaitingJobRid":null,"transformResourcesDiagnostic":{"inferredTransformProfile":"EXTRA_SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}}],"pages":[{"currentState":"await-build","pageId":"b227cfc4-6f82-4224-9179-0d2eda638ff2","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogInputs":["b9dcf3c3-7f8c-4481-886d-300e62714532"],"pageChangelog":"ri.foundry.main.dataset.9f6c111a-00a2-4eed-b751-108dc259cacf","pageSnapshot":"ri.foundry.main.dataset.c557197b-7bf9-4e3d-9569-2f264634f5f0","pageDatasetHasPatches":false,"patchedBaseVersions":{},"additionalInput":{},"usesOldBaseVersionFormat":false,"patchOffsetReused":true,"awaitingJobRid":"ri.foundry.main.job.31fe85ad-01a7-4920-9457-29de8d848a6f","includesPatchOffsetColumn":true,"transformResourcesDiagnostic":{"inferredTransformProfile":"EXTRA_SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}}]}

{"objectType":{"currentState":"funnel-only-indexing","uid":"fe2beba1-246d-4299-bfcf-b827a03b953c","rid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","ontologyBranchRid":"ri.ontology.main.branch.98d10c18-87f0-466b-b473-61a5381e7ae0","defaultBranch":true,"definition":{"rid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","ontologyVersion":"00000014-a661-caf0-8a80-796a9a415190","datasources":{"ri.ontology.main.datasource.0c5134de-151a-48c6-88d0-103c19f688b8":"ri.foundry.main.dataset.c5d6c911-5cc2-4fb1-ab1c-9608b3644cb9"},"acceptsPatches":false,"dbs":{"ri.highbury.main.cluster.1":"highbury"},"migrations":{"transitions":[],"target":1},"indexingConfig":"FunnelOnly"},"activePipeline":{"type":"batch","batch":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b"},"replacementPipeline":{"type":"batch","batch":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3"},"v2":true},"batchPipelines":[{"currentState":"await-datasource-changelogs-ready","pipelineId":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogs":["a0a66503-568a-4386-afba-46d2a98cf5f0"],"ontologyVersion":"00000014-a622-6d4a-af73-6af87c9b1e61"},{"currentState":"await-initial-syncs-ready-v2","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogs":["b9dcf3c3-7f8c-4481-886d-300e62714532"],"ontologyVersion":"00000014-a661-caf0-8a80-796a9a415190","page":"b227cfc4-6f82-4224-9179-0d2eda638ff2","persistentSyncs":{"452a231c-1673-4441-9470-24967046f0af":"BOOTSTRAPPING"},"nextBaseVersion":"10000000-0000-0002-0000-000000000001","baseVersionAcked":false}],"changelogs":[{"currentState":"up-to-date","changelogId":"b9dcf3c3-7f8c-4481-886d-300e62714532","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","datasourceRid":"ri.ontology.main.datasource.0c5134de-151a-48c6-88d0-103c19f688b8","datasourceLocatorRid":"ri.foundry.main.dataset.c5d6c911-5cc2-4fb1-ab1c-9608b3644cb9","changelog":"ri.foundry.main.dataset.1247e33d-19aa-45d0-bf1c-0141276f0d6b","snapshot":"ri.foundry.main.dataset.08b32043-b1ae-48c5-8137-f90e592dd0ee","awaitingJobRid":null,"transformResourcesDiagnostic":{"inferredTransformProfile":"SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}},{"currentState":"build-failure-backoff","changelogId":"a0a66503-568a-4386-afba-46d2a98cf5f0","pipelineId":"1b7ddcc6-89cd-4dd5-a701-8ab6d7f8817b","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","datasourceRid":"ri.ontology.main.datasource.ed0c2bdc-72d2-4e4d-a648-d8d68f1c4d19","datasourceLocatorRid":"ri.foundry.main.dataset.09981206-63a8-4129-a8a4-74379d46f011","changelog":"ri.foundry.main.dataset.707f83f0-1f05-48d0-9d77-c237eceb011e","snapshot":"ri.foundry.main.dataset.e9f2c67a-5157-4fa8-a554-b083f830d6e6","awaitingJobRid":null,"transformResourcesDiagnostic":{"inferredTransformProfile":"EXTRA_SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}}],"pages":[{"currentState":"up-to-date","pageId":"b227cfc4-6f82-4224-9179-0d2eda638ff2","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeRid":"ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","changelogInputs":["b9dcf3c3-7f8c-4481-886d-300e62714532"],"pageChangelog":"ri.foundry.main.dataset.9f6c111a-00a2-4eed-b751-108dc259cacf","pageSnapshot":"ri.foundry.main.dataset.c557197b-7bf9-4e3d-9569-2f264634f5f0","pageDatasetHasPatches":false,"patchedBaseVersions":{},"additionalInput":{},"usesOldBaseVersionFormat":false,"patchOffsetReused":true,"includesPatchOffsetColumn":true,"transformResourcesDiagnostic":{"inferredTransformProfile":"EXTRA_SMALL","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null}}],"persistentSyncs":[{"currentState":"up-to-date","persistentSyncId":"452a231c-1673-4441-9470-24967046f0af","pageId":"b227cfc4-6f82-4224-9179-0d2eda638ff2","pipelineId":"b95d37e5-370f-47ca-a5a6-f3e117eac2b3","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","dbType":"highbury","dbRid":"ri.highbury.main.cluster.1","activeSyncId":"d245a8be-b702-44f9-bbc1-06fb85aa23c7","backgroundSyncId":null}],"syncs":[{"currentState":"sync-failure-backoff-v2","syncId":"d245a8be-b702-44f9-bbc1-06fb85aa23c7","persistentSyncId":"452a231c-1673-4441-9470-24967046f0af","objectTypeUid":"fe2beba1-246d-4299-bfcf-b827a03b953c","pageId":"b227cfc4-6f82-4224-9179-0d2eda638ff2","nextBaseVersion":"10000000-0000-0002-0000-000000000001","currentOntologyVersion":"00000014-a661-caf0-8a80-796a9a415190","ackedOntologyVersion":"00000014-a661-caf0-8a80-796a9a415190","dbType":"highbury","dbRid":"ri.highbury.main.cluster.1","transformResourcesDiagnostic":{"inferredTransformProfile":"HIGHBURY_OPTIMIZED","profileConfig":{"type":"automatic","automatic":{}},"profileOverride":null,"upscaling":false,"bucketingSpec":null},"targetBaseVersions":["10000000-0000-0002-0000-000000000001"]}]}

Claude’s perspective on the diagnostic logs:

Here’s what the log is telling you:

The Object Type in Question
This is a single object type (fe2beba1-246d-4299-bfcf-b827a03b953c) currently in funnel-only-indexing state — meaning it’s not fully indexed yet, only funneled.

Two Pipelines Are Running Simultaneously
This is the key thing to understand. There’s a race/replacement situation:

  • Active pipeline (1b7ddcc6...) — the older one, currently stuck
  • Replacement pipeline (b95d37e5...) — a newer one that was queued to replace it, and is actually progressing

The Actual Failure
The problem is in the active pipeline’s changelog (a0a66503...), which has a currentState of build-failure-backoff. This means a build job failed and the system is in a retry backoff state. The datasource it’s tied to is ri.foundry.main.dataset.09981206.... Notably, it inferred an EXTRA_SMALL transform profile, which could be relevant if the dataset is larger than expected.

The Replacement Pipeline Is Actually Fine
The replacement pipeline (b95d37e5...) is at await-page-ready-v2 and its changelog is up-to-date. The page within it is at await-build, with an active job (ri.foundry.main.job.31fe85ad...). This pipeline is progressing normally.

What This Likely Means Practically
The system should self-heal — once the replacement pipeline’s page build completes, it will take over from the failed active pipeline. The build-failure-backoff on the active pipeline may be a non-issue if the replacement promotes successfully.

What to Check

  • Look at the job logs for the failed build on datasource 09981206 — that’s where the root cause error will be
  • Check if the EXTRA_SMALL profile was undersized for the dataset — if the dataset grew, the transform may have OOM’d or timed out
  • Monitor whether the replacement pipeline promotes cleanly; if it does, you may not need to intervene at all
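The unhealthy states called out above can also be found mechanically. A small sketch that walks a diagnostic blob (parsed as JSON into Python dicts/lists) and flags any component whose `currentState` mentions a failure or backoff:

```python
def find_failures(node, path=""):
    """Recursively collect (path, state) pairs whose currentState looks unhealthy."""
    hits = []
    if isinstance(node, dict):
        state = node.get("currentState", "")
        if "failure" in state or "backoff" in state:
            hits.append((path, state))
        for key, value in node.items():
            hits.extend(find_failures(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_failures(value, f"{path}[{i}]"))
    return hits

# Trimmed-down version of the first diagnostic above:
diag = {
    "changelogs": [
        {"changelogId": "b9dcf3c3", "currentState": "up-to-date"},
        {"changelogId": "a0a66503", "currentState": "build-failure-backoff"},
    ]
}
print(find_failures(diag))  # [('/changelogs[1]', 'build-failure-backoff')]
```

Feeding it the full blob via `json.loads` should surface the same `build-failure-backoff` entry Claude identified.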



This is a more recent snapshot, and the situation has gotten more complicated: there are now two separate failures happening.

What’s the Same
The original build-failure-backoff on changelog a0a66503 (active pipeline, datasource 09981206) is still there, unchanged.

What’s New and More Concerning
The replacement pipeline has progressed but is now hitting its own problem. The page build completed (up-to-date), but the sync to Highbury is now failing — sync d245a8be is in sync-failure-backoff-v2. This is the step where the built data actually gets pushed into the search/query database.

The replacement pipeline’s state has also shifted from await-page-ready-v2 to await-initial-syncs-ready-v2, with the persistent sync showing as BOOTSTRAPPING — meaning it’s trying to do an initial population of the Highbury cluster but can’t complete it.

What This Means
The self-healing path I mentioned in the previous log has stalled. The replacement pipeline got through its build phase but is now stuck at the database sync layer. Both pipelines are effectively blocked — one at build, one at sync.

What to Investigate
The sync failure against ri.highbury.main.cluster.1 is the priority now. Common causes include schema mismatches between the page output and what Highbury expects, a cluster-side issue, or a malformed record causing the sync job to abort. You’ll want to pull the logs for the sync job tied to d245a8be — look specifically at the Highbury ingest error.

Hi @helenq - from the Claude analysis above, I wonder if this is the root cause: the dataset is simply too large for the compute profile applied to the Ontology pipeline?

Is this something that’s configurable on the back end?

thanks!

Hey @Samwise_AIP, do you know what stack you are on, or are you able to access Grafana logs?

If EXTRA_SMALL was too small, then Funnel should automatically retry with an upscaled profile.

E.g. if you see another copy of the diagnostic after the pipeline stops retrying, we would expect the profile to be larger.

cc: @KZajac to help answer questions related to object indexing

I’m on a developer-tier stack; hopefully I’m answering the question you’re asking!

I’m not sure I can access the logs you mentioned.

I can retry and check the logs after a subsequent failure to see if the profile increases. Thanks!

I’m not seeing the compute profile increase after each failure of the indexing: the transform profile gets pushed from EXTRA_SMALL to SMALL but still fails, then doesn’t go any higher.

Hey @Samwise_AIP!

I’ve managed to find your Object Type (ri.ontology.main.object-type.fe2beba1-246d-4299-bfcf-b827a03b953c) in our internal logs.

Unfortunately, I don’t think you’ll be able to sync a dataset of this size to Highbury on the dev-tier. At the dev-tier there are some restrictions on the resources you have available, and we’re hitting that limit for the sync job for this Object Type. In this case, we’re capping the resource to the SMALL profile.

I’ll look into whether we could communicate this better through the front-end!

Hey @KZajac

Thanks, good to know it is a limitation on Dev, appreciate you digging into this!

1.6m rows with embeddings is 10 years of ticket history from our CRM.

I know the profiles on Dev can handle ~800k rows with embeddings, so I’ll drop back one year’s worth of tickets at a time and see at which point I can limbo under the limits.
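A sketch of that limbo search, with made-up per-year counts: keep the most recent years while the running total stays under the observed ~800k ceiling.

```python
# Hypothetical per-year ticket counts; the real numbers would come from the CRM.
year_counts = {2023: 210_000, 2022: 190_000, 2021: 175_000,
               2020: 160_000, 2019: 150_000}
ROW_CAP = 800_000  # rough dev-tier ceiling observed for rows-with-embeddings

kept, total = [], 0
for year, count in sorted(year_counts.items(), reverse=True):
    if total + count > ROW_CAP:
        break  # adding this year would push us over the cap
    kept.append(year)
    total += count
```

With these toy counts, the four most recent years fit (735k rows) and 2019 tips it over the cap.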

Thanks again!