How to get the last build status (or last update) from a RID using the REST API reference without curl

For creating data expectations we need the last build status (and the timestamp of the last update) of a table in Foundry.

So far we have created a token and are using the REST API reference to access the metadata of a RID; however, we cannot find the build status or the timestamp of the last update. Does anybody know how to get this data into a dataframe in order to create data expectations?

Important: we do NOT want to use the curl command. We want to do this inside a code repository.

Are you able to elaborate on what the goal is in this situation?

Most commonly, Data Expectations are used to evaluate conditions endogenous to the current transaction. That is, it's easy to supply checks with information available in the dataset and, optionally, the previous or current transaction if it's an incremental transform. Data Health Checks, in contrast, are most commonly used to evaluate conditions that could span multiple transactions (e.g. how long has it been since the previous successful build?).

That said, if you really want to make an API call in the context of this data expectation, here's how you could do it:

First, get the transaction associated with the branch you care about (use Python instead of curl in your transform, but I’m going to use curl for demo purposes). Often, the branch id you want is master:

curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $HOST/foundry-catalog/api/catalog/datasets/[DATASET_RID]/branches2/[BRANCH_ID]

Next, use the transaction id returned from that to get all the info you’re interested in:

curl -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $HOST/foundry-catalog/api/catalog/datasets/[DATASET_ID]/transactions/[TRANSACTION_ID]

The response will look something like this:

{
  "type": "SNAPSHOT",
  "status": "COMMITTED",
  "filePathType": "MANAGED_FILES",
  "startTime": "2024-07-01T02:12:16.584Z",
  "closeTime": "2024-07-01T02:13:01.510Z",
  "permissionPath": "REDACTED/REDACTED",
  "record": {
    "jobRid": "ri.foundry.main.job.REDACTED",
    "runRid": "ri.foundry.main.job.REDACTED"
  },
  "attribution": {
    "userId": "REDACTED",
    "time": "2024-07-01T02:12:16.584Z"
  },
  "metadata": {
    "fileCount": 2,
    "totalFileSize": 1608621,
    "hiddenFileCount": 1,
    "totalHiddenFileSize": 1469119
  },
  "isDataDeleted": false,
  "isDeletionComplete": false,
  "rid": "ri.foundry.main.transaction.REDACTED",
  "provenance": {
    "provenanceRecords": [
      {
        "datasetRid": "ri.foundry.main.dataset.REDACTED",
        "transactionRange": {
          "startTransactionRid": "ri.foundry.main.transaction.REDACTED",
          "endTransactionRid": "ri.foundry.main.transaction.REDACTED"
        },
        "schemaBranchId": "master",
        "schemaVersionId": "REDACTED",
        "nonCatalogResources": [],
        "assumedMarkings": []
      }
    ],
    "nonCatalogProvenanceRecords": []
  },
  "datasetRid": "ri.foundry.main.dataset.REDACTED"
}
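
Since the goal is to get this into a dataframe: status and closeTime in that response are the build status and last-update timestamp you're after. A small pandas sketch, building on the txn dict from the snippet above:

import pandas as pd

df = pd.DataFrame([{
    "dataset_rid": txn["datasetRid"],
    "status": txn["status"],           # e.g. "COMMITTED"
    "last_updated": txn["closeTime"],  # e.g. "2024-07-01T02:13:01.510Z"
}])
# Parse the ISO timestamp so you can compare it against a threshold
df["last_updated"] = pd.to_datetime(df["last_updated"])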

Hi Taylor,

thank you for your response, and sorry for my late reaction…
To keep it short and simple, I just want to access the metadata that can be seen in the GUI (left-hand side of a dataset): things like Updated, Created, Location, Type, Size, Updated via, Tags, Health Checks, Inputs,…

We need this to check the status of all the datasets we use in Foundry. Of course we could click through some data health checks, but that is only reasonable when you have a handful of tables. In our case we have several hundred tables, which is why we want a programmatic way of reading the metadata of all datasets first and checking it afterwards.

You will need to make a few API calls to different foundry backend services to gather the information the UI shows you.
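
For several hundred datasets, that usually means looping over the RIDs and collecting the results. A rough sketch, reusing HOST and headers from the snippets above (fetch_last_transaction is a hypothetical helper, and the transactionRid field name is an assumption to verify on your stack):

import pandas as pd
import requests

def fetch_last_transaction(dataset_rid: str) -> dict:
    # Branch lookup, then transaction lookup, via the two
    # foundry-catalog endpoints shown earlier in this thread.
    branch = requests.get(
        f"{HOST}/foundry-catalog/api/catalog/datasets/{dataset_rid}/branches2/master",
        headers=headers,
    ).json()
    txn_rid = branch["transactionRid"]  # assumption: verify the field name
    return requests.get(
        f"{HOST}/foundry-catalog/api/catalog/datasets/{dataset_rid}/transactions/{txn_rid}",
        headers=headers,
    ).json()

rows = []
for rid in dataset_rids:  # your list of several hundred dataset RIDs
    txn = fetch_last_transaction(rid)
    rows.append({
        "dataset_rid": rid,
        "status": txn["status"],
        "last_updated": txn["closeTime"],
    })

metadata_df = pd.DataFrame(rows)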

I would recommend setting up the monitoring inside Foundry, using health checks, data expectations, health check groups, etc.

We are doing this for tens to hundreds of thousands of datasets on our stack, so it does scale very well.

Yes, I believe that it scales very well and we already use data health checks, data expectations and check groups.

However, there is no way to add or edit data health checks on several tables in one step. We have to open every single table and then manually click through the data health check menu. This is not very convenient and, above all, it is very slow.

Why is it not possible to take a whole bunch of tables and add the “time since last updated” data health check in one click?