Hello Community team,
I’m currently looking for an API call to retrieve the number of rows in a dataset. I need this to avoid calling count() in the code, which takes a long time on large datasets and very often leads to OOMs (see code below).
I found the Foundry Stack Overflow post "Utilizing Foundry APIs, how do you get the number or rows and columns for a dataset?", where the second, simpler solution no longer seems to work:
import json

import requests


def getComputedDatasetStats(token, dataset_rid, api_base='https://.....'):
    # POST the dataset RID and branch to the computed-stats endpoint
    response = requests.post(
        url=f'{api_base}/foundry-stats/api/computed-stats-v2/get',
        headers={
            'content-type': 'application/json',
            'Authorization': 'Bearer ' + token
        },
        data=json.dumps({
            "datasetRid": dataset_rid,
            "branch": "master"
        })
    )
    return response.json()


token = 'eyJwb.....'
dataset_rid = 'ri.foundry.main.dataset.1d9ef04e-7ec6-456e-8326-1c64b1105431'
result = getComputedDatasetStats(token, dataset_rid)

# full resulting json:
# print(json.dumps(result, indent=4))

# required statistics:
print('size:', result['computedDatasetStats']['sizeInBytes'])
print('rows:', result['computedDatasetStats']['rowCount'])
print('cols:', len(result['computedDatasetStats']['columnStats']))
When I make the call, computedDatasetStats in the response is empty (see response below):
{'datasetRid': 'ri.foundry.main.dataset.eba120ad-a65d-469c-89eb-bfdce138a7be', 'branch': 'master', 'endTransactionRid': 'ri.foundry.main.transaction.00000047-xxxxxxxxxxx', 'schemaId': '0000000-xxxxxxxxxxxx', 'computedDatasetStats': None}
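With a response like the one above, the print statements in the snippet fail with a TypeError, because they index into None. As a stopgap, a small guard can at least separate "no stats available" from a real row count. This is only a sketch: extract_row_count is a hypothetical helper name, and it assumes nothing about the endpoint beyond the response shape shown in this post.

```python
from typing import Any, Optional


def extract_row_count(result: dict) -> Optional[Any]:
    """Return rowCount from a computed-stats response, or None if stats are absent.

    Hypothetical helper: it relies only on the keys visible in this post
    ('computedDatasetStats' and 'rowCount').
    """
    stats = result.get('computedDatasetStats')
    if stats is None:
        # The endpoint answered, but no computed stats were attached.
        return None
    return stats.get('rowCount')


# A response shaped like the one above, with a placeholder RID.
empty_response = {
    'datasetRid': 'ri.foundry.main.dataset.example',
    'branch': 'master',
    'computedDatasetStats': None,
}

rows = extract_row_count(empty_response)
if rows is None:
    print('rows: not available (computedDatasetStats is empty)')
else:
    print('rows:', rows)
```

This does not solve the underlying problem of the stats being empty; it only keeps the calling code from crashing, so that a fallback can be chosen explicitly.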
Has anyone been able to get this endpoint to work, or does anyone know of another simple, working API call to achieve this?
Best,