Is it possible to obtain information that distinguishes how a dataset was uploaded to Palantir?
Specifically, I would like to differentiate between manual upload, Fusion, and Data Connection.
At the moment, I am considering using the job spec fields such as workerType and inputSpecs to make this distinction. However, is there a clearer way to retrieve this information through the API?
Hey,
Are you trying to do this programmatically? If not, you can identify this from both the Dataset itself and Data Lineage.
If you navigate to any Dataset, there’s a metadata panel on the left-hand side with attributes like Created By/On, Updated By/On, etc. If you look for the attribute called Updated via, it will provide a hyperlink to the Data Connection source, Pipeline Builder artifact, or Code Repository that contains the logic used to build the dataset. If it says File imports, it was uploaded manually by a user and you can check the Created attribute to see who uploaded it and when (see below screenshot).

On Data Lineage, the default node coloring, shown in the legend in the top-right corner, is Resource Type. It reflects whether a particular dataset was manually uploaded or built via logic in Pipeline Builder or Code Repositories (with different colors for lightweight transforms, PySpark, SQL, etc.). The same applies to datasets created by and synced from a particular Fusion sheet.
Thanks for the explanation and the context.
My main goal is to retrieve this information programmatically via the API. Do you know if those same attributes (e.g., the “Updated via” metadata, or the lineage resource type) are also exposed through the API, or if there’s an equivalent programmatic way to retrieve them?
That would help me confirm whether I should keep relying on job spec fields (workerType, inputSpecs, etc.) or if there’s a more direct/official API-supported approach.
There’s a POST endpoint:
https://{YOUR_FOUNDRY_URL}/build2/api/jobspecs/get-jobspecs-for-datasets/
that accepts a payload like:
{
  "datasetRids": [
    "{TARGET_DATASET_RID}"
  ],
  "branch": "master",
  "branchFallbacks": {
    "branches": []
  }
}
and returns a response. If you pass a single valid dataset RID in the payload, the response contains a single key-value pair: the key is that same RID and the value is a struct holding selected fields of the job spec. You can parse out ["jobSpec"]["inputSpecs"], which returns a list of structs (one per input). Test this with datasets you already have access to that were built through different methods (manual upload, Pipeline Builder, Code Repos, Data Connection, etc.), and the differences in the responses should let you get what you want!
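Here is a minimal Python sketch of that call, in case it helps. It uses only the standard library. The helper names (build_payload, fetch_jobspecs, extract_input_specs) are made up for illustration, and the bearer-token Authorization header is an assumption about how your Foundry instance authenticates, so adjust as needed. The response shape is taken from the description above (RID as key, then ["jobSpec"]["inputSpecs"]); the extractor returns an empty list when those keys are missing rather than raising.

```python
import json
import urllib.request


def build_payload(dataset_rids, branch="master"):
    """Build the request body described above for one or more dataset RIDs."""
    return {
        "datasetRids": list(dataset_rids),
        "branch": branch,
        "branchFallbacks": {"branches": []},
    }


def fetch_jobspecs(foundry_url, token, dataset_rids, branch="master"):
    """POST the payload to the endpoint and return the decoded response.

    Assumption: the endpoint accepts a bearer token in the Authorization header.
    """
    url = foundry_url.rstrip("/") + "/build2/api/jobspecs/get-jobspecs-for-datasets/"
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(dataset_rids, branch)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_input_specs(response, dataset_rid):
    """Return the inputSpecs list for one dataset, or [] if any key is missing."""
    entry = response.get(dataset_rid) or {}
    job_spec = entry.get("jobSpec") or {}
    return job_spec.get("inputSpecs") or []
```

You would call fetch_jobspecs with your own URL, token, and RID, then pass the result to extract_input_specs and compare what comes back for datasets whose origin you already know.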
Thank you very much for the thorough explanation.
Based on your guidance, I will proceed by parsing the job spec (including inputSpecs).
This fully answers my question for now. I appreciate your help.