I have a dataset with files in it. I want to remove some of the files within that dataset, and keep the remaining files in that dataset. Is there a way to remove the data from within a dataset by filename?
You can achieve this with a DELETE transaction, which you can perform either by API call (https://www.palantir.com/docs/foundry/api/v2/datasets-v2-resources/files/delete-file/) or from the UI (there is a trash can icon next to each file in the "Files" section of the dataset "Details" tab).
Note that performing a DELETE transaction in this way will cause issues for downstream incremental transforms (e.g., for the effect on downstream Python incremental transforms, see https://www.palantir.com/docs/foundry/transforms-python/incremental-reference#append-only-input-changes). It will likewise cause issues if the dataset itself is updated incrementally, whether via Data Connection or transforms.
from urllib.parse import quote

import requests


def delete_file(stack_url, dataset_rid, file_path, headers, branch_name='master'):
    """Delete one file from the dataset on the given branch."""
    # Percent-encode the logical path so slashes and spaces survive in the URL.
    encoded_path = quote(file_path, safe="")
    url = f"https://{stack_url}/api/v2/datasets/{dataset_rid}/files/{encoded_path}"
    params = {"branchName": branch_name}
    response = requests.delete(url, headers=headers, params=params)
    # Raise requests.HTTPError on any 4xx/5xx response.
    response.raise_for_status()


def process_files(stack_url, dataset_rid, headers, logical_paths):
    """Delete each path in turn, collecting failures instead of stopping."""
    total_success = 0
    failed_files = []
    for file_path in logical_paths:
        try:
            delete_file(stack_url, dataset_rid, file_path, headers)
            print(f"✅ Successfully deleted file: {file_path}")
            total_success += 1
        except requests.RequestException as e:
            # Covers HTTP errors as well as connection failures and timeouts.
            print(f"❌ Error deleting file {file_path}: {e}")
            failed_files.append(file_path)
    print_summary(total_success, failed_files)


def print_summary(total_success, failed_files):
    """Print how many deletions succeeded and which paths failed."""
    print("\n📊 Summary:")
    print(f"✅ Total files successfully deleted: {total_success}")
    print(f"❌ Total errors encountered: {len(failed_files)}")
    if failed_files:
        print("❌ Failed files:")
        for file in failed_files:
            print(f" - {file}")


# Replace the placeholders below before running.
STACK_URL = "<INSERT_STACK_URL>"
DATASET_RID = "<INSERT_DATASET_RID>"
HEADERS = {
    'authorization': 'Bearer <INSERT_TOKEN>',
    'content-type': 'application/json',
}
# Logical paths of the files to delete, relative to the dataset root.
logical_paths = [
    "file_logical_path.txt",
]
process_files(STACK_URL, DATASET_RID, HEADERS, logical_paths)
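
If you have many files to remove, it may be easier to build `logical_paths` from filename patterns than to type each path by hand. The sketch below is only an illustration: `select_paths` is a hypothetical helper (not part of any Foundry API), and the file list is made-up sample data that you would replace with the actual paths in your dataset.

```python
from fnmatch import fnmatchcase


def select_paths(all_paths, patterns):
    """Return the paths that match at least one glob pattern, in input order."""
    # Note: in fnmatch, '*' also matches '/', so "*.tmp" matches files in subfolders.
    return [
        path
        for path in all_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Made-up example paths; substitute the real logical paths from your dataset.
all_paths = [
    "a.tmp",
    "b.csv",
    "raw/2023-01.csv",
    "raw/2024-01.csv",
]

logical_paths = select_paths(all_paths, ["*.tmp", "raw/2023-*"])
print(logical_paths)
```

The resulting list can then be passed to `process_files` in place of a hand-written one. Printing it first acts as a dry run, which is worth doing because the deletions are applied to the branch immediately.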