How can I remove and delete files from a dataset by filename?

I have a dataset with files in it. I want to remove some of the files within that dataset, and keep the remaining files in that dataset. Is there a way to remove the data from within a dataset by filename?

You can achieve this with a DELETE transaction, which you can perform either by API call (https://www.palantir.com/docs/foundry/api/v2/datasets-v2-resources/files/delete-file/) or from the UI (there is a trash can icon next to each file in the “Files” section of the dataset “Details” tab).

Note that performing a DELETE transaction in this way will cause issues for downstream incremental transforms (e.g., for the effect on downstream Python incremental transforms, see https://www.palantir.com/docs/foundry/transforms-python/incremental-reference#append-only-input-changes). It will likewise cause issues if the dataset itself is updated incrementally, whether via Data Connection or transforms.

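import requests
from urllib.parse import quote

def delete_file(stack_url, dataset_rid, file_path, headers, branch_name='master'):
    """Delete one file from a dataset by its logical path on the given branch."""
    # Percent-encode the path (including "/") so nested paths and special
    # characters are sent as a single URL path segment.
    encoded_path = quote(file_path, safe="")
    url = f"https://{stack_url}/api/v2/datasets/{dataset_rid}/files/{encoded_path}"
    params = {"branchName": branch_name}
    response = requests.delete(url, headers=headers, params=params)
    # Raise requests.HTTPError on any 4xx/5xx response.
    response.raise_for_status()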

def process_files(stack_url, dataset_rid, headers, logical_paths):
    total_success = 0
    failed_files = []

    for file_path in logical_paths:
        try:
            delete_file(stack_url, dataset_rid, file_path, headers)
            print(f"✅ Successfully deleted file: {file_path}")
            total_success += 1
        except requests.HTTPError as e:
            print(f"❌ Error deleting file {file_path}: {e}")
            failed_files.append(file_path)

    print_summary(total_success, failed_files)

def print_summary(total_success, failed_files):
    print("\n📊 Summary:")
    print(f"✅ Total files successfully deleted: {total_success}")
    print(f"❌ Total errors encountered: {len(failed_files)}")
    if failed_files:
        print("❌ Failed files:")
        for file in failed_files:
            print(f"   - {file}")

STACK_URL = "<INSERT_STACK_URL>"  # hostname only, without "https://"
DATASET_RID = "<INSERT_DATASET_RID>"
HEADERS = {
    'authorization': 'Bearer <INSERT_TOKEN>',
    'content-type': 'application/json',
}
logical_paths = [
    "file_logical_path.txt",
]

process_files(STACK_URL, DATASET_RID, HEADERS, logical_paths)
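
Since the question is about picking files by filename, one possible way to build the `logical_paths` list is to filter a list of known file paths against filename patterns. This is only a minimal sketch: `select_paths` is a hypothetical helper (not part of any Foundry library), `all_paths` is assumed to have been collected beforehand, and the sample paths are made up for illustration.

```python
from fnmatch import fnmatchcase
from posixpath import basename

def select_paths(all_paths, patterns):
    """Return the paths whose filename (last path segment) matches any pattern."""
    return [
        path for path in all_paths
        if any(fnmatchcase(basename(path), pattern) for pattern in patterns)
    ]

# Made-up example paths, for illustration only.
all_paths = [
    "raw/2023/report.csv",
    "raw/2023/report.tmp",
    "raw/2024/notes.txt",
    "scratch.tmp",
]

# Select every *.tmp file plus any file named exactly notes.txt.
logical_paths = select_paths(all_paths, ["*.tmp", "notes.txt"])
print(logical_paths)
```

The resulting list can be passed to `process_files` from the script above. It is probably worth printing and reviewing the list first, since every path in it will be sent for deletion.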