Currently I have two big datasets of around 10 TB each:
- One of them is historical data that we don't process anymore, though we keep all of its downstream datasets.
- The second runs incrementally: every day we append a new day of data.
I would like to do the following:
- The first, historical dataset I would like to archive as cheaply as possible. This data will likely never be accessed again; worst case, we delete it outright. Given its size, though, I'd like to know whether just pressing "delete" works or whether there is some other concern (a rough sketch of the cold-storage archival I have in mind is just after this list).
- The second dataset I would like to partially archive, or reduce its size somehow. In practice we never access the old data: we only read the current day's increment to aggregate downstream, and we rarely (if ever) reprocess history, so making it lighter would help (see the second sketch at the end of the post).
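
For concreteness, here is a minimal sketch of what I mean by archiving the first dataset, assuming it lives under a single S3 prefix (bucket and prefix names are hypothetical; other object stores have equivalent tiering features). A lifecycle rule pushes everything to the cheapest storage tier, and the commented-out expiration would also cover the "just press delete" case, since S3 then removes the objects for us rather than us listing and deleting millions of keys by hand:

```python
import boto3

# Hypothetical names; assumes the historical dataset sits under one S3 prefix.
BUCKET = "my-data-lake"
PREFIX = "historical/"

s3 = boto3.client("s3")

# Transition everything under PREFIX to Glacier Deep Archive (the cheapest
# S3 tier, with roughly 12-hour retrieval) as soon as the rule takes effect.
# Note: this call replaces any existing lifecycle configuration on the bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-historical",
                "Filter": {"Prefix": PREFIX},
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}],
                # "Expiration": {"Days": 365},  # uncomment if we later decide to delete
            }
        ]
    },
)
```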
Are there any ideas on how we can tackle these problems?
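
For reference, this is the rough shape of the partition-expiry idea for the second dataset, assuming Hive-style dt=YYYY-MM-DD partitions in S3 (again, all names are hypothetical). Since we only ever append, an age-based lifecycle rule like the one above would achieve the same thing without a script; this version just makes the retention window explicit:

```python
import re
from datetime import date, timedelta

import boto3

BUCKET = "my-data-lake"        # hypothetical
PREFIX = "incremental/"        # hypothetical, containing dt=YYYY-MM-DD/ partitions
RETENTION_DAYS = 90            # keep only the recent window "hot"

cutoff = date.today() - timedelta(days=RETENTION_DAYS)
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Walk the date partitions and drop anything older than the cutoff.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX, Delimiter="/"):
    for cp in page.get("CommonPrefixes", []):
        partition = cp["Prefix"]  # e.g. "incremental/dt=2023-01-31/"
        m = re.search(r"dt=(\d{4}-\d{2}-\d{2})/", partition)
        if m and date.fromisoformat(m.group(1)) < cutoff:
            # Delete every object in the expired partition, one page
            # (max 1000 keys, which matches the delete_objects limit) at a time.
            for obj_page in paginator.paginate(Bucket=BUCKET, Prefix=partition):
                keys = [{"Key": o["Key"]} for o in obj_page.get("Contents", [])]
                if keys:
                    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys})
```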