I can write a file in the FileSystem like so:
with output.filesystem().open("example.txt", "wb") as new_file:
    new_file.write(b"example")
Is it possible to DELETE a file in the output filesystem within an incremental transform?
Hello!
No, you cannot delete files from the output filesystem within an incremental transform using the standard API.
File deletion is managed at the dataset or platform level, not within individual transform code.
So, there are a few levels to discuss here:
“Can I remove the content of a file as part of an incremental transform?”
==> Yes. You can overwrite the file with zero bytes under the same name. Unless you are doing something specific (e.g. Java incremental), this adds an “UPDATE” transaction to your output in which the file is overwritten.
This does not prevent access to the old file! Because you overwrote one file with another, the old file is still accessible in the old transaction. You can reach it via the “History” tab, by reverting transactions (on a branch or on master), etc.
The goal is rather for this particular file to no longer “participate” in the current view of the dataset.
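The overwrite described above can be sketched as follows. This is only a minimal illustration, assuming the output object exposes the same filesystem().open(...) call used in the question; the helper name blank_out_file is hypothetical, and the decorator usage in the trailing comment is an assumption about how it would be wired into a transform.

```python
def blank_out_file(output, path):
    """Overwrite `path` in the output filesystem with zero bytes.

    `output` is assumed to be a transform output exposing
    filesystem().open(path, mode), as in the question above.
    """
    # Opening in "wb" mode and writing an empty payload replaces the
    # previous content in the current view. The old content stays
    # reachable through the earlier transaction.
    with output.filesystem().open(path, "wb") as f:
        f.write(b"")


# Inside an incremental transform this might be called roughly like this
# (the dataset path is hypothetical):
#
#   @incremental()
#   @transform(output=Output("/path/to/output"))
#   def compute(output):
#       blank_out_file(output, "example.txt")
```

Downstream consumers that list files will still see example.txt, just with a size of zero, so they may need to skip empty files.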
“Can I have a DELETE transaction as part of an incremental build?”
==> Here the answer is no. An incremental build will generate an APPEND, UPDATE or SNAPSHOT transaction.
To my knowledge, it is only possible to create a delete transaction programmatically (via API). See https://www.palantir.com/docs/foundry/api/v1/datasets-resources/transactions/create-transaction/
A DELETE transaction will only delete files. You can’t update/append other files.
Note: As with overwriting, the old files are still accessible; the new DELETE transaction simply marks them as “should not participate in the current view”.
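As a rough sketch of the programmatic route, the snippet below opens a DELETE transaction, removes one file in it, and commits. The endpoint paths, the branchId and transactionRid query parameters, and the "rid" response field are my assumptions based on the v1 datasets API linked above, so please check them against the docs before relying on this. The function name delete_file_via_api and the injectable send parameter are hypothetical and only there for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def _send(request):
    """Send a request and return the decoded JSON body (empty dict if none)."""
    with urlopen(request) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def delete_file_via_api(host, token, dataset_rid, file_path,
                        branch="master", send=_send):
    """Remove `file_path` from the current view via a DELETE transaction.

    Endpoint shapes are assumptions based on the v1 datasets API.
    """
    base = f"https://{host}/api/v1/datasets/{quote(dataset_rid, safe='')}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    # 1. Open a DELETE transaction on the branch.
    create = Request(
        f"{base}/transactions?branchId={quote(branch, safe='')}",
        data=json.dumps({"transactionType": "DELETE"}).encode(),
        headers=headers,
        method="POST",
    )
    txn_rid = send(create)["rid"]

    # 2. Mark the file as deleted within that transaction.
    delete = Request(
        f"{base}/files/{quote(file_path, safe='')}"
        f"?transactionRid={quote(txn_rid, safe='')}",
        headers=headers,
        method="DELETE",
    )
    send(delete)

    # 3. Commit so the file drops out of the current view.
    commit = Request(
        f"{base}/transactions/{quote(txn_rid, safe='')}/commit",
        headers=headers,
        method="POST",
    )
    send(commit)
    return txn_rid
```

Passing a custom send callable makes it possible to dry-run the sequence and inspect the three requests without touching a real dataset.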
“How can I delete my old files?”
==> Retention lets you delete and clean up old transactions, files, etc. The exact behavior depends on the configuration.
See https://www.palantir.com/docs/foundry/retention/overview
Assuming your use case is to delete files programmatically in response to some event: one idea would be to have another transform or function that calls the platform APIs to create this “DELETE” transaction programmatically.
However, be aware that a DELETE transaction will likely trigger a downstream snapshot (an UPDATE transaction), given that the current view of the dataset changed and some data “disappeared”.
There are a few threads here where workarounds are explained, e.g. manually setting transaction metadata on the DELETE transaction to fake that the transaction was made by the retention service.
The real solution will probably be the Iceberg tables within Foundry, which support sophisticated row updates and merge-into schemes. However, they clearly depart from the traditional Foundry dataset semantics.