Is it possible to control when a Retention Delete transaction is executed?
When I check the transaction history, I see that it is executed at different times depending on the day, so I would like to know what triggers it.
| date | time | transaction |
| --- | --- | --- |
| 2025/4/18 | 2:43 | Append |
| 2025/4/18 | 4:46 | Append |
| 2025/4/18 | 6:49 | Append |
| 2025/4/18 | 8:52 | Append |
| 2025/4/18 | 10:56 | Append |
| 2025/4/18 | 12:59 | Append |
| 2025/4/18 | 15:03 | Append |
| 2025/4/18 | 16:06 | Append |
| 2025/4/18 | 17:35 | Delete |
| 2025/4/18 | 18:09 | Append |
| 2025/4/18 | 20:13 | Append |
| 2025/4/18 | 22:16 | Append |
| 2025/4/19 | 0:19 | Append |
| 2025/4/19 | 2:22 | Append |
| 2025/4/19 | 4:26 | Append |
| 2025/4/19 | 6:29 | Append |
| 2025/4/19 | 8:33 | Append |
| 2025/4/19 | 10:37 | Append |
| 2025/4/19 | 10:55 | Delete |
| 2025/4/19 | 12:41 | Append |
| 2025/4/19 | 14:44 | Append |
| 2025/4/19 | 16:47 | Append |
| 2025/4/19 | 18:51 | Append |
| 2025/4/19 | 20:54 | Append |
| 2025/4/19 | 22:59 | Append |
Hey, retention runs in 24-hour cycles (mostly), and the order in which datasets are processed is not fixed. So the exact time when retention runs on a given dataset cannot be controlled. What is the reason you want this to be deterministic?
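To put a number on the "mostly" above, here is a minimal Python sketch that measures the gap between the two Delete transactions in the posted history. The rows are copied by hand from the question (only the ones around each Delete), and `delete_gaps` is a hypothetical helper name, not a Foundry API.

```python
from datetime import datetime

# (date, time, transaction) rows copied from the history in the question.
history = [
    ("2025/4/18", "16:06", "Append"),
    ("2025/4/18", "17:35", "Delete"),
    ("2025/4/18", "18:09", "Append"),
    ("2025/4/19", "10:37", "Append"),
    ("2025/4/19", "10:55", "Delete"),
    ("2025/4/19", "12:41", "Append"),
]


def delete_gaps(rows):
    """Return the time elapsed between consecutive Delete transactions."""
    deletes = [
        datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M")
        for date, time, kind in rows
        if kind == "Delete"
    ]
    return [later - earlier for earlier, later in zip(deletes, deletes[1:])]


gaps = delete_gaps(history)
print(gaps[0])  # 17:20:00
```

In this sample the two deletes are 17 hours 20 minutes apart rather than 24 hours, which matches the point that the run time on a single dataset drifts from day to day.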
Thank you for your answer. Currently, it is not necessary to control it, but I thought that if the timing of deletion was decided, it would be easier to design the execution timing of subsequent processing and calculation resources.
Hey, as @tmishra said, the timing of the transaction is nondeterministic and cannot be controlled. As for what triggers it, it is likely a retention policy set up for your dataset, either at the namespace level or as some default retention policy.
A DELETE transaction is committed to your dataset if one or more APPEND transactions in the latest view are eligible for deletion under the applicable retention policy or policies. I think the first step would be to look at which policy is actually deleting those transactions. You can look at the “custom metadata” section of one of the APPEND transactions in your history and determine which policy deleted it. The metadata should look something like:
{
  "foundry-retention": {
    "MARKED_FOR_DELETION": "2023-11-06T14:18:50.130493624Z",
    "COMPLETED_DELETION": "2023-11-13T14:30:52.303711079Z",
    "safePolicyLocator": {
      "policy": "somePolicy"
    },
    "markFlags": "[MARK_V2_DATASET_CONTEXT]"
  }
}
You can then reach out to your Palantir admin or namespace owner to identify the policy from that ID and understand it better. It is unlikely to be a default policy, since those do not delete from the latest view. Documentation for transaction selectors is available here.
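If you copy that custom metadata out of the UI, a small Python sketch like the following can pull out the policy name and the two timestamps. It assumes the metadata is available as a JSON string with the field names shown in the example above; `parse_ts` and `summarize_retention` are hypothetical helper names, and no Foundry API is called.

```python
import json
import re
from datetime import datetime, timezone

# Example metadata from this thread, pasted as a JSON string.
raw = """
{
  "foundry-retention": {
    "MARKED_FOR_DELETION": "2023-11-06T14:18:50.130493624Z",
    "COMPLETED_DELETION": "2023-11-13T14:30:52.303711079Z",
    "safePolicyLocator": {"policy": "somePolicy"},
    "markFlags": "[MARK_V2_DATASET_CONTEXT]"
  }
}
"""


def parse_ts(value):
    """Parse a UTC timestamp, trimming nanoseconds to microseconds."""
    match = re.fullmatch(r"(.*?)(?:\.(\d+))?Z", value)
    base, fraction = match.group(1), match.group(2) or "0"
    micros = fraction[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_retention(metadata):
    """Extract the policy name and the mark-to-delete delay."""
    retention = metadata["foundry-retention"]
    marked = parse_ts(retention["MARKED_FOR_DELETION"])
    completed = parse_ts(retention["COMPLETED_DELETION"])
    return {
        "policy": retention["safePolicyLocator"]["policy"],
        "marked": marked,
        "completed": completed,
        "delay": completed - marked,
    }


info = summarize_retention(json.loads(raw))
print(info["policy"])  # somePolicy
print(info["delay"])
```

The policy value is what you would hand to your admin or namespace owner; the delay between the two timestamps shows how long the transaction sat marked before it was actually deleted.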
> it would be easier to design the execution timing of subsequent processing and calculation resources.

Could you please describe this a bit more? Why is the execution timing dependent on the DELETE transaction? Do you have transactions that run for a longer time, or is the DELETE transaction interfering with or aborting your main transactions?
Thank you for your comment.
I understand that you cannot control the timing of deletion.
I also understand that you can check the policy that applies with custom metadata.
> Could you please describe this a bit more? Why is the execution timing dependent on the DELETE transaction? Do you have transactions that run for a longer time, or is the DELETE transaction interfering with or aborting your main transactions?

I didn’t give it too much thought, but I figured that since this would be run daily, it would be cheaper to run the subsequent processing on a smaller set of data (after the Delete transaction has been executed).
–
By the way, is it correct to understand that Foundry Compute is also charged for executing Delete transactions?
> By the way, is it correct to understand that Foundry Compute is also charged for executing Delete transactions?

I wouldn’t say that compute is charged for the DELETE transaction per se. It should be charged based on how much data is in your view when the downstream transform runs, which DELETE transactions can reduce.
What is the size of the files in your DELETE transactions?
I understand that computing resources are not charged for each deletion transaction.
Dozens of files (xxx.log.gz) are deleted in one deletion transaction, but each file is 0 B, so the total is also 0 B.
Then I would say your compute costs are likely not going to be affected. Overall, they depend more on the actual number and size of files in the view than on the number of transactions.
However, I am not from the compute team, so I would refer you to them if you still have concerns.
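As a rough illustration of that last point, the sketch below totals the bytes in a view before and after removing 0 B files. The file names and sizes are made up for the example, and `view_bytes` is a hypothetical helper; it is not a statement of how Foundry actually meters compute.

```python
# Hypothetical view contents: (path, size in bytes). All values are invented.
view = [
    ("data/part-0001.parquet", 120_000_000),
    ("data/part-0002.parquet", 95_000_000),
    ("logs/a.log.gz", 0),
    ("logs/b.log.gz", 0),
]

# Files removed by a hypothetical DELETE transaction (all 0 B, as in the thread).
deleted = {"logs/a.log.gz", "logs/b.log.gz"}


def view_bytes(files):
    """Total size in bytes of the files in a view."""
    return sum(size for _, size in files)


before = view_bytes(view)
after = view_bytes([f for f in view if f[0] not in deleted])
print(before - after)  # 0
```

Deleting empty files lowers the file count but leaves the number of bytes a downstream transform has to read unchanged.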