The problem I’m experiencing involves a data connection to a Google Cloud Storage database. I have had limited luck in reliably syncing new .parquet files from the database into my dataset, and have experimented with both Snapshot Syncs and Append Syncs.
I’m noticing that when I run a snapshot sync on the dataset (having cleared my GCS database), my dataset in Foundry is not emptied. The build completes with no issues, and yet when I open the dataset, the “last updated” timestamp does not reflect the run and the old files are still there. There are no filters on the snapshot sync.
My ultimate goal is to create an append sync that runs every 5 minutes against a GCS database that an external script updates every 5 minutes, but if I can’t get a simple snapshot sync to work (with zero data!), I don’t see the append sync working reliably either.
Hi @guyhartstein - sorry to hear you are running into issues with the Google Cloud Storage connector.
To ensure I understand the issue you are facing, when you try to sync files from your empty GCS database to a Foundry dataset that already contains files, you are expecting the Foundry dataset to reset to empty but it does not. Is that accurate?
If so, it’s worth clarifying that Data Connection never deletes previously ingested files. Instead, you would need to register a Delete Transaction via an API (https://www.palantir.com/docs/foundry/api/datasets-v2-resources/files/delete-file/) to remove the existing files. But in your case, it might actually be more straightforward to start with a new dataset into which to sync the GCS objects.

Another point to clarify is that if Data Connection finds no files or objects to sync based on your sync configuration, it marks the sync as successful but does not actually commit a new transaction to the dataset. You will see these “aborted” transactions in the History tab of the dataset. So effectively, the dataset is not updated, which would explain why its “last updated” timestamp isn’t changing.
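If you do go the API route, here is a rough sketch of what a request to the Delete File endpoint linked above could look like. It only builds the request and does not send it. The URL path and the `branchName` query parameter are my reading of that docs page, so please check them against it; the function name, host, RID, and token below are placeholders made up for illustration.

```python
from typing import Optional
from urllib.parse import quote
import urllib.request


def build_delete_file_request(
    host: str,
    dataset_rid: str,
    file_path: str,
    token: str,
    branch_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one file in a dataset.

    Assumption: the route is /api/v2/datasets/{datasetRid}/files/{filePath}
    with an optional branchName query parameter. Verify against the docs.
    """
    # The file path is percent-encoded so that "/" inside it is not
    # treated as a URL path separator.
    url = (
        f"https://{host}/api/v2/datasets/"
        f"{quote(dataset_rid, safe='')}/files/{quote(file_path, safe='')}"
    )
    if branch_name is not None:
        url += "?branchName=" + quote(branch_name, safe="")
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network here.
    req = build_delete_file_request(
        "example.palantirfoundry.com",
        "ri.foundry.main.dataset.placeholder",
        "exports/part-0.parquet",
        "PLACEHOLDER_TOKEN",
        branch_name="master",
    )
    print(req.get_method(), req.full_url)
```

To actually send it you would pass the request to `urllib.request.urlopen` with a real token, once per file you want removed.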
When you add new objects to the GCS database, do those sync successfully to Foundry? If not, what error or behaviour are you seeing?
This is very helpful. I was trying to zero out a dataset using that snapshot method. I managed to solve this problem by making sure every uploaded file has a distinct name, running a snapshot sync first, and only then switching to append syncs so that nothing is uploaded twice. Thank you!
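For anyone else hitting this, here is roughly how the external script could give every uploaded file a different name. This is a minimal sketch rather than my exact script: the `unique_object_name` helper, the `exports` prefix, and the timestamp-plus-random-suffix naming scheme are all just illustrative choices.

```python
from datetime import datetime, timezone
from typing import Optional
import uuid


def unique_object_name(
    prefix: str,
    now: Optional[datetime] = None,
    suffix: str = ".parquet",
) -> str:
    """Return an object name that differs on every call.

    The UTC timestamp keeps names sortable by upload time; the short
    random part avoids collisions if two uploads share the same second.
    Pass a timezone-aware datetime for `now` if you override it.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{stamp}-{uuid.uuid4().hex[:8]}{suffix}"


if __name__ == "__main__":
    # Each run of the 5-minute job would upload under a fresh name.
    print(unique_object_name("exports"))
```

The name is then used as the object name when the script uploads to GCS, so each upload lands as a new object rather than reusing an earlier name.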