Has anyone gotten the S3-compatible API layer to work with Foundry datasets?
I’ve followed this guide to set up the AWS CLI to point to Foundry, but I get a permissions error when trying to run the CLI example:
aws --profile foundry s3 ls s3://dataset-id
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: ServiceException: PERMISSION_DENIED (S3Proxy:PermissionDenied): {permissions={ri.foundry.main.dataset.dataset-id=[s3-proxy:datasets-read]}} (314b8251-af45-431f-98e1-04e0834171da
I have removed the dataset ID above, but I am passing a valid ID.
Hi @sean - can you confirm whether you are using static credentials associated with a service user or temporary credentials associated with your user account? Depending on which user we’re dealing with, what roles does that user have on the dataset you’re trying to read?
Thanks for the reply. I had missed a step: I needed to grant the service user access to the project. The CLI now works, but I'm still having an issue where DuckDB runs into a 403:
HTTP GET error on ‘https://host/io/s3/dataset-rid/spark/dataset.parquet’ (HTTP 403)
When creating static credentials, you should have set up a third-party application and configured the set of projects to which those credentials are restricted.
The dataset you are querying needs to be in one of those projects.
You should also make sure that the third-party app has client credentials enabled. This creates a service user for that third-party app. That service user needs to be granted access to the projects and to any markings on those projects. Ultimately, the service user needs to have access to the dataset.
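If the permissions above check out and DuckDB still returns a 403, it may be worth confirming that DuckDB is using the same static credentials as the working CLI profile. Below is a minimal sketch that builds the DuckDB `SET` statements. The helper name `foundry_duckdb_settings` is hypothetical, and the endpoint form `<host>/io/s3`, the path-style URLs, and the region value `foundry` are assumptions inferred from the URL in the error above and from typical S3-compatible setups, so check them against the guide you followed.

```python
def foundry_duckdb_settings(host, access_key_id, secret_access_key):
    """Build DuckDB SET statements for an S3-compatible endpoint.

    Assumptions (not confirmed in this thread): the endpoint is
    <host>/io/s3, requests use path-style URLs, and the region is "foundry".
    """
    def quote(value):
        # Double any single quotes so the value is safe in a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    settings = {
        "s3_endpoint": f"{host}/io/s3",        # host only, no https:// prefix
        "s3_url_style": "path",                # https://<endpoint>/<bucket>/<key>
        "s3_region": "foundry",                # assumed region name
        "s3_access_key_id": access_key_id,
        "s3_secret_access_key": secret_access_key,
    }
    return [f"SET {name} = {quote(value)};" for name, value in settings.items()]


if __name__ == "__main__":
    # Placeholder values; substitute the static credentials of the service user.
    for statement in foundry_duckdb_settings("host", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY"):
        print(statement)
```

Each statement can then be run in a DuckDB session with the httpfs extension loaded, before querying a path such as s3://dataset-rid/spark/dataset.parquet. If the CLI works with a given key pair and DuckDB does not, a mismatch in one of these settings is a likely place to look.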