S3 Compatible API Issue

sean · July 12, 2024, 5:23pm

Has anyone gotten the S3-compatible API layer to work with Foundry datasets?

I’ve followed this guide to set up the AWS CLI to point to Foundry, but I get a permissions error when I run the CLI example:

 aws --profile foundry s3 ls s3://dataset-id

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: ServiceException: PERMISSION_DENIED (S3Proxy:PermissionDenied): {permissions={ri.foundry.main.dataset.dataset-id=[s3-proxy:datasets-read]}} (314b8251-af45-431f-98e1-04e0834171da

I have removed the dataset ID above, but I am passing a valid ID.
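For anyone debugging the same error: the message names both the resource and the operation the caller is missing. The sketch below pulls those out so they can be compared with the roles granted on the dataset. The regular expressions are inferred from the one message quoted above, and `parse_permission_denied` is a hypothetical helper name, so treat a failed match as "unknown format" rather than "no permission problem".

```python
import re

# Patterns inferred from the single error message quoted in this thread.
# The real format may vary between Foundry versions (assumption).
_PERMISSIONS_RE = re.compile(
    r"permissions=\{(?P<resource>[^=\s{}]+)=\[(?P<ops>[^\]]*)\]\}"
)
_S3_CALL_RE = re.compile(r"when calling the (?P<call>\w+) operation")


def parse_permission_denied(message):
    """Return the S3 call, resource, and missing operations named in the error.

    Returns None if the message does not contain a permissions block.
    """
    perms = _PERMISSIONS_RE.search(message)
    if perms is None:
        return None
    call = _S3_CALL_RE.search(message)
    return {
        "s3_call": call.group("call") if call else None,
        "resource": perms.group("resource"),
        "missing": [op.strip() for op in perms.group("ops").split(",") if op.strip()],
    }


example = (
    "An error occurred (AccessDenied) when calling the ListObjectsV2 operation: "
    "ServiceException: PERMISSION_DENIED (S3Proxy:PermissionDenied): "
    "{permissions={ri.foundry.main.dataset.dataset-id=[s3-proxy:datasets-read]}}"
)
print(parse_permission_denied(example))
```

Here the missing operation is `s3-proxy:datasets-read` on the dataset resource, which suggests the calling user has no read access to that dataset.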

cdesouza · July 12, 2024, 7:20pm

Hi @sean - can you confirm whether you are using static credentials associated with a service user or temporary credentials associated with your user account? Depending on which user we’re dealing with, what roles does that user have on the dataset you’re trying to read?

sean · July 15, 2024, 9:47pm

Thanks for the reply. I missed a step: I needed to grant the service user access to the project. The CLI now works, but DuckDB is still running into a 403:

HTTP GET error on 'https://host/io/s3/dataset-rid/spark/dataset.parquet' (HTTP 403)

nicornk · July 16, 2024, 1:59pm

This works for me with DuckDB 1.0.0:

import duckdb

con = duckdb.connect()
# Credentials for the S3-compatible API (values elided)
creds = {
    'access_key': 'PLTR...',
    'secret_key': '...',
    'token': '...',
}
endpoint = "stack.palantirfoundry.com/io/s3"

con.execute(
    """
CREATE SECRET foundryConnection (
    TYPE S3,
    KEY_ID '{access_key}',
    SECRET '{secret_key}',
    SESSION_TOKEN '{token}',
    ENDPOINT '{endpoint}',
    URL_STYLE 'path',
    REGION 'foundry'
);
""".format(**creds, endpoint=endpoint)
)
# Replace with one of your own dataset RIDs; the dataset should contain Parquet files.
df = con.execute(
    "SELECT * FROM read_parquet('s3://ri.foundry.main.dataset.d5308d45-9822-4e02-afb9-2704636308ee/**/*.parquet') LIMIT 1;"
).df()

print(df.head())
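A possible variation on the snippet above, in case a credential contains a single quote or no session token is available: build the `CREATE SECRET` statement with a small helper that escapes string values and leaves out `SESSION_TOKEN` when no token is passed. The helper names (`sql_literal`, `build_foundry_secret_sql`, `dataset_parquet_glob`) are hypothetical, and whether static credentials work without a session token is an assumption, not something confirmed in this thread.

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_foundry_secret_sql(access_key, secret_key, endpoint,
                             token=None, name="foundryConnection"):
    """Build a DuckDB CREATE SECRET statement like the one in the post above.

    SESSION_TOKEN is included only when a token is given (assumption: static
    credentials do not need one).
    """
    if not name.isidentifier():
        raise ValueError(f"secret name must be a plain identifier: {name!r}")
    options = [
        ("TYPE", "S3"),
        ("KEY_ID", sql_literal(access_key)),
        ("SECRET", sql_literal(secret_key)),
    ]
    if token:
        options.append(("SESSION_TOKEN", sql_literal(token)))
    options += [
        ("ENDPOINT", sql_literal(endpoint)),
        ("URL_STYLE", sql_literal("path")),
        ("REGION", sql_literal("foundry")),
    ]
    body = ",\n".join(f"    {key} {value}" for key, value in options)
    return f"CREATE SECRET {name} (\n{body}\n);"


def dataset_parquet_glob(dataset_rid):
    """Return the s3:// glob that matches every Parquet file in a dataset."""
    return f"s3://{dataset_rid}/**/*.parquet"


# Placeholder values only; substitute your own credentials and endpoint.
print(build_foundry_secret_sql("PLTR...", "...", "stack.palantirfoundry.com/io/s3", token="..."))
print(dataset_parquet_glob("ri.foundry.main.dataset.dataset-id"))
```

The returned string can be passed to `con.execute(...)` in place of the formatted SQL above.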

sean · July 16, 2024, 3:41pm

Thank you. I’ll try using temporary credentials with the session token option.

tpowell · July 19, 2024, 3:55pm

When creating static credentials, you should have set up a third-party application and configured the set of projects to which those credentials are restricted.

The dataset you are querying needs to be in one of those projects.

You should also make sure that the third-party app has client credentials enabled. This will create a service user for that third-party app. That service user needs to be given permission on the projects and on any markings applied to those projects. Ultimately, the service user needs to have access to the dataset.