Issue with S3 Data Connection Tutorial

Hello!

I am working on the “Speedrun: Creating Your First Data Connection Tutorial” and I am encountering an issue when creating the data source.

When I get to the end of the data source creation workflow and click “explore”, I see the following:

I have double-checked that the S3 URI is correct, that the egress policy is correct, and that the AWS credentials have been entered correctly.

Under “See details”, I can see the full stack trace, which I have copied here:

The explorer command failed to run:
java.lang.Throwable:RemoteException: CUSTOM_CLIENT (ExplorerCommand:ExplorerCommandFailed) with instance ID 811d285f-307d-451a-98e1-7c3e2c666ad7: {exceptionClass=com.palantir.logsafe.exceptions.SafeRuntimeException, message=The credentials provided have insufficient rights to access the requested resource, please check that the provided credentials are valid and have the required permissions. This might be caused by the bucket not allowing request from the job's origin. For agents, you will need to allow incoming traffic from the agent host or from the proxy if you have one setup (this can be setup in the source config or in the agent advanced config via JVM properties). For direct connection, if you bucket is in the same region as the Foundry stack please allow incoming traffic from the Foundry vpc endpoint (please contact Palantir to get the right endpoint). If your bucket is in a different region please allow incoming requests from the Foundry egress IPs found in the control panel app.: {exceptionClass=com.amazonaws.services.s3.model.AmazonS3Exception, isRetryable=true, statusCode=403, errorType=Client, serviceName=Amazon S3, errorCode=AccessDenied, requestId=ZJKPK740HKB6C7XF, errorMessage=Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ZJKPK740HKB6C7XF; S3 Extended Request ID: RfawdOrF6m9Izf1u1xMje9TZq5ZQB7rvB8oWkX8WMfOMYeBv3kq52rR6RlhVF4MPe7NDu4nNSed19f4tA/AJ7Q==; Proxy: )}, stacktrace=com.palantir.magritte.plugin.s3.errors.AwsForbiddenEnhancement.enhanceException(AwsForbiddenEnhancement.java:30)
com.palantir.magritte.plugin.s3.errors.AwsExceptionEnhancer.lambda$enhanceException$0(AwsExceptionEnhancer.java:44)
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
java.util.AbstractList$RandomAccessSpliterator.tryAdvance(AbstractList.java:706)
java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:647)
com.palantir.magritte.plugin.s3.errors.AwsExceptionEnhancer.enhanceException(AwsExceptionEnhancer.java:45)
com.palantir.magritte.plugin.s3.errors.AwsExceptionEnhancer.runWithExceptionEnhancement(AwsExceptionEnhancer.java:37)
com.palantir.magritte.plugin.s3.WrappedS3Client.listObjectsV2(WrappedS3Client.java:37)
com.palantir.magritte.plugin.s3.S3Crawler$ListObjectsV2ResultIterator.computeNext(S3Crawler.java:49)
com.palantir.magritte.plugin.s3.S3Crawler$ListObjectsV2ResultIterator.computeNext(S3Crawler.java:30)
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
java.util.Iterator.forEachRemaining(Iterator.java:132)
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
com.palantir.magritte.plugin.s3.S3FileAndDirExplorer.getFileTreeNodes(S3FileAndDirExplorer.java:44)
com.palantir.magritte.plugin.s3.S3BucketReader.getFileTreeNodes(S3BucketReader.java:68)
com.palantir.magritte.plugin.s3.S3DirectSource.getFileTreeNodes(S3DirectSource.java:263)
com.palantir.magritte.source.explore.FileBasedExplorableSource$1.visitGetFileTreeNodesRequest(FileBasedExplorableSource.java:21)
com.palantir.magritte.source.explore.FileBasedExplorableSource$1.visitGetFileTreeNodesRequest(FileBasedExplorableSource.java:18)
com.palantir.magritte.explorer.api.FileBasedExplorationRequest$GetFileTreeNodesRequestWrapper.accept(FileBasedExplorationRequest.java:230)
com.palantir.magritte.explorer.api.FileBasedExplorationRequest.accept(FileBasedExplorationRequest.java:65)
com.palantir.magritte.source.explore.FileBasedExplorableSource.exploreFileBased(FileBasedExplorableSource.java:18)
com.palantir.magritte.api.Source$1.visitFileBased(Source.java:40)
com.palantir.magritte.api.Source$1.visitFileBased(Source.java:37)
com.palantir.magritte.explorer.api.ExplorationRequest$FileBasedWrapper.accept(ExplorationRequest.java:315)
com.palantir.magritte.explorer.api.ExplorationRequest.accept(ExplorationRequest.java:86)
com.palantir.magritte.api.Source.explore(Source.java:37)
com.palantir.magritte.cloud.explorer.CloudSourceExplorationResource.lambda$getExplorationResponse$12(CloudSourceExplorationResource.java:136)
com.palantir.magritte.cloud.explorer.CloudSourceExplorationResource.rethrowRuntimeExceptionsAsExplorerCommandFailures(CloudSourceExplorationResource.java:173)
com.palantir.magritte.cloud.explorer.CloudSourceExplorationResource.getExplorationResponse(CloudSourceExplorationResource.java:126)
com.palantir.magritte.cloud.explorer.CloudSourceExplorationServiceEndpoints$GetExplorationResponseEndpoint.handleRequest(CloudSourceExplorationServiceEndpoints.java:65)
com.palantir.conjure.java.undertow.runtime.ConjureExceptionHandler.handleRequest(ConjureExceptionHandler.java:42)
com.palantir.tracing.undertow.TracedStateHandler.handleRequest(TracedStateHandler.java:44)
com.palantir.conjure.java.undertow.runtime.LoggingContextHandler.handleRequest(LoggingContextHandler.java:40)
io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:852)
com.palantir.witchcraft.ActiveCountingExecutorService$TaskWrapper.run(ActiveCountingExecutorService.java:84)
com.palantir.nylon.threads.RenamingExecutorService$RenamingRunnable.run(RenamingExecutorService.java:92)
org.jboss.threads.EnhancedViewExecutor$EnhancedViewExecutorRunnable.run(EnhancedViewExecutor.java:501)
org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1521)
com.palantir.tritium.metrics.TaggedMetricsThreadFactory$InstrumentedTask.run(TaggedMetricsThreadFactory.java:94)
java.lang.Thread.run(Thread.java:840)}
com.palantir.conjure.java.dialogue.serde.DefaultClients.newRemoteException(DefaultClients.java:148)
com.palantir.conjure.java.dialogue.serde.DefaultClients.block(DefaultClients.java:123)
com.palantir.conjure.java.dialogue.serde.DefaultClients.callBlocking(DefaultClients.java:76)
com.palantir.magritte.cloud.explorer.CloudSourceExplorationServiceBlocking$1.getExplorationResponse(CloudSourceExplorationServiceBlocking.java:85)
com.palantir.magritte.coordinator.command.CloudSourceExplorationRequestMapper$1.subtype(CloudSourceExplorationRequestMapper.java:110)
com.palantir.magritte.coordinator.command.CloudSourceExplorationRequestMapper$1.subtype(CloudSourceExplorationRequestMapper.java:70)
com.palantir.magritte.bridge.command.SourceExplorationCommand.map(SourceExplorationCommand.java:50)
com.palantir.magritte.coordinator.command.CloudSourceExplorationRequestMapper.getCloudSourceExplorationResponse(CloudSourceExplorationRequestMapper.java:70)
com.palantir.magritte.coordinator.command.SourceExplorerCommandRunner.getCloudRunExplorerResponse(SourceExplorerCommandRunner.java:131)
com.palantir.magritte.coordinator.command.SourceExplorerCommandRunner.lambda$getExplorerResponseForSourceWithReadyRuntime$1(SourceExplorerCommandRunner.java:104)
com.palantir.magritte.store.source.api.RuntimePlatformResponse$VisitorBuilder$1.visitCloud(RuntimePlatformResponse.java:175)
com.palantir.magritte.store.source.api.RuntimePlatformResponse$CloudWrapper.accept(RuntimePlatformResponse.java:297)
com.palantir.magritte.store.source.api.RuntimePlatformResponse.accept(RuntimePlatformResponse.java:70)
com.palantir.magritte.coordinator.command.SourceExplorerCommandRunner.getExplorerResponseForSourceWithReadyRuntime(SourceExplorerCommandRunner.java:88)
com.palantir.magritte.coordinator.command.SourceExplorerCommandRunner.getExplorerResponseForSource(SourceExplorerCommandRunner.java:79)
com.palantir.magritte.coordinator.resources.FileBasedSourceExplorationResource.getFileTreeNodes(FileBasedSourceExplorationResource.java:51)
com.palantir.magritte.coordinator.api.FileBasedSourceExplorationServiceEndpoints$GetFileTreeNodesEndpoint.handleRequest(FileBasedSourceExplorationServiceEndpoints.java:73)
com.palantir.conjure.java.undertow.runtime.ConjureExceptionHandler.handleRequest(ConjureExceptionHandler.java:42)
com.palantir.tracing.undertow.TracedStateHandler.handleRequest(TracedStateHandler.java:44)
com.palantir.conjure.java.undertow.runtime.LoggingContextHandler.handleRequest(LoggingContextHandler.java:40)
io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:852)
com.palantir.witchcraft.ActiveCountingExecutorService$TaskWrapper.run(ActiveCountingExecutorService.java:84)
com.palantir.nylon.threads.RenamingExecutorService$RenamingRunnable.run(RenamingExecutorService.java:92)
org.jboss.threads.EnhancedViewExecutor$EnhancedViewExecutorRunnable.run(EnhancedViewExecutor.java:501)
org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2516)
org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2495)
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1521)
com.palantir.tritium.metrics.TaggedMetricsThreadFactory$InstrumentedTask.run(TaggedMetricsThreadFactory.java:94)
java.lang.Thread.run(Thread.java:840)

This feels like a bug. Any help would be greatly appreciated.

Can you post the policy of your AWS user/role that is used in the connection?
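
In the meantime, one way to sanity-check the credentials outside Foundry is to reproduce the failing ListObjectsV2 call from the stack trace with boto3. This is just a sketch; the region, key, and bucket name below are placeholders:

import boto3

# Sketch only: region, credentials, and bucket name are placeholders.
# Use the same access key/secret you entered in the source config.
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
)

# Mirrors the listObjectsV2 call that is failing in the stack trace above.
response = s3.list_objects_v2(Bucket="your-training-bucket", MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"])

If this also returns AccessDenied, the problem is with the AWS-side policy rather than anything in Foundry.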

Hi Will! Can you also confirm if you are using an AIP Developer Tier account or one created through your organization/employer? Also, can you confirm if you are still seeing this error as of this morning?

Garrett

Yes, on a dev tier account (my own, not provided to me through an organization).

And yes, I saw the issue this morning.

Thanks, Will! And were you able to successfully set up an Egress Policy as detailed in this step in the training? We’ve recently seen some errors with Egress Policies on AIP Dev Tier accounts.
https://learn.palantir.com/speedrun-data-connection/1864433

Egress policy creation seems fine, but I have yet to successfully connect with any of them.

I have also created one for google.com and tried sending a simple ping to it from within a code repo (with the google egress policy imported to that repo) with no success.

Hey @willstone,

I wanted to test out a very simple connection to google.com; @gprellberg can continue helping with the data connection tutorial. I got a very basic google.com call working. The code below uses Source-based External Transforms:

from transforms.api import transform_df, Output
from transforms.external.systems import external_systems, Source


@external_systems(
    google_com_source=Source("ri.magritte..source.7dfe6041-c7c0-494d-bd2d-68bd94008ac1")
)
@transform_df(
    Output("ri.foundry.main.dataset.186cbf58-97aa-490d-bfcf-2f3c641986ae")
)
def compute(google_com_source, ctx):
    url = google_com_source.get_https_connection().url

    # client is a pre-configured Session object from the Python `requests` library.
    client = google_com_source.get_https_connection().get_client()
    # Send a GET request to the URL
    response = client.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Get the content of the response as a string
        content = response.text
        return ctx.spark_session.createDataFrame([(content,)], ["page_content"])
    else:
        # An explicit schema is required here: Spark cannot infer one from an empty list
        return ctx.spark_session.createDataFrame([], "page_content string")

This simply dumps the google.com page into a single column, and it works. One thing I noticed that you may be running into is that requests to google.com are automatically redirected to www.google.com, so you need to make sure to add that host to the egress policies on the source. Without it, the error I saw was like the one in the screenshot below:
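
If you want to confirm the redirect is what you’re hitting, a quick check with the same source setup as above (just a sketch) is to disable redirect-following and inspect the response:

def check_redirect(google_com_source):
    conn = google_com_source.get_https_connection()
    client = conn.get_client()  # pre-configured requests.Session

    # requests follows redirects by default; disabling that surfaces the 301/302
    response = client.get(conn.url, allow_redirects=False)
    if response.is_redirect:
        # The Location header (e.g. https://www.google.com/) is the host that
        # also needs to be covered by an egress policy on the source.
        print(response.status_code, response.headers.get("Location"))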

I did face the same issue when I set this up the first time, and this particular step is not captured in the speedrun.

I’m not sure whether this adds any value in this particular context, but you may need to take a look at “S3 bucket policies”.

“In addition to adding a bucket policy, a valid network egress policy must still be created and attached to the relevant workloads in order to successfully connect to S3.”
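
If it helps, here is a rough sketch of that kind of bucket policy applied with boto3. The bucket name and the 203.0.113.0/24 range are placeholders; the real Foundry egress IPs are listed in the Control Panel app:

import json

import boto3

# Placeholders throughout: the bucket name and IP range stand in for your
# bucket and the real Foundry egress IPs from the Control Panel app.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowFoundryEgressIps",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::your-training-bucket",
                "arn:aws:s3:::your-training-bucket/*",
            ],
            "Condition": {"IpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="your-training-bucket", Policy=json.dumps(policy)
)

With something like that in place, the ListObjectsV2 call coming from the Foundry egress IPs should stop returning 403.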

Thanks, Raj!

I’ve added an S3 bucket policy:

Though when I go to “explore source” in the Data Connection application, I am still getting the same issue:

Thanks for this, @malbin; it gets me closer.

Forgive my ignorance, but suppose I wanted to run this code using the @function decorator from the Functions API so I could call it from a Workshop application or something like that: how might I go about doing that?

You may need to have an access point on S3.

@willstone
In which region is your Foundry stack? If your bucket is in the same AWS region, you need both an egress policy and a bucket policy.

Hey @willstone, we actually discovered this is an issue on our side with allowing some of the new AIP Dev Tier stacks access to the data sources for the Datasource speedrun. Thank you for flagging, and apologies for the issue. We’re working on fixing it.

To add on to the question about calling google.com from Functions rather than pipelines: currently the best way to call external services from Functions is to go through a webhook, as per https://www.palantir.com/docs/foundry/data-integration/external-functions/. This is a bit involved. Soon (on the order of a month or two) you will be able to import sources directly into Functions repositories and perform external calls directly in code.

Hi @willstone, we’ve resolved an issue with the S3 bucket in the Data Connection training on AIP Dev Tier accounts. I’ve tested and am now seeing the connection work properly. Are you able to test again on your instance? Hopefully this unblocks your work!

Best,
Garrett
