Resource Limitation Issue When Running Pipeline on Ingested Dataset

I’m encountering a resource limitation issue when attempting to run a pipeline on a dataset I’ve ingested into Foundry. The dataset is approximately 4GB in size (~30 million rows). I was previously able to process it successfully, but now I’m hitting resource constraints.

To troubleshoot, I tried reducing the dataset size in the first transform by selecting only the more recent records (~10 million rows), but I still hit the same error. Below are the job error details:

Job Error:

{
    "errorCode": "INTERNAL",
    "errorName": "ModuleExitReason:RequestedResourcesExceedResourceQueueCapacity",
    "errorInstanceId": null,
    "safeArgs": {
        "sparkModuleId": "bac9183d-87ee-41d7-857e-0883dfb8ea49",
        "exitReason": "REQUESTED_RESOURCES_EXCEED_RESOURCE_QUEUE_CAPACITY",
        "exitMessage": "Module is requesting more compute resources than can ever be provided by the limits of the Resource Queue. Consider reducing the resource requirements or increasing the Resource Queue Limits."
    },
    "unsafeArgs": {}
}

Error Message:

Module died. Exit reason: REQUESTED_RESOURCES_EXCEED_RESOURCE_QUEUE_CAPACITY.

Expanded error details:
The driver running the job crashed, ran out of memory, was terminated, or otherwise became unresponsive while it was running. Try rebuilding, and if the problem persists, see logs for more information to confirm if either the driver or executor, or both, ran out of memory and try increasing driver and executor memory accordingly. The exit reason for the crash was REQUESTED_RESOURCES_EXCEED_RESOURCE_QUEUE_CAPACITY.

From the message, it appears that the pipeline is requesting more resources than the Resource Queue can provide. I’d appreciate any advice on how to resolve this issue. Specifically:
1. Are there ways to further optimize the pipeline or reduce resource requirements?
2. Is it possible to adjust the Resource Queue limits, and if so, how can this be done?
3. Could recent changes in cluster configuration or limits be affecting this, and is there a way to check?
4. Are there specific logs or configurations I should inspect to better understand the issue?

Thanks in advance for your help!

Tommy

Resource queues are managed in the Resource Management application: https://www.palantir.com/docs/foundry/resource-management/resource-queues

You might already have access, or you might need to ask an administrator of your Foundry enrollment to take a look there.

The problem is likely not specific to the size of your dataset, but rather to the resources available versus the resources requested for your build. In other words, you can have a 1 MB dataset and still hit this issue, because the check happens when the build starts and requests resources, before any data is read.
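To make that concrete, here is a small, purely illustrative sketch (all names and numbers are made up, not Foundry internals) of the kind of admission check the queue performs: it compares the total resources the module *requests* against the queue's hard limits, so the size of the data never enters into it.

```python
# Hypothetical illustration: the failing check happens at build start,
# comparing requested resources against queue limits -- dataset size
# is never part of the comparison.

def total_request(driver_cores, driver_mem_gb, executor_cores,
                  executor_mem_gb, num_executors):
    """Total vCPU and memory a Spark module asks for at start-up."""
    cores = driver_cores + executor_cores * num_executors
    mem_gb = driver_mem_gb + executor_mem_gb * num_executors
    return cores, mem_gb

def fits_in_queue(request, queue_limit):
    """True only if the request can *ever* be satisfied by the queue."""
    cores, mem_gb = request
    max_cores, max_mem_gb = queue_limit
    return cores <= max_cores and mem_gb <= max_mem_gb

# Example: 1 driver (2 cores / 8 GB) + 16 executors (4 cores / 16 GB each)
request = total_request(2, 8, 4, 16, 16)   # -> (66 cores, 264 GB)
queue = (64, 256)                          # hypothetical queue limits

print(fits_in_queue(request, queue))       # False: request can never fit
```

Dropping to 4 executors in this toy example (18 cores, 72 GB) would fit the same queue, which is why reducing the requested resources, rather than the data, is the lever here.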

If you use Pipeline Builder, you can configure the resources requested by your build here: https://www.palantir.com/docs/foundry/pipeline-builder/management-build-settings
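If the dataset is instead built from a Code Repository, the requested resources are controlled with Spark profiles on the transform. A minimal sketch, assuming the Python transforms API, with hypothetical profile names and paths (the profiles actually available must be imported in your repository settings and vary by enrollment):

```python
from transforms.api import configure, transform_df, Input, Output

# Spark profiles control the resources the build *requests*; smaller
# profiles mean a smaller request, which must fit within the Resource
# Queue limits. Profile names below are examples only.
@configure(profile=["EXECUTOR_MEMORY_MEDIUM", "NUM_EXECUTORS_4"])
@transform_df(
    Output("/path/to/output_dataset"),        # hypothetical paths
    source=Input("/path/to/ingested_dataset"),
)
def compute(source):
    # Filter early so downstream stages shuffle less data
    return source.filter(source["ingest_date"] >= "2024-01-01")
```

This is a configuration sketch, not runnable outside Foundry; the point is that the profiles, not the transform logic, determine what the build asks the queue for.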

However, assuming you haven’t changed or bumped the requested resources in the first place, it would be very surprising to hit this limit. I would therefore encourage you to contact support (for example via Foundry Issues) to check whether there is an ongoing issue on your specific enrollment.

