External transform fails to build

Hi,

I’m trying to build an external transform in Code Repositories, but the build fails. I followed the external transforms documentation to set it up, and then I used the Pipelines template in Code Repositories to create the repository where my transform resides. The checks and build initialization succeed, but my function fails with this error:

Failed to submit job to worker: TransformsWorker:SparkModuleFailedToSubmitToResourceQueue {name=40bed26b-e2c4-460d-9d6c-d16ecfd50414, type=python-1, message=Optional.empty, resourceQueue=ri.resource-policy-manager.global.resource-queue.ef296b60-d589-42e1-9364-13a8dfc0e1c8} (2ce95bc8-c13c-44bc-b6b8-c64b2d9e4b22)

Does anyone know what the problem is here? Thanks in advance!

Hi @jessica.lin ! Can you share your code?

Hi yes here it is below thanks in advance @mai125 !

from transforms.api import transform, Output
from transforms.external.systems import use_external_systems, EgressPolicy
from functions.sources import get_source
import json


@use_external_systems(
    egress=EgressPolicy('ri.magritte..source.c56607c4-e6ba-47e4-a558-d5fe7e6d2b93')
)
@transform(output=Output("ri.foundry.main.dataset.6f74a845-94a7-486a-b244-b5dae11d03ba"))
def create_dataset(egress, output):
    # Define the fixed base URL for the CVE API
    base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0"

    # Access the client from the provided data source
    source = get_source("NvdCve")
    client = source.get_https_connection().get_client()

    # Initialize variables for pagination and results
    page_size = 2000  # Maximum results per page
    start_index = 0
    all_cves = []
    has_more_results = True

    while has_more_results:
        # Build query parameters for the API call
        params = {
            "startIndex": start_index,
            "resultsPerPage": page_size  # Adjust if needed by the API spec
        }

        # Make the API request
        response = client.get(base_url, params=params)
        response.raise_for_status()  # Raise exception for HTTP errors
        data = json.loads(response.text)

        # Extract vulnerabilities
        vulnerabilities = data.get("vulnerabilities", [])
        if not vulnerabilities:
            break  # Stop if no vulnerabilities are returned

        # Append all vulnerabilities to the list
        all_cves.extend(vulnerabilities)

        # Update pagination variables
        start_index += page_size
        has_more_results = start_index < data.get("totalResults", 0)

    # with output.filesystem().open('cve_data.json', 'w') as f:
    #     json.dump(all_cves, f, indent=2)
    return output.write_text(json.dumps(all_cves))

A couple of things in the create_dataset compute function that may cause the above error:

  • return output.write_text(json.dumps(all_cves)) → output.write_dataframe(all_cves) (a write to a dataset does not need to be returned)
  • output.write_dataframe([this needs to be a PySpark DataFrame]) → I would convert all_cves to a PySpark DataFrame
  • double-checking that the while loop is not an infinite loop
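
One way to rule out the infinite-loop concern is to cap the number of pages the loop may request. The following is only a sketch: fetch_page and fake_fetch_page are hypothetical stand-ins for the real client call, and the response shape ("vulnerabilities" and "totalResults") is taken from the code posted above.

```python
def fetch_all_pages(fetch_page, page_size=2000, max_pages=1000):
    """Collect records from a paginated API with a hard cap on iterations.

    fetch_page(start_index, page_size) is a hypothetical callable returning a
    dict with a "vulnerabilities" list and a "totalResults" count.
    """
    records = []
    start_index = 0
    for _ in range(max_pages):
        data = fetch_page(start_index, page_size)
        batch = data.get("vulnerabilities", [])
        if not batch:
            break  # Nothing returned, so there is nothing more to fetch
        records.extend(batch)
        start_index += page_size
        if start_index >= data.get("totalResults", 0):
            break  # All reported results have been requested
    else:
        # The loop used every allowed page without reaching a stop condition
        raise RuntimeError(f"Stopped after {max_pages} pages; check the pagination logic")
    return records


def fake_fetch_page(start_index, page_size):
    """Hypothetical stand-in for the HTTP call: serves 5 fake records."""
    total = 5
    end = min(start_index + page_size, total)
    items = [{"cve": {"id": f"CVE-TEST-{i}"}} for i in range(start_index, end)]
    return {"vulnerabilities": items, "totalResults": total}


result = fetch_all_pages(fake_fetch_page, page_size=2)
print(len(result))
```

With a cap in place, a pagination bug surfaces as a clear exception instead of a build that never finishes.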

Using Preview can help iterate faster through the errors

Let me know if you get stuck!

Hi @mai125 will be sure to try those things! I also wanted to note in case it’s worth mentioning that when I change the body of the function to just be a return statement, create_dataset still fails to build with the same error log message. Do you still think it’s the case that the error is related to the body of the function?

It seems you are not passing any Source into your compute function, and you are passing a source RID into the EgressPolicy constructor.

Maybe start fresh from the docs here:
https://palantir.com/docs/foundry/data-integration/external-transforms-source-based/

Because you are using the @transform() decorator, you won’t need to return anything, just make sure you write the dataframe in the output: output.write_dataframe(all_cves)
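
Since all_cves in the posted code is a Python list rather than a DataFrame, it needs converting before it can be written. A minimal sketch of one option is to serialize each record to a JSON string, giving a single string column. The to_rows helper and the raw_json column name are hypothetical, and the Spark calls appear only as comments because they assume a ctx parameter on the compute function and were not run here.

```python
import json


def to_rows(records):
    """Turn a list of dicts into one-column rows of JSON strings."""
    return [(json.dumps(record, sort_keys=True),) for record in records]


# Sample records shaped like the "vulnerabilities" entries (values are made up)
all_cves = [{"cve": {"id": "CVE-TEST-1"}}, {"cve": {"id": "CVE-TEST-2"}}]
rows = to_rows(all_cves)

# Inside a Foundry transform (assumption, not executed in this sketch):
#   df = ctx.spark_session.createDataFrame(rows, ["raw_json"])
#   output.write_dataframe(df)
print(rows[0][0])
```

Keeping the records as raw JSON strings sidesteps schema inference on nested fields; parsing them into columns can then happen in a downstream transform.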

Hi @mai125 thanks for the advice! Update I’ve followed your suggestions but am still getting the same error. Even if the body of the function contains a single line of code like ‘test_var = 1’ I still get the same error, which makes me think that the issue may be related to configuration of resources rather than the code content? What do you think?

Can you share your code? Does it work in preview?

Hi @mai125, update: it works now! The issue was with the policy RID that I was using as the argument for egress in the @use_external_systems decorator; a combination of that and your suggestions made my build run successfully. Thank you so much!
