REST API connection - PySpark + RDD in Foundry?

Hello,

Is something like this possible in Foundry, so I don't have to loop through a JSON object?

import requests
import pyspark.sql.functions as F

# Fetch the JSON payload from the REST API
req = requests.get(url, headers={'Accept': 'application/json', 'Authorization': 'Basic aaa'})
columns = []
gettots = req.json()

# Parallelize the raw response string and read it straight into a DataFrame
req2 = req.content.decode('utf-8')
rdd = sc.parallelize([req2])
jsonDF = spark.read.json(rdd)

# Explode the nested "issues" array into one row per element
df_expl = jsonDF.withColumn("explodedarray1", F.explode(jsonDF.issues))
columns.append(df_expl)

Yes, this is possible in Transforms Python. Here is a minimal example that covers the use of SparkContext.parallelize and SparkSession.read.json:

from transforms.api import transform_df, Output


@transform_df(
    Output("<output_path_or_rid>"),
)
def compute(ctx):
    spark_session = ctx.spark_session
    # In-memory JSON strings standing in for an API response
    json_strings = [
        '{"name": "A", "age": 5}',
        '{"name": "B", "age": 7}'
    ]
    # Parallelize the strings into an RDD, then let Spark infer the schema
    string_rdd = spark_session.sparkContext.parallelize(json_strings)
    return spark_session.read.json(string_rdd)
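
If your payload nests the records under an array, the same pattern extends to the explode step from your original snippet. Here is a minimal sketch, assuming a top-level "issues" array; the field names and payload are illustrative, not from a real API:

from transforms.api import transform_df, Output
import pyspark.sql.functions as F


@transform_df(
    Output("<output_path_or_rid>"),
)
def compute(ctx):
    # A payload shaped like the one in the question: a top-level "issues" array
    json_strings = ['{"issues": [{"id": 1}, {"id": 2}]}']
    string_rdd = ctx.spark_session.sparkContext.parallelize(json_strings)
    json_df = ctx.spark_session.read.json(string_rdd)
    # explode yields one row per array element, so no Python-side looping is needed
    return json_df.withColumn("explodedarray1", F.explode(json_df.issues))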

Regarding the special ctx parameter, see https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api/#parameters-3. For details on how to make an external API call from a transform, see https://www.palantir.com/docs/foundry/data-integration/external-transforms/ or https://www.palantir.com/docs/foundry/data-integration/external-transforms-source-based/.
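
Putting the pieces together, here is a hedged end-to-end sketch. It assumes a repository already configured for external network access per the external transforms docs linked above; the endpoint URL and headers are placeholders, not a real API:

import requests
from transforms.api import transform_df, Output


@transform_df(
    Output("<output_path_or_rid>"),
)
def compute(ctx):
    # Placeholder endpoint and credentials; network egress must be set up
    # per the external transforms documentation linked above
    url = "https://example.com/api/issues"
    response = requests.get(url, headers={"Accept": "application/json"})
    response.raise_for_status()

    # Same parallelize + read.json pattern as the minimal example above
    rdd = ctx.spark_session.sparkContext.parallelize([response.content.decode("utf-8")])
    return ctx.spark_session.read.json(rdd)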


Thx!! This works nicely