Based on the Spark documentation (https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.mllib.evaluation.RegressionMetrics.html), meanAbsoluteError should return a single value, which I should then be able to put into a DataFrame (see the code below and the screenshot). But I'm getting the error shown in the screenshot. Any idea why this is happening?
from pyspark.mllib.evaluation import RegressionMetrics
from pyspark.sql import types as T
from transforms.api import transform, Input, Output


@transform(
    mae_output=Output("ri.foundry.main.dataset.496ab2ea-6f7f-4caf-b0de-3f803726d21c"),
    metrics=Input("ri.foundry.main.dataset.9d456872-1e36-40dd-9f0f-599ffe40b12f"),
)
def compute(ctx, metrics, mae_output):
    # metrics = metrics.dataframe()
    predictionAndObservations = ctx.spark_session.sparkContext.parallelize(
        [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
    )
    metrics2 = RegressionMetrics(predictionAndObservations)
    schema = T.StructType([
        T.StructField("MAE_metric", T.IntegerType(), True)
    ])
    # This is the line that raises the error in the screenshot:
    MAE_df = ctx.spark_session.createDataFrame(data=int(metrics2.meanAbsoluteError), schema=schema)
    mae_output.write_dataframe(MAE_df)
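As a sanity check, the MAE over those same (prediction, observation) pairs can be computed by hand in plain Python (no Spark needed), which confirms the metric is a single float rather than an integer:

```python
# Hand-computed mean absolute error over the same pairs passed to
# RegressionMetrics above, to confirm what meanAbsoluteError returns:
# one float value.
pairs = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
mae = sum(abs(pred - obs) for pred, obs in pairs) / len(pairs)
print(mae)  # 0.5
```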