Hi,
I am following the code repository guide on how to transform data, but I am stuck on the output part. I follow the instructions, yet I think something goes wrong at this step:
"Extend the path by a /claims. This will be your newly created dataset originating from this transform."
As of right now I can't preview anything, and I just keep getting vague errors.
The correct code looks like this:
from pyspark.sql import functions as F
from pyspark.sql import types as T
from transforms.api import transform_df, Input, Output
@transform_df(
    Output("ri.foundry.main.dataset.610357e0-43ee-481d-8e94-7140b8d9b2c4"),
    claims_raw=Input("ri.foundry.main.dataset.b5e0d3a7-edfb-4ee2-b898-0d4084c57ee5"),
)
def compute(claims_raw):
    # Cast the "date" column from string to date
    claims = claims_raw.withColumn("date", F.regexp_replace("date", "###", "").cast(T.DateType()))
    # Filter out declined claims
    claims = claims.filter(F.col("is_accepted") == True)
    return claims
My code looks like this:
from pyspark.sql import functions as F
from pyspark.sql import types as T
from transforms.api import transform_df, Input, Output
@transform_df(
    Output("/Kuusko-7c629b/learning kuusko/Code Repo Training/data/prepared/claims"),
    claims_raw=Input("ri.foundry.main.dataset.9783569a-ea28-4515-b34d-a1ceb9ad1f5a"),
)
def compute(claims_raw):
    # Cast the "date" column from string to date
    claims = claims_raw.withColumn("date", F.regexp_replace(F.col("date"), "###", "").cast(T.DateType()))
    # Filter out declined claims
    claims = claims.filter(F.col("is_accepted"))
    return claims
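For what it's worth, the date-cleaning step itself can be sanity-checked outside Spark. Here is a minimal plain-Python sketch of the same logic (it assumes the raw values look like "###2023-05-17" with the "###" noise from the guide's example; the sample value and function name are just illustrative):

```python
import re
from datetime import date, datetime

def clean_date(raw: str) -> date:
    # Mirror of the transform's logic in plain Python:
    # strip the "###" noise, then parse the remainder as an ISO date.
    cleaned = re.sub(r"###", "", raw)
    return datetime.strptime(cleaned, "%Y-%m-%d").date()

print(clean_date("###2023-05-17"))  # -> 2023-05-17
```

If this works on a sample of your raw strings but the transform still fails, the problem is more likely in the Output path or imports than in the column logic.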
See the attached screenshot as well.
Any help would be greatly appreciated!
Regards,
