How to add a last modified date in a code repo?

Hello,

I would like to have a last update date column in my dataset.

I don’t know if this is possible. Thanks

Hello Elodie,

I’m not sure I fully understand your question, but if you want to capture the date of the dataset build, you can add a constant column set to the current date. In PySpark, the snippet would be something like:

from pyspark.sql import functions as F

df = df.withColumn("current_date", F.current_date())

Or if you need the full timestamp:

df = df.withColumn("current_timestamp", F.current_timestamp())

I don’t know why you need that column, but keep in mind that it will increase the storage footprint of your dataset, because the same value is written to every row :slight_smile:
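
To give a rough idea of the size of that overhead, here is a back-of-the-envelope sketch in plain Python. The byte widths (4 bytes per date, 8 bytes per timestamp) and the row count are assumptions for illustration, and the helper names are hypothetical. The result is an uncompressed upper bound: a columnar file format with compression will typically store a constant column in far less space.

```python
# Rough, uncompressed estimate of the space a constant column adds.
# Assumptions (not measured on any real dataset):
#   - a date value occupies 4 bytes
#   - a timestamp value occupies 8 bytes
DATE_BYTES = 4
TIMESTAMP_BYTES = 8


def extra_bytes(row_count: int, bytes_per_value: int) -> int:
    """Return the uncompressed size of one fixed-width column."""
    if row_count < 0 or bytes_per_value < 0:
        raise ValueError("row_count and bytes_per_value must be non-negative")
    return row_count * bytes_per_value


def to_mib(byte_count: int) -> float:
    """Convert a byte count to mebibytes."""
    return byte_count / (1024 * 1024)


if __name__ == "__main__":
    rows = 100_000_000  # hypothetical dataset size
    for label, width in (("date", DATE_BYTES), ("timestamp", TIMESTAMP_BYTES)):
        size = extra_bytes(rows, width)
        print(f"{label}: about {to_mib(size):.1f} MiB uncompressed")
```

So the cost grows linearly with the number of rows, and the full timestamp costs about twice as much as the date under these assumptions.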

Hope that helps,

Cheers

Nicolas