Hello all,
I am new to this community. I need to build a slowly changing dimension (SCD) for a few datasets. I am trying to create an empty final dataset so that my pipeline can read it and calculate the delta against the new records coming from another dataset. Can anyone point me to how to create an empty dataset and read it from a code repository? I tried creating a file containing only the column names and got an error. Then I tried reading from a dataset with no files and got a different error. Is there a way around this?
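A plain-Python sketch of the usual bootstrap idea: if the previous version of the dimension does not exist yet or has no rows, treat it as an empty table with a known column list, so the first run simply loads everything as new. A local CSV file stands in for the platform dataset here; `read_previous_dim` and `DIM_COLUMNS` are hypothetical names, and this does not reproduce the platform's own dataset API or the errors described above.

```python
import csv
import os

# Hypothetical column list for the final dimension (taken from the example below).
DIM_COLUMNS = ["ID", "Name", "PostCode", "valid_from", "valid_to", "created_datetime"]


def read_previous_dim(path):
    """Return the previous dimension rows, or [] if there is nothing to read yet.

    A missing file, an empty file, or a header-only file all count as
    "no previous version", which is the bootstrap case for the first run.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Fail loudly if the stored header does not match the expected schema.
        if reader.fieldnames != DIM_COLUMNS:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        return list(reader)
```

The point of the design is that "empty" is a normal state the code expects, rather than an error to work around.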
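Here is an example of the slowly changing dimension I mean (SCD Type 2). A slowly changing dimension keeps the history of each record, so each fact can be matched to the version of the dimension that was valid on the fact's date. Given this source data: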
ID,Name,PostCode
1,Srinivas,London
2,Shankar,Chennai
This will be loaded as
ID,Name,PostCode,valid_from,valid_to,created_datetime
1,Srinivas,London,29/04/2025,31/12/9999,29/04/2025 00:00:00
2,Shankar,Chennai,29/04/2025,31/12/9999,29/04/2025 00:00:00
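The initial load above amounts to stamping each source row with validity columns. A small plain-Python sketch, assuming dd/mm/yyyy strings as in the example and 31/12/9999 as the open-ended valid_to; `initial_load` is a hypothetical helper, not a platform API:

```python
from datetime import datetime

OPEN_END = "31/12/9999"  # sentinel valid_to for the current version of a row


def initial_load(rows, load_time):
    """Stamp every source row as a new, currently valid dimension version."""
    load_date = load_time.strftime("%d/%m/%Y")
    created = load_time.strftime("%d/%m/%Y %H:%M:%S")
    return [
        {**row, "valid_from": load_date, "valid_to": OPEN_END, "created_datetime": created}
        for row in rows
    ]


source = [
    {"ID": "1", "Name": "Srinivas", "PostCode": "London"},
    {"ID": "2", "Name": "Shankar", "PostCode": "Chennai"},
]
# Prints the two loaded rows in the same layout as the example above.
for r in initial_load(source, datetime(2025, 4, 29)):
    print(",".join(r.values()))
```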
If one of the records is updated, for example:
ID,Name,PostCode
1,Srinivas,Birmingham
2,Shankar,Chennai
This will be loaded as
ID,Name,PostCode,valid_from,valid_to,created_datetime
1,Srinivas,London,29/04/2025,03/05/2025,29/04/2025 00:00:00
1,Srinivas,Birmingham,03/05/2025,31/12/9999,03/05/2025 00:00:00
2,Shankar,Chennai,29/04/2025,31/12/9999,29/04/2025 00:00:00
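A minimal plain-Python sketch of the Type 2 update step shown above, assuming rows are dicts keyed by column name and that a change in `Name` or `PostCode` should create a new version. `TRACKED`, `OPEN_END` and `apply_updates` are hypothetical names, not a platform API, and deletes are not handled in this sketch.

```python
from datetime import datetime

OPEN_END = "31/12/9999"  # valid_to of the current version
TRACKED = ("Name", "PostCode")  # attributes whose change creates a new version


def apply_updates(dim, incoming, load_time):
    """Type 2 merge for new and changed records (deletes not handled here).

    dim: existing dimension rows (history plus current versions).
    incoming: latest source rows, one per ID.
    """
    load_date = load_time.strftime("%d/%m/%Y")
    created = load_time.strftime("%d/%m/%Y %H:%M:%S")
    current = {r["ID"]: r for r in dim if r["valid_to"] == OPEN_END}
    result = [dict(r) for r in dim]  # copy so the input is not mutated
    closed_ids = set()
    new_rows = []
    for src in incoming:
        old = current.get(src["ID"])
        if old is not None and all(old[c] == src[c] for c in TRACKED):
            continue  # unchanged: keep the current version as-is
        if old is not None:
            closed_ids.add(src["ID"])  # changed: the old version must be closed
        new_rows.append({**src, "valid_from": load_date, "valid_to": OPEN_END,
                         "created_datetime": created})
    for r in result:
        if r["ID"] in closed_ids and r["valid_to"] == OPEN_END:
            r["valid_to"] = load_date  # end the old version on the load date
    return result + new_rows


dim = [
    {"ID": "1", "Name": "Srinivas", "PostCode": "London", "valid_from": "29/04/2025",
     "valid_to": OPEN_END, "created_datetime": "29/04/2025 00:00:00"},
    {"ID": "2", "Name": "Shankar", "PostCode": "Chennai", "valid_from": "29/04/2025",
     "valid_to": OPEN_END, "created_datetime": "29/04/2025 00:00:00"},
]
incoming = [
    {"ID": "1", "Name": "Srinivas", "PostCode": "Birmingham"},
    {"ID": "2", "Name": "Shankar", "PostCode": "Chennai"},
]
# Sorting by ID (stable) gives the same row order as the example above.
for r in sorted(apply_updates(dim, incoming, datetime(2025, 5, 3)), key=lambda r: r["ID"]):
    print(",".join(r.values()))
```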
Deletes should be handled the same way. My second question is how to overcome the cyclical dependency: to calculate the delta, I need to read a copy of the final dimension, but the final dimension is the output of that same pipeline.
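One common way around the cycle is to not wire the final dimension in as an ordinary upstream input, but to have the job read the previous committed version of its own output as state (empty on the first run), merge the new snapshot into it, and write the result as the next version. Whether and how your platform exposes a job's previous output (for example through incremental builds) needs checking in its documentation. In this plain-Python sketch a local JSON file simulates that state; `merge_snapshot` and `run_once` are hypothetical names. The merge assumes each input is a full snapshot, so an ID missing from it is treated as deleted and its current row is closed.

```python
import json
import os
from datetime import datetime

OPEN_END = "31/12/9999"  # valid_to of the current version
TRACKED = ("Name", "PostCode")  # attributes whose change creates a new version


def merge_snapshot(previous, snapshot, load_time):
    """Type 2 merge of a full snapshot: inserts, updates and deletes."""
    load_date = load_time.strftime("%d/%m/%Y")
    created = load_time.strftime("%d/%m/%Y %H:%M:%S")
    incoming = {r["ID"]: r for r in snapshot}
    out, unchanged = [], set()
    for row in previous:
        row = dict(row)  # copy so the previous version is not mutated
        if row["valid_to"] == OPEN_END:
            src = incoming.get(row["ID"])
            if src is not None and all(row[c] == src[c] for c in TRACKED):
                unchanged.add(row["ID"])  # keep the current version open
            else:
                row["valid_to"] = load_date  # changed or deleted: close it
        out.append(row)
    for key, src in incoming.items():
        if key not in unchanged:  # new or changed IDs get a fresh open version
            out.append({**src, "valid_from": load_date, "valid_to": OPEN_END,
                        "created_datetime": created})
    return out


def run_once(state_path, snapshot, load_time):
    """One pipeline run: read own previous output, merge, write the new output."""
    previous = []
    if os.path.exists(state_path):  # first run: no previous output yet
        with open(state_path) as f:
            previous = json.load(f)
    result = merge_snapshot(previous, snapshot, load_time)
    with open(state_path, "w") as f:
        json.dump(result, f)
    return result
```

Running `run_once` first with the two original rows and then with a snapshot containing only ID 1 at Birmingham closes both old rows and opens a single Birmingham row, and no job ever depends on its own output as an input dataset.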