Deletion of records for Dataset output in CodeRepository

I specified DatasetA as input and implemented a job in CodeRepository that processes and outputs to DatasetB.

I wanted a separate job to delete records from DatasetB, so I implemented it in another CodeRepository.
However, the following error occurred in Checks, and I learned that a single dataset cannot be written to from multiple CodeRepositories.
Alias 'xxx/DatasetB' refers to a dataset that does not belong to this repository. To determine the owner of the dataset, check the job type and source provenance in the dataset job spec. If you wish to transfer ownership to this repository, remove the jobSpec from dataset, run CI on this repository and remove the transformation from the old repository.

We do not want to incorporate the deletion into the existing processing job because it would complicate that job.
Is there any way to remove specific records from the Dataset?

One dataset can only be produced by one “source” (a data connection, a code repository, a pipeline builder, etc.).

Hence you can’t have two code repositories writing to the same dataset. That is what this error message is telling you.

If your goal is to avoid mixing the two pieces of logic, you could have one code repository that outputs “which rows are to be deleted” and have the other repository simply take those rows and remove them on the next build.

A -> B (new rows) + C (rows you want to delete) -> D (rows you want)
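The "B + C -> D" step above is essentially an anti-join: keep every row of B whose key does not appear in C. Here is a minimal sketch in plain Python, assuming rows are dictionaries and that both datasets share a key column called `id` (the function name, the column name, and the sample rows are all hypothetical). In a PySpark transform the same idea is usually written as `b.join(c, on="id", how="left_anti")`.

```python
def remove_flagged_rows(new_rows, rows_to_delete, key="id"):
    """Anti-join: keep the rows of new_rows whose key is absent from rows_to_delete."""
    flagged_keys = {row[key] for row in rows_to_delete}
    return [row for row in new_rows if row[key] not in flagged_keys]


# Hypothetical sample data standing in for datasets B and C.
b = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]
c = [{"id": 2}]

# D contains everything from B except the rows listed in C.
d = remove_flagged_rows(b, c)
print(d)  # [{'id': 1, 'value': 'a'}, {'id': 3, 'value': 'c'}]
```

The repository that owns D stays the only writer of D. The other repository only owns C, so the ownership check is not violated.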

To me this sounds more complex than just adding the logic at B → D.

You might also hit cycle issues, since I assume the rows you want to delete are derived from D (D => C), which would make D depend on itself through C.

In short: adding the logic in the existing transform (B=>D) is the simplest and most direct way to achieve what you want.
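If the worry is that the deletion rule will clutter the processing code, one option is to keep the rule in its own small function (or module) and call it as the last step of the same transform. The sketch below is plain Python under assumed names: `should_delete`, `process`, `build_output`, and the `status` and `name` columns are all hypothetical stand-ins for your own logic.

```python
def should_delete(row):
    """Hypothetical deletion rule, kept separate from the processing logic."""
    return row["status"] == "obsolete"


def process(rows):
    """Stand-in for the existing processing step (here: upper-case the name)."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def build_output(input_rows):
    """Single transform: process first, then drop the rows the rule flags."""
    processed = process(input_rows)
    return [row for row in processed if not should_delete(row)]


# Hypothetical input rows.
rows = [
    {"name": "alpha", "status": "active"},
    {"name": "beta", "status": "obsolete"},
]
result = build_output(rows)
print(result)  # [{'name': 'ALPHA', 'status': 'active'}]
```

The dataset still has a single owner, and the deletion rule can be changed or tested without touching the processing code.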
