We are considering using a Sync to import files from S3 into a dataset and then processing that dataset in
CodeRepository.
If the file to be imported does not exist in S3, the dataset
is empty and the CodeRepository build fails with the following error.
>Failed to resolve dataset properties for input datasets
What should we do so that the dataset can still be loaded even when its contents are empty?
Is this an incremental build? If so, I’d recommend aborting the transaction so that the build is not marked as failed.
If not, what do you want to do here? Should the job complete successfully and return an empty df?
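For reference, the abort suggestion could look roughly like the sketch below. It assumes the `source` and `output` objects passed to an incremental transform expose `dataframe()`, `write_dataframe()` and `abort()`; the helper name `write_or_abort` and the dataset paths in the comment are hypothetical. Note that this only helps when the transform body actually runs, so it may not cover an input that cannot be resolved at all.

```python
def write_or_abort(source, output):
    """Write the new input rows, or abort the output transaction if there are none.

    Assumption: `source` and `output` behave like the input/output objects of an
    @incremental @transform function (dataframe(), write_dataframe(), abort()).
    Returns True if rows were written, False if the transaction was aborted.
    """
    new_rows = source.dataframe()
    # take(1) only fetches a single row, which is cheaper than a full count.
    if len(new_rows.take(1)) == 0:
        # Nothing new to process: abort so that no transaction is committed.
        output.abort()
        return False
    output.write_dataframe(new_rows)
    return True


# Hypothetical wiring inside a Code Repository (paths are placeholders):
#
# from transforms.api import transform, incremental, Input, Output
#
# @incremental()
# @transform(
#     output=Output("/path/to/output"),
#     source=Input("/path/to/s3_synced_dataset"),
# )
# def compute(source, output):
#     write_or_abort(source, output)
```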
The Sync that imports files from S3 uses Append, and the CodeRepository job that processes that dataset is an incremental build.
On the first run, the file to be imported may not exist yet, so the dataset is never built. Even in that case, we would like to output (build) an empty dataset so that it can be read as an input dataset in the @transform function of a subsequent job.
I believe that the issue here is that the input dataset has no transactions. You should be able to commit an initial empty transaction with the Create Transaction and Commit Transaction API endpoints. This will allow downstream transforms to run without encountering the “Failed to resolve dataset properties for input datasets” error.
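A minimal sketch of those two calls, using only the Python standard library. The URL paths (`/api/v2/datasets/{rid}/transactions` and `.../{transactionRid}/commit`), the `branchName` query parameter, the `transactionType` body field and the `rid` field in the response are assumptions based on my reading of the public API docs, so please check them against the API reference for your enrollment. The function names, the `master` default branch and the `opener` parameter are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def _post(url, token, body=None, opener=urllib.request.urlopen):
    """POST an optional JSON body with a bearer token and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def commit_empty_transaction(host, dataset_rid, token, branch="master",
                             opener=urllib.request.urlopen):
    """Open a transaction on the dataset and commit it without writing any files."""
    base = f"https://{host}/api/v2/datasets/{dataset_rid}/transactions"
    query = urllib.parse.urlencode({"branchName": branch})
    # Step 1: create the transaction. APPEND is an assumption chosen to match the
    # Append Sync described above.
    created = _post(f"{base}?{query}", token, {"transactionType": "APPEND"}, opener)
    # Step 2: commit it straight away, leaving an empty committed transaction.
    return _post(f"{base}/{created['rid']}/commit", token, opener=opener)
```

You would call it once per dataset and branch before the first downstream build, for example `commit_empty_transaction("your-stack.example.com", "<dataset rid>", token)`, where the host, RID and token are placeholders for your own values.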