FileNotFound error when reading file from dataset

Hi, I’m trying to unzip a file in a Python transform, but I’m running into a FileNotFound error when reading my input file. The input file (RMK_TIE.bcp) is pulled from an SFTP sync into a dataset (rmk_tie_bcp). Using either the file name or the dataset name as the path throws FileNotFound when I preview the transform, even though I was able to select the dataset under Input. Not sure what I’m missing here; does anyone have any thoughts?

Code to unzip file:

```python
import shutil
import tempfile


def compute(foundry_input, foundry_output, ctx):
    input_fs = foundry_input.filesystem()
    output_fs = foundry_output.filesystem()
    input_path_bcp = "RMK_TIE.bcp"
    output_path = "rmk_tie_unzipped"

    with tempfile.NamedTemporaryFile() as temp:
        try:
            with input_fs.open(input_path_bcp, 'rb') as newest:  # Line throwing error
                shutil.copyfileobj(newest, temp)
                temp.flush()

                with output_fs.open(output_path, 'wb') as out:
                    input_file = input_fs.open(foundry_input.path)
                    data = input_file.read(CHUNK_SIZE)
                    while data:
                        out.write(data)
                        data = input_file.read(CHUNK_SIZE)
        # except/finally clause not shown in the original post
```

Are you sure that RMK_TIE.bcp is the full path of the file in the input dataset (i.e., it’s not something like /a/b/c/RMK_TIE.bcp)?

A few other things that stand out to me:

  • You’re creating a temporary file, but you never read it. Where you call input_fs.open(foundry_input.path), perhaps you meant open(temp.name)? In fact, I’m pretty sure that input_fs.open(foundry_input.path) is going to throw an error, since foundry_input.path is the path of the input dataset itself, not the path to a specific file inside the input filesystem.
  • There is no code that actually unzips the file (but perhaps you just haven’t implemented that aspect yet).
  • You can just use shutil.copyfileobj(input_file, out) instead of the while loop at the end (this is just a minor note).
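A minimal sketch of the copy logic with the points above applied: stage the input in the temporary file, rewind it, and stream it to the output with `shutil.copyfileobj` instead of the manual read/write loop. It assumes `input_fs` and `output_fs` expose `open(path, mode)` returning file-like objects, as the filesystem objects in the question do; `copy_through_tempfile` is a hypothetical helper name, and no decompression happens yet.

```python
import shutil
import tempfile


def copy_through_tempfile(input_fs, output_fs, input_path, output_path):
    """Copy input_path from input_fs to output_path on output_fs via a local temp file.

    Hypothetical helper: input_fs/output_fs are assumed to provide
    open(path, mode) returning file-like objects.
    """
    with tempfile.NamedTemporaryFile() as temp:
        # Stage the input file locally (a decompression step would go here).
        with input_fs.open(input_path, "rb") as src:
            shutil.copyfileobj(src, temp)
        temp.flush()
        # Rewind and read back from the same handle instead of reopening temp.name.
        temp.seek(0)
        with output_fs.open(output_path, "wb") as out:
            shutil.copyfileobj(temp, out)
```

Rewinding the existing handle is a small variation on reopening `temp.name`; it avoids reopening the file by name, which some platforms (notably Windows) do not allow while the temporary file is still open.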

I was able to view all the files using list(input_fs.ls()) and take the path name from that output. Interestingly, the path from ls() matched my path string exactly (as seen in the debugger), yet the code now works when I use the listed path instead of the hard-coded one.
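A small sketch of the lookup described above: instead of hard-coding the path, search the listing for the file name and fail loudly if it is missing or ambiguous. It assumes, as with the input filesystem used in this thread, that `ls()` yields entries with a `.path` attribute relative to the dataset root; `find_paths` and `resolve_single_path` are hypothetical helpers.

```python
def find_paths(input_fs, filename):
    """Return every path in input_fs whose final component equals filename.

    Assumes input_fs.ls() yields entries with a `.path` attribute holding the
    path relative to the dataset root (e.g. "a/b/c/RMK_TIE.bcp").
    """
    return [entry.path for entry in input_fs.ls() if entry.path.split("/")[-1] == filename]


def resolve_single_path(input_fs, filename):
    """Return the single matching path, or raise FileNotFoundError listing what exists."""
    matches = find_paths(input_fs, filename)
    if len(matches) != 1:
        available = [entry.path for entry in input_fs.ls()]
        raise FileNotFoundError(
            f"expected exactly one {filename!r}, found {matches}; dataset contains {available}"
        )
    return matches[0]
```

When a listed path and a hard-coded string look identical, printing `repr()` of both can expose differences a debugger view may hide, such as trailing whitespace or look-alike characters.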

Unfortunately I’m now getting zipfile.BadZipFile: File is not a zip file, as it looks like zipfile can’t unzip .Z files (or at least doesn’t recognize them as zip files).
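`zipfile` only reads zip archives, while a `.Z` file is produced by the Unix `compress` utility (LZW), a different format. A minimal sketch that checks a file's leading "magic" bytes before choosing a decompressor; the signature table covers only a few common formats, and `sniff_compression` is a hypothetical helper.

```python
# Leading "magic" bytes of a few common compressed formats.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"\x1f\x9d": "compress (.Z, LZW)",
    b"BZh": "bzip2",
}


def sniff_compression(header: bytes) -> str:
    """Guess the compression format from the first few bytes of a file."""
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

In a transform this could be called as `sniff_compression(f.read(4))` on a handle opened with `input_fs.open(path, "rb")`.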

https://stackoverflow.com/a/65066668/3652805

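Looks like there are other Python packages available that can decompress .Z files.
Alternatively, you could build a custom container that makes a Unix binary able to handle .Z files available (e.g. uncompress or gzip -d; the unzip binary only handles zip archives) and use a lightweight, container-backed transform.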
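If you go the container route, note that the tool needed for `.Z` is `uncompress` or `gzip -d` rather than `unzip`. A minimal sketch of calling such a binary from Python, assuming a `gzip` executable is on `PATH` inside the container; `decompress_with_gzip_binary` is a hypothetical helper.

```python
import shutil
import subprocess


def decompress_with_gzip_binary(data: bytes) -> bytes:
    """Decompress .Z (or .gz) bytes by piping them through `gzip -dc`.

    Assumes a `gzip` executable is on PATH (e.g. installed in a custom container).
    gzip can decompress files created by Unix `compress`; `unzip` cannot.
    """
    gzip_path = shutil.which("gzip")
    if gzip_path is None:
        raise RuntimeError("gzip executable not found on PATH")
    # -d: decompress, -c: write to stdout; input is read from stdin.
    result = subprocess.run(
        [gzip_path, "-dc"], input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout
```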

I was able to unzip my .Z file using the unlzw3 package, thanks!
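For reference, a minimal sketch of how the unlzw3 approach might slot into the transform, assuming `unlzw3.unlzw` accepts the compressed bytes and returns the decompressed bytes (as its README shows). `decompress_z_file` and its `decompress` parameter are hypothetical; the parameter only exists so another decompressor can be swapped in.

```python
def decompress_z_file(input_fs, output_fs, input_path, output_path, decompress=None):
    """Read a .Z file from input_fs, decompress it, and write the result to output_fs.

    By default uses unlzw3.unlzw (pip package `unlzw3`). The `decompress`
    parameter is a hypothetical hook for substituting another function.
    """
    if decompress is None:
        # Imported lazily so the helper can be used without unlzw3 installed.
        from unlzw3 import unlzw

        decompress = unlzw
    with input_fs.open(input_path, "rb") as src:
        compressed = src.read()
    with output_fs.open(output_path, "wb") as out:
        out.write(decompress(compressed))
```

This reads the whole compressed file into memory, which is usually fine for a single SFTP drop but worth keeping in mind for very large files.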