[Guide] How to upload large files to Foundry using Boto3

arukavina · November 30, 2024, 3:47pm

How to Use the Foundry File Upload Script

This Python script provides a streamlined mechanism to upload large files to Foundry datasets, especially useful when other methods are unavailable. Below is a detailed explanation of its functionality and how to use it effectively.

TL;DR: The code is in this GitHub repo: https://github.com/arukavina/foundry_upload

Overview of the Script

The script:

- Uploads files from a specified directory to a Foundry dataset.
- Tracks already uploaded files to avoid redundant uploads.
- Uses environment variables for configuration.
- Leverages Foundry’s API and an S3 client for file transfer.

Prerequisites

Before using the script, ensure the following:

Python Environment: Install Python 3.8 or higher.
Required Libraries: Install the following Python packages:
```
pip install foundry-dev-tools urllib3 tqdm boto3
```
Environment Variables: Set up the required environment variables:
- FOUNDRY_TOKEN: Foundry access token.
- FOUNDRY_HOST: Foundry host URL.
- INPUT_PATH: Path to the directory containing the files to upload.
- TARGET_DATASET_RID: Resource ID of the target Foundry dataset.

Configuration

Environment Variables

The script depends on the following environment variables:

- FOUNDRY_TOKEN: Your Foundry access token for authentication.
- FOUNDRY_HOST: The Foundry instance URL.
- INPUT_PATH: Directory containing files to be uploaded.
- TARGET_DATASET_RID: The resource ID of the target dataset in Foundry.

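Export the variables in your shell session. If you prefer a .env file, make sure something loads it into the environment (for example python-dotenv, which is not among the packages installed above):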

export FOUNDRY_TOKEN="your_token"
export FOUNDRY_HOST="your_host"
export INPUT_PATH="/path/to/your/files"
export TARGET_DATASET_RID="your_dataset_rid"
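
For illustration, here is a minimal sketch of how a script could read and validate these four variables at startup. The helper name `load_config` and the fail-fast behaviour are assumptions made for this guide; the script in the repo may read the variables differently.

```python
import os

REQUIRED_VARS = ("FOUNDRY_TOKEN", "FOUNDRY_HOST", "INPUT_PATH", "TARGET_DATASET_RID")


def load_config(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    `load_config` is a hypothetical helper, not part of the original script.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` with no argument reads from `os.environ`; passing a plain dict makes the check easy to try out without touching your shell.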

Code Walkthrough

Libraries and Imports

The script imports several libraries for its functionality:

- foundry_dev_tools: Interacts with Foundry.
- contextlib: Manages resources (file uploads).
- urllib3: Handles HTTP requests.
- os and pathlib.Path: Work with filesystem paths.
- tqdm: Displays progress bars for uploads.
- json: Reads and writes JSON files to track uploaded files.

Warning Suppression

urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning)

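This suppresses urllib3's InsecureRequestWarning, which would otherwise be printed for every request made without TLS certificate verification (the S3 client in this script is created with verify=False). That is convenient for Foundry instances behind self-signed certificates, but it also hides a genuine security signal, so only use it on networks you trust.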

Directory and File Handling

The script processes files from the directory specified by the INPUT_PATH environment variable. It filters files by their extension (default: .rpt).

DIRECTORY = Path(INPUT_PATH)
FILE_EXTENSION = ".rpt"
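
As a rough sketch of the filtering step, the helper below lists the regular files in a directory that carry a given extension. The name `find_files` and the case-insensitive comparison are assumptions for illustration; the original script may filter differently.

```python
from pathlib import Path


def find_files(directory, extension=".rpt"):
    """Return regular files in `directory` whose suffix matches `extension`.

    Hypothetical helper: the match is case-insensitive and does not
    descend into subdirectories.
    """
    directory = Path(directory)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == extension.lower()
    )
```

Sorting the result keeps the upload order stable between runs, which makes the progress output easier to follow.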

Upload Tracking

The script tracks uploaded files using a JSON file (uploaded_files.json) in the target directory:

- load_uploaded_files(): Loads the list of already uploaded files.
- save_uploaded_files(uploaded_files): Saves the list after successful uploads.
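
One possible shape for these two helpers is sketched below. It is not the repo's implementation: the explicit `tracking_file` parameter and the use of a set of file names are assumptions made so the example is self-contained.

```python
import json
from pathlib import Path


def load_uploaded_files(tracking_file):
    """Return the set of file names recorded in `tracking_file` (empty if absent)."""
    tracking_file = Path(tracking_file)
    if not tracking_file.exists():
        return set()
    with tracking_file.open("r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_uploaded_files(uploaded_files, tracking_file):
    """Write the names as a sorted JSON list (JSON has no set type)."""
    with Path(tracking_file).open("w", encoding="utf-8") as fh:
        json.dump(sorted(uploaded_files), fh, indent=2)
```

Saving after every successful upload, rather than once at the end, means an interrupted run loses at most the file that was in flight.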

Upload Process

The main upload process:

- Iterates over files in the specified directory.
- Skips files already marked as uploaded.
- Uploads eligible files to Foundry using a Foundry S3 client.
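
The three steps above can be sketched as a single loop. This is a simplified illustration, not the script's actual main routine: `upload_pending` is a hypothetical name, and the upload itself is passed in as a callable so the loop can be exercised without a Foundry connection.

```python
def upload_pending(files, already_uploaded, upload_one):
    """Upload each file not yet recorded; return (uploaded, failed) name lists.

    `upload_one` is any callable taking a path-like object with a `.name`;
    in the real script this role is played by the Foundry S3 upload.
    `already_uploaded` is a set of names and is updated in place.
    """
    uploaded, failed = [], []
    for file in files:
        if file.name in already_uploaded:
            continue  # recorded by a previous run, nothing to do
        try:
            upload_one(file)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            print(f"Failed to upload {file.name}: {exc}")
            failed.append(file.name)
        else:
            uploaded.append(file.name)
            already_uploaded.add(file.name)
    return uploaded, failed
```

Returning the failed names separately makes it easy to rerun the script and see at a glance which files still need attention.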

Uploading Files

The upload_file_to_foundry function handles file uploads:

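from pathlib import Path
from tqdm import tqdm

def upload_file_to_foundry(ctx, file_path: Path) -> None:
    # verify=False disables TLS certificate verification (e.g. self-signed certificates).
    boto3_client = ctx.s3.get_boto3_client(verify=False)
    file_size = file_path.stat().st_size
    path_in_dataset = file_path.name  # object key: the file name at the dataset root

    # The dataset RID is passed as the S3 bucket name. Callback is called with
    # the number of bytes sent in each chunk, which advances the progress bar.
    with tqdm(total=file_size, desc=path_in_dataset, unit="B", unit_scale=True) as pbar:
        boto3_client.upload_file(
            str(file_path), TARGET_DATASET_RID, path_in_dataset, Callback=pbar.update
        )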
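
boto3 invokes the `Callback` argument repeatedly with the number of bytes transferred since the previous call, which is why `pbar.update` can be passed directly. To make that contract concrete, here is a dependency-free sketch of an alternative callback; `ProgressTracker` is a hypothetical class for illustration and is not used by the script.

```python
import threading


class ProgressTracker:
    """Accumulates the byte counts passed to a transfer callback.

    Hypothetical stand-in for `pbar.update`. A lock is used because the
    callback may be invoked from several worker threads during a transfer.
    """

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.seen_bytes = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # Each call reports an increment, not a running total.
        with self._lock:
            self.seen_bytes += bytes_amount

    @property
    def percent(self):
        if self.total_bytes == 0:
            return 100.0
        return 100.0 * self.seen_bytes / self.total_bytes
```

An instance could be passed as `Callback=ProgressTracker(file_size)` in place of `pbar.update`, for example when running in an environment where a tqdm progress bar is not wanted.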

Error Handling

The script handles exceptions during uploads:

except Exception as e:
    print(f"Failed to upload {file.name}: {e}")

This ensures the script continues processing remaining files even if an upload fails.

Final Output

The script prints the list of successfully uploaded files at the end:

print("Successfully uploaded files:")
for uploaded_file in uploaded_files:
    print(uploaded_file)

How to Use the Script

Clone the repo:

git clone https://github.com/arukavina/foundry_upload.git
cd foundry_upload

Set up the required environment variables.
Place your target files in the directory specified by INPUT_PATH.
Run the script:
```
python upload_files.py
```
Monitor the progress bars for each file being uploaded.
Review the uploaded_files.json file to track uploaded files.

Notes

- Ensure that the FOUNDRY_TOKEN and FOUNDRY_HOST are correct to avoid authentication issues.
- The script skips files already listed in uploaded_files.json.
- Modify FILE_EXTENSION to target a different file type if needed.

By using this script, you can efficiently upload large volumes of data to Foundry, bypassing other upload constraints.
