How to Use the Foundry File Upload Script
This Python script provides a streamlined mechanism to upload large files to Foundry datasets, especially useful when other methods are unavailable. Below is a detailed explanation of its functionality and how to use it effectively.
TL;DR: The code is available in this GitHub repo: https://github.com/arukavina/foundry_upload
Overview of the Script
The script:
- Uploads files from a specified directory to a Foundry dataset.
- Tracks already uploaded files to avoid redundant uploads.
- Uses environment variables for configuration.
- Leverages Foundry’s API and an S3 client for file transfer.
Prerequisites
Before using the script, ensure the following:
- Python Environment: Install Python 3.8 or higher.
- Required Libraries: Install the following Python packages:
pip install foundry-dev-tools urllib3 tqdm boto3
- Environment Variables: Set up the required environment variables:
  - FOUNDRY_TOKEN: Foundry access token.
  - FOUNDRY_HOST: Foundry host URL.
  - INPUT_PATH: Path to the directory containing the files to upload.
  - TARGET_DATASET_RID: Resource ID of the target Foundry dataset.
Configuration
Environment Variables
The script depends on the following environment variables:
- FOUNDRY_TOKEN: Your Foundry access token for authentication.
- FOUNDRY_HOST: The Foundry instance URL.
- INPUT_PATH: Directory containing files to be uploaded.
- TARGET_DATASET_RID: The resource ID of the target dataset in Foundry.
Use a .env file or export the variables in your shell session:
export FOUNDRY_TOKEN="your_token"
export FOUNDRY_HOST="your_host"
export INPUT_PATH="/path/to/your/files"
export TARGET_DATASET_RID="your_dataset_rid"
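A minimal sketch of how these variables could be read in Python. It assumes a hypothetical helper named load_config (not taken from the repository) that fails fast when a variable is missing; the script itself may read its settings differently.

```python
import os

# The four variables the article lists as required.
REQUIRED_VARS = ("FOUNDRY_TOKEN", "FOUNDRY_HOST", "INPUT_PATH", "TARGET_DATASET_RID")


def load_config(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    `load_config` is a hypothetical helper for illustration only.
    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_config() with no arguments reads from the current shell environment, so a missing export is reported before any upload starts.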
Code Walkthrough
Libraries and Imports
The script imports several libraries for its functionality:
- foundry_dev_tools: Interacts with Foundry.
- contextlib: Manages resources (file uploads).
- urllib3: Handles HTTP requests.
- os and Path: Work with filesystem paths.
- tqdm: Displays progress bars for uploads.
- json: Reads and writes JSON files to track uploaded files.
Warning Suppression
urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning)
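This suppresses the InsecureRequestWarning that urllib3 emits for HTTPS requests made without certificate verification, which is useful when the Foundry host uses a self-signed certificate. It only hides the warning; it does not make the connection secure.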
Directory and File Handling
The script processes files from the directory specified by the INPUT_PATH environment variable. It filters files by their extension (default: .rpt).
DIRECTORY = Path(INPUT_PATH)
FILE_EXTENSION = ".rpt"
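The extension filter described above can be sketched as follows. list_candidate_files is a hypothetical helper name, and the exact, case-sensitive suffix match is an assumption; the repository's script may select files differently.

```python
from pathlib import Path

FILE_EXTENSION = ".rpt"


def list_candidate_files(directory, extension=FILE_EXTENSION):
    """Return the regular files in `directory` whose suffix equals `extension`.

    Hypothetical helper for illustration. Subdirectories are ignored and the
    result is sorted by path so that runs are reproducible.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix == extension
    )
```

Sorting is a design choice rather than a requirement: it keeps the upload order stable between runs, which makes the progress output easier to compare.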
Upload Tracking
The script tracks uploaded files using a JSON file (uploaded_files.json) in the target directory:
- load_uploaded_files(): Loads the list of already uploaded files.
- save_uploaded_files(uploaded_files): Saves the list after successful uploads.
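A minimal sketch of how two such helpers could work, assuming the tracking file holds a JSON array of file names. The directory parameter is added here only to keep the example self-contained; the script's own functions have the signatures listed above, and their internals may differ.

```python
import json
from pathlib import Path

TRACKING_FILE_NAME = "uploaded_files.json"


def load_uploaded_files(directory):
    """Return the set of recorded file names, or an empty set on first run."""
    tracking_file = Path(directory) / TRACKING_FILE_NAME
    if not tracking_file.exists():
        return set()
    with tracking_file.open("r", encoding="utf-8") as handle:
        return set(json.load(handle))


def save_uploaded_files(directory, uploaded_files):
    """Write the recorded file names as a sorted JSON array."""
    tracking_file = Path(directory) / TRACKING_FILE_NAME
    with tracking_file.open("w", encoding="utf-8") as handle:
        json.dump(sorted(uploaded_files), handle, indent=2)
```

Saving after each successful upload, rather than once at the end, means an interrupted run loses at most the file that was in flight.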
Upload Process
The main upload process:
- Iterates over files in the specified directory.
- Skips files already marked as uploaded.
- Uploads eligible files to Foundry using a Foundry S3 client.
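The three steps above can be sketched as a single loop. upload_pending is a hypothetical name, and the upload argument stands in for the Foundry S3 upload so that the example runs without a Foundry connection; this is an illustration of the skip logic, not the script's actual code.

```python
from pathlib import Path


def upload_pending(files, already_uploaded, upload):
    """Upload every file not yet recorded and return the updated set of names.

    `files` is an iterable of Path objects, `already_uploaded` a collection of
    file names, and `upload` any callable that takes a Path.
    """
    uploaded = set(already_uploaded)
    for file_path in files:
        if file_path.name in uploaded:
            continue  # already marked as uploaded, so skip it
        upload(file_path)
        uploaded.add(file_path.name)  # record only after the upload returned
    return uploaded


if __name__ == "__main__":
    sent = []
    result = upload_pending(
        [Path("a.rpt"), Path("b.rpt")], {"a.rpt"}, sent.append
    )
    print(sorted(result), [p.name for p in sent])
```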
Uploading Files
The upload_file_to_foundry function handles file uploads:
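@contextlib.contextmanager
def upload_file_to_foundry(ctx, file_path):
    # verify=False turns off TLS certificate verification for the S3 client.
    boto3_client = ctx.s3.get_boto3_client(verify=False)
    file_size = file_path.stat().st_size
    # The file is stored in the dataset under its own name.
    path_in_dataset = file_path.name
    # boto3 calls Callback with the number of bytes sent, which advances the bar.
    with tqdm(total=file_size, desc=path_in_dataset, unit="B", unit_scale=True) as pbar:
        boto3_client.upload_file(
            str(file_path), TARGET_DATASET_RID, path_in_dataset, Callback=pbar.update
        )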
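The progress bar works because boto3 periodically calls the Callback with the number of bytes transferred since the previous call, and tqdm's update method adds that number to the bar. The sketch below imitates that contract with standard-library code only. ByteCounter and simulate_transfer are hypothetical names used for illustration and are not part of boto3, tqdm, or the script.

```python
class ByteCounter:
    """Accumulates byte increments, as a progress bar's update() would."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.transferred = 0

    def update(self, n):
        # Each call reports an increment, not a running total.
        self.transferred += n

    def percent(self):
        if self.total_bytes == 0:
            return 100.0
        return 100.0 * self.transferred / self.total_bytes


def simulate_transfer(total_bytes, chunk_size, callback):
    """Call `callback` once per chunk with that chunk's size in bytes."""
    sent = 0
    while sent < total_bytes:
        step = min(chunk_size, total_bytes - sent)
        callback(step)
        sent += step


if __name__ == "__main__":
    counter = ByteCounter(10)
    simulate_transfer(10, 4, counter.update)
    print(counter.transferred, counter.percent())
```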
Error Handling
The script handles exceptions during uploads:
except Exception as e:
    print(f"Failed to upload {file.name}: {e}")
This ensures the script continues processing remaining files even if an upload fails.
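To show the except clause in context, here is a sketch of a per-file try/except that keeps going after a failure. upload_all is a hypothetical name, the upload argument is a stand-in for the real Foundry upload, and collecting the failures in a dict is an addition for illustration; the script itself only prints the error.

```python
def upload_all(files, upload):
    """Try to upload every file; return (succeeded_names, failed_name_to_error)."""
    succeeded = []
    failed = {}
    for file in files:
        try:
            upload(file)
        except Exception as e:  # broad on purpose: one bad file must not stop the batch
            print(f"Failed to upload {file.name}: {e}")
            failed[file.name] = str(e)
        else:
            succeeded.append(file.name)
    return succeeded, failed
```

Because failed files are never added to the list of uploaded files, they are picked up again on the next run.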
Final Output
The script prints the list of successfully uploaded files at the end:
print("Successfully uploaded files:")
for uploaded_file in uploaded_files:
    print(uploaded_file)
How to Use the Script
- Clone the repo:
  git clone https://github.com/arukavina/foundry_upload.git
  cd foundry_upload
- Set up the required environment variables.
- Place your target files in the directory specified by INPUT_PATH.
- Run the script:
  python upload_files.py
- Monitor the progress bars for each file being uploaded.
- Review the uploaded_files.json file to track uploaded files.
Notes
- Ensure that the FOUNDRY_TOKEN and FOUNDRY_HOST are correct to avoid authentication issues.
- The script skips files already listed in uploaded_files.json.
- Modify FILE_EXTENSION to target a different file type if needed.
By using this script, you can efficiently upload large volumes of data to Foundry, bypassing other upload constraints.