I want the user to be able to drop a new CSV file into a folder in Foundry via Workshop (via the file uploader widget). I want a code repo to listen to this folder and, when it detects a new CSV file, process it and add the new objects to the object type. It does not have to be a code repo, but I figured this would be the best way. Is this possible? AIP Assist keeps sending me down the wrong tracks.
In the Media Uploader widget's upload destination configuration, you should use "Dataset" (with a static dataset configured) instead of "Folder."
A dataset is conceptually very similar to a folder - it can contain a collection of arbitrary files, including multiple CSVs. If the CSVs all have the same structure, you can set a schema on the dataset, but it's also fine to have a schemaless dataset containing CSVs with different structures, as long as you can express appropriate logic to process them in a code repository (see the documentation on reading and writing unstructured files for examples).
You can then configure a schedule with a data updated trigger to build a downstream dataset that backs an object type whenever the dataset of CSVs is updated.
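As a rough illustration, here is a minimal sketch of a Python transform that reads every CSV file in the schemaless dataset and writes the combined rows to a structured output. The dataset paths and the `source_file` column are hypothetical placeholders and would need to match your own project:

```python
from transforms.api import transform, Input, Output
import csv

from pyspark.sql import Row


@transform(
    raw_files=Input("/Project/uploads/raw_csvs"),      # hypothetical path: schemaless dataset of uploaded CSVs
    parsed=Output("/Project/uploads/parsed_records"),  # hypothetical path: structured output dataset
)
def parse_uploaded_csvs(ctx, raw_files, parsed):
    rows = []
    fs = raw_files.filesystem()
    # Iterate over every CSV file currently in the dataset
    for status in fs.ls(glob="*.csv"):
        with fs.open(status.path, "r") as fh:
            for record in csv.DictReader(fh):
                record["source_file"] = status.path  # keep provenance for downstream joins
                rows.append(Row(**record))
    parsed.write_dataframe(ctx.spark_session.createDataFrame(rows))
```

With a data updated trigger on the dataset of CSVs, this transform (and anything downstream backing the object type) would rebuild whenever a new file is uploaded.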
Hi team,
Thank you for the explanation. I'm new to building custom interfaces and hope to take the opportunity on this similar post to see where I've gone wrong in my CSV uploader.
I have a similar requirement and am stuck in a loop with how to implement the above.
I need the ability to upload a CSV file within a Workshop app. The Workshop combines distinct data for a business (and eventually multiple businesses) and its employees, etc., which develops into a historic, timeline-based dataset.
I've split these in Pipeline Builder with datediff functions etc., and that's fine, but I need to be able to add instances of the distinct metrics at any given point via CSV, multiple at a time or as single values.
I want to "inject" the CSV somewhere in the pipeline so that I can union it with this distinct dataset (name, surname, age, manager, start date, end date, etc.) after it's been checked for duplicates and other logic errors. The uploader should ideally capture the uploading user's username and the date, and apply them to all the records being uploaded.
I've been having issues where, if I choose create or edit on the Ontology actions, I either get only the username and date in an otherwise empty schema (with the rest of the data from the CSV ending up separately), or I get errors about values and formats not matching, or the frustrating permission issue.
As it stands, I've recreated the 31-column schema as a blank dataset in Pipeline Builder, which includes the upload username and date (should it?), and am trying a "create or modify" action on this dataset (created via Pipeline Builder).
How do I set up the ontology and/or pipeline so that the data comes from the CSV into my blank dataset, and the two columns of uploader metadata come from the user input below the uploader?
Any tips welcome; ideally I want something simple that operates like this and gives me access to the data immediately in the pipeline so it can refresh in the Workshop.
You'll need two datasets: a dataset of CSVs and a dataset (with an associated object type) that represents "CSV Upload Events", containing the path of the uploaded file (this could be the primary key in the object type definition), the uploading user, and the upload date (though I would recommend a timestamp instead of a date).
In the Media Uploader widget's "Upload" configuration, you would specify the dataset of CSVs. In the "Output" configuration, you would specify an Action that takes as input a file identifier (this will be the path of the file in the CSV dataset), user ID, and date/timestamp, and creates a "CSV Upload Event" object. The Media Uploader widget allows you to "wire up" the file identifier as a default parameter value.
In your pipeline, you would join the dataset of CSVs with a materialization created from the CSV Upload Event object, using file path as the join key. By doing so, you'll have the columns from the CSV alongside the user and date/timestamp information.
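As a sketch of that join step, a Python transform along the following lines would take the parsed CSV rows (as in the earlier example) and the Ontology materialization, joining on the file path. The dataset paths and column names (`source_file`, `file_path`, `uploaded_by`, `uploaded_at`) are hypothetical and would need to match your actual setup:

```python
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Project/uploads/csv_rows_with_uploader"),          # hypothetical output path
    csv_rows=Input("/Project/uploads/parsed_records"),          # parsed CSV rows, including a source_file column
    upload_events=Input("/Project/ontology/csv_upload_events"), # materialization of the CSV Upload Event object type
)
def enrich_with_uploader(csv_rows, upload_events):
    # Keep only the metadata columns we want to carry onto every CSV row
    events = upload_events.select("file_path", "uploaded_by", "uploaded_at")
    # Join on the file path so each row inherits its uploader and upload timestamp
    return csv_rows.join(
        events,
        csv_rows["source_file"] == events["file_path"],
        how="left",
    ).drop("file_path")
```

From there, you could union the enriched rows with your existing distinct dataset and apply the deduplication logic in Pipeline Builder or in the same repository.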
At some point in the future, the recommended implementation here would be to use a Media Set instead of a Dataset, since Media Sets have better support for concurrent uploads from many users. However, functionality for Media Sets containing CSV files hasn't been released yet, so a Dataset is the best option for now.
An alternative implementation would be to parse the data in the CSV as part of the Action (possible with function-backed Actions when uploading the CSV as an Attachment or to a Media Set). This gets the data into the Ontology right away and allows you to show an error to the user in real time if there is an issue with the file. However, if there is complex data-validation logic that requires deduplicating the data in the CSV against the data in other uploaded files, that might be difficult to manage with a real-time processing implementation.