Daily Python code to get MLB Statcast data in Foundry

I am working on a project that gathers MLB Statcast data through a Python script. My plan is to combine all of the data in Pipeline Builder, then create a finalized dataset that I can build a dashboard on in Workshop.

Below is my current pipeline with MLB data up to 6/29/2025:

To start the project, I gathered all of the data from the MLB season up to this point and uploaded it into the dataset in the screenshot titled “Statcast Data from Openi…”. I performed some transformations on it, joined a table to convert player IDs to names, and ended up with the finalized dataset at the far right of the screenshot.

How do I integrate a daily Python script that pulls yesterday’s MLB Statcast data, sends it through my Pipeline Builder pipeline to be transformed, and then appends it to the bottom of the finalized dataset at the far right of the screenshot?

I have that script, and if I run it in Google Colab it spits out a .csv, but I don’t want to have to do that every day.

Thanks!

Hi @Edge - this sounds like a good use case for External Transforms. The linked docs are pretty comprehensive and walk through an end-to-end example of how to ingest data from an API using a Python code repository. Once you have this set up, you can add schedules to trigger builds at the frequency you want. I also find this doc on scheduling best practices useful. Hope that helps!
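
As a rough sketch of the "pull yesterday's data" part, something like the following could sit inside the transform. Your script isn't shown in the thread, so this assumes it uses the pybaseball library and its statcast(start_dt, end_dt) function; the helper names yesterday_window and fetch_statcast_for_yesterday are made up for illustration. The Foundry-specific wrapper (decorators, output dataset, network egress setup) is left out here and should follow the docs.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def yesterday_window(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (start, end) as ISO date strings covering only yesterday.

    `today` can be passed in explicitly, which makes backfills and
    testing easier than always reading the system clock.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    iso = yesterday.isoformat()
    return iso, iso


def fetch_statcast_for_yesterday(today: Optional[date] = None):
    """Pull one day of Statcast data as a pandas DataFrame.

    Assumption: pybaseball is installed in the environment and the
    data source is reachable from wherever this runs.
    """
    # Imported lazily so the date helper works without pybaseball.
    from pybaseball import statcast

    start, end = yesterday_window(today)
    return statcast(start_dt=start, end_dt=end)
```

With a daily schedule on the build, each run would fetch a single day. Whether the new rows are appended to the raw dataset or the dataset is rebuilt each time depends on how the transform writes its output, so that part is worth checking against the docs before relying on it.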

Thanks! Let me get back to you and see if I have any luck with that method.