Pipeline Builder - Webscraping

Using Pipeline Builder, is there a way to include a Jupyter notebook in the pipeline workflow? I would like to use Python to retrieve data from a website and then pass it into the pipeline on a recurring schedule.

Would the correct way to implement this be an external webhook?

Hey!

You should be able to create a dataset as an output in Code Workspaces, and then use that dataset as an input in Pipeline Builder. You can also use external transforms for this if you only need plain Python (without Jupyter notebooks).

Please also see the FAQ for more information on how to allow your notebook to make external API calls.
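
For the scraping step itself, here is a minimal sketch using only the Python standard library. It assumes the page exposes its data as a plain HTML table; the names `TableRowParser`, `parse_table_rows`, and `fetch_html` are made up for illustration, and the step that writes the rows to an output dataset is deliberately left out because it depends on your workspace or transform setup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


def parse_table_rows(html):
    """Return the table contents of an HTML string as a list of rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url, timeout=30):
    """Download a page as text (hypothetical helper; needs network egress)."""
    request = Request(url, headers={"User-Agent": "example-scraper"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would call `parse_table_rows(fetch_html(url))` with your own URL and then write the resulting rows to the output dataset. Keeping the parsing separate from the download makes it easy to test against a saved copy of the page.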

-Eirik

Hello,

While it’s not currently possible to include a Jupyter notebook in Pipeline Builder, you should be able to use Python UDFs to call an external Python function from within Pipeline Builder.

Here’s our documentation on importing a UDF into Pipeline Builder and on creating a Python UDF / function.

Creating a Python function that can make external calls is a bit tricky, but the setup should be similar to creating an external transform in Code Repositories.
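
As a rough illustration of the shape such a function could take, here is a plain Python sketch. It is not the platform's UDF API: the registration and network-access configuration described in the documentation are not shown, and `fetch_text` and `page_title` are hypothetical names. The downloader is passed in as a parameter so the logic can be tested without network access.

```python
import re
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    """Download a page as text (assumes outbound access is allowed)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def page_title(url: str, fetch: Callable[[str], str] = fetch_text) -> Optional[str]:
    """Return the contents of the page's <title> tag, or None if absent."""
    html = fetch(url)
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

A function that takes one value and returns one value like this maps naturally onto a per-row call, though how it is wired into the pipeline depends on the UDF setup in the linked documentation.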

Hi, I’m also trying to accomplish a similar task, but the website I’m trying to access is not a REST API. In that case, how would I configure the source type in Data Connection?