import urllib.request

url = 'http://export.arxiv.org/api/query?search_query=all:' + keywordsToString + '&start=0&max_results=2'
data = urllib.request.urlopen(url)
to download PDFs from arXiv, which I intend to save to a Media Set. I've got this working on my local machine, but it seems to time out on Foundry. Is there a different way I should be doing this?
I was running into much the same issue, but with a different media set download. I think it is due to Foundry's network restrictions, or to the way external APIs handle long-running requests. Here are some things I tried:
1/ Handle Large Files Asynchronously
If the request involves downloading large files (e.g., PDFs), Foundry's pipelines or workflows can handle this more efficiently if you run the downloads as batch processes.
You could extract metadata from the ArXiv API first and store the URLs in a dataset, then use a separate process to download PDFs if necessary.
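To illustrate the metadata-first step, here is a minimal sketch that parses an arXiv API response (an Atom feed) with the standard library and pulls out the title and PDF URL for each entry. The sample feed below is a trimmed, hypothetical response just for illustration; in practice you would feed in the body returned by the API call.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of an arXiv Atom response (trimmed).
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2101.00001v1</id>
    <title>Example Paper</title>
    <link title="pdf" href="http://arxiv.org/pdf/2101.00001v1"
          rel="related" type="application/pdf"/>
  </entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"

def extract_pdf_urls(feed_xml):
    """Return (title, pdf_url) pairs from an arXiv Atom feed."""
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        pdf_url = None
        for link in entry.findall(ATOM + "link"):
            # arXiv marks the PDF link with type="application/pdf"
            if link.get("type") == "application/pdf":
                pdf_url = link.get("href")
        rows.append((title, pdf_url))
    return rows

print(extract_pdf_urls(SAMPLE_FEED))
# [('Example Paper', 'http://arxiv.org/pdf/2101.00001v1')]
```

You could write these pairs out to a dataset and let a separate downstream transform fetch the PDFs in batches.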
2/ Improve Timeout Settings
If you're tied to using Python scripts within Foundry, the requests library is more robust than urllib for handling timeouts and retries.
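As a rough sketch of what I mean: requests lets you set explicit connect/read timeouts per call, and you can mount a retry policy on a session so transient failures are retried with backoff. The specific timeout and retry values below are just placeholders to tune against whatever limits Foundry imposes.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=1.0):
    """Build a requests Session that retries transient HTTP errors."""
    session = requests.Session()
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,            # waits 1s, 2s, 4s, ... between tries
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = make_session()
# (connect timeout, read timeout) in seconds -- tune for your environment:
# resp = session.get(url, timeout=(5, 60))
```

The actual GET is left commented out since it would hit the network; the point is the session-level retry policy plus an explicit per-request timeout instead of urllib's defaults.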
Sorry I can’t be of more help, this is as far as my knowledge reaches in this topic. Hope this does provide a clearer picture. Let me know if this helps or if you solved it - would love to know what the solution ended up being.