Automated Web Scraping?

Has anyone done any web scraping, either client side or server side? I'm wondering about best practices and whether it's possible to schedule scraping on the server side.

Hello and welcome.

Yes, scheduled webscraping is possible depending on the specific implementation.

If you read the suggestions from Anvil staff in this post, it might help to guide your choices.

You can use anvil.http.request to get the HTML of a web page. There are many ways to parse the HTML once you have it as a string - I recommend BeautifulSoup.

I built a search engine using Anvil that does just that; here’s a blog post about it. It crawls the web by using BeautifulSoup to get the href of each <a> tag and then requesting the page at that URL.

(BeautifulSoup is available in Server Modules on individual tier and above, or on the free tier you can run it from an Uplink script.)
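
For anyone who wants to see the crawl loop described above in code, here is a minimal sketch. It is not the search engine's actual code. To keep it dependency-free it swaps in the standard library's html.parser for BeautifulSoup; with BeautifulSoup the link extraction would be roughly soup.find_all("a", href=True). The names LinkParser, extract_links, crawl and the fetch parameter are all made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs linked from html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL, then drop the fragment.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")) and url not in links:
            links.append(url)
    return links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. fetch(url) is assumed to return the HTML as a str."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In a Server Module, fetch could probably be a small wrapper around anvil.http.request that decodes the response body to a string (I believe the body comes back as a Media object with get_bytes(), but check the docs). Keeping max_pages small is deliberate: an unbounded crawl will quickly run into request timeouts and rate limits.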

Hi @shaun, I’m planning to use Scrapy. Do you think it will work with Anvil?