Automated Web Scraping?

shaun · November 4, 2019, 2:57pm

You can use anvil.http.request to get the HTML of a web page. There are many ways to parse the HTML once you have it as a string - I recommend BeautifulSoup.

I built a search engine using Anvil that does just that; here’s a blog post about it. It crawls the web by using BeautifulSoup to get the href of each <a> tag, then requesting the page at that URL.
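
To make the crawl loop concrete, here is a minimal sketch of the idea, not the code from the search engine itself. So that it runs without extra packages, it swaps BeautifulSoup for the standard library's `HTMLParser`. The names `LinkExtractor`, `extract_links`, `crawl`, the `fetch` parameter and the `max_pages` limit are all made up for illustration. `fetch` is any function that takes a URL and returns the page's HTML as a string; in an Anvil app it could be a small wrapper around anvil.http.request.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for the links found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl from `start_url`; returns {url: html}.

    `fetch` is a hypothetical callable (url -> HTML string) supplied by the caller.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `seen` set stops the crawler from requesting the same page twice, and `max_pages` keeps a test run from wandering across the whole web. A real crawler would also want to respect robots.txt and rate-limit its requests.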

(BeautifulSoup is available in Server Modules on the individual tier and above; on the free tier, you can run it from an Uplink script.)