Has anyone done any webscraping, either client-side or server-side? I'm wondering about best practices, and whether scheduled webscraping is possible on the server side.
Hello and welcome.
Yes, scheduled webscraping is possible depending on the specific implementation.
If you read the suggestions from Anvil staff in this post, it might help to guide your choices.
You can use `anvil.http.request` to get the HTML of a web page. There are many ways to parse the HTML once you have it as a string; I recommend BeautifulSoup.
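Here's a minimal sketch of the parsing step, assuming you already have the page's HTML as a string (in Anvil that string would come from `anvil.http.request`; the fetch isn't shown). It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. With BeautifulSoup the same lookup is roughly `BeautifulSoup(html, "html.parser").title.string`. `TitleParser` and `page_title` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped <title> text of an HTML string, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


print(page_title("<html><head><title>Example Domain</title></head></html>"))
# prints: Example Domain
```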
I built a search engine using Anvil that does just that; here's a blog post about it. It crawls the web by using BeautifulSoup to get the `href` of each `<a>` tag and then requesting the page at that URL.
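The crawl loop described above (collect each `<a>` tag's `href`, then request those pages) can be sketched as a breadth-first search. This is not the search engine's actual code. It assumes a caller-supplied `fetch(url)` that returns HTML or `None`, and it swaps in the standard library's `html.parser` for BeautifulSoup. In an Anvil Server Module or Uplink script, `fetch` might wrap `anvil.http.request` (an assumption, not shown). `crawl`, `extract_links`, and `max_pages` are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs, fragments stripped, for each <a href> in html."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any "#fragment".
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")):
            links.append(url)
    return links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url.

    fetch(url) is assumed to return the page's HTML as a str, or None on failure.
    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A fake three-page "site" stands in for real HTTP requests.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="b#top">B</a>',
    "https://example.com/a": '<a href="https://example.com/">home</a>',
    "https://example.com/b": '<a href="mailto:x@example.com">mail</a>',
}
print(sorted(crawl("https://example.com/", site.get)))
```

The `seen` set keeps each URL from being queued twice, and `max_pages` caps how far a run can go. A real crawler would also want to honour robots.txt (the standard library's `urllib.robotparser` can read it) and pause between requests.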
(BeautifulSoup is available in Server Modules on the Individual tier and above; on the free tier, you can run it from an Uplink script.)