Sanitize HTML to prevent XSS and JavaScript injection

tomka · March 14, 2020, 7:56pm

Hi, this might be useful for other people who are dealing with possibly unsafe HTML input from users:

Anvil supports lxml if you’re able to use the “Full Python” server setting:

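from lxml.html import clean  # on lxml 5.2+, this also requires the separate lxml_html_clean package
# User input containing a malicious <script> tag
html_with_evil_script = "<h1>Headline</h1><script>alert('evil!')</script>"
# The default Cleaner() options already strip <script> tags and JavaScript attributes
cleaner = clean.Cleaner()
sanitized_html = cleaner.clean_html(html_with_evil_script)
print(sanitized_html)  # the <script> element is gone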

The Cleaner options (which tags and attributes get stripped) are documented here:
https://lxml.de/api/lxml.html.clean.Cleaner-class.html
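If the defaults don't fit, those options can be passed to Cleaner as keyword arguments. Here is a minimal sketch, assuming a policy for user-supplied rich text. The particular flag choices and the `sanitize` helper are illustrative only, not a recommendation from this thread:

```python
from lxml.html.clean import Cleaner  # lxml 5.2+ needs lxml_html_clean installed

# Illustrative policy for user-supplied rich text; adjust the flags to your needs.
cleaner = Cleaner(
    scripts=True,          # remove <script> tags
    javascript=True,       # remove JavaScript such as onclick attributes and javascript: links
    style=True,            # remove <style> tags
    links=True,            # remove <link> tags
    forms=True,            # remove form tags
    frames=True,           # remove frame-related tags
    embedded=True,         # remove embedded objects (e.g. <object>, <embed>)
    safe_attrs_only=True,  # keep only attributes on lxml's "safe" allowlist
)


def sanitize(user_html: str) -> str:
    """Hypothetical helper: return user_html with unsafe markup stripped."""
    return cleaner.clean_html(user_html)


dirty = (
    '<p onclick="steal()">Hi <a href="javascript:alert(1)">click</a></p>'
    "<script>alert(1)</script>"
)
print(sanitize(dirty))
```

`safe_attrs_only=True` works as an allowlist, which is generally more robust than trying to enumerate every dangerous attribute.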

robert · March 15, 2020, 3:48am

To give a little background: when would you recommend using this? For all text input by a user, or only for HTML input?

I’m familiar with these concepts and security implications but unsure about specific recommendations for Anvil.

tomka · March 25, 2020, 5:01pm

Hi Robert, sorry for the late reply! I think it's really only necessary if you accept HTML input from your users, not plain text. @meredydd says a bit more about this topic here: Sharing a 'page' (form with generated content) to Facebook
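For the plain-text case, a sketch assuming you ever build an HTML string yourself from user text (`render_comment` is a hypothetical helper): escaping with Python's standard `html.escape` is enough there, and no sanitizer is needed, because the text is never meant to be interpreted as markup.

```python
import html


def render_comment(user_text: str) -> str:
    """Hypothetical helper: wrap plain user text in a <p> without allowing markup."""
    # html.escape converts &, <, > and (by default) both quote characters to entities,
    # so the browser displays the text literally instead of running it.
    return "<p>" + html.escape(user_text) + "</p>"


print(render_comment("<script>alert('evil!')</script>"))
```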

PS: I think you built some really interesting and useful apps with Anvil!

robert · March 26, 2020, 1:14am

Gotcha. Thanks for helping keep everyone's apps secure. It's something we should all keep in mind.

I’m hoping to keep building interesting apps!