Hi all,
Last week Google open-sourced Magika, their AI-powered file-type identification system which helps detect binary and textual file types.
It’s pretty handy and really easy to work with, as it comes with a Python package that we can slot straight into Anvil apps.
Add the package: “magika”
In your server code:
from magika import Magika
#@software{magika,
#author = {Fratantonio, Yanick and Bursztein, Elie and Invernizzi, Luca and Zhang, Marina and Metitieri, Giancarlo and Kurt, Thomas and Galilee, Francois and Petit-Bianco, Alexandre and Farah, Loua and Albertini, Ange},
#title = {{Magika content-type scanner}},
#url = {https://github.com/google/magika}
#}
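def magika_check(file_bytes):
    # Load the Magika model and classify the raw file contents
    m = Magika()
    result = m.identify_bytes(file_bytes)
    # ct_label holds the detected content-type label
    return result.output.ct_label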
All you need to do is pass your file's bytes (file.get_bytes()) to it!
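
If it helps, here is a rough sketch of how the check above could be used to accept or reject an upload. It is only one possible approach, not anything official. The names `magika_label`, `is_allowed_upload` and `ALLOWED_LABELS` are made up for illustration, and the labels in the allow-list are an assumption about what Magika returns for those file types, so print a few results from your own files before relying on them. The `ct_label` attribute is the one used in the snippet above, and it may be named differently in other Magika releases.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _get_magika():
    # Imported lazily and cached so the model is only loaded once,
    # on the first call, rather than on every upload.
    from magika import Magika
    return Magika()


def magika_label(file_bytes):
    # Same call as in magika_check; ct_label is the attribute used in this post.
    return _get_magika().identify_bytes(file_bytes).output.ct_label


# Hypothetical allow-list; the exact label strings are an assumption.
ALLOWED_LABELS = {"pdf", "png", "jpeg"}


def is_allowed_upload(media, identify=magika_label, allowed=ALLOWED_LABELS):
    """Return (ok, label) for any object with a get_bytes() method,
    such as an Anvil Media object."""
    label = identify(media.get_bytes())
    return label in allowed, label
```

In a server module you would probably wrap `is_allowed_upload` in a function decorated with `@anvil.server.callable` and pass it the file from a FileLoader. The `identify` parameter is there so the logic can be tried out with a stand-in function, without loading the model.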