A client of mine recently sent a zipped archive of files whose contents needed to be extracted into a data table.
I didn’t want to use the file system, and I also didn’t want every file in the archive extracted into memory at once. Here’s what I came up with: a generator function that yields each zipped file as a media object:
import io
import zipfile

import anvil


def get_zipped_files(media):
    """A generator of files contained within a zip archive

    Parameters
    ----------
    media : anvil.media instance

    Yields
    ------
    anvil.BlobMedia instance
    """
    # Wrap the archive bytes in a file-like object so nothing touches the file system
    with zipfile.ZipFile(io.BytesIO(media.get_bytes())) as zipped:
        for zipinfo in [zi for zi in zipped.infolist() if not zi.is_dir()]:
            # Keep only the file name, dropping any directory path
            name = zipinfo.filename.split("/")[-1]
            with zipped.open(zipinfo) as file:
                yield anvil.BlobMedia(
                    content_type="application/pdf", content=file.read(), name=name
                )
In my case, every file is a PDF, so I could hard-code the content type. The files were also nested in a directory structure, but I only wanted the files themselves, so the function keeps just the last part of each path as the name.
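If an archive held a mix of file types, the hard-coded content type would no longer fit. One possible variation, shown here as a rough sketch rather than anything I needed myself, is to guess the type from the file name with the standard library's `mimetypes` module. The function name `iter_zip_members` and the sample archive are made up for illustration, and the sketch yields plain tuples instead of Anvil media objects so that it runs anywhere:

```python
import io
import mimetypes
import zipfile


def iter_zip_members(data):
    """Yield (name, content_type, content) for each file in a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zipped:
        for zipinfo in zipped.infolist():
            if zipinfo.is_dir():
                continue
            # Keep only the file name, dropping any directory path
            name = zipinfo.filename.split("/")[-1]
            # guess_type returns (type, encoding); the type is None when unknown,
            # so fall back to a generic binary type (an assumption of this sketch)
            content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
            yield name, content_type, zipped.read(zipinfo)


# Build a small hypothetical archive in memory to try the generator out
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("reports/2023/summary.pdf", b"%PDF-1.4 example")
    archive.writestr("reports/notes.txt", b"hello")

members = list(iter_zip_members(buffer.getvalue()))
```

Each tuple could then be passed to `anvil.BlobMedia(content_type=..., content=..., name=...)` in the same way as in the function above.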