Efficient way to iterate through all files in Google Drive

The docs say you can get the files in a folder from the Google Drive service by using folder.files but warn “that this might be slow if there are many files in the folder”. Does this mean this fetches the contents of all of the files in the folder into the user’s browser? Does it recursively get files from subdirectories as well?

Alternatively, using folder.list_files() presumably just gets the handles, but returns an iterator. Does the file data get downloaded as soon as the iterator is used? Is list(folder.list_files()) any more or less efficient than folder.files?

Excellent question.

It’s actually simpler than you’re thinking: folder.files returns exactly list(folder.list_files()). The reason it might be slower is that folder.list_files() returns a lazy iterator, which requests files in batches only as you iterate. Turning this into a list forces the entire collection to be fetched up front, which might take a long time for a folder with lots of files.

So you can choose which to use based on your use-case. If your folder might have many files, it would be better to use the lazy iterator, but in most cases it’s probably fine to use folder.files.
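To make the difference concrete, here is a minimal sketch of lazy batching versus eager listing. It does not call the real Drive service: FakeFolder, its batch_size, and the requests_made counter are hypothetical stand-ins invented for illustration, and the real batch size and request pattern may differ.

```python
from itertools import islice


class FakeFolder:
    """Hypothetical stand-in for a Drive folder; not the real Anvil class."""

    def __init__(self, names, batch_size=3):
        self._names = list(names)
        self._batch_size = batch_size
        self.requests_made = 0  # counts simulated round trips

    def _fetch_batch(self, start):
        # Simulates one network request returning a page of file handles.
        self.requests_made += 1
        return self._names[start:start + self._batch_size]

    def list_files(self):
        # Lazy: a new batch is requested only when the previous one runs out.
        start = 0
        while True:
            batch = self._fetch_batch(start)
            if not batch:
                return
            yield from batch
            start += len(batch)

    @property
    def files(self):
        # Eager: equivalent to list(self.list_files()).
        return list(self.list_files())


names = [f"file{i}.txt" for i in range(10)]

# Stop after two files: only the first batch is ever requested.
lazy = FakeFolder(names)
first_two = list(islice(lazy.list_files(), 2))
lazy_requests = lazy.requests_made

# Build the full list: every batch is requested before anything is returned.
eager = FakeFolder(names)
all_files = eager.files
eager_requests = eager.requests_made

print(first_two, lazy_requests, len(all_files), eager_requests)
```

If you only need the first few files, or want to stop as soon as you find a match, iterating lazily avoids the remaining requests. If you need every file anyway, the two approaches do the same amount of work.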

To answer your other questions:

  • Neither of these techniques is recursive; they only return files directly in the given folder.
  • File data never gets downloaded until you call get_bytes() on the Media object. In particular, this means you can do things like pass files to and from the server without moving the actual data back and forth. See the documentation for more information about Media types.
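Both points can be sketched together: walking subfolders by hand, and deferring the download until get_bytes() is called. The classes below are hypothetical stand-ins rather than the real Drive objects, and the list_folders() method is an assumption about how subfolders are exposed; check the API docs for the actual name.

```python
class FakeFile:
    """Hypothetical file handle; counts how often its data is fetched."""

    def __init__(self, name, data):
        self.name = name
        self._data = data
        self.downloads = 0

    def get_bytes(self):
        # Only this call moves the file contents.
        self.downloads += 1
        return self._data


class FakeFolder:
    """Hypothetical folder; list_folders() is an assumed method name."""

    def __init__(self, files=(), folders=()):
        self._files = list(files)
        self._folders = list(folders)

    def list_files(self):
        return iter(self._files)

    def list_folders(self):
        return iter(self._folders)


def walk_files(folder):
    """Yield file handles from a folder and all of its subfolders."""
    yield from folder.list_files()
    for sub in folder.list_folders():
        yield from walk_files(sub)


report = FakeFile("report.txt", b"hello")
photo = FakeFile("photo.png", b"\x89PNG")
root = FakeFolder(files=[report], folders=[FakeFolder(files=[photo])])

# Enumerating handles touches no file data.
handles = list(walk_files(root))
names = [f.name for f in handles]
downloads_before = sum(f.downloads for f in handles)

# Data is fetched only for the file we explicitly ask for.
content = handles[0].get_bytes()
downloads_after = sum(f.downloads for f in handles)

print(names, downloads_before, content, downloads_after)
```

Because walk_files is a generator, it stays lazy as well: subfolders are only listed once iteration reaches them.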

Hope that helps!