Excel file type

chicho_knwe1 · March 19, 2021, 10:23am

I have a quick question about Excel file types.

In the screenshot, you can see that I uploaded one .xlsx file, which is the standard Excel format, and its type is “application/vnd.openxmlformats-officedocument.spreadsheetml.sheet”.
The second and third examples are CSV files, where the type is “application/vnd.ms-excel”.

However, the documentation (Anvil Docs | CSV and Excel import) suggests that a CSV file should be of type “text/csv” and a normal Excel file should be of type “application/vnd.ms-excel”. (In my solution I also use pandas together with anvil.media and the full Python 3 version.) Am I doing something wrong here?

gegeclarinette · March 29, 2022, 8:06am

Hello,
Did you find a solution?
I have the same problem: a CSV file is recognized as vnd.ms-excel, and I could not process it with pandas.

This is what I got when trying to run the example in the documentation with the color.csv file:
https://anvil.works/docs/data-tables/csv-and-excel

The file’s content type is: application/vnd.ms-excel

ValueError: Excel file format cannot be determined, you must specify an engine manually.

at /usr/local/lib/python3.7/site-packages/pandas/io/excel/_base.py:1196
called from /usr/local/lib/python3.7/site-packages/pandas/io/excel/_base.py:364
called from /usr/local/lib/python3.7/site-packages/pandas/util/_decorators.py:311

robert · March 29, 2022, 9:40am

The newer xlsx file type is really just an XML file. CSV files can also be opened by Excel, and Excel will sometimes change the MIME type to its vendor-specific (vnd prefix) MIME type. Because you know…Microsoft.

You aren’t doing anything wrong. This is a larger issue in the tech world. This StackOverflow post shows that this question can have many answers.

If you want to be “more correct” with your CSV files, definitely always save them with a “text/csv” type. The data people in your life will thank you.

robert · March 29, 2022, 9:48am

@gegeclarinette Welcome to the community!

In the tutorial you linked, the if statement will try pd.read_excel whenever the MIME type is not text/csv:

    if file.content_type == 'text/csv':
      df = pd.read_csv(file_name)
    else:
      df = pd.read_excel(file_name)

To be honest, I don’t trust MIME types to tell CSV and Excel apart. I would use the file extension instead (.csv vs. .xls or .xlsx). As you have seen, a CSV file can show the wrong MIME type. Blame Microsoft for this.

You could change your code to this and it might work better. It’s certainly not elegant but it might help with your issue.

    if file.name.split(".")[-1] == 'csv':
      df = pd.read_csv(file_name)
    else:
      df = pd.read_excel(file_name)

This of course assumes the person uploading the file named the file correctly. This is not a reliable assumption, but it should work more often than not.
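One weak spot in the split-based check is that it is case-sensitive, so a file named Colors.CSV would fall through to pd.read_excel. A minimal sketch of a case-insensitive variant, using only the standard library, is below. The helper name reader_kind is made up for illustration and is not part of Anvil or pandas.

```python
from pathlib import Path


def reader_kind(filename: str) -> str:
    """Return 'csv' or 'excel' based on the file extension, ignoring case.

    Hypothetical helper: it only looks at the name, so it still trusts
    whoever named the file.
    """
    suffix = Path(filename).suffix.lower()  # e.g. ".CSV" -> ".csv"
    return "csv" if suffix == ".csv" else "excel"


print(reader_kind("Colors.CSV"))
print(reader_kind("report.xlsx"))
```

The returned string could then drive the same if/else as above, calling pd.read_csv for 'csv' and pd.read_excel otherwise.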

owen.campbell · March 29, 2022, 10:14am

Using the EAFP principle, I’d use a try/except:

    try:
        df = pd.read_excel(file_name)
    except ValueError:
        df = pd.read_csv(file_name)

gegeclarinette · March 29, 2022, 10:57am

Thanks for your answer.
It finally worked.
I just have to teach my users to name their files correctly!

owen.campbell · March 29, 2022, 1:04pm

With the tweak I suggested, you don’t have to care about the file name.

p.colbert · March 29, 2022, 3:23pm

Looking at one of my .xlsx files, the first two bytes are ‘PK’, indicating that it is a zip file. In fact, it opens with 7zip (et al), showing a substantial substructure of folders, containing subfolders and files, only some of which have an xml extension.
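Those leading bytes can also be used to pick a reader without trusting either the MIME type or the file name. The sketch below is only an illustration: the function name sniff_spreadsheet is invented here, and it assumes that anything that is neither a ZIP archive nor a legacy compound file is CSV text, which will not hold for arbitrary uploads.

```python
# Signature at the start of a ZIP archive's first local file header (.xlsx).
ZIP_MAGIC = b"PK\x03\x04"
# Signature of the older compound-file container used by legacy .xls files.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_spreadsheet(data: bytes) -> str:
    """Guess 'xlsx', 'xls' or 'csv' from the first bytes of a file.

    Hypothetical helper: a ZIP signature only proves the file is a ZIP
    archive, not that it is a valid workbook.
    """
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    if data.startswith(OLE2_MAGIC):
        return "xls"
    return "csv"  # assumption: everything else is treated as CSV text


print(sniff_spreadsheet(b"name,hex\nred,#ff0000\n"))
```

In an Anvil server function, the bytes would presumably come from the uploaded Media object (for example via its get_bytes() method), and the result could decide between pd.read_csv and pd.read_excel.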

Attempting to read an .xlsx as an xml file (e.g., with Python’s standard library) does not look like it’s going to work. But there are other, third-party Python libraries that can read it. Some of them can be found here: Anvil’s List of Packages

robert · March 31, 2022, 4:33am

Sorry, I should have expanded. If you dig into that ZIP file, you will find the workbook XML in there, along with some metadata that is also XML.
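To see that layout for yourself, Python’s standard zipfile module can list what is inside the archive. This is a rough sketch: list_archive_parts and make_demo_archive are names made up for this example, and the demo archive is a tiny stand-in with placeholder XML, not a real workbook that Excel would open.

```python
import io
import zipfile
from typing import List


def list_archive_parts(data: bytes) -> List[str]:
    """Return the member names of a ZIP-based file such as an .xlsx."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def make_demo_archive() -> bytes:
    """Build a small stand-in archive (illustration only, not a valid workbook)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
    return buf.getvalue()


for name in list_archive_parts(make_demo_archive()):
    print(name)
```

Passing the bytes of a real .xlsx file to list_archive_parts should show a longer list of parts, some XML and some not.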

Here is the spec for the XLSX Transitional file format from the Library of Congress. It describes the structure in fairly non-technical language: XLSX Transitional (Office Open XML), ISO 29500:2008-2016, ECMA-376, Editions 1-5.