Converting DOCX and Excel files to PDF (docx2pdf/pywin32)

What I’m trying to do:
I’m trying to convert my DOCX or XLSX files to PDF.

For PDF:

  # Some example
  docx_file = app_tables.projects_files.get(file_name='20230529-14-39-32-Assays.xlsx')
  file = docx_file['file']
  pdf_file = docx2pdf.convert(file)
  pdf_media = anvil.media.from_buffer(pdf_file, "application/pdf")
  # Save in a new column
  docx_file.update(pdf=pdf_media)
  • The problem I’m facing is that the file is lazy media. How can I convert it to a normal file so that I can use it with docx2pdf? Or maybe you know a better way to do the conversion?

For Excel:

  • For the Excel conversion I’m facing a slightly different problem: I can’t install pywin32.
    I’ve tried the four latest versions of pywin32, and none of them installs.

ERROR: Could not find a version that satisfies the requirement pywin32==306 (from versions: none)
ERROR: No matching distribution found for pywin32==306

Error: Build failed

pywin32 is a library that gives Python access to the Microsoft Windows API and to many Windows COM interfaces.

I don’t think Anvil runs on a Windows operating system, so there would be no Windows APIs for it to interact with.
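
A quick way to confirm this from inside the app is to check the platform before relying on anything Windows-only. This is only a sketch: `is_windows` is a made-up helper name, and the assumption is that the server reports a non-Windows platform (most likely Linux), which would explain why pip lists no matching pywin32 versions.

```python
import sys


def is_windows(platform: str = sys.platform) -> bool:
    """Return True when the given platform string denotes Windows."""
    # sys.platform is "win32" on Windows, "linux" on Linux and "darwin" on macOS.
    return platform.startswith("win")


if is_windows():
    print("Running on Windows: Windows-only packages may be usable here")
else:
    print(f"Running on {sys.platform}: Windows-only packages such as pywin32 will not work here")
```

If this prints a non-Windows platform, no pinned version of pywin32 is likely to help, and the conversion has to go through something that does not depend on Windows.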

If you have used docx2pdf in the past, was it on a Windows machine with Microsoft Word installed alongside it? I don’t know much about it, but if it uses the Windows API, it might just be a Python wrapper that drives Word to process the document.

These are all just guesses about docx2pdf though.

Sorry, I skipped the actual meat of your question. If docx2pdf.convert(file) accepts a “file-like object”, you can rewrite it like this:

from io import BytesIO

file = docx_file['file']
pdf_file = docx2pdf.convert(BytesIO(file.get_bytes()))

You are getting the bytes from your lazy media object, and wrapping them in a “fake” file in memory for the program to interpret as a file.
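
If `convert` turns out to want a path on disk rather than a file-like object (as far as I know docx2pdf works with file paths, but treat that as an assumption), you can write the bytes to a temporary file first. A rough sketch; `bytes_as_temp_file` is a name I made up, not part of Anvil or docx2pdf:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def bytes_as_temp_file(data: bytes, suffix: str = ".docx"):
    """Write raw bytes to a temporary file, yield its path, and delete it on exit."""
    # mkstemp returns an open OS-level file descriptor plus the file's path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # The file is closed here, so other programs can open it by path.
        yield path
    finally:
        # Clean up even if the caller raised an exception.
        if os.path.exists(path):
            os.remove(path)


# Example with placeholder bytes; in the app these would come from file.get_bytes().
with bytes_as_temp_file(b"placeholder bytes", suffix=".docx") as tmp_path:
    print("Temporary file written to", tmp_path)
```

In the server function you would pass `docx_file['file'].get_bytes()` as `data` and hand the yielded path to whichever converter you end up using. Anvil also has `anvil.media.TempFile`, which I believe does much the same thing directly for Media objects.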


For the Excel part of your question, I would recommend looking into openpyxl to read the spreadsheet and then another PDF library like PyPDF2 (my personal favorite).


I will use cloudconvert for the conversion and connect it to the app. All the conversion methods that don’t use Microsoft Office look terrible.