Creating and manipulating pdf files via PyPDF2 and FPDF

neriksen85 · February 1, 2018, 10:26pm

Normally I don’t have any trouble using PyPDF2 and FPDF to generate and combine PDFs. However, in Anvil I simply cannot figure out what “file type” FPDF and PyPDF2 expect when working with PDF files in memory and importing/exporting them to/from Google Drive. I have tried several combinations and now I am feeling pretty daft, so I am reaching out to the community. The code is in a server module and called from the client.

import io
import PyPDF2
from fpdf import FPDF

# trying PyPDF2 
@anvil.server.callable
def test_pdf_read():
  file = app_files.test_app.get("document.pdf")
  pdfFileObj = open(file.get_bytes(), 'rb')

  # the issue is getting the file object to be accepted by the PdfFileReader()
  pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
  print(pdfReader.numPages)


# trying FPDF
@anvil.server.callable
def make_pdf():
  pdf = FPDF()
  pdf.add_page()
  pdf.set_font('Arial', 'B', 16)
  pdf.cell(40, 10, 'Hello World!')
#      byte_string = pdf.output(dest='S').encode('latin-1')
  byte_string = pdf.output(dest='S')
  new_pdf = app_files.test_app.create_file("new_file.pdf", byte_string)

Any help will be greatly appreciated.

EDIT 1: After rereading the FPDF docs I have changed the code above that should make a PDF via FPDF. In order to output the PDF correctly, the result should be encoded with:
.encode('latin-1')
https://pyfpdf.readthedocs.io/en/latest/reference/output/index.html
but when I include the encoding in the commented line, the server throws the following error:

anvil.server.ExecutionTerminatedError: Server code exited unexpectedly: e82d9d6fd5

daviesian · February 19, 2018, 4:53pm

Hi Niels,

The following examples work for me in Python 3:

# trying PyPDF2 
@anvil.server.callable
def test_pdf_read():
  pdf_bytes = app_files.foo.get_bytes()

  # Create a BytesIO object that can be used directly as a file
  file = io.BytesIO(pdf_bytes)
  pdfReader = PyPDF2.PdfFileReader(file)
  print(pdfReader.numPages)
  
# trying FPDF
@anvil.server.callable
def make_pdf():
  pdf = FPDF()
  pdf.add_page()
  pdf.set_font('Arial', 'B', 16)
  pdf.cell(40, 10, 'Hello World!')
  byte_string = pdf.output(dest='S').encode('latin-1')

  # Create an Anvil Media object from the encoded bytes.
  pdf_media = BlobMedia("application/pdf", byte_string, name="new_file.pdf")
  # Create a Google Drive file from the Media object
  app_files.folder.create_file("HelloWorld.pdf", pdf_media)

I do get an encoding error if I try the .encode('latin-1') function call in Python 2, but I didn’t investigate that further. If you have a requirement for Python 2, let me know.
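
A possible explanation, offered as a guess rather than something verified here: in Python 2 the value returned by pdf.output(dest='S') is already a byte string, so calling .encode('latin-1') on it first triggers an implicit ASCII decode, which fails on non-ASCII bytes. In Python 3 the same call returns a str, so the encode step is needed. The helper below is a minimal Python 3 sketch that accepts either form. The name to_pdf_bytes is made up for illustration, and the bytearray branch assumes a newer FPDF fork that returns binary data directly.

```python
def to_pdf_bytes(raw):
    """Return PDF data as bytes, whatever form the PDF library handed back.

    Hypothetical helper, not part of FPDF or Anvil.
    """
    if isinstance(raw, str):
        # Classic PyFPDF on Python 3: a str whose code points map 1:1 to bytes.
        return raw.encode("latin-1")
    # Assumed case: the library already returned bytes or a bytearray.
    return bytes(raw)
```

With this in place, byte_string = to_pdf_bytes(pdf.output(dest='S')) could replace the explicit .encode('latin-1') call in make_pdf above.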

Does that solve your problem? Let me know if anything remains unclear.

- Thanks, Ian.

neriksen85 · February 19, 2018, 9:27pm

Thanks for the description, Ian. However, to fully close the circle, could you also give a very short example of how to use PyPDF2.PdfFileWriter(), which is needed in order to overlay one pdf file onto another? I have entered some code below that doesn’t work for me.

  pdf1Reader = PyPDF2.PdfFileReader(io_file)
  pdf2Reader = PyPDF2.PdfFileReader(io_file2)
  pdfWriter = PyPDF2.PdfFileWriter()

  for pageNum in range(pdf1Reader.numPages):
    pageObj = pdf1Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
  for pageNum in range(pdf2Reader.numPages):
    pageObj = pdf2Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

  pdfOutputFile = io.BytesIO()
  pdfWriter.write(pdfOutputFile)
  
  pdfOutputFile.close()
  io_file.close()
  io_file2.close()

I have gone through the documentation for the io module a couple of times without fully grasping it. If anyone reading this knows of some literature or articles that would help me understand io.BytesIO better, I would greatly appreciate a link.

Thanks

navigate · February 20, 2018, 3:37am

This is a bit of a hasty response, but

Ensure you aren’t getting tripped up by an encrypted file
PdfFileReader may need an ‘rb’

To slightly modify Daviesian’s code:

def test_pdf_read():
    pdf_bytes = app_files.foo.get_bytes()

    # Create a BytesIO object that can be used directly as a file
    memoryFile = io.BytesIO(pdf_bytes)
    pdfFileObj = PyPDF2.PdfFileReader(memoryFile, 'rb')
    if not pdfFileObj.isEncrypted:
        print(pdfFileObj.numPages)
    else:
        print('error!')

daviesian · February 20, 2018, 10:24am

Hi Niels,

Your code very nearly works as-is; the only thing you need to tweak is the use of BytesIO for writing. In particular, you need to call seek(0) on it before you can read its contents, and you need to read it before closing it. Here is an example which joins two PDF app files:

@anvil.server.callable
def join_pdf():
  file1 = io.BytesIO(app_files.foo1.get_bytes())
  file2 = io.BytesIO(app_files.foo2.get_bytes())
  
  pdf1Reader = PyPDF2.PdfFileReader(file1)
  pdf2Reader = PyPDF2.PdfFileReader(file2)
  pdfWriter = PyPDF2.PdfFileWriter()

  for pageNum in range(pdf1Reader.numPages):
    pageObj = pdf1Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)
  for pageNum in range(pdf2Reader.numPages):
    pageObj = pdf2Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

  out_file = io.BytesIO()
  pdfWriter.write(out_file)
  out_file.seek(0)
  out_file_media = BlobMedia("application/pdf", out_file.read(), name="out.pdf")

  app_files.folder.create_file("out.pdf", out_file_media)

I hope that helps - I think we’ve now covered all the angles. Just let me know if there’s anything else you need.
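
Since you asked about understanding io.BytesIO, here is a small standard-library sketch of the cursor behaviour that makes seek(0) necessary. The function name and the placeholder bytes are made up for illustration. It only demonstrates how the in-memory buffer behaves and involves no PDF library.

```python
import io


def bytesio_cursor_demo():
    buf = io.BytesIO()
    buf.write(b"%PDF-1.4 placeholder")  # writing leaves the cursor at the end

    read_at_end = buf.read()      # reads from the cursor onward, so this is empty
    buf.seek(0)                   # rewind to the start of the buffer
    read_after_seek = buf.read()  # now returns everything that was written

    whole_buffer = buf.getvalue()  # returns the full contents, ignoring the cursor
    buf.close()                    # close only after the bytes have been copied out
    return read_at_end, read_after_seek, whole_buffer
```

As the sketch suggests, out_file.getvalue() would be an alternative to the seek(0) and read() pair in join_pdf. Either way, the bytes have to be taken out before close() is called, because a closed BytesIO discards its buffer.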

neriksen85 · February 20, 2018, 3:37pm

It now works for me when combining FPDF and PyPDF2 (PdfFileReader and PdfFileWriter). @navigate thanks for the heads-up on checking for encryption. Regarding the 'rb' (read binary) parameter passed to PyPDF2.PdfFileReader in your comment:

pdfFileObj = PyPDF2.PdfFileReader(memoryFile, 'rb')

I encountered some minor issues with fonts etc. in the output file. However, these issues went away, and the output PDF was perfect, when I removed the 'rb' parameter:

pdfFileObj = PyPDF2.PdfFileReader(memoryFile)

nameispalmer · June 7, 2018, 9:21am

I had no experience with PyPDF or FPDF before I started working with Anvil. I needed to improve a simple website so that users could open specific PDF forms, fill them in, and sign them. I connected FPDF but couldn’t figure out what the error I saw actually was. I worked around it in a dead simple fashion, adding links to this editor instead https://pump-it-up-job-form.pdffiller.com/ and tracking its metrics. Nevertheless, processing this on the website itself would be a better option, so thanks to all for these explanations.

shaun · February 24, 2020, 5:57pm

Update - we’ve made it much simpler than using PyPDF2 or FPDF.

You can now create PDFs from any Form by running:

media_object = anvil.pdf.render_form('MyForm')

The return value is a media object, meaning you can download it:

anvil.media.download(media_object)

as well as attaching it to an email, storing it in a Data Table, displaying it in an Image…

Here’s a step-by-step build of a PDF invoice generator. To find details of how to use the PDF feature, here’s the PDF section of the reference docs.

kmichael · May 22, 2021, 10:33am

Hey guys,

I have a small query on a different version of the problem mentioned here. Is there any smooth process to merge two PDFs where 1 pdf is a media_object generated within Anvil and the other pdf is stored in Anvil’s data table?

Would @daviesian 's join_pdf work by itself or is there another approach I’m missing?

rickhurlbatt · May 22, 2021, 2:22pm

PyPDF2 can definitely do this with the merger class. I have done this in Anvil and it works well. You will need to load both PDFs before merging.

Just be aware of server runtimes, as large docs require a bit of time. It is best to do this in a background task. Check out the PyPDF2 docs.

https://pythonhosted.org/PyPDF2/PdfFileMerger.html
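
Roughly, the merge step could look like the sketch below. It is not tested inside Anvil: merge_pdf_bytes is a made-up helper name, and the merger argument is assumed to be an object with append(fileobj) and write(fileobj) methods, which is the interface PyPDF2.PdfFileMerger documents. It also assumes both PDFs are available as raw bytes, for example from get_bytes() on each media object.

```python
import io


def merge_pdf_bytes(pdf_blobs, merger):
    """Append each PDF (given as raw bytes) to `merger` and return the merged bytes.

    `merger` is assumed to expose append(fileobj) and write(fileobj),
    as PyPDF2.PdfFileMerger does. This helper is hypothetical.
    """
    for blob in pdf_blobs:
        # Wrap the bytes so the merger can treat them like an open binary file.
        merger.append(io.BytesIO(blob))

    out = io.BytesIO()
    merger.write(out)
    # getvalue() returns the whole buffer regardless of the cursor position.
    return out.getvalue()
```

In a server function this might be called as merge_pdf_bytes([generated_media.get_bytes(), row_media.get_bytes()], PyPDF2.PdfFileMerger()), where generated_media and row_media are placeholder names for the two media objects. The result can then be wrapped in a BlobMedia, as in join_pdf above.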

@admins please split topic if required