Accessing saved file in a table with a path

agiveon · July 30, 2023, 9:38pm

I am trying to build a ChatGPT-based Q&A bot.

One stage in the process is uploading a file to get indexed.

I got the upload part working, but the indexing part (I use the textract package) requires a path to the file. Here is the error message I get:

TypeError: stat: path should be string, bytes, os.PathLike or integer, not LiveObjectProxy

Is there a way to provide a path to the file that was uploaded (or saved in a table)?

jshaffstall · July 30, 2023, 9:44pm

Given an uploaded file (which is a Media object in Anvil), you can write it to a temporary file. See the docs for Media objects, in the section on Files in Server Modules.

You can also search the forum for sample code, since a variation on this question shows up pretty frequently.

agiveon · July 30, 2023, 11:07pm

Maybe it's time they put it in the docs.

jshaffstall · July 30, 2023, 11:12pm

I’m confused. It is in the docs, in the Media objects section on Files in Server Modules.

agiveon · July 30, 2023, 11:29pm

It's in the docs, but in a very convoluted way (it took me a few hours to figure it out). My suggestion was that they provide a complete example for such a simple (and clearly recurring) question.

jshaffstall · July 30, 2023, 11:35pm

I don’t think you were looking in the same place as I was. In the section I mentioned there’s a three-line code example right after text that directly points to your use case:

If you’re using a Python library that wants you to pass it a filename, this can be really useful for writing some data into a file, then passing the file_name to the library you’re using.

Only the second line changes, based on where you’re getting the Media object from.

(Mostly putting this here for future folks who may happen on this topic.)

agiveon · July 30, 2023, 11:38pm

I appreciate your willingness to help. I cannot find the string you shared in the docs. Can you point me to the page itself (link)?

jshaffstall · July 30, 2023, 11:38pm

It's in this section: Anvil Docs | Files, Media Objects and Binary Data

agiveon · July 30, 2023, 11:43pm

Thank you. I could not get that part to work. Here is what I did instead:

import anvil.server
import textract

@anvil.server.callable
def process_corpus(file):
  # file is given as a Media object

  # fixed path: every call writes to the same file
  pdf_path = '/tmp/corpus.pdf'

  with open(pdf_path, 'wb') as f:
    f.write(file.get_bytes())

  # this is the function that needed a file path
  doc = textract.process(pdf_path)

jshaffstall · July 30, 2023, 11:55pm

Without seeing the code for the other way that you couldn’t get to work, I can’t say what might have gone wrong with it. I will note that the way you’re showing isn’t safe if you have multiple users executing the same server function at the same time, as you’re using the same file name for them all.
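For future readers, here is a minimal sketch of one way to avoid the shared file name, using only the Python standard library. The helper name `temp_file_from_bytes` is made up for illustration and is not part of Anvil or textract. It assumes you already have the file contents as bytes, for example from `file.get_bytes()` on the Media object.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file_from_bytes(data, suffix=""):
    """Write data to a uniquely named temporary file and yield its path.

    Hypothetical helper for illustration; not part of Anvil or textract.
    """
    # mkstemp creates a new file with a fresh name on every call,
    # so two concurrent calls never write to the same path.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        # Clean up once the caller has finished with the file.
        os.remove(path)


# Example usage with placeholder bytes standing in for an uploaded file.
with temp_file_from_bytes(b"placeholder contents", suffix=".pdf") as pdf_path:
    with open(pdf_path, "rb") as f:
        contents = f.read()
```

Inside the `with` block, `pdf_path` can be handed to any library that wants a file name. The `suffix` argument is there because some libraries look at the file extension to decide how to read the file.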

agiveon · July 30, 2023, 11:59pm

Yeah, you are right, but this function is only for the “admin” user, and there is only one of those. Should be fine. The rest of the function fails because of a different issue I have with the image size.

meredydd · July 31, 2023, 11:04am

This is exactly right. What you’ve done, @agiveon, works but will fail in mysterious ways if there are concurrent calls to process_corpus(). What I’d recommend instead is the technique described in the docs here, using anvil.media.TempFile():

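import anvil.media
import anvil.server
import textract

@anvil.server.callable
def process_corpus(file):
    # file is given as a Media object
    # TempFile writes it to a temporary file and gives back the path
    with anvil.media.TempFile(file) as pdf_path:
        doc = textract.process(pdf_path)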

agiveon · July 31, 2023, 11:48am

I tried that and got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 10: invalid continuation byte

at /usr/local/lib/python3.10/codecs.py:322
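A note for future readers: the cause of this error is not confirmed anywhere in this thread, so treat the following as a guess. The message is what Python produces when binary data is decoded as UTF-8 text, which suggests the PDF was being read as plain text at some point. The sketch below reproduces the same message from a made-up PDF header. The bytes in `pdf_head` are an assumption for illustration, not the actual uploaded file.

```python
# Hypothetical first bytes of a PDF: the "%PDF-1.4" header line followed by a
# binary comment line, as some PDF writers emit. Assumed for illustration only.
pdf_head = b"%PDF-1.4\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\xc4\xc6\n"


def try_decode_as_text(data):
    """Return None if data decodes as UTF-8, otherwise the UnicodeDecodeError."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err


err = try_decode_as_text(pdf_head)
print(err)
# 'utf-8' codec can't decode byte 0xc4 in position 10: invalid continuation byte
```

If that is what happened here, one thing worth checking is whether the temporary path still ends in .pdf, since a library that chooses its parser from the file extension may fall back to treating an extensionless file as text.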