I am trying to build a ChatGPT-based Q&A bot.
One step in the process is uploading a file to be indexed.
I have the upload part working, but the indexing part (I use the textract package) requires a path to the file. Here is the error message I get:
`TypeError: stat: path should be string, bytes, os.PathLike or integer, not LiveObjectProxy`
Is there a way to provide a path to the file that was uploaded (or saved in a table)?
Given an uploaded file (which is a Media object in Anvil), you can write it to a temporary file. See the docs for Media objects, in the section on Files in Server Modules.
You can also search the forum for sample code, since a variation on this question shows up pretty frequently.
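For future readers, here is a stdlib-only sketch of why the error happens and why writing to a temporary file fixes it. `FakeMedia` is a hypothetical stand-in for an Anvil Media object (only `get_bytes()` is modeled), and `needs_a_path` is a hypothetical function playing the role of `textract.process`, which expects a filename rather than a file-like object.

```python
import os
import tempfile


class FakeMedia:
    """Hypothetical stand-in for an Anvil Media object; only get_bytes() is modeled."""

    def __init__(self, data: bytes):
        self._data = data

    def get_bytes(self) -> bytes:
        return self._data


def needs_a_path(path):
    """Hypothetical library call that, like textract.process, expects a filename."""
    os.stat(path)  # TypeError unless path is str, bytes, os.PathLike, or int
    with open(path, "rb") as f:
        return f.read()


media = FakeMedia(b"%PDF-1.4 example bytes")

# Passing the object itself fails the same way as in the question.
error_message = None
try:
    needs_a_path(media)
except TypeError as exc:
    error_message = str(exc)

# Writing the bytes to a temporary file gives the library a real path to work with.
fd, path = tempfile.mkstemp(suffix=".pdf")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(media.get_bytes())
    contents = needs_a_path(path)
finally:
    os.remove(path)
```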
Maybe it’s time they put it in the docs.
I’m confused. It is in the docs, in the Media objects section on Files in Server Modules.
It’s in the docs, but in a very convoluted way (it took me a few hours to figure out). My suggestion was that they provide a complete example for such a simple (and clearly recurring) question.
I don’t think you were looking in the same place as I was. In the section I mentioned, there’s a three-line code example right after text that points directly to your use case:
If you’re using a Python library that wants you to pass it a filename, this can be really useful for writing some data into a file, then passing the file_name to the library you’re using.
Only the second line changes, based on where you’re getting the Media object from.
(Mostly putting this here for future folks who may happen on this topic.)
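For anyone curious what a helper like `anvil.media.TempFile` has to do, here is a rough stdlib-only sketch. It is not Anvil’s implementation: `temp_file_for` and `FakeMedia` are hypothetical names, and the sketch assumes only that the Media object has a `name` attribute and a `get_bytes()` method. It writes the bytes to a uniquely named file, yields the path, and deletes the file when the `with` block exits.

```python
import contextlib
import os
import tempfile


class FakeMedia:
    """Hypothetical stand-in for an Anvil Media object (name + get_bytes())."""

    def __init__(self, data: bytes, name: str):
        self._data = data
        self.name = name

    def get_bytes(self) -> bytes:
        return self._data


@contextlib.contextmanager
def temp_file_for(media):
    """Write a Media-like object's bytes to a unique temp file and yield its path.

    Hypothetical helper sketching the idea behind anvil.media.TempFile.
    """
    # Keep the original extension: textract, for one, picks its parser
    # based on the file extension.
    suffix = os.path.splitext(media.name or "")[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(media.get_bytes())
        yield path
    finally:
        # Clean up even if the body raised; ignore it if the caller already deleted it.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


# Usage: only the line that obtains the Media object would differ in real code.
doc = FakeMedia(b"hello", "corpus.pdf")
with temp_file_for(doc) as file_name:
    seen_path = file_name
    exists_inside = os.path.exists(file_name)
    with open(file_name, "rb") as f:
        data = f.read()
```

Because each call gets its own file from `tempfile.mkstemp`, two calls running at the same time never share a path.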
I appreciate your willingness to help. I can’t find the text you quoted in the docs. Can you point me to the page itself (a link)?
Thank you. I couldn’t get that approach to work, so here is what I did instead:
```python
import anvil.server
import textract

@anvil.server.callable
def process_corpus(file):
    # file is given as a Media object
    pdf_path = '/tmp/corpus.pdf'
    with open(pdf_path, 'wb+') as f:
        f.write(file.get_bytes())
    # this is the function that needed a file path
    doc = textract.process(pdf_path)
```
Without seeing the code for the other way that you couldn’t get to work, I can’t say what might have gone wrong with it. I will note that the way you’re showing isn’t safe if you have multiple users executing the same server function at the same time, as you’re using the same file name for them all.
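To make the race concrete, here is a stdlib-only sketch that simulates two overlapping calls one after the other. `write_fixed` and `write_unique` are hypothetical helpers: the first mirrors the fixed `/tmp/corpus.pdf` approach above, and the second uses `tempfile.mkstemp`, which creates a new, uniquely named file on every call.

```python
import os
import tempfile

FIXED_PATH = os.path.join(tempfile.gettempdir(), "corpus.pdf")


def write_fixed(data: bytes) -> str:
    """Every call writes to the same file, like the code above."""
    with open(FIXED_PATH, "wb") as f:
        f.write(data)
    return FIXED_PATH


def write_unique(data: bytes) -> str:
    """Each call gets its own file; the caller is responsible for deleting it."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# User A's call writes, then user B's call writes before A reads its file back.
a = write_fixed(b"user A's file")
b = write_fixed(b"user B's file")
with open(a, "rb") as f:
    a_now = f.read()  # user A now reads user B's bytes

c = write_unique(b"user A's file")
d = write_unique(b"user B's file")
with open(c, "rb") as f:
    c_now = f.read()  # user A still reads its own bytes

for p in (a, c, d):
    os.remove(p)
```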
Yeah, you’re right, but this function is only for an “admin” user, and there is only one of those, so it should be fine. The rest of the function fails because of a separate issue I’m having with the image size.
This is exactly right. What you’ve done, @agiveon, works, but it will fail in mysterious ways if there are concurrent calls to `process_corpus()`. What I’d recommend instead is the technique described in the docs here, using `anvil.media.TempFile()`:
```python
import anvil.media
import anvil.server
import textract

@anvil.server.callable
def process_corpus(file):
    # file is given as a Media object
    with anvil.media.TempFile(file) as pdf_path:
        doc = textract.process(pdf_path)
```
I tried that and got this error:
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 10: invalid continuation byte`
* `at /usr/local/lib/python3.10/codecs.py:322`