Timeout on large file upload

Hi!

Just evaluating Anvil for a project and so far I love it and expect to be able to save a lot of time.

One of the requirements is the ability to upload large data files for analysis.

I’ve set it up with the FileUploader and saving to a table as a media column.

A small file (197K) works but attempts with a large file (49MB+) return a timeout on the server module. I’m thinking it may not work to store large files into a table.

Subsequent processing will involve parsing the file and preparing the analysis, which will be returned in some type of grid.

What approach do you recommend? Is there a way to change the timeout limit?

Thanks!

I have an app which does exactly this - upload and store PDF documents of substantial size.

To handle the upload timeout issue, I use a client writable view (https://anvil.works/docs/data-tables/data-security#views).

My media uploader loads the file into this view and I then call a server function to write it from that view into my real table (and then delete from the view, all within a transaction).

Hope that helps!


You may also find this thread useful. It discusses using Google Drive as an option.


Is there a small clonable example you can share that uses this technique? I have many questions that I think I’d be able to answer for myself if I were to look over the code.

Sorry, no. This was done for a commercial client of mine.

I threw this together to show the principle:

https://anvil.works/build#clone:24VCGPJYOSAP4DR6=C77EWVK6PJ2MC5R5LRUVDLHM


Thanks for that!

In the process of laying out some questions here, I’ve answered some of them. Some still remain.

  1. What’s the advantage of copying the media over to a separate table? Adding a row to the client writable view adds it to the media_upload table. I could just use the media from there, right?

  2. How does this avoid the server timeout issues with large files? At some point in this process the file still has to be sent to the server. Is adding a row via the client writable view avoiding the timeout in some way?

  1. I use the separate table so I can put an ‘owner’ column on it, which I need in order to generate the view but don’t need on the main table.

  2. Yes, the client writable view avoids the timeout because writing to it doesn’t go through a server function call.
    (Well, there’s a server call to get the view, but there’s no call involved in uploading the file, which is where the timeout would occur.)

Wow…that seems like a loophole. A terrific one, but I’m a bit concerned that the behavior might change at some point.

Regardless, this technique should do wonders for my project for submitting video games for a contest. I was running into server timeout issues not because of file size but because of low bandwidth in rural areas.

I believe it’s by design. The view has to be fetched with a server call and is then writable from the client, hence its name.
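For anyone who wants to see the shape of this without cloning the app, here is a minimal sketch of the flow described above. It is not the code from the linked example. The table and column names (`media_upload`, `documents`, `owner`, `file`) and the three function names are assumptions made for illustration. The tables are passed in as arguments so the logic reads the same inside or outside Anvil; in a real app the comments show where `@anvil.server.callable` and `@anvil.tables.in_transaction` would go.

```python
# Sketch of the staging pattern. In an Anvil app the first and third
# functions would live in a server module and the second in a client form.
# Table and column names used in the comments are assumptions.


def get_upload_view(staging_table, user):
    # Server side. In Anvil: decorate with @anvil.server.callable and pass
    # app_tables.media_upload and anvil.users.get_user().
    # client_writable() returns a view limited to rows owned by this user.
    return staging_table.client_writable(owner=user)


def stage_file(view, media):
    # Client side, e.g. in the uploader's change event handler.
    # The row is added through the view itself, so no server function
    # is called while the file is being sent.
    return view.add_row(file=media)


def promote_upload(staged_row, main_table):
    # Server side. In Anvil: decorate with @anvil.server.callable and
    # @anvil.tables.in_transaction so the copy and the delete succeed
    # or fail together.
    new_row = main_table.add_row(file=staged_row["file"])
    staged_row.delete()
    return new_row
```

The only server calls are the short ones that fetch the view and move the row afterwards; neither of them carries the file contents as an argument.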

Hi Owen,

Thanks for that suggestion. I’m going to try that out and see if it works for me.

Cheers!

Thanks, campopianoa,

I did see that approach and am going to try that as well.
For the feature to be a bit more scalable I’m thinking S3 might be better and maybe more secure. Would using S3 work essentially the same way?

Thanks!

Thanks, I’ll definitely try that!

You can write to S3 using boto3, here’s an example provided by AWS:

https://docs.aws.amazon.com/code-samples/latest/catalog/python-s3-s3-python-example-upload-file.py.html

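import boto3

# Create an S3 client; credentials come from the usual boto3 configuration.
s3 = boto3.client('s3')

# Path of the local file to upload; it is reused below as the object key.
SOURCE_FILENAME = 'filename'
BUCKET_NAME = 'bucket-name'

# Arguments are: local file path, bucket name, key (object name) in the bucket.
s3.upload_file(SOURCE_FILENAME, BUCKET_NAME, SOURCE_FILENAME)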

thanks, shaun!

I’m putting that on my list to try also. Ultimately I need to extract the JSON data out of the file to parse, so I’m interested to see the performance of S3 vs. a media column in a table.

this is almost what I’m after but… HOW do we get the source filename?

I looked the FileUploader over and can’t find it there…

Many thanks,

Michael

Media objects have a name attribute that tells you the original file name.

Yes, but how does the name help?

Doesn’t the code above refer to the file path? And don’t the files need to be uploaded before they are media objects?

There are a bunch of questions on the forum about writing temporary files for libraries that require files instead of media objects. If you need to do that, then you wouldn’t worry about the original name, you’d use whatever name the temporary file had.
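To make that concrete, here is a rough sketch of writing a media object to a temporary file and handing that path to boto3. The function names are made up for this example, and `s3_client` is assumed to be a boto3 S3 client such as `boto3.client("s3")`. Only the `name` attribute and `get_bytes()` method of the media object are used.

```python
import os
import tempfile


def media_to_temp_file(media):
    """Write a media object's bytes to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    # Keep the original extension in case a library inspects it.
    suffix = os.path.splitext(media.name or "")[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(media.get_bytes())
        return tmp.name


def upload_media(media, bucket_name, s3_client):
    """Upload a media object to S3, keyed by its original file name.

    s3_client is assumed to be a boto3 S3 client.
    """
    path = media_to_temp_file(media)
    try:
        # The local path is the temp file; the S3 key is the original name.
        s3_client.upload_file(path, bucket_name, media.name)
    finally:
        # Always remove the temporary file, even if the upload fails.
        os.remove(path)
```

So the file path passed to `upload_file` is the temporary one, and the original name is only used as the key in the bucket. Anvil also provides `anvil.media.TempFile` as a context manager for the temp-file step, which may be simpler in a server module.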

Hi @owen.campbell, I followed your suggestion and it worked well. I was wondering, though, whether this is still the state-of-the-art solution to this issue? Thanks!