Timeout on large file upload

evelin.ento · November 15, 2019, 5:10am

Hi!

Just evaluating Anvil for a project and so far I love it and expect to be able to save a lot of time.

One of the requirements is the ability to upload large data files for analysis.

I’ve set it up with the FileUploader and saving to a table as a media column.

A small file (197K) works, but attempts with a large file (49MB+) return a timeout on the server module. I’m thinking it may not work to store large files in a table.

Subsequent processing will involve parsing the file and preparing the analysis to return in some type of grid.

What approach do you recommend? Is there a way to change the timeout limit?

Thanks!

owen.campbell · November 15, 2019, 11:59am

I have an app which does exactly this - upload and store PDF documents of substantial size.

To handle the upload timeout issue, I use a client writable view (https://anvil.works/docs/data-tables/data-security#views).

My media uploader loads the file into this view and I then call a server function to write it from that view into my real table (and then delete from the view, all within a transaction).

Hope that helps!

campopianoa · November 15, 2019, 12:43pm

You may also find this thread useful. It discusses using Google Drive as an option.

jshaffstall · November 15, 2019, 2:25pm

Is there a small clonable example you can share that uses this technique? I have many questions that I think I’d be able to answer for myself if I were to look over the code.

owen.campbell · November 15, 2019, 2:31pm

Sorry, no. This was done for a commercial client of mine.

owen.campbell · November 15, 2019, 3:17pm

I threw this together to show the principle:

https://anvil.works/build#clone:24VCGPJYOSAP4DR6=C77EWVK6PJ2MC5R5LRUVDLHM

jshaffstall · November 15, 2019, 4:34pm

Thanks for that!

In the process of laying out some questions here, I’ve answered some of them. Some still remain.

What’s the advantage of copying the media over to a separate table? Adding a row to the client writeable view adds it to the media_upload table. I could just use the media from there, right?
How does this avoid the server timeout issues with large files? At some point in this process the file still has to be sent to the server. Is adding a row via the client writable view avoiding the timeout in some way?

owen.campbell · November 15, 2019, 4:35pm

I use the separate table so I can put an ‘owner’ column on it, which I need in order to generate the view but don’t need on the main table.
Yes, the client writeable view avoids the timeout because it doesn’t make a server call.
(Well, there’s a server call to get the view, but there’s no call involved in uploading the file which is where the timeout would occur).
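
For readers who can’t open the clone link, the flow described above can be sketched roughly as follows. This is not the code from that app. `FakeTable` and `FakeRow` are hypothetical stand-ins for Anvil’s Data Tables so the sketch runs as plain Python, and the `documents` table and the `file` column are assumed names. In a real app the view would presumably come from a server function returning something like `app_tables.media_upload.client_writable(owner=user)`, and `move_upload` would be a server function decorated with `@anvil.server.callable` and `@anvil.tables.in_transaction`.

```python
# Sketch only: plain-Python stand-ins replace Anvil's Data Tables.
# The 'documents' table and the 'file' column are assumed names.

class FakeRow(dict):
    """Stand-in for a Data Table row: dict-style access plus delete()."""

    def __init__(self, table, **columns):
        super().__init__(**columns)
        self._table = table

    def delete(self):
        # Remove this exact row object from its table.
        self._table.rows = [r for r in self._table.rows if r is not self]


class FakeTable:
    """Stand-in for a Data Table (or a client-writable view of one)."""

    def __init__(self):
        self.rows = []

    def add_row(self, **columns):
        row = FakeRow(self, **columns)
        self.rows.append(row)
        return row


def move_upload(upload_row, main_table):
    """Copy the media from the staging row into the main table,
    then delete the staging row.

    In Anvil this would be a server function, run inside a transaction
    so the copy and the delete succeed or fail together.
    """
    stored = main_table.add_row(file=upload_row["file"])
    upload_row.delete()
    return stored


media_upload = FakeTable()  # staging table behind the client-writable view
documents = FakeTable()     # the "real" table, with no owner column

# Client side: the uploader adds the file to the view directly.
# This is the slow step, and it is not wrapped in a server function call.
staged = media_upload.add_row(owner="user@example.com",
                              file=b"example file contents")

# Server side: a short call that moves the row across.
move_upload(staged, documents)
```

The point of the split is that the only server function call happens after the bytes have already arrived, so it has very little work left to do.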

jshaffstall · November 15, 2019, 6:33pm

Wow…that seems like a loophole. A terrific one, but I’m a bit concerned that the behavior might change at some point.

Regardless, this technique should do wonders for my project for submitting video games for a contest. I was running into server timeout issues not because of file size but because of low bandwidth in rural areas.

owen.campbell · November 15, 2019, 7:32pm

I believe it’s by design. The view has to be fetched with a server call and is then writable from the client, hence its name.

evelin.ento · November 17, 2019, 6:36pm

Hi Owen,

Thanks for that suggestion. I’m going to try that out and see if it works for me.

Cheers!

evelin.ento · November 17, 2019, 6:38pm

Thanks, campopianoa,

I did see that approach and am going to try that as well.
For the feature to be a bit more scalable I’m thinking S3 might be better and maybe more secure. Would using S3 work essentially the same way?

Thanks!

evelin.ento · November 17, 2019, 6:39pm

Thanks, I’ll definitely try that!

shaun · November 19, 2019, 10:25am

You can write to S3 using boto3. Here’s an example provided by AWS:

https://docs.aws.amazon.com/code-samples/latest/catalog/python-s3-s3-python-example-upload-file.py.html

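```python
import boto3

# Create an S3 client (credentials come from the usual boto3 configuration).
S3 = boto3.client('s3')

SOURCE_FILENAME = 'filename'   # path of the local file to upload
BUCKET_NAME = 'bucket-name'    # destination bucket

# upload_file(Filename, Bucket, Key): the filename is reused as the object key.
S3.upload_file(SOURCE_FILENAME, BUCKET_NAME, SOURCE_FILENAME)
```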

evelin.ento · November 20, 2019, 1:55am

thanks, shaun!

I’m putting that on my list to try also. Ultimately I need to extract the json data out of the file to parse so I’m interested to see performance of S3 vs media column in a table.
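
Whichever storage ends up holding the file, the parsing step could look something like the sketch below. It assumes the upload is UTF-8 encoded JSON and relies only on the Media object’s `get_bytes()` method. `FakeMedia` and the file name are hypothetical, there so the example runs outside Anvil.

```python
import json


class FakeMedia:
    """Hypothetical stand-in for an Anvil Media object."""

    def __init__(self, name, content):
        self.name = name
        self._content = content

    def get_bytes(self):
        return self._content


def parse_json_media(media):
    """Decode a Media object's bytes as UTF-8 and parse them as JSON."""
    return json.loads(media.get_bytes().decode("utf-8"))


upload = FakeMedia("results.json", b'{"samples": [1, 2, 3]}')
data = parse_json_media(upload)
```

Note that `get_bytes()` loads the whole file into memory, which may matter for files in the 49MB+ range.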

michaellavers · February 16, 2022, 9:42pm

this is almost what I’m after but… HOW do we get the source filename?

I looked the FileUploader over and can’t find it there…

Many thanks,

Michael

jshaffstall · February 16, 2022, 10:03pm

Media objects have a name attribute that tells you the original file name.

michaellavers · February 16, 2022, 10:07pm

yes but how does the name help?

Doesn’t the code above refer to the file path? And don’t the files need to be uploaded before they are media objects?

jshaffstall · February 16, 2022, 10:10pm

There are a bunch of questions on the forum about writing temporary files for libraries that require files instead of media objects. If you need to do that, then you wouldn’t worry about the original name, you’d use whatever name the temporary file had.
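
As a rough sketch of how the two answers fit together: write the Media object’s bytes to a temporary file, pass that temporary path as the source filename, and use the Media object’s `name` as the S3 key. `FakeMedia` and `RecordingS3Client` are hypothetical stand-ins so the example runs without Anvil or AWS. In real code the client would be `boto3.client('s3')`, whose `upload_file(Filename, Bucket, Key)` call is the one used earlier in the thread. Anvil also documents an `anvil.media.TempFile` helper that could replace the manual temp-file handling here.

```python
import os
import tempfile


class FakeMedia:
    """Hypothetical stand-in for an Anvil Media object."""

    def __init__(self, name, content):
        self.name = name
        self._content = content

    def get_bytes(self):
        return self._content


class RecordingS3Client:
    """Hypothetical stand-in for boto3.client('s3'); records uploads."""

    def __init__(self):
        self.uploads = []

    def upload_file(self, Filename, Bucket, Key):
        with open(Filename, "rb") as f:
            self.uploads.append(
                {"path": Filename, "bucket": Bucket, "key": Key, "body": f.read()}
            )


def upload_media_via_temp_file(media, bucket_name, s3_client):
    """Write the media to a temp file, upload it, then clean up.

    The temp file's own name is irrelevant; the original file name
    (media.name) is used as the object key.
    """
    suffix = os.path.splitext(media.name)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(media.get_bytes())
        temp_path = tmp.name
    try:
        s3_client.upload_file(temp_path, bucket_name, media.name)
    finally:
        os.remove(temp_path)
    return media.name


s3 = RecordingS3Client()
pdf = FakeMedia("report.pdf", b"%PDF-1.4 example")
key = upload_media_via_temp_file(pdf, "bucket-name", s3)
```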

edmondssesay · June 28, 2022, 6:46pm

Hi @owen.campbell, I followed your suggestion and it worked well. I was wondering, though, whether this is still the state-of-the-art solution to this issue. Thanks!