Store large number of rows of reference data

Hey!

I need to store a few million rows of postcode reference data simple key:value format

e.g. Postcode(String):Distance(Integer)

I hit my row limit on Anvil data tables, so went to look at storing in an external mongo database, but can’t find any precedence of people doing this on the forum…

So - how would you do this? Is the only option an external Postgres database?

I’m a big fan of sql databases so I nearly always recommend them for things like this.

But if you’re using key:value, have you considered something like Redis? I guess it depends on how you want to retrieve the data.

Just FYI, one of my clients inserts up to 1 million rows per hour into a 3 node MySQL cluster (well, Galera so MariaDB techincally) but I extract several hundred thousand bits of commonly accessed data into redis for fast lookups (basically a cache). As long as your lookups are simple it works pretty well.

Look at https://www.fullstackpython.com/blog/install-redis-use-python-3-ubuntu-1604.html (oh yeah, I use uplink running on my own server for all db heavy stuff).

Thanks - Redis hadn’t occurred to me! Will have a think about that!

Have also just discovered ElephantSQL where I can get a free postgres database up to 20MB which is plenty for a couple of million rows :slight_smile:

If the data is “small” (meaning key value pairs are just postcodes and itegers as you describe, not long text fields), you can save them as a JSON.txt file and have them loaded when the form opens. I have used this in an Anvil application- if I have time I might try and put together a “sanitized” version open for distribution.

I did give this a go after your suggestion, but the data is 20MB which I think might be a bit much to ask people to wait for on page load :slight_smile:

Would be a nice simple solution for a tiny dataset though - and avoids DB latency etc.!

1 Like

Hi Matt,

How are you doing with this?

As you may already know: You can use any Python database API in a Server Module. For Mongo, you would use pymongo.

pymongo is not currently installed in the Server Module environment. We like to install whatever anybody needs, so if you do want to try using pymongo, just ask and we’ll install it.

Here is a list of the currently installed Server Modules.

For Postgres, we have psycopg2 installed. Again, just ask if you need something different.

In terms of hosting, AWS Relational Database Service provides a free tier, it may be worth checking if that’s sufficient for your needs (or perhaps a paid RDS instance is within your budget).

Thanks @shaun - I’ve currently gone for an Elephant SQL instance using psycopg2.

However I might give RDS free tier a go!

1 Like