Offline Caching of Data Tables [BETA]

divyeshlakhotia · December 18, 2022, 5:51am

Want faster access to data tables and more powerful PWAs? I just developed this dependency that can help you with that. By taking use of IndexedDB, it automatically does the job of storing and fetching rows from IndexedDB.

The usage is simple. Instead of using the default anvil tables, you can use cached tables

from cached_tables.ct import app_tables

row = app_tables('table_1').get(Col=Value)

This will first search for the row in the offline database. If match is found, it will return the row much faster than a server call. But if the row does not exist in offline database, it will fetch the row from your online database. The dependency will then store the row in the offline database to make sure that the results will be much faster next time.

You can also use other methods like search, get_by_id, has_row. Updating or deleting the row in client-side will update the record in both your offline database and online database (if forms have permission to do it). There are also methods for syncing rows and much more.

Link to docs - https://cached-tables-docs.anvil.app/

And the usage is not only limited to faster access to data tables. Storing rows in offline database means that they can be accessed without an internet connection. And that means - even more powerful PWAs which allow searching or getting rows from your tables even offline!

Clone Link -Anvil | Login

However, this dependency is currently unstable. Before usage in production, a proper testing from your side is recommended.

Report any issues that may come up. Thanks and have a great day!

mark.breuss · December 18, 2022, 9:44am

Awsome work - and hats off to you for tackling such a hard problem.

I don’t use anvil data tables but offline data access has been occupying me for a long time.
I took a look at the dependency and here are my two cents on that. Understand them as opinionated suggestsions and not criticism per se.

As I understand it you are downloading the whole datatable to the client. For many apps this will be to much data, and or a security issue.
→ You could work around that by letting the user define some sort of partitioning. Like all rows that belong to the currently logged in user.
In an offline first szenario its all about conflict resolution. What happens if multiple clients write to the same data. A simple resolution is that the last write wins. In your dependency however the last one to get back online wins. Imagine a client getting back online after a year of beeing offline would override all changes made to the row. → You could workaround that by storing a timestamp and comparing them when re-syncing.

Of course it depends on the usecase at hand. Those were merely two things that came to mind when skimming over it.
All in all - keep up the great work!

Mark

divyeshlakhotia · December 18, 2022, 9:57am

Thanks you so much for the valuable suggestions.

No. The whole database will not be downloaded in one go. That could be too big. This dependency will only download the data it gets. So as long as you are never asking for that row in client, it will never be downloaded. And if you want to ask for the row in client without storing it, you can switch to default anvil.tables for now. Although I will add a parameter that lets you specify whether to store row or not.
I am not sure sure if I understood this situation properly. Will you mind explaining it further?

jshaffstall · December 18, 2022, 3:30pm

Interesting approach! I have a similar work-in-progress that uses an in-memory cache of rows. I’ll be looking forward to see how yours develops.

jshaffstall · December 18, 2022, 4:38pm

Imagine two clients with offline versions of the same row. Both make incompatible updates to the row, and then both (at different times) sync the row to the server. Depending on the timing of the updates, you could end up with inconsistent data in the data tables.

A timestamp of the last update would help (you could disallow updates from offline clients whose timestamp in the row didn’t match), although given the (admittedly rare) possibility of duplicates with timestamps, a transaction safe sequence number probably works better.

When I was dealing with this issue I used a sequence number, and raised an exception when a client tried to update an old version of a row. On the client itself I had a commit/rollback mechanism to bring the client back into sync with the server if the client had old data.

I think this sort of offline caching works best with apps where one user owns a set of rows and is the only one who’ll be modifying them. It’s trickier with apps where multiple users collaborate on editing the same rows.

divyeshlakhotia · December 18, 2022, 5:15pm

Thanks for the clarification @jshaffstall @mark.breuss Reasons like this are why I have marked this dependency as unstable for production. I will work on this issue and deploy a fix.

Until then, I have disabled updating rows if the app is in offline_mode.

If the app is online however, both the offline and online database will be updated together (Offline database first and then online database in background)

At this stage, the dependency is best suitable for apps that only forms to read data tables, not write them. Will work on future improvements.

p.colbert · December 18, 2022, 6:46pm

It’s still tricky when that one user can log in from anywhere (any device, any browser) without first logging out (and flushing cache back to central DB) everywhere else.

Edit: I think the Anvil IDEs get around that by basing everything on a Version Control System. Old records are never destroyed, just marked as obsolete. If a conflict is detected, the user can review all the changes, and decide which ones to keep. Whether that’s practical for a more general audience, I’m not sure.

stefano.menci · December 19, 2022, 12:15am

I have an app that often has two users modifying the same row. Here is how I avoid conflicts.

The app manages crates lifecycle. User 1 decides what items go inside each crate, user 2 logs when an item has arrived to the crating department, user 3 logs when each item is in the crate.

Often user 1 decides to add one item to a create while user 2 receives another item while user 3 adds another item into the crate.

Each row describes one crate with all its content and the status of each item.

I solved the problem of the conflicts by:

storing both time stamp and user name for the last update
adding a timer on the client that every 5 seconds asks the server “my row has been modified by user X at time Y, do you have any updates for me”?
if that is still the latest update timestamp (99.9% of times), the server returns None and the client does nothing
if another user has updated the row after that timestamp, the server returns the row with the new value, then
if the client is not dirty, that is if the user has not edited anything, the form is silently updated with the new content of the row
if the data on the client has been modified, an alert warns that the unsaved editings will be lost, the form is going to be updated and shows the name of the user who last updated the row

Perhaps this dependency could do a similar polling?
The problem is that Anvil’s timers live in a form, while this dependency shouldn’t be tied to a form. Perhaps un infinite loop with a sleep(5)?

divyeshlakhotia · December 19, 2022, 2:25am

No, the offline row will never overwrite to central database when syncing.

Unless you update the row manually using

row['col'] = value #or .update

And as I mentioned previously, this will update both the online and offline database so everything works as expected.

But the problem here is that when refreshing an offline row, the online row will overwrite it. This means that any changes the user made offline will be lost.

Once I get around that problem, a good practice will be to first refresh the row and compare the new values with the offline values first before making any changes.

divyeshlakhotia · December 19, 2022, 2:28am

Interesting approach. I might take inspiration from this for one of my apps.

As for the issues with timer, I can just use the setInterval method of Javascript.