What I’m trying to do:
I am planning to build a remote monitoring app.
Collecting serial data every second from Modbus RTU slaves using a Raspberry Pi Pico W and logging it in Data Tables.
Let’s say I have thousands of such Modbus devices and deploy one Pico W for each.
I can see two approaches and need your advice on which path would be optimal for this use case.
Option 1: Uplink
Run the Uplink on the Pico W to call a server function that writes the data to a new row.
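Roughly, the server side of this option would just be a decorated function writing a row. Here’s a minimal sketch (the table name `data` and its columns are placeholders for illustration):

```python
# Anvil server module.
# Table name ("data") and column names are placeholders for illustration.
import datetime

import anvil.server
from anvil.tables import app_tables


@anvil.server.callable
def log_reading(device_id, values):
    # One row per reading, pushed from the Pico W over Uplink.
    app_tables.data.add_row(
        device_id=device_id,
        values=values,
        timestamp=datetime.datetime.now(datetime.timezone.utc),
    )
```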
Concerns:
- Is connecting thousands of Uplinks to one Anvil server feasible?
- With thousands of Uplinks calling the same server function to write data, how much server resource do I need? How many app_tables.data.add_row() operations can I process per second?
- The client Uplink key does not seem to work on the Pico W, so when I install the devices at different locations, my server Uplink key will be stored on them. Anyone could just plug one into their laptop and take the Uplink key! Is there any way to limit an Uplink to calling only certain functions? How should I handle this security concern?
Option 2: HTTP Endpoint
Create an HTTP endpoint to receive the data and log it to the database. The Pico W can send REST API requests.
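For illustration, the endpoint could be as simple as the sketch below (the path, table name and JSON fields are placeholders). On the Pico side, a MicroPython urequests POST with a JSON body (plus some shared secret in a header) would be the matching request.

```python
# Anvil server module.
# Endpoint path, table name and JSON fields are placeholders for illustration.
import datetime

import anvil.server
from anvil.tables import app_tables


@anvil.server.http_endpoint("/log", methods=["POST"])
def log_reading(**kwargs):
    payload = anvil.server.request.body_json  # e.g. {"device_id": "...", "values": {...}}
    app_tables.data.add_row(
        device_id=payload["device_id"],
        values=payload["values"],
        timestamp=datetime.datetime.now(datetime.timezone.utc),
    )
    return {"status": "ok"}
```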
Concerns:
- Again, how many concurrent API requests can my Anvil server handle? How should I plan for server resources? If only there were a tool or some documentation explaining load calculation and resource planning for Anvil apps…
Some general Anvil queries:
- Is there any database connection pooling mechanism to optimize heavy writes?
- Is checking background task state just another server call, or is there something more to it? Which of these would be more efficient (a rough sketch of the third variant follows this list)?
  - a timer tick doing app_tables.data.search() from the client side
  - a timer tick calling a server function which reads the database on the server
  - a timer tick checking background task state, with the background task doing the database reading and putting the result in its task state
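To make that third variant concrete, here is roughly what I have in mind (function names, the state key and the refresh helper are purely illustrative):

```python
# Anvil server module: a background task keeps the latest readings in its task state.
import time

import anvil.server
from anvil.tables import app_tables


@anvil.server.background_task
def watch_readings():
    while True:
        rows = app_tables.data.search()  # would be filtered per device in practice
        anvil.server.task_state["latest"] = [dict(r) for r in rows]
        time.sleep(1)


@anvil.server.callable
def start_watcher():
    # The client keeps the returned task object and polls it from a timer.
    return anvil.server.launch_background_task("watch_readings")


# Client form: self.watcher_task = anvil.server.call("start_watcher") at startup,
# then the timer tick (a method on the form) just asks the task for its state.
def timer_1_tick(self, **event_args):
    state = self.watcher_task.get_state()
    if state and "latest" in state:
        self.refresh_plot(state["latest"])  # hypothetical helper on this form
```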
What is your advice? If you were facing this, which path would you choose?
I cannot answer most of your questions, since I don’t write apps at that scale.
Uplink keys come in client and server versions. If a client Uplink key doesn’t work for you and you use a server Uplink key, you cannot prevent whoever holds that key from directly accessing your Data Tables, even if you protect individual server functions in other ways (e.g. requiring authorized users with appropriate roles).
I think the above probably points you in the direction of HTTP Endpoints for security reasons, unless you can switch to a device that supports client uplink keys.
Hi Rasel, this is not a direct answer for you, but if I were doing this, I would approach it slightly differently.
As you know, I prefer separation of concerns when dealing with volume, and I would have concerns that you might overwhelm your Anvil allowances rather quickly at scale. You might end up needing a dedicated Anvil server, which may or may not be within your budget. So I would:
- collect and write the data outside of Anvil (e.g. Flask & MySQL running on a cheap VPS from Linode or DigitalOcean)
- use Anvil for displaying/alerting/etc., connecting to the external database
I would go even further, though I accept I am getting into controversial territory.
I wouldn’t do the database writes in Flask. I would push the data into a queuing system (Redis lists or streams being my preference, but RabbitMQ would also work). You can then have as many consumers of that queue as required, across multiple servers if necessary. You can spin them up and drop them as required (the servers and/or the consumer processes).
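As a rough sketch of what I mean (the route, the queue name and the db_write callable are illustrative placeholders, not a drop-in implementation):

```python
# Producer side: the Flask endpoint only enqueues; it never touches the database.
import json

import redis
from flask import Flask, request

app = Flask(__name__)
queue = redis.Redis()  # assumes Redis running on localhost:6379


@app.route("/log", methods=["POST"])
def log_reading():
    queue.rpush("readings", json.dumps(request.get_json()))
    return {"status": "queued"}


# Consumer side: run as many of these processes as the write load requires.
def consume_forever(db_write):
    while True:
        _, raw = queue.blpop("readings")  # blocks until a reading arrives
        db_write(json.loads(raw))         # e.g. an INSERT into MySQL, retried on failure
```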
Too many moving parts for some, but for me it allows high volume and a degree of recovery/mitigation when (and it’s always when) the database stops accepting data and the data consumers (api/uplink/whatever) start timing out and causing high CPU usage & lost data … 
Just my 2c …
Hi david,
Thanks for sharing your insight on this! I understand your approach; usually these kinds of IoT monitoring setups work with some sort of pub-sub mechanism.
But for my use case it’s just periodic data logging, so REST would be fine. A thousand-plus devices is just me thinking ahead; any modern server should be able to handle hundreds of requests.
Logging data every 10, 30 or even 60 seconds would be fine. Redis would be overkill for this.
But on demand, if I want to monitor one such device LIVE, I can call functions on the Pico directly on a 1-second timer interval. Leveraging async, the periodic logging loop can run independently.
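Something like the sketch below is what I have in mind on the Pico side, using MicroPython’s uasyncio (read_modbus() and post_reading() are stand-ins for my own helpers):

```python
# MicroPython sketch for the Pico W: the logging loop runs as its own task,
# so on-demand "live" requests can be served without blocking it.
import uasyncio as asyncio


def read_modbus():
    # Placeholder: real code would poll the RTU slave over UART.
    return {"temp": 25.0}


def post_reading(reading):
    # Placeholder: real code would POST this to the logging endpoint.
    print("logged", reading)


async def logging_loop(interval_s=30):
    while True:
        post_reading(read_modbus())
        await asyncio.sleep(interval_s)


async def main():
    asyncio.create_task(logging_loop())
    while True:
        await asyncio.sleep(3600)  # keep the event loop alive for other tasks


asyncio.run(main())
```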
Last year @meredydd demonstrated a live dashboard using a Pico W, monitoring his room temperature and humidity. He short-polled the server to make the plot real-time, which is great for a quick demo but not scalable. I was inspired by his demo and wondered what one would need to make it scalable!
Anvil is great, I love it. With each passing day, I’m becoming less comfortable with other frameworks.
Anvil stays my first choice for any project. If I could use the client Uplink key, I would give it a try!
For what it’s worth, we do something similar with Pis and Python. But we send the data to a dedicated time series database and chart it with a dedicated charting package. We operate at scale: hundreds of units, millions of data points per day, near real-time monitoring.
We love Anvil and there’s no reason this project couldn’t be done in Anvil. But we’ve made some optimizations to the architecture on the data collection platform which eases the burden on the back end. We don’t use any fancy queues or anything like that, but we do:
- Cache points in a local SQLite database. This allows us to collect data when offline.
- Run a separate process which uploads the data in batches to a REST endpoint.
- Run a third process which performs housekeeping functions, including downloading configuration data from the cloud based on the unit’s serial number, purging uploaded points from the database, and actually updating the code from Git when a new version is available.
The three Python programs just run in the background, managed by systemd. There’s no fancy IPC—we communicate through the SQLite database, using transactions to maintain integrity. All operations, including sampling, writing to the database, and uploading, automatically retry on failure. Without the retries, the system would never work. With them, it’s nearly 100% reliable.
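A stripped-down sketch of the cache-and-upload pattern looks something like this (the schema, batch size and endpoint URL are illustrative, not our actual code):

```python
# Sampler side: every reading goes into a local SQLite cache first,
# so nothing is lost while the unit is offline.
import json
import sqlite3

import requests

DB = "cache.db"
ENDPOINT = "https://example.com/_/api/log"  # placeholder URL


def init_db():
    with sqlite3.connect(DB) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS points (id INTEGER PRIMARY KEY, payload TEXT)"
        )


def cache_point(payload):
    # The with-block wraps the INSERT in a transaction.
    with sqlite3.connect(DB) as con:
        con.execute("INSERT INTO points (payload) VALUES (?)", (json.dumps(payload),))


# Uploader side: a separate process batches points up and purges them on success.
def upload_batch(batch_size=100):
    with sqlite3.connect(DB) as con:
        rows = con.execute(
            "SELECT id, payload FROM points LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return
        try:
            requests.post(
                ENDPOINT, json=[json.loads(p) for _, p in rows], timeout=30
            ).raise_for_status()
        except requests.RequestException:
            return  # leave the rows in place; the next pass retries them
        con.executemany("DELETE FROM points WHERE id = ?", [(i,) for i, _ in rows])
```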
You can minimize your risk of key theft by using a different API key for each unit, but I don’t think you can eliminate that risk completely. FWIW, we have shared, upload-only API keys on the Pis; I can’t see any way around that.