Connecting via Uplink to Google App Engine (initially for data loading using singer.io)

I’m planning on using Anvil for a data management application, and I’m wondering if I might also be able to use it to load data from various sources into my data warehouse.

I have been experimenting with using Singer taps and targets to do this (running locally to test), and it seems like you could set up background/scheduled tasks in Anvil to execute these… does this sound like it might be a viable project, or not suited to Anvil?

https://www.singer.io/#what-it-is

One potential sticking point (but maybe not a dealbreaker) is that it is best practice to install the tap (data source) and target (data destination) in different virtual environments, and I’m not sure how you’d achieve this in Anvil. Anyway, any thoughts/guidance gratefully received!


Hi @jim!

Are you intending to run Singer in an Anvil Server Module or in an Uplink script?

If you’re using the Uplink, you can run a separate script for the tap and the target, so each can run in a different virtualenv. If for some reason you’d like to have just one Uplink script, you can activate different virtual environments programmatically in Python.
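To illustrate the second option, here’s a minimal sketch (the helper names and paths are my own, not part of any Anvil or Singer API): rather than “activating” an environment, you invoke the interpreter that lives inside each virtualenv, which has the same effect for the child process. One Uplink script could then launch the tap in one environment and the target in another:

```python
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir):
    """Return the path of the Python interpreter inside a virtualenv."""
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    return str(Path(venv_dir) / bindir / "python")


def run_in_venv(venv_dir, script, *args):
    """Run `script` with the interpreter from `venv_dir`.

    Calling the venv's own interpreter is equivalent to activating that
    environment for the one process, so the tap and the target each see
    only the packages installed in their own virtualenv.
    """
    return subprocess.run([venv_python(venv_dir), script, *args],
                          capture_output=True, text=True)


# Hypothetical usage: run a Singer tap and target, each in its own env.
# run_in_venv("/home/me/envs/tap-env", "run_tap.py")
# run_in_venv("/home/me/envs/target-env", "run_target.py")
```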

Hi @shaun,

I was initially going to try to keep it simple and run both tap and target in an Anvil Server Module, getting data from an external source and streaming it to my data warehouse (i.e. not storing it locally in Anvil)… I could try the Uplink approach in the future; it looks quite simple, so thanks for the link.

To run taps/targets in an Anvil Server Module, would I need to get you guys to install each tap/target we need (on a paid plan, obviously) or is that something I can do locally?

If you want to set up multiple virtual environments in Server Modules, we can do that on a Dedicated plan (where you have a whole server to yourself).

To get complete control of your environment without a dedicated plan, the Uplink is indeed the way to go. You can run the Uplink anywhere, including on a cloud machine - you can run an EC2 micro for free on the AWS free tier.

This is great, thanks. I think I’ll have to start with the Uplink approach, but one day dedicated sounds nice.

I’ve been reviewing the documentation and tutorials and just want to confirm that my assumptions are correct:

  1. It looks like you need to keep the script running constantly, using the anvil.server.wait_forever() line in the Python script, which means you’d need a VM (and not a serverless solution)… I keep everything within GCP for ease of integration and logging, so I’d be looking at a simple Compute Engine VM instance. Any serverless ideas would be great; do you think this would work on App Engine?
  2. I assume I can pass parameters to the uplink function(s) e.g. config, user_id and then execute the script using these parameters. Can it handle multiple concurrent function invocations with different parameters?

Thanks in advance, I really want to use Anvil for this but need to understand how well this will scale.

Can you trigger a serverless script via an API? If you can, then you could use the scheduler in Anvil to periodically (or programmatically on demand) fire the script which could then attach to Anvil, do its business, then shut down.

I have no idea if that fits within what singer.io does (as I’ve never heard of it before).


Hi @jim, here are some answers to your questions:

When do you not need to keep the Uplink running?

If you want to call functions in the Uplink script, you need anvil.server.wait_forever() to keep the websocket connection alive and listen for calls. But if you only want to call Server Module functions from the Uplink script, or use Data Tables (or do anything else), you don’t have to keep the Uplink script running. For example, neither of these scripts needs to stay alive:

import anvil.server
anvil.server.connect("key")
anvil.server.call("do_something_in_a_server_module", [1, 3, 4, 7])

and

import anvil.server
import sys
from anvil.tables import app_tables
anvil.server.connect("key")
app_tables.customers.add_row(name=sys.argv[1], zip_code=sys.argv[2])

Running an Uplink script in App Engine

You can certainly run the Uplink in App Engine! The Uplink is a pip-installable library, so you just need to add it to requirements.txt.

I’ve just successfully tried this out. I created an Anvil app that sends data to an App Engine script, which in turn sends data back to the Anvil app.

The Anvil app has a Button that hits an App Engine app, passing in a name in the path:

    # On the client
    anvil.server.call('hit_app_engine', self.text_box_1.text)

    # On the server
    APP_ENGINE_URL = 'https://golden-memory-255409.appspot.com'

    @anvil.server.callable
    def hit_app_engine(name):
      # This could be a POST request if you had lots of data to send.
      anvil.http.request(APP_ENGINE_URL + '/logme/' + name)

The App Engine app is a Flask app that uses the Anvil Uplink to store the name in Data Tables, and call a server function.

#!/usr/bin/env python3
from datetime import datetime

from flask import Flask
import anvil.server
from anvil.tables import app_tables


anvil.server.connect('LB4MGWIXKJOVJHRZXQ7AWZQT-ST5S6MYBRAXYDFUR')
app = Flask(__name__)


@app.route('/logme/<name>')
def log_name(name):
    """Log the request path in a Data Table."""
    app_tables.name_log.add_row(name=name, when=datetime.now())
    result = anvil.server.call('log_the_name', name)
    return result

The Anvil Uplink library is installed by including it in requirements.txt:

~/gcp-projects/anvil-uplink $ cat requirements.txt
Flask==1.0.2
gunicorn==19.9.0
anvil-uplink

This demonstrates sending data from Anvil to App Engine and back in several ways.

Here’s a GIF of that app running:

And here’s a clone link so you can take a look:

https://anvil.works/build#clone:ST5S6MXBRAXYDEUR=NZYBDDU3N4YHRC47AO4PBRU5

I’ve put my App Engine app on GitHub, it’s here:

Can you make concurrent calls to Uplink scripts?

Yes, it’s a fully multi-threaded server. If you create a script that does this:

#!/usr/bin/env python3

import anvil.server
from time import sleep

anvil.server.connect('GTLMTUIUVQI63IW7BWRWZ4Q6-F25JU2MCAKFCK5PX')

@anvil.server.callable
def run_the_thing():
    print('running...')
    sleep(5)
    print('done')
    return 'done!'

anvil.server.wait_forever()

And run run_the_thing in quick succession from different instances of your app, you get this output:

running...
running...
running...
running...
running...
done
done
done
done
done

@shaun There is a problem with the third example. App Engine is fully managed so the Uplink does not boot until you call gcloud app browse and only stays open for a few hours. App Engine is built to respond to HTTP requests ad hoc and not to keep a websocket open indefinitely. By calling gcloud app browse we reboot the server and reopen the websocket until GCP comes along and cleans it up several hours later.

Steps to reproduce:

  1. Create a main.py file with the contents from your example
  2. Create an app.yaml with the key runtime: python37.
  3. Create a requirements.txt with anvil-uplink
  4. Run gcloud app deploy
  5. Try running an anvil.server.call from the Anvil frontend and it results in an error.
  6. Run gcloud app browse
  7. Try running an anvil.server.call from the Anvil frontend and it succeeds.
  8. Wait 2 hours
  9. Try running an anvil.server.call from the Anvil frontend and it results in an error (no callable registered).
  10. Run gcloud app browse
  11. Try running an anvil.server.call from the Anvil frontend and it succeeds.

I tried using a Compute Engine but ran into issues with python versioning and building a PEX file so I gave up. It’s definitely possible to use CE but it added too much complexity for our stack.

Let me know if you have more ideas or if you think I missed something. My current idea is to git push to server modules with folders enabled, email support@anvil.works for pip install, and unroll all my private requirements into the main app.

Another option is to use Anvil scheduled tasks to create an HTTP heartbeat that keeps App Engine running. I haven’t tried this yet.
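For what it’s worth, that heartbeat could be as simple as a scheduled job making a lightweight GET request to the App Engine app. Here’s an untested sketch using only the standard library (the URL is a placeholder; inside an Anvil Scheduled Task you’d likely use anvil.http.request instead of urllib):

```python
from urllib.request import urlopen

APP_ENGINE_URL = "https://your-project.appspot.com"  # placeholder


def heartbeat(url=APP_ENGINE_URL):
    """Ping the App Engine app so GCP keeps an instance alive.

    Serving any request forces App Engine to keep (or cold-start) an
    instance, which re-runs the module-level anvil.server.connect() and
    so re-registers the Uplink's callable functions.
    """
    with urlopen(url, timeout=10) as resp:
        return resp.status
```

Whether the schedule can fire often enough to beat GCP’s idle cleanup is exactly what would need testing.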

I thought you were saying that’s precisely what you didn’t want (the Anvil Uplink running all the time)? If you do want that, you can run it on a $5/month server from Digital Ocean.

Maybe I’ve misunderstood? I can be a bit slow 🙂


@woodpav My example uses an HTTP request to transfer the initial data to the App Engine app - it doesn’t use anvil.server.call to call into the App Engine app, but it does use anvil.server.call from the App Engine app, and it writes to Data Tables from App Engine. I’ve just checked and it still works without me having run gcloud app browse recently.

You’re describing using anvil.server.call to call into the App Engine app from cold. Your suggestion of using a heartbeat to keep the app alive is interesting; I hadn’t thought of that. However, as @david.wylie says, running a full cloud server might be the way to go - it doesn’t necessarily cost much.

Thanks @shaun for this helpful and detailed response, and for everybody else (@woodpav, @david.wylie) for your valuable insights.

In my use case (a long-running data transfer process between an external API and a data warehouse), it definitely makes sense to use Anvil for the user interface/config and App Engine to run the process. However, the data does not need to be sent back to Anvil (just the status, which I can log somewhere else and access from my Anvil app), so I don’t think I need to use the Uplink in this instance; I can trigger the process via HTTP or Pub/Sub and keep it simple in Anvil.

Anyway, problem solved and Anvil looks like a good choice for this and future projects. Onwards!
