Simple Caching approach

I have some tables with quite a bit of data that I have to transform in various ways, and the transformation is not quick, so I decided to build a quick caching layer.

Before the code, fair warning: this is my first application on Anvil and I'm certainly not an expert on its internals. But this has been great for my use case.

Server code:

from datetime import datetime

from anvil.tables import app_tables


def cache_get(key: str = '', default=False):
  '''
  :param key: Unique key identifying the cached record to pull
  :param default: Optional default value returned. False by default
  :return: default (False by default) if no data, otherwise the cached string
  '''
  if not key:
    return default

  result = app_tables.cache_data.search(cache_id=key)
  if len(result) == 0:  # no result: return the default
    return default
  elif len(result) > 1:  # more than one result: delete all duplicates, then return the default
    for row in result:
      row.delete()
    return default

  for row in result:
    return row['data']


def cache_set(key: str = '', payload: str = ''):
  '''
  :param key: Key under which to store the cache. Has to be unique. Prefix with a func name, class name, or user id
  :param payload: A string representation of what needs to be cached. Can be JSON, for example
  :return: Bool: True if the value was cached. False if it wasn't, or on error
  '''
  if not key or not payload:
    return False

  if not isinstance(key, str):
    print('cache_set: Key is not a string')
    return False

  if not isinstance(payload, str):
    print('cache_set: Payload is not a string')
    return False

  try:
    # Delete any existing rows for this key, then store the fresh payload
    result = app_tables.cache_data.search(cache_id=key)
    for row in result:
      row.delete()
    app_tables.cache_data.add_row(cache_id=key, data=payload, last_updated=datetime.now())
    return True
  except Exception as e:
    print(e)
    return False

The cache_data table is:

cache_id: Text
data: Text
last_updated: Date and Time

The server-side functions are simple. The setter takes an ID (string) and a payload (string). The idea is that you serialize your data somehow and store it. If it's user data, include the user id as part of the ID. For simple data types, I'm just storing a JSON representation (json.dumps()).

import json

def some_function():
  c_data = cache_get('someid')
  if c_data:
    return json.loads(c_data)

  # expensive calls/transformations
  some_var = [1, 2, 3, 4, 5, 6, 7, 8, 9]

  c_data = json.dumps(some_var)
  cache_set('someid', c_data)
  return some_var

Right now I'm storing the date/time when the row is added. I'm thinking I can add extra code to cache_get() to delete the entry and return false if it's older than X.
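A minimal sketch of that age check, assuming a hypothetical MAX_AGE constant (the name and the 4-hour value are illustrative, not part of the code above). cache_get() would call this on the row's last_updated value and, if it returns True, delete the row and return the default:

```python
from datetime import datetime, timedelta

# Hypothetical global max age; pick whatever fits your data
MAX_AGE = timedelta(hours=4)

def is_stale(last_updated: datetime, now: datetime = None, max_age: timedelta = MAX_AGE) -> bool:
  '''Return True if a cached row written at last_updated is older than max_age.'''
  now = now or datetime.now()
  return (now - last_updated) > max_age
```

The `now` parameter defaults to the current time; it's only there so the check is easy to test.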

Anyway, hope it helps others and I’d appreciate feedback!


Cool. I thought it would be client side caching when I saw the title, and was pleasantly surprised to see a server approach.

Is this cache called by a client or run on some schedule?

Also, do you have a strategy for cache invalidation?


The way I have it set up is that the client calls a server-side function.

The server-side function uses the cache calls to see if the data is already cached. If not, it executes the code, caches the result, and returns the data.

The next time the function is called, the cache is set and it will just return that without executing the expensive code.

Regarding cache invalidation, I don't have a strategy implemented right now, but there are two approaches I'm thinking about.

First: set a global max time for caching, say 4 hours. On cache_get(), I load the data, check the time column, and if it's older than 4 hours, delete the row and return false/default so the function executes the original code and caches it again.

Second: add a third, optional argument to cache_set(), something like a TTL, specifying when the entry should expire. Then instead of saving the time the record is created, we save the time it should expire. On cache_get(), if the current time is past the saved time, the row gets deleted and false/default is returned so the original code can be executed.

The first one is where I was going originally, but I think the second might be better. Since I haven't made a decision, I haven't implemented either yet.
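A sketch of the second approach, assuming a hypothetical expires_at value stored in the table in place of last_updated (the helper names and ttl_seconds parameter are illustrative, not part of the code above):

```python
from datetime import datetime, timedelta

def expiry_for(ttl_seconds: int, now: datetime = None) -> datetime:
  '''Compute the absolute expiry time to store instead of the creation time.'''
  now = now or datetime.now()
  return now + timedelta(seconds=ttl_seconds)

def is_expired(expires_at: datetime, now: datetime = None) -> bool:
  '''True once the stored expiry time has passed; cache_get() would then delete the row and return the default.'''
  now = now or datetime.now()
  return now >= expires_at
```

cache_set() would call expiry_for(ttl) when writing the row, and cache_get() would check is_expired() on read.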
