Data table support for aggregate functions

ahawes · January 30, 2020, 4:51pm

Is there a way to efficiently retrieve summary information from a data table?
e.g. the number of orders at each possible order status
If I was doing this outside of Anvil, I’d simply use SQL count and group by

campopianoa · January 30, 2020, 5:07pm

Hello @ahawes and welcome,

I think it will depend on which summary information you would like and how your tables are set up.

Using the DataTables API you can do something like the following to get the number of orders with a particular status (not a groupby).

num_rows=len(app_tables.orders.search(status='in_stock'))

To get a groupby-like operation, perhaps numpy could help:

import numpy as np

my_col=[r['order_status'] for r in app_tables.orders.search()]
unique, counts = numpy.unique(my_col, return_counts=True)
my_counts=dict(zip(unique, counts))

# returns a dictionary where each key represents the order status and the values are their counts

In terms of using SQL directly, the docs here say:

Users on Dedicated Hosting or Enterprise plans can access the data in their Data Tables using SQL directly. To get started, contact support@anvil.works to get your dedicated database credentials.

robert · January 30, 2020, 5:09pm

Personally if I need to do something like this I will load the subset of data I need (I’ll subset by first querying the table for what I need) into Pandas. Pandas has copious amounts of aggregation options.

I don’t (yet) see a performance issue with Pandas, but I also am careful to write Pandas code that can be executed via arrays and therefore take advantage of the Cython integration.

stefano.menci · January 30, 2020, 7:45pm

I do use SQL aggregate queries.
They are available on my dedicated plan, I don’t know on what other plans they are available.

ahawes · January 31, 2020, 10:18am

Thanks for the suggestions. I will try them and see what works best for me

rickhurlbatt · February 5, 2020, 5:09am

+1 for Pandas.

@ahawes I am sure you have already seen the post below, but if not, it is very helpful.

owen.campbell · February 5, 2020, 10:15am

itertools.groupby can be used on a data table query. Both return iterators, so it’s a reasonably efficient technique.

gweber.lauraandleigh · September 26, 2020, 3:32pm

I was just looking through the documentation and forum to find aggregation functions. As far as I can tell and from what I see here, it looks like there is no “group by”-like functions intrinsic to data tables.

Is that the case?
Could we keep this conversation alive and move this post to a feature request? Or is there no overall motivation to add these functions? Is it just recommended to use these alternatives (pandas/itertools/sql) instead?

It would certainly simplify my code if data tables had built-in aggregation. It also looks like plenty of other people have had to create a plethora of alternatives.

bridget · September 28, 2020, 10:01am

Hi @gweber.lauraandleigh,

This is correct. Anvil doesn’t support groupby lookups out of the box.
Please do create a Feature Request, and I can then add it to our list