If you’re new to Anvil or the Anvil Data Tables, you may want to start with our Storing and Displaying Data Tutorial. When you’re done there, come back here to learn more about retrieving data from your Tables.
To get data out of your Data Tables, you can construct queries from a set of query operators. Query operators can be used alone or combined to create complex expressions.
The query operators are provided by the anvil.tables.query
module. When you add the Data Tables service to your app, this module will be imported using the shorthand q
by default - you can change this if you wish.
import anvil.tables.query as q
Query operators are methods on q
, for example q.less_than(x)
.
We’ll look at some examples of using the query operators, and we’ll talk about using indexes to improve query performance.
The query operators
Here are some examples of using the query operators, organised by theme.
In the examples there are two Data Tables. app_tables.machines
contains IP addresses, hostnames and other data about computers on an (imaginary) network. app_tables.logs
contains logs from those machines.
We’ve built all these examples into a sample app for you to clone and inspect:
Values within a range
The machines
table has a column last_seen
that records the date and time that the computer last sent out a heartbeat to our app.
Here’s how to use less_than
to get all machines that haven’t been seen since before 14th December 2017:
app_tables.machines.search(
last_seen=q.less_than(
datetime(day=14, month=12, year=2017),
)
)
In addition to q.less_than
, you can use q.greater_than
, q.less_than_or_equal_to
, q.greater_than_or_equal_to
.
You can also search for values in a range using q.between
. This gets all Rows whose values lie between the min
and max
values you provide.
The machines
table has an uptime
column specifying the proportion of the time each machine has been operational.
It’s a floating-point number between 0
and 1
. Let’s say we want to select all machines that have been operational at some point (> 0% uptime), but have worse than 99.99% uptime:
app_tables.machines.search(
uptime=q.between(
min=0,
max=0.9999,
min_inclusive=False,
)
)
We don’t want to select machines that have never been up, so we use min_inclusive=False
to exclude the exact min
value (0
).
By default, the minimum value is inclusive (min_inclusive=True
) and the maximum value is exclusive (max_inclusive=False
).
Range queries don’t just work on numbers. Here’s an example of choosing machines whose hostname begins with letters between 'b'
and 'e'
inclusive:
app_tables.machines.search(
hostname=q.between(
min='b',
max='f',
)
)
Why does it specifiy 'f'
as the max? Recall that the lower bound is included and the upper bound is not included in the results. Text
columns are ordered lexically - f
< fa
< fz
. So if we used max_inclusive=True
, we would get all hostnames beginning 'b'
to 'e'
and the hostname that is exactly 'f'
!
Logical ’not'
Logical ’not’ is implemented by the q.not_
operator. It has a trailing underscore because not
is a protected keyword in Python.
Here’s an example of selecting all machines whose IPV6 addresses are not None
.
app_tables.machines.search(ipv6=q.not_(None))
Matching strings
Pattern matching
To perform a pattern match to a string, use q.like
. q.ilike
is a case-insensitive version.
There are two wildcards. To match any single character, use _
. To match 0 or more instances of any character, use %
. For a literal _
or %
, use \_
or \%
. So:
q.ilike('%ion')
would matchion
,lion
, andTRACTION
q.ilike('_ion')
would matchlion
but notion
orTRACTION
.q.ilike('\%ion')
would match%ion
.
In this example, we use wildcards to search for all machines in a given subnet:
app_tables.machines.search(
ipv4=q.like('192.168.10.%'),
)
Full text search
To perform an intelligent text search, use q.full_text_match
. The results are the Rows that contain all of the words in
the query. Rows containing similar words are matched (stemming) - so searching
for ‘Walking’ will match things containing ‘Walk’ and ‘walker’ as well as ‘Walking’. As well as stemming, words that
are very common in English are ignored to avoid a fog of false positives (here’s a full list).
This query finds all logs that contain 'Stopping process'
as well as 'stop process'
and 'process is now stopping'
:
app_tables.logs.search(
message=q.full_text_match('Stopping process'),
)
A richer query language can be used if you set raw=True
. The search term is interpreted using PostgreSQL’s
tsquery
syntax.
So if you want all logs containing 'Stopping process'
and 'stop process'
but not 'Process is now stopping'
, you can
use the <->
(‘followed by’) operator:
app_tables.logs.search(
message=q.full_text_match('Stopping <-> process', raw=True),
)
See the Postgres tsquery
docs for a full
specification of this query language.
Combining queries
For a single column
Queries can be combined using q.any_of
, q.all_of
and q.none_of
.
When used as keyword arguments, these operators apply to a particular column. So to select all machines except dionysus
and apollo
, you can do:
app_tables.machines.search(
hostname=q.none_of('dionysus', 'apollo')
)
If you want to construct a list or tuple of values, you can unpack them into positional arguments using the *
operator:
hostnames = ['dionysus', 'apollo']
app_tables.machines.search(
hostname=q.none_of(*hostnames)
)
Here’s an example of selecting all machines that are part of the build system:
app_tables.machines.search(
config=q.any_of(
{'type': 'build_master'},
{'type': 'build_worker'},
)
)
To explain: the machines
in our example have a Simple Object column called config
. If any machine has a 'type'
key with values 'build_master'
or 'build_worker'
, this query returns them.
The combination operators can combine queries as well as values. This query matches machines whose uptime is below 99% but not zero - similar to the example above, which used q.between
.
app_tables.machines.search(
uptime=q.all_of(
q.less_than(0.99),
q.not_(0),
)
)
For multiple columns
When used as positional arguments, q.any_of
, q.all_of
and q.none_of
apply to one or more columns. Pass column names to
them as keyword arguments.
Here is a query for machines:
- whose hostname contains
'au'
, or - whose IPV4 address begins with
192.168.
and ends in.0
, or - who have an IPV6 address.
Notice that q.any_of
is used as a positional argument here:
app_tables.machines.search(
q.any_of(
hostname=q.ilike('%au%'),
ipv4=q.ilike('192.168.%.0'),
ipv6=q.not_(None),
)
)
And here we use q.none_of
to select all machines who don’t match those criteria:
app_tables.machines.search(
q.none_of(
hostname=q.ilike('%au%'),
ipv4=q.ilike('192.168.%.0'),
ipv6=q.not_(None),
)
)
Queries can be arbitrarily complex
Queries can be nested arbitrarily to execute complex logic, because q.any_of
, q.all_of
and q.none_of
can take
other queries as positional arguments. There are often more readable ways of achieving the same result, but the power is there if you need it.
As an example, let’s imagine:
- You need to get all logs containing phrases like
Stopping process
. - You want to exclude logs whose level is
NOTSET
. - And you want to include
CRITICAL
-level logs only if the message was logged after midnight on 15th December 2018.
You also want to order the results by time logged.
Here’s a query that achieves that:
app_tables.logs.search(
tables.order_by('time_logged'),
q.any_of(
q.all_of(
level=q.all_of(
q.less_than_or_equal_to(LOG_LEVELS['ERROR']),
q.not_(LOG_LEVELS['NOTSET']),
),
message=q.full_text_match('Stopping <-> process', raw=True),
),
q.all_of(
level=LOG_LEVELS['CRITICAL'],
message=q.full_text_match('Stopping <-> process', raw=True),
time_logged=q.greater_than_or_equal_to(
datetime(year=2018, month=12, day=15),
),
)
),
)
It’s a little difficult to understand this at a glance, so let’s assign some of the subqueries to variables with sensible names.
error_and_below = q.all_of(
q.less_than_or_equal_to(LOG_LEVELS['ERROR']),
q.not_(LOG_LEVELS['NOTSET']),
)
critical = LOG_LEVELS['CRITICAL']
stopping_process = q.full_text_match('Stopping <-> process', raw=True)
starting_from_15th_dec_2018 = q.greater_than_or_equal_to(datetime(year=2018, month=12, day=15))
stopping_process_error_and_below = q.all_of(
message=stopping_process,
level=error_and_below,
)
stopping_process_critical_new = q.all_of(
message=stopping_process,
level=critical,
time_logged=starting_from_15th_dec_2018,
)
app_tables.logs.search(
tables.order_by('time_logged'),
q.any_of(
stopping_process_error_and_below,
stopping_process_critical_new,
)
)
This is functionally the same query as before, but it should now be easier to understand.
Indexes - optimise query performance
Developers on Business Plans with Extra Server Resources or Enterprise plans can create indexes on columns to optimise performance.
Right-click on the column heading in the Data Tables Service to set up indexes.
There are three types of index, which optimise different query types:
- Range queries such as
q.less_than
andq.between
are optimised by a B-Tree index. - Pattern matching (
like
andilike
) are optimised by a Trigram index. full_text_match
queries have an index type of their own, which involves indexing the data structure used to perform the stemmed search.
Multiple indexes of different types can be added to the same column.
As a rule of thumb, apply indexes when your queries start running slowly. They will usually make finding rows much faster. The cost is that writes are made very slightly slower, because the index must be updated as well as the underlying data. So the best approach is to err on the side of using them, but don’t just apply them to everything from the start!
Clone the examples
This app contains each of the examples listed in this cookbook. Clone it to see them at work:
If you need help with any specific queries, why not ask on the Forum?
There are many more tutorials, cookbooks and examples in the Learning Centre.
Or perhaps you’d prefer to get building!