A minimal, extensible framework for TextBox data validation

This is a scheme I’m working on. Comments are invited.


A simple-to-use, reusable, chainable set of text-validators.

Introduction

This minimal framework includes support for

  1. sanitizing badly-pasted input text
  2. validating text against zero or more criteria (pre-defined and custom)
  3. converting the text to other types of values (e.g., numbers)

The latter is often critical for performing additional sanity-checks.

While this was originally written for use by Anvil.works’ text-entry fields, it is independent of any specific data source.

Operating principles

This framework uses a “whiteboard” metaphor, borrowed from expert-system architecture. The data source writes its initial data (the entered text) to a whiteboard (a dictionary). Then it calls in your chosen expert (a function) to review the whiteboard contents.

If the input passes inspection, the expert leaves without comment. If it fails, then the expert leaves a description of the error on the whiteboard.

The caller can do anything it wants with that message: display it, log it, raise an exception, … The expert doesn’t need to know, which keeps the expert (function) simple and focused on the diagnostic task.

An expert doesn’t have to know and do everything by itself. (In fact, such a beast can be very tricky to write and maintain.) Instead, it can call upon other proven experts, in exactly the same way.

In fact, we include two trivial mechanisms for “building” an expert, using other experts as building-blocks. This allows every expert to be as simple and well-focused as possible.
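To illustrate the idea (this is my sketch of what such a building-block combinator might look like, not necessarily the library’s exact code), a sequential “stop at the first failure” builder can be as small as this:

def require_all(*experts):
    """Build a composite expert that runs each expert in turn,
    stopping as soon as one of them reports an error."""
    def composite(wb):
        for expert in experts:
            expert(wb)
            if 'err_msg' in wb:
                return   # leave the first error message on the whiteboard
    return composite

# The composite is itself an expert, so it can be nested further, e.g.:
# check_name = require_all(strip_blanks, length_check)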

Informally, we can divide experts into the three categories noted above: sanitizers, validators, and converters. We’re sure you can find more uses. :wink:

Specifics

A whiteboard is simply a Python dict. By convention:

  1. “text” contains the input string (possibly “sanitized”).
  2. “err_msg”, if present, contains the error message.
  3. “value”, if present, contains the converted value (e.g., a number).

You can easily extend this system with tags of your own, as needed.

Here’s a sample whiteboard, as provided by the caller:

wb = {'text': ' your text here '}
print(wb)
{'text': ' your text here '}

We can define a validator quite simply:

def length_check(wb):
    if len(wb['text']) > 10:
        wb['err_msg'] = 'must be 10 characters or less'

Notice that we’re not even returning a value. The mere presence of the error message tells the caller that the text has failed the test.

The caller will invoke the validator as follows:

length_check(wb)

This changes the whiteboard contents:

print(wb)
{'err_msg': 'must be 10 characters or less', 'text': ' your text here '}

Thereafter, we can ask the whiteboard how things went:

print(validation_failed(wb))
True

print(validation_succeeded(wb))
False

This is sometimes useful when an expert calls upon other experts. Otherwise, we leave such testing to the caller.
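Given the “err_msg” convention above, these two helpers need be no more than a key lookup. A minimal sketch (mine, not necessarily the library’s exact code):

def validation_failed(wb):
    # An error message on the whiteboard means some expert objected.
    return 'err_msg' in wb

def validation_succeeded(wb):
    return 'err_msg' not in wb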


I’m currently building this scheme, including some (useful?) sample “experts”.

I’m also building an Anvil-specific caller that will link any TextBox to such a validator, for plug-and-play operation.

I’ve tried simpler designs, but they all seemed to make the job harder. With this approach, any validation job should be decomposable into a small set of trivially-reused parts, one part per criterion or per transformation.

I’ve also skimmed a number of 3rd-party validation libraries on the Python Package Index. They’re aimed at checking things after you’ve got them all collected into a composite structure.

That’s no help here. For data-entry purposes, the user should have immediate feedback, preferably before they even leave the entry field. The sooner they understand what’s needed, the better. My scheme supports that.

Does it look to you like I’m on the right track? Or at least a useful one?


This looks cool! Do you have a demo app you can show us yet?

(As well as the Publish link, at the bottom of the Publish dialog there is an option to allow someone else to make a copy of your app. If you paste that link into the forum, we can all take copies and look at the source code.)

Work is under way. I’ll keep you posted.

I have a first-draft demo app that folks can inspect:

https://anvil.works/ide#clone:SIPDQKMP5VKPYDZN=ZKQYBKSLJE3KH5K76QOBNXOS

Docstrings are included. Comments are welcome.

The validation part uses a simplified whiteboard (or blackboard) metaphor, outlined here:

In my simplified system, I call upon each “expert” (function) exactly once, in your specified sequence, so everything’s done in a well-understood order.

Some unresolved topics:

  1. Where to put each file’s copyright and MIT license so that code can easily find and display it. I used _license = """ ... """ at the end of each module, but if there’s an established standard (including the module-member name), I’ll use it. Likewise for version history.
  2. Names.
    a. Module/file names: We don’t have subdirectories for related modules (or do we?), so I’ve used a prefix. If there’s a standard naming convention, to keep us from stepping on each other’s file names, that’d help.
    b. Class and function names: Groups sometimes come up with better names than individuals. :slight_smile:

Please feel free to suggest names that seem clearer to you.

In this framework, I conceptually distinguish among several roles a “validation” function may take. They are usually combined and executed in this order:

  1. Sanitizers: These generally filter out “junk” characters (e.g., leading/trailing blanks) that may have been typed or pasted accidentally. They usually don’t report an error.
  2. Validators: These do a syntax-check on the (sanitized) text, and report any errors.
  3. Converters: Not always needed, these functions convert the text to a value of a different data type. Subsequent constraints can then be written in terms of this value. When conversion fails, they must report an error.
  4. Beautifiers: These convert values back into text, in a readable format, for display and further editing. They usually don’t report an error.

Generally, the “upstream” functions make the “downstream” functions much simpler to write and test, and more reliable, too.

Many simple examples are included. The general idea is: use whichever provided functions fit your needs; roll your own (reusable!) functions when they don’t; and easily combine them in series (or parallel!) as needed. The sketches below give one illustrative example of each role.
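These are hedged sketches written against the whiteboard conventions above, not the shipped sample functions; the names are mine:

def strip_blanks(wb):                  # sanitizer
    wb['text'] = wb['text'].strip()

def require_digits(wb):                # validator
    if not wb['text'].isdigit():
        wb['err_msg'] = 'must contain only digits'

def to_int(wb):                        # converter
    try:
        wb['value'] = int(wb['text'])
    except ValueError:
        wb['err_msg'] = 'must be a whole number'

def format_int(wb):                    # beautifier
    if 'value' in wb:
        wb['text'] = '{:,}'.format(wb['value'])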

Looks really great!

One extra feature that I would find useful is the ability to run all the error checks regardless of whether one failed, and return them all to the user. I was recently writing a project where I read in data from an external program. A lot of it had ‘problems’ and I wanted to be able to feed back all of them to the user, not just the first one. For example, if an item code was too long and had invalid characters I’d want to know about both, not just the first failure (for all lines imported, not just the first). It looks like it would be easy to add this functionality, and I may well do it myself when I get to that part, as it looks really useful.

Thanks!

My whiteboard functionality was deliberately kept as Anvil-independent as possible, so that it could be used in other contexts, like yours. (Or on Anvil’s server side, as well as the client/browser side.)

It’s also extremely open-ended. What you’re looking for is not only feasible, but deliberately easy for you to add.

Your own top-level expert would move any error messages it found into a list, kept under its own name in the whiteboard. The simplest way for you to do this would be to clone two whiteboard_core functions that work together:

  • require_all

  • _stop_at_first_validation_failure

under their own, new (descriptive?) names. (report_all() comes to mind.) Then customize your clones to work as desired.

Whatever invokes your expert can then do whatever it pleases with that list.

Procedurally, you may find that some early errors raise questions about later errors. For example, if the text can’t be converted to an integer, does it make any sense to check that the (non-existent) integer is in the range of legal values?

Probably not. In this case, you’ll probably build some mid-level experts, using a mix of report_all() and require_all() to prevent execution of those senseless tests.
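For what it’s worth, here is one possible shape for such a clone, collecting every error under its own key instead of stopping at the first. Purely a sketch; the real require_all / _stop_at_first_validation_failure code in the clone link may differ:

def report_all(*experts):
    """Run every expert and gather all error messages into wb['err_msgs']."""
    def composite(wb):
        messages = []
        for expert in experts:
            wb.pop('err_msg', None)          # clear before each expert runs
            expert(wb)
            if 'err_msg' in wb:
                messages.append(wb.pop('err_msg'))
        if messages:
            wb['err_msgs'] = messages
            wb['err_msg'] = '; '.join(messages)   # keep the usual convention too
    return composite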

Updated link:
https://anvil.works/ide#clone:7SAMSSQEYNCR3LG3=KETE3T5PP67TFJ4GDKW4FG4X

In this case, the concept turns out to be relatively simple. Much simpler than the Wikipedia article suggests. Pretty much what would happen if you were at a real whiteboard, doing a step-by-step solution of the problem; but preserved as code, so that next time, it can run on its own.

  • Each data-entry field is treated independently. One field, one whiteboard.
  • Its whiteboard is in the form of a Python dictionary, so that each entry on the whiteboard is clearly and uniquely named. A few names are reserved, by convention, but you can name everything else as you see fit. (Almost) no restrictions on what you can put on the whiteboard.
  • Each “expert” looks at the whiteboard, to read the item(s) it’s interested in (by name); write new item(s) (by name); and/or judge the result (“aye” or “nay”).
  • Validation functions (“experts”) are called in sequence. Each only has to do a well-defined part of the job, so it’s simple and (potentially) reusable in similar validation/transformation jobs.

A few tricks are needed in a few cases, but that’s the view from 20,000 feet. The rest of it is just fitting it into Python, and into Anvil’s way of storing and triggering things. Both are remarkably well-suited for this sort of thing.
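Putting it together, a caller might wire a field up roughly like this. This is a sketch of the idea rather than the demo-app code, and it assumes the illustrative helpers sketched earlier in this thread (require_all, strip_blanks, require_digits, to_int, validation_failed) plus hypothetical Anvil component names (text_box, label):

check_quantity = require_all(strip_blanks, require_digits, to_int)

wb = {'text': text_box.text}     # e.g., gathered in a TextBox 'lost_focus' event
check_quantity(wb)
if validation_failed(wb):
    label.text = wb['err_msg']   # immediate feedback, before the user moves on
else:
    label.text = ''
    quantity = wb['value']       # the converted value, ready for further checks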

Although I designed this from scratch, Python has been used this way before. And I was introduced to the underlying ideas decades before Wikipedia.

No formal training in AI, but I did subscribe to a few print magazines back in the day :slightly_smiling_face:. Wanted to broaden my methods of thinking, as well as programming. And if that led to some programming shortcuts, so much the better.