The best practice for storing data (datatables, data files, etc, etc.)

emma1 · April 7, 2023, 10:36pm

Hi All - first really appreciate the community and the functionality that lets marginally competent coder like me make something cool! This is probably a CS101 type thing - but I’m a mechanical engineer playing coder, so feel free to talk to me like I’m 6 years old.

This question isn’t specific to a piece of code per se, but the number of options for storing user data on anvil is kind of having me scratch my head on the best choice. I could try these things empirically to see how they perform, but I suspect someone here already has a pretty good clue:

What I’m trying to do:

I’m trying to evaluate different data storage schemes - and thinking about datatables vs. datafiles vs. external databases - AND the data structure within.

I expect that recent data - and data from a year ago for the same date range will be accessed occasionally by users using the client app - but mostly will be used by the server for forecasting, etc.

I’ve read lots of docs and tried examples, but I’m just not 100% clear on the best overall practice in general, and within anvil specifically. What are the deciding factors for putting your data in a datatable vs. a data file vs. an external database?

Once you’ve made that decision is there a data structure that makes the most sense (again within anvil specifically) for better performance, less code, extensibility, whatever based on the options below:

Option A:

for this user:
Option A1: (This one seems wasteful - but trivial to bring into a datagrid, dataframe, etc.)
	All data is in one file, every entry is explicitly labeled - one row per data element (CSV style)
	filename = all my data
	‘year’	‘month’	‘day’	‘hour’	‘data_type’	value
	‘year’	‘month’	‘day’	‘hour’	‘data_type’	value
	‘year’	‘month’	‘day’	‘hour’	‘data_type’	value

Option A2:
	Hourly data is retrieved by position (Dictionary Style)
	filename = all my data
	‘year’	‘month’	‘day’	‘data_type’	value1	value2	valuen

Option A3:
	Every type of data is in its own file (Dictionary with more files to open)
	filename = myspecificdata_type
	‘year’	‘month’	‘day’	‘hour’	value1	value2	valuen

	Something else?


Option B:
for all users:
	All data is in one file, every entry is explicitly labeled - one row per data element (CSV style)
	filename = all my data
	‘year’	‘month’	‘day’	‘hour’	‘user’	‘data_type’	value
	‘year’	‘month’	‘day’	‘hour’	‘user’	‘data_type’	value
	‘year’	‘month’	‘day’	‘hour’	‘user’	‘data_type’	value

etc.

jshaffstall · April 7, 2023, 11:05pm

Welcome to the forum!

Data files are a fairly recent Anvil feature intended primarily to make it easier to upload machine learning models and use them in server code. They’re not intended for storing dynamic data.

So your choice is between Anvil data tables or an external database. You should really stick with Anvil data tables unless you come up with a compelling scenario that points you toward an external database (needing complex queries, for example, is one reason people will head in that direction).

When you’re thinking about how to structure your data, think in terms of having a data table per type of object you’re storing information about. For example, a Customers data table, an Orders data table, etc. Each table will have columns specific to that table.

emma1 · April 8, 2023, 2:12am

Thanks - didn’t really understand the role of data files - that helps clarify things.