Name Generator based on 700K names

Tom_Ber · September 1, 2022, 1:06pm

I am very new to Anvil and created my first app which is a Name Generator.

On a local Python Notebook on my computer I uploaded different csv-files with names. After cleaning the data eventually there is a DataFrame with over 700’000 names. With a function the user gets a selection of names. With Anvil uplink I connected the Python notebook to the anvil app and it works fine as long as my notebook on my computer is running. However, since I want to share this app with others I need it to work permanently.

How could I have my app running all the time so it does not depend on my local Python notebook? Would I need to create a table in Anvil with a paid plan or is there another option?

Thank you for your help!

divyeshlakhotia · September 1, 2022, 1:10pm

Anvil’s free plan offers 50k rows. Besides, you get free hosting too and everything will remain live 24×7.

However, in your case, you’ll have to upgrade your plans or limit your data to 50k

divyeshlakhotia · September 1, 2022, 1:13pm

Apologies. I read the number wrong. Edited the answer.

stefano.menci · September 1, 2022, 2:13pm

This would not work if you need to sort or search the names, but if you need random names an ugly trick could be to create a table with one simple object column, then add 1000 rows, each with a list of 700 names.

ianb · September 1, 2022, 2:57pm

Or even 26 rows, just lists containing all the same 1st letter

p.colbert · September 1, 2022, 7:01pm

If we’re assuming English-only …

ianb · September 1, 2022, 7:21pm

…just to be clear, grouping them by first letter is actually not a good way to get a randomized sample, because it will over-sample the names with the smallest first letter frequency, instead of each name having a 1 in 700000 chance to show up.

@stefano.menci 's answer is the real one if you just wanted a low number of rows and just want random names. But if you are doing this, I would have pandas create the 1000 (non overlapping) random samples to be put into the rows, and not just put them in sorted in some ordinal way.

Tom_Ber · September 2, 2022, 12:40pm

Many thanks for all your answers so far.

I now reduced my record of names to 50.000 and loaded them up into a Anvil table.
In a next step I created a DateFrame in a server module as follows:

df = pd.DataFrame(app_tables.names_table.search())

And as a test I created this function to display a random name:

@anvil.server.callable
def test():
    return df.Name.sample(1).iloc[0]

Then, in the client code I called the function like this:

  def __init__(self, **properties):
    self.placeholder.content = anvil.server.call('test')

This results in the following error code:

anvil.server.TimeoutError: Server code took too long at [Form1, line 30]

Any ideas on how to solve this issue are highly appreciated.

nickantonaccio · September 2, 2022, 1:29pm

I have a little typing tutor app with a list of ~54000 words. I just store and load that list in the code of a server module - it stays in memory, and I access it with server calls. Is there anything keeping you from uploading your file and keeping the dataframe in memory on the server (you have to run pandas on the server anyway)?

My only concern, would be that this could put some undue strain on memory and CPU use in a shared account.

nickantonaccio · September 2, 2022, 1:31pm

To be clear, I load the data from a Python list stored in code in a server module, as opposed to loading from a data table.

ianb · September 2, 2022, 1:34pm

2 things,

Using 100% of the 50 k limit is a really bad idea. You will not be able to use data tables for anything else on the free plan across any other apps that you create on your account in totality which is probably really bad for developing anything, including even having a secure users service. (It uses a data table)

This code is asking that you have the server digest the entirety of a 50,000 row data table and turn it into a data frame (I’m not sure where you are doing this actually, but if it’s in the global scope and not a function that’s also not great).
The way this is coded highlights all of the weaknesses of a data table, and uses none of its strengths.

To use the data table strengths you should be using the python random module to pick an index number from 0 to the length of your search iterator, then pull a sample name from that iterators index.

Like:

from random import randint

#  Yes, You can refactor this for brevity, I wrote it with clarity in mind.
def get_random_name_from_rows():
  rows = app_tables.names_table.search()
  
  random_index_int = randint(0, len(rows) - 1)
  
  return rows[random_index_int]['name_of_name_column']

nickantonaccio · September 2, 2022, 1:36pm

To handle the timeout:

Basically, you can decorate your test() function with:

@anvil.server.background_task
def test():
    return df.Name.sample(1).iloc[0]

And then create a new function to launch it:

@anvil.server.callable
def launch_slow_request_task():
  task = anvil.server.launch_background_task('test')
  return task

nickantonaccio · September 2, 2022, 1:38pm

You shouldn’t get a timeout, however, if you’re just loading a list of 50,000 words.

ianb · September 2, 2022, 1:45pm

@nickantonaccio 's method is probably the best way to go if you are always going to use these names in your app.
If you don’t like the clutter of a 50k word list in your server module code, create a second server module containing just this list of names and import it into your main server module code like explained here:

The loading and unloading through the iteration of 50k data table rows every time the function is called is most likely what is timing out, for such a (relatively) small amount of memory to load.

stefano.menci · September 2, 2022, 2:10pm

If all you need is one random name, then you could add one name_id column with a sequential number, then pick a random row, so you only read one row:

name = app_tables.names_table.get(name_id=random.randint(0, 50000))

Or, if you have one table with 1000 rows, each with 700(-ish) names:

name_list = app_tables.names_table.get(name_id=random.randint(0, 1000))
name = name_list[random.randint(0, len(name_list)]

EDIT
I just realized that his technique is not really random if the number of names on each row is different.

For example, if you have one row with 700 names and one row with 350 names, each name on the former row has half the chances of being picked than each of the names on the second row.

The solution is simple: just make sure each row has the same number of names (350 vs 700 generates a noticeable bias, but 699 vs 700 is virtually the same number).

Tom_Ber · September 5, 2022, 2:44pm

I now uploaded my names in a Anvil Data Table as follows:

Now there are only 179 rows. As you can see there are lists of names in the column Name.

My goal is now to build an app that returns random names based on the parameters gender, length, amount of names and the starting letter. These parameters sould be chosen by the user with a drop down:

How should I write my code in the Server Module to access the Data Table so that I achieve this most efficiently? I tried creating a Pandas DataFrame and return a sample of it but this attempt failed.

@anvil.server.callable
def name_function(letter, length, gender, amount):
    selection = df[(df['Anfangsbuchstabe'] == letter) &
                   (df.Laenge == length) &
                   (df.Gender == gender)]['Name'].values[0]
    sample = random.sample(selection, amount)
    return sample

divyeshlakhotia · September 5, 2022, 4:23pm

As was mentioned previously, using pandas might not be a good suggestion.

You should consider modifying your code to make it compatible with Anvil’s Data Tables.

But if you want to stick to pandas (not something I will personally recommend), please provide some more details

Tom_Ber · September 5, 2022, 7:37pm

I am very open to use another way than pandas that is compatible with Anvil’s Data Tables.

What would be your concrete suggested solution?

stefano.menci · September 5, 2022, 7:47pm

Something like this should work:

anvil.server.callable
def name_function(letter, length, gender, amount):
    row = app_tables.names.get(
        Gender=gender,
        Anfangsbuchstabe=letter,
        Laenge=length)
    name_list = row['Name']
    return random.sample(name_list, amount)

Tom_Ber · September 5, 2022, 8:12pm

Thanks.

When I use this function in the server code and the following function in the client code I get the following error stated below.

   def button_1_click(self, **event_args):
    
    result = anvil.server.call('name_function',
                               letter = self.letter.selected_value,
                               length = self.length.selected_value,
                               gender = self.gender.selected_value,
                               amount = int(self.amount.selected_value)               
                              )
    self.results.text =  result

error:

TypeError: ‘NoneType’ object is not subscriptable at [ServerModule1, line 48](javascript:void(0)) called from [Form1, line 17](javascript:void(0))