Python trie implementation - efficiently search trie based on prefixes

Hi,

I have a python question:

I found different python codes on the internet that create and search a trie at the same time (the trie created at run time), but, none of these codes search an existing trie and produce a list of possible matches, I tried to create a code based on those trie codes, so I can search an existing trie, but for some reason, it is not working for me.

Here is the code

class TrieNode:
    def __init__(self):
       # self.children is the existing trie
        self.children = {'a': {'*': '*', 'p': {'p': {'l': {'e': {'*': '*'}}}}, 'n': {'g': {'l': {'e': {'*': '*'}}, 'e': {'l': {'*': '*'}}}}}, 'b': {'a': {'t': {'*': '*', 's': {'*': '*'}}}}}

    def all_words(self, prefix):
        if self.children[w] == "*":
            yield prefix
        print (self.children.items())
        for letter, child in self.children.items():
            yield from child.all_words(prefix + letter)

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def all_words_beginning_with_prefix(self, prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            if cur is None:
                return  # No words with given prefix
        print (cur)    
        yield from cur.all_words(prefix)


trie = Trie()
print(list(trie.all_words_beginning_with_prefix('ap')))

Please review the code and advise on how I can create a method that searches an existing trie.
Thank you for your help :slightly_smiling_face:

Hello,

Have you tried this library?

Here is some code that I found online to create and search a trie:

class Trie:
    head = {}

    def add(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
            cur = cur[ch]
        cur['*'] = True

    def search(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                return False
            cur = cur[ch]

        if '*' in cur:
            return True
        else:
            return False

Let’s add a few nodes:

t=Trie()
t.add('hi')
t.add('hello')

Now, let’s search the trie and see the result:

print(t.search('hi'))
print(t.search('hello'))
print(t.search('hel'))

#returns
True
True
False

Is this what you are looking for? If not, please provide an example trie and the expected result.

Hi @alcampopiano,

Thank you for replying. This library doesn’t do what I am looking for. I am trying to make a fast autocomplete feature using trie, the problem is when the trie created at the same time of the search, it takes a longer time to load/present results, like this code:

https://BPL4V5G2KQRNJYMT.anvil.app/DQI5IK3YPTP5FT2ISYV5ZAZI

But if I have the trie ready from the beginning, the search procedure will be very fast, and you can see that when you click the button.

Thank you for looking into this.

That app has a button that doesn’t do anything when I click it, regardless of what I type into the search box (are there specific terms I am supposed to type?).

Can you share the actual clone so I can see the code (if possible)?

sorry I copied the wrong link :blush: , here is the link:
https://anvil.works/build#clone:BPL4V5G2KQRNJYMT=BIRTABIFEXS6EWQBYJNE43CE

it should print the results on the output when you click the button.

Thanks

I decided to come at this from a different angle.

If you are looking for an autocomplete feature based on a known list of strings, then could you take a simpler approach?

For example:

prefix='ri'

words=['peter', 'petrol', 'peewie', 'purdie', 
       'paul', 'mary', 'ringo', 'rant', 
       'ringer', 'pingo', 'dingo', 'go']

results=[x for x in words if x.startswith(prefix)]

#returns
['ringo', 'ringer']

Now we can extend that basic example into an Anvil app:

clone:
https://anvil.works/build#clone:LNKHIF7M7K4R4FH7=Y6RLNKHP5CWIXX77FOIWRNZ5

I hope this helps.

Thank for the suggestion, the problem is that I am dealing with a huge list of names (as you can see on the app, which is just one of many lists), and unfortunately the direct approach takes a very long time for search (I even tried generators, but it is still slow), this is why I am trying to use python trie, which is faster if I can find away to search an existing trie without the need for making it at the run time.

I used your code with generator:
https://anvil.works/build#clone:AFFJQ2QTY7IXDMI4=6ONP2RQV3ASFF7AVXBVYRK36

As you can see it is slow.

But with the trie (assuming that I have the trie ready from the beginning) when you click the button it doesn’t take a second.
https://anvil.works/build#clone:BPL4V5G2KQRNJYMT=BIRTABIFEXS6EWQBYJNE43CE

Hi there,

I don’t have time to dig in too deeply at the moment, but I did find a library that works on your list instantly (at least on my computer).

That is, it returns all valid strings with a given prefix:

# build the trie
t=pytrie.StringTrie()
for w in words:
    t[w]=w


# search the trie (returning list of valid words given a prefix)
prefix='WEL'
t.values(prefix)

# returns
['WELCHOL', 'WELLBUTRIN', 'WELLBUTRIN SR', 'WELLBUTRIN XL', 'WELLCOVORIN']

I know you’ll need this in Anvil, but I’m sure it is possible to find the right trie algorithm with enough internet searching. If I have time later, I’ll see what else I can find.

@adhamfaisal

Okay, I found a nice implementation of trie that works directly in an Anvil app.

As you can see here, on your very large list, the autocompletion works instantly, returning a list of a prefix matches in the trie.

I have your words stored in a module, but this causes the app to load very slowly. You will want to put the words somewhere else (perhaps in a DataTable with a “please wait” Notification for your users during the initial load).

clone:
https://anvil.works/build#clone:A7T6NRQIS5RDKXGK=3T6LAP6WSNC7CLKBSBL5AK4O

Does this work for you?

1 Like

Hi @alcampopiano,
Thank you for the app. I think it loads very slowly because of the “trie creation”, as you can see on the app I shared before
https://anvil.works/build#clone:BPL4V5G2KQRNJYMT=BIRTABIFEXS6EWQBYJNE43CE

As you can see it is very slow when starts and then when you click the button it instantly prints a list of possible matches (like your app), because the trie is already made.

I have been trying to figure it out for about a week with no luck (I am still a beginner :slightly_frowning_face:), I found this code (which creates a trie, and only check if the full word present on the trie or not):

_end = '*'

# takes in a list of words
def make_trie(*words):
    # creates our root dict()
    trie = dict()

    for word in words:
        # create a temporary dict based off our root
        # dict object
        temp_dict = trie

        for letter in word:
            # update our temporary dict and add our current
            # letter and a sub-dictionary
            temp_dict = temp_dict.setdefault(letter, {})

        # If our word is finished, add {'*': '*'}
        # this tells us our word is finished
        temp_dict[_end] = _end
    return trie
def find_word(trie, word):
    sub_trie = trie
    #print (sub_trie)
    mylist = []
    for letter in word:
        if letter in sub_trie:
            #print (sub_trie[letter])
            sub_trie = sub_trie[letter]
            print ("******************************")
            print (sub_trie)
            
            mylist.append(letter)
            print (mylist)
            
        
        else:
            return False
    
    else:
        if _end in sub_trie:
            return True
        else:
            return False


mytrie = make_trie("hi",'hello', "home", "I am", "her")
print (mytrie)

You can test it here: https://repl.it/@adhamfaisal/LimeElaborateSupport

and another code which can make a trie and does the search at the run time, I tried to modify it to search an existing trie without any luck :pensive:

class TrieNode:
    def __init__(self):
        self.end = False
        self.children = {}

    def all_words(self, prefix):
        if self.end:
            yield prefix
        #print (self.children.items())
        for letter, child in self.children.items():
            yield from child.all_words(prefix + letter)

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        curr = self.root
        for letter in word:
            node = curr.children.get(letter)
          
            if not node:
                node = TrieNode()
                
                curr.children[letter] = node
            curr = node
        curr.end = True
        #print (curr.children)

    def search(self, word):
        curr = self.root
        for letter in word:
            node = curr.children.get(letter)
            if not node:
                return False
            curr = node
        return curr.end

    def all_words_beginning_with_prefix(self, prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            #print (cur.children)
            if cur is None:
                return  # No words with given prefix
        #print (cur)    
        yield from cur.all_words(prefix)

trie = Trie()

trie.insert('foobar')
trie.insert('foo')
trie.insert('bar')
trie.insert('toob')
trie.insert('foof')
#print (trie)
#print (trie.root.children)

print(list(trie.all_words_beginning_with_prefix('foo')))
print (trie.search("bar"))

You can run it in this link: https://repl.it/@adhamfaisal/ClearLimegreenCharmap

I am sorry for the headache

Just to clarify, the initial issues were related to search speed and returning a list of results:

and

I believe we have addressed these issues.


What is the remaining issue?

If the initial load time is the remaining issue, I don’t have a solutions for that. In fact, it might even be a trade-off that has to be accepted. That is, the trie is great for searching, but takes a while to create initially. I’m not aware of any way to precompute the trie object in Anvil prior to loading the app. Others may know of a way.

As far as I can tell, the only remaining issue is the initial load time of the app due to the creation of the object. Please correct me if I am wrong and give a clear description of what you would like to achieve.


A couple other friendly notes about asking questions:

and

If possible, try to be very specific about what is not working and that may help others to identify the problem. For example,

  • are you getting an error (what is the error)
  • is the initial load time slow
  • are the returned search values not what you expect
  • is searching too slow
  • etc…

Also, when writing python code in the forum, you can get nice syntax highlighting (which makes it easier for people to read), by wrapping your code in triple backticks with the word “python”. For example,

```python

print('this will be syntax highlighted')

```
1 Like

Thank you @alcampopiano for looking into this and thank you for the tips. You are right, my problem is the slow loading time due to trie creation, what I am trying to achieve is creating a method that can search an existing trie instead of creating a new one every time the app starts, for example


trie = {'a': {'*': '*', 'p': {'p': {'l': {'e': {'*': '*'}}}}, 'n': {'g': {'l': {'e': {'*': '*'}}, 'e': {'l': {'*': '*'}}}}}, 'b': {'a': {'t': {'*': '*', 's': {'*': '*'}}}}}

If I can bypass the trie creation, I think I can avoid the slow loading problem (especially that I am using a fixed trie). I tried to modify one of the codes that do both (create a trie and search it) as in this example

class TrieNode:
    def __init__(self):
       # self.children is the existing trie
        self.children = {'a': {'*': '*', 'p': {'p': {'l': {'e': {'*': '*'}}}}, 'n': {'g': {'l': {'e': {'*': '*'}}, 'e': {'l': {'*': '*'}}}}}, 'b': {'a': {'t': {'*': '*', 's': {'*': '*'}}}}}

    def all_words(self, prefix):
        if self.children[w] == "*":
            yield prefix
        print (self.children.items())
        for letter, child in self.children.items():
            yield from child.all_words(prefix + letter)

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def all_words_beginning_with_prefix(self, prefix):
        cur = self.root
        for c in prefix:
            cur = cur.children.get(c)
            if cur is None:
                return  # No words with given prefix
        print (cur)    
        yield from cur.all_words(prefix)


trie = Trie()
print(list(trie.all_words_beginning_with_prefix('ap')))

Which gave me this error:

Python 3.7.4 (default, Jul  9 2019, 00:06:43)
[GCC 6.3.0 20170516] on linux
Traceback (most recent call last):
  File "main.py", line 28, in <module>
    print(list(trie.all_words_beginning_with_prefix('ap')))
  File "main.py", line 20, in all_words_beginning_with_prefix
    cur = cur.children.get(c)
AttributeError: 'dict' object has no attribute 'children'

here is the link: https://repl.it/@adhamfaisal/BaggyTransparentRuby

Sorry again for the headache

@adhamfaisal

Okay, I think I have it here.

This clone does the following:

  • allows you to build a trie from a list of strings (see the build_trie function)
  • saves the trie in a DataTable as JSON (and returns it as a dictionary)
  • autocomplete search terms using the precomputed trie (see Module1)

I did not use the code above exactly, but mine is similar (e.g., there are no classes).

You will have to find a way of getting your large list of words into the build_trie function. However, that should be easy, as you can just run it on your computer and paste it into the DataTable if you want.

Once you have your trie saved in the DataTable, you never have to compute it again.

Here you can see a precomputed trie being searched and a new one being built.

clone:
https://anvil.works/build#clone:A7T6NRQIS5RDKXGK=3T6LAP6WSNC7CLKBSBL5AK4O

Let me know if this helps.

3 Likes

You are awesome :star_struck: :star_struck: :star_struck: :star_struck: :star_struck:
Thank you so much, this was a headache for me, I really appreciate it.
Sorry for the late reply I was traveling.
Thank you again.

1 Like