Import tiktoken

What I’m trying to do:

I’m trying to import tiktoken on the client side, but I get an error:

ModuleNotFoundError: No module named 'tiktoken'

Is this package only available on the server side? If yes, why?

What I’ve tried and what’s not working:

I made sure to add tiktoken 0.7.0 to the Python packages, and the app is running on Python 3.10 (Beta).

Code Sample:

import tiktoken

Client-side Python is transpiled to JavaScript so that it can run in the browser. Not every Python 3.7 feature can be supported this way, and the underpinnings are different, too.

Most third-party Python packages are instead built to sit on a conventional foundation (a standard Python interpreter, often with compiled extensions), and so don’t work in the browser.

That said, it may be possible to repackage some source-only packages as Anvil Dependencies, for use in Anvil apps.

Thanks for the quick response!

tiktoken also has a JavaScript library, which can be used in client-side Anvil code through the JavaScript bridge (see Anvil Docs | Accessing JavaScript).

That’s super helpful @jshaffstall, thanks! I found a native js version and added it to the Native Libraries:

<script src="https://cdn.jsdelivr.net/npm/js-tiktoken@1.0.14/dist/lite.min.js"></script>

I’m struggling to access it from the client side, though. I tried:

from anvil.js.window import tiktoken

as in the Anvil docs, but got:

AttributeError: 'Window' object has no attribute 'tiktoken'

I then tried adding an event listener in the Native Libraries section, but that didn’t help:

<script>
  // Wait for the js-tiktoken script to fully load
  window.addEventListener('load', function() {
    if (window.tiktoken) {
      // Expose the methods globally
      window.tiktoken = {
        get_encoding: tiktoken.get_encoding,
        encoding_for_model: tiktoken.encoding_for_model
      };
      console.log("tiktoken loaded and methods exposed");
    } else {
      console.error("tiktoken library not loaded");
    }
  });
</script>

My apologies if this is a total noob question!

Definitely not a noob question. JavaScript has a few different ways to use libraries, and which one applies largely depends on who wrote the library and the context in which they expect it to be used. The way you’re trying to use it works for libraries that are intended to be loaded directly in the browser with a script tag.

The docs show code like import { get_encoding, encoding_for_model } from "tiktoken"; which indicates that it’s a module-based library intended for use in things like React, so you need a different technique.

Totally untested, but instead of using the native libraries to include the library you could do something like:

self.tiktoken = anvil.js.import_from("https://cdn.jsdelivr.net/npm/js-tiktoken@1.0.14/dist/lite.min.js")

From there you should be able to access the exports from self.tiktoken, e.g. self.tiktoken.get_encoding. That’s at least the approach I would start with for a module-based library. It’ll likely take some inspection of what self.tiktoken provides to get it working.

Thanks for the follow-up. I gave this approach a try, but could not resolve a problem with anvil.js.import_from also wanting to import base64-js:

self.tiktoken = anvil.js.import_from("https://cdn.jsdelivr.net/npm/js-tiktoken@1.0.14/dist/lite.min.js")

ExternalError: TypeError: Failed to resolve module specifier "base64-js". Relative references must start with either "/", "./", or "../".

I then tried an altogether different approach suggested by GPT-o1-preview: bundling js-tiktoken and its dependencies into a single JavaScript file that I added as an asset. That was another rabbit hole that resulted in a heavy file (5.4 MB) that was still not available via from anvil.js.window import tiktoken.

At this point I think I will concede and rely on my prior simple word counter, which misses some edge cases where token counts are way higher than word counts.

If you’re making a server call as part of your flow anyway, you could install the Python version of the library in the server and use it there, too.

That seems like the path of least resistance indeed. I am trying to avoid server calls like the plague, though: word counts have to be done quickly to maintain a reasonable UX.

I really appreciate your and @p.colbert’s suggestions!

I had some time to play with it tonight, and here’s how to access the tiktoken Javascript library from the client:

import anvil.js  # at the top of the form module

# Inside a form method such as __init__:
self.tiktoken = anvil.js.import_from("https://esm.sh/js-tiktoken@1.0.14")
self.encoding = self.tiktoken.getEncoding("gpt2")
print(self.encoding.encode("Hello World"))

esm.sh is a site that’s designed to package these module-based libraries for use like this (thanks to Stu Cork for his excellent post about libraries like this: Importing javascript plugin libraries - #9).

Once you have the import done, you can use print(dir(self.tiktoken)) to see what’s available on the module. getEncoding was one of the names that dir showed, and it matched the JavaScript sample code, so it looks like the import worked. And the encoding prints out something.
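
If the dir output is noisy, a small filter can help. This is only a sketch: public_names is a hypothetical helper of my own, and FakeModule is a stand-in so the snippet runs outside Anvil. In the app you would pass self.tiktoken instead.

```python
def public_names(module):
    """Return the sorted attribute names that do not start with an underscore."""
    return sorted(name for name in dir(module) if not name.startswith("_"))


class FakeModule:
    """Stand-in for the imported JS module (illustration only)."""
    getEncoding = None   # name taken from the thread
    _internal = None     # hypothetical private attribute, filtered out


print(public_names(FakeModule))
```

Anything listed that also appears in the JavaScript sample code is a good candidate to try first.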

Presumably you have a better idea than I do about what to do with it from there.

That’s some jedi s*** right there, @jshaffstall, thanks! All I need from there is to run a count of tokens, as in:

token_count = len(self.encoding.encode(text))
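
For anyone landing here later, a rough sketch of wrapping that one-liner so it still returns something if the import has not happened yet. The name count_tokens and the word-count fallback are my own illustration, not part of js-tiktoken or Anvil. It assumes only that the encoding object has an encode method returning a sequence, as self.encoding does above.

```python
def count_tokens(text, encoding=None):
    """Return a token count for text.

    encoding is any object with an encode(text) method that returns a
    sequence of tokens (in the app this would be self.encoding).
    If no encoding is given, fall back to a rough whitespace word count.
    """
    if not text:
        return 0
    if encoding is None:
        # Fallback estimate: word counts can be well below real token counts.
        return len(text.split())
    return len(encoding.encode(text))
```

In the form this would be called as count_tokens(text, self.encoding).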

Thanks again!!

Glad it’s working!

All the Jedi credit goes to Stu. He’s a wizard with Javascript integrations, I just bookmark his posts about it for future reference.
