Anvil Server GPU Passthrough?

I’m trying to implement Anvil Server on a local Ubuntu server. This is a machine learning app that makes use of the GPU. I believe I have resolved all dependencies, but when I try to run the app, Anvil says it cannot detect the GPU. This is a VM running GPU passthrough, and outside of Anvil the GPU is detected correctly via the nvidia-smi command.

  • Is there a way to tell Anvil to pass in the GPU when running it?
  • Are there any TensorFlow/Anvil containers that would let me run this as well, so I don’t need to worry about so many dependencies in the future?
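For context, a minimal check of whether TensorFlow itself can see the GPU in a given Python environment (standard TensorFlow API, nothing Anvil-specific):

```python
# Minimal diagnostic: run this in the same Python environment the Anvil
# server uses. If TensorFlow cannot see the GPU here, the problem is the
# environment, not Anvil.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    print("GPUs visible to TensorFlow:", gpus)
except ImportError:
    gpus = None
    print("TensorFlow is not installed in this environment")
```

If that prints an empty list inside the Anvil server's environment but a device when run standalone, the two are using different Python environments or library paths.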

What kind of graphics card are you running? Many gaming cards will only work with drivers that disable GPU passthrough if they detect a VM attempting to render graphics.

They don’t want you running a remotely accessible machine where your friends all share a single gaming card, each using their own VM.
There are now ways and drivers that get around this, and they are entirely legal, but it may require a whole lot of troubleshooting different drivers and settings to get it working.

This guy got it to work a few different ways:
Youtube: CraftComputing GPU Passthrough

Oh, also welcome to anvil! :wave:

Hello, and thanks for the reply. The card is an NVIDIA RTX8000P-16Q; our GPUs are set up for this type of environment, so we should be all set there.

The latest error I get is:

[INFO  anvil.app-server.run] [LOG :new-session] {:type browser}
Calling function 'predict_iris' for app '*****' (ID server-mZi3NSN+N+ZIfw==)
2022-02-03 15:46:50.556611: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-03 15:46:50.556648: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-03 15:46:51.834219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-03 15:46:51.834487: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834538: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834580: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834622: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834705: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834747: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834788: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834800: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[ERROR anvil.app-server.run] Error report from client code:
AnvilWrappedError: GPU device not found

I had to remove the nvidia-cuda-toolkit I had installed, as it was causing a driver conflict. Clearly some CUDA libraries are missing from the environment the Anvil server runs in. I’m thinking I need to run inside a container with the correct libraries and pass the GPU through to Docker; I’m just not sure of the best way to proceed with that idea.
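The `dso_loader` warnings in the log are plain dynamic-linker failures: TensorFlow tries to `dlopen` each CUDA runtime library and can't find it on the library path. A hedged sketch that reproduces the same check directly (library names taken from the log above), useful for confirming whether a given environment or container can actually see them:

```python
import ctypes

# Reproduce TensorFlow's dlopen checks for the CUDA runtime libraries named
# in the log above. Run this in the same environment as the Anvil server.
cuda_libs = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]
missing = []
for name in cuda_libs:
    try:
        ctypes.CDLL(name)  # same mechanism TensorFlow uses to load the library
    except OSError:
        missing.append(name)
print("missing:", missing)
```

If the libraries are installed but still listed as missing, it's usually an `LD_LIBRARY_PATH`/`ldconfig` issue rather than a missing package. The official `tensorflow/tensorflow:*-gpu` Docker images bundle these CUDA runtime libraries, and with the NVIDIA Container Toolkit installed the GPU can be passed to a container via `docker run --gpus all …`, which matches the container idea above.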

:thinking: Interesting stuff. I have never tried to mix the Anvil server and TensorFlow in the same place, and I’m not the right person to ask about running the Anvil server standalone, so I will wait for someone else to step in. However…

Is there a reason that Anvil and TensorFlow have to run in the same place? Have you looked into Anvil Uplink?
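With Uplink, the TensorFlow code runs as an ordinary Python script in whatever environment already works with the GPU, and the Anvil app calls into it over the wire. A minimal sketch (the key string and the function body are placeholders; the `try`/`except` guard is only so the sketch runs where `anvil-uplink` isn’t installed):

```python
# Hedged sketch: serve predict_iris from the GPU machine via Anvil Uplink,
# instead of running TensorFlow inside the Anvil App Server itself.
try:
    import anvil.server

    @anvil.server.callable
    def predict_iris(measurements):
        # ...load the model and run inference here, on the GPU machine...
        return "Iris-setosa"  # placeholder result

    # anvil.server.connect("YOUR-UPLINK-KEY")  # key from the app's Uplink settings
    # anvil.server.wait_forever()              # keep serving calls from the app
    uplink_available = True
except ImportError:
    uplink_available = False  # the anvil-uplink package is not installed here
```

Server modules in the app would then call it with `anvil.server.call('predict_iris', ...)`, so the App Server itself never needs CUDA at all.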