Anvil Server GPU Passthrough?

I’m trying to implement Anvil Server on a local Ubuntu server. This is a machine learning app that makes use of the GPU. I believe I have resolved all dependencies, but when I try to run the app, Anvil says it cannot detect the GPU. This is a VM running GPU passthrough, and outside of Anvil the GPU is detected correctly via the nvidia-smi command.

  • Is there a way to tell Anvil to pass in the GPU when running it?
  • Are there any TensorFlow/Anvil containers that would let me run this as well, so I don’t need to worry about so many dependencies in the future?
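For context, a minimal check of whether TensorFlow itself can see the GPU in a given Python environment (standard TensorFlow API, nothing Anvil-specific):

```python
# Minimal diagnostic: run this in the same Python environment the Anvil
# server uses. If TensorFlow cannot see the GPU here, the problem is the
# environment, not Anvil.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    print("GPUs visible to TensorFlow:", gpus)
except ImportError:
    gpus = None
    print("TensorFlow is not installed in this environment")
```

If that prints an empty list inside the Anvil server's environment but a device when run standalone, the two are using different Python environments or library paths.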

What kind of graphics card are you running? Many gaming cards will only work with drivers that disable GPU passthrough if they detect a VM attempting to render graphics.

They don’t want you running a remotely accessible machine where your friends all share a single gaming card, each using their own VM.
There are now ways and drivers that get around this, and they are entirely legal, but it may require a whole lot of troubleshooting different drivers and settings to get it working.

This guy got it to work a few different ways:
Youtube: CraftComputing GPU Passthrough

Oh, also welcome to anvil! :wave:

Hello, and thanks for the reply. The card is an NVIDIA RTX8000P-16Q; our GPUs are set up for this type of environment, so we should be all set there.

The latest error I get is:

[INFO  anvil.app-server.run] [LOG :new-session] {:type browser}
Calling function 'predict_iris' for app '*****' (ID server-mZi3NSN+N+ZIfw==)
2022-02-03 15:46:50.556611: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-03 15:46:50.556648: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-02-03 15:46:51.834219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-03 15:46:51.834487: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834538: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834580: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834622: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834664: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834705: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834747: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834788: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-02-03 15:46:51.834800: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[ERROR anvil.app-server.run] Error report from client code:
AnvilWrappedError: GPU device not found

I had to remove the nvidia-cuda-toolkit I had installed, as it was causing a driver conflict. Clearly some CUDA libraries are missing from the environment the Anvil server runs in. I’m thinking I need to run inside a container with the correct libraries and pass the GPU through to Docker; I’m just not sure of the best way to proceed with that idea.
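The `dso_loader` warnings in the log are plain dynamic-linker failures: TensorFlow tries to `dlopen` each CUDA runtime library and can't find it on the library path. A hedged sketch that reproduces the same check directly (library names taken from the log above), useful for confirming whether a given environment or container can actually see them:

```python
import ctypes

# Reproduce TensorFlow's dlopen checks for the CUDA runtime libraries named
# in the log above. Run this in the same environment as the Anvil server.
cuda_libs = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]
missing = []
for name in cuda_libs:
    try:
        ctypes.CDLL(name)  # same mechanism TensorFlow uses to load the library
    except OSError:
        missing.append(name)
print("missing:", missing)
```

If the libraries are installed but still listed as missing, it's usually an `LD_LIBRARY_PATH`/`ldconfig` issue rather than a missing package. The official `tensorflow/tensorflow:*-gpu` Docker images bundle these CUDA runtime libraries, and with the NVIDIA Container Toolkit installed the GPU can be passed to a container via `docker run --gpus all …`, which matches the container idea above.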

:thinking: Interesting stuff. I have never tried to mix the Anvil server and TensorFlow in the same place, and I’m not the right person to ask about running the Anvil server standalone, so I will wait for someone else to step in. However…

Is there a reason that Anvil and TensorFlow have to run in the same place? Have you looked into Anvil Uplink?
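With Uplink, the TensorFlow code runs as an ordinary Python script in whatever environment already works with the GPU, and the Anvil app calls into it over the wire. A minimal sketch (the key string and the function body are placeholders; the `try`/`except` guard is only so the sketch runs where `anvil-uplink` isn’t installed):

```python
# Hedged sketch: serve predict_iris from the GPU machine via Anvil Uplink,
# instead of running TensorFlow inside the Anvil App Server itself.
try:
    import anvil.server

    @anvil.server.callable
    def predict_iris(measurements):
        # ...load the model and run inference here, on the GPU machine...
        return "Iris-setosa"  # placeholder result

    # anvil.server.connect("YOUR-UPLINK-KEY")  # key from the app's Uplink settings
    # anvil.server.wait_forever()              # keep serving calls from the app
    uplink_available = True
except ImportError:
    uplink_available = False  # the anvil-uplink package is not installed here
```

Server modules in the app would then call it with `anvil.server.call('predict_iris', ...)`, so the App Server itself never needs CUDA at all.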