
TensorFlow, Ubuntu, and NVidia Drivers: Getting It All Working.

What. A. Mess.

I recently wanted to repurpose an old gaming machine by strapping an NVidia GeForce RTX 3050 to its backside, giving it some racing stripes (obviously to make it go faster), and making it my machine learning model-training workhorse: a noble and achievable goal… oh, such was my naiveté.

Now, yes, an RTX 3050 isn’t exactly the AI/ML card of choice, but it’s not *nothing* either. After going through the process below, training my basic MNIST image classification model went from 20-30 minutes to about 10-30 seconds: a pretty decent speed bump for fun models. For anything more serious, if you are lucky enough to have access to a university, there are beast machines available, but you have to request permission and get time slots allotted before you can build your next ChatGPT competitor.

The problem I immediately ran into was that my OS of choice, Ubuntu, was a little behind when it came to drivers. I spent a week or two trying to match NVidia drivers with Ubuntu OS and kernel versions, then matching those up with CUDA and cuDNN versions. Not much fun, so I decided to save any future enthusiasts the trouble.

Ultimate Requirements

  • Python 3.12
    • TensorFlow 2 doesn’t seem to work with Python 3.13+. Even 3.12 is a stretch: the official TensorFlow docs state that Python 3.8–3.11 is required.
    • This forces us to use Ubuntu 24.04, as Ubuntu 25.04 comes with Python 3.13 and does not include 3.12 in its sources list. Even the deadsnakes PPA didn’t work for me; it was easier to simply downgrade the OS to 24.04 LTS.
  • Ubuntu 24.04 Server (LTS)
  • The Proper NVidia Drivers
    • I had trouble with the NVidia drivers. The only one that worked for me and my GeForce RTX 3050 was “NVIDIA-Linux-x86_64-570.169”. This is NOT what gets recommended if you run “ubuntu-drivers devices”, nor what “sudo ubuntu-drivers autoinstall” installs, which made for a very frustrating ordeal.
    • Go through NVidia’s driver selection tool to find the drivers for your particular setup.
  • The Proper cuDNN Library
    • This was a simple enough install once I figured out which version was needed for my CUDA version. You can find your CUDA version by running ‘nvidia-smi’ after installing the proper NVidia drivers for your card.
    • In my case, I needed “cudnn-local-repo-ubuntu2404-9.10.1_1.0-1_amd64.deb”
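Since the Python version is what ends up dictating the Ubuntu version, it can save time to check the interpreter before building anything. Below is a minimal sketch that encodes the range described above, assuming the 3.8 lower bound from the TensorFlow docs as quoted and the 3.12 upper bound from my own testing. The function name and both cutoffs are assumptions to adjust, not anything TensorFlow itself provides.

```python
import sys

# Assumed bounds (from the requirements above): the TensorFlow docs quoted
# there start at 3.8, and 3.12 was the newest version that worked for me.
OLDEST_SUPPORTED = (3, 8)
NEWEST_SUPPORTED = (3, 12)


def python_ok_for_tensorflow(version=None, oldest=OLDEST_SUPPORTED, newest=NEWEST_SUPPORTED):
    """Return True if the (major, minor) part of `version` is within [oldest, newest].

    `version` defaults to the running interpreter's version; any extra
    components (micro, releaselevel, ...) are ignored.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return oldest <= major_minor <= newest


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    status = "should be OK" if python_ok_for_tensorflow(current) else "is outside the assumed range"
    print(f"Python {current[0]}.{current[1]} {status} for TensorFlow 2")
```

Tuple comparison keeps the check to one line, and passing `newest=(3, 13)` is all it takes if a future TensorFlow release widens support.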

The Commands

Straight to brass tacks: the bash commands below worked for the following setup and might work (perhaps with minor modifications) for others:

# OS:     Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-63-generic x86_64) (Server)
# CARD:   GeForce RTX 3050
# NVIDIA DRIVER:  NVIDIA-Linux-x86_64-570.169
# CUDNN:          cudnn-local-repo-ubuntu2404-9.10.1_1.0-1_amd64.deb

# TESTED and works for RTX 3050 on Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-63-generic x86_64) (Server)
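# For some reason the driver may disappear after an update,
# so check that your driver still works (e.g. with nvidia-smi) after any "sudo apt update; sudo apt upgrade"
# (A kernel upgrade is the likely culprit: a driver installed from the .run file is built for the running kernel)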

# Assumption: RTX 3050 Installed on Ubuntu 24.04 LTS

# STEP 1: Initial cleanup of any old drivers
# Get rid of ALL nvidia drivers and cuda content before trying to install new drivers/libraries
# NOTE: REVIEW this if you are not on Ubuntu 24.04
sudo apt remove --purge "*cublas*" "cuda*" "nsight*" -y
sudo apt remove --purge '^nvidia-.*' -y
sudo apt-get remove --purge 'nvidia*' -y
sudo apt remove --purge '^libnvidia-.*' -y
sudo apt remove --purge '^linux-objects-nvidia-.*' -y
sudo apt remove --purge '^linux-signatures-nvidia-.*' -y
sudo apt autoremove -y
sudo apt autoclean
sudo apt-get remove --purge '^nvidia-.*'
sudo rm -f /etc/X11/xorg.conf
# Only exists if a driver was previously installed from NVidia's .run file
sudo nvidia-uninstall
sudo reboot

# STEP 2: NVIDIA DRIVERS
# Download the NVidia driver
# If you have a GeForce RTX 3050, you can run this command to grab it:
# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/570.169/NVIDIA-Linux-x86_64-570.169.run
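# Install the NVidia driver -> when asked, pick the open-source kernel module (MIT/GPL) rather than the proprietary one -> agree (a few times)
# Running it through sh means the downloaded file doesn't need to be made executable first
sudo sh ./NVIDIA-Linux-x86_64-570.169.run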
sudo reboot

# This checks that your driver installed correctly -- it should show you the driver version, CUDA version, etc.
# NOTE which CUDA version you have, as it may change which cuDNN library you need
nvidia-smi

# STEP 3: CUDNN
# Move on to installing CUDNN
sudo apt install nvidia-cuda-toolkit -y
nvcc --version

# Check the version in the link below against https://developer.nvidia.com/cudnn-downloads
# If you need a different version, replace the wget link with the one you need, then update the following dpkg and cp lines to match
wget https://developer.download.nvidia.com/compute/cudnn/9.10.1/local_installers/cudnn-local-repo-ubuntu2404-9.10.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.10.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.10.1/cudnn-*-keyring.gpg /usr/share/keyrings/

sudo apt-get update
sudo apt-get -y install cudnn

# CHECK: Should show you where it was installed. Mine were /usr/lib/cuda and /usr/lib/cuda/lib64 -- I then added these to my paths below
whereis cuda

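# If your paths differ from what I got in the whereis step, replace them here so that the system can find your cuda install
# (the ${LD_LIBRARY_PATH:+...} form avoids a leading ':' when LD_LIBRARY_PATH starts out empty)
echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/lib/cuda:/usr/lib/cuda/lib64' >> ~/.bashrc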
echo 'export CUDA_HOME=/usr/lib/cuda' >> ~/.bashrc
echo 'export PATH=/usr/lib/cuda/bin:$PATH' >> ~/.bashrc
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda' >> ~/.bashrc
source ~/.bashrc

# STEP 4: Python and Tensorflow
# Set up python venv and install tensorflow
sudo apt install python3.12-venv -y
python3 -m venv generalvenv
source generalvenv/bin/activate
pip install tensorflow

# TEST if tensorflow detects the GPU
# Run this inside your new virtual env (or start python and type the two statements in interactively)
python -c "import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"

# This should show you at least one detected GPU
# Success!
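The nvidia-smi check in STEP 2 is where you note your CUDA version before picking a cuDNN package. If you would rather pull those numbers out programmatically, here is a small sketch that parses the driver and CUDA versions out of the nvidia-smi header with regular expressions. It assumes the header contains "Driver Version:" and "CUDA Version:" labels, which is how nvidia-smi formats it; the helper functions are hypothetical, not part of any NVidia tool. Keep in mind that the CUDA version nvidia-smi reports is the newest CUDA the driver supports, which can differ from the toolkit version that nvcc --version shows.

```python
import re
import shutil
import subprocess

# Labels in the nvidia-smi header line, e.g. (illustrative, spacing varies):
# "| NVIDIA-SMI 570.169   Driver Version: 570.169   CUDA Version: 12.8   |"
DRIVER_RE = re.compile(r"Driver Version:\s*([0-9.]+)")
CUDA_RE = re.compile(r"CUDA Version:\s*([0-9.]+)")


def parse_nvidia_smi(output):
    """Extract driver and CUDA versions from nvidia-smi text output.

    Returns a dict with "driver" and "cuda" keys; a value is None when the
    corresponding label is not found.
    """
    driver = DRIVER_RE.search(output)
    cuda = CUDA_RE.search(output)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }


def read_nvidia_smi():
    """Run nvidia-smi and parse its output; return None if it is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_nvidia_smi(result.stdout)


if __name__ == "__main__":
    info = read_nvidia_smi()
    if info is None:
        print("nvidia-smi not found or failed: the driver is probably not loaded")
    else:
        print(f"Driver {info['driver']}, highest supported CUDA {info['cuda']}")
```

Splitting the parsing from the subprocess call makes the parser easy to try on pasted output, for example after an apt upgrade when you suspect the driver has gone missing.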

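Because the paths from the whereis step vary between systems, a quick sanity check on the variables exported to ~/.bashrc in STEP 3 can help. This sketch, meant to be run after source ~/.bashrc, lists any directory named in CUDA_HOME, LD_LIBRARY_PATH, or the --xla_gpu_cuda_data_dir entry of XLA_FLAGS that does not actually exist. It only checks that the directories exist, not that they contain the right libraries, and the function names are hypothetical.

```python
import os

XLA_DIR_FLAG = "--xla_gpu_cuda_data_dir="


def cuda_dirs_from_env(environ):
    """Collect CUDA-related directories named in the given environment mapping.

    Looks at CUDA_HOME, every non-empty LD_LIBRARY_PATH entry, and the
    --xla_gpu_cuda_data_dir value inside XLA_FLAGS. Duplicates are dropped
    while keeping first-seen order.
    """
    dirs = []
    if environ.get("CUDA_HOME"):
        dirs.append(environ["CUDA_HOME"])
    # Empty entries (e.g. from a leading ':') are skipped.
    dirs.extend(p for p in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p)
    for flag in environ.get("XLA_FLAGS", "").split():
        if flag.startswith(XLA_DIR_FLAG):
            value = flag[len(XLA_DIR_FLAG):]
            if value:
                dirs.append(value)
    return list(dict.fromkeys(dirs))


def missing_cuda_dirs(environ=None):
    """Return the configured CUDA directories that do not exist on disk."""
    env = os.environ if environ is None else environ
    return [d for d in cuda_dirs_from_env(env) if not os.path.isdir(d)]


if __name__ == "__main__":
    missing = missing_cuda_dirs()
    if missing:
        print("These configured CUDA directories do not exist:")
        for d in missing:
            print("  " + d)
    else:
        print("No missing CUDA directories found (or none are configured).")
```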
Good luck out there.
