1 Pre-requisites
Access to your university’s HPC cluster. For instance, at UCL, students and staff can apply for an account to use the UCL HPC cluster through this link. Also, note that you may need a VPN connection to access the HPC cluster from outside the university network.
Working knowledge of how to use an HPC cluster or Linux. For instance, you should be comfortable with using shell commands, submitting jobs, and managing files on the cluster. If you are new to HPC clusters, I highly recommend referring to the UCL HPC documentation.
2 Method 1: Use Ollama as a Container Image
In this first method, I will show you how to set up Ollama on the UCL HPC cluster using a container image. This is the most straightforward way to get Ollama running on the cluster.
Why a container image? Different university HPC clusters have different configurations (different Linux distros, etc.), and a container image is a portable way to run Ollama regardless of the host setup. You can think of the container image as a pre-packaged version of Ollama that bundles all the dependencies and configurations it needs.
More importantly, university HPC clusters usually run outdated software versions (for stability reasons), and the container image method lets you use the latest version of Ollama without worrying about what is installed on the host. For instance, the UCL HPC cluster runs RHEL 7 (CentOS 7) as its operating system, which is quite dated, yet with a container image you can still run the latest Ollama.
UCL HPC’s glibc version is 2.17, which is quite old. Therefore, if you directly install Ollama on the UCL HPC cluster, you will encounter the following error message:
ollama: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by ollama)
As of June 30, 2025, the issue is still unresolved; see the GitHub issue in this link. Therefore, the container image method is the recommended way to set up Ollama on an HPC cluster with an old version of glibc.
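If you want to confirm the glibc version on your own cluster before deciding, a quick check (assuming a glibc-based Linux distro) is:
# Print the glibc version of the current host
ldd --version | head -n 1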
2.1 Step 1: Connect to the UCL HPC cluster
First, you need access to UCL's internal network. You can get this by connecting to the UCL VPN. Alternatively, if you are on campus, you can connect to the UCL network directly.
Then, you need to SSH into the UCL HPC cluster.¹ Depending on your OS, you can launch the terminal and type the following command. You can refer to this page for more information on how to connect to the UCL HPC cluster.
ssh UCL_ID@myriad.rc.ucl.ac.uk
- ssh is the command to initiate an SSH connection to a remote server.
- UCL_ID is your UCL user ID, e.g., ucabxyz.
- myriad.rc.ucl.ac.uk is the hostname of the UCL HPC cluster.
- UCL_ID@myriad.rc.ucl.ac.uk is the full address to connect to the UCL HPC cluster, meaning you are connecting to the myriad.rc.ucl.ac.uk server with the UCL_ID account.
You will be prompted to enter your password. You won't see the password as you type it, but it is being entered. Press Enter after you have typed your password.
If this is your first time connecting to the UCL HPC cluster, you will be prompted to accept the RSA key fingerprint. You can type yes and press Enter to accept it.
Once you are connected to the UCL HPC cluster, you will see the cluster's welcome screen.
2.2 Step 2: Load the necessary modules for building container images
Once you are connected to the UCL HPC cluster, you need to load the necessary modules to use Ollama.
HPC clusters usually have a module system to manage software packages. You can think of the module system as a way to load and unload software packages on the HPC cluster. For instance, if you need to use Python, you can load the Python module by typing module load python; Python will then be available in your terminal. You can refer to this page for more information on how to use the module system on the UCL HPC cluster.
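If you are not sure what a module is called on your cluster, the module system can also list what is available; a quick sketch (module names vary across clusters):
# List available modules matching a name
module avail apptainer
# Show the modules currently loaded in your session
module list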
In this guide, I'm using a module called apptainer to pull the Ollama container image from Docker Hub and build it. Apptainer is a tool to manage container images on the UCL HPC cluster. You may have used Docker before; apptainer is a similar tool for managing container images on HPC clusters.
As mentioned, because HPC clusters use a module system, you need to load the apptainer module before you can use it. You can do this by typing the following commands in the terminal. For more information on how to use apptainer on UCL's cluster, you can refer to this page.
# This is to load the apptainer module
module load apptainer
# Create a directory to store the Ollama models
mkdir -p ~/Scratch/ollama/models
- mkdir -p is a command to create a directory. The -p flag creates parent directories if they don't exist.
- ~/Scratch/ollama/models is the path where the Ollama models you download later will be stored. For UCL users, our disk on the cluster has a directory called Scratch for storing large files, so we keep the Ollama models there.
Next, we need to set some additional environment variables to make Ollama work under apptainer. When you build or run container images, apptainer passes these environment variables through to the container, and Ollama uses them to determine where to store the models and how verbose its logs should be.
You can copy and paste the following commands into your terminal and press Enter to set the environment variables (a note on making them persist follows the explanation below).
# This is the path where the Ollama models you download later will be stored
# (use $HOME rather than ~, since the tilde is not expanded inside quotes)
export OLLAMA_MODELS="$HOME/Scratch/ollama/models"
# This is the log level for Ollama. Change it to "debug" to see more logs
export OLLAMA_LOG_LEVEL="error"
- export is a command to set environment variables in the shell.
- OLLAMA_MODELS is the environment variable that sets the path to the Ollama models. You can change the path to your preferred location.
- OLLAMA_LOG_LEVEL is the environment variable that sets the log level of Ollama. "error" will only show error logs; you can change it to "debug" to see more.
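These exports only last for the current shell session. If you want them to survive future logins, you can append them to your shell start-up file (a minimal sketch, assuming your login shell is bash):
# Persist the Ollama environment variables across sessions
echo 'export OLLAMA_MODELS="$HOME/Scratch/ollama/models"' >> ~/.bashrc
echo 'export OLLAMA_LOG_LEVEL="error"' >> ~/.bashrc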
2.3 Step 3: Pull the Ollama container image
In this tutorial, we will not use apptainer build to build the Ollama container image from scratch, as building takes a lot of time and resources (the benefit, however, is that you can customize the image to your needs if you choose to build it yourself).
Instead, we will directly pull the Ollama container image from Docker Hub, and the apptainer tool will convert the Docker image to a Singularity Image Format (SIF) file.²
The most recent version of the image is tagged as ollama/ollama:latest. You can type the following command to pull it from Docker Hub.
apptainer pull ollama-latest.sif docker://ollama/ollama:latest
After running the command, the apptainer tool will pull the Ollama container image from Docker Hub and save it as ollama-latest.sif in the current directory. Internally, it downloads the image layers to a temporary location first and then assembles the SIF file.
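One optional tweak: home directories on HPC clusters often have tight quotas, and apptainer caches the downloaded layers under ~/.apptainer/cache by default. If the pull fails with quota errors, you can point the cache at Scratch instead (an optional adjustment, not required by the steps above):
# Move apptainer's layer cache off the small home quota
export APPTAINER_CACHEDIR="$HOME/Scratch/apptainer_cache"
mkdir -p "$APPTAINER_CACHEDIR"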
If you would like to pull a specific version of the Ollama container image, you can specify the tag. For instance, to pull the image tagged 0.5.11 (the latest version as of Feb 16, 2025), type the following command (a quick sanity check on the pulled file follows the list below).
apptainer pull ollama-0.5.11.sif docker://ollama/ollama:0.5.11
- apptainer pull is the command to pull a container image from Docker Hub.
- ollama-0.5.11.sif is the name under which the image will be saved in the current directory.
- docker://ollama/ollama:0.5.11 is the address of the image on Docker Hub, tagged as 0.5.11.
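Once the pull finishes, you can sanity-check the resulting file; ls confirms the file size, and apptainer inspect prints the image metadata:
# Confirm the SIF file exists and check its size
ls -lh ollama-0.5.11.sif
# Show the image's metadata
apptainer inspect ollama-0.5.11.sif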
2.4 Step 4: Run Ollama on the UCL HPC cluster
Once the image is pulled, you can run Ollama on the UCL HPC cluster. You can type the following command to run Ollama as a background service on the UCL HPC cluster.
apptainer run --nv ~/ollama-0.5.11.sif &
- apptainer run is the command to run the Ollama container image on the UCL HPC cluster.
- --nv is a flag to enable GPU inference. Note that you need to request a session on the GPU nodes to use this flag; see below.
- ~/ollama-0.5.11.sif is the path to the Ollama container image you pulled earlier (this assumes the SIF file sits in your home directory; adjust the path if you saved it elsewhere).
- & runs the container as a background service, so you can continue to use the terminal while Ollama is running. Since the server takes a moment to start, see the readiness check sketched after this list.
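Because the server starts asynchronously, a command issued immediately afterwards (such as pulling a model) can fail with a connection error. A small sketch that waits until the server answers on its default port (11434) before proceeding:
# Wait until the Ollama server responds before issuing further commands
until curl -s http://localhost:11434 > /dev/null; do
    sleep 1
done
echo "Ollama server is up"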
Note that by default, if you are on the login node (you will see userid@login in your terminal prompt), you won't have access to GPU inference. Therefore, you can only run Ollama in CPU mode, and you will see the following warning message.
WARNING: Could not find any nv files on this host!
If you would like to run Ollama in GPU mode, you will need to either:
- request an interactive session on the GPU nodes (a sketch of such a request follows this list). Refer to the UCL HPC documentation on interactive sessions for more information.
- submit a GPU job to the GPU nodes. Refer to the UCL HPC documentation on GPU nodes for more information on how to submit a job.
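For illustration, an interactive GPU request on a Grid Engine cluster like Myriad might look like the following; the resource flags here are an assumption, so check the UCL documentation for the exact syntax:
# Request an interactive session with 1 GPU, 16 GB RAM, for 2 hours (flags are illustrative)
qrsh -l gpu=1,mem=16G,h_rt=2:00:00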
Now that the Ollama service is running in the background, you need to download some Ollama models to the UCL HPC cluster. For instance, you can type the following command to download the qwen2.5:14b model.
apptainer run --nv ~/ollama-0.5.11.sif pull qwen2.5:14b
- apptainer run --nv ~/ollama-0.5.11.sif can be thought of as the ollama command if you are familiar with Ollama: it runs the Ollama container image on the UCL HPC cluster.
- pull qwen2.5:14b downloads the qwen2.5:14b model to the cluster. The model will be saved in the OLLAMA_MODELS directory you set earlier. If you have tried Ollama on your laptop before, this is similar to running ollama pull qwen2.5:14b to download the model to your local machine.
Therefore, to enter chat mode with Ollama, you can type the following command:
apptainer run --nv ~/ollama-0.5.11.sif run qwen2.5:14b
You can test that Ollama is running by typing the following command. For how to use the API, you can refer to the Ollama API documentation.
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:7b",
"prompt": "Why is the sky blue?",
"stream": false
}'
2.5 Step 5: Call Ollama API with your preferred language
Since it's very likely that you will use Ollama in your own programming environment, you can call the Ollama API from your preferred programming language. For instance, you can use R or Python to classify whether a Twitter text contains hate speech.
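As a minimal sketch of such a call, here is the same idea expressed with curl against Ollama's /api/chat endpoint; the model and prompt are placeholders, and an HTTP client in R or Python would send the same JSON:
curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:14b",
  "messages": [
    {"role": "user", "content": "Does this tweet contain hate speech? Answer yes or no: <tweet text here>"}
  ],
  "stream": false
}'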
2.6 Conclusion
The above steps show a quick example of how to use Ollama on the UCL HPC cluster. You can replace the toy example with your own use case. You can also streamline the workflow by writing a shell script that automates loading the modules, starting the Ollama container, and running your analysis.
Below is my example shell script that automates these steps.
#!/bin/bash -l
#$ -l h_rt=48:00:0
#$ -l mem=32G
#$ -l gpu=1
#$ -l tmpfs=10G
#$ -N find_company_matches
#$ -wd ~/Scratch/Accounting-Marketing
#$ -m be
#$ -M wei.miao@ucl.ac.uk
#$ -t 1-3
# Load the R module and run your R program
# source /shared/ucl/apps/bin/defmods
export OLLAMA_MODELS="$HOME/Scratch/ollama/models" # $HOME instead of ~, which is not expanded inside quotes
export R_LIBS_USER="$HOME/R/x86_64-pc-linux-gnu-library/4.4"
export GIN_MODE="release"
export OLLAMA_LOG_LEVEL="error"
module -f unload compilers mpi gcc-libs
module load curl/7.86.0/gnu-4.9.2
module load r/4.4.2-openblas/gnu-10.2.0
module load apptainer
apptainer run --nv ~/ollama-0.5.11.sif &
sleep 10 # give the Ollama server time to start before pulling the model
apptainer run ~/ollama-0.5.11.sif pull qwen2.5:7b # change this to your preferred model
export WORK_DIR="$HOME/Scratch"
# below is the R script to run
cd $TMPDIR
R --no-save < $WORK_DIR/shell/find_company_name_matches.R > $JOB_NAME$SGE_TASK_ID.out
# Copy the output files back to the current directory
tar zcvf $WORK_DIR/shell/files_from_job_$JOB_NAME$SGE_TASK_ID.tgz $TMPDIR
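Assuming you save the script above as, say, run_ollama_job.sh (a placeholder name), you can submit it to the scheduler with qsub:
# Submit the job script to the Grid Engine scheduler
qsub run_ollama_job.sh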
Enjoy using Ollama on the HPC cluster!
Footnotes
1. SSH stands for Secure Shell. It is widely used for remote login to computer systems. As long as you have your user ID and password, you will be able to SSH into the cluster.↩︎
2. Apptainer was previously known as Singularity; the SIF file is the container image format used by Apptainer.↩︎