HPC Grid Tutorial: How to Run a GPU Job

Follow these steps to run a GPU job on the Grid.


1. Log on to the Grid. Note: You need access to the accq queue to run this script.

2. From your home directory, copy the GPU job script into it by typing: cp /wsu/el7/scripts/tutorial/gpu_job .

3. Run the GPU job script by typing: qsub gpu_job

This is the gpu_job script:

#!/bin/bash

##      Script is submitted to this Queue:

#PBS -q accq

##      One core, 1 GB of RAM, and 1 GPU selected:

#PBS -l select=1:ncpus=1:mem=1GB:ngpus=1

##      Commands to be executed:

##      List assigned GPU:

echo "Assigned GPU: $CUDA_VISIBLE_DEVICES"

##      Check state of GPU:

nvidia-smi

##      Sleep for 5 minutes:

sleep 300
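If you request more than one GPU, it can help to have the script report how many were assigned. The following is a minimal sketch, not part of the tutorial script. The function name count_assigned_gpus is hypothetical, and the sketch assumes that $CUDA_VISIBLE_DEVICES holds a comma-separated list of GPU identifiers.

```shell
#!/bin/bash

# Hypothetical helper: count the entries in a CUDA_VISIBLE_DEVICES-style string.
# Assumption: entries are separated by commas.
count_assigned_gpus() {
    local devices="${1:-}"
    # An empty value means no GPU was assigned.
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Split on commas into an array and print its length.
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Falls back to "none" when the variable is unset, e.g. outside a GPU job.
echo "Assigned GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
echo "GPU count: $(count_assigned_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

Lines like these could be placed before the nvidia-smi call in gpu_job, so the count appears at the top of the job's output file.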

4. Check to see that your job is running by typing: qme

Notice the Job ID ($PBS_JOBID) in the red boxes and the node ($HOSTNAME) your job is running on in the blue box. The yellow box shows that the job was submitted to accq.

5. You can log in to any node your job is running on by typing: ssh $HOSTNAME, where $HOSTNAME is the name of that node. In the screenshot example, the node is acc4, so you would type: ssh acc4

6. Once you are on the node, you can check how much of the GPU(s) your job is using by typing: nvidia-smi

7. When your job finishes, you should have an error file and an output file (gpu_job.e$PBS_JOBID and gpu_job.o$PBS_JOBID) in your home directory. Check by typing: ls
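To find the two files without scanning the whole directory listing, you can build their names from the job ID. The sketch below is illustrative only. The function name job_output_files and the job ID 123456.example-server are hypothetical, and it assumes the file suffix is the numeric part of the job ID (the text before the first dot). Adjust it if the files on your system are named differently.

```shell
#!/bin/bash

# Hypothetical helper: print the output and error file names for a job.
# Assumption: the suffix is the part of the job ID before the first dot.
job_output_files() {
    local job_name="$1"
    local job_id="$2"
    # Strip everything from the first dot onward (the server name, if present).
    local seq="${job_id%%.*}"
    echo "${job_name}.o${seq}"
    echo "${job_name}.e${seq}"
}

# Made-up job ID for illustration; use the one reported for your own job.
job_output_files gpu_job "123456.example-server"
```

The job ID is printed by qsub when you submit, so one option is to capture it at submission time with job_id=$(qsub gpu_job) and pass "$job_id" to the function once the job has finished.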