HPC Grid Tutorial: How to Run a TensorFlow Job

Follow these steps to run a TensorFlow job. Note: Make sure you have access to nodes with GPUs.

1. Log in to the Grid.

2. Copy the required files into your current directory with the following commands:

cp -R /wsu/el7/scripts/tutorial/addition.py .

cp -R /wsu/el7/scripts/tutorial/tensorflow_job .
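Before going on, you may want to confirm that both files actually arrived. The following is a minimal sketch; `check_tutorial_files` is a hypothetical helper name, not a Grid command. It only tests that the two files named in the `cp` commands exist in a directory.

```bash
# Hypothetical helper: confirm the tutorial files were copied into a directory.
# Usage: check_tutorial_files [dir]   (defaults to the current directory)
# Prints each missing file to stderr and returns non-zero if any are missing.
check_tutorial_files() {
    local dir="${1:-.}"
    local missing=0
    for f in addition.py tensorflow_job; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as `check_tutorial_files && echo "ready to submit"` in the directory where you ran the copy commands.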


3. The job script is in the file tensorflow_job. It contains the following:

#!/bin/bash
# Job name
#SBATCH --job-name Tensorflow
# Submit to the GPU QoS
#SBATCH -q gpu
# Request one node
#SBATCH -N 1
# Total number of cores; in this example, 1 node with 1 core
#SBATCH -n 1
# Request memory
#SBATCH --mem=5G
# Request the GPU type
#SBATCH --constraint="k40"
# Mail when the job begins, ends, fails, or is requeued
#SBATCH --mail-type=ALL
# Where to send email alerts (replace with your own address)
#SBATCH --mail-user=xxyyyy@wayne.edu
# Create an output file named output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file named errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set the maximum run time (hours:minutes:seconds)
#SBATCH -t 1:0:0

# Load the Python 3.7 module (ml is shorthand for module load)
ml python/3.7
# Make the conda command available, then activate the TensorFlow environment
source /wsu/el7/pre-compiled/python/3.7/etc/profile.d/conda.sh
conda activate tensorflow_env
# Run the TensorFlow program
python addition.py

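Note: Make sure that addition.py is in the directory you submit the job from (your home directory in this tutorial), because the script runs it by a relative path.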
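If you want the job log to show whether a GPU was actually visible to the job, you could add a check to tensorflow_job. This is a sketch, not part of the original script: `report_gpus` is a hypothetical function name, and it assumes NVIDIA's standard `nvidia-smi` utility is on the PATH of the GPU nodes.

```bash
# Hypothetical helper for tensorflow_job: list the GPUs visible to this job.
# Returns nvidia-smi's exit status, or 1 if nvidia-smi is not on the PATH.
report_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found: no NVIDIA driver visible to this job" >&2
        return 1
    fi
}
```

Place the function after the `#SBATCH` lines and call `report_gpus` just before `python addition.py`; its output then appears in output_<jobid>.out, and any warning in errors_<jobid>.err.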

4. To submit the job, type: sbatch tensorflow_job
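`sbatch` normally replies with a line of the form `Submitted batch job <id>`. If you want to keep that job ID in a shell variable, one approach is to extract it from that line, as sketched below; `parse_jobid` is a hypothetical helper. Slurm's `sbatch --parsable tensorflow_job` is a built-in alternative that prints the ID directly.

```bash
# Hypothetical helper: read sbatch's "Submitted batch job <id>" message
# on stdin and print only <id>. Prints nothing for any other input.
parse_jobid() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Intended use on the Grid (not run here):
#   jobid=$(sbatch tensorflow_job | parse_jobid)
#   echo "Job ID: $jobid"
```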

Once the job is submitted, you can check its status with the following command: qme
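qme is specific to the Grid. The standard Slurm command `squeue` can report a single job's state as well, which is handy in scripts. The sketch below assumes `squeue` is available on the login node; `job_state` is a hypothetical helper name. `-h` suppresses the header line and `-o %T` prints only the job state.

```bash
# Hypothetical helper: print the Slurm state of a job (e.g. PENDING, RUNNING),
# or "not in queue" once squeue no longer lists it.
job_state() {
    local state
    state=$(squeue -h -j "$1" -o %T 2>/dev/null)
    if [ -n "$state" ]; then
        echo "$state"
    else
        echo "not in queue"
    fi
}
```

For example, `job_state 12345`, replacing 12345 with the job ID that sbatch printed. A finished job eventually drops out of `squeue`, at which point the function reports "not in queue".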


5. Once the job has completed, you will find the output and error files in your home directory. To list the contents of your home directory, type: ls
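Because the job script uses `-o output_%j.out` and `-e errors_%j.err`, and Slurm replaces `%j` with the job ID, the two file names can be derived from the job ID alone. The sketch below prints both files for a given job; `job_files` and `show_job_files` are hypothetical helper names, and it assumes you run it from the directory where you submitted the job.

```bash
# Hypothetical helper: file names produced by the #SBATCH -o/-e lines in
# tensorflow_job for a given job ID.
job_files() {
    printf 'output_%s.out\nerrors_%s.err\n' "$1" "$1"
}

# Hypothetical helper: print each file for a job ID, or note that it is missing.
show_job_files() {
    local f
    for f in $(job_files "$1"); do
        if [ -f "$f" ]; then
            echo "== $f =="
            cat "$f"
        else
            echo "== $f not found ==" >&2
        fi
    done
}
```

For example, `show_job_files 12345` shows output_12345.out and errors_12345.err; an empty error file usually means the run produced no error messages.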
