HPC Grid Tutorial: How to Run a Job for TensorFlow
Follow these steps to run a TensorFlow job. Note: Make sure you have access to nodes with GPUs.
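The job script in step 3 submits to the gpu QoS ("#SBATCH -q gpu"). Assuming GPU access on this cluster is granted through that QoS (an assumption; this tutorial does not say how access is granted) and that your account may query its own Slurm associations, a quick check could look like the sketch below. The function name has_gpu_qos is hypothetical; sacctmgr and its -n (no header) and -P (parsable) options are standard Slurm.

```shell
# has_gpu_qos (hypothetical helper): succeed if the current user's Slurm
# associations include a QoS named "gpu", the QoS the job script requests.
# -n drops the header line, -P prints plain parsable output;
# multiple QoS names on one association are separated by commas.
has_gpu_qos() {
  sacctmgr -n -P show assoc user="${USER:-$(id -un)}" format=QOS \
    | tr ',' '\n' \
    | grep -qx 'gpu'
}

# Usage on a login node:
# if has_gpu_qos; then echo "gpu QoS available"; else echo "no gpu QoS found"; fi
```

If the check fails, contact the Grid administrators before continuing.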
1. Log in to the Grid.
2. Copy the required files to your current directory using the following commands:
cp -R /wsu/el7/scripts/tutorial/addition.py .
cp -R /wsu/el7/scripts/tutorial/tensorflow_job .
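Because the job expects addition.py and tensorflow_job to sit in the directory you submit from, it can help to confirm the copy succeeded before going further. This is a minimal sketch; the function name check_tutorial_files is hypothetical.

```shell
# check_tutorial_files (hypothetical helper): report whether addition.py and
# tensorflow_job exist in the current directory; return non-zero if either is missing.
check_tutorial_files() {
  rc=0
  for f in addition.py tensorflow_job; do
    if [ -f "$f" ]; then
      echo "found:   $PWD/$f"
    else
      echo "missing: $PWD/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, right after the cp commands:
# check_tutorial_files
```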
3. The job script is in the file tensorflow_job. It contains the following:
#!/bin/bash
# Job name
#SBATCH --job-name Tensorflow
# Submit to the GPU QoS
#SBATCH -q gpu
# Request one node
#SBATCH -N 1
# Total number of cores (tasks): in this example, 1 core on 1 node.
#SBATCH -n 1
# Request memory
# Request the GPU type
# Mail when the job begins, ends, fails, or is requeued
#SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE
# Where to send email alerts
# Create an output file that will be output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file that will be errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set maximum time limit (hours:minutes:seconds; here, 1 hour)
#SBATCH -t 1:0:0
# Activate the tensorflow_env conda environment
conda activate tensorflow_env
# Run the Python script
python addition.py
Note: Make sure that addition.py is in your home directory, the same directory from which you submit tensorflow_job.
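According to the sbatch documentation, sbatch reads "#SBATCH" options only up to the first non-comment, non-whitespace line of the script, which is why all the directives in tensorflow_job come before the conda activate line. The hypothetical helper below mimics that rule so you can see which directives sbatch would actually apply; it is an illustration, not part of Slurm.

```shell
# list_sbatch_directives FILE (hypothetical helper)
# Print the "#SBATCH" lines sbatch would honor: it stops reading
# directives at the first non-comment, non-whitespace line.
list_sbatch_directives() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '#SBATCH'*) printf '%s\n' "$line" ;;  # a directive: print it
      '#'*) ;;                             # shebang or ordinary comment: skip
      *[![:space:]]*) break ;;             # first executable line: stop
    esac                                   # blank lines fall through: keep reading
  done < "$1"
}

# Usage:
# list_sbatch_directives tensorflow_job
```

Any "#SBATCH" line placed after conda activate would be silently ignored by sbatch, and the helper reflects that by not printing it.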
4. To submit the job, type: sbatch tensorflow_job
Once the job is submitted, you can check its status with the following command: qme
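qme appears to be a command provided by the Grid itself. On any Slurm cluster, the standard tools can do the same job: sbatch --parsable prints just the job ID (or "jobid;cluster"), and squeue can report that job's state. The sketch below wraps both in two hypothetical helper functions, submit_job and job_state; the sbatch and squeue options used (--parsable, -h, -j, -o '%T') are standard Slurm.

```shell
# submit_job SCRIPT (hypothetical helper): submit SCRIPT and print only the job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster"; cut keeps the ID.
submit_job() {
  sbatch --parsable "$1" | cut -d';' -f1
}

# job_state JOBID (hypothetical helper): print the job's state, for example
# PENDING or RUNNING. -h drops the header and %T prints the full state name.
# After a finished job ages out of the queue, nothing is printed on stdout.
job_state() {
  squeue -h -j "$1" -o '%T'
}

# Usage on a login node:
# jobid=$(submit_job tensorflow_job)
# job_state "$jobid"
```

Keeping the job ID in a variable makes it easy to find the matching output_<jobid>.out and errors_<jobid>.err files later.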
5. Once the job has completed, the output and error files will be in your home directory. To list the contents of your home directory, type: ls
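The "-o output_%j.out" and "-e errors_%j.err" directives name the files after the job ID, so once you know the ID you can print both files directly instead of searching the ls listing. This is a minimal sketch; show_job_output is a hypothetical helper, and it assumes the files are in the directory you submitted from unless you pass another directory.

```shell
# show_job_output JOBID [DIR] (hypothetical helper): print the output and error
# files created by "#SBATCH -o output_%j.out" and "#SBATCH -e errors_%j.err".
# DIR defaults to the current directory; returns non-zero if a file is missing.
show_job_output() {
  dir=${2:-.}
  rc=0
  for f in "$dir/output_$1.out" "$dir/errors_$1.err"; do
    if [ -f "$f" ]; then
      echo "==> $f <=="
      cat "$f"
    else
      echo "not found: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage, after the job has completed:
# show_job_output "$jobid"
```

An empty errors_<jobid>.err file usually means nothing was written to standard error during the run.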