HPC Grid Tutorial: How to Use SAMtools

Start an interactive job and load the samtools module. For larger tasks write and submit a job script.

qsub -I

module load samtools

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats. Here are a few commands that can be utilized: 

view

samtools view sample.bam > sample.sam

Convert a bam file into a sam file.

samtools view -bS sample.sam > sample.bam

Convert a sam file into a bam file. The -b options compresses or leaves compressed input data.

samtools view sample_sorted.bam "chr1:10-13"

Extract all the reads aligned to the range specified, which are those that are aligned to the reference element named chr1 and cover its 10th, 11th, 12th or 13th base. The results is saved to a BAM file including the header. An index of the input file is required for extracting reads according to their mapping position in the reference genome, as created by samtools index.

samtools view -h -b sample_sorted.bam"chr1:10-13" > tiny_sorted.bam

Extract the same reads as above, but instead of displaying them, writes them to a new bam file, tiny_sorted.bam. The -b option makes the output compressed and the -h option causes the SAM headers to be output also. These headers include a description of the reference that the reads in sample_sorted.bam were aligned to and will be needed if the tiny_sorted.bam file is to be used with some of the more advanced SAMtools commands. The order of extracted reads is preserved.

tview

samtools tview sample_sorted.bam

Start an interactive viewer to visualize a small region of the reference, the reads aligned, and mismatches. Within the view, can jump to a new location by typing: and a location, like g:chr1:10,000,000. If the reference element name and following colon is replaced with {{{1}}}, the current reference element is used, i.e. if {{{1}}} is typed after the previous "goto" command, the viewer jumps to the region 200 base pairs down on chr1. Typing ? brings up help information. 

sort

samtools sort unsorted_in.bam sorted_out

Read the specified unsorted_in.bam as input, sort it by aligned read position, and write it out to sorted_out.bam, the bam file whose name (without extension) was specified.

samtools sort -m 5000000 unsorted_in.bam sorted_out

Read the specified unsorted_in.bam as input, sort it in blocks up to 5 million k (5 Gb) [TODO: verify units here, this could be wrong] and write output to a series of bam files named sorted_out.0000.bam, sorted_out.0001.bam, etc., where all bam 0 reads come before any bam 1 read, etc. [TODO: verify this is correct].

samtools index sorted.bam

Creates an index file, sorted.bam.bai for the sorted.bam file.

 

Further resources:

https://en.wikipedia.org/wiki/SAMtools

http://quinlanlab.org/tutorials/samtools/samtools.html

For tips on parallelization: 

http://zvfak.blogspot.com/2012/02/samtools-in-parallel.html