Glossary

BAM
Binary SAM format. BAM files are binary formatted, indexed and allow random access.
BCF
Binary VCF
bgzip
Utility in the htslib package to block compress genomic data files.
cigar
Stands for Compact Idiosyncratic Gapped Alignment Report and represents a compressed (run-length encoded) pairwise alignment format. It was first defined by the Exonerate aligner, but was later adapted and adopted as part of the SAM standard and by many other aligners. In the Python API, the cigar alignment is presented as a list of tuples (operation, length). For example, the list [ (0,3), (1,5), (0,2) ] refers to an alignment with 3 matches, 5 inserted bases and another 2 matches.
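As an illustration of the list-of-tuples representation, the following minimal sketch converts between (operation, length) tuples and the textual form used in SAM files. The helper names cigar_to_string and string_to_cigar are hypothetical and not part of the Python API; the operation codes follow the order given in the SAM specification (0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7==, 8=X).

```python
import re

# Operation letters indexed by their numeric code, as listed in the SAM specification.
CIGAR_OPS = "MIDNSHP=X"


def cigar_to_string(cigar):
    """Render a list of (operation, length) tuples as a textual CIGAR string."""
    return "".join(f"{length}{CIGAR_OPS[op]}" for op, length in cigar)


def string_to_cigar(text):
    """Parse a textual CIGAR string into a list of (operation, length) tuples."""
    return [
        (CIGAR_OPS.index(op), int(length))
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", text)
    ]


example = [(0, 3), (1, 5), (0, 2)]
print(cigar_to_string(example))   # 3M5I2M
print(string_to_cigar("3M5I2M"))  # [(0, 3), (1, 5), (0, 2)]
```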
column
Reads that are aligned to a base in the reference sequence.
contig
The sequence that a tid refers to. For example chr1, contig123.
csamtools
The samtools C-API.
faidx
Utility in the samtools package to index fasta formatted files.
fetching
Retrieving all reads mapped to a region.
hard clipping
hard clipped
In hard clipped reads, part of the sequence has been removed prior to alignment. The cigar alignment may record that only a subsequence is aligned, but the removed sequence will not be part of the alignment record, in contrast to soft clipped reads.
pileup
Pileup
Reference
Synonym for contig
region
A genomic region, stated relative to a reference sequence. A region consists of reference name (‘chr1’), start (10000), and end (20000). Start and end can be omitted for regions spanning a whole chromosome. If end is missing, the region will span from start to the end of the chromosome. Within pysam, coordinates are 0-based, half-open intervals, i.e., the position 10,000 is part of the interval, but 20,000 is not. The exception is samtools compatible region strings such as ‘chr1:10000-20000’, which are 1-based and closed, i.e., both positions 10,000 and 20,000 are part of the interval.
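The difference between the two coordinate conventions can be sketched in a few lines. The example below assumes the usual samtools region syntax of contig:start-end with 1-based, closed coordinates; the helper to_samtools_region is hypothetical and only illustrates the arithmetic.

```python
def to_samtools_region(contig, start=None, end=None):
    """Convert a 0-based, half-open interval into a samtools-style region string.

    Assumes samtools region strings are 1-based and closed, so only the
    start coordinate needs to be shifted by one.
    """
    if start is None:
        # Whole contig: no coordinates given.
        return contig
    if end is None:
        # From start to the end of the contig.
        return f"{contig}:{start + 1}"
    return f"{contig}:{start + 1}-{end}"


# The 0-based, half-open interval [10000, 20000) covers 10,000 bases.
print(to_samtools_region("chr1", 10000, 20000))  # chr1:10001-20000
print(to_samtools_region("chr1", 10000))         # chr1:10001
print(to_samtools_region("chr1"))                # chr1
```

Both representations describe the same 10,000 bases: the half-open end coordinate is unchanged because excluding position 20,000 in 0-based terms is the same as including position 20,000 in 1-based terms.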
SAM
A textual format for storing genomic alignment information.
sam file
A file containing aligned reads. The sam file can either be a BAM file or a TAM file.
samtools
The samtools package.
soft clipping
soft clipped
In alignments with soft clipping, parts of the query sequence are not aligned. The unaligned query sequence is still part of the alignment record. This is in contrast to hard clipped reads.
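The practical difference between hard and soft clipping shows up when sequence lengths are computed from a cigar alignment. The sketch below assumes the SAM specification's operation codes (4 for soft clip, 5 for hard clip) and its rule that M, I, S, = and X consume query bases; the function names are hypothetical.

```python
SOFT_CLIP, HARD_CLIP = 4, 5
# Operations that consume bases of the stored query sequence: M, I, S, =, X.
QUERY_CONSUMING = {0, 1, 4, 7, 8}


def stored_sequence_length(cigar):
    """Length of the sequence kept in the alignment record."""
    return sum(length for op, length in cigar if op in QUERY_CONSUMING)


def original_read_length(cigar):
    """Length of the read before clipping, including hard clipped bases."""
    return sum(
        length
        for op, length in cigar
        if op in QUERY_CONSUMING or op == HARD_CLIP
    )


soft = [(SOFT_CLIP, 5), (0, 45)]  # 5S45M: clipped bases stay in the record
hard = [(HARD_CLIP, 5), (0, 45)]  # 5H45M: clipped bases are gone

print(stored_sequence_length(soft), original_read_length(soft))  # 50 50
print(stored_sequence_length(hard), original_read_length(hard))  # 45 50
```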
tabix
Utility in the htslib package to index bgzip compressed files.
tabix file
A sorted, compressed and indexed tab-separated file created by the command line tool tabix or the commands tabix_compress() and tabix_index(). The file is indexed by chromosomal coordinates.
tabix row
A row in a tabix file. Fields within a row are tab-separated.
TAM
Text SAM file. TAM files are human readable files of tab-separated fields. TAM files do not allow random access.
target
The sequence that a read has been aligned to. Target sequences have both a numerical identifier (tid) and an alphanumeric name (Reference).
tid
The target id. The target id is 0 or a positive integer mapping to entries within the sequence dictionary in the header section of a TAM file or BAM file.
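To make the mapping concrete, the following sketch derives target ids from the @SQ lines of a textual header, assuming that tids follow the order in which the sequences appear in the sequence dictionary. The header text, its sequence names and lengths, and the function reference_names are made up for illustration.

```python
# A made-up header with two entries in the sequence dictionary.
header_text = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:2000\n"
)


def reference_names(header_text):
    """Return the sequence names of all @SQ lines, in header order."""
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


names = reference_names(header_text)
tid_of = {name: tid for tid, name in enumerate(names)}

print(names[0])        # chr1
print(tid_of["chr2"])  # 1
```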
VCF
Variant call format