============ Introduction ============ Pysam is a python module that makes it easy to read and manipulate mapped short read sequence data stored in SAM/BAM files. It is a lightweight wrapper of the htslib_ C-API. This page provides a quick introduction in using pysam followed by the API. See :ref:`usage` for more detailed usage instructions. To use the module to read a file in BAM format, create a :class:`~pysam.AlignmentFile` object:: import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb") Once a file is opened you can iterate over all of the reads mapping to a specified region using :meth:`~pysam.AlignmentFile.fetch`. Each iteration returns a :class:`~pysam.AlignedSegment` object which represents a single read along with its fields and optional tags:: for read in samfile.fetch('chr1', 100, 120): print(read) samfile.close() To give:: EAS56_57:6:190:289:82 0 99 <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; 69 CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA 0 192 1 EAS56_57:6:190:289:82 0 99 <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; 137 AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC 73 64 1 EAS51_64:3:190:727:308 0 102 <<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844 99 GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG 99 18 1 ... You can also write to a :class:`~pysam.AlignmentFile`:: import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb") pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile) for read in samfile.fetch(): if read.is_paired: pairedreads.write(read) pairedreads.close() samfile.close() An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the :meth:`~pysam.AlignmentFile.pileup` method. Each iteration returns a :class:`~pysam.PileupColumn` which represents all the reads in the SAM file that map to a single base in the reference sequence. The list of reads are represented as :class:`~pysam.PileupRead` objects in the :attr:`PileupColumn.pileups ` property:: import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb" ) for pileupcolumn in samfile.pileup("chr1", 100, 120): print("\ncoverage at base %s = %s" % (pileupcolumn.pos, pileupcolumn.n)) for pileupread in pileupcolumn.pileups: if not pileupread.is_del and not pileupread.is_refskip: # query position is None if is_del or is_refskip is set. print('\tbase in read %s = %s' % (pileupread.alignment.query_name, pileupread.alignment.query_sequence[pileupread.query_position])) samfile.close() The above code outputs:: coverage at base 99 = 1 base in read EAS56_57:6:190:289:82 = A coverage at base 100 = 1 base in read EAS56_57:6:190:289:82 = G coverage at base 101 = 1 base in read EAS56_57:6:190:289:82 = G coverage at base 102 = 2 base in read EAS56_57:6:190:289:82 = G base in read EAS51_64:3:190:727:308 = G ... Commands available in `samtools`_ are available as simple function calls. For example:: pysam.sort("-o", "output.bam", "ex1.bam") corresponds to the command line:: samtools sort -o output.bam ex1.bam Analogous to :class:`~pysam.AlignmentFile`, a :class:`~pysam.TabixFile` allows fast random access to compressed and tabix indexed tab-separated file formats with genomic data:: import pysam tabixfile = pysam.TabixFile("example.gtf.gz") for gtf in tabixfile.fetch("chr1", 1000, 2000): print(gtf.contig, gtf.start, gtf.end, gtf.gene_id) :class:`~pysam.TabixFile` implements lazy parsing in order to iterate over large tables efficiently. More detailed usage instructions are available at :ref:`usage`. .. note:: Coordinates in pysam are always 0-based (following the python convention). SAM text files use 1-based coordinates. .. note:: The above examples can be run in the :file:`tests` directory of the installation directory. Type 'make' before running them. .. seealso:: https://github.com/pysam-developers/pysam The pysam code repository, containing source code and download instructions http://pysam.readthedocs.org/en/latest/ The pysam website containing documentation === API === SAM/BAM/CRAM files ================== Objects of type :class:`~pysam.AlignmentFile` allow working with BAM/SAM formatted files. .. autoclass:: pysam.AlignmentFile :members: .. autoclass:: pysam.AlignmentHeader :members: An :class:`~pysam.AlignedSegment` represents an aligned segment within a SAM/BAM file. .. autoclass:: pysam.AlignedSegment :members: .. autoclass:: pysam.PileupColumn :members: .. autoclass:: pysam.PileupRead :members: .. autoclass:: pysam.IndexedReads :members: VCF/BCF files ============= .. autoclass:: pysam.VariantFile :members: .. autoclass:: pysam.VariantHeader :members: .. autoclass:: pysam.VariantRecord :members: .. autoclass:: pysam.VariantContig :members: .. autoclass:: pysam.VariantHeaderRecord :members: .. autoclass:: pysam.VariantMetadata :members: .. autoclass:: pysam.VariantRecordSample :members: Internal classes ---------------- These classes are used internally to represent particular header and record fields. They cannot be instantiated directly in your Python or Cython code. .. autoclass:: pysam.VariantHeaderContigs :members: .. autoclass:: pysam.VariantHeaderMetadata :members: .. autoclass:: pysam.VariantHeaderRecords :members: .. autoclass:: pysam.VariantHeaderSamples :members: .. autoclass:: pysam.VariantRecordFilter :members: .. autoclass:: pysam.VariantRecordFormat :members: .. autoclass:: pysam.VariantRecordInfo :members: .. autoclass:: pysam.VariantRecordSamples :members: Tabix files =========== :class:`~pysam.TabixFile` opens tabular files that have been indexed with tabix_. .. autoclass:: pysam.TabixFile :members: To iterate over tabix files, use :func:`~pysam.tabix_iterator`: .. autofunction:: pysam.tabix_iterator .. autofunction:: pysam.tabix_compress .. autofunction:: pysam.tabix_index .. autoclass:: pysam.asTuple :members: .. autoclass:: pysam.asVCF :members: .. autoclass:: pysam.asBed :members: .. autoclass:: pysam.asGTF :members: FASTA files =========== .. autoclass:: pysam.FastaFile :members: FASTQ files =========== .. autoclass:: pysam.FastxFile :members: .. autoclass:: pysam.FastqProxy :members: HTSFile ======= HTSFile is the base class for :class:`pysam.AlignmentFile` and :class:`pysam.VariantFile`. .. autoclass:: pysam.HTSFile :members: