Minimap2

Minimap2 aligns DNA sequences, long spliced mRNA/cDNA reads, and assembly contigs to large reference genomes and databases. It maps accurate short reads, noisy long reads, full-length transcripts, and very long genomic contigs.


Key Features:

  • General-purpose alignment: Maps DNA or long mRNA/cDNA sequences against large reference genomes and databases.
  • Supported sequence types: Handles accurate short reads of ≥100 base pairs (bp), genomic reads of ≥1 kilobase (kb) at an error rate of ~15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or chromosomes spanning hundreds of megabases (Mb).
  • Ultra-long read support: Supports ultra-long reads exceeding 100 kilobases and genomic contigs exceeding 100 megabases.
  • Spliced alignment: Performs spliced nucleotide sequence alignment for mRNA/cDNA mapping.
  • Split-read alignment strategy: Employs split-read alignment to effectively handle long insertions and deletions.
  • Concave gap costs: Uses concave gap cost functions to model long indels.
  • Spurious-alignment filtering: Introduces heuristics that reduce spurious alignments.
  • Performance: Reported to be faster than mainstream short-read mappers at comparable accuracy, and at least 30× faster than other long-read genomic or cDNA mappers while achieving higher accuracy.
  • Scalability: Designed to process very large datasets arising from ultra-long-read sequencing and large-contig assemblies.
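
As an illustration of how one tool covers the sequence types listed above, the following minimal sketch builds a minimap2 command line for each data type. The preset names (sr, map-ont, map-pb, splice, asm5) are values of minimap2's -x option; the data-type labels, the build_command helper, and the file names are hypothetical and chosen only for this example. The sketch assembles arguments and does not run the aligner.

```python
import shlex

# Values of minimap2's -x preset option. The keys on the left are
# hypothetical labels invented for this example.
PRESETS = {
    "short_reads": "sr",
    "nanopore_genomic": "map-ont",
    "pacbio_genomic": "map-pb",
    "spliced_long_reads": "splice",
    "assembly_contigs": "asm5",
}


def build_command(data_type, reference, queries, sam_output=True):
    """Return an argv list for minimap2 without executing anything."""
    if data_type not in PRESETS:
        raise ValueError(f"unknown data type: {data_type!r}")
    argv = ["minimap2"]
    if sam_output:
        argv.append("-a")  # -a requests SAM output; the default is PAF
    argv += ["-x", PRESETS[data_type], reference, *queries]
    return argv


if __name__ == "__main__":
    # File names are placeholders.
    cmd = build_command("nanopore_genomic", "ref.fa", ["reads.fq"])
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a system where minimap2 is installed. Preset availability varies between minimap2 releases, so the names should be checked against the installed version.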

Scientific Applications:

  • Reference mapping: Mapping DNA and long mRNA/cDNA reads to large reference genomes and databases.
  • Transcriptome alignment: Spliced alignment of full-length mRNA, cDNA, and Direct RNA reads for transcript mapping.
  • Long-read and assembly alignment: Aligning noisy long genomic reads (≥1 kb, ~15% error) and assembly contigs or chromosomes spanning hundreds of megabases.
  • Long indel analysis: Facilitating analysis of long insertions and deletions via split-read alignment and concave gap costs.
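
For the long indel analysis described above, a common first step is to find reads that minimap2 reported as more than one alignment record. The sketch below assumes the aligner's default PAF output, whose first 12 tab-separated columns are query name, query length, query start, query end, strand, target name, target length, target start, target end, matching bases, alignment block length, and mapping quality. The helper names and the example records are invented for illustration. A query with several records may be a split alignment or may carry secondary alignments; this sketch does not distinguish the two.

```python
from collections import defaultdict
from typing import NamedTuple


class PafRecord(NamedTuple):
    query: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line):
    """Parse the 12 mandatory PAF columns; optional tags are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF line needs at least 12 columns")
    return PafRecord(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                     int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                     int(f[10]), int(f[11]))


def multi_record_queries(lines):
    """Return queries with more than one record, sorted by query start."""
    by_query = defaultdict(list)
    for line in lines:
        if line.strip():
            rec = parse_paf_line(line)
            by_query[rec.query].append(rec)
    return {q: sorted(recs, key=lambda r: r.query_start)
            for q, recs in by_query.items() if len(recs) > 1}


# Invented records: read1 aligns in two pieces about 50 kb apart on chr1.
EXAMPLE_PAF = [
    "read1\t10000\t4000\t10000\t+\tchr1\t1000000\t59000\t65000\t5700\t6000\t60",
    "read1\t10000\t0\t4000\t+\tchr1\t1000000\t5000\t9000\t3800\t4000\t60",
    "read2\t5000\t0\t5000\t-\tchr2\t2000000\t100\t5100\t4800\t5000\t60",
]

if __name__ == "__main__":
    for query, recs in multi_record_queries(EXAMPLE_PAF).items():
        print(query, [(r.target, r.target_start, r.target_end) for r in recs])
```

Adjacent pieces of the same read that map far apart on the same target, as in the invented read1 records, are the kind of pattern that would then be inspected as a candidate long deletion.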

Methodology:

Minimap2 performs split-read alignment, scores long insertions and deletions with a concave gap cost, and applies heuristics to reduce spurious alignments.
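
To make the concave gap cost concrete, the sketch below uses a two-piece affine function, in which a gap of length k costs min(O1 + k*E1, O2 + k*E2). This is the form described in the minimap2 manual for the -O and -E options, and the default values used here (4,24 and 2,1) are taken from that manual as an assumption; they should be checked against the installed version. The function names are hypothetical.

```python
def gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Two-piece affine gap cost: the cheaper of two affine penalties.

    Defaults are assumed from minimap2's documented -O 4,24 and -E 2,1.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return min(open1 + length * ext1, open2 + length * ext2)


def single_affine_cost(length, gap_open=4, ext=2):
    """Ordinary affine gap cost, shown only for comparison."""
    return gap_open + length * ext


if __name__ == "__main__":
    for k in (1, 20, 100, 1000):
        print(k, gap_cost(k), single_affine_cost(k))
```

With these values the two pieces cross at a gap length of 20. Shorter gaps pay the first penalty, while longer gaps pay only 1 per extra base, so a 100-base gap costs 124 instead of the 204 a single affine function would charge. This is what lets long indels remain inside one alignment.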

Details

License: MIT
Tool Type: command-line tool
Operating Systems: Linux
Programming Languages: JavaScript, Python, C
Added: 11/29/2018
Last Updated: 12/10/2018

Publications

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094-3100. doi:10.1093/bioinformatics/bty191. PMID:29750242. PMCID:PMC6137996.

Funding: National Human Genome Research Institute (grant 1R01HG010040-01)
