UMICollapse

UMICollapse is a software tool for efficiently deduplicating reads tagged with Unique Molecular Identifiers (UMIs) in large datasets. UMIs help detect PCR duplicates: reads that share the same alignment coordinates and the same UMI are treated as copies of one original molecule. Existing tools built on this idea often struggle with substitution errors in the UMI sequences or become computationally expensive on large datasets. UMICollapse presents a novel approach to UMI deduplication that accommodates optimized data structures. With these optimizations, the tool deduplicates over one million unique UMIs (of length 9) at a single alignment position in approximately 26 seconds, using a single thread and less than 10 GB of memory. This makes it faster and more resource-efficient than existing methods for UMI deduplication.
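To make the problem concrete, the following is a minimal Java sketch of a naive baseline, not UMICollapse's own algorithm or API. It collapses the UMIs observed at one alignment position by greedily merging each UMI into the most frequent representative within a given Hamming distance. The class name `NaiveUmiCollapse`, the method `collapse`, and the greedy most-frequent-first rule are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of UMICollapse.
public class NaiveUmiCollapse {

    // Number of mismatching positions between two equal-length UMIs.
    static int hamming(String a, String b) {
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    // Collapse the UMIs seen at ONE alignment position.
    // Input: UMI -> read count. Output: representative UMI -> summed read count.
    static Map<String, Integer> collapse(Map<String, Integer> counts, int maxEdits) {
        List<String> umis = new ArrayList<>(counts.keySet());
        // Visit the most frequent UMIs first; break ties alphabetically.
        umis.sort((x, y) -> {
            int c = Integer.compare(counts.get(y), counts.get(x));
            return c != 0 ? c : x.compareTo(y);
        });

        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String umi : umis) {
            String rep = null;
            // Linear scan over representatives: this is the quadratic
            // cost that becomes prohibitive for millions of UMIs.
            for (String r : groups.keySet()) {
                if (hamming(r, umi) <= maxEdits) {
                    rep = r;
                    break;
                }
            }
            if (rep == null) {
                groups.put(umi, counts.get(umi));   // new molecule
            } else {
                groups.merge(rep, counts.get(umi), Integer::sum); // likely error copy
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ACGTACGTA", 100);
        counts.put("ACGTACGTT", 2);   // one substitution away from the UMI above
        counts.put("TTTTGGGGC", 50);
        System.out.println(collapse(counts, 1));
        // prints {ACGTACGTA=102, TTTTGGGGC=50}
    }
}
```

Because every UMI is compared against every representative, this baseline scales quadratically in the worst case, which illustrates why large numbers of unique UMIs at a single position call for more specialized data structures.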

Topic

RNA-Seq;Gene expression

Detail

  • Operation: RNA-seq analysis

  • Software interface: Command-line user interface

  • Language: Java

  • License: MIT License

  • Cost: Free

  • Version name: v1.0.0

  • Credit: -

  • Input: -

  • Output: -

  • Contact: Daniel Liu daniel.liu02@gmail.com, Smriti Chawla smritic@iiitd.ac.in

  • Collection: -

  • Maturity: -

Publications

  • Algorithms for efficiently collapsing reads with Unique Molecular Identifiers.
  • Liu D. Algorithms for efficiently collapsing reads with Unique Molecular Identifiers. PeerJ. 2019; 7:e8275. doi: 10.7717/peerj.8275
  • https://doi.org/10.7717/peerj.8275
  • PMID: 31871845
  • PMC: 31871845

Download and documentation

