Shepherd
Shepherd is a software tool for correcting errors in DNA barcode sequences used for lineage tracking. DNA barcodes are random nucleotide sequences introduced into cell populations so that the relative counts of individual lineages can be monitored and traced over time. Barcode-based lineage tracking helps researchers study evolutionary dynamics in microbial populations and the progression of diseases such as breast cancer. However, errors introduced during next-generation sequencing produce reads that differ from the true barcodes, which makes it difficult to identify the actual barcode sequences accurately.
Key features of Shepherd:
1. Clustering Method: Shepherd approaches barcode error correction as a clustering problem. It aims to identify true barcode sequences from noisy sequencing data.
2. K-mer-Based Indexing: Shepherd indexes barcode sequences by their k-mers (short, fixed-length nucleotide subsequences). This index makes it efficient to find and group similar barcode sequences.
3. Bayesian Statistical Test: Shepherd applies a Bayesian statistical test that accounts for the substitution error rate to distinguish true barcode sequences from erroneous ones.
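The three ideas above can be illustrated with a small, hypothetical Python sketch. It is not Shepherd's implementation or API: the function names (`cluster`, `posterior_error`, `segments`) and parameters (`max_dist`, `err_rate`, `prior_err`, `prior_mean`) are illustrative assumptions. It assumes fixed-length barcodes and uniform substitution errors at rate `err_rate`. The k-mer index uses the pigeonhole principle: splitting each barcode into `max_dist + 1` non-overlapping k-mers guarantees that two barcodes within Hamming distance `max_dist` share at least one k-mer at the same position. The Bayesian step compares an "error copy" hypothesis, with count ~ Poisson(N·(e/3)^h·(1−e)^(L−h)), against a "true barcode" hypothesis that places an exponential prior on abundance.

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def segments(seq: str, n_seg: int) -> list[tuple[int, str]]:
    """Split seq into n_seg non-overlapping k-mers, each keyed by its start position."""
    cuts = [i * len(seq) // n_seg for i in range(n_seg + 1)]
    return [(cuts[i], seq[cuts[i]:cuts[i + 1]]) for i in range(n_seg)]


def posterior_error(count: int, parent_count: int, h: int, length: int,
                    err_rate: float, prior_err: float, prior_mean: float) -> float:
    """Hypothetical posterior probability that a sequence is an error copy of a parent."""
    # Probability that one read of the parent becomes this exact sequence:
    # h specific substitutions (each to one of 3 other bases) and no others.
    p_seq = (err_rate / 3) ** h * (1 - err_rate) ** (length - h)
    lam = parent_count * p_seq
    # Error hypothesis: count ~ Poisson(lam), evaluated in log space.
    log_err = count * math.log(lam) - lam - math.lgamma(count + 1)
    # True-barcode hypothesis: count ~ Poisson(mu), mu ~ Exponential(mean m).
    # Its marginal is geometric: P(c) = (1 / (m + 1)) * (m / (m + 1)) ** c.
    m = prior_mean
    log_true = count * math.log(m / (m + 1)) - math.log(m + 1)
    a = math.log(prior_err) + log_err
    b = math.log(1 - prior_err) + log_true
    top = max(a, b)  # subtract the max before exponentiating to avoid underflow
    return math.exp(a - top) / (math.exp(a - top) + math.exp(b - top))


def cluster(counts: dict[str, int], max_dist: int = 2, err_rate: float = 0.01,
            prior_err: float = 0.5, prior_mean: float = 10.0) -> dict[str, str]:
    """Map every barcode to the centroid (putative true barcode) it is assigned to."""
    n_seg = max_dist + 1  # pigeonhole: <= max_dist mismatches leave >= 1 k-mer intact
    index: dict[tuple[int, str], list[str]] = {}
    assignment: dict[str, str] = {}
    # Most abundant sequences first, so likely true barcodes become centroids early.
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        keys = segments(seq, n_seg)
        candidates = {c for key in keys for c in index.get(key, [])}
        best_post, best_parent = 0.5, None
        for cent in sorted(candidates):
            h = hamming(seq, cent)
            if h > max_dist:
                continue
            post = posterior_error(counts[seq], counts[cent], h, len(seq),
                                   err_rate, prior_err, prior_mean)
            if post > best_post:
                best_post, best_parent = post, cent
        if best_parent is not None:
            assignment[seq] = best_parent  # absorbed as an error copy
        else:
            assignment[seq] = seq  # kept as a new centroid and indexed by its k-mers
            for key in keys:
                index.setdefault(key, []).append(seq)
    return assignment


counts = {
    "ACGTACGTAC": 1000,  # abundant true barcode
    "ACGTACGTTC": 400,   # 1 mismatch away, but too abundant to be an error copy
    "TTGCATGCAA": 500,   # unrelated barcode
    "ACGTACGTAA": 3,     # 1 mismatch from ACGTACGTAC, low count
}
groups = cluster(counts)
print(groups["ACGTACGTAA"])  # ACGTACGTAC
print(groups["ACGTACGTTC"])  # ACGTACGTTC
```

In this toy example the low-count read `ACGTACGTAA` is merged into `ACGTACGTAC` (posterior about 0.77), while `ACGTACGTTC`, also one substitution away, is kept as a separate barcode because 400 reads is far more than a 1% substitution rate would produce from 1000 parent reads. Shepherd's actual index construction and test statistic are described in the publication listed below.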
Shepherd is implemented in Python and is freely available. By improving the accuracy of barcode error correction, it supports more reliable lineage tracking and downstream biological analyses.
Topic
Sequencing;Oncology;DNA;Genetic variation
Detail
Operation: DNA barcoding;k-mer counting;Clustering
Software interface: Command-line user interface
Language: Python
License: Not stated
Cost: Free
Version name: -
Credit: The Swedish Research Council, Knut and Alice Wallenberg Foundation, Wenner-Gren Foundations, PhD programme of the Faculty of Science, Stockholm University.
Input: -
Output: -
Contact: Chun-Biu Li cbli@math.su.se
Collection: -
Maturity: -
Publications
- Shepherd: accurate clustering for correcting DNA barcode errors.
- Tavakolian N, et al. Shepherd: accurate clustering for correcting DNA barcode errors. Bioinformatics. 2022; 38:3710-3716. doi: 10.1093/bioinformatics/btac395
- https://doi.org/10.1093/BIOINFORMATICS/BTAC395
- PMID: 35708611
- PMC: PMC9344852
Download and documentation
Documentation: https://github.com/Nik-Tavakolian/Shepherd#readme