SOPRA
SOPRA uses mate-pair and paired-end short reads from high-throughput sequencing platforms (Illumina, SOLiD) to join contigs into scaffolds, selecting a consistent set of mate-pair constraints to improve de novo genome assembly.
Key Features:
- Input data: Accepts short reads from high-throughput sequencing platforms such as Illumina and SOLiD, from both mate-pair and paired-end libraries.
- Mate-pair constraint selection: Selects an optimal subset of mate-pair constraints that are simultaneously satisfiable to balance scaffold size and quality.
- Contig connectivity graph optimization: Formulates scaffold assembly as an optimization problem with variables associated with vertices (contigs) and edges (mate-pair relationships) in a contig connectivity graph.
- Constraint weighting and filtering: Initially treats all constraints equally during optimization, which makes it easier to identify problematic constraints such as those arising from chimeric or repetitive contig connections.
- Iterative refinement: Iteratively solves the optimization and removes inconsistent constraints until a core set of consistent constraints remains.
- SOLiD color-space translation: Uses a dynamic programming approach to translate color-space assemblies from SOLiD data into base-space.
- Assembly quality metrics: Assesses assemblies using the no-match/mismatch error rate and various rearrangement error rates.
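This entry does not give the recurrence SOPRA uses for color-space translation. As a minimal sketch of how dynamic programming can turn color space into base space, the code below assumes the standard SOLiD di-base code (A=0, C=1, G=2, T=3, each color being the XOR of two adjacent base codes) and a hypothetical set of anchor bases (for example, a known first base). A Viterbi-style pass over four base states per position picks the base sequence with the fewest color disagreements, so a single miscalled color costs one penalty instead of corrupting every downstream base, as naive left-to-right decoding would. `decode_colors`, `anchors`, and both penalty weights are illustrative names, not part of SOPRA.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def encode_colors(seq: str) -> List[int]:
    """Di-base (SOLiD-style) encoding: each color is the XOR of adjacent base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(
    colors: List[int],
    anchors: Dict[int, str],  # hypothetical: position -> base known from other evidence
    color_penalty: float = 1.0,
    anchor_penalty: float = 10.0,
) -> Tuple[str, float]:
    """Viterbi-style DP over the 4 possible bases at each position.

    Returns the base sequence (len(colors) + 1 bases) minimizing
    color_penalty * (#colors disagreeing with their two bases)
    + anchor_penalty * (#anchor positions disagreeing with the chosen base).
    """

    def base_cost(pos: int, b: int) -> float:
        want = anchors.get(pos)
        return 0.0 if want is None or CODE[want] == b else anchor_penalty

    # cost[b] = cheapest prefix ending in base b at the current position
    cost = [base_cost(0, b) for b in range(4)]
    back: List[List[int]] = []  # back[i][b] = best base at position i given base b at i+1
    for i, c in enumerate(colors):
        new_cost, ptr = [], []
        for b in range(4):
            # pick the predecessor minimizing prefix cost + transition (color) cost
            best_a = min(range(4), key=lambda a: cost[a] + color_penalty * ((a ^ b) != c))
            new_cost.append(
                cost[best_a] + color_penalty * ((best_a ^ b) != c) + base_cost(i + 1, b)
            )
            ptr.append(best_a)
        cost = new_cost
        back.append(ptr)

    # trace back from the cheapest final base
    b = min(range(4), key=lambda x: cost[x])
    total = cost[b]
    path = [b]
    for ptr in reversed(back):
        b = ptr[b]
        path.append(b)
    return "".join(BASES[x] for x in reversed(path)), total
```

Decoding `encode_colors("ACGTTGCA")` with the anchor `{0: "A"}` recovers the original sequence at cost 0. If one color is flipped and both ends are anchored, the decode has cost 1.0 and still respects both anchors; where the single error is placed along the read is a tie that extra evidence (such as quality values) would have to break.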
Scientific Applications:
- De novo genome scaffolding: Improves scaffold assembly for moderate-sized genomes using mate-pair spatial information to connect contigs.
- Bacterial genome assembly: Reported to assemble bacterial genomes into highly contiguous scaffolds (N50 up to 200 kb) while introducing few errors.
- Color-space sequence analysis: Processes SOLiD color-space data and converts results into base-space for downstream assembly evaluation.
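The continuity figure above is an N50. As a reminder of how that statistic is usually computed (the common definition, not code taken from SOPRA), the helper below sorts scaffold lengths in decreasing order and returns the length at which the running total first reaches half of the total assembly size.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least
    half of the total assembly length (0 for an empty assembly)."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding total / 2
            return length
    return 0
```

For scaffolds of 100, 200, 300, and 400 bp (1,000 bp in total), the running sum first reaches 500 bp at the 300 bp scaffold, so the N50 is 300.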
Methodology:
Scaffold assembly is formulated as an optimization problem on a contig connectivity graph, with variables associated with its vertices (contigs) and edges (mate-pair relationships). SOPRA selects an optimal subset of simultaneously satisfiable mate-pair constraints: it first treats all constraints equally, then iteratively solves the optimization and removes problematic (chimeric or repetitive) constraints until a consistent core set remains. For SOLiD data, dynamic programming translates color-space assemblies into base-space, and the resulting assemblies are evaluated using no-match/mismatch and rearrangement error rates.
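This entry does not spell out SOPRA's objective function. As a hypothetical illustration of the solve-and-prune loop only, the sketch below assumes contig orientations are already fixed, models each mate-pair constraint as an expected gap between contig start positions, gives every constraint equal weight, solves for positions by least squares (Gauss-Seidel sweeps), and repeatedly drops the constraint with the largest residual. `solve_positions`, `filter_constraints`, and the `max_residual` threshold are illustrative names, not SOPRA's.

```python
from typing import List, Optional, Tuple

# Hypothetical constraint (i, j, gap): contig j is expected to start `gap` bp after contig i.
Edge = Tuple[int, int, float]


def solve_positions(n: int, edges: List[Edge], sweeps: int = 500) -> List[float]:
    """Least-squares start positions minimizing sum((x[j] - x[i] - gap) ** 2).

    Contig 0 is pinned at 0 to remove the translation freedom. Each Gauss-Seidel
    sweep sets x[k] to the mean position its incident constraints ask for, which
    is the exact minimizer of the objective in x[k] alone. Assumes no self-loops.
    """
    x = [0.0] * n
    for _ in range(sweeps):
        for k in range(1, n):
            targets = []
            for i, j, gap in edges:
                if j == k:
                    targets.append(x[i] + gap)
                elif i == k:
                    targets.append(x[j] - gap)
            if targets:  # contigs with no remaining constraints stay where they are
                x[k] = sum(targets) / len(targets)
    return x


def filter_constraints(
    n: int, edges: List[Edge], max_residual: float = 500.0
) -> Tuple[List[Edge], List[float]]:
    """Re-solve, drop the single worst-fitting constraint, and repeat until all fit."""
    kept = list(edges)
    while True:
        x = solve_positions(n, kept)
        residuals = [abs(x[j] - x[i] - gap) for i, j, gap in kept]
        worst: Optional[int] = max(range(len(kept)), key=residuals.__getitem__, default=None)
        if worst is None or residuals[worst] <= max_residual:
            return kept, x
        kept.pop(worst)  # remove the most inconsistent constraint, then re-solve
```

Consider four contigs with consistent 1,000 bp gaps between neighbors (and 2,000 bp links that skip one contig), plus one chimeric link claiming contig 3 starts 9,000 bp after contig 0. The first solve spreads the conflict: four good constraints end up with residuals of 1,500 bp, but the chimeric link's residual is 3,000 bp, so it alone is removed and the remaining five constraints then fit exactly. This is why the sketch removes one constraint per iteration rather than every constraint above the threshold: a single bad link can push good links over the threshold too.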
Details
- Tool Type: command-line tool
- Operating Systems: Linux, Mac
- Programming Languages: Perl
- Added: 12/18/2017
- Last Updated: 1/17/2019
Publications
Dayarian A, Michael TP, Sengupta AM. SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics. 2010;11(1). doi:10.1186/1471-2105-11-345. PMID:20576136. PMCID:PMC2909219.