BUSCO
BUSCO assesses the completeness of genome assemblies, annotated gene sets, and transcriptomes by searching them for near-universal single-copy orthologs, yielding biologically meaningful completeness and redundancy metrics.
Key Features:
- Comprehensive Assessment: Evaluates genome assemblies, annotated gene sets, assembled transcriptomes, and metagenome-assembled genomes for both eukaryotes and prokaryotes.
- Biologically Meaningful Metrics: Uses near-universal single-copy ortholog datasets from OrthoDB to report completeness and redundancy by counting how many expected orthologs are present, duplicated, fragmented, or missing. These metrics complement contiguity measures such as N50.
- Flexible and Extensible Software: Implemented in Python, with a codebase refactored for flexibility and extensibility; supports batch analysis and novel workflows for high-throughput assessments.
- Automatic Lineage Workflow: Automatically selects an appropriate lineage dataset when not specified, facilitating analysis of metagenome-assembled genomes of unknown origin.
- Enhanced Data Sets: Expanded OrthoDB-derived datasets improve species sampling and resolution across vertebrates, arthropods, fungi, prokaryotes, nematodes, protists, and plants.
- Phylogenomic Applications: Produces gene sets usable for construction of phylogenomic trees and visualization of syntenies.
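As an illustration of the manual and automatic lineage workflows listed above, the following minimal Python sketch assembles a BUSCO command line without running it. The flags (`-i`, `-o`, `-m`, `-l`, `-c`, `--auto-lineage`) are those of the BUSCO v4/v5 command-line interface as best understood here and should be checked against `busco --help` for the installed version. The helper name `build_busco_command` and all file and output names are hypothetical; `bacteria_odb10` is used only as an example lineage dataset.

```python
from typing import List, Optional

# Analysis modes accepted by BUSCO's -m option.
SUPPORTED_MODES = {"genome", "transcriptome", "proteins"}


def build_busco_command(
    input_path: str,
    output_name: str,
    mode: str,
    lineage: Optional[str] = None,
    cpus: int = 1,
) -> List[str]:
    """Build (but do not execute) the argument list for one BUSCO run.

    Hypothetical helper: if no lineage is given, fall back to automatic
    lineage selection instead of a manually specified dataset.
    """
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["busco", "-i", input_path, "-o", output_name, "-m", mode, "-c", str(cpus)]
    if lineage is not None:
        cmd += ["-l", lineage]          # manual dataset specification
    else:
        cmd.append("--auto-lineage")    # let BUSCO choose the dataset
    return cmd


# Example inputs are made up for illustration.
manual = build_busco_command("assembly.fna", "assembly_busco", "genome",
                             lineage="bacteria_odb10", cpus=4)
automatic = build_busco_command("mag_bin_01.fna", "mag_bin_01_busco", "genome")
print(" ".join(manual))
print(" ".join(automatic))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where BUSCO is installed; building the list separately keeps the choice between manual and automatic lineage selection explicit and easy to test.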
Scientific Applications:
- Quality Control: Assess completeness and identify missing or fragmented orthologs in genome assemblies and gene sets.
- Comparative Genomics: Compare expected gene content across species using lineage-specific single-copy ortholog sets.
- Gene Predictor Training: Supply complete BUSCO gene models as training data for gene predictors, and evaluate the resulting gene sets for completeness and redundancy.
- Metagenomics: Assess metagenome-assembled genomes and transcriptomes derived from metagenomic datasets.
- Phylogenomics: Provide single-copy ortholog markers for phylogenomic inference and synteny analyses.
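For the phylogenomics use case, a common first step is to keep only the orthologs found as complete single copies in every species being compared. The sketch below assumes a tab-separated per-run table in the style of BUSCO's `full_table.tsv`, with `#` comment lines, the BUSCO id in the first column, and a status (`Complete`, `Duplicated`, `Fragmented`, or `Missing`) in the second; that layout is an assumption to verify against real output. The function names, species labels, and BUSCO ids are invented for illustration.

```python
from typing import Dict, List


def parse_full_table(text: str) -> Dict[str, str]:
    """Map each BUSCO id to its status, skipping blank and '#' comment lines."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Duplicated BUSCOs occupy several rows with the same status,
        # so overwriting the entry is harmless.
        statuses[fields[0]] = fields[1]
    return statuses


def shared_single_copy(tables: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the BUSCO ids classified as 'Complete' in every species."""
    id_sets = [
        {busco_id for busco_id, status in table.items() if status == "Complete"}
        for table in tables.values()
    ]
    return sorted(set.intersection(*id_sets)) if id_sets else []


# Toy tables with made-up ids and coordinates.
species_a = (
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\n"
    "1001at2\tComplete\tcontig_1\t10\t500\n"
    "1002at2\tDuplicated\tcontig_2\t5\t300\n"
    "1002at2\tDuplicated\tcontig_3\t7\t310\n"
    "1003at2\tComplete\tcontig_4\t1\t900\n"
    "1004at2\tMissing\n"
)
species_b = (
    "1001at2\tComplete\tscaffold_9\t20\t520\n"
    "1002at2\tComplete\tscaffold_2\t3\t290\n"
    "1003at2\tFragmented\tscaffold_5\t1\t400\n"
    "1004at2\tComplete\tscaffold_7\t8\t610\n"
)

tables = {
    "species_a": parse_full_table(species_a),
    "species_b": parse_full_table(species_b),
}
markers = shared_single_copy(tables)
print(markers)
```

Only `1001at2` is complete and single-copy in both toy species, so it is the only marker retained. In practice the sequences of the retained markers would then be aligned per ortholog before tree inference.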
Methodology:
BUSCO compares input sequences against curated sets of near-universal single-copy orthologs from OrthoDB that are expected in a specified lineage. Each expected ortholog is classified as present (in single or multiple copies), fragmented, or absent, and the counts are summarized as completeness and redundancy scores. Lineage datasets can be specified manually or selected automatically, and multiple inputs can be processed in batch.
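The scoring step can be illustrated with a short calculation. The sketch below turns per-category counts into percentages and formats them in the compact C/S/D/F/M notation used in BUSCO summaries (Complete, Single-copy, Duplicated, Fragmented, Missing, with n the total number of orthologs searched). It is not BUSCO's own code; the function name `busco_summary` and the counts are made up for illustration.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format ortholog counts as a one-line completeness summary.

    Completeness (C) is the share of orthologs found complete, whether as a
    single copy (S) or duplicated (D); D is the redundancy signal.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no orthologs counted")

    def pct(count: int) -> float:
        return 100.0 * count / total

    return (
        f"C:{pct(single + duplicated):.1f}%"
        f"[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
    )


# Hypothetical counts for a dataset of 1000 expected orthologs.
summary = busco_summary(single=930, duplicated=20, fragmented=30, missing=20)
print(summary)  # C:95.0%[S:93.0%,D:2.0%],F:3.0%,M:2.0%,n:1000
```

A high D value relative to S suggests redundancy in the input (for example, uncollapsed haplotypes), while high F or M values point to fragmented or missing gene content.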
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Python
- Added: 10/10/2016
- Last Updated: 6/17/2025
Operations
- Sequence assembly validation
Publications
Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: Assessing Genomic Data Quality and Beyond. Current Protocols. 2021;1(12). doi:10.1002/cpz1.323. PMID:34936221.
Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, Kriventseva EV, Zdobnov EM. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Molecular Biology and Evolution. 2017;35(3):543-548. doi:10.1093/molbev/msx319. PMID:29220515. PMCID:PMC5850278.
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution. 2021;38(10):4647-4654. doi:10.1093/molbev/msab199. PMID:34320186. PMCID:PMC8476166.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-3212. doi:10.1093/bioinformatics/btv351. PMID:26059717.