BUSCO

BUSCO assesses the completeness of genome assemblies, annotated gene sets, and transcriptomes by searching them for near-universal single-copy orthologs, yielding biologically meaningful completeness and redundancy metrics.


Key Features:

  • Comprehensive Assessment: Evaluates genome assemblies, annotated gene sets, assembled transcriptomes, and metagenome-assembled genomes for both eukaryotes and prokaryotes.
  • Biologically Meaningful Metrics: Uses near-universal single-copy ortholog datasets from OrthoDB to report completeness and redundancy by quantifying presence, absence, and fragmentation of orthologs, complementing contiguity measures such as N50.
  • Flexible and Extensible Software: Implemented in Python and refactored to support batch analysis and new workflows for high-throughput assessments.
  • Automatic Lineage Workflow: Automatically selects an appropriate lineage dataset when not specified, facilitating analysis of metagenome-assembled genomes of unknown origin.
  • Enhanced Data Sets: Expanded OrthoDB-derived datasets improve species sampling and resolution across vertebrates, arthropods, fungi, prokaryotes, nematodes, protists, and plants.
  • Phylogenomic Applications: Produces gene sets usable for construction of phylogenomic trees and visualization of syntenies.
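
As an illustration of the manual and automatic lineage options listed above, the following minimal sketch assembles (but does not run) a BUSCO command line. It assumes the `-i`, `-o`, `-m`, `-l`, `-c`, and `--auto-lineage` options of recent BUSCO releases; check `busco --help` for the installed version. The helper name `build_busco_command`, the file names, and the output names are hypothetical, and `bacteria_odb10` is only one example of a lineage dataset name.

```python
import shlex

# Hypothetical helper: builds the argument list only, nothing is executed.
def build_busco_command(input_path, output_name, mode, lineage=None, cpus=1):
    """Return a BUSCO command as a list of arguments."""
    if mode not in {"genome", "proteins", "transcriptome"}:
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["busco", "-i", input_path, "-o", output_name, "-m", mode, "-c", str(cpus)]
    if lineage is None:
        # No dataset given: ask BUSCO to pick a lineage automatically.
        cmd.append("--auto-lineage")
    else:
        # Dataset given explicitly by the user.
        cmd += ["-l", lineage]
    return cmd

# Assumed example inputs; replace with real paths and dataset names.
manual = build_busco_command("assembly.fna", "busco_manual", "genome",
                             lineage="bacteria_odb10", cpus=4)
auto = build_busco_command("mag_bin_01.fna", "busco_auto", "genome")

print(shlex.join(manual))
print(shlex.join(auto))
```

The resulting list can be passed to `subprocess.run` once BUSCO is installed; keeping command construction separate from execution makes it easy to loop over many inputs.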

Scientific Applications:

  • Quality Control: Assess completeness and identify missing or fragmented orthologs in genome assemblies and gene sets.
  • Comparative Genomics: Compare expected gene content across species using lineage-specific single-copy ortholog sets.
  • Gene Predictor Training: Use the gene models identified during an assessment to build training sets for gene predictors, and evaluate the resulting gene sets for completeness and redundancy.
  • Metagenomics: Assess metagenome-assembled genomes and transcriptomes derived from metagenomic datasets.
  • Phylogenomics: Provide single-copy ortholog markers for phylogenomic inference and synteny analyses.
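
For the phylogenomics use case, one common approach is to keep only the orthologs found complete and single-copy in every species under study. The sketch below shows that selection on in-memory data. The function name, species names, and ortholog identifiers are hypothetical; the status labels (`Complete`, `Duplicated`, `Fragmented`, `Missing`) are assumed to match those reported per ortholog by BUSCO.

```python
def shared_single_copy_markers(status_by_species):
    """Return ortholog IDs that are 'Complete' (single-copy) in every species."""
    marker_sets = [
        {busco_id for busco_id, status in statuses.items() if status == "Complete"}
        for statuses in status_by_species.values()
    ]
    if not marker_sets:
        return []
    # Intersect across species so that only universally recovered markers remain.
    return sorted(set.intersection(*marker_sets))

# Hypothetical per-species results keyed by ortholog ID.
results = {
    "species_a": {"og1": "Complete", "og2": "Complete", "og3": "Duplicated", "og4": "Complete"},
    "species_b": {"og1": "Complete", "og2": "Fragmented", "og3": "Complete", "og4": "Complete"},
    "species_c": {"og1": "Complete", "og2": "Complete", "og3": "Complete", "og4": "Complete"},
}

markers = shared_single_copy_markers(results)
print(markers)
```

Requiring presence in every species is the strictest choice; a real analysis might instead tolerate a small fraction of missing species per marker.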

Methodology:

BUSCO compares input sequences against curated sets of near-universal single-copy orthologs (from OrthoDB) expected for a specified lineage. For each expected ortholog it records whether the gene is present, fragmented, or absent, and it derives completeness and redundancy scores from these counts. The lineage dataset can be specified manually or selected automatically, and multiple inputs can be processed in batch.
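
The scoring step can be sketched as simple arithmetic over the per-category counts. The example below assumes the usual BUSCO categories (complete single-copy, complete duplicated, fragmented, missing) and the one-line summary notation used in BUSCO reports; the class name, function name, and counts are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class BuscoCounts:
    single: int       # complete and single-copy
    duplicated: int   # complete but found more than once (redundancy)
    fragmented: int   # only partially recovered
    missing: int      # not found

    @property
    def total(self):
        return self.single + self.duplicated + self.fragmented + self.missing

def summarize(counts):
    """Format counts as percentages of all expected orthologs."""
    n = counts.total
    if n == 0:
        raise ValueError("no orthologs in dataset")

    def pct(k):
        return 100.0 * k / n

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(counts.single):.1f}%,"
        f"D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )

# Hypothetical counts for a dataset of 200 expected orthologs.
example = BuscoCounts(single=180, duplicated=10, fragmented=4, missing=6)
summary = summarize(example)
print(summary)  # C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:200
```

Here the complete fraction (C) is the completeness score, and the duplicated fraction (D) indicates redundancy.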

Details

Tool Type:
command-line tool
Operating Systems:
Linux
Programming Languages:
Python
Added:
10/10/2016
Last Updated:
6/17/2025

Publications

Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: Assessing Genomic Data Quality and Beyond. Current Protocols. 2021;1(12). doi:10.1002/cpz1.323. PMID:34936221.

Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, Kriventseva EV, Zdobnov EM. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Molecular Biology and Evolution. 2017;35(3):543-548. doi:10.1093/molbev/msx319. PMID:29220515. PMCID:PMC5850278.

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution. 2021;38(10):4647-4654. doi:10.1093/molbev/msab199. PMID:34320186. PMCID:PMC8476166.

Funding: Swiss National Science Foundation, grant 310030_189062

Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-3212. doi:10.1093/bioinformatics/btv351. PMID:26059717.
