FGTED

FGTED extends the Graph Traversal Edit Distance (GTED) to quantify distances between heterogeneous string sets represented by genome graphs and to bound its deviation from the Earth Mover's Edit Distance (EMED).


Key Features:

  • Extension of GTED: Builds on the Graph Traversal Edit Distance (GTED) to better model distances between heterogeneous string sets.
  • Genome graph expressiveness: Accounts for the capacity of genome graphs to represent broader string sets than those present in a sample.
  • String set universe diameter: Defines a "string set universe diameter" metric for genome graphs to quantify their expressiveness.
  • Deviation bound to EMED: Uses the string set universe diameter to upper-bound the deviation between FGTED and Earth Mover's Edit Distance (EMED).
  • Empirical performance: Reduces deviation from true string set distances by over 250% in experiments on simulated T-cell receptor sequences and real Hepatitis B virus genomes.

Scientific Applications:

  • Intra-sample heterogeneity quantification: Quantifies distances between heterogeneous string sets to assess intra-sample sequence diversity.
  • T-cell receptor sequence analysis: Applied to simulated T-cell receptor sequences to evaluate and improve accuracy of string set distance estimates.
  • Viral genome comparison: Applied to real Hepatitis B virus genomes to reduce deviation from true string set distances.

Methodology:

FGTED extends GTED, computes a string set universe diameter for genome graphs, and uses that diameter to upper-bound the deviation between FGTED and Earth Mover's Edit Distance (EMED).
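
As a rough illustration of the quantity FGTED is compared against, the sketch below computes an earth mover's distance between two small string multisets, using unit-cost edit distance as the ground cost. It assumes EMED is defined this way over abundance-normalised string sets, and it does not implement GTED, FGTED, or the string set universe diameter. The function names (edit_distance, min_cost_flow, emed) are hypothetical and are not part of the FGTED tool; the solver is a plain successive-shortest-path min-cost flow, chosen only to keep the example free of external dependencies.

```python
from collections import Counter
from fractions import Fraction
from math import inf


def edit_distance(s, t):
    """Unit-cost Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # match or substitute
        prev = curr
    return prev[-1]


def min_cost_flow(n_nodes, edges, source, sink):
    """Cost of a maximum flow of minimum cost (successive shortest paths)."""
    graph = [[] for _ in range(n_nodes)]
    for u, v, cap, cost in edges:
        # Each entry: [target, residual capacity, cost, index of reverse edge].
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total_cost = 0
    while True:
        # Bellman-Ford, because residual edges can have negative cost.
        dist = [inf] * n_nodes
        dist[source] = 0
        parent = [None] * n_nodes
        for _ in range(n_nodes - 1):
            updated = False
            for u in range(n_nodes):
                if dist[u] == inf:
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, idx)
                        updated = True
            if not updated:
                break
        if dist[sink] == inf:
            return total_cost
        # Find the bottleneck along the path, then push that much flow.
        push, v = inf, sink
        while v != source:
            u, idx = parent[v]
            push = min(push, graph[u][idx][1])
            v = u
        v = sink
        while v != source:
            u, idx = parent[v]
            graph[u][idx][1] -= push
            graph[v][graph[u][idx][3]][1] += push
            v = u
        total_cost += push * dist[sink]


def emed(strings_a, strings_b):
    """Earth mover's distance between two non-empty string multisets.

    Assumption: each multiset is normalised to total mass 1, so repeated
    strings act as abundances.
    """
    count_a, count_b = Counter(strings_a), Counter(strings_b)
    xs, ys = list(count_a), list(count_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    mass = total_a * total_b  # common integer total after rescaling
    source, sink = 0, len(xs) + len(ys) + 1
    edges = []
    for i, x in enumerate(xs, start=1):
        edges.append([source, i, count_a[x] * total_b, 0])
        for j, y in enumerate(ys, start=len(xs) + 1):
            edges.append([i, j, mass, edit_distance(x, y)])
    for j, y in enumerate(ys, start=len(xs) + 1):
        edges.append([j, sink, count_b[y] * total_a, 0])
    return Fraction(min_cost_flow(sink + 1, edges, source, sink), mass)


print(emed(["AAAA", "AAAT"], ["AAAA"]))  # prints 1/2
```

In the usage line, half of the first set's mass sits on AAAT, one edit away from AAAA, so the distance is 1/2. Exact fractions are used so that small examples can be checked by hand.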

Details

License: Not licensed
Cost: Free of charge
Tool Type: command-line tool
Operating Systems: Mac, Linux, Windows
Programming Languages: Python
Added: 9/16/2022
Last Updated: 9/16/2022

Publications

Qiu Y, Kingsford C. The effect of genome graph expressiveness on the discrepancy between genome graph distance and string set distance. Bioinformatics. 2022;38(Supplement_1):i404-i412. doi:10.1093/bioinformatics/btac264. PMID:35758819. PMCID:PMC9235494.

Funding:

  • Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative: GBMF4554
  • US National Institutes of Health: R01GM122935
  • US National Science Foundation: DBI-1937540