40 Free Whole Genome Assembly (WGA) Analysis Tools - Software and Resources

Graph: Occurrences of the word "WGA" in scientific articles stored in PubMed from 1990 to June 2019.
Occurrences of the word "WGA" in scientific articles stored in PubMed from 2000 to December 2018.

  1. rampart
    • Description : Rampart is a tool for de novo genome assembly, implemented as a workflow management system that automatically identifies suitable assemblers for the user's sequence data. The workflow is configurable and helps users evaluate which assemblers and settings produce the best assembly according to selected assembly metrics.
  2. Kourami
    • Description : Kourami is a tool for assembling HLA haplotypes. It uses high coverage whole genome sequencing data and implements a graph-guided assembly method for classical HLA genes, which is capable of discovering new HLA alleles.
  3. Ray Meta
    • Description : Ray Meta is a tool for de novo assembly of metagenomes that uses distributed computing to assemble multiple genomes in parallel. It works with other tools in the Ray series, such as Ray Communities, which performs microbiome profiling. The program can assemble and profile numerous microbiomes in a computationally efficient manner.
  4. AutoSeqMan
    • Description : AutoSeqMan is a tool for assembling Sanger sequences into contigs for users working with the SeqMan program. While SeqMan is an excellent GUI tool, manually clicking through its functions is time-consuming when many sequences have to be assembled into contigs. AutoSeqMan uses the SeqMan scripting language and adds two modules to classify and assemble sequences.
  5. BioNanoAnalyst
    • Description : BioNanoAnalyst is a tool for evaluating potential mis-assemblies in reference genomes using optical maps. It is a cross-platform graphical user interface (GUI) program. It reports potentially mis-assembled regions in GFF3 format and provides a zoomed-in visualization of those genomic locations.
  6. dnaasm
    • Description : dnaasm is a tool for assembling tandem repeats. Its algorithm uses the relative frequency of reads to resolve tandem repeats and can restore tandem repeats that are longer than the sequencing read length. The software is available as console and web applications.
  7. SMRT
    • Description : SMRT is a tool for calling SNPs and assembling haplotypes from long PacBio reads. It is not a de novo assembly tool and should not be confused with the SMRT suite of tools offered by PacBio developers. The name refers to Single Molecule Real Time (SMRT) sequencing, and the paper describing the tool used PacBio reads. Essentially, the method uses the more error-prone long PacBio reads to call SNPs and haplotypes, whereas other methods may need to resort to more accurate short reads for SNP calling.
  8. RecoverY
    • Description : RecoverY is a tool for identifying Y chromosome specific reads for assembly of the Y chromosome. The approach is based on the k-mer abundance of reads and uses knowledge of known Y chromosome sequences from related species or transcripts. Results are reported for the assembly of the human and gorilla Y chromosomes, and users may try the approach on the Y chromosomes of other species.
  9. PASHA
    • Description : PASHA is a tool for assembling genomes from short reads using de Bruijn graphs; its main improvement is its code for distributed computing. It can assemble genomes in a short amount of time, provided users have access to high-performance computing.
  10. Kollector
    • Description : Kollector is a tool for assembling gene sequences that is based on the ABySS assembler and uses transcript sequences as baits to capture whole genome shotgun (WGS) reads. This way, the WGS reads used for assembly are specific to the genomic region of interest. The algorithm identifies k-mers from transcripts and seeds them into a progressive Bloom filter, which is used to gather gene sequences from among the WGS reads.
  11. runBNG
    • Description : runBNG is a wrapper script written in Bash to automate tasks for BioNano optical map data. It is a substitute for IrysView, which only works on Windows-based platforms. It performs optical map de novo assembly, super-scaffolding, and structural variant detection based on functions implemented in IrysView.
  12. drVM
    • Description : drVM is a tool for extracting known viral reads from metagenomics projects to automatically assemble their genomes. It is essentially a pipeline written in Python that integrates a few tools such as BLAST, SNAP, SPAdes, and khmer to reconstruct a variety of viral genomes among metagenomes. Additionally, it performs coverage profiling of the viruses.
  13. GRAbB
    • Description : GRAbB is a tool for assembling specific genomic loci by using these regions as baits to find corresponding reads (e.g. Illumina paired-end reads) prior to de novo assembly. It can handle assemblies of multiple loci simultaneously and is useful for assembling mitochondrial genomes, rDNA repeats, and other poorly assembled regions of the genome.
  14. npScarf
    • Description : npScarf is a tool for assembly scaffolding and gap filling, suitable for smaller genomes already assembled with short reads. It takes advantage of the streaming of Oxford Nanopore long-read sequencing results to continuously analyse how newly generated sequences improve the assembly by monitoring key metrics. Once sufficient contiguity, or another metric deemed suitable, has been achieved, long-read sequencing can be stopped, which saves time and money.
  15. MECAT
    • Description : MECAT is a tool for de novo assembly of long single-molecule sequencing reads (e.g. PacBio). The tool implements a pseudolinear alignment scoring algorithm that uses distance difference factors (DDFs) to score matched k-mer pairs and remove unnecessary alignments. Large genomes can be assembled on a single computer using this tool.
  16. SOMA
    • Description : SOMA is a tool for scaffolding short-read based contigs of bacterial genomes using optical maps. The method implemented is robust to sequencing and assembly errors. The program is available as a web application and an open-source package.
  17. CAR
    • Description : CAR is a tool to rearrange contigs based on a known reference sequence. The algorithm considers permutations of the contig groups and joins them to match the reference. It is implemented only as a web application, and only small prokaryotic genomes can be scaffolded this way.
  18. CUDA-EC
    • Description : CUDA-EC is a tool that parallelizes error correction of short reads by leveraging the power of GPUs. Reads corrected by this tool are ready for assembly. It implements a space-efficient Bloom filter data structure.
  19. HLAreporter
    • Description : HLAreporter is a tool that maps reads to a known reference panel of HLA alleles and then uses these reads for de novo assembly. The tool has reportedly outperformed similar tools such as HLAminer and PHLAT.
  20. LACHESIS
    • Description : LACHESIS is a tool to scaffold contigs based on Hi-C reads, which provide short- to long-range linkage information. It uses the contact probabilities of Hi-C reads to order and orientate contigs. Using this tool, it is possible to generate chromosome-level scaffolds.
  21. GenSeed
    • Description : GenSeed is a tool that allows for targeted assembly of specific sequences in the genome using reads relevant to the targets. This Perl program implements a recursive algorithm to find sequence similarity, select reads, and assemble them. The program should be useful for assembling particular nuclear genes, transcripts and extrachromosomal genomes.
  22. RACA
    • Description : RACA is a tool to order and orientate scaffolds from short-read based assemblies using reference genomes from closely related species. The tool takes advantage of the conservation of homologous sequences and has demonstrated good performance on simulated and real datasets. It is suitable for assemblies made from short reads when no linkage map is available.
  23. GAM-NGS
    • Description : GAM-NGS is a tool to merge two or more assemblies to improve assembly metrics, such as contiguity, beyond what is achievable with a single assembler. The merging process uses a weighted graph to optimally resolve problematic regions.
  24. ELOPER
    • Description : ELOPER is a tool to pre-process paired-end short reads for better performance during assembly. It implements an algorithm that detects overlaps between the two ends of paired-end reads and merges those with significant overlaps. Its performance is superior to that of assemblers that typically consider the two ends of each paired-end read separately for overlap detection. This tool does not perform the assembly step itself; it only prepares the paired-end reads for assembly.
  25. SR-ASM
    • Description : SR-ASM is a tool for assembling short reads from the 454 sequencing platform. The algorithm implemented is a heuristic method based on a graph model and takes advantage of the way 454 sequence output is presented.
  26. CANU
    • Description : CANU is a tool to assemble long reads from either PacBio or Oxford Nanopore, which have higher error rates than short reads from Illumina. The tool runs much faster than its predecessor, Celera Assembler, and implements new overlapping and assembly algorithms, such as an adaptive overlapping strategy and sparse assembly graph construction. It can also provide output in graphical fragment assembly (GFA) format.
  27. ABySS
    • Description : ABySS is a tool for de novo genome assembly using short read data. It implements a distributed representation of de Bruijn graphs, which enables parallel computation of the assembly algorithm. ABySS stands for Assembly By Short Sequences.
  28. Velvet
    • Description : Velvet is a tool for de novo assembly based on de Bruijn graphs, and it is suitable for short read data with high coverage. The algorithm manipulates de Bruijn graphs to remove sequencing errors and resolve repeats.
  29. BUSCO
    • Description : BUSCO is a tool to assess the completeness of genome assemblies, gene sets and transcriptomes. It is based on the concept of single-copy orthologs that should be highly conserved among closely related species. For example, users who wish to study the completeness of a mammalian genome assembly will use single-copy orthologs discovered among mammalian species.
  30. VGA
    • Description : VGA is a tool for assembling individual viral genomes from a sample that consists of diverse populations of viruses. It takes advantage of high sequencing depth to detect rare variants and requires a sequencing library with barcodes attached to the sequencing fragments.
  31. VirAmp
    • Description : VirAmp is a tool for assembling viral genomes using the Galaxy workflow, which lets users click through a variety of programs in a web interface and hence requires little programming experience to operate. Three assemblers (Velvet, SPAdes, and VICUNA) are used by default following installation. The program covers quality checking of raw reads, coverage reduction, de novo assembly, scaffolding, gap filling, and assembly metrics evaluation.
  32. Rainbow
    • Description : Rainbow is a tool to cluster and assemble short read sequences originating from restriction-site associated DNA sequencing (RAD-seq). The Rainbow algorithm discriminates repeats from heterozygous sequences by grouping the reads into haplotypes, creates a guide tree, and implements a greedy algorithm for contig assembly.
  33. HINGE
    • Description : HINGE is a tool for de novo genome assembly that addresses the challenge of using error-prone long reads. It combines the error tolerance of Overlap-Layout-Consensus assemblers with the repeat resolution of de Bruijn graph assemblers. Additionally, HINGE produces a visually interpretable assembly graph.
  34. Racon
    • Description : Racon is a standalone consensus building tool that can be coupled with a fast assembler such as miniasm, which performs de novo assembly with error-prone long reads without error correction. This dramatically cuts down the time needed for sequence assembly and consensus generation. Racon stands for Rapid Consensus and it can be used for PacBio and Oxford Nanopore data.
  35. HGAP
    • Description : HGAP is a tool for de novo genome assembly using PacBio reads. It implements a hierarchical assembly process that starts by using the longest reads as seed reads to gather all other reads and construct highly accurate preassembled reads. After this step, the preassembled reads can be assembled using the overlap-layout-consensus approach.
  36. FALCON
    • Description : FALCON is a tool for de novo assembly of long PacBio reads and it is an improved version of its predecessor HGAP. Unlike HGAP, it is a diploid-aware assembler that is better suited to assemble larger genomes. Users should look into FALCON-Unzip if they wish to phase the assembly as well.
  37. FALCON-Unzip
    • Description : FALCON-Unzip is a tool for de novo assembly of long PacBio reads and it is similar to FALCON except it has the ability to phase the assembly.
  38. miniasm
    • Description : miniasm is a tool for de novo assembly of long reads from either the PacBio or Oxford Nanopore platforms. It does not perform an error correction step. This tool is typically used in conjunction with minimap, which generates the all-vs-all read mappings used as input for the assembly.
  39. Kermit
    • Description : Kermit is a tool for using linkage maps to guide genome assembly. It simplifies assembly and reduces assembly errors for users with long-read based data for contig assembly. Linkage maps are often used to validate an assembly after contigs have been built but, in this tool, the maps are used to guide the assembly itself. It implements a coloured overlap graph strategy.
  40. poreTally
    • Description : poreTally is a tool to benchmark a few assemblers of Oxford Nanopore reads. It can run CANU, Flye, SMARTdenovo and wtdbg2 assembly pipelines and generates a report in article style.
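
Several entries above (for example PASHA, ABySS and Velvet) are described as building on de Bruijn graphs. As a rough illustration of that idea only, and not the implementation used by any listed tool, the Python sketch below builds a de Bruijn graph from a few toy reads and walks a non-branching path to recover a contig. The function names `build_de_bruijn` and `walk_unambiguous`, the toy reads and the choice of k = 4 are all assumptions made for this example. Real assemblers also handle reverse complements, sequencing errors, in-degree checks and repeats, which are ignored here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification (assumption of this sketch): only out-degree is checked,
    and the walk stops if it would revisit a node.
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Toy overlapping reads (hypothetical data) sampled from one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTGCA
```

In this toy case every (k-1)-mer occurs once, so the walk reconstructs the full sequence; with repeats or errors the graph would branch and the walk would stop early.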
