cshl_fastx_collapser

cshl_fastx_collapser collapses identical sequences in FASTA files to reduce redundancy and facilitate efficient processing of data from next-generation DNA sequencing (NGS).


Key Features:

  • Sequence Collapsing: Identifies and collapses identical sequences within FASTA files into single representative entries while preserving sequence information.
  • Galaxy Integration: Integrates with the Galaxy Project for execution within Galaxy workflows and platforms.
  • Provenance Tracking: Tracks computational steps and records processing provenance within the Galaxy ecosystem to support reproducibility.

Scientific Applications:

  • Genomics and Transcriptomics: Reduces dataset size in genomics and transcriptomics studies by collapsing identical sequences.
  • Data Reduction: Streamlines large-scale sequence datasets to simplify downstream analyses.
  • Resource Optimization: Decreases computational load and storage requirements for high-throughput sequencing datasets.
  • Improved Downstream Analysis: Focuses downstream bioinformatic analyses on unique sequences, so duplicate reads are not processed repeatedly.

Methodology:

The tool parses FASTA files, identifies duplicate sequences, and collapses each set of duplicates into a single entry. Execution and provenance recording are handled by the Galaxy platform.
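
The collapsing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's Perl implementation: the function names (read_fasta, collapse, format_collapsed) are hypothetical, sequences are compared as exact strings, and the ">rank-count" output header is an assumption modeled on the FASTX-Toolkit naming convention.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count identical sequences; return (sequence, count), most abundant first."""
    counts = Counter(seq for _, seq in records)
    # Ties are broken alphabetically so the output order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_collapsed(collapsed: List[Tuple[str, int]]) -> str:
    """Write one FASTA entry per unique sequence, named '>rank-count' (assumed format)."""
    return "".join(
        f">{rank}-{count}\n{seq}\n"
        for rank, (seq, count) in enumerate(collapsed, start=1)
    )


# Hypothetical input: four reads, three of which carry the same sequence.
example = ">r1\nACGT\n>r2\nTTGA\n>r3\nACGT\n>r4\nAC\nGT\n"
result = format_collapsed(collapse(read_fasta(example.splitlines())))
print(result, end="")
```

In this sketch the original read identifiers are discarded and only the sequence and its multiplicity are kept, which is where the reduction in dataset size comes from.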

Details

Maturity: Mature
Tool Type: web application
Operating Systems: Linux, Windows, Mac
Programming Languages: Perl
Added: 12/19/2016
Last Updated: 11/24/2024

Operations

Sequence merging

Publications

Afgan E, Baker D, van den Beek M, Blankenberg D, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Eberhard C, Grüning B, Guerler A, Hillman-Jackson J, Von Kuster G, Rasche E, Soranzo N, Turaga N, Taylor J, Nekrutenko A, Goecks J. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Research. 2016;44(W1):W3-W10. doi:10.1093/nar/gkw343. PMID:27137889. PMCID:PMC4987906.

Mareuil F, Doppelt-Azeroual O, Ménager H. A public Galaxy platform at Pasteur used as an execution engine for web services. F1000Research. 2017. doi:10.7490/f1000research.1114334.1.

Related Tools

fastx-toolkit
Relation: includedIn