seqkit

seqkit is a cross-platform command-line toolkit for processing and manipulating FASTA and FASTQ files that contain nucleotide or protein sequences.

Key Features:

Format and sequence support: Operates on FASTA and FASTQ formats and supports nucleotide and protein sequences.
Conversion: Converts between FASTA and FASTQ formats.
Search and filtering: Searches sequences and filters entries based on specified criteria.
Deduplication: Identifies and removes redundant sequence entries.
File partitioning: Splits large sequence files into smaller segments.
Randomization and sampling: Shuffles sequences for randomization and samples subsets of sequences.
Performance optimization: Designed to keep execution time and memory usage low on large datasets; the accompanying publication describes the toolkit as "ultrafast".
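
To make the conversion, filtering, and deduplication features above concrete, the following is a minimal Python sketch of the underlying ideas. It is not seqkit's code and does not call seqkit; the function names (parse_fastq, filter_by_pattern, remove_duplicates, to_fasta) are hypothetical, and the sketch assumes simple four-line FASTQ records and case-insensitive, sequence-based deduplication.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence); hypothetical record type


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence) from four-line FASTQ records, dropping qualities."""
    buf: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not buf:
            continue  # skip blank lines between records
        buf.append(line)
        if len(buf) == 4:
            header, seq, plus, _qual = buf
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            # The identifier is the first whitespace-delimited token of the header.
            yield header[1:].split()[0], seq
            buf = []


def filter_by_pattern(records: Iterable[Record], pattern: str) -> Iterator[Record]:
    """Keep records whose sequence matches a regular expression."""
    regex = re.compile(pattern)
    return (rec for rec in records if regex.search(rec[1]))


def remove_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for rid, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield rid, seq


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out: List[str] = []
    for rid, seq in records:
        out.append(f">{rid}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Because each step is a generator, the steps can be chained over a file handle without loading the whole dataset into memory, for example to_fasta(remove_duplicates(parse_fastq(handle))).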

Scientific Applications:

Dataset preparation: Prepares and formats FASTA/FASTQ datasets for downstream analyses.
Large-scale sequencing management: Processes and partitions large sequencing outputs for scalable analysis workflows.
Variant calling workflows: Performs preliminary processing steps required before variant calling.
Metagenomics studies: Filters, samples, and partitions metagenomic sequence datasets for taxonomic or functional analysis.
Comparative genomics: Prepares sequence collections for alignment, clustering, or comparative analyses.

Methodology:

Uses optimized algorithms to perform sequence file manipulations such as searching, filtering, deduplication, splitting, shuffling, and sampling on FASTA and FASTQ files.
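
The splitting, shuffling, and sampling operations mentioned above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than seqkit's implementation: the helper names (shuffle_records, sample_records, split_by_size) are hypothetical, records are held in memory as (identifier, sequence) tuples, and a fixed seed is used so that results are reproducible.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (identifier, sequence); hypothetical record type


def shuffle_records(records: Sequence[Record], seed: int) -> List[Record]:
    """Return a new list with the records in a seeded random order."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


def sample_records(records: Sequence[Record], n: int, seed: int) -> List[Record]:
    """Draw up to n records without replacement, using a seeded generator."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(n, len(records)))


def split_by_size(records: Sequence[Record], chunk_size: int) -> List[List[Record]]:
    """Partition records into consecutive chunks of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]
```

Passing the same seed twice yields the same shuffle or sample, which helps when a subset has to be regenerated later in a workflow.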

Topics

Database management, Sequence analysis

Details

Added: 2/15/2021
Last Updated: 11/24/2024

Operations

Data Inputs & Outputs

DNA transcription

Inputs

Sequence
- xlsx

Outputs

Sequence alignment
- HTML

Publications

Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE. 2016;11(10):e0163962. doi:10.1371/journal.pone.0163962. PMID:27706213. PMCID:PMC5051824.

Funding: National Natural Science Foundation of China (grants 31570173, 81373133)