GRIT
GRIT integrates short-read RNA sequencing (RNA-seq) with gene-boundary datasets to discover and quantify full-length transcripts for improved genome annotation.
Key Features:
- Integration of gene-boundary datasets: Integrates cap analysis of gene expression (CAGE) and poly(A)-site-seq with short-read RNA-seq to define transcript boundaries.
- Full-length transcript discovery and quantification: Assembles and quantifies full-length transcripts from short-read RNA-seq data.
- Improved accuracy: Achieves approximately 30% higher precision and recall than the most widely used transcript assembly tools.
- Empirical validation in Drosophila melanogaster: In modENCODE RNA-seq data, recovered nearly all previously annotated transcripts and doubled the total number of identified transcripts.
- Transcript isoform characterization: Detects isoform-level features, including alternative splicing, alternative promoters, and multiple polyadenylation sites. In the Drosophila analysis, genes with multiple polyadenylation sites were reported at higher prevalence in adult fly heads.
- Protein-localization signal detection: Identified that 20% of protein-coding genes encode multiple protein-localization signals in the Drosophila analysis.
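The precision and recall comparison mentioned above can be made concrete with a small sketch. It assumes that a predicted transcript counts as correct only when its intron chain exactly matches a reference transcript. That matching rule, the function name `precision_recall`, and the toy coordinates are illustrative assumptions, not the evaluation protocol of the GRIT publication.

```python
from typing import Set, Tuple

# An intron is a (start, end) pair; a transcript is summarized by its intron chain.
Intron = Tuple[int, int]
IntronChain = Tuple[Intron, ...]


def precision_recall(
    predicted: Set[IntronChain], reference: Set[IntronChain]
) -> Tuple[float, float]:
    """Return (precision, recall) under exact intron-chain matching (an assumption)."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration only.
reference = {((100, 200), (300, 400)), ((100, 200),), ((500, 600),)}
predicted = {((100, 200), (300, 400)), ((500, 600),), ((700, 800),), ((900, 950),)}

p, r = precision_recall(predicted, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four predicted transcripts match the reference (precision 0.50), and two of the three reference transcripts are recovered (recall 0.67).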
Scientific Applications:
- Genome annotation: Generation of high-quality, comprehensive genome annotations through full-length transcript models.
- Functional annotation: Identification of isoform-specific features such as protein-localization signals and polyadenylation sites for functional inference.
- Gene expression studies: Quantification of transcript-level expression from short-read RNA-seq data.
- Comparative genomics: Expansion and comparison of transcript catalogs across datasets, exemplified by the modENCODE Drosophila analysis.
Methodology:
GRIT integrates short-read RNA-seq with gene-boundary datasets (CAGE and poly(A)-site-seq) to assemble and quantify full-length transcripts. The associated publication compares its precision and recall against existing transcript assembly tools.
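As a rough illustration of why boundary data helps, the sketch below enumerates candidate transcripts on a toy plus-strand gene. It keeps only exon paths that begin in an exon overlapping a CAGE-supported start site and end in an exon overlapping a poly(A) site. This is a simplified teaching example, not GRIT's algorithm or API; `candidate_transcripts`, the data layout, and all coordinates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, plus strand


def candidate_transcripts(
    exons: Sequence[Exon],
    junctions: Dict[Exon, List[Exon]],
    tss_sites: Sequence[int],
    polya_sites: Sequence[int],
) -> List[List[Exon]]:
    """Enumerate exon paths bounded by a start site and a poly(A) site.

    `junctions` maps each exon to the downstream exons it is spliced to,
    so the graph is acyclic and the recursion terminates.
    """
    starts = [e for e in exons if any(e[0] <= s <= e[1] for s in tss_sites)]
    ends = {e for e in exons if any(e[0] <= s <= e[1] for s in polya_sites)}
    paths: List[List[Exon]] = []

    def walk(path: List[Exon]) -> None:
        last = path[-1]
        if last in ends:
            paths.append(list(path))  # a full-length candidate
        for nxt in junctions.get(last, []):
            walk(path + [nxt])

    for start in starts:
        walk([start])
    return paths


# Toy gene, invented for illustration only.
exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
junctions = {
    (100, 200): [(300, 400), (500, 600)],
    (300, 400): [(500, 600)],
    (500, 600): [(700, 800)],
}
paths = candidate_transcripts(exons, junctions, tss_sites=[120], polya_sites=[580, 790])
for path in paths:
    print(path)
```

Without the boundary constraints, any sub-path through the splice graph would be a plausible fragment; with them, only paths anchored at observed transcript ends remain as full-length candidates.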
Details
- Tool Type: command-line tool
- Operating Systems: Linux
- Programming Languages: Python
- Added: 8/3/2017
- Last Updated: 11/25/2024
Operations
- Sequence assembly (no inputs or outputs defined)
Publications
Boley N, Stoiber MH, Booth BW, Wan KH, Hoskins RA, Bickel PJ, Celniker SE, Brown JB. Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nature Biotechnology. 2014;32(4):341-346. doi:10.1038/nbt.2850. PMID:24633242. PMCID:PMC4037530.