Most of our work is based on our Java framework Gedi, a software platform for handling genomic data such as sequencing reads, sequences, per-base numeric values or annotations.
Its main features are:
- Comprehensive general purpose software library (Managing iterators, Serialization, Random access I/O, Parallelization, Extension system, JS based template engine, JSON, Strings, Arrays, …)
- Specialized algorithms and data structures for Bioinformatics (Clustering, maximum scoring subsequences, sequence alignment, string searching, suffix trees, tries, union find, range minimum queries, …) and Statistics (Descriptive statistics, inference, kernel methods, regression)
- Even more specialized algorithms and data structures for genomic data (Random-access FASTA files, memory-based interval trees, disk-based interval trees, space-efficient handling of aligned reads, annotation management, ID mapping, …)
- Graphical user interface (Genome browser, …)
GitHub page: https://github.com/erhard-lab/gedi
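To give a feel for the kind of algorithm listed above, here is a minimal, self-contained Java sketch of a maximum scoring segment search over per-base numeric values. It uses Kadane's algorithm to find the single best contiguous segment. The class and method names are hypothetical and chosen for illustration only; this is not Gedi's API and not necessarily the variant Gedi implements.

```java
// Illustrative sketch only: names are hypothetical and not part of Gedi.
class MaxScoringSegmentSketch {

    /** Half-open segment [start, end) together with its summed score. */
    static final class Segment {
        final int start;
        final int end;
        final double score;

        Segment(int start, int end, double score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /**
     * Finds the contiguous segment with the highest total score (Kadane's algorithm).
     * Returns the empty segment [0, 0) with score 0 if no segment has a positive score.
     */
    static Segment best(double[] scores) {
        Segment best = new Segment(0, 0, 0.0);
        double running = 0.0; // score of the current candidate segment
        int runStart = 0;     // start index of the current candidate segment
        for (int i = 0; i < scores.length; i++) {
            running += scores[i];
            if (running <= 0) {
                // A non-positive prefix can never improve a later segment: restart after i.
                running = 0.0;
                runStart = i + 1;
            } else if (running > best.score) {
                best = new Segment(runStart, i + 1, running);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up per-base scores, e.g. log-odds values along a sequence.
        double[] perBase = {-1, 2, 3, -4, 5, -1, 2, -6, 1};
        Segment s = best(perBase);
        System.out.println("[" + s.start + ", " + s.end + ") score=" + s.score);
    }
}
```

The scan is linear in the number of positions and needs only constant extra memory, which is why this style of algorithm suits long per-base tracks.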
PRICE (Probabilistic inference of codon activities by an EM algorithm) is a method for identifying ORFs from Ribo-seq experiments. It is embedded in a pipeline for data analysis.
Project wiki: https://github.com/erhard-lab/gedi/wiki/Price
Software download: https://github.com/erhard-lab/gedi/releases
GRAND-SLAM (Globally refined analysis of newly transcribed RNA and decay rates using SLAM-seq) is a computational approach that infers, for each gene, the proportion of new and old RNA together with the corresponding posterior distribution from SLAM-seq experiments.
Project wiki: https://github.com/erhard-lab/gedi/wiki/GRAND-SLAM
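As a rough illustration of what inferring a proportion together with its posterior distribution can look like, the Java sketch below evaluates a grid posterior for the fraction of new RNA of one gene. Everything specific in it is an assumption made for illustration: the two-component binomial mixture over per-read conversion counts, the fixed conversion rates `pNew` and `pErr`, the uniform prior, and all class and method names. It is not GRAND-SLAM's code or interface.

```java
// Illustrative sketch only: the model, the rates and all names are assumptions,
// not GRAND-SLAM's implementation.
class NewRnaProportionSketch {

    /**
     * Binomial likelihood of k conversions among n convertible positions, without
     * the binomial coefficient. The coefficient is identical for both mixture
     * components, so it cancels when the posterior is normalized.
     */
    static double kernel(int k, int n, double p) {
        return Math.pow(p, k) * Math.pow(1 - p, n - k);
    }

    /**
     * Grid approximation of the posterior of the new-RNA proportion under a uniform prior.
     *
     * @param reads    one row per read: {observed conversions k, convertible positions n}
     * @param pNew     assumed conversion rate in new RNA, strictly between 0 and 1
     * @param pErr     assumed background (error) rate in old RNA, strictly between 0 and 1
     * @param gridSize number of grid cells on [0, 1]
     * @return posterior probability mass per grid cell (sums to 1)
     */
    static double[] posterior(int[][] reads, double pNew, double pErr, int gridSize) {
        double[] logPost = new double[gridSize];
        double max = Double.NEGATIVE_INFINITY;
        for (int g = 0; g < gridSize; g++) {
            double pi = (g + 0.5) / gridSize; // cell midpoint, never exactly 0 or 1
            double logLik = 0.0;
            for (int[] r : reads) {
                logLik += Math.log(pi * kernel(r[0], r[1], pNew)
                        + (1 - pi) * kernel(r[0], r[1], pErr));
            }
            logPost[g] = logLik;
            max = Math.max(max, logLik);
        }
        // Subtract the maximum before exponentiating to avoid numerical underflow.
        double[] post = new double[gridSize];
        double sum = 0.0;
        for (int g = 0; g < gridSize; g++) {
            post[g] = Math.exp(logPost[g] - max);
            sum += post[g];
        }
        for (int g = 0; g < gridSize; g++) {
            post[g] /= sum;
        }
        return post;
    }

    /** Posterior mean of the proportion on the same midpoint grid. */
    static double mean(double[] post) {
        double m = 0.0;
        for (int g = 0; g < post.length; g++) {
            m += ((g + 0.5) / post.length) * post[g];
        }
        return m;
    }

    public static void main(String[] args) {
        // Made-up reads of one gene: {conversions, convertible positions}.
        int[][] reads = {{3, 20}, {0, 22}, {2, 18}, {0, 25}};
        double[] post = posterior(reads, 0.05, 0.001, 200);
        System.out.println("posterior mean of new RNA proportion: " + mean(post));
    }
}
```

Returning the whole grid rather than a single point estimate keeps the uncertainty visible: a gene with few reads yields a broad posterior, while a gene with many reads yields a sharply peaked one.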
Since the genome of herpes simplex virus 1 (HSV-1) was first sequenced more than 30 years ago, its predicted 80 genes have been intensively studied. Here, we unravel the complete viral transcriptome and translatome during lytic infection with base-pair resolution by computational integration of multi-omics data. We identified a total of 201 viral transcripts and 284 open reading frames (ORFs) including all known and 46 novel large ORFs. Multiple transcript isoforms expressed from individual gene loci explain translation of the vast majority of novel viral ORFs as well as N-terminal extensions (NTEs) and truncations thereof. We show that key viral regulators and structural proteins possess NTEs, which initiate from non-canonical start codons and govern subcellular protein localization and packaging. We validated a novel non-canonical large spliced ORF in the ICP0 locus and identified a 93 aa ORF overlapping ICP34.5 that is thus also deleted in the FDA-approved oncolytic virus Imlygic. Finally, we extend the current nomenclature to include all novel viral gene products.
To make the annotation and all the obtained data readily accessible to the research community, we provide our HSV-1 genome browser software. With it, viral gene expression and all data can be examined visually, from whole-genome down to single-nucleotide resolution.