Track hubs for all public RNA-Seq studies in plants
As part of a BBSRC-funded crop infrastructure project, we have established a pipeline for generating track hubs for all public RNA-Seq studies in the INSDC archives. The pipeline discovers and aligns reads from RNA-Seq studies across all plant species in Ensembl Plants. Alignments and their associated metadata are registered in the Track Hub Registry for search and discovery within the genome browser.
In detail, the pipeline uses the ENA's search APIs to discover all plant-specific RNA-Seq studies. The read data for each study is aligned to the appropriate reference genome in Ensembl Plants using the iRAP pipeline developed by the EMBL-EBI Gene Expression team. Within iRAP, quality-filtered reads are aligned using TopHat 2. The resulting BAM files are converted to CRAM format. Each set of CRAM files for a specific study, along with its associated sample and study metadata, is used to create a track hub that is registered in the Track Hub Registry.
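The discovery step can be illustrated with a minimal Python sketch. This is not the pipeline's actual code. It assumes the present-day ENA Portal API search endpoint, a query restricted to the Viridiplantae taxonomy subtree (NCBI taxon 33090), and a small set of returned fields; the function names `build_search_url` and `group_runs_by_study` are hypothetical. The sketch only constructs the query URL and groups a tab-separated result by study, since one track hub is created per study.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlencode

# Assumption: the ENA Portal API search endpoint; the original pipeline
# may have used a different ENA search service.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"

# NCBI taxonomy ID for Viridiplantae (green plants).
VIRIDIPLANTAE_TAXON_ID = 33090


def build_search_url(taxon_id=VIRIDIPLANTAE_TAXON_ID):
    """Build a search URL for RNA-Seq runs under a taxonomy subtree."""
    params = {
        "result": "read_run",
        "query": f'tax_tree({taxon_id}) AND library_strategy="RNA-Seq"',
        # Assumption: these fields are enough to group runs by study.
        "fields": "study_accession,run_accession,scientific_name,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


def group_runs_by_study(tsv_text):
    """Map each study accession to its run accessions (one hub per study)."""
    runs_by_study = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        runs_by_study[row["study_accession"]].append(row["run_accession"])
    return dict(runs_by_study)
```

Fetching the URL returned by `build_search_url()` (for example with `urllib.request.urlopen`) and passing the response text to `group_runs_by_study` would give the per-study run lists that the alignment and track hub creation steps then work through.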