Helianthus annuus Assembly and Gene Annotation
About Helianthus annuus
The domesticated sunflower, Helianthus annuus, is a globally important oil crop that shows promise for climate change adaptation because it maintains stable yields across a wide variety of environmental conditions, including drought. The large genome (3.6 gigabases) consists largely of long and highly similar repeats, which makes assembly very challenging.
The genome of Helianthus annuus inbred line XRQ was sequenced to 102-fold coverage using single-molecule real-time (SMRT) cells on the PacBio RS II platform. In total, 32.8 million subreads were generated, with a read N50 of 13.7 kb and a mean read length of 10.3 kb. The 367 Gbp of raw sequence (340 Gbp of subread data) was assembled into 3 Gbp (80% of the estimated genome size) in 13,957 sequence contigs. Four high-density genetic maps were combined with a sequence-based physical map to build the sequences of the 17 pseudo-chromosomes, which anchor 97% of the gene content.
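The read N50 and fold-coverage figures quoted above follow from simple arithmetic on read lengths and total bases. The following is a minimal sketch in Python, not the code used by the assembly authors; the helper names `n50` and `fold_coverage` and the toy read lengths are illustrative assumptions, while the 367 Gbp and 3.6 Gb values are taken from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(total_bases, genome_size):
    """Return sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size


# Toy read lengths (hypothetical, for illustration only).
toy_reads = [2, 3, 4, 5, 6]
print(n50(toy_reads))

# Raw sequence (367 Gbp) over the estimated genome size (3.6 Gb).
print(round(fold_coverage(367e9, 3.6e9)))
```

With the figures from the text, the second call rounds to the 102-fold coverage reported for the assembly.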
The assembly was performed using WGS 8.3. Reads were first corrected using the PBcR wgs8.3rc1 assembly pipeline, and the assembly was polished with Quiver after the pseudomolecules were constructed. To overcome the challenges of assembling the sunflower genome, substantial parameter tuning, code modification and software development were required.
Gene models were predicted using EuGene 4.2. The plant early release of BUSCO (release July 2015) was run on the set of predicted transcripts and detected 92% of the BUSCO genes as complete gene models (590 single-copy and 291 duplicated), plus 10 fragmented gene models.
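The BUSCO percentage above can be checked from the reported counts. The sketch below is illustrative only: the function name `busco_summary` is hypothetical, and the total of 956 BUSCO groups is an assumption about the plant early-release set that is not stated on this page.

```python
def busco_summary(single, duplicated, fragmented, total):
    """Summarise BUSCO counts; 'complete' is single-copy plus duplicated."""
    complete = single + duplicated
    missing = total - complete - fragmented
    return {
        "complete": complete,
        "complete_pct": 100.0 * complete / total,
        "fragmented": fragmented,
        "missing": missing,
    }


# ASSUMPTION: total number of BUSCO groups in the plant early-release set.
ASSUMED_TOTAL = 956

# Counts of single-copy, duplicated and fragmented models are from the text.
summary = busco_summary(590, 291, 10, ASSUMED_TOTAL)
print(summary["complete"], round(summary["complete_pct"]))
```

Under that assumed total, the 881 complete gene models correspond to roughly 92%, in line with the figure reported above.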
- The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.
Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B et al. 2017. Nature. 546:148-152.
Picture credit: By i_am_jim (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
General information about this species can be found in Wikipedia.
| Assembly | HanXRQr1.0, INSDC Assembly GCA_002127325.1, May 2017 |
| Golden Path Length | 3,027,844,945 bp |
| Genebuild method | Imported from ENA |
| Data source | International Consortium for Sunflower Genomics |
| Non coding genes | 8,940 |
| Small non coding genes | 8,473 |
| Long non coding genes | 467 |