Helianthus annuus Assembly and Gene Annotation

About Helianthus annuus

The domesticated sunflower, Helianthus annuus, is a globally important oil crop that has promise for climate change adaptation, maintaining stable yields across a wide variety of environmental conditions, including drought. The large genome (3.6 gigabases) consists of long and highly similar repeats, making the assembly very challenging.

Assembly

The assembly of Helianthus annuus inbred line XRQ used 102-fold coverage of single-molecule real-time (SMRT) sequence data generated on the PacBio RS II platform. In total, 32.8 million subreads were generated, with a read N50 of 13.7 kb and a mean read length of 10.3 kb. The 367 Gbp of raw sequence (340 Gbp of subread data) was assembled into 3 Gbp (80% of the estimated genome size) in 13,957 sequence contigs. Four high-density genetic maps were combined with a sequence-based physical map to build the sequences of the 17 pseudo-chromosomes, which anchor 97% of the gene content [1].
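As a rough illustration of the read statistics quoted above, the sketch below shows how an N50 value and a fold-coverage figure are commonly computed. The function names and the toy read lengths are hypothetical and chosen only for illustration; this is not the pipeline used for the sunflower assembly. The coverage call uses the 367 Gbp and 3.6 Gbp figures from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fold_coverage(sequenced_bp, genome_size_bp):
    """Return sequencing depth as total sequenced bases over genome size."""
    return sequenced_bp / genome_size_bp


# Hypothetical toy read lengths, not real sunflower data.
toy_reads = [2, 3, 4, 5, 6, 10]
print("toy N50:", n50(toy_reads))

# Figures from the text: 367 Gbp of raw sequence, 3.6 Gbp estimated genome size.
print("fold coverage:", round(fold_coverage(367e9, 3.6e9)))
```

With the figures from the text, the coverage calculation rounds to the 102-fold value reported for the assembly.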

The assembly was performed using WGS 8.3. Reads were first corrected using the PBcR wgs8.3rc1 assembly pipeline, and the assembly was polished with Quiver after the construction of the pseudomolecules. To overcome challenges associated with the sunflower genome assembly, substantial parameter tuning, code modification and software development were required [1].

Annotation

Gene models were predicted using EuGene 4.2. The plant early release of BUSCO (release July 2015) was run on the set of predicted transcripts and reported 92% of gene models as complete (590 complete single-copy and 291 complete duplicated), plus 10 additional fragmented gene models.
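The completeness percentage above can be reproduced from the counts with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the total number of BUSCO groups in the plant early release is not stated in this text, so the value of 956 used here is an assumption.

```python
def busco_complete_percent(single_copy, duplicated, total_groups):
    """Return the percentage of BUSCO groups found as complete gene models."""
    if total_groups <= 0:
        raise ValueError("total_groups must be positive")
    return 100.0 * (single_copy + duplicated) / total_groups


# ASSUMPTION: 956 groups in the plant early release set; not given in the text.
ASSUMED_TOTAL_GROUPS = 956

# Counts from the text: 590 complete single-copy, 291 complete duplicated.
percent = busco_complete_percent(590, 291, ASSUMED_TOTAL_GROUPS)
print(f"complete: {percent:.1f}%")
```

Under that assumed total, the result rounds to the 92% quoted in the text.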

References

  1. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.
    Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B et al. 2017. Nature. 546:148-152.

Picture credit: By i_am_jim (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: HanXRQr1.0, INSDC Assembly GCA_002127325.1, May 2017
Database version: 94.1
Base Pairs: 2,925,295,703
Golden Path Length: 3,027,844,945
Genebuild by: EnsemblPlants
Genebuild method: Imported from ENA
Data source: International Consortium for Sunflower Genomics

Gene counts

Coding genes: 52,191
Non coding genes: 8,940
Small non coding genes: 8,473
Long non coding genes: 467
Pseudogenes: 51
Gene transcripts: 61,182
