Helianthus annuus (HanXRQr1.0)

Helianthus annuus Assembly and Gene Annotation

About Helianthus annuus

The domesticated sunflower, Helianthus annuus, is a globally important oil crop that shows promise for adaptation to climate change because it maintains stable yields across a wide variety of environmental conditions, including drought. The large genome (3.6 Gb) consists of long and highly similar repeats, which makes assembly very challenging.

Assembly

The assembly of Helianthus annuus inbred line XRQ was performed by the International Consortium for Sunflower Genomics using 102-fold coverage of single-molecule real-time (SMRT) sequencing on the PacBio RS II platform. In total, 32.8 million subreads were generated, with a read N50 of 13.7 kb and a mean read length of 10.3 kb. The 367 Gb of raw sequence (340 Gb of subread data) was assembled into 3 Gb (80% of the estimated genome size) in 13,957 contigs. Four high-density genetic maps were combined with a sequence-based physical map to build the sequences of the 17 pseudo-chromosomes, which anchor 97% of the gene content.
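The coverage and read-length figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that the 102-fold coverage refers to the 367 Gb of raw sequence divided by the 3.6 Gb genome size estimate, and that the mean read length is the subread total divided by the subread count. The function and constant names are hypothetical, not part of any pipeline used for this assembly.

```python
# Figures quoted in the assembly description (rounded as published).
RAW_BASES = 367e9        # raw sequence, in bases
SUBREAD_BASES = 340e9    # subread data, in bases
SUBREAD_COUNT = 32.8e6   # number of subreads
GENOME_SIZE = 3.6e9      # estimated genome size, in bases


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases: float, read_count: float) -> float:
    """Mean read length in bases: total bases divided by number of reads."""
    return total_bases / read_count


# Assumption: the quoted coverage is based on the raw sequence total.
coverage = fold_coverage(RAW_BASES, GENOME_SIZE)
mean_len_kb = mean_read_length(SUBREAD_BASES, SUBREAD_COUNT) / 1e3

print(f"coverage from raw sequence: {coverage:.0f}x")
print(f"mean subread length: {mean_len_kb:.1f} kb")
```

Because the inputs are rounded, the computed mean read length comes out slightly above the quoted 10.3 kb; the coverage rounds to the quoted 102-fold.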

The assembly was performed using WGS 8.3. Reads were first corrected using the PBcR wgs8.3rc1 assembly pipeline, and the assembly was polished with Quiver after the construction of the pseudomolecules. To overcome the challenges associated with the sunflower genome assembly, substantial parameter tuning, code modification and software development were required.

Annotation

Gene models were predicted using EuGene 4.2. The plant early release of BUSCO (release July 2015) was run on the set of predicted transcripts. It detected 92% of the benchmark genes as complete gene models (590 single-copy and 291 duplicated), plus 10 additional fragmented gene models.
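The BUSCO percentage can be reproduced from the counts given above. The following sketch is a hedged illustration: the total of 956 benchmark genes is an assumption about the size of the plant early-release set and is not stated on this page, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """BUSCO result counts for one benchmark run."""
    single_copy: int
    duplicated: int
    fragmented: int
    total: int

    @property
    def complete(self) -> int:
        # Complete genes are single-copy plus duplicated.
        return self.single_copy + self.duplicated

    @property
    def missing(self) -> int:
        # Whatever is neither complete nor fragmented is missing.
        return self.total - self.complete - self.fragmented

    def percent(self, count: int) -> float:
        return 100.0 * count / self.total


# Assumption: the plant early-release set contains 956 benchmark genes.
ASSUMED_TOTAL = 956

counts = BuscoCounts(single_copy=590, duplicated=291, fragmented=10,
                     total=ASSUMED_TOTAL)

print(f"complete: {counts.complete} ({counts.percent(counts.complete):.1f}%)")
print(f"fragmented: {counts.fragmented}")
print(f"missing (under the assumed total): {counts.missing}")
```

Under that assumption, the 881 complete genes correspond to roughly 92% of the benchmark set, in line with the figure quoted above.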

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 3,564,833 low-complexity (Dust) features, covering 238 Mb (7.9% of the genome); 1,722,940 RepeatMasker features (with the nrTEplants library), covering 2,131 Mb (70.4% of the genome); 1,612,551 Red repeat features, covering 2,136 Mb (70.5% of the genome); and 2,057,518 tandem repeat (TRF) features, covering 194 Mb (6.4% of the genome).
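The genome percentages above follow from dividing each coverage figure by the total assembly length. The sketch below assumes the denominator is the Golden Path Length reported in the Statistics section (3,027,844,945 bp); the dictionary and function names are hypothetical.

```python
# Assumption: percentages are relative to the Golden Path Length.
GOLDEN_PATH_LENGTH = 3_027_844_945  # bp

# Coverage per repeat annotation method, in megabases, as quoted above.
REPEAT_COVERAGE_MB = {
    "Dust": 238,
    "RepeatMasker": 2131,
    "Red": 2136,
    "TRF": 194,
}


def percent_of_genome(coverage_mb: float,
                      genome_length: int = GOLDEN_PATH_LENGTH) -> float:
    """Convert a coverage in Mb to a percentage of the genome length."""
    return 100.0 * coverage_mb * 1e6 / genome_length


repeat_percentages = {
    method: round(percent_of_genome(mb), 1)
    for method, mb in REPEAT_COVERAGE_MB.items()
}

for method, pct in repeat_percentages.items():
    print(f"{method}: {pct}% of the genome")
```

The four methods annotate the same sequence independently, so their regions overlap and the percentages should not be summed.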

Variation

A set of 11,834 SNPs called in three sunflower pre-breeding collections belonging to INTA (Argentina), INRA (France) and USDA-UBC (United States of America & Canada) was compiled to estimate the distribution pattern of global genetic diversity. A mixed genotyping strategy was implemented, combining proprietary genotyping-by-sequencing data with public whole-genome sequencing data from INRA and USDA-UBC accessions. The generated markers showed a uniform distribution across chromosomes, with the number of SNPs per chromosome in accordance with chromosome length. This variation data was produced in collaboration with Carla V Filippi, with funding from CABANA.

References

  1. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.
    Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B, Legrand L, Gill N, Kane NC, Bowers JE, Hubner S, Bellec A, Bérard A, Bergès H, Blanchet N, Boniface MC, Brunel D, Catrice O, Chaidir N, Claudel C, Donnadieu C, Faraut T, Fievet G, Helmstetter N, King M, Knapp SJ, Lai Z, Le Paslier MC, Lippi Y, Lorenzon L, Mandel JR, Marage G, Marchand G, Marquand E, Bret-Mestries E, Morien E, Nambeesan S, Nguyen T, Pegot-Espagnet P, Pouilly N, Raftis F, Sallet E, Schiex T, Thomas J, Vandecasteele C, Varès D, Vear F, Vautrin S, Crespi M, Mangin B, Burke JM, Salse J, Muños S, Vincourt P, Rieseberg LH, Langlade NB.
  2. Genetic Diversity, Population Structure and Linkage Disequilibrium Assessment among International Sunflower Breeding Collections.
    Filippi CV, Merino GA, Montecchia JF, Aguirre NC, Rivarola M, Naamati G, Fass MI, Álvarez D, Di Rienzo J, Heinz RA, Contreras Moreira B, Lia VV, Paniego NB.
  3. Population structure and genetic diversity characterization of a sunflower association mapping population using SSR and SNP markers.
    Filippi CV, Aguirre N, Rivas JG, Zubrzycki J, Puebla A, Cordes D, Moreno MV, Fusari CM, Alvarez D, Heinz RA, Hopp HE, Paniego NB, Lia VV.

Image credit: By i_am_jim (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: HanXRQr1.0, INSDC Assembly GCA_002127325.1, May 2017
Database version: 104.1
Golden Path Length: 3,027,844,945
Genebuild by: ICSG
Genebuild method: Import
Data source: International Consortium for Sunflower Genomics

Gene counts

Coding genes: 52,191
Non coding genes: 8,940
Small non coding genes: 8,473
Long non coding genes: 467
Pseudogenes: 51
Gene transcripts: 61,182
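
The gene counts above are internally consistent, which the short check below illustrates. It is a sketch only: the variable names are hypothetical, and the observation that the transcript count equals the sum of coding genes, non-coding genes and pseudogenes is an inference from the numbers, not something stated on this page.

```python
# Gene counts as listed in the summary above.
CODING_GENES = 52_191
NON_CODING_GENES = 8_940
SMALL_NON_CODING_GENES = 8_473
LONG_NON_CODING_GENES = 467
PSEUDOGENES = 51
GENE_TRANSCRIPTS = 61_182

# Small and long non-coding genes together make up the non-coding total.
non_coding_sum = SMALL_NON_CODING_GENES + LONG_NON_CODING_GENES

# All gene categories combined.
all_genes = CODING_GENES + NON_CODING_GENES + PSEUDOGENES

print(f"small + long non-coding: {non_coding_sum:,}")
print(f"coding + non-coding + pseudogenes: {all_genes:,}")
print(f"transcripts per gene: {GENE_TRANSCRIPTS / all_genes:.2f}")
```

The totals coincide with the transcript count, which would be consistent with one transcript per gene model in this import.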

Other

Short Variants: 11,834
