Helianthus annuus Assembly and Gene Annotation

About Helianthus annuus

The domesticated sunflower, Helianthus annuus, is a globally important oil crop that has promise for climate change adaptation, maintaining stable yields across a wide variety of environmental conditions, including drought. The large diploid genome (2n = 34; 3.6 Gb) consists of long and highly similar repeats, making the assembly very challenging.

Assembly

DNA of the INRA inbred genotype XRQ was extracted and sequenced using 407 SMRT cells with P6/C4 chemistry. Subreads were obtained using the SMRT Analysis RS.Subreads.1 pipeline. In total, 32.8 million subreads were generated with an N50 of 13.7 kb and a mean length of 10.3 kb. The targeted genome coverage of 102x was obtained with 367 Gb of raw sequence (340 Gb of subread data). Reads were corrected with the PBcR wgs8.3rc1 assembly pipeline, the corrected reads were assembled with WGS 8.3, and the consensus sequence was polished with Quiver after construction of the pseudomolecules, which required physical and genetic maps. The final assembly included 17 pseudomolecules and 1,509 unanchored contigs.
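
The coverage figure can be reproduced from the numbers quoted above. The following is a minimal sketch, assuming the 102x target was computed against the 3.6 Gb genome size estimate; the function and variable names are illustrative only and are not part of any pipeline named here.

```python
# Figures quoted in the Assembly text; the names are illustrative only.
GENOME_SIZE_GB = 3.6     # estimated genome size
RAW_SEQUENCE_GB = 367.0  # raw PacBio sequence
SUBREAD_GB = 340.0       # subread data


def fold_coverage(sequenced_gb: float, genome_gb: float) -> float:
    """Return sequencing depth: sequenced bases divided by genome size."""
    return sequenced_gb / genome_gb


raw_coverage = fold_coverage(RAW_SEQUENCE_GB, GENOME_SIZE_GB)
subread_coverage = fold_coverage(SUBREAD_GB, GENOME_SIZE_GB)

print(f"raw coverage:     {raw_coverage:.0f}x")
print(f"subread coverage: {subread_coverage:.0f}x")
```

Under that assumption, the raw sequence gives roughly 102x, matching the stated target, while the subread data alone corresponds to roughly 94x.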

Annotation

Gene models were predicted using EuGene 4.2. The plant early release of BUSCO (release July 2015) was run on the set of predicted transcripts. It detected 92% of the reference genes as complete gene models (590 single-copy and 291 duplicated), plus 10 additional fragmented gene models.
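
The BUSCO counts above can be cross-checked with a little arithmetic. This sketch uses only the figures in the text; because the size of the reference set is not stated here, it derives the range of set sizes that would round to the reported 92% rather than assuming a value. Variable names are illustrative.

```python
# BUSCO counts quoted in the Annotation text.
complete_single = 590
complete_duplicated = 291
fragmented = 10

# Total number of complete gene models.
complete = complete_single + complete_duplicated

# Share of the complete models that were found in more than one copy.
duplicated_share = complete_duplicated / complete

# The reported 92% is rounded, so the true value lies in [91.5%, 92.5%).
# That bounds the size of the reference set implied by the text.
implied_total_low = complete / 0.925
implied_total_high = complete / 0.915

print(f"complete gene models: {complete}")
print(f"duplicated share of complete: {duplicated_share:.1%}")
print(f"implied reference set size: {implied_total_low:.0f} to {implied_total_high:.0f}")
```

About a third of the complete models are duplicated, and the reported percentage implies a reference set of somewhat under a thousand genes.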

Repeated sequences called with the Repeat Detector, which is part of the [Ensembl Genomes repeat feature pipelines](http://plants.ensembl.org/info/genome/annotation/repeat_features.html), cover 2.25 Gb (74.8% of the genome). Low complexity (Dust) features cover 196 Mb, RepeatMasker features (with the nrTEplants library) cover 1.94 Gb, and Tandem repeats (TRF) features cover 143 Mb.
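
The repeat percentages can be checked against the assembly length. The sketch below assumes the 74.8% figure is relative to the Golden Path Length listed in the Statistics summary (3,009,595,538 bp) rather than the 3.6 Gb genome size estimate; the variable names are illustrative.

```python
# Golden Path Length from the Statistics summary (assumed denominator).
GOLDEN_PATH_BP = 3_009_595_538

# Total repeat coverage and per-tool coverage quoted in the text.
TOTAL_REPEAT_BP = 2.25e9
features_mb = {
    "Dust": 196,
    "RepeatMasker (nrTEplants)": 1940,
    "TRF": 143,
}

# Fraction of the assembly covered by repeats.
repeat_fraction = TOTAL_REPEAT_BP / GOLDEN_PATH_BP

# The per-tool figures sum to more than the total because some
# bases are annotated by more than one tool.
summed_mb = sum(features_mb.values())
excess_mb = summed_mb - TOTAL_REPEAT_BP / 1e6

print(f"repeat fraction: {repeat_fraction:.1%}")
print(f"sum of per-tool coverage: {summed_mb} Mb")
print(f"excess over total: {excess_mb:.0f} Mb")
```

With that denominator the fraction rounds to 74.8%, in agreement with the text. The per-tool figures exceed the total by about 29 Mb, which suggests overlapping annotations, although the inputs are rounded and the exact overlap cannot be recovered from them.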

Variation

A set of 11,761 SNPs called in three sunflower pre-breeding collections belonging to INTA (Argentina), INRA (France) and USDA-UBC (United States of America & Canada) was compiled to estimate the distribution pattern of global genetic diversity. A mixed genotyping strategy was implemented, combining proprietary genotyping-by-sequencing data with public whole-genome-sequencing data from INRA and USDA-UBC accessions. The generated markers were uniformly distributed across chromosomes, with the number of SNPs per chromosome in proportion to chromosome length. This variation data was produced in collaboration with Carla V Filippi, with funding from CABANA.

References

The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.
Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B, Legrand L, Gill N, Kane NC, Bowers JE, Hubner S, Bellec A, Bérard A, Bergès H, Blanchet N, Boniface MC, Brunel D, Catrice O, Chaidir N, Claudel C, Donnadieu C, Faraut T, Fievet G, Helmstetter N, King M, Knapp SJ, Lai Z, Le Paslier MC, Lippi Y, Lorenzon L, Mandel JR, Marage G, Marchand G, Marquand E, Bret-Mestries E, Morien E, Nambeesan S, Nguyen T, Pegot-Espagnet P, Pouilly N, Raftis F, Sallet E, Schiex T, Thomas J, Vandecasteele C, Varès D, Vear F, Vautrin S, Crespi M, Mangin B, Burke JM, Salse J, Muños S, Vincourt P, Rieseberg LH, Langlade NB.
Genetic Diversity, Population Structure and Linkage Disequilibrium Assessment among International Sunflower Breeding Collections.
Filippi CV, Merino GA, Montecchia JF, Aguirre NC, Rivarola M, Naamati G, Fass MI, Álvarez D, Di Rienzo J, Heinz RA, Contreras Moreira B, Lia VV, Paniego NB.
Population structure and genetic diversity characterization of a sunflower association mapping population using SSR and SNP markers.
Filippi CV, Aguirre N, Rivas JG, Zubrzycki J, Puebla A, Cordes D, Moreno MV, Fusari CM, Alvarez D, Heinz RA, Hopp HE, Paniego NB, Lia VV.

Image credit: By i_am_jim (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

Statistics

Summary

Assembly	HanXRQr2.0-SUNRISE, INSDC Assembly GCA_002127325.2, Jul 2020
Database version	114.2
Golden Path Length	3,009,595,538
Genebuild by	Heliagene
Genebuild method	External annotation import
Data source	International Consortium for Sunflower Genomics

Gene counts

Coding genes	70,864
Non coding genes	12,313
Small non coding genes	12,313
Gene transcripts	83,177

Other

Short Variants	11,671
