Helianthus annuus Assembly and Gene Annotation
About Helianthus annuus
The domesticated sunflower, Helianthus annuus, is a globally important oil crop that shows promise for climate change adaptation because it maintains stable yields across a wide variety of environmental conditions, including drought. The large diploid genome (2n = 34; 3.6 Gb) consists of long and highly similar repeats, which makes assembly very challenging.
Assembly
DNA of the INRA inbred genotype XRQ was extracted and sequenced using 407 SMRT cells with P6/C4 chemistry. Subreads were obtained using the SMRT Analysis RS.Subreads.1 pipeline. In total, 32.8 million subreads were generated with an N50 of 13.7 kb and a mean length of 10.3 kb. The targeted genome coverage of 102x was obtained with 367 Gb of raw sequence (340 Gb of subread data). Three tools were used in the assembly: the PBcR wgs8.3rc1 pipeline to correct the reads, WGS 8.3 to assemble the corrected reads, and Quiver to polish the consensus sequence. Polishing was performed after construction of the pseudomolecules, which required physical and genetic maps. The final assembly included 17 pseudomolecules and 1,509 unanchored contigs.
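The sequencing figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original assembly workflow: it assumes that fold coverage is sequenced bases divided by the 3.6 Gb genome size quoted earlier, and that total subread yield is approximately the subread count times the mean subread length. All variable and function names are illustrative.

```python
# Figures quoted in the Assembly paragraph and the species description.
GENOME_SIZE_GB = 3.6       # estimated genome size
RAW_SEQUENCE_GB = 367.0    # raw sequence
SUBREAD_DATA_GB = 340.0    # reported subread data
SUBREAD_COUNT = 32.8e6     # number of subreads
MEAN_SUBREAD_KB = 10.3     # mean subread length


def coverage(sequence_gb: float, genome_gb: float) -> float:
    """Fold coverage = sequenced bases / genome size (same units)."""
    return sequence_gb / genome_gb


def total_gb(read_count: float, mean_length_kb: float) -> float:
    """Total yield in Gb from a read count and a mean read length in kb."""
    return read_count * mean_length_kb * 1e3 / 1e9


raw_coverage = coverage(RAW_SEQUENCE_GB, GENOME_SIZE_GB)
subread_coverage = coverage(SUBREAD_DATA_GB, GENOME_SIZE_GB)
estimated_subread_gb = total_gb(SUBREAD_COUNT, MEAN_SUBREAD_KB)

print(f"raw coverage: {raw_coverage:.1f}x")
print(f"subread coverage: {subread_coverage:.1f}x")
print(f"subread yield from count x mean length: {estimated_subread_gb:.1f} Gb")
```

Under these assumptions the raw sequence corresponds to roughly 102x coverage, which matches the stated target, and the count-times-mean estimate lands close to the reported 340 Gb of subread data.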
Annotation
Gene models were predicted using EuGene 4.2. The plant early release of BUSCO (release July 2015) was run on the set of predicted transcripts. It detected 92% of the BUSCO gene models as complete (590 single-copy and 291 duplicated), plus 10 additional fragmented gene models.
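The BUSCO percentages can be reproduced from the counts above. This is a rough sketch: the total size of the BUSCO set is not stated in the text, so the value of 956 used below is an assumption about the plant early-release set, chosen because it is consistent with the reported 92%. Variable names are illustrative.

```python
# Counts quoted in the Annotation paragraph.
COMPLETE_SINGLE_COPY = 590
COMPLETE_DUPLICATED = 291
FRAGMENTED = 10

# Assumption: size of the BUSCO plant early-release set (not given in the text).
ASSUMED_TOTAL_BUSCOS = 956


def percent(count: int, total: int) -> float:
    """Share of the BUSCO set, as a percentage."""
    return 100.0 * count / total


complete = COMPLETE_SINGLE_COPY + COMPLETE_DUPLICATED
missing = ASSUMED_TOTAL_BUSCOS - complete - FRAGMENTED

print(f"complete:   {complete} ({percent(complete, ASSUMED_TOTAL_BUSCOS):.1f}%)")
print(f"fragmented: {FRAGMENTED} ({percent(FRAGMENTED, ASSUMED_TOTAL_BUSCOS):.1f}%)")
print(f"missing:    {missing} ({percent(missing, ASSUMED_TOTAL_BUSCOS):.1f}%)")
```

With the assumed set size, the 881 complete models (single-copy plus duplicated) correspond to about 92% of the set, in line with the figure reported above.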
Repeated sequences called with the Repeat Detector, which is part of the [Ensembl Genomes repeat feature pipelines](http://plants.ensembl.org/info/genome/annotation/repeat_features.html), cover 2.25 Gb (74.8% of the genome). Low complexity (Dust) features cover 196 Mb, RepeatMasker features (with the nrTEplants library) cover 1.94 Gb, and Tandem repeats (TRF) features cover 143 Mb.
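The repeat fraction can be checked against the Golden Path Length listed in the Statistics section of this page. The sketch below assumes that the 74.8% figure is the total repeat span divided by that length, and it uses the rounded values quoted above, so the results are approximate. Names are illustrative.

```python
# Golden Path Length from the Statistics table (bp).
GOLDEN_PATH_BP = 3_009_595_538

# Rounded repeat spans quoted in the text (bp).
REPEAT_TOTAL_BP = 2_250_000_000
FEATURES_BP = {
    "Dust (low complexity)": 196_000_000,
    "RepeatMasker (nrTEplants)": 1_940_000_000,
    "TRF (tandem repeats)": 143_000_000,
}

# Fraction of the assembly covered by any repeat feature.
repeat_percent = 100.0 * REPEAT_TOTAL_BP / GOLDEN_PATH_BP

# The per-feature spans sum to more than the total because the same
# bases can be annotated by more than one feature type.
summed_bp = sum(FEATURES_BP.values())
shared_bp = summed_bp - REPEAT_TOTAL_BP

print(f"repeat coverage: {repeat_percent:.1f}% of the assembly")
print(f"sum of feature spans: {summed_bp / 1e9:.3f} Gb")
print(f"excess over total (overlapping annotation): {shared_bp / 1e6:.0f} Mb")
```

The per-feature spans add up to slightly more than the 2.25 Gb total, which is expected when feature types overlap; given the rounding in the quoted figures, the size of that excess is only indicative.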
Variation
A set of 11,761 SNPs called in three sunflower pre-breeding collections belonging to INTA (Argentina), INRA (France) and USDA-UBC (United States of America & Canada) was compiled to estimate the distribution pattern of global genetic diversity. A mixed genotyping strategy was implemented, combining proprietary genotyping-by-sequencing data with public whole-genome-sequencing data from INRA and USDA-UBC accessions. The resulting markers were uniformly distributed across chromosomes, with the number of SNPs on each chromosome in accordance with its length. This variation data was produced in collaboration with Carla V Filippi with funding from CABANA.
References
- The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.
Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B, Legrand L, Gill N, Kane NC, Bowers JE, Hubner S, Bellec A, Bérard A, Bergès H, Blanchet N, Boniface MC, Brunel D, Catrice O, Chaidir N, Claudel C, Donnadieu C, Faraut T, Fievet G, Helmstetter N, King M, Knapp SJ, Lai Z, Le Paslier MC, Lippi Y, Lorenzon L, Mandel JR, Marage G, Marchand G, Marquand E, Bret-Mestries E, Morien E, Nambeesan S, Nguyen T, Pegot-Espagnet P, Pouilly N, Raftis F, Sallet E, Schiex T, Thomas J, Vandecasteele C, Varès D, Vear F, Vautrin S, Crespi M, Mangin B, Burke JM, Salse J, Muños S, Vincourt P, Rieseberg LH, Langlade NB.
- Genetic Diversity, Population Structure and Linkage Disequilibrium Assessment among International Sunflower Breeding Collections.
Filippi CV, Merino GA, Montecchia JF, Aguirre NC, Rivarola M, Naamati G, Fass MI, Álvarez D, Di Rienzo J, Heinz RA, Contreras Moreira B, Lia VV, Paniego NB.
- Population structure and genetic diversity characterization of a sunflower association mapping population using SSR and SNP markers.
Filippi CV, Aguirre N, Rivas JG, Zubrzycki J, Puebla A, Cordes D, Moreno MV, Fusari CM, Alvarez D, Heinz RA, Hopp HE, Paniego NB, Lia VV.
Image credit: By i_am_jim (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
Statistics
Summary
Assembly | HanXRQr2.0-SUNRISE, INSDC Assembly GCA_002127325.2, Jul 2020 |
Database version | 113.2 |
Golden Path Length | 3,009,595,538 |
Genebuild by | Heliagene |
Genebuild method | External annotation import |
Data source | International Consortium for Sunflower Genomics |
Gene counts
Coding genes | 70,864 |
Non coding genes | 12,313 |
Small non coding genes | 12,313 |
Gene transcripts | 83,177 |
Other
Short Variants | 11,671 |