Asparagus officinalis Assembly and Gene Annotation
About Asparagus officinalis
Asparagus officinalis, commonly known as garden asparagus (2n = 2x = 20), belongs to a dioecious clade within an otherwise hermaphroditic genus. It is an emerging model system for understanding sex chromosome evolution. Sex in Asparagus is inherited through an XY system, with XX females and XY males. This genome corresponds to DH00/086, a doubled haploid YY supermale genotype generated through anther culture.
Assembly
Nearly 341 Gb of Illumina reads were generated for DH00/086, using libraries that ranged from short-insert paired-end to larger mate-pair. An initial assembly was produced using SOAPdenovo, gap-filled with GapCloser, and further scaffolded with SSPACE. To improve contiguity, 6.07 Gb of Pacific Biosciences long reads were produced for further gap-filling and scaffolding. To anchor the assembly onto pseudomolecules representing the 10 haploid chromosomes, a population of 74 doubled haploid individuals was resequenced at low depth to produce a genetic map. The genetic map data were used to identify chimeric contigs or scaffolds resulting from mis-assembly. The resulting scaffolds were then mapped onto the optical map contigs, and conflicting data were manually edited. Only 6.2% of the genome assembly could not be assigned to chromosomal locations in the linkage map.
Annotation
RNA-Seq reads from vegetative shoot, reproductive spear tip and root tissue were individually aligned to the genome using TopHat2 and processed with Cufflinks. De novo and genome-guided RNA-Seq assemblies were also produced using Trinity, and these transcriptomes were integrated with PASA. The Augustus ab initio gene finder was then used to identify gene models after training on the PASA database. To provide additional evidence for gene models from other annotated genomes, protein models from five plant species were aligned using Exonerate. Evidence from the RNA-Seq, ab initio and homology-based approaches was combined with EVidenceModeler to produce the final gene set.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,496,184 Low complexity (Dust) features, covering 90 Mb (7.6% of the genome); 384,068 RepeatMasker features (with the nrTEplants library), covering 164 Mb (13.8% of the genome); 1,013,371 Tandem repeats (TRF) features, covering 137 Mb (11.5% of the genome); and Repeat Detector repeats, covering 770.7 Mb (64.8% of the genome).
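The coverage percentages above can be related to the Golden Path Length given under Statistics. The following is a minimal sketch, assuming that each percentage is simply the covered length divided by the Golden Path Length and that 1 Mb = 1,000,000 bp; the names `GENOME_BP`, `covered_mb` and `coverage_percent` are illustrative only and are not part of any Ensembl tool. Because the covered lengths are rounded to the nearest Mb (or 0.1 Mb), recomputed values may differ from the reported ones in the last digit.

```python
# Golden Path Length from the Statistics summary, in base pairs.
GENOME_BP = 1_187_538_024

# Covered lengths in megabases, as reported in the repeat summary (rounded).
covered_mb = {
    "Low complexity (Dust)": 90.0,
    "RepeatMasker (nrTEplants)": 164.0,
    "Tandem repeats (TRF)": 137.0,
    "Repeat Detector": 770.7,
}


def coverage_percent(mb, genome_bp=GENOME_BP):
    """Return a covered length (in Mb) as a percentage of the genome length."""
    return 100.0 * mb * 1_000_000 / genome_bp


for name, mb in covered_mb.items():
    print(f"{name}: {coverage_percent(mb):.1f}% of the genome")
```

The categories come from different tools and may annotate the same bases, so the percentages should not be summed into a single repeat fraction.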
References
- The asparagus genome sheds light on the origin and evolution of a young Y chromosome.
Harkess A, Zhou J, Xu C, Bowers JE, Van der Hulst R, Ayyampalayam S, Mercati F, Riccardi P, McKain MR, Kakrana A, Tang H, Ray J, Groenendijk J, Arikit S, Mathioni SM, Nakano M, Shan H, Telgmann-Rauber A, Kanno A, Yue Z, Chen H, Li W, Chen Y, Xu X, Zhang Y, Luo S, Chen H, Gao J, Mao Z, Pires JC, Luo M, Kudrna D, Wing RA, Meyers BC, Yi K, Kong H, Lavrijsen P, Sunseri F, Falavigna A, Ye Y, Leebens-Mack JH, Chen G.
Picture credit: CSvBibra, public domain
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Aspof.V1, INSDC Assembly GCA_001876935.1 |
Database version | 113.1 |
Golden Path Length | 1,187,538,024 |
Genebuild by | University of Georgia |
Genebuild method | Import |
Data source | University of Georgia |
Gene counts
Coding genes | 24,141 |
Non coding genes | 535 |
Small non coding genes | 535 |
Gene transcripts | 24,676 |
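As a quick consistency check on the gene counts, the sketch below adds the coding and non-coding gene counts and compares the total with the transcript count. It assumes that "Small non coding genes" is a subset of "Non coding genes" and is therefore not added again; the variable names are illustrative.

```python
# Counts copied from the Gene counts table.
coding_genes = 24_141
non_coding_genes = 535  # assumed to already include the 535 small non-coding genes
gene_transcripts = 24_676

# Total annotated genes and the average number of transcripts per gene.
total_genes = coding_genes + non_coding_genes
transcripts_per_gene = gene_transcripts / total_genes

print(f"Total genes: {total_genes:,}")
print(f"Transcripts per gene: {transcripts_per_gene:.2f}")
```

Under that assumption, the transcript count equals the total gene count, which is consistent with one transcript per gene in this imported gene set.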