Nymphaea colorata Assembly and Gene Annotation
About Nymphaea colorata
The blue-petal water lily Nymphaea colorata is a diploid (2n=28) with a relatively small genome (approximately 400 Mb) and is popular in breeding programs. Like other water lily species, it has large, showy flowers. Its flowers show limited differentiation of the perianth (outer floral organs), possess both male and female organs, and have diverse scents and colours, similar to many mesangiosperms (core angiosperms, including eudicots, monocots and magnoliids). Water lilies belong to the angiosperm order Nymphaeales, which together with Amborellales and Austrobaileyales forms the so-called ANA grade of angiosperms, representing the earliest lineages to diverge from the lineage leading to the extant mesangiosperms. This project was performed at Fujian Agriculture and Forestry University.
Assembly
Total DNA was extracted from young leaves of isolate Beijing-Zhang1983. A total of 49.8 Gb of data, comprising 5.5 million reads, was generated from 34 SMRT cells on the PacBio RSII platform. The contig-level assembly was built from the full set of PacBio long reads using Canu v.1.3. The draft assembly was polished first with Arrow and then with Illumina short reads using Pilon. Paired-end Hi-C reads were uniquely mapped onto the draft assembly contigs, which were then grouped into chromosomes and scaffolded using Lachesis. The final assembly contains 1,429 contigs (contig N50 of 2.1 Mb) with a total length of 409 Mb in 804 scaffolds, 770 of which were anchored onto 14 pseudo-chromosomes. Assembly quality was assessed using BUSCO v.3.0, yielding a genome completeness of 94.4%. Genomic collinearity analysis revealed evidence of a whole-genome duplication event.
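The contig N50 quoted above is the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the actual assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths for illustration only (not the real assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA index or a similar per-sequence length listing.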
Annotation
For Illumina transcriptome sequencing, several organs and tissues were sampled, including mature leaf, mature leafstalk, juvenile flower, juvenile leaf, juvenile leafstalk, carpel, stamen, sepal, petal and root. Genscan and Augustus were used to carry out de novo predictions with gene model parameters trained on Arabidopsis thaliana. Gene models were also predicted de novo using MAKER. The genes were then evaluated by comparing the MAKER results with the corresponding transcript evidence, and the gene models most consistent with that evidence were selected on the basis of the annotation edit distance (AED) metric.
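AED scores range from 0 (the model agrees fully with the supporting evidence) to 1 (no supporting evidence). The source does not describe the exact selection procedure or any cut-off, so the sketch below only illustrates one plausible approach: keep the lowest-AED model per locus and discard models above a threshold. The `GeneModel` type, the `best_models` function and the `max_aed=0.5` default are all hypothetical.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    locus: str      # hypothetical locus identifier
    model_id: str   # hypothetical gene model identifier
    aed: float      # 0.0 = full agreement with evidence, 1.0 = none


def best_models(models, max_aed=0.5):
    """Pick the lowest-AED model per locus, skipping models above max_aed.

    max_aed is an assumed cut-off, not a value reported for this genome.
    """
    best = {}
    for model in models:
        if model.aed > max_aed:
            continue
        current = best.get(model.locus)
        if current is None or model.aed < current.aed:
            best[model.locus] = model
    return best


# Made-up candidates for illustration only.
candidates = [
    GeneModel("locus1", "model_a", 0.30),
    GeneModel("locus1", "model_b", 0.10),
    GeneModel("locus2", "model_c", 0.80),
]
for locus, model in best_models(candidates).items():
    print(locus, model.model_id, model.aed)
```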
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 708,889 low complexity (Dust) features, covering 21 Mb (5.1% of the genome)
- 176,765 RepeatMasker features (with the REdat library), covering 51 Mb (12.5% of the genome)
- 15,309 RepeatMasker features (with the RepBase library), covering 3 Mb (0.6% of the genome)
- 216,157 tandem repeat (TRF) features, covering 18 Mb (4.5% of the genome)
- Repeat Detector repeats, covering 132 Mb (32.5% of the genome)
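Each percentage above is the covered length divided by the genome length. As a rough check, the sketch below recomputes two of them from the Mb values, using the Golden Path Length from the Statistics table as the denominator. The function name `coverage_percent` is an assumption; because the Mb figures are rounded, recomputed values for the smaller categories can differ slightly from the reported ones. The categories may also overlap, so the percentages should not simply be summed.

```python
GENOME_BP = 408_396_728  # Golden Path Length from the Statistics table


def coverage_percent(covered_bp, genome_bp=GENOME_BP):
    """Return the share of the genome covered, as a percentage."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * covered_bp / genome_bp


# Covered lengths in bp, taken from the rounded Mb values in the text.
reported_mb = {
    "Dust": 21,
    "RepeatMasker (REdat)": 51,
}
for name, mb in reported_mb.items():
    print(f"{name}: {coverage_percent(mb * 1_000_000):.1f}%")
```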
References
- The water lily genome and the early evolution of flowering plants.
Zhang L, Chen F, Zhang X, Li Z, Zhao Y, Lohaus R, Chang X, Dong W, Ho SYW, Liu X, Song A, Chen J, Guo W, Wang Z, Zhuang Y, Wang H, Chen X, Hu J, Liu Y, Qin Y, Wang K, Dong S, Liu Y, Zhang S, Yu X, Wu Q, Wang L, Yan X, Jiao Y, Kong H, Zhou X, Yu C, Chen Y, Li F, Wang J, Chen W, Chen X, Jia Q, Zhang C, Jiang Y, Zhang W, Liu G, Fu J, Chen F, Ma H, Van de Peer Y, Tang H.
Image credit: Kam Hong Leung, Carlos Magdalena & Kew Gardens
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | ASM883128v1, INSDC Assembly GCA_008831285.1 |
Database version | 113.1 |
Golden Path Length | 408,396,728 |
Genebuild by | FAFU |
Genebuild method | External annotation import |
Data source | Fujian Agriculture and Forestry University |
Gene counts
Coding genes | 28,438 |
Gene transcripts | 28,438 |