Rosa chinensis Assembly and Gene Annotation

About Rosa chinensis

Rose is the world’s most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Roses appeared as decoration on 5,000-year-old Asian pottery, and the Romans cultivated roses for their flowers and essential oil. Roses are also used for scent production and for culinary purposes. This genome corresponds to a doubled haploid (2n=14) of Rosa chinensis derived from the Chinese variety ‘Old Blush’. ‘Old Blush’ (syn. Parsons’ Pink China) was brought from China to Europe and North America in the eighteenth century and is one of the most influential genotypes in the history of rose breeding.

Assembly

The homozygous doubled haploid was sequenced on the PacBio RS II platform. Sequencing coverage of 80x was obtained with 40 single-molecule real-time (SMRT) cells. A preliminary assembly of the rose data with a single assembler generated several hundred contigs, illustrating the challenge of assembling plant genomes even with long-read data. A key step in improving the contiguity of the assembly was the detection and filtering of spurious edges in the graph of overlaps. The assembler CANU implements filter parametrization at the read level, leading to more accurate and contiguous assemblies. For the same purpose, a tool called til-r was developed, which implements similar and alternative heuristics to clean the graph of overlaps of the FALCON assembler. CANU was then used to perform a meta-assembly of six complementary raw assemblies generated by CANU and FALCON/TIL-R. The final assembly comprised 82 contigs with an N50 of 24 Mb, a threefold improvement in contiguity metrics over a simple assembly, demonstrating the power of meta-assembly approaches.
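The N50 quoted above is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative assumptions, not the actual rose contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (hypothetical values, not the rose assembly).
toy_contigs = [40, 30, 20, 5, 3, 2]
print(n50(toy_contigs))  # prints 30
```

With these toy values the total is 100 Mb, and the two longest contigs (40 + 30) are the first to reach 50 Mb, so the N50 is 30 Mb.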

The seven pseudo-chromosomes were built by integrating 86.4% of the 25,695 markers of the K5 high-density genetic map. A large fraction of the assembly (97.7%, 503 Mb) was oriented, with Pearson's correlation coefficients ranging from 0.986 to 0.996, illustrating the high congruence between sequence and genetic data. The genome structure and quality were confirmed by mapping Hi-C chromosomal contact data. With very few remaining gaps and high consistency between genetic and sequence data, the rose genome assembly is one of the most contiguous obtained so far for a plant genome.
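As an illustration of how congruence between a genetic map and a sequence assembly can be quantified, the sketch below computes Pearson's correlation coefficient between marker positions on the genetic map and their positions on a pseudo-chromosome. It assumes the coefficient compares genetic position (cM) with physical position (bp) for each marker; the text above does not spell out the exact variables used, and the marker values here are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical markers on one pseudo-chromosome (not data from the K5 map):
genetic_cm = [0.0, 5.2, 11.8, 20.1, 33.5]                          # genetic map, cM
physical_bp = [100_000, 2_400_000, 6_100_000, 9_800_000, 17_300_000]  # assembly, bp

print(f"r = {pearson_r(genetic_cm, physical_bp):.3f}")
```

A coefficient close to 1 indicates that marker order and spacing along the sequence follow the genetic map; a value close to -1 would suggest that the sequence is oriented in the opposite direction to the map.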

Annotation

The genome encodes 36,377 inferred protein-coding genes and 3,971 long non-coding RNAs. Annotation assessment with the Plantae BUSCO v2 dataset identified 96.5% complete gene models. Based on transcriptomic data from pooled tissues, 207 miRNA precursors were predicted.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
744,291 Low complexity (Dust) features, covering 22 Mb (4.3% of the genome);
187,134 RepeatMasker features (with the REdat library), covering 64 Mb (12.4% of the genome);
8,522 RepeatMasker features (with the RepBase library), covering 1 Mb (0.2% of the genome);
312,755 Tandem repeats (TRF) features, covering 28 Mb (5.5% of the genome);
Repeat Detector repeats, with a total length of 247.9 Mb (48% of the genome).
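The percentages above can be approximately reproduced by dividing each repeat span by the genome length. The sketch below assumes the denominator is the Golden Path Length given in the Statistics section (515,588,973 bp); the function and dictionary names are illustrative. Because the spans are quoted in rounded megabases, a recomputed value may differ from the quoted one by about a tenth of a percent.

```python
# Golden Path Length from the Statistics section (assumed to be the denominator).
GOLDEN_PATH_LENGTH_BP = 515_588_973

# Repeat spans as quoted in the text, in rounded megabases.
REPEAT_SPANS_MB = {
    "Low complexity (Dust)": 22,
    "RepeatMasker (REdat)": 64,
    "RepeatMasker (RepBase)": 1,
    "Tandem repeats (TRF)": 28,
    "Repeat Detector": 247.9,
}


def coverage_percent(span_bp, genome_bp=GOLDEN_PATH_LENGTH_BP):
    """Percentage of the genome covered by a feature span given in base pairs."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * span_bp / genome_bp


for name, span_mb in REPEAT_SPANS_MB.items():
    print(f"{name}: {coverage_percent(span_mb * 1_000_000):.1f}% of the genome")
```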

References

The Rosa genome provides new insights into the domestication of modern roses.
Olivier Raymond, Jérôme Gouzy, Jérémy Just et al. 2018. Nature Genetics. 50(6):772-777.

Picture credit: Wikimedia commons

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	RchiOBHm-V2, INSDC Assembly GCA_002994745.2
Database version	115.1
Golden Path Length	515,588,973
Genebuild by	INRA/CNRS
Genebuild method	Import
Data source	INRA/CNRS

Gene counts

Coding genes	45,464
Non coding genes	4,969
Small non coding genes	4,969
Pseudogenes	84
Gene transcripts	50,517

Coding genes: a gene or transcript that contains an open reading frame (ORF).
Pseudogenes: a gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons being separated by introns. The exons and introns are transcribed and then the introns are spliced out. Transcripts may or may not encode a protein.
