Citrullus lanatus (Cla97_v1)

Citrullus lanatus Assembly and Gene Annotation

About Citrullus lanatus

Citrullus lanatus (watermelon, 2n=2x=22) is one of the most popular fruit crops worldwide. It belongs to the Cucurbitaceae family and originated in Africa. Over more than 4,000 years of domestication and breeding, it has been transformed from wild watermelons with small fruits harboring hard, pale-colored, bitter- or bland-tasting flesh into modern sweet watermelons with large fruits, crisp, sweet, red flesh and a thin rind.

Assembly

Version 2 of the genome of the East Asian cultivar 97103 was assembled de novo using PacBio long reads, combined with BioNano optical maps and Hi-C chromatin interaction maps. A total of 20.3 Gb of PacBio sequence was generated, with a read N50 length of 10.8 kb, representing 47.2x coverage of the genome. The resulting assembly had a total size of 359.8 Mb in 367 contigs, with a contig N50 of 2.3 Mb. A total of 410.7 Gb of cleaned BioNano optical map data was generated and assembled de novo into BioNano genome maps, which were used to connect the PacBio contigs, yielding 149 scaffolds with an N50 of 21.9 Mb and a cumulative length of 365.1 Mb. In addition, 135.2M cleaned Hi-C reads were generated, of which 92.1M (68.1%) mapped uniquely to the assembly, yielding 69.5M valid read pairs. The Hi-C data, combined with previously published genetic maps, were used to order and orient the scaffolds into chromosome-scale pseudomolecules. In total, 31 scaffolds with a combined size of 362.7 Mb (99.3% of the assembly) were clustered into 11 chromosomes ranging from 27.1 to 37.9 Mb in length.
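The read, contig and scaffold N50 values above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total length. The minimal Python sketch below implements that definition, assuming a plain list of sequence lengths as input (the function names `n50` and `percent` are hypothetical, not part of any pipeline used here), and re-derives some of the figures quoted in the paragraph. The implied genome size is simply the PacBio yield divided by the stated coverage, not a separately reported estimate.

```python
def n50(lengths):
    """Return the N50: the length L of the sequence at which the running
    total of lengths (sorted longest first) first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty length list")


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Figures quoted in the Assembly paragraph (units as given there).
pacbio_gb, coverage_x = 20.3, 47.2
print(f"Implied genome size: {pacbio_gb / coverage_x * 1000:.0f} Mb")
print(f"Hi-C uniquely mapped reads: {percent(92.1, 135.2):.1f}%")
print(f"Scaffold length clustered into chromosomes: {percent(362.7, 365.1):.1f}%")
```

For example, for lengths `[5, 4, 3, 2, 1]` (total 15), the running total first reaches 7.5 at the second sequence, so `n50` returns 4.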

Annotation

Illumina RNA-seq reads were assembled with Trinity v2.5.1 in both de novo and genome-guided modes. The resulting transcriptome assemblies, together with PacBio Iso-Seq full-length cDNA sequences, were used as transcript evidence. Ab initio gene predictions were performed with Augustus v3.2.3, GeneMark-ET v4.33 and SNAP v2006-07-28. Proteins from SwissProt and from Arabidopsis, watermelon, cucumber and melon were aligned to the genome with Spaln v2.1.4, and the resulting alignments were used as protein homology evidence. Maker v3.01.02 was then run to generate high-confidence gene models by integrating the ab initio predictions with the transcript and protein homology evidence.

Repeats were annotated with the Ensembl Genomes repeat feature pipeline, which found: 841,860 low-complexity (Dust) features, covering 38 Mb (10.3% of the genome); 214,000 RepeatMasker features (with the nrTEplants library), covering 71 Mb (19.3% of the genome); 99,758 RepeatMasker features (with the REdat library), covering 34 Mb (9.2% of the genome); 284,068 tandem repeat (TRF) features, covering 19 Mb (5.3% of the genome); and Repeat Detector repeats with a total length of 149 Mb (40.8% of the genome).
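The percentages above are consistent with dividing each covered length by the golden path length of 365,450,462 bp given in the Statistics summary (for example, 149 Mb / 365.45 Mb is about 40.8%). Features within one repeat track can overlap, so a covered length is only meaningful once overlapping intervals are merged. The Python sketch below shows that calculation under stated assumptions: intervals are half-open `[start, end)` coordinates on a single sequence (a whole-genome figure would sum this per sequence), and the helper names are hypothetical rather than part of the Ensembl pipeline.

```python
GOLDEN_PATH_LENGTH = 365_450_462  # bp, from the Statistics summary


def covered_bases(intervals):
    """Total bases covered by half-open [start, end) intervals on ONE
    sequence, counting overlapping or abutting intervals only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_of_genome(bases, genome_length=GOLDEN_PATH_LENGTH):
    """Covered bases as a percentage of the golden path length."""
    return 100.0 * bases / genome_length


print(covered_bases([(0, 100), (50, 150), (200, 250)]))  # 200
print(f"{percent_of_genome(149_000_000):.1f}%")  # 40.8%
```

Merging before summing avoids double-counting bases where two features from the same track overlap.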

References

  1. Resequencing of 414 cultivated and wild watermelon accessions identifies selection for fruit quality traits.
    Guo S, Zhao S, Sun H, Wang X, Wu S, Lin T, Ren Y, Gao L, Deng Y, Zhang J, Lu X, Zhang H, Shang J, Gong G, Wen C, He N, Tian S, Li M, Liu J, Wang Y, Zhu Y, Jarret R, Levi A, Zhang X, Huang S, Fei Z, Liu W, Xu Y.
  2. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions.
    Guo S, Zhang J, Sun H, Salse J, Lucas WJ, Zhang H, Zheng Y, Mao L, Ren Y, Wang Z, Min J, Guo X, Murat F, Ham BK, Zhang Z, Gao S, Huang M, Xu Y, Zhong S, Bombarely A, Mueller LA, Zhao H, He H, Zhang Y, Zhang Z, Huang S, Tan T, Pang E, Lin K, Hu Q, Kuang H, Ni P, Wang B, Liu J, Kou Q, Hou W, Zou X, Jiang J, Gong G, Klee K, Schoof H, Huang Y, Hu X, Dong S, Liang D, Wang J, Wu K, Xia Y, Zhao X, Zheng Z, Xing M, Liang X, Huang B, Lv T, Wang J, Yin Y, Yi H, Li R, Wu M, Levi A, Zhang X, Giovannoni JJ, Wang J, Li Y, Fei Z, Xu Y.

cucurbitgenomics.org

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: Cla97_v1, INSDC Assembly GCA_000238415.2
Database version: 111.1
Golden Path Length: 365,450,462 bp
Genebuild by: CuGenDB
Genebuild method: External annotation import
Data source: National Engineering Research Center for Vegetables, Beijing Academy of Agriculture and Forestry Sciences

Gene counts

Coding genes: 22,541
Gene transcripts: 22,541
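Because the coding gene and transcript counts are equal, this annotation has one transcript per gene on average. Counts like these can be tallied from an annotation file by feature type. The Python sketch below assumes a GFF3 file in which genes have type `gene` and transcripts type `mRNA` (an assumption about the exported file, not a documented property of this release). The function name is hypothetical. The sketch reads column 3 of each tab-separated feature line and stops at an embedded `##FASTA` section.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count features per type (GFF3 column 3), skipping blank lines,
    comment/directive lines and any trailing ##FASTA section."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # well-formed GFF3 feature lines have 9 columns
            counts[fields[2]] += 1
    return counts


# Hypothetical two-feature example: one gene with one mRNA.
sample = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1\n",
]
counts = count_feature_types(sample)
print(counts["gene"], counts["mRNA"], counts["mRNA"] / counts["gene"])  # 1 1 1.0
```

With a real file, the lines could be supplied with `open(path)`, where `path` is the local annotation file.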