Echinochloa crus-galli Assembly and Gene Annotation
About Echinochloa crus-galli
Barnyardgrass (Echinochloa crus-galli) is a pernicious weed in agricultural fields worldwide.
Assembly
The E. crus-galli line STB08, collected from rice paddy fields in the lower Yangtze River region of China, closely resembles cultivated rice in morphology and has a chromosome number of 2n = 6x = 54. A total of 207.4 Gb of sequence data was generated on the Illumina HiSeq 2000 system from STB08 genomic DNA libraries with fragment sizes ranging from 160 bp to 20 Kb. In addition, the PacBio RS II system was used to generate 32.9 Gb of third-generation long reads. Together, these data represent ~171× coverage of the E. crus-galli genome, whose size was estimated at ~1.4 Gb from K-mer analysis and flow cytometry. De novo assembly yielded a 1.27 Gb draft genome, representing 90.7% of the E. crus-galli genome (> 1 Kb), with a scaffold N50 length of 1.8 Mb. Five fosmid clones (> 15 Kb) were sequenced and compared with the assembly, and showed good consistency with it. About 92.3% of the core eukaryotic genes (CEGs) could be completely aligned with the E. crus-galli gene set. We also used BUSCO to assess the assembly and found that 95.5% of BUSCOs were complete, comparable to the S. bicolor (96.4%) and S. italica (94.3%) genomes.
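The depth and completeness figures above follow from simple arithmetic, and N50 has a standard definition. The sketch below is an illustrative check only, not the pipeline used in the study: it recomputes the combined sequencing depth and the assembled fraction from the numbers quoted above, and implements N50 as "the length L such that sequences of length ≥ L hold at least half of all assembled bases". The function names and the toy scaffold lengths are hypothetical.

```python
"""Back-of-the-envelope checks on the E. crus-galli assembly figures.

Sequencing depth = total sequenced bases / estimated genome size.
N50 = length L such that sequences of length >= L contain at least
half of the total assembled bases.
"""


def sequencing_depth(total_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage given sequence yield and genome size (same units)."""
    return total_gb / genome_size_gb


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of non-negative sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Not reached: after the last sequence, running == total.
    raise AssertionError("unreachable")


illumina_gb = 207.4
pacbio_gb = 32.9
genome_gb = 1.4       # K-mer / flow cytometry estimate quoted in the text
assembly_gb = 1.27    # draft assembly size quoted in the text

depth = sequencing_depth(illumina_gb + pacbio_gb, genome_gb)
fraction = assembly_gb / genome_gb

print(f"combined depth: {depth:.1f}x")           # combined depth: 171.6x
print(f"assembled fraction: {fraction:.1%}")     # assembled fraction: 90.7%

# Toy scaffold lengths (hypothetical, not from the real assembly).
print(n50([100, 80, 60, 40, 20]))                # 80
```

With the toy lengths, the total is 300 bases; the two longest scaffolds (100 + 80 = 180) are the first to reach half of that, so N50 is 80.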
Annotation
For gene annotation, transcriptomic data from the whole plant were generated by RNA-Seq. By integrating gene predictions from ab initio, homology-based and transcript-based approaches, 108,771 protein-coding genes were predicted in the E. crus-galli genome. Of these, 85% were supported by homologues in other species and/or by RNA-Seq data. In addition to protein-coding genes, 785 microRNAs (miRNAs) and other non-coding RNAs were also identified in the E. crus-galli genome.
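"Supported by homologues or RNA-Seq data" is a union of two evidence sets: a gene supported by both is counted once. The sketch below is a toy illustration under that reading, using hypothetical gene IDs and evidence sets rather than the study's data. It also scales the published 85% back to an approximate gene count.

```python
"""Fraction of predicted genes backed by at least one line of evidence.

The evidence sets below are hypothetical placeholders; the real study
used homology searches and RNA-Seq data.
"""


def supported_fraction(
    gene_ids: set[str], homology_hits: set[str], rnaseq_hits: set[str]
) -> float:
    """Return the share of gene_ids supported by homology OR RNA-Seq evidence."""
    if not gene_ids:
        raise ValueError("gene set is empty")
    # Union of evidence, restricted to predicted genes; overlaps count once.
    supported = (homology_hits | rnaseq_hits) & gene_ids
    return len(supported) / len(gene_ids)


genes = {f"gene{i}" for i in range(1, 21)}      # 20 hypothetical genes
homology = {f"gene{i}" for i in range(1, 13)}   # gene1..gene12
rnaseq = {f"gene{i}" for i in range(8, 18)}     # gene8..gene17 (overlaps homology)

print(f"{supported_fraction(genes, homology, rnaseq):.0%}")  # 85%

# The published 85% is rounded, so this count is only approximate.
print(round(0.85 * 108_771))                                 # 92455
```

Taking the union rather than summing the two sets avoids double-counting genes such as gene8 to gene12, which have both kinds of support.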
- Echinochloa crus-galli genome analysis provides insight into its adaptation and invasiveness as a weed.
Guo L, Qiu J, Ye C, Jin G, Mao L, Zhang H, Yang X, Peng Q, Wang Y, Jia L, Lin Z, Li G, Fu F, Liu C, Chen L, Shen E, Wang W, Chu Q, Wu D, Wu S, Xia C, Zhang Y, Zhou X, Wang L, Wu L, Song W, Wang Y, Shu Q, Aoki D, Yumoto E, Yokota T, Miyamoto K, Okada K, Kim DS, Cai D, Zhang C, Lou Y, Qian Q, Yamaguchi H, Yamane H, Kong CH, Timko MP, Bai L, Fan L. Nat Commun 8(1).
Picture credit: Wikipedia
Statistics
Summary
Assembly | ec_v3, INSDC Assembly GCA_020466025.1, Oct 2021 |
Database version | 113.1 |
Golden Path Length | 1,340,710,827 |
Genebuild method | External annotation import |
Data source | ZhejiangUniversity |
Gene counts
Coding genes | 103,850 |
Gene transcripts | 103,850 |