Lactuca sativa Assembly and Gene Annotation
About Lactuca sativa
Lettuce (Lactuca sativa L.) is an important vegetable crop species and is widely consumed as salad greens in many countries. It was first depicted in wall paintings of Egyptian tombs around 2,500 BC, making it one of the oldest known vegetable crops. Cultivated lettuce is believed to have been domesticated from its progenitor L. serriola, and several hypotheses have been proposed regarding the domestication center of lettuce, including Egypt, the Mediterranean area, the Middle East and Southwest Asia. Lettuce belongs to the Compositae (also known as Asteraceae) family, which is the most successful family of flowering plants on earth in terms of number of species and diversity of habitats colonized. The family is thought to have originated in the mid-Eocene (45–49 Myr) and expanded greatly during the Oligocene (28–36 Myr). It encompasses 1,620 recognized genera and at least 23,600 species, constituting approximately 10% of all angiosperms. L. sativa is diploid with 2n=2x=18 chromosomes and an estimated genome size of 2.5 Gb.
A whole-genome shotgun strategy was used to sequence and assemble the genome of L. sativa cultivar Salinas from Illumina short reads. A total of 198.5 Gb of Illumina paired-end and mate-pair reads were generated from seven libraries of different fragment sizes. After filtering, this provided 72.5-fold coverage of the 2.7 Gb genome as estimated by k-mer analysis. The initial SOAPdenovo assembly consisted of 153,952 contigs and 21,686 scaffolds greater than 1 kb, with the largest scaffold being 3.1 Mb. The N50s of contigs and scaffolds were 12 and 476 kb, respectively. The mean size of gaps in the scaffolds was 1.3 kb. Scaffolding with the Chicago library data (in vitro proximity ligation) using the HiRise software pipeline increased the contiguity of scaffolds considerably. The final HiRise assembly reduced the 21,686 scaffolds to 11,474 superscaffolds and increased the N50 from 476 to 1,769 kb; 50% and 90% of the genome are represented in only 385 and 1,520 superscaffolds, respectively. The largest superscaffold is 12.2 Mb and contains 27 SOAPdenovo scaffolds. The total length of the assembly is 2.38 Gb, covering ∼88% of the estimated genome size of L. sativa. A total of 9,140 scaffolds of the L. sativa assembly could be clustered into nine chromosomal linkage groups and then mapped into genetic bins ordered along each chromosomal linkage group.
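The contiguity figures quoted above (N50, and the number of superscaffolds that hold 50% of the assembly) follow a standard definition that can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `n50_l50` and the toy lengths are hypothetical and are not taken from the lettuce assembly or from the SOAPdenovo or HiRise pipelines. The last lines repeat the coverage arithmetic using the 2.38 Gb assembly length and the 2.7 Gb k-mer estimate given in the text.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths (hypothetical values, for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50_l50(toy_lengths)
print(f"N50 = {toy_n50}, L50 = {toy_l50}")

# Fraction of the estimated genome covered by the assembly,
# using the figures quoted in the text (both in Gb).
assembly_gb = 2.38
estimated_genome_gb = 2.7
coverage_percent = assembly_gb / estimated_genome_gb * 100
print(f"Assembly covers about {coverage_percent:.0f}% of the estimated genome")
```

In these terms, the 1,769 kb figure for the HiRise assembly is the N50 and the 385 superscaffolds correspond to the L50.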
A high-confidence gene set of 38,919 gene models with good protein or EST support was constructed for L. sativa by merging gene models from different prediction pipelines. These gene models have an average coding length of 1.05 kb and an average of 4.5 exons per gene, similar to those in other sequenced plant genomes. The average intergenic distance of 39.5 kb places L. sativa between Nicotiana tomentosiformis and Capsicum annuum, consistent with a direct correlation between intergenic distance and genome size. Out of the total number of predicted genes, 29,681 (76.27%) had similarity to Arabidopsis TAIR10 annotations and 28,951 (74.3%) were annotated using InterProScan 5. Annotation using the KEGG database yielded information for 7,553 L. sativa gene predictions. The combined data sets provided functional annotation for 31,348 (80.5%) gene models.(1)
Variation from the European Variation Archive was added.
The data were taken from a genome-wide SNP discovery and population structure analysis conducted using genotyping by sequencing (https://data.nal.usda.gov/dataset/genetic-diversity-analysis-lettuce-using-genotyping-sequencing).
- Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce.
Reyes-Chin-Wo S, Wang Z, Yang X, Kozik A, Arikit S, Song C, Xia L, Froenicke L, Lavelle DO, Truco MJ, Xia R, Zhu S, Xu C, Xu H, Xu X, Cox K, Korf I, Meyers BC, Michelmore RW. Nat Commun 8
Picture credit: Wikipedia
|Assembly|Lsat_Salinas_v7, INSDC Assembly GCA_002870075.2, Apr 2020|
|Golden Path Length|2,391,062,152 bp|
|Genebuild method|External annotation import|