Digitaria exilis (Fonio_CM05836)

Digitaria exilis Assembly and Gene Annotation

About Digitaria exilis

White fonio (Digitaria exilis (Kippist) Stapf) is an indigenous West African millet species with great potential for agriculture in marginal environments. Fonio is a small annual herbaceous C4 plant that produces very small (∼1 mm) grains tightly surrounded by a husk. It is cultivated under a wide range of environmental conditions, from a tropical monsoon climate in western Guinea to a hot, arid desert climate in the Sahel zone. Some extra-early maturing varieties produce mature grains in only 70–90 days, which makes fonio one of the fastest maturing cereals. Because it matures so quickly, fonio is often grown to avoid food shortages during the lean season (the period before the main harvest), which is why it is also referred to as ‘hungry rice’. In addition, fonio is drought tolerant and adapted to nutrient-poor, sandy soils.

Assembly

Fonio is a tetraploid species (2n = 4× = 36) with a highly inbreeding reproductive system. To build a D. exilis reference assembly, the accession CM05836 was chosen; it originates from the Mopti region in Mali, one of the driest areas of fonio cultivation. The size of the CM05836 genome was estimated by flow cytometry at 893 Mb/1C, in line with previous reports. The genome was sequenced deeply using multiple short-read libraries: Illumina paired-end (321-fold coverage), mate-pair (241-fold coverage) and linked-read (10x Genomics, 84-fold coverage). The raw reads were assembled and scaffolded with the software package DeNovoMAGIC3 (NRGene), which has recently been used to assemble various high-quality plant genomes. Integrating Hi-C reads (122-fold coverage) and a Bionano optical map produced a chromosome-scale assembly with a total length of 716,471,022 bp, of which ~91.5% (655,723,161 bp) was assembled into 18 pseudomolecules; the remaining 60.75 Mb are unanchored. Of 1,440 Embryophyta single-copy core genes (BUSCO v3.0.2), 96.1% were recovered complete in the CM05836 assembly, 1% were fragmented, and 2.9% were missing.

As no genetic map of D. exilis is available, chromosome painting was used to further assess the quality of the CM05836 assembly. Pools of short oligonucleotides covering each of the 18 pseudomolecules were designed from the CM05836 assembly, fluorescently labeled, and hybridized to mitotic metaphase chromosome spreads of CM05836. Each of the 18 oligonucleotide pools hybridized specifically to a single chromosome pair, confirming that the assembly unambiguously distinguishes homoeologous chromosomes. Centromeric regions contained a tandem repeat with a 314 bp unit, which was found on all fonio chromosomes. All data were also reassembled with the open-source TRITEX pipeline, and the two assemblies showed a high degree of collinearity.
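The anchoring and BUSCO figures quoted above are internally consistent, which a few lines of arithmetic can confirm. The Python sketch below uses only numbers stated in this section; the function names are illustrative, and converting BUSCO percentages back into gene counts is approximate because the published percentages are rounded.

```python
"""Sanity-check the CM05836 assembly figures quoted in the text (illustrative sketch)."""

TOTAL_BP = 716_471_022      # total assembly length (bp)
ANCHORED_BP = 655_723_161   # bases placed on the 18 pseudomolecules (bp)
BUSCO_TOTAL = 1_440         # Embryophyta single-copy core genes (BUSCO v3.0.2)


def anchoring_stats(total_bp: int, anchored_bp: int) -> dict[str, float]:
    """Return the anchored percentage and the unanchored length in Mb."""
    unanchored_bp = total_bp - anchored_bp
    return {
        "anchored_pct": round(100 * anchored_bp / total_bp, 1),
        "unanchored_mb": round(unanchored_bp / 1e6, 2),
    }


def busco_counts(total: int, pct: dict[str, float]) -> dict[str, int]:
    """Convert rounded BUSCO percentages back into approximate gene counts."""
    return {category: round(total * p / 100) for category, p in pct.items()}


if __name__ == "__main__":
    print(anchoring_stats(TOTAL_BP, ANCHORED_BP))
    # {'anchored_pct': 91.5, 'unanchored_mb': 60.75}
    counts = busco_counts(
        BUSCO_TOTAL, {"complete": 96.1, "fragmented": 1.0, "missing": 2.9}
    )
    print(counts, sum(counts.values()))
    # {'complete': 1384, 'fragmented': 14, 'missing': 42} 1440
```

The recovered counts add back up to all 1,440 BUSCO genes, and the unanchored remainder matches the 60.75 Mb stated in the text.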

Annotation

Gene annotation was performed with the MAKER pipeline (v3.01.02), with 34.1% of the fonio genome masked as repetitive. Transcript sequences of CM05836 from flag leaves, grains, panicles and whole above-ground seedlings, combined with protein sequences from publicly available plant genomes, were used to annotate the CM05836 assembly. This yielded 59,844 protein-coding genes (57,023 on the 18 pseudomolecules and 2,821 on unanchored sequences) with an average length of 2.5 kb and an average of 4.6 exons per gene. Analysis of the four CM05836 RNA-seq samples showed that 44,542 protein-coding genes (74.3%) were expressed (>0.5 transcripts per million), a proportion comparable to that reported for the bread wheat genome annotation.
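The transcripts-per-million (TPM) measure behind the >0.5 TPM expression cutoff can be sketched as follows. This is a minimal illustration, not the pipeline used for the CM05836 annotation: the gene names and read counts are hypothetical, raw gene length stands in for the effective length that quantifiers such as Salmon or kallisto estimate, and the rule that a gene counts as expressed if its TPM exceeds 0.5 in at least one sample is an assumption, since the text does not say how the four samples were combined.

```python
"""Sketch: TPM normalisation and a >0.5 TPM 'expressed' call (assumed rule)."""


def tpm(counts: dict[str, float], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million from read counts and gene lengths (bp)."""
    # Length-normalise first: reads per kilobase of gene.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # A sample with no reads: every gene gets TPM 0 instead of dividing by zero.
        return {gene: 0.0 for gene in rpk}
    # Then depth-normalise so the values within one sample sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


def expressed_genes(samples: list[dict[str, float]], threshold: float = 0.5) -> set[str]:
    """Genes whose TPM exceeds `threshold` in at least one sample (assumed rule)."""
    expressed: set[str] = set()
    for sample in samples:
        expressed.update(gene for gene, value in sample.items() if value > threshold)
    return expressed


if __name__ == "__main__":
    # Hypothetical toy data: two samples, three genes.
    lengths = {"geneA": 2_000, "geneB": 1_500, "geneC": 1_000}
    sample1 = tpm({"geneA": 1_000, "geneB": 0, "geneC": 500}, lengths)
    sample2 = tpm({"geneA": 0, "geneB": 0, "geneC": 10}, lengths)
    print(sorted(expressed_genes([sample1, sample2])))  # ['geneA', 'geneC']
```

Normalising by length before depth means that TPM values from different samples share the same total, which is what makes a single fixed cutoff such as 0.5 TPM meaningful across samples.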

Picture credit: Wikipedia

Statistics

Summary

Assembly: Fonio_CM05836, INSDC Assembly GCA_902859565.1, Feb 2021
Database version: 113.1
Golden Path length: 716,471,022
Genebuild by: KAUST
Genebuild method: External annotation import
Data source: KAUST

Gene counts

Coding genes: 59,844
Gene transcripts: 59,985