Ficus carica Assembly and Gene Annotation
About Ficus carica
The fig tree (Ficus carica L., Moraceae) is a heterozygous diploid (2n=2x=26) species widely grown for its fruit throughout the temperate world. It is one of the oldest known domesticated crops. Fig has valuable nutritional characteristics and can adapt to marginal soils and difficult environmental conditions. However, the rapid perishability of its fresh fruit, together with deficiencies in its responses to abiotic stresses and new diseases, restricts its world distribution and commercial success. The availability of a high-quality reference genome would provide an important resource for genetic improvement and breeding programmes, improving the fig's suitability for cultivation and the distribution of its fruit on a more extensive global scale. This is the genome sequence of the ancient Italian cultivar Dottato.
PacBio long reads were assembled with the diploid-aware FALCON-Unzip assembler, producing a primary set of contigs and a set of linked haplotigs representing the alternative genome structures of the primary contigs. The primary assembly was upgraded using FinisherSC and then polished, together with the haplotigs, using Arrow and Pilon. A total of 333 Mbp of fig genome sequence, corresponding to ~95% of the estimated genome size, was obtained. The primary contigs had a mean size of 368 Kbp and an N50 of 823 Kbp; the haplotigs had a mean size of 58 Kbp and an N50 of 89 Kbp. The final phased genome is ~333 Mbp in size, of which 80% has been anchored to 13 chromosomes. After excluding fungal and bacterial contamination, core gene completeness was measured with BUSCO. BUSCO recovered 1283 of the 1375 (93.3%) highly conserved Embryophyta core genes as complete, of which 1177 (85.6%) were single-copy and 106 (7.7%) were duplicated; a further 35 genes (2.5%) were fragmented and 57 (4.2%) were missing.
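For readers less familiar with the contiguity statistics quoted above, the following is a minimal sketch of how mean contig size and N50 are conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not fig assembly data and not the authors' code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths in Kbp (hypothetical, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
toy_mean = sum(toy_contigs) / len(toy_contigs)
toy_n50 = n50(toy_contigs)
print(f"mean = {toy_mean:.0f} Kbp, N50 = {toy_n50} Kbp")
```

Because N50 weights long contigs more heavily than short ones, it is usually larger than the mean, as with the 823 Kbp N50 and 368 Kbp mean reported for the primary contigs.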
To produce a comprehensive gene annotation, both ab initio and transcriptome-based strategies were used. The assembled transcriptome consisted of 127470 contigs with an average length of 1455 bp. The average gene length was 2460 bp and the average coding sequence (CDS) length was 956 bp. Genes contained 4.56 exons on average; the average exon length was 251 bp and the average intron length was 367 bp. In total, exons spanned 43.5 Mbp and introns 49.55 Mbp. Functional annotation showed that 28737 of the predicted protein-coding genes (76% of the total) had a BLAST hit (E-value < 0.001) in NCBI nr.
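The annotation figures above can be cross-checked against each other with simple arithmetic. The sketch below derives an approximate total gene count in two ways: from the BLAST-hit fraction, and from the total exon length divided by the mean exon length and the mean exon number per gene. All variable names are illustrative, the inputs are the rounded values quoted in the text, and the second estimate assumes the exon statistics describe the same gene set, so the results are rough estimates rather than reported values.

```python
# Values quoted in the annotation summary above (all rounded in the source).
genes_with_hit = 28_737       # genes with a BLAST hit in NCBI nr
fraction_with_hit = 0.76      # reported as 76% of the total
total_exon_bp = 43.5e6        # total exon length, 43.5 Mbp
mean_exon_bp = 251            # average exon length
mean_exons_per_gene = 4.56    # average exon number per gene

# Estimate 1: total genes implied by the BLAST-hit fraction.
genes_from_blast = genes_with_hit / fraction_with_hit

# Estimate 2: total genes implied by the exon totals.
exon_count = total_exon_bp / mean_exon_bp
genes_from_exons = exon_count / mean_exons_per_gene

print(f"genes implied by BLAST fraction: ~{genes_from_blast:,.0f}")
print(f"genes implied by exon totals:    ~{genes_from_exons:,.0f}")
```

Both estimates fall close to 38,000 genes, which suggests the quoted averages and totals are mutually consistent within rounding.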
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
* 357554 Red features, covering 124 Mb (37.2% of the genome);
* 1105757 Low complexity (Dust) features, covering 38 Mb (11.4% of the genome);
* 137725 RepeatMasker features (with the nrTEplants library), covering 37 Mb (11.0% of the genome);
* 420051 Tandem repeats (TRF) features, covering 26 Mb (7.8% of the genome).
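The genome percentages in the repeat summary follow from dividing each covered length by the assembly size. A minimal sketch is given here, assuming the ~333 Mbp assembly size quoted earlier as the denominator; the dictionary layout and the function name `coverage_percent` are illustrative and are not part of the Ensembl pipeline.

```python
GENOME_MBP = 333  # assembly size quoted earlier on this page (assumed denominator)

# (feature count, covered Mb) as listed above
repeat_tracks = {
    "Red": (357_554, 124),
    "Dust": (1_105_757, 38),
    "RepeatMasker": (137_725, 37),
    "TRF": (420_051, 26),
}


def coverage_percent(covered_mb, genome_mb=GENOME_MBP):
    """Percentage of the genome covered by a repeat track."""
    return 100 * covered_mb / genome_mb


coverage = {name: coverage_percent(mb) for name, (_, mb) in repeat_tracks.items()}
for name, pct in coverage.items():
    print(f"{name:<13}{pct:5.1f}%")
```

The recomputed values agree with the listed percentages to within about 0.1 percentage points; small differences are expected because the covered lengths are rounded to whole Mb. The tracks come from independent tools and can overlap, so their percentages should not be summed into a single repeat fraction.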
- Epigenetic patterns within the haplotype phased fig (Ficus carica L.) genome.
Usai G, Mascagni F, Giordani T, Vangelisti A, Bosi E, Zuccolo A, Ceccarelli M, King R, Hassani-Pak K, Zambrano LS, Cavallini A, Natali L.
- Cultivar-specific transcriptome prediction and annotation in Ficus carica L.
Solorzano Zambrano L, Usai G, Vangelisti A, Mascagni F, Giordani T, Bernardi R, Cavallini A, Gucci R, Caruso G, D'Onofrio C, Quartacci MF, Picciarelli P, Conti B, Lucchi A, Natali L.
- Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).
Mori K, Shirasawa K, Nogata H, Hirata C, Tashiro K, Habu T, Kim S, Himeno S, Kuhara S, Ikegami H.
Picture credit: Adapted from https://en.wikipedia.org/wiki/Common_fig