Camelina sativa Assembly and Gene Annotation
About Camelina sativa
Camelina sativa (false flax, gold of pleasure, German sesame) is a relict oilseed crop of the Crucifer family (Brassicaceae) with centres of origin in SE Europe and SW Asia. C. sativa was cultivated in Europe as an important oilseed crop for many centuries before being displaced by higher-yielding crops such as canola and wheat. It has several agronomic advantages for production, including early maturity, low requirement for water and nutrients, adaptability to adverse environmental conditions and resistance to common cruciferous pests and pathogens. It is currently being re-embraced as an industrial oil platform crop. C. sativa is diploid (2n=40) with an estimated genome size of 785 Mb, retaining a well preserved hexaploid genome as a result of a whole-genome triplication event.
The genome of a homozygous doubled haploid line (DH55) was sequenced using a hybrid Illumina and Roche 454 next-generation sequencing (NGS) approach. Filtered sequence data (96.53 Gb) provided 123x coverage of the estimated genome size and were assembled using a hierarchical assembly strategy into 37,871 scaffolds. A high-density genetic map based on 3,575 polymorphic markers allowed 608.54 Mb of the assembled genome, represented by 588 scaffolds, to be anchored to the 20 chromosomes of C. sativa, thereby producing a highly contiguous final assembly with an N50 size of >30 Mb. The final genome assembly contains 641.45 Mb of sequence, covering 82% of the estimated genome size, and 95% of the assembled sequence is anchored to the 20 chromosomes.
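The coverage and anchoring percentages quoted above follow from the stated sizes by simple division. The short Python sketch below reproduces that arithmetic as a sanity check. The variable names are illustrative, and it assumes 1 Gb = 1000 Mb, which the source text does not state explicitly.

```python
# Figures quoted in the assembly description, all expressed in megabases (Mb).
ESTIMATED_GENOME_MB = 785.0        # estimated genome size
FILTERED_READS_MB = 96.53 * 1000   # 96.53 Gb of filtered reads (assumes 1 Gb = 1000 Mb)
ASSEMBLY_MB = 641.45               # total sequence in the final assembly
ANCHORED_MB = 608.54               # sequence anchored to the 20 chromosomes

# Sequencing depth: total filtered bases divided by the estimated genome size.
coverage = FILTERED_READS_MB / ESTIMATED_GENOME_MB

# Share of the estimated genome represented in the final assembly.
assembled_fraction = ASSEMBLY_MB / ESTIMATED_GENOME_MB

# Share of the assembled sequence placed on chromosomes.
anchored_fraction = ANCHORED_MB / ASSEMBLY_MB

print(f"coverage:  {coverage:.0f}x")            # 123x
print(f"assembled: {assembled_fraction:.0%}")   # 82%
print(f"anchored:  {anchored_fraction:.0%}")    # 95%
```

Note that the 95% figure is relative to the 641.45 Mb assembly, not to the 785 Mb genome size estimate.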
RNA-seq data (78.5 Gb) was generated from tissue samples collected at 12 different growth stages to assist with annotation of protein-coding genes. Based on a comprehensive strategy combining ab initio gene prediction with homology evidence from proteome data sets, ESTs and RNA-seq transcripts, 89,418 non-redundant genes were predicted, of which 4,753 (5.3%) encoded two or more alternatively spliced isoforms. More than 95% (85,274) of these annotated genes were located on the pseudochromosomes, with the remainder on unanchored scaffolds. Based on sequence identity, 97% of the predicted C. sativa genes have homologues in UniProt. RNA-seq evidence suggested that >90% of the genes were expressed (FPKM>0) in one or more developmental stages.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 1,298,859 Low complexity (Dust) features, covering 84 Mb (13.1% of the genome); 333,331 RepeatMasker features (with the REdat library), covering 128 Mb (20.0% of the genome); 5,216 RepeatMasker features (with the RepBase library), covering 1 Mb (0.1% of the genome); 333,890 Tandem repeats (TRF) features, covering 30 Mb (4.7% of the genome).
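As a rough consistency check, the percentages above can be recomputed from the covered lengths and the assembly's Golden Path Length (641,356,059 bp, from the summary table). The sketch below is illustrative only: the dictionary layout and function name are assumptions. Because the covered lengths are quoted to the nearest Mb, the recomputed values only approximate the quoted ones (the RepBase figure in particular). The sketch does not sum the categories, since features from different pipelines may overlap.

```python
# Golden Path Length of the assembly in base pairs, from the summary table.
GOLDEN_PATH_BP = 641_356_059

# name -> (feature count, covered Mb as quoted, percentage as quoted)
REPEAT_FEATURES = {
    "Low complexity (Dust)": (1_298_859, 84, 13.1),
    "RepeatMasker (REdat)": (333_331, 128, 20.0),
    "RepeatMasker (RepBase)": (5_216, 1, 0.1),
    "Tandem repeats (TRF)": (333_890, 30, 4.7),
}


def percent_of_genome(covered_mb, genome_bp=GOLDEN_PATH_BP):
    """Return covered_mb as a percentage of the genome length in bp."""
    return 100.0 * covered_mb * 1_000_000 / genome_bp


# Recompute each percentage from the rounded Mb figure and show it
# next to the value quoted in the text.
recomputed = {
    name: percent_of_genome(mb) for name, (_count, mb, _quoted) in REPEAT_FEATURES.items()
}

for name, (count, mb, quoted) in REPEAT_FEATURES.items():
    print(f"{name}: {count:,} features, {mb} Mb, "
          f"recomputed {recomputed[name]:.2f}% vs quoted {quoted}%")
```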
Image credit: Fornax CC BY-SA 3.0
General information about this species can be found in Wikipedia.
|Assembly||Cs, INSDC Assembly GCA_000633955.1|
|Golden Path Length||641,356,059|
|Genebuild by||Camelina sativa Genome Project|
|Genebuild method||External annotation import|
|Data source||Agriculture & AgriFood Canada|