Ostreococcus lucimarinus Assembly and Gene Annotation
About Ostreococcus lucimarinus
Ostreococcus lucimarinus is a unicellular green alga and an important member of the picoplankton community, which plays a central role in the oceanic carbon cycle. It is one of the smallest known free-living eukaryotic species, with an average size of 0.8 µm. Its cellular structure is characterised by remarkable simplicity, lacking a cell wall and containing a single chloroplast, a single mitochondrion, and a single Golgi body as well as its nucleus.
Assembly
The genome of Ostreococcus lucimarinus CCE9901 was sequenced by JGI and finished at the Stanford Genome Center. The v2.0 assembly release contains 13,204,894 bp (about 13.2 Mb) of finished-quality sequence in 21 chromosomes. The sequences have been deposited in GenBank under accession numbers CP000581-CP000601. Detailed information about the project is available at the JGI website.
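As a quick consistency check on the figures above, the GenBank accession range can be expanded and counted against the 21 chromosomes. This is a minimal sketch: the helper name `expand_accession_range` is hypothetical, and it assumes the range is inclusive and that each accession in it corresponds to one chromosome.

```python
def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range such as CP000581-CP000601.

    Hypothetical helper: assumes both accessions share the same letter
    prefix and the same number of digits.
    """
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix) or len(last) != len(first):
        raise ValueError("accessions must share prefix and length")
    width = len(first) - len(prefix)
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    if end < start:
        raise ValueError("range end precedes range start")
    # Re-apply zero padding so every accession keeps its original width.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("CP000581", "CP000601")
print(len(accessions))   # number of accessions in the range
print(accessions[0], accessions[-1])
```

The range yields 21 accessions, which agrees with the stated chromosome count.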
In detail, whole-genome shotgun Sanger sequences were assembled using the Phred, Phrap, Consed pipeline. Manual inspection and finishing were performed by targeted resequencing. Because of the high GC content, primer walks failed to resolve a large number of the gaps; these were resolved by generating pooled small-insert shatter libraries from 3 kb plasmid clones. Repeats were resolved by transposon-hopping 8 kb plasmid clones. Fosmid clones were shotgun-sequenced and finished to fill large gaps, resolve large repeats, or resolve chromosome duplications and extend into chromosome telomere regions. Finished chromosomes have no gaps, and the sequence has less than one error in 100,000 bp.
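To put the stated accuracy in perspective, the bound of one error per 100,000 bp can be scaled to the whole assembly and expressed on the Phred scale. This is an illustrative calculation only: it assumes the bound applies uniformly across all 13,204,894 bp, and the variable names are chosen for this sketch.

```python
import math

ASSEMBLY_LENGTH_BP = 13_204_894   # finished sequence length quoted above
MAX_ERROR_RATE = 1 / 100_000      # "less than one error in 100,000 bp"

# Upper bound on the expected number of erroneous bases genome-wide,
# assuming the error bound holds uniformly (an assumption of this sketch).
max_expected_errors = ASSEMBLY_LENGTH_BP * MAX_ERROR_RATE

# Phred-scaled quality: Q = -10 * log10(error probability).
phred_equivalent = -10 * math.log10(MAX_ERROR_RATE)

print(f"fewer than about {max_expected_errors:.0f} errors expected")
print(f"equivalent to at least Q{phred_equivalent:.0f}")
```

Under that assumption the whole genome is expected to contain fewer than roughly 132 erroneous bases, which corresponds to a quality of at least Q50.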
Annotation
This release includes a total of 7,651 predicted gene models produced through the collaboration of JGI, Ghent University (Belgium) and UCSD annotation teams.
In detail, gene prediction methods included ab initio Fgenesh, Fgenesh+, Genewise, MAGPIE, estExt, and EuGene. All predicted models were clustered and the best model per locus was selected based on homology to other proteins and EST support. The predicted set of gene models has been validated by using available experimental data and computational analysis.
Repeated sequences were called with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Repeats length: 5,898,261 bp. Repeats content: 44.7%.
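The repeat content follows from the repeat length and the assembly length. The sketch below reproduces the percentage, assuming the denominator is the 13,204,894 bp of finished sequence reported in the Assembly section; the variable names are illustrative.

```python
REPEATS_LENGTH_BP = 5_898_261     # repeat length reported above
ASSEMBLY_LENGTH_BP = 13_204_894   # assumed denominator: finished sequence length

# Fraction of the assembly covered by called repeats.
repeat_fraction = REPEATS_LENGTH_BP / ASSEMBLY_LENGTH_BP

print(f"Repeats content: {repeat_fraction * 100:.1f}%")
```

The result rounds to 44.7%, matching the reported figure.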
References
- The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation.
Palenik B, Grimwood J, Aerts A, Rouzé P, Salamov A, Putnam N, Dupont C, Jorgensen R, Derelle E, Rombauts S et al. 2007. Proc. Natl. Acad. Sci. U.S.A. 104:7705-7710.
Links
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | ASM9206v1, INSDC Assembly GCA_000092065.1, Apr 2007 |
Database version | 113.1 |
Golden Path Length | 13,204,888 |
Genebuild by | JGI |
Genebuild method | Import |
Data source | Joint Genome Institute |
Gene counts
Coding genes | 7,603 |
Non coding genes | 24 |
Small non coding genes | 24 |
Pseudogenes | 37 |
Gene transcripts | 7,664 |