Eucalyptus grandis Assembly and Gene Annotation
About Eucalyptus grandis
Eucalyptus grandis is a diploid species (2n = 2x = 22) found naturally in Queensland and New South Wales, Australia. Eucalypts are the world’s most widely planted hardwood trees. They are predominantly outcrossers with hermaphroditic, animal-pollinated flowers, and are thus highly heterozygous. This genome corresponds to BRASUZ1, an S1 individual (one generation of selfing) of an elite tree originally derived from seed lots collected in Coffs Harbour (Australia).
Assembly
Sequencing reads were collected with standard Sanger sequencing protocols on ABI 3730XL automated sequencers at the Joint Genome Institute, USA. Three different-sized libraries were used for plasmid subclone sequencing and paired-end sequencing. A total of 3,446,208 reads from the 2.6-kb libraries, 3,479,232 reads from the 6.0-kb libraries and 518,016 reads from a 36.2–40.6-kb library were sequenced. Two BAC libraries (EG_Ba, 127.5-kb insert, and EG_Bb, 155.0-kb insert) were end sequenced, adding a further 294,912 reads for long-range linking.
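As a quick check on the figures above, the read counts can be summed per library class. This is only an arithmetic sketch: the dictionary keys are labels chosen here for illustration, and the counts are copied from the paragraph above.

```python
# Read counts as stated in the text; the key names are illustrative labels only.
reads_by_library = {
    "2.6-kb libraries": 3_446_208,
    "6.0-kb libraries": 3_479_232,
    "36.2-40.6-kb library": 518_016,
    "BAC ends (EG_Ba + EG_Bb)": 294_912,
}

total_reads = sum(reads_by_library.values())

# Share of the total contributed by each library class, as a percentage.
read_share = {
    name: 100 * count / total_reads
    for name, count in reads_by_library.items()
}

print(f"Total Sanger reads: {total_reads:,}")
for name, share in read_share.items():
    print(f"{name}: {share:.1f}%")
```

The total comes to 7,738,368 reads, most of them from the two smaller insert sizes.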
The sequence reads were assembled using a modified version of Arachne v.20071016. The resulting output was then passed through Rebuilder and SquashOverlaps, and subsequently run through another complete Arachne assembly process to finalize the assembly. This produced 6,043 scaffold sequences, with a scaffold L50 of 4.9 Mb and a total scaffold size of 692.7 Mb. For chromosome-scale pseudomolecule construction, markers from a genetic map were placed and a total of 19 breaks were made based on linkage group discontiguity. A subset of the broken scaffolds was combined using 257 joins to form the 11 pseudomolecule chromosomes, which contained 605.9 Mb out of 691.3 Mb (88%) of the assembled sequence. The final assembly contains 4,952 scaffolds with a contig L50 of 67.2 kb and a scaffold L50 of 53.9 Mb.
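The L50 values quoted above are lengths: the length of the sequence at which the size-sorted cumulative total first reaches half of the assembly (many other sources call this statistic N50). The sketch below shows one common way to compute it. The function name and the toy scaffold lengths are hypothetical and are not taken from the real assembly; only the 605.9 Mb and 691.3 Mb figures come from the text.

```python
def l50_length(lengths):
    """Length of the sequence at which the size-sorted running total
    first reaches half of the summed length (often called N50)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths in bp, for illustration only.
toy_scaffolds = [50, 40, 30, 20, 10]
toy_l50 = l50_length(toy_scaffolds)

# Fraction of assembled sequence placed in the 11 pseudomolecules (figures from the text).
placed_fraction = 605.9 / 691.3

print(toy_l50)
print(f"{placed_fraction:.1%}")
```

For the toy input the total is 150 bp, so the running sum passes the halfway point (75 bp) at the second-largest scaffold. The placed fraction works out to roughly 87.6%, which matches the rounded 88% quoted above.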
Annotation
The completeness of the assembly was estimated using 1,007,962 ESTs from BRASUZ1, finding that 98.98% of available expressed gene loci were included in the 11 chromosome assemblies. To produce the gene set, the homology-based gene predictors FgenesH and GenomeScan were used. The best gene prediction at each locus was selected and integrated with EST assemblies using PASA.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are:
- 1,104,174 Low complexity (Dust) features, covering 73 Mb (10.5% of the genome)
- 276,345 RepeatMasker features (with the nrTEplants library), covering 95 Mb (13.7% of the genome)
- 410,973 Tandem repeats (TRF) features, covering 34 Mb (4.9% of the genome)
- Repeat Detector repeats with a total length of 278 Mb (40.3% of the genome)
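The coverage percentages can be roughly reproduced by dividing each span by the golden path length given in the statistics for this assembly (691,269,672 bp). The sketch below assumes 1 Mb = 1,000,000 bp and that the Mb values in the text are rounded, so small differences from the stated percentages are expected. The categories are not summed, since the text does not say whether they overlap.

```python
GOLDEN_PATH_BP = 691_269_672  # golden path length from the statistics for this assembly

# (covered span in Mb, stated percentage), both copied from the text.
repeats = {
    "Low complexity (Dust)": (73, 10.5),
    "RepeatMasker (nrTEplants)": (95, 13.7),
    "Tandem repeats (TRF)": (34, 4.9),
    "Repeat Detector": (278, 40.3),
}

# Recompute each percentage, assuming 1 Mb = 1,000,000 bp.
recomputed = {
    name: 100 * mb * 1_000_000 / GOLDEN_PATH_BP
    for name, (mb, _stated) in repeats.items()
}

# Largest gap, in percentage points, between stated and recomputed values.
max_diff = max(
    abs(recomputed[name] - stated)
    for name, (_mb, stated) in repeats.items()
)

for name, (_mb, stated) in repeats.items():
    print(f"{name}: stated {stated}%, recomputed {recomputed[name]:.2f}%")
```

All four recomputed values land within about 0.1 percentage points of the stated ones, which is consistent with the spans having been rounded to whole megabases.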
References
- The genome of Eucalyptus grandis.
Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D, Goodstein DM, Dubchak I, Poliakov A, Mizrachi E, Kullan AR, Hussey SG, Pinard D, van der Merwe K, Singh P, van Jaarsveld I, Silva-Junior OB, Togawa RC, Pappas MR, Faria DA, Sansaloni CP, Petroli CD, Yang X, Ranjan P, Tschaplinski TJ, Ye CY, Li T, Sterck L, Vanneste K, Murat F, Soler M, Clemente HS, Saidi N, Cassan-Wang H, Dunand C, Hefer CA, Bornberg-Bauer E, Kersting AR, Vining K, Amarasinghe V, Ranik M, Naithani S, Elser J, Boyd AE, Liston A, Spatafora JW, Dharmwardhana P, Raja R, Sullivan C, Romanel E, Alves-Ferreira M, Külheim C, Foley W, Carocha V, Paiva J, Kudrna D, Brommonschenkel SH, Pasquali G, Byrne M, Rigault P, Tibbits J, Spokevicius A, Jones RC, Steane DA, Vaillancourt RE, Potts BM, Joubert F, Barry K, Pappas GJ, Strauss SH, Jaiswal P, Grima-Pettenati J, Salse J, Van de Peer Y, Rokhsar DS, Schmutz J.
Picture credit: Copyright Peter Woodard, CC BY-SA 3.0
More information
General information about this species can be found in Wikipedia.
Statistics
Summary
Assembly | Egrandis1_0, INSDC Assembly GCA_000612305.1 |
Database version | 113.1 |
Golden Path Length | 691,269,672 |
Genebuild by | Geneglob |
Genebuild method | Import |
Data source | Geneglob |
Gene counts
Coding genes | 36,779 |
Gene transcripts | 46,920 |