Setaria viridis Assembly and Gene Annotation
About Setaria viridis
The green foxtail, Setaria viridis (2n=2x=18, AA), is a wild ancestor of cultivated foxtail millet. It is a model system for warm-season (C4) grasses within the Panicoideae, a group of ca. 3,300 species that includes not only essential grain, forage and biofuel crops, but also ecological dominants of tropical and warm temperate environments. Green foxtail plants are generally small and self-compatible. They have a short life cycle (seed to seed in 8-10 weeks) and a single inflorescence that often produces hundreds of seeds. Transformation is efficient, and the species is amenable to CRISPR-Cas9-mediated mutagenesis. This genome is the result of a collaboration led by the Donald Danforth Plant Science Center, the Joint Genome Institute, HudsonAlpha Institute for Biotechnology and RIKEN.
A platinum-quality genome sequence (v2.0) for reference line A10.1 was generated by assembling 4.7M PacBio reads (118x coverage) with the MECAT assembler and then polishing the result with Quiver. Homozygous SNP and indel errors were corrected using 425.6M Illumina HiSeq reads (240x). A set of 36,061 syntenic markers derived from the Setaria italica v2.2 release was aligned to the MECAT assembly. Misjoins were identified as discontinuities in the S. italica linkage groups, and a total of 15 such breaks were identified and made. The S. viridis scaffolds were then oriented, ordered and joined into 9 chromosomes using the syntenic markers. This step made a total of 61 joins, each padded with 10,000 Ns. Significant telomeric sequence was identified using the TTTAGGG repeat. The final assembly contains 395.1 Mb of sequence in 75 contigs, with a contig N50 of 11.2 Mb, and 99.95% of assembled bases are placed in chromosomes.
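A contig N50 is measured on contigs, not on the N-padded chromosomes, so the gap padding has to be split back out before the statistic is computed. The Python sketch below illustrates this bookkeeping. The helper names (join_contigs, split_at_gaps, n50) are hypothetical and are not part of MECAT, Quiver or the JGI pipeline. The toy sequences are invented for the example.

```python
import re

GAP_LEN = 10_000  # each join in the assembly above was padded with 10,000 Ns


def join_contigs(contigs, gap_len=GAP_LEN):
    """Hypothetical helper: join ordered, oriented contigs with N-padded gaps."""
    return ("N" * gap_len).join(contigs)


def split_at_gaps(scaffold, min_gap=1):
    """Hypothetical helper: recover contigs by splitting at runs of >= min_gap Ns."""
    return [c for c in re.split(r"N{%d,}" % min_gap, scaffold) if c]


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Toy example with a short gap so the output stays readable.
contigs = ["ACGT" * 5, "GGCC" * 3, "TTAA" * 2]  # lengths 20, 12, 8
chrom = join_contigs(contigs, gap_len=10)
recovered = split_at_gaps(chrom)
print(len(chrom), [len(c) for c in recovered], n50(len(c) for c in recovered))
```

On real data, set `min_gap` to the padding length (here 10,000). Otherwise a single ambiguous N inside a contig would be treated as a gap.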
The genome sequence was annotated using the unpublished Joint Genome Institute plant annotation pipeline to identify 38,334 gene models with 14,125 alternative transcripts.
Repeats were annotated with the Ensembl Genomes repeat feature pipeline. There are: 384,561 low-complexity (Dust) features, covering 11 Mb (2.8% of the genome); 194,994 RepeatMasker features (with the REdat library), covering 100 Mb (25.2% of the genome); 2,178 RepeatMasker features (with the RepBase library), covering less than 1 Mb (0.1% of the genome); and 159,600 tandem repeat (TRF) features, covering 17 Mb (4.2% of the genome).
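Each coverage figure above is a fraction of the 395.1 Mb assembly. For example, 11 Mb of Dust features is about 11 / 395.1 ≈ 2.8%. Repeat features can overlap, so a fair coverage count merges overlapping intervals before summing. The sketch below shows that calculation. It assumes features are given as (sequence, start, end) tuples with 0-based, half-open coordinates. This is an assumption for illustration, and it does not describe how the Ensembl Genomes pipeline computes its totals.

```python
from collections import defaultdict


def covered_bases(features):
    """Count bases covered by at least one feature.

    Assumes (seq_name, start, end) tuples with 0-based, half-open coordinates.
    Overlapping or abutting intervals on the same sequence are merged so no
    base is counted twice.
    """
    by_seq = defaultdict(list)
    for seq, start, end in features:
        by_seq[seq].append((start, end))

    total = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def percent_of_genome(features, genome_size):
    """Percentage of the genome covered by the (merged) features."""
    return 100.0 * covered_bases(features) / genome_size


toy = [("Chr01", 0, 100), ("Chr01", 50, 150), ("Chr01", 200, 250), ("Chr02", 10, 20)]
print(covered_bases(toy), percent_of_genome(toy, 1000))
```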
General information about this species can be found in Wikipedia.