
Triticum aestivum Alchemy Assembly and Gene Annotation
About Triticum aestivum Alchemy
Bread wheat, also known as common wheat (Triticum aestivum), is a cultivated wheat species. About 95% of wheat produced worldwide is bread wheat; it is the most widely grown of all crops and the cereal with the highest monetary yield.
"Alchemy" is a bread wheat cultivar and one of eight UK wheat varieties that were used to establish the "NIAB Elite MAGIC" multi-founder inter-cross population.
Picture credit: Creative Commons Attribution 2.0 BY Richard Horsnell, Niab.
Taxonomy ID 4565
(Text from Wikipedia.)
More information
General information about this species can be found in Wikipedia.
Assembly
The assembly presented here has been imported from INSDC and is linked to the assembly accession GCA_951799155.1. The total length of the assembly is 15.3 Gb, contained within 121 scaffolds. The scaffold N50 is 502,756,319 bp and the scaffold L50 is 13. The GC content of the assembly is 46.0%. The assembly was generated using the W2RAP pipeline (Clavijo, B.J. et al. 2017), with contigs assembled by w2rap-contigger (k=200, default parameters). Mate-pair libraries were prepared, filtered, and used for scaffolding with the W2RAP version of the SOAP scaffolder (K=71), prioritizing paired-end libraries, followed by mate-pair libraries ordered by insert size. Scaffolds shorter than 500 bp were removed.
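As a reminder of how the N50 and L50 figures above are defined, here is a minimal sketch. The function name and the toy lengths are illustrative assumptions and are not taken from the Alchemy assembly. N50 is the length of the scaffold at which the running total of scaffold lengths, sorted from longest to shortest, first reaches half of the assembly length; L50 is the number of scaffolds needed to get there.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50, 'count' is the L50.
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths, invented for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
print(n50_l50(toy_lengths))  # prints (80, 2)
```

Applied to the real scaffold lengths, the same calculation would be expected to reproduce the reported values (N50 of 502,756,319 bp, L50 of 13).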
Reference-guided pseudomolecules were constructed with a modified TRITEX pipeline (Monat et al. 2019), generating a guide map derived from the chromosome-scale sequence of the ‘winter’ German bread wheat cv. Julius (Walkowiak et al. 2020). Single-copy regions were extracted from the Julius assembly using BBDuk (Bushnell et al. 2017) and aligned to the W2RAP assemblies using Minimap2 (Li, 2018). Contigs longer than 300 kb, with sufficient single-copy alignments, were ordered and oriented based on majority rule, and assembled into pseudomolecules using TRITEX tools.
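The majority-rule placement described above can be illustrated with a small sketch. This is not the TRITEX implementation: the function name, the input tuple layout, the `min_hits` threshold and the use of the median guide position are all assumptions made for illustration, and the 300 kb contig length filter is not modelled.

```python
from collections import Counter, defaultdict
from statistics import median


def place_contigs(alignments, min_hits=2):
    """Assign each contig a chromosome, orientation and rank by majority rule.

    alignments: iterable of (contig, guide_chrom, guide_pos, strand) tuples,
    one per single-copy guide segment aligned to a contig.
    min_hits: hypothetical threshold standing in for "sufficient" alignments.
    """
    by_contig = defaultdict(list)
    for contig, chrom, pos, strand in alignments:
        by_contig[contig].append((chrom, pos, strand))

    placed = defaultdict(list)
    for contig, hits in by_contig.items():
        if len(hits) < min_hits:
            continue  # too little evidence; the contig stays unplaced
        # Majority rule: the chromosome supported by most alignments wins.
        chrom, _ = Counter(h[0] for h in hits).most_common(1)[0]
        on_chrom = [h for h in hits if h[0] == chrom]
        # Majority rule again for the orientation.
        strand, _ = Counter(h[2] for h in on_chrom).most_common(1)[0]
        position = median(h[1] for h in on_chrom)
        placed[chrom].append((position, contig, strand))

    # Order contigs along each chromosome by their guide position.
    return {
        chrom: [(contig, strand) for _, contig, strand in sorted(items)]
        for chrom, items in placed.items()
    }
```

With this layout, a contig whose alignments mostly fall on one guide chromosome in reverse orientation is placed on that chromosome with strand "-", while a contig with a single alignment is left unplaced. Ties are resolved by first occurrence here, which is a simplification.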
Hi-C reads were aligned to the W2RAP contigs with the TRITEX pipeline, which uses Minimap2 for alignment, Novosort for sorting, SAMtools (Danecek et al. 2021), and BEDTools (Quinlan & Hall, 2010) for aggregation of information. Hi-C contact maps at 1 Mb resolution, arranged according to the chromosomal AGP files, were plotted with TRITEX functions and manually inspected for off-diagonal signals to spot large structural variants between the assembled genomes and the Julius guide genome.
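To make the idea of a 1 Mb contact map and an off-diagonal signal concrete, here is a minimal sketch. It is a simplified stand-in for the TRITEX plotting functions and the manual inspection described above: the function names, the input format and both thresholds are hypothetical.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb resolution, as stated in the text


def contact_counts(pairs, bin_size=BIN_SIZE):
    """Count Hi-C contacts per pair of bins on one pseudomolecule.

    pairs: iterable of (pos1, pos2) coordinates in bp for the two read ends.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


def off_diagonal(counts, min_distance_bins, min_count):
    """List bin pairs with many contacts far away from the diagonal."""
    return sorted(
        key
        for key, n in counts.items()
        if key[1] - key[0] >= min_distance_bins and n >= min_count
    )
```

In a correctly ordered pseudomolecule most contacts fall on or near the diagonal, so bin pairs returned by `off_diagonal` would be candidates for a closer look, for example as possible structural variants relative to the guide genome or as misplaced contigs.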
Annotation
Ensembl Plants displays genes imported from a community GFF3 file provided by Niab and linked to the assembly with accession GCA_951799155.1. Gene models for the Alchemy genome were transferred from the Triticum aestivum Chinese Spring reference annotation (IWGSC RefSeq v1.1; Alaux et al., 2018; IWGSC, 2018) using Liftoff (Shumate & Salzberg, 2021). The annotations were mapped to the Alchemy pseudomolecules, including unanchored contigs. Liftoff was run with the following parameters: -flank 0.05 -exclude_partial -copies -polish -chroms -unplaced, with minimap2 (Li, 2018) configured as the aligner using: -mm2_options "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G"
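The parameters listed above could be assembled into a Liftoff command line roughly as follows. This is a sketch under stated assumptions: all file names are placeholders that do not come from the source, the positional order (target, then reference) follows Liftoff's documented usage, and the `-mm2_options=` form with an equals sign is used here on the assumption that a value starting with a dash could otherwise be misread as another option. The command is only built and printed, not run.

```python
import shlex

# Placeholder file names; none of these paths come from the source.
reference_fasta = "chinese_spring.fa"
target_fasta = "alchemy.fa"
reference_gff = "iwgsc_refseq_v1.1.gff3"
chroms_file = "chroms.txt"      # file passed to -chroms
unplaced_file = "unplaced.txt"  # file passed to -unplaced
output_gff = "alchemy_liftoff.gff3"

# minimap2 options exactly as quoted in the text.
MM2_OPTIONS = "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G"

command = [
    "liftoff",
    "-g", reference_gff,
    "-o", output_gff,
    "-flank", "0.05",
    "-exclude_partial",
    "-copies",
    "-polish",
    "-chroms", chroms_file,
    "-unplaced", unplaced_file,
    "-mm2_options=" + MM2_OPTIONS,
    target_fasta,
    reference_fasta,
]

# Print a shell-quoted version for inspection.
print(shlex.join(command))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run(command, check=True)` once the placeholder paths are replaced with real files.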
Post-processing of the resulting GFF files involved filtering out duplicate gene annotations with identical coordinates. In such cases, the best-supported model was retained based on coverage and sequence identity.
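The duplicate filtering step can be sketched as below. This is not the script used for the Alchemy annotation: it assumes that each gene line carries `coverage` and `sequence_ID` attributes in column 9 (as Liftoff output typically does), that "identical coordinates" means the same sequence, start, end and strand, and that coverage is compared before sequence identity. Only gene lines are handled; child features of discarded genes would need to be removed in a separate pass.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string into a dict."""
    fields = (f.split("=", 1) for f in column9.strip().split(";") if "=" in f)
    return {key: value for key, value in fields}


def best_gene_per_locus(gff_lines):
    """Keep one gene line per (seqid, start, end, strand) locus.

    Where several genes share identical coordinates, the one with the
    highest (coverage, sequence_ID) pair is retained.
    """
    best = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        score = (
            float(attrs.get("coverage", 0)),
            float(attrs.get("sequence_ID", 0)),
        )
        locus = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        if locus not in best or score > best[locus][0]:
            best[locus] = (score, line.rstrip("\n"))
    return [line for _, line in best.values()]
```

For two gene lines at the same locus, one with coverage 0.95 and one with coverage 1.0, only the second would be kept; genes at distinct loci pass through unchanged.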
Genomic annotation was provided by Niab along with the initial assembly submission.
Small RNA features, protein features, BLAST hits and cross-references have been computed by Ensembl Plants.
References
Alaux, M. et al. (2018). Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biology, 19, 111.
Appels, R., Eversole, K., Feuillet, C., et al. (IWGSC) (2018). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science, 361, eaar7191.
Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge - Accurate paired shotgun read merging via overlap. PLoS ONE, 12(10), e0185056. https://doi.org/10.1371/journal.pone.0185056
Clavijo, B. J. et al. (2017). W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data. bioRxiv [Preprint]. https://doi.org/10.1101/110999
Danecek, P., Bonfield, J. K., Liddle, J., et al. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. https://doi.org/10.1093/gigascience/giab008
Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
Monat, C., Padmarasu, S., Lux, T., et al. (2019). TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biology, 20, 284. https://doi.org/10.1186/s13059-019-1899-5
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. https://doi.org/10.1093/bioinformatics/btq033
Shumate, A., & Salzberg, S. L. (2021). Liftoff: accurate mapping of gene annotations. Bioinformatics, 37(12), 1639–1643. https://doi.org/10.1093/bioinformatics/btaa1016
Walkowiak, S., Gao, L., Monat, C., et al. (2020). Multiple wheat genomes reveal global variation in modern breeding. Nature, 588, 277–283. https://doi.org/10.1038/s41586-020-2961-x
Statistics
Summary
Assembly | GCA951799155v1, INSDC Assembly GCA_951799155.1 |
Database version | 115.1 |
Golden Path Length | 15,334,867,051 |
Genebuild by | NIAB |
Genebuild method | Import |
Data source | NIAB |
Gene counts
Coding genes | 103,027 |
Non coding genes | 13 |
Small non coding genes | 13 |
Pseudogenes | 3,746 |
Gene transcripts | 131,360 |