Triticum aestivum Alchemy (GCA951799155v1)

Triticum aestivum Alchemy Assembly and Gene Annotation

About Triticum aestivum Alchemy

Bread wheat, also known as common wheat (Triticum aestivum), is a cultivated wheat species. About 95% of wheat produced worldwide is bread wheat; it is the most widely grown of all crops and the cereal with the highest monetary yield.

"Alchemy" is a bread wheat cultivar and one of eight UK wheat varieties that were used to establish the "NIAB Elite MAGIC" multi-founder inter-cross population.

Picture credit: Creative Commons Attribution 2.0 BY Richard Horsnell, Niab.

Taxonomy ID 4565

(Text from Wikipedia.)

More information

General information about this species can be found in Wikipedia.

Assembly

The assembly presented here has been imported from INSDC and is linked to the assembly accession GCA_951799155.1. The total length of the assembly is 15.3 Gb, contained within 121 scaffolds. The scaffold N50 is 502,756,319 bp and the scaffold L50 is 13. The GC content of the assembly is 46.0%. The scaffolds were generated using the W2RAP pipeline (Clavijo, B.J. et al. 2017): contigs were assembled with w2rap-contigger (k=200, default parameters), and mate-pair libraries were prepared, filtered, and used for scaffolding with the W2RAP version of the SOAP scaffolder (K=71), prioritizing paired-end libraries followed by mate-pairs ordered by insert size. Scaffolds shorter than 500 bp were removed.
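For readers unfamiliar with the N50 and L50 statistics quoted above, the following is a minimal Python sketch of how they are commonly computed from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions, not part of the assembly pipeline or the actual Alchemy data.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths (hypothetical, not the Alchemy scaffolds).
toy_lengths = [50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (40, 2)
```

Read this way, an L50 of 13 means that the 13 longest scaffolds account for at least half of the 15.3 Gb assembly, and the shortest of those 13 is 502,756,319 bp long.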

Reference-guided pseudomolecules were constructed with a modified TRITEX pipeline (Monat et al. 2019), generating a guide map derived from the chromosome-scale sequence of the ‘winter’ German bread wheat cv. Julius (Walkowiak et al. 2020). Single-copy regions were extracted from the Julius assembly using BBDuk (Bushnell et al. 2017) and aligned to the W2RAP assemblies using Minimap2 (Li, 2018). Contigs longer than 300 kb, with sufficient single-copy alignments, were ordered and oriented based on majority rule, and assembled into pseudomolecules using TRITEX tools.
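The majority-rule ordering and orientation step can be illustrated with a short Python sketch. This is not the TRITEX implementation: the input layout, the function names, the `min_hits` threshold (standing in for "sufficient single-copy alignments"), and the use of the median guide position are all assumptions made for illustration. The 300 kb contig length filter is omitted.

```python
from collections import Counter
from statistics import median


def place_contig(hits):
    """Place one contig from its single-copy alignments to the guide.

    hits: list of (guide_chrom, guide_pos, strand) tuples.
    The chromosome and strand are chosen by majority vote; the position
    is the median guide position among hits on the chosen chromosome.
    """
    chrom, _ = Counter(h[0] for h in hits).most_common(1)[0]
    on_chrom = [h for h in hits if h[0] == chrom]
    strand, _ = Counter(h[2] for h in on_chrom).most_common(1)[0]
    position = median(h[1] for h in on_chrom)
    return chrom, position, strand


def order_contigs(alignments, min_hits=3):
    """Order and orient contigs along the guide chromosomes.

    alignments: dict mapping contig name to its list of hits.
    min_hits is a hypothetical threshold; contigs below it stay unplaced.
    """
    placed = []
    for contig, hits in alignments.items():
        if len(hits) < min_hits:
            continue
        chrom, position, strand = place_contig(hits)
        placed.append((chrom, position, contig, strand))
    placed.sort()
    return [(chrom, contig, strand) for chrom, _, contig, strand in placed]


# Hypothetical alignments for three contigs.
alignments = {
    "ctgA": [("chr1A", 900, "+"), ("chr1A", 950, "+"), ("chr2B", 10, "-")],
    "ctgB": [("chr1A", 100, "-"), ("chr1A", 150, "-"), ("chr1A", 120, "+")],
    "ctgC": [("chr1A", 500, "+")],
}
print(order_contigs(alignments))
# [('chr1A', 'ctgB', '-'), ('chr1A', 'ctgA', '+')]
```

In this toy input, ctgC is left unplaced because it has only one alignment, and the single ctgA hit on chr2B is outvoted by the two hits on chr1A.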

Hi-C reads were aligned to the W2RAP contigs with the TRITEX pipeline, using Minimap2 for alignment, Novosort for sorting, and SAMtools (Danecek et al. 2021) and BEDTools (Quinlan & Hall, 2010) for aggregation of information. Hi-C contact maps at 1 Mb resolution, arranged according to the chromosomal AGP files, were plotted with TRITEX functions and manually inspected for off-diagonal signals to spot large structural variants between the assembled genomes and the Julius guide genome.
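To make the idea of a 1 Mb contact map and its off-diagonal signals concrete, here is a minimal Python sketch that bins read-pair positions on one pseudomolecule and lists bin pairs lying far from the diagonal. It is not a TRITEX function: the names and both thresholds are hypothetical, and the source describes manual inspection of plots rather than an automated filter.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb resolution, as in the contact maps described above


def bin_contacts(pairs, bin_size=BIN_SIZE):
    """Count Hi-C contacts per pair of bins on a single pseudomolecule.

    pairs: iterable of (pos1, pos2) coordinates of the two read ends.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


def off_diagonal(counts, min_distance_bins=50, min_count=2):
    """Return bin pairs far from the diagonal with repeated support.

    Both thresholds are illustrative assumptions.
    """
    return {
        key: n
        for key, n in counts.items()
        if key[1] - key[0] >= min_distance_bins and n >= min_count
    }


# Hypothetical read-pair coordinates.
pairs = [
    (100, 1_500_000),
    (2_300_000, 2_900_000),
    (500_000, 120_000_000),
    (700_000, 120_400_000),
]
counts = bin_contacts(pairs)
print(off_diagonal(counts))  # {(0, 120): 2}
```

Most Hi-C contacts fall near the diagonal because nearby loci interact more often, so a cluster of contacts between distant bins is the kind of signal that can indicate a misplaced contig or a structural difference from the guide genome.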

Annotation

Ensembl Plants displays genes imported from a community GFF3 file provided by Niab and linked to the assembly with accession GCA_951799155.1. Gene models for the Alchemy genome were transferred from the Triticum aestivum Chinese Spring reference annotation (IWGSC RefSeq v1.1; Alaux et al., 2018; IWGSC, 2018) using Liftoff (Shumate & Salzberg, 2021). The annotations were mapped to the Alchemy pseudomolecules, including unanchored contigs. Liftoff was run with the parameters -flank 0.05 -exclude_partial -copies -polish -chroms -unplaced, with Minimap2 (Li, 2018) configured as the aligner via -mm2_options "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G".
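As an illustration of how the parameters listed above fit together, the Python sketch below assembles a Liftoff command line without running it. All file names are hypothetical placeholders. In Liftoff, -chroms and -unplaced each take a text file as their argument; the source does not name those files, nor the input and output paths.

```python
import shlex

# Minimap2 options exactly as quoted in the annotation description.
MM2_OPTIONS = "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G"


def liftoff_command(target_fasta, reference_fasta, reference_gff,
                    output_gff, chroms_txt, unplaced_txt):
    """Build (but do not execute) a Liftoff argument list."""
    return [
        "liftoff",
        "-g", reference_gff,         # annotation to lift over
        "-o", output_gff,            # lifted annotation
        "-flank", "0.05",
        "-exclude_partial",
        "-copies",
        "-polish",
        "-chroms", chroms_txt,       # chromosome correspondence file
        "-unplaced", unplaced_txt,   # unplaced sequence names
        "-mm2_options", MM2_OPTIONS,
        target_fasta,
        reference_fasta,
    ]


# Hypothetical file names, for illustration only.
cmd = liftoff_command(
    target_fasta="alchemy.fa",
    reference_fasta="chinese_spring.fa",
    reference_gff="refseq_v1.1.gff3",
    output_gff="alchemy_liftoff.gff3",
    chroms_txt="chroms.txt",
    unplaced_txt="unplaced.txt",
)
print(shlex.join(cmd))
```

Keeping the Minimap2 options as a single list element means they reach Liftoff as one quoted argument, matching the quoted form shown in the parameter list.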

Post-processing of the resulting GFF files involved filtering out duplicate gene annotations with identical coordinates. In such cases, the best-supported model was retained based on coverage and sequence identity.

Genomic annotation was provided by Niab along with the initial assembly submission.

Small RNA features, protein features, BLAST hits and cross-references have been computed by Ensembl Plants.

References

  1. Alaux, M. et al. (2018). Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biology, 19:111.

  2. Appels, R., Eversole, K., Feuillet, C., et al. (IWGSC) (2018). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science, 361:eaar7191.

  3. Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge - Accurate paired shotgun read merging via overlap. PLoS ONE, 12(10):e0185056. https://doi.org/10.1371/journal.pone.0185056

  4. Clavijo, B. J. et al. (2017). W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data. bioRxiv [Preprint]. https://doi.org/10.1101/110999

  5. Danecek, P., Bonfield, J. K., Liddle, J., et al. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. https://doi.org/10.1093/gigascience/giab008

  6. Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. https://doi.org/10.1093/bioinformatics/bty191

  7. Monat, C., Padmarasu, S., Lux, T., et al. (2019). TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biology, 20, 284. https://doi.org/10.1186/s13059-019-1899-5

  8. Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. https://doi.org/10.1093/bioinformatics/btq033

  9. Shumate, A., & Salzberg, S. L. (2021). Liftoff: accurate mapping of gene annotations. Bioinformatics, 37(12), 1639–1643. https://doi.org/10.1093/bioinformatics/btaa1016

  10. Walkowiak, S., Gao, L., Monat, C., et al. (2020). Multiple wheat genomes reveal global variation in modern breeding. Nature, 588, 277–283. https://doi.org/10.1038/s41586-020-2961-x

Statistics

Summary

Assembly: GCA951799155v1, INSDC Assembly GCA_951799155.1
Database version: 115.1
Golden Path Length: 15,334,867,051
Genebuild by: NIAB
Genebuild method: Import
Data source: NIAB

Gene counts

Coding genes: 103,027
Non coding genes: 13
Small non coding genes: 13
Pseudogenes: 3,746
Gene transcripts: 131,360