Triticum aestivum Alchemy Assembly and Gene Annotation

About Triticum aestivum Alchemy

Bread wheat, also known as common wheat (Triticum aestivum), is a cultivated wheat species. About 95% of wheat produced worldwide is bread wheat; it is the most widely grown of all crops and the cereal with the highest monetary yield.

"Alchemy" is a bread wheat cultivar and one of eight UK wheat varieties that were used to establish the "NIAB Elite MAGIC" multi-founder inter-cross population.

Picture credit: Creative Commons Attribution 2.0 BY Richard Horsnell, Niab.

Taxonomy ID 4565

(Text from Wikipedia.)

More information: general information about this species can be found in Wikipedia.

Assembly

The assembly presented here has been imported from INSDC and is linked to the assembly accession GCA_951799155.1. The total length of the assembly is 15.3 Gb, contained within 121 scaffolds. The scaffold N50 value is 502,756,319 bp and the scaffold L50 value is 13. The GC content of the assembly is 46.0%. The sequence was assembled using the W2RAP pipeline (Clavijo, B.J. et al. 2017): contigs were built with w2rap-contigger (k=200, default parameters). Mate-pair libraries were prepared, filtered, and used for scaffolding with the W2RAP version of the SOAP scaffolder (K=71), prioritizing paired-end libraries followed by mate-pairs ordered by insert size. Scaffolds shorter than 500 bp were removed.
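The N50 and L50 values quoted above follow the usual definition: sort the sequences from longest to shortest and walk down the list until the running total reaches half of the assembly length. N50 is the length of the sequence at that point and L50 is the number of sequences needed to get there. The following is a minimal sketch of that calculation. The function name and the toy lengths are illustrative assumptions and are not taken from the Alchemy assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


# Toy lengths only (total 300, so the halfway point is 150).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

An L50 of 13 with an N50 of about 503 Mb therefore means that the 13 longest scaffolds, each at least that long, together cover at least half of the 15.3 Gb assembly.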

Reference-guided pseudomolecules were constructed with a modified TRITEX pipeline (Monat et al. 2019), generating a guide map derived from the chromosome-scale sequence of the ‘winter’ German bread wheat cv. Julius (Walkowiak et al. 2020). Single-copy regions were extracted from the Julius assembly using BBDuk (Bushnell et al. 2017) and aligned to the W2RAP assemblies using Minimap2 (Li, 2018). Contigs longer than 300 kb, with sufficient single-copy alignments, were ordered and oriented based on majority rule, and assembled into pseudomolecules using TRITEX tools.
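The majority-rule placement described above can be illustrated with a small sketch. This is not the TRITEX implementation. It is a simplified illustration that assumes each single-copy alignment has already been reduced to a contig name, a guide chromosome, a guide position and a strand. The names `Hit`, `place_contigs` and `min_hits` are hypothetical, and the threshold of three alignments is an arbitrary stand-in for "sufficient".

```python
from collections import Counter
from statistics import median
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    contig: str   # contig in the assembly being ordered
    chrom: str    # chromosome of the guide genome
    pos: int      # position of the single-copy segment on the guide
    strand: str   # "+" or "-" relative to the guide


def place_contigs(hits: List[Hit], min_hits: int = 3) -> Dict[str, List[Tuple[str, str]]]:
    """Assign each contig to a chromosome, orientation and order by majority vote."""
    by_contig: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_contig.setdefault(hit.contig, []).append(hit)

    placed = []
    for contig, contig_hits in by_contig.items():
        if len(contig_hits) < min_hits:
            continue  # too few single-copy alignments to place this contig
        # Chromosome and strand are each decided by the most common value.
        chrom = Counter(h.chrom for h in contig_hits).most_common(1)[0][0]
        on_chrom = [h for h in contig_hits if h.chrom == chrom]
        strand = Counter(h.strand for h in on_chrom).most_common(1)[0][0]
        # The median guide position gives a robust anchor for ordering.
        anchor = median(h.pos for h in on_chrom)
        placed.append((chrom, anchor, contig, strand))

    layout: Dict[str, List[Tuple[str, str]]] = {}
    for chrom, _, contig, strand in sorted(placed):
        layout.setdefault(chrom, []).append((contig, strand))
    return layout
```

In this sketch, ties are resolved in favour of the value seen first. A production pipeline would need an explicit rule for ties and for contigs whose alignments are split between chromosomes.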

Hi-C reads were aligned to the W2RAP contigs with the TRITEX pipeline, which uses Minimap2 for alignment, Novosort for sorting, and SAMtools (Danecek et al. 2021) and BEDTools (Quinlan & Hall, 2010) for aggregation of information. Hi-C contact maps at 1 Mb resolution, arranged according to the chromosomal AGP files, were plotted with TRITEX functions and manually inspected for off-diagonal signals to spot large structural variants between the assembled genomes and the Julius guide genome.

Annotation

Ensembl Plants displays genes imported from a community GFF3 file provided by Niab linked to the assembly with accession GCA_951799155.1. Gene models for the Alchemy genome were transferred from the Triticum aestivum Chinese Spring reference annotation (IWGSC RefSeq v1.1; Alaux et al., 2018; IWGSC, 2018) using Liftoff (Shumate & Salzberg, 2021). The annotations were mapped to Alchemy pseudomolecules, including unanchored contigs. The annotation process used Liftoff with the following parameters: -flank 0.05 -exclude_partial -copies -polish -chroms -unplaced with minimap2 (Li, 2018) configured as the aligner using: -mm2_options "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G"
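To make the parameter list above easier to read, the sketch below assembles it into a single Liftoff command line and prints it without running anything. The option names and values are those given in the text. The file names, the function name `build_liftoff_command` and the use of `-g` and `-o` for the input annotation and output file are assumptions for illustration, since the original input and output paths are not stated.

```python
import shlex
from typing import List

# minimap2 options exactly as quoted in the annotation description.
MM2_OPTIONS = "-a --end-bonus 5 --eqx -N 50 -p 0.5 -I 20G"


def build_liftoff_command(
    target_fasta: str,
    reference_fasta: str,
    reference_gff: str,
    output_gff: str,
    chroms_file: str,
    unplaced_file: str,
) -> List[str]:
    """Return the Liftoff argument list; all file names are hypothetical."""
    return [
        "liftoff",
        "-g", reference_gff,          # annotation to transfer (assumed input)
        "-o", output_gff,             # lifted-over annotation (assumed output)
        "-flank", "0.05",
        "-exclude_partial",
        "-copies",
        "-polish",
        "-chroms", chroms_file,       # reference/target chromosome pairs
        "-unplaced", unplaced_file,   # unplaced reference sequences
        "-mm2_options", MM2_OPTIONS,
        target_fasta,
        reference_fasta,
    ]


command = build_liftoff_command(
    "alchemy.fa", "chinese_spring.fa", "refseq_v1.1.gff3",
    "alchemy_liftoff.gff3", "chroms.txt", "unplaced.txt",
)
print(shlex.join(command))
```

Building the command as a list keeps the quoted minimap2 options together as one argument, which matters because that string itself contains spaces and leading dashes.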

Post-processing of the resulting GFF files involved filtering out duplicate gene annotations with identical coordinates. In such cases, the best-supported model was retained based on coverage and sequence identity.
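The duplicate filter described above can be sketched as follows. This is an illustration, not the script used by the authors. It assumes that each gene line carries `coverage` and `sequence_ID` attributes in GFF3 column 9, as Liftoff output is expected to, that "identical coordinates" means the same sequence, start, end and strand, and that coverage is compared before sequence identity. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def dedupe_genes(gff_lines: Iterable[str]) -> List[str]:
    """Keep one gene line per coordinate set, preferring the best-supported."""
    Key = Tuple[str, int, int, str]
    best: Dict[Key, Tuple[Tuple[float, float], str]] = {}
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # this sketch only considers gene features
        attrs = parse_attributes(cols[8])
        # Assumed ranking: coverage first, then sequence identity.
        score = (float(attrs.get("coverage", 0)), float(attrs.get("sequence_ID", 0)))
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        if key not in best or score > best[key][0]:
            best[key] = (score, line)
    return [line for _, line in best.values()]
```

A complete filter would also drop the child features (mRNA, exon, CDS) of every discarded gene by following their `Parent` attributes. That step is omitted here for brevity.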

The genomic annotation was provided by Niab along with the initial assembly submission.

Small RNA features, protein features, BLAST hits and cross-references have been computed by Ensembl Plants.

References

Alaux, M. et al. (2018). Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data. Genome Biology, 19:111.
Appels, R., Eversole, K., Feuillet, C., et al. (IWGSC) (2018). Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science, 361:eaar7191.
Bushnell, B., Rood, J., & Singer, E. (2017). BBMerge - Accurate paired shotgun read merging via overlap. PLoS ONE, 12(10):e0185056. https://doi.org/10.1371/journal.pone.0185056
Clavijo, B. J. et al. (2017). W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data. bioRxiv [Preprint]. https://doi.org/10.1101/110999
Danecek, P., Bonfield, J. K., Liddle, J., et al. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. https://doi.org/10.1093/gigascience/giab008
Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
Monat, C., Padmarasu, S., Lux, T., et al. (2019). TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biology, 20, 284. https://doi.org/10.1186/s13059-019-1899-5
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. https://doi.org/10.1093/bioinformatics/btq033
Shumate, A., & Salzberg, S. L. (2021). Liftoff: accurate mapping of gene annotations. Bioinformatics, 37(12), 1639–1643. https://doi.org/10.1093/bioinformatics/btaa1016
Walkowiak, S., Gao, L., Monat, C., et al. (2020). Multiple wheat genomes reveal global variation in modern breeding. Nature, 588, 277–283. https://doi.org/10.1038/s41586-020-2961-x

Statistics

Summary

Assembly	GCA951799155v1, INSDC Assembly GCA_951799155.1
Database version	115.1
Golden Path Length	15,334,867,051
Genebuild by	NIAB
Genebuild method	Import
Data source	NIAB

Gene counts

Coding genes	103,027
Non coding genes	13
Small non coding genes	13
Pseudogenes	3,746
Gene transcripts	131,360

Coding genes: genes/transcripts that contain an open reading frame (ORF).
Pseudogenes: genes that have homology to known protein-coding genes but contain a frameshift and/or stop codon(s) that disrupt the ORF. They are thought to have arisen through duplication followed by loss of function.
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed and the introns are then spliced out. Transcripts may or may not encode a protein.
