Trifolium pratense (Trpr)

Trifolium pratense Assembly and Gene Annotation

The red clover genome is a joint effort of TGAC (The Genome Analysis Centre, now the Earlham Institute) and IBERS (Institute of Biological, Environmental and Rural Sciences, Aberystwyth University). This work was funded by an Institute Programme Grant to IBERS (BB/J004405/1) from the Biotechnology and Biological Sciences Research Council (BBSRC), by the ERANET Plant Genomics programme (ERAPG038A-TRANSLEG) and by a Capacity, Capability Challenge Programme from TGAC.

About Trifolium pratense

Red clover (Trifolium pratense) is one of the most important forage legume crops in temperate agriculture and a key component of the sustainable intensification of livestock farming systems. Its gametophytic self-incompatibility system forces outcrossing, so red clover is a highly heterozygous diploid (2n = 14) species.


This assembly from the Earlham Institute provides a chromosome-scale reference draft genome for a red clover genotype of the variety Milvus (Milvus B). It integrates whole-genome sequencing (WGS) short reads, Sanger bacterial artificial chromosome (BAC) end sequences, a physical map and two genetic maps. The WGS reads, from paired-end and mate-pair libraries, were assembled with the Platanus assembler. Three BAC libraries were built from high-molecular-weight DNA of the same Milvus B genotype. The mapping population consisted of 188 F1 progeny from a cross between a genotype of the variety Milvus and a genotype of the variety Britta. Of the 1,388 markers in the two genetic maps, 1,031 were aligned to the assembly and used to place 532 of the longest scaffolds; the BAC-end sequences were then used as markers to link unplaced scaffolds to already placed scaffolds from the same physical-map contig. The physical map contained 29,730 BACs, of which almost 23,000 (77.3%) were assembled into contigs.
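The two-step placement described above (genetic markers first, then BAC-end links within physical-map contigs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for this assembly: the data layout, the toy marker and scaffold names, the majority vote across linkage groups and the median-cM ordering are all hypothetical choices.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy data, not taken from the red clover maps.
# genetic_map: marker -> (linkage group, position in cM)
genetic_map = {
    "m1": ("LG1", 0.0), "m2": ("LG1", 12.5), "m3": ("LG1", 30.2),
    "m4": ("LG2", 5.0), "m5": ("LG2", 18.7), "m6": ("LG1", 45.0),
}
# marker_hits: marker -> scaffold its sequence aligned to
marker_hits = {
    "m1": "scaf_A", "m2": "scaf_A", "m3": "scaf_B",
    "m4": "scaf_C", "m5": "scaf_C", "m6": "scaf_B",
}
# bac_contigs: physical-map contig -> scaffolds hit by its BAC-end sequences
bac_contigs = {"ctg7": ["scaf_B", "scaf_D"]}


def anchor_scaffolds(genetic_map, marker_hits):
    """Place each scaffold on its majority linkage group, ordered by median cM."""
    hits_per_scaffold = defaultdict(list)
    for marker, scaffold in marker_hits.items():
        if marker in genetic_map:
            hits_per_scaffold[scaffold].append(genetic_map[marker])
    placements = defaultdict(list)
    for scaffold, hits in hits_per_scaffold.items():
        # Majority vote across markers resolves conflicting linkage groups.
        lg = Counter(group for group, _ in hits).most_common(1)[0][0]
        cms = [cm for group, cm in hits if group == lg]
        placements[lg].append((median(cms), scaffold))
    # Sort scaffolds along each linkage group by median genetic position.
    return {lg: [s for _, s in sorted(items)] for lg, items in placements.items()}


def link_unplaced(placed, bac_contigs):
    """Give an unplaced scaffold the linkage group of placed scaffolds in its BAC contig."""
    scaffold_lg = {s: lg for lg, scaffolds in placed.items() for s in scaffolds}
    linked = {}
    for scaffolds in bac_contigs.values():
        groups = {scaffold_lg[s] for s in scaffolds if s in scaffold_lg}
        if len(groups) != 1:
            continue  # skip contigs with no placed scaffold or conflicting groups
        lg = groups.pop()
        for s in scaffolds:
            if s not in scaffold_lg:
                linked[s] = lg
    return linked


placed = anchor_scaffolds(genetic_map, marker_hits)
print(placed)                              # {'LG1': ['scaf_A', 'scaf_B'], 'LG2': ['scaf_C']}
print(link_unplaced(placed, bac_contigs))  # {'scaf_D': 'LG1'}
```

The sketch only assigns scaffolds to a linkage group and orders them; it does not infer scaffold orientation, which a real pipeline would also need.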


The genome was annotated by De Vega et al. (2015). Repeats were masked using RepeatMasker with the RepBase database and LTRharvest. Gene models were annotated by combining RNA-seq data, ab initio predictions and homologous transcripts from soybean, common bean and Medicago truncatula.
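Masked repeats are often represented as soft-masked (lower-case) sequence, so downstream gene predictors can still see the bases while treating them as repeats. The sketch below assumes repeat coordinates have already been parsed (for example, from RepeatMasker output) into 0-based, half-open intervals; the function name and interval convention are illustrative, and the source does not say whether soft or hard masking was used for this annotation.

```python
def soft_mask(sequence, repeats):
    """Lower-case every base covered by a 0-based, half-open repeat interval."""
    bases = list(sequence.upper())
    for start, end in repeats:
        # Clip intervals to the sequence bounds; overlapping intervals are harmless.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


print(soft_mask("ACGTACGTAC", [(2, 5), (4, 7)]))  # ACgtacgTAC
```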

Repeated sequences were called with the Repeat Detector, part of the Ensembl Genomes repeat feature pipelines. Repeat length: 91,429,549 bp; repeat content: 30%.
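The repeat content is the total repeat length divided by the assembly's golden path length (304,842,038 bp). A minimal sketch, assuming repeat features on a single sequence are available as half-open intervals. Overlapping features must be merged first so that no base is counted twice. The helper names are hypothetical.

```python
def merged_length(intervals):
    """Total bases covered by half-open intervals, counting overlaps once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_content(repeat_bp, genome_bp):
    """Percentage of the genome covered by repeats."""
    return 100 * repeat_bp / genome_bp


print(merged_length([(0, 10), (5, 15), (20, 25)]))       # 20
print(f"{repeat_content(91_429_549, 304_842_038):.1f}%")  # 30.0%
```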


  1. Red clover (Trifolium pratense L.) draft genome provides a platform for trait improvement.
    De Vega JJ, Ayling S, Hegarty M, Kudrna D, Goicoechea JL, Ergon Å, Rognli OA, Jones C, Swain M, Geurts R et al. 2015. Sci Rep. 5:17394.

Picture credit: By Masaki Ikeda - Own work, CC BY-SA 3.0

More information

Assembly: Trpr, INSDC Assembly GCA_900079335.1, May 2016
Database version: 109.1
Golden Path Length: 304,842,038 bp
Genebuild by: Earlham
Genebuild method: Import
Data source: Earlham Institute

Gene counts

Coding genes: 39,948
Non-coding genes: 1,183
Small non-coding genes: 1,172
Long non-coding genes: 11
Gene transcripts: 42,485
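The gene counts are internally consistent: the small and long non-coding categories sum to the non-coding total. A quick check follows. It also derives the overall gene count and the mean number of transcripts per gene. These derived values are not reported on this page, and they assume the transcript total covers all the gene types listed.

```python
# Gene counts as listed on this page.
coding = 39_948
small_non_coding = 1_172
long_non_coding = 11
reported_non_coding = 1_183
transcripts = 42_485

non_coding = small_non_coding + long_non_coding
assert non_coding == reported_non_coding  # sub-categories sum to the total

# Derived values (not reported on the page): all genes and transcripts per gene.
total_genes = coding + non_coding
print(total_genes, round(transcripts / total_genes, 2))  # 41131 1.03
```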