Lupinus angustifolius Assembly and Gene Annotation
About Lupinus angustifolius
Lupins are grain legumes that form an integral part of sustainable farming systems and have been an important part of the human diet for thousands of years. Planted in rotation with cereal crops, lupins reduce the need for nitrogenous fertiliser, provide valuable disease breaks and boost cereal yields. Lupins thrive on low-nutrient soils because they fix atmospheric nitrogen in symbiosis with beneficial bacteria and take up phosphorus from soils efficiently. Consequently, they are effective ecological pioneers, able to colonise extremely impoverished soils such as coastal sand dunes and new lava soils laid down by recent volcanic eruptions. Narrow-leafed lupin (Lupinus angustifolius) is gaining popularity as a health food: it is high in protein and dietary fibre, low in starch, and gluten-free.
Assembly
The lupin genome assembly was produced by the Commonwealth Scientific and Industrial Research Organisation (CSIRO). The haploid genome size of narrow-leafed lupin (NLL) was previously estimated by flow cytometry at 924 Mb, and k-mer-based estimation predicted a similar value of 951 Mb. An initial assembly of the genome of cultivar Tanjil, using only paired-end Illumina data, produced 191,701 scaffolds totalling 521 Mb, with an N50 of 10,137 scaffolds and an N50 length of 13.8 kb. The assembly was then improved by scaffolding with additional paired-end, mate-pair and BAC-end data, for a combined average coverage of 162.8x. This produced a contig assembly of 1,068,669 contigs totalling 810 Mb, or 85% of the k-mer-based genome size estimate. After removal of scaffolds shorter than 200 bp, the final scaffold assembly comprised 14,379 scaffolds totalling 609 Mb, with a contig N50 length of 45,646 bp, a scaffold N50 of 232 scaffolds and a scaffold N50 length of 703 kb.
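The assembly statistics above report both an N50 (a count of sequences) and an N50 length. As a minimal sketch of how these two values are conventionally derived from a list of sequence lengths, the Python function below sorts the lengths from longest to shortest and accumulates them until half of the total assembly size is reached. The function name `n50_stats` and the toy input are illustrative assumptions; this is not the tooling used to produce the figures quoted here.

```python
def n50_stats(lengths):
    """Return (n50_count, n50_length) for a collection of sequence lengths.

    n50_length: length of the sequence at which the running total, taken from
                longest to shortest, first reaches half of the assembly size.
    n50_count:  number of sequences needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count, length
    raise ValueError("no sequence lengths supplied")


# Toy example (hypothetical lengths, not lupin data): total = 32, half = 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50_stats(toy_lengths))  # (2, 8)
```

In the toy example the two longest sequences (8 + 8) already cover half of the total, so the N50 count is 2 and the N50 length is 8.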
Annotation
A total of 33,074 protein-coding genes were annotated by the Commonwealth Scientific and Industrial Research Organisation by combining evidence from transcriptome alignments derived from five tissue types (leaf, stem, root, flower and seed), protein homology and in silico gene prediction. Additionally, peptide data from proteomic analysis of leaf, seed, stem and root samples were mapped to both the translated gene annotations and the 6-frame translation of the whole-genome assembly. Proteogenomic comparison of the peptide mapping with the gene annotation supported between 94 and 1,134 annotations per tissue type, and provided valuable information on tissue localisation for the products of these genes. InterPro terms were the most informative functional annotation.
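The 6-frame translation mentioned above can be illustrated with a short, self-contained Python sketch: a DNA sequence is translated in three reading frames on the forward strand and three on the reverse complement, giving every protein sequence the genome could encode regardless of existing gene models. The function names are hypothetical, the standard genetic code is assumed, and this is not the proteogenomics pipeline used for the lupin annotation.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; '*' is a stop, 'X' an ambiguous codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


# Toy sequence (hypothetical, not lupin data).
for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(frame, protein)
```

For the toy sequence `ATGGCCTAA`, frame +1 gives `MA*` and frame -1 (read from the reverse complement `TTAGGCCAT`) gives `LGH`. Peptides identified by mass spectrometry can then be searched against all six translations as well as against the annotated proteins.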
Repeated sequences were identified with the Repeat Detector, which is part of the Ensembl Genomes repeat feature pipelines. Total repeat length: 259,903,442 bp; repeat content: 42.6%.
References
- A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant-microbe interactions and legume evolution. James K. Hane, Yao Ming, Lars G. Kamphuis, Matthew N. Nelson, Gagan Garg, Craig A. Atkins, Philipp E. Bayer, Armando Bravo, Scott Bringans, Steven Cannon et al. 2016. Plant Biotechnology Journal. 15:318-330.
General information about this species can be found in Wikipedia.
|Assembly||LupAngTanjil_v1.0, INSDC Assembly GCA_001865875.1, Nov 2016|
|Golden Path Length||609,203,021 bp|
|Data source||Commonwealth Scientific and Industrial Research Organisation|
|Non-coding genes||2,096|
|Small non-coding genes||2,053|
|Long non-coding genes||43|