Linum usitatissimum (Flax)

About the genome:


This release of Phytozome includes the v1.0 annotation of Linum usitatissimum produced by BGI, graciously provided by Mike Deyholos from the University of Alberta (

Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94x raw, approximately 69x filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. (Wang et al., Plant J. 2012, 72:461-473)

Important note on data sources

The v1.0 annotation is on assembly v1.0

Other tracks on the Phytozome flax browser are:

  1. Linum usitatissimum ESTs from GenBank, aligned using the Program to Assemble Spliced Alignments (PASA, Haas et al.)
  2. Alignment of proteins from other plants in the Phytozome database (including poplar, soybean, Medicago, common bean, and other related Eurosids), using BLATX (Kent)


The main genome assembly is approximately 318.3 Mb arranged in 88,420 scaffolds
Approximately 302.2 Mb arranged in 110,390 contigs (~5.0% gap)
Scaffold N50 (L50) = 132 (693.5 kb)
Contig N50 (L50) = 3,593 (24.9 kb)
664 scaffolds are > 50 kb in size, representing approximately 91.3% of the genome
26,374 total loci containing protein-coding transcripts
Alternative Transcripts
4,347 total alternatively spliced transcripts
43,471 genes and 43,484 transcripts, where transcripts whose CDS overlap on the same strand are grouped into genes
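
The scaffold and contig figures above pair a sequence count with a length (for example, 132 scaffolds and 693.5 kb). As a rough illustration of how such a pair is usually derived, the Python sketch below sorts sequence lengths from longest to shortest and accumulates them until half of the total assembly length is reached. The function names `n50_stats` and `gap_percent` and the toy lengths are assumptions made for this example; this is not the pipeline used to produce the statistics on this page.

```python
def n50_stats(lengths):
    """Return (count, length) for a list of sequence lengths.

    count  = smallest number of sequences, taken longest first, whose
             combined length reaches at least half of the total length
    length = length of the last (shortest) sequence in that set
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length


def gap_percent(scaffold_total, contig_total):
    """Percentage of the scaffolded length not covered by contig sequence."""
    return 100.0 * (scaffold_total - contig_total) / scaffold_total


# Hypothetical toy assembly: total length 300, so the halfway point is 150.
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50_stats(toy))

# Rounded totals from this page, in Mb: 318.3 scaffolded vs 302.2 in contigs.
print(gap_percent(318.3, 302.2))
```

Applied to the rounded totals listed above, `gap_percent` gives a value close to the reported gap fraction of about 5%; the small difference comes from rounding in the published totals.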
