Setaria italica (Foxtail millet)
About the genome:
Foxtail millet (Setaria italica) is a diploid grass with a relatively small genome (~515 Mb). It is an important grain crop in temperate, subtropical, and tropical Asia and in parts of southern Europe, and is grown for forage in North America, South America, Australia, and North Africa. The genetic map of foxtail millet is highly colinear with that of rice, even though these lineages last shared a common ancestor more than 50 million years ago. Hence, comparison of the rice and foxtail millet genomes will facilitate reconstruction of the ancestral grass genome. Most importantly, foxtail millet is a close relative of an important biofuel crop, switchgrass (Panicum virgatum). It is also closely related to pearl millet (Pennisetum glaucum), which is under investigation as a biofuel grain feedstock in regions unsuitable for maize cultivation, and to napiergrass (Pennisetum purpureum), a grass with biofuel potential in hot, humid regions such as the southeastern United States. Switchgrass is a polyploid species with a large genome that will not be an easy target for full genome sequence analysis. However, switchgrass and foxtail millet are both temperate C4 grass species (C3 and C4 denote different pathways of photosynthetic carbon fixation in plants), so foxtail millet should share many genetic and physiological processes with switchgrass. Hence, foxtail millet should serve as an excellent surrogate genome to assist future study and improvement of switchgrass and related biofuel crops.
Principal Investigators: J.L. Bennetzen (Maize@uga.edu), K.M. Devos (email@example.com), A.N. Doust, E.A. Kellogg, D. Ware, and J. Zale
(from JGI - The Joint Genome Institute).
The current annotation is version 2.1. Transcript assemblies were constructed using PASA from ~1.28 million Setaria italica EST reads sequenced at JGI against the 8.3X version 2.0 release of the Setaria italica genome. Loci were determined by BLAT alignments of these transcript assemblies and/or BLASTX alignments of proteins from the sorghum, rice, Arabidopsis thaliana, and grapevine genomes to the S. italica genome, following soft-masking of the genome with consensus repeat families provided by Hao Wang and Jeff Bennetzen. Gene models were predicted by the homology-based predictors FGENESH+ and GenomeScan. The best prediction at each locus was selected based on protein coverage and homology, as well as intron/exon junctions and EST coverage. These transcripts were UTR-extended and/or improved by PASA to match the EST evidence. The final gene set was selected based on EST support or protein homology support, subject to filtering of repeats/transposable elements.
Statistics
This is the chromosome-scale release of the 8.3x whole genome shotgun assembly of Setaria italica. The first 9 scaffolds are pseudomolecules on which over 98.9% of the sequence data could be placed. The mapping data for the Setaria italica chromosomes were generated by Katrien Devos and Xuewen Wang. The telomeric signature for foxtail millet is "AAACCCT". The following chromosome arms carry telomeric signatures: 1Q, 2Q, 3P, 4P, 4Q, 5P, 6P, 7Q, 8Q, 9P.
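As a rough illustration of how a telomeric signature such as "AAACCCT" could be looked for at the ends of a pseudomolecule, the sketch below counts the signature and its reverse complement in a window at each end of a sequence. The function names, the window size, and the toy sequence are assumptions made for this example; this is not the procedure JGI used to call the signatures listed above.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_hits(sequence, signature="AAACCCT", window=200):
    """Count signature hits (either strand) near each end of a sequence.

    `window` is an arbitrary, illustrative number of bases to inspect at
    each end; it must be a positive integer.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    rc = reverse_complement(signature)

    def hits(region):
        # str.count counts non-overlapping occurrences.
        return region.count(signature) + region.count(rc)

    return {"start": hits(seq[:window]), "end": hits(seq[-window:])}


# Toy sequence (not real Setaria data): repeats at both ends.
toy = "AAACCCT" * 5 + "ACGTACGTGG" * 10 + "AGGGTTT" * 3
result = telomere_hits(toy, window=35)
```

On real data, a handful of scattered hits would not be convincing; one would more plausibly look for long tandem runs of the repeat right at the sequence end.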
- The main genome assembly is approximately 405.7 Mb arranged in 336 scaffolds
- Approximately 400.9 Mb are arranged in 6791 contigs (~ 1.2% gap)
- Scaffold N50 (L50) = 4 (47.3 Mb)
- Contig N50 (L50) = 982 (126.3 Kb)
- 98.9% of the sequence data is represented in the 9 pseudomolecules
- 35,471 loci containing protein-coding transcripts
- 40,599 protein-coding transcripts
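The scaffold and contig figures above can be related with a small calculation. The sketch below, which uses hypothetical function names and toy lengths, returns both the number of sequences needed to reach half of the total assembly length and the length of the last sequence added. This page labels the count "N50" and the length "L50"; many other tools use those two labels the other way round, so the function simply returns both values.

```python
def half_assembly_stats(lengths):
    """Return (count, length) for a list of sequence lengths.

    count:  smallest number of sequences, taken longest first, whose summed
            length reaches at least half of the total length.
    length: length of the last sequence added to reach that point.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length


def gap_percent(scaffold_total, contig_total):
    """Share of the scaffolded assembly that is gap, in percent."""
    return 100.0 * (scaffold_total - contig_total) / scaffold_total


# Toy lengths, not the real Setaria scaffolds.
toy_count, toy_length = half_assembly_stats([50, 40, 30, 20, 10])

# Totals quoted in the list above, in Mb.
gap = gap_percent(405.7, 400.9)
```

With the totals quoted above (405.7 Mb in scaffolds, 400.9 Mb in contigs), the gap fraction works out to roughly 1.2%, matching the figure in the list.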
Sequence use restrictions
As a public service, the completed Setaria italica genome sequence is being made available by the Department of Energy's Joint Genome Institute (JGI) before scientific publication according to the Ft. Lauderdale Accord. This balances the imperative of the DOE and the JGI that the data from its sequencing projects be made available as soon and as completely as possible with the desire of contributing scientists and the JGI to reserve a reasonable period of time to publish on the genome sequencing and analysis without concerns about preemption by other groups.
JGI policy is that early release should aid the progress of science. By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole-genome or chromosome scale prior to publication by JGI and/or its collaborators of a comprehensive genome analysis ("Reserved Analyses"). "Reserved Analyses" include the identification of complete (whole-genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, or any other genome feature, and whole-genome- or chromosome-scale comparisons with other species.
The embargo on publication of Reserved Analyses by researchers outside of the Setaria italica Genome Sequencing Project is expected to extend until the publication of the results of the sequencing project is accepted. Scientific users are free to publish papers dealing with specific genes or small sets of genes using the sequence data. If these data are used for publication, the following acknowledgment should be included: 'These sequence data were produced by the US Department of Energy Joint Genome Institute'. This letter has been circulated to Journal Editors so that they are aware of the conditions of access and publication detailed above.
These data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. The assembly and sequence data should not be redistributed or repackaged without permission from the JGI. Any redistribution of the data during the embargo period should carry this notice: "The Joint Genome Institute provides these data in good faith, but makes no warranty, expressed or implied, nor assumes any legal liability or responsibility for any purpose for which the data are used. Once the sequence is moved to unreserved status, the data will be freely available for any subsequent use."