Volvox carteri (Volvox)
About the genome:
Volvox carteri is a multicellular green alga, closely related to the single-celled Chlamydomonas reinhardtii. Both algae belong to the chlorophytes, a group of highly adaptable species that live in many different environments throughout the world.
This is a release of the version 2, 8x Volvox carteri genome assembly and annotation. The genome was sequenced from high-quality genomic DNA prepared from a vegetative culture of female Volvox carteri f. nagariensis, Eve, using a whole-genome shotgun (WGS) strategy that generated paired-end reads from libraries with insert sizes of 2-3 kb, 6-8 kb and 35-40 kb. Compared to v.1, this release adds 30 plates of BES, assembly with the latest version of Arachne, and extensive manual screening of organelle and redundant scaffolds.
- The main genome assembly is approximately 131.2 Mb arranged in 434 scaffolds
- Approximately 125.4 Mb arranged in 4,100 contigs (~ 4.4% gap)
- Scaffold N50 (L50) = 15 (2.6 Mb)
- Contig N50 (L50) = 410 (85.2 kb)
- 100 scaffolds are > 50 kb in size, representing approximately 98.5% of the genome
- 14,971 total loci containing protein-coding transcripts
- Alternative Transcripts
- 314 total alternatively spliced transcripts
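The scaffold and contig statistics above can be reproduced from a list of sequence lengths. The sketch below is illustrative only: the function names and the toy lengths are assumptions, not part of the release. It follows this page's convention, in which N50 is the number of sequences and L50 is the length; many other tools use the two names the other way around.

```python
def n50_l50(lengths):
    """Return (N50, L50) in this page's convention.

    N50 = number of largest sequences needed to cover at least half
          of the total assembly length.
    L50 = length of the smallest sequence in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("lengths must be non-empty")


def gap_fraction(scaffold_total, contig_total):
    """Fraction of the scaffolded assembly that is gap rather than contig sequence."""
    return (scaffold_total - contig_total) / scaffold_total


# Toy lengths (hypothetical, not Volvox data).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50, toy_l50 = n50_l50(toy_lengths)

# Totals quoted in the list above, in Mb.
release_gap = gap_fraction(131.2, 125.4)
```

With the totals quoted above, `release_gap` comes out close to the ~4.4% gap figure in the list.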
How was the genome sequenced?
- How was the assembly generated?
- The genome was assembled with Arachne by Jeremy Schmutz and Jerry Jenkins at HudsonAlpha.
- How were repeats identified?
- A de novo repeat library was made by running RepeatModeler (Arian Smit, Robert Hubley) on the genome assembly to produce a library of repeat sequences. Sequences with Pfam domains associated with non-TE functions were removed. Hand-curated libraries of Volvox and Chlamydomonas repeats (Vladimir Kapitonov, pers. comm.) were added to the de novo repeats to make a custom repeat library of 597 sequences. This library was then used to mask 22.6% of the genome with RepeatMasker.
- How were ESTs aligned?
- We downloaded 132,038 Volvox carteri EST sequences from the NR database at NCBI on November 28, 2011. The vast majority were generated by the JGI as part of the Volvox genome project. These were assembled into 13,344 transcripts with PASA (Brian Haas).
- How were plant proteins aligned?
- Chlamydomonas and v.1 Volvox proteins were obtained from previous JGI annotations. All proteins were aligned to the soft-masked genome using gapped BLASTX; high-scoring sequence pairs (HSPs) are shown in GBrowse. Note that gapped BLAST was used to increase sensitivity, so in many cases an HSP (shown in orange in GBrowse) spans adjacent exons and the intervening intron(s). Also, small exons are often missed.
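The repeat-masking answer above reports the share of the genome that was masked, and the protein alignments used a soft-masked genome. As a rough sketch, that share can be checked on a soft-masked FASTA file by counting lowercase bases. This assumes masked bases are written in lowercase, which is the usual soft-masking convention; the function name and the toy records are hypothetical. Whether the 22.6% figure counts gap bases in the denominator is not stated on this page, so this sketch simply counts every sequence character.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of sequence characters that are lowercase (soft-masked)."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


# Toy record (hypothetical): 4 masked bases out of 20.
toy_fasta = [">scaffold_1", "ACGTacgtAC", "GGGGGGGGGG"]
toy_fraction = masked_fraction(toy_fasta)
```

For a real file, the same function can be called on an open file handle, since it only iterates over lines.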
How did you determine the Volvox carteri gene set?
- Gene prediction
- To produce the Volvox v2.0 gene set, we used the homology-based gene prediction program FgenesH, which integrates EST evidence into the gene predictions. The best gene prediction at each locus was picked and integrated with RNA-Seq transcript assemblies using the PASA program (see above). The gene set shown on the browser was generated from the above input gene models by Simon Prochnik at JGI. The gene prediction pipeline has the following components: proteins from diverse angiosperms and PASA EST assemblies were aligned to the genome (proteins were first aligned by BLASTX, and the BLASTX-aligned proteins were then further aligned by EXONERATE), and their overlaps were used to define putative protein-coding gene loci. The corresponding genomic regions were extended by up to 2 kb in each direction and submitted to GenomeScan and/or FgenesH (provided by Asaf Salamov at JGI), along with related angiosperm proteins and/or ORFs from the overlapping EST assemblies. GenomeScan and FgenesH identify likely protein-coding exons, favoring regions that align well to the given homologous proteins. These predictions were integrated with expressed sequence information using PASA (Haas et al. 2003) against the PASA EST assemblies. The results were filtered to remove genes identified as transposon-related.
- How come my gene is wrong?
- GenomeScan and FgenesH are good gene predictors but, like all computational gene modeling algorithms, they are imperfect. In addition, EST and cDNA data are often incomplete. For these reasons, there can be errors in gene models. We hope that the inconvenience of having an imperfect gene set is partially compensated by the rapid release of the data. Future gene sets will improve as assembly quality improves along with expressed sequence data and genomic data from related species. But the lesson from the annotation of other well-curated genomes like Arabidopsis and rice is that it can take years to fine-tune a gene set, even given a high-quality genome assembly.
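The locus-definition step in the gene prediction answer above (overlapping protein and EST alignments define a locus, which is then extended by up to 2 kb in each direction) can be illustrated with a simple interval merge. This is a minimal sketch under stated assumptions, not the JGI pipeline's actual code: the function name, the 1-based inclusive coordinates, the clipping at scaffold ends, and the toy alignments are all hypothetical, and strand is ignored.

```python
def define_loci(alignments, scaffold_length, flank=2000):
    """Merge overlapping alignments on one scaffold and extend each merged region.

    alignments: iterable of (start, end) pairs, 1-based inclusive (assumed).
    flank: extension in bases on each side; clipped at the scaffold ends,
           which is one reading of "up to 2 kb".
    """
    merged = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlaps the current locus: grow it if this alignment ends later.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [
        (max(1, start - flank), min(scaffold_length, end + flank))
        for start, end in merged
    ]


# Toy alignments (hypothetical): the first two overlap, the third stands alone.
toy_alignments = [(5000, 6000), (5500, 7000), (20000, 21000)]
toy_loci = define_loci(toy_alignments, scaffold_length=22000)
```

In this toy case the first two alignments collapse into one locus, and the second locus is clipped at the scaffold end rather than extended by the full 2 kb.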
What can I do with the Volvox carteri dataset?
- I would like to use this data to help clone a gene, analyse a gene family, etc.
- Wonderful! Please feel free to use this data to advance your studies of Volvox. Please cite the genome paper: Prochnik, S. E., Umen, J., Nedelcu, A. M., Hallmann, A., Miller, S. M., Nishii, I., Ferris, P., et al. (2010). Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science, 329(5988), 223-226. doi:10.1126/science.1188800
- I think I found an error. What should I do?
- If you would like to bring any items to our attention, please send email to email@example.com.
- I would like to do a large-scale comparison of Volvox carteri to other genomes, and/or a global analysis of its gene content.
- The data in this release is freely available. Please cite the genome paper: Prochnik, S. E., Umen, J., Nedelcu, A. M., Hallmann, A., Miller, S. M., Nishii, I., Ferris, P., et al. (2010). Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science, 329(5988), 223-226. doi:10.1126/science.1188800. Please also note that you downloaded the v.2 data from www.phytozome.net/volvox
The corresponding author of the Volvox genome project is
James Umen (PI, Donald Danforth Center) jumen at danforthcenter dot org
A Volvocales genome project is being coordinated by
Brad Olson (Kansas State University) bjsco at k-state dot edu