Aquilegia coerulea (Colorado blue columbine)
About the genome:
A central goal of biology is to understand the natural genetic variation that is responsible for environmental adaptations, leading to species and higher-order taxa. In order to understand the key features of angiosperm (flowering plant) evolution, we need genomic resources for model organisms from lineages reaching far back toward the base of the evolutionary tree. Aquilegia is a member of the basal-most eudicot clade (Ranunculales) and thus is positioned nearly equidistant between the current model systems Arabidopsis and rice. The genus has been used in numerous ecological and evolutionary studies, including speciation due to pollinator shifts, specialization for soil type, mating system evolution, floral development, and adaptive radiation (adaptation of different forms of organisms to different living conditions). The genus is especially well known for its diversity of floral form associated with different pollinators. In addition, nearly all species can be crossed to produce fertile hybrids, making the genetic dissection of this vast diversity possible.
Having the genome sequence for a species representing such a crucial evolutionary node in angiosperm evolution will greatly enhance our understanding of how plant genomes evolve. Furthermore, this sequence will lead to a much deeper understanding of the evolution of morphological, physiological, reproductive, and biochemical innovations found among angiosperms. In addition, Aquilegia offers the opportunity to understand how plants adapt to both biotic and abiotic factors in the environment. Such studies will be especially important for understanding how plants adapt, at the molecular level, to a changing environment such as that resulting from global climate change. For example, A. formosa has a range from Alaska to southern California and spans elevations from sea level to over 10,000 feet, allowing opportunities to study adaptation to different light, temperature, and rainfall extremes. New species of Aquilegia have colonized these different habitats following the major climate changes occurring after the last ice age.
(from JGI - The Joint Genome Institute).
Transcript assemblies were constructed using PASA from ~6 million 454 ESTs sequenced from A. coerulea Goldsmith at JGI and ~115,000 ESTs from related Aquilegia species (~85,000 ESTs from the Aquilegia formosa X pubescens cDNA library developed by Scott Hodges and ~30,000 Sanger sequences of A. formosa sequenced at JGI), aligned against the 8X unmapped release of the Aquilegia coerulea Goldsmith genome. Loci were determined by BLAT alignments of the above transcript assemblies and/or BLASTX alignments of proteins from the Arabidopsis thaliana, rice, soybean, and grape genomes to the A. coerulea genome, following soft-masking of the genome for consensus repeat families predicted de novo by RepeatModeler. Gene models were predicted by homology-based predictors, mainly FGENESH+, with GenomeScan used where FGENESH+ produced no model at the locus. Predicted genes were UTR-extended and/or improved by PASA. The final gene set was selected on the basis of EST support or protein homology support, and then filtered to remove repeats and transposable elements.
This is a release of the initial 8X unmapped Aquilegia coerulea Goldsmith genome assembly and the version 1.1 annotation.
- The main genome assembly is approximately 302Mb arranged in 971 scaffolds
- Approximately 293.1Mb arranged in 7193 contigs (~ 2.9% gap)
- Scaffold N50 (L50) = 22 (4.2 Mb)
- Contig N50 (L50) = 713 (121.8 Kb)
- 156 scaffolds are > 50kb in size, representing approximately 97.8% of the genome
- 24,823 loci containing protein-coding transcripts
- 41,063 protein-coding transcripts
- The increase to 16,240 alternatively spliced transcripts is attributed to the large increase in input EST sequences compared to the v1.0 annotation
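The N50/L50 figures in the list above can be reproduced from a list of scaffold or contig lengths. The following is a minimal sketch, not the JGI pipeline itself; the function name and the example lengths are hypothetical. It follows the labelling used in this release, where the first number is the count of sequences needed to reach half of the assembly and the number in parentheses is the length of the sequence at which that point is reached.

```python
from typing import Iterable, Tuple


def half_assembly_stats(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (count, length) at the point where the longest sequences
    first cover at least half of the total assembly length.

    Hypothetical helper: 'count' corresponds to the first number quoted
    in the statistics list, 'length' to the number in parentheses.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequence first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no sequence lengths supplied")


# Made-up lengths, for illustration only.
example_lengths = [10, 8, 6, 4, 2]
count, length = half_assembly_stats(example_lengths)
print(count, length)
```

Sorting in descending order first is what makes the running total meaningful: the statistic describes how few of the longest sequences are needed to span half of the assembly.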
For the v1.1 annotation release, accession IDs and transcript names have been updated to better reflect gene locus location in the current v1.0 assembly. For example, accession ID Aquca_002_00131.1 is the primary transcript of gene locus Aquca_002_00131, and is the 131st consecutive locus numbered from left to right on reference sequence scaffold_2.
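The naming scheme described above can be unpacked programmatically. The sketch below is illustrative only: the regular expression and the field names are assumptions inferred from the single example Aquca_002_00131.1 and are not an official specification of the ID format.

```python
import re
from typing import NamedTuple, Optional


class AquilegiaId(NamedTuple):
    scaffold: int               # e.g. 2 for scaffold_2
    locus_index: int            # consecutive locus number along the scaffold
    transcript: Optional[int]   # None when the ID names a locus, not a transcript


# Assumed pattern: Aquca_<scaffold>_<locus>[.<transcript>]
_ID_PATTERN = re.compile(r"^Aquca_(\d+)_(\d+)(?:\.(\d+))?$")


def parse_accession(accession: str) -> AquilegiaId:
    """Split a v1.1-style accession ID into its numeric parts."""
    match = _ID_PATTERN.match(accession)
    if match is None:
        raise ValueError(f"not a v1.1-style accession ID: {accession!r}")
    scaffold, locus, transcript = match.groups()
    return AquilegiaId(
        scaffold=int(scaffold),
        locus_index=int(locus),
        transcript=int(transcript) if transcript is not None else None,
    )


print(parse_accession("Aquca_002_00131.1"))
print(parse_accession("Aquca_002_00131"))
```

Older v1.0 IDs such as AcoGoldSmith_v1.011582m do not match this pattern and raise a ValueError, which makes it easy to separate the two naming schemes in a mixed list.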
Where possible, v1.0 accession IDs in the form of AcoGoldSmith_v1.011582m have been mapped forward as follows: Version 1.0 annotated genes were aligned to the predicted Version 1.1 genes by BLAT with default parameters. Version 1.1 sequences with CDS-to-CDS BLAT results at >90% identity and >80% coverage and gene locus-to-locus BLAT results at >90% identity and >90% coverage were selected for further analysis. Reciprocal BLAT of this dataset against the set of all Version 1.0 annotated genes was performed to ensure that only mutual best hits were considered for annotation mapping. Version 1.0 accession IDs that met these criteria were mapped onto the corresponding Version 1.1 sequences. This process was iterated a total of 4 times, removing mapped mutual best hits at each step, in an attempt to better map alternative spliceforms. Out of 27,583 v1.0 models, 15,118 accession IDs (nearly 55%) were assigned to a v1.1 model under these stringent criteria. A comprehensive list of the old-to-new accession ID mappings is available on the A. coerulea FTP site as a file named Acoerulea_195_synonym.txt.
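The iterative mutual-best-hit mapping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the release: the Hit record, the function names, and the use of a single score to rank hits are all hypothetical, and only the CDS-to-CDS thresholds (>90% identity, >80% coverage) are modelled, not the additional locus-to-locus filter. Forward hits run from v1.0 IDs to v1.1 IDs, and reverse hits run the other way.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    target: str
    identity: float  # percent identity
    coverage: float  # percent coverage
    score: float     # alignment score used to rank hits (assumed)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the highest-scoring target for each query."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.query not in best or hit.score > best[hit.query].score:
            best[hit.query] = hit
    return {query: hit.target for query, hit in best.items()}


def map_ids(forward: List[Hit], reverse: List[Hit],
            min_identity: float = 90.0, min_coverage: float = 80.0,
            rounds: int = 4) -> Dict[str, str]:
    """Map old IDs to new IDs by mutual best hit, repeated for several rounds.

    After each round the mapped pairs are removed, so that the next-best
    candidates (e.g. alternative spliceforms) can pair up.
    """
    mapping: Dict[str, str] = {}
    for _ in range(rounds):
        used_new = set(mapping.values())
        fwd = [h for h in forward
               if h.identity > min_identity and h.coverage > min_coverage
               and h.query not in mapping and h.target not in used_new]
        rev = [h for h in reverse
               if h.query not in used_new and h.target not in mapping]
        fwd_best, rev_best = best_hits(fwd), best_hits(rev)
        new_pairs = {old: new for old, new in fwd_best.items()
                     if rev_best.get(new) == old}
        if not new_pairs:
            break  # nothing left to map
        mapping.update(new_pairs)
    return mapping


# Made-up hits, for illustration only.
forward = [Hit("old1", "newA", 99, 95, 1000), Hit("old2", "newA", 95, 90, 900),
           Hit("old2", "newB", 93, 85, 800), Hit("old3", "newC", 85, 95, 700)]
reverse = [Hit(h.target, h.query, h.identity, h.coverage, h.score) for h in forward]
print(map_ids(forward, reverse))
```

In this toy input, old2 loses newA to old1 in the first round and is only paired with newB in the second round, which is the behaviour the repeated passes are meant to provide; old3 is never mapped because its identity falls below the threshold.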
Sequence use restrictions
As a public service, the completed Aquilegia coerulea Goldsmith genome sequence is being made available by the Department of Energy's Joint Genome Institute (JGI) before scientific publication according to the Ft. Lauderdale Accord. This balances the imperative of the DOE and the JGI that the data from its sequencing projects be made available as soon and as completely as possible with the desire of contributing scientists and the JGI to reserve a reasonable period of time to publish on the genome sequencing and analysis without concerns about preemption by other groups.
JGI policy is that early release should aid the progress of science. By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole-genome or chromosome scale prior to publication by JGI and/or its collaborators of a comprehensive genome analysis ("Reserved Analyses"). Reserved Analyses include the identification of complete (whole-genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, or any other genome feature, and whole-genome- or chromosome-scale comparisons with other species.
The embargo on publication of Reserved Analyses by researchers outside of the Aquilegia Genome Sequencing Project is expected to extend until the publication of the results of the sequencing project is accepted. Scientific users are free to publish papers dealing with specific genes or small sets of genes using the sequence data. If these data are used for publication, the following acknowledgment should be included: 'These sequence data were produced by the US Department of Energy Joint Genome Institute'. This letter has been circulated to Journal Editors so that they are aware of the conditions of access and publication detailed above.
These data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. The assembly and sequence data should not be redistributed or repackaged without permission from the JGI. Any redistribution of the data during the embargo period should carry this notice: "The Joint Genome Institute provides these data in good faith, but makes no warranty, expressed or implied, nor assumes any legal liability or responsibility for any purpose for which the data are used. Once the sequence is moved to unreserved status, the data will be freely available for any subsequent use."
We prefer that potential users of this sequence assembly contact the Aquilegia Genome Sequencing Project co-directors with their plans to ensure that proposed usage of sequence data is not a Reserved Analysis:
Scott Hodges: hodges(at)lifesci.ucsb.edu (University of California, Santa Barbara)
Justin O. Borevitz: borevitz(at)uchicago.edu (University of Chicago)
Elena Kramer: ekramer(at)oeb.harvard.edu (Harvard University)
Magnus Nordborg: magnus.nordborg(at)gmi.oeaw.ac.at (Gregor Mendel Institute of Molecular Plant Biology)