Our long-term goal is to understand how internal and external perturbations affect the processes and networks that control plant growth and development. In this project, we start with data integration of the known relationships among genes, proteins and molecules (extracted from public databases and/or generated with predictive algorithms) as well as experimental measurements under many different treatments. We go beyond data integration to conceptual integration by using novel visualization techniques to render the multivariate information in visual formats that facilitate extraction of biological concepts. We also use mathematical and statistical methods to help summarize the data. We implement and combine these approaches in a system we term “VirtualPlant”. Although our project relates specifically to Arabidopsis, the data structures, algorithms, and visualization tools are designed in a species-independent way. Thus the informatics, mathematical, statistical and visualization tools that we develop can be used to model the cellular and physiological responses of any organism for which genomic data are available.
We have implemented a prototype that is already being actively and effectively used (http://www.virtualplant.org). Biologists and computer scientists alike use this tool for the purpose it was designed for: to support the analysis of original genomic data generated by the researchers themselves. We have found that working with experimental biologists, even from the very early stages of software development, is the most effective way to generate real solutions to the problems encountered by researchers in the laboratory.
Biomaps: Find biological themes (based on MIPS funcats or GO terms) in gene lists. This program analyzes the distribution of functional assignments (Gene Ontology or MIPS) for one or more lists of genes. It reports the terms that are over-represented in the list(s) provided, compared with the frequency of each term in a background population (e.g. the whole genome). Graphical and tabular output is provided to facilitate analysis and interpretation of the results.
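The idea of over-representation against a background can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis; the text above does not say which statistic Biomaps actually uses. The function names, the `alpha` cutoff, and the input layout (a dictionary from gene ID to a set of terms) are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def overrepresented_terms(gene_list, annotations, background, alpha=0.05):
    """Return (term, count_in_list, p_value) tuples, sorted by p-value.

    annotations: dict mapping gene ID -> set of functional terms
    (hypothetical layout, chosen for illustration only).
    """
    background = set(background)
    genes = set(gene_list) & background
    N, n = len(background), len(genes)

    # Count how often each term occurs in the background and in the list.
    bg_counts, list_counts = {}, {}
    for gene in background:
        for term in annotations.get(gene, ()):
            bg_counts[term] = bg_counts.get(term, 0) + 1
            if gene in genes:
                list_counts[term] = list_counts.get(term, 0) + 1

    results = []
    for term, k in list_counts.items():
        p = hypergeom_tail(k, n, bg_counts[term], N)
        if p <= alpha:
            results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 background genes, term "T1" on four of them, "T2" on all.
background = [f"g{i}" for i in range(20)]
annotations = {g: {"T2"} for g in background}
for g in ("g0", "g1", "g2", "g3"):
    annotations[g] = {"T1", "T2"}

hits = overrepresented_terms(["g0", "g1", "g2", "g10"], annotations, background)
```

In this toy case only "T1" is reported, because "T2" annotates every background gene and so cannot be enriched in any list.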
The BigPlant project uses a phylogenomic approach to produce a high-resolution phylogeny of the seed plants. The current seed plant matrix comprises 150 species (including 5 outgroup taxa) and over 10 million amino acid characters from more than 22,000 genes. In its next iteration the BigPlant project will expand to include over 200 species, with particular attention paid to filling in the clades that are less well represented in the current phylogeny. A collaborative effort across four premier institutes (http://nypg.bio.nyu.edu/main/) is currently developing the molecular resources to fulfill this important goal.
Sungear is a generalized Venn diagram. You can use this tool to compare an arbitrary number of lists or gene sets.
Sungear runs as an applet and can be started from the GeneCart. Sungear can be used to learn the biological significance of gene lists or of intersections between gene lists. We have integrated Gene Names and Gene Ontology information so that you can rapidly evaluate the significance of any intersection or selection made within Sungear. You can also use Sungear to form hypotheses about major trends in your data. The software can suggest GO terms that appear over-represented, as well as identify the most “peculiar” intersections or gene sets based on the distribution of all GO terms associated with them.
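The partition that underlies a generalized Venn diagram can be sketched as follows: each gene is assigned to the exact combination of lists that contain it, so every combination corresponds to one region of the diagram. This is only an illustration of the concept, not Sungear's implementation; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def exclusive_intersections(gene_lists):
    """Group genes by the exact set of lists they belong to.

    gene_lists: dict mapping list name -> iterable of gene IDs
    (hypothetical layout). Returns a dict mapping
    frozenset(list names) -> set of genes found in exactly those lists.
    """
    # Record, for every gene, which lists contain it.
    membership = defaultdict(set)
    for name, genes in gene_lists.items():
        for gene in genes:
            membership[gene].add(name)

    # Invert: one bucket per distinct combination of lists.
    regions = defaultdict(set)
    for gene, names in membership.items():
        regions[frozenset(names)].add(gene)
    return dict(regions)


regions = exclusive_intersections({
    "A": ["a", "b", "c"],
    "B": ["b", "c", "d"],
    "C": ["c", "e"],
})
```

Because the grouping is keyed on membership rather than on enumerating all possible combinations, the cost grows with the number of genes and not with the number of possible intersections, which is what makes comparing an arbitrary number of lists practical.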
Please note: Sungear requires Java 1.4.2. You can download either the J2SE SDK (full install) or the J2SE JRE (runtime only), both of which can be found here. Be sure to get J2SE (Java 2 Standard Edition), not J2EE (Java 2 Enterprise Edition).
If the data set doesn’t load automatically:
Choose “File->Load->development” and click “Open” to look at an example dataset generated from the AtGenExpress developmental time courses. This example shows genes enriched at the indicated developmental stages.
This web-based tool was initiated by the Plant Genomics Consortium and is designed to facilitate the identification of orthologous gene regions within a character-based phylogenetic framework. First, OrthologID will use a submitted sequence to query a local database and find all putative orthologs within the complete genomes of Oryza and Arabidopsis. Second, OrthologID will generate a gene tree (here referred to as a “guide tree”) of all putative orthologous gene regions from the complete genomes (as additional complete genomes become available, they will be added to the local database). To build this tree, it will assemble the matrix, remove redundant sequences, align the sequences, perform tree searches using the parsimony ratchet, and compute a strict consensus when multiple equally parsimonious trees are recovered. The guide tree will then be probed for characters diagnostic of orthologous groups by passing the tree to the program P-Gnome (Sarkar et al., 2002). Next, the submitted query sequence will be screened for the presence of shared diagnostic characters using the program P-Elf (Sarkar et al., 2002). Finally, OrthologID will compare the output from P-Elf to the guide tree diagnostics and display the results as a tree with the query sequence appended to its ortholog.
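The notion of a diagnostic character, and of screening a query against such characters, can be illustrated with a simplified sketch. It assumes that a character is diagnostic for a group when every member of the group shares one state at an alignment column and no sequence outside the group has that state. This is a deliberately strict simplification and does not reproduce the algorithms of P-Gnome or P-Elf; the function names and the input layout are hypothetical.

```python
def diagnostic_characters(groups):
    """Find strictly diagnostic alignment columns for each group.

    groups: dict mapping group name -> list of aligned sequences of
    equal length (hypothetical layout). Returns a dict mapping
    group name -> {column index: diagnostic state}.
    """
    result = {}
    for name, seqs in groups.items():
        # All sequences that belong to any other group.
        others = [s for other, members in groups.items()
                  if other != name for s in members]
        diag = {}
        for pos in range(len(seqs[0])):
            states = {s[pos] for s in seqs}
            if len(states) != 1:
                continue  # not fixed within the group
            state = next(iter(states))
            if state == "-":
                continue  # ignore alignment gaps
            if all(s[pos] != state for s in others):
                diag[pos] = state
        result[name] = diag
    return result


def assign_query(query, diagnostics):
    """Score an aligned query by how many diagnostic characters of each
    group it carries; return the best-scoring group and all scores."""
    scores = {
        name: sum(1 for pos, state in diag.items() if query[pos] == state)
        for name, diag in diagnostics.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


groups = {
    "G1": ["ACGT", "ACGA"],
    "G2": ["TCGT", "TCCA"],
}
diagnostics = diagnostic_characters(groups)
best, scores = assign_query("ACGT", diagnostics)
```

In this toy alignment only the first column separates the two groups, so the query is assigned to "G1" on the strength of a single shared character. Real data would need the query to be aligned to the same matrix first and would need a rule for ties or for queries that match no group.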