10 Feb 2010

GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains

GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
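To make the agglomerative idea concrete, here is a minimal sketch of bottom-up clustering that needs no initial multiple sequence alignment. It is not GeMMA's implementation: the real method compares cluster profiles with dedicated alignment and profile–profile software, whereas this toy swaps in a Jaccard similarity over pooled k-mers. The function names, the k-mer size and the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_similarity(a, b, k=3):
    """Toy stand-in for profile-profile comparison (an assumption):
    Jaccard similarity of the pooled k-mer sets of two clusters."""
    ka = set().union(*(kmer_set(s, k) for s in a))
    kb = set().union(*(kmer_set(s, k) for s in b))
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def agglomerate(sequences, threshold=0.5):
    """Greedy bottom-up clustering: repeatedly merge the most similar
    pair of clusters until no pair scores at or above the threshold."""
    clusters = [[s] for s in sequences]  # start with singleton clusters
    while len(clusters) > 1:
        score, i, j = max(
            (cluster_similarity(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if score < threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters
```

Calling `agglomerate(["MKTAYIAKQR", "MKTAYIAKQK", "GGGSSGGGSS", "GGGSSGGGSA"])` groups the two MKT-like and the two GGG-like toy sequences into separate clusters. The all-pairs search in each round is quadratic in the number of clusters, which is presumably the kind of cost a faster heuristic version would aim to reduce.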


(this Post content was reproduced from: http://nar.oxfordjournals.org/cgi/content/short/38/3/720?rss=1, Via NAR.)

Rapid interactome profiling by massive sequencing

We have developed a high-throughput protein expression and interaction analysis platform that combines cDNA phage display library selection and massive gene sequencing using the 454 platform. A phage display library of open reading frame (ORF) fragments was created from mRNA derived from different tissues. This was used to study the interaction network of the enzyme transglutaminase 2 (TG2), a multifunctional enzyme involved in the regulation of cell growth, differentiation and apoptosis, associated with many different pathologies. After two rounds of panning with TG2 we assayed the frequency of ORFs within the selected phage population using 454 sequencing. Ranking and analysis of more than 120 000 sequences allowed us to identify several potential interactors, which were subsequently confirmed in functional assays. Within the identified clones, three had been previously described as interacting proteins (fibronectin, SMOC1 and GSTO2), while all the others were new. When compared with standard systems, such as microtiter enzyme-linked immunosorbent assay, the method described here is dramatically faster and yields far more information about the interaction under study, allowing better characterization of complex systems. For example, in the case of fibronectin, it was possible to identify the specific domains involved in the interaction.
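The ranking step can be pictured with a short sketch. It assumes that each sequencing read has already been mapped to a gene name, and that reads from the unselected library are available as a background; that background comparison is an assumption of this example, not something the abstract spells out. The function name `rank_orfs` and the pseudocount are hypothetical.

```python
from collections import Counter


def rank_orfs(selected_reads, library_reads, pseudocount=1.0):
    """Rank genes by enrichment: relative frequency among selected reads
    divided by relative frequency in the unselected library.
    A pseudocount keeps genes absent from one sample from dividing by zero."""
    sel = Counter(selected_reads)
    lib = Counter(library_reads)
    genes = set(sel) | set(lib)
    n_sel = sum(sel.values()) + pseudocount * len(genes)
    n_lib = sum(lib.values()) + pseudocount * len(genes)
    enrichment = {
        gene: ((sel[gene] + pseudocount) / n_sel)
        / ((lib[gene] + pseudocount) / n_lib)
        for gene in genes
    }
    # Most enriched genes first
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical read assignments, for illustration only
selected = ["FN1"] * 6 + ["SMOC1"] * 3 + ["ACTB"] * 1
library = ["FN1"] * 1 + ["SMOC1"] * 1 + ["ACTB"] * 8
ranking = rank_orfs(selected, library)
```

Genes that are abundant after panning but rare in the starting library rise to the top, while abundant background transcripts sink.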


(this Post content was reproduced from: http://nar.oxfordjournals.org/cgi/content/short/gkq052v1?rss=1, Via NAR - Advance Access.)

Target-enrichment strategies for next-generation sequencing

A nice overview (Wellcome Trust Sanger Institute and University of Washington) of the currently available techniques for sequence capture. It highlights the strengths and weaknesses of each method and factors in pricing.

One to read...


Target-enrichment strategies for next-generation sequencing


Nature Methods 7, 111 (2010). doi:10.1038/nmeth.1419


Authors: Lira Mamanova, Alison J Coffey, Carol E Scott, Iwanka Kozarewa, Emily H Turner, Akash Kumar, Eleanor Howard, Jay Shendure & Daniel J Turner



(this Post content was reproduced from: http://feeds.nature.com/~r/nmeth/rss/current/~3/gv2f32bBO-E/nmeth.1419, Via Nature Methods current.)

7 Feb 2010

Vascular Endothelial Growth Factor--A Positive and Negative Regulator of Tumor Growth

Over the past decade, the well-documented role of vascular endothelial growth factor (VEGF) in tumor angiogenesis has led it to become one of the leading therapeutic targets for the treatment of cancer. Emerging evidence from genetically modified animal models, however, suggests that elevated levels of VEGF, or a proangiogenic phenotype, may impede, rather than promote, early tumor development and progression. For example, hypermorph VEGF transgenic mice display delayed progression of a retroviral-induced murine leukemia, and knockdown of VEGF expression within the myeloid compartment accelerates tumor progression. Several mechanisms have been proposed to explain this paradox, whereby VEGF induces changes within the hematopoietic compartment and tumor microenvironment through recruitment of tumor inhibitory monocytic cells and the negative regulation of tumor angiogenesis. Thus, it is apparent that the levels of VEGF expression in both tumor and nontumor tissues, as well as the context and timing of its modulation relative to cancer induction, play an important role in determining the effects of VEGF expression on tumorigenicity. In light of these recent findings, the various mechanisms underlying the negative role of VEGF during early tumor development, progression, and metastasis will be discussed. Cancer Res; 70(3); 863–7


(this Post content was reproduced from: http://cancerres.aacrjournals.org/cgi/content/short/70/3/863?rss=1, Via Cancer Research recent issues.)

30 Jan 2010

Splicing factor and exon profiling across human tissues

It has been shown that alternative splicing is especially prevalent in brain and testis when compared to other tissues. To test whether there is a specific propensity of these tissues to generate splicing variants, we used a single source of high-density microarray data to perform both splicing factor and exon expression profiling across 11 normal human tissues. Paired comparisons between tissues and an original exon-based statistical group analysis demonstrated after extensive RT-PCR validation that the cerebellum, testis, and spleen had the largest proportion of differentially expressed alternative exons. Variations at the exon level correlated with a larger number of splicing factors being expressed at a high level in the cerebellum, testis and spleen than in other tissues. However, this splicing factor expression profile was similar to a more global gene expression pattern as a larger number of genes had a high expression level in the cerebellum, testis and spleen. In addition to providing a unique resource on expression profiling of alternative splicing variants and splicing factors across human tissues, this study demonstrates that the higher prevalence of alternative splicing in a subset of tissues originates from the larger number of genes, including splicing factors, being expressed than in other tissues.


(this Post content was reproduced from: http://nar.oxfordjournals.org/cgi/content/short/gkq008v1?rss=1, Via NAR - Advance Access.)

19 Jan 2010

The importance of being negative

A nice Highlight, written by Allison Doerr, about the recent publication in NAR (http://www.ncbi.nlm.nih.gov/pubmed/19920129)

The database can be accessed at: http://mips.helmholtz-muenchen.de/proj/ppi/negatome/


The importance of being negative


Nature Methods 7, 10 (2010). doi:10.1038/nmeth0110-10b


Author: Allison Doerr


The Negatome is a database of non-interacting protein pairs that can be used for training protein-protein interaction prediction algorithms.


... Who cares about negative results? It's fairly safe to say that most researchers would not try to publish a paper that focused on what they did not find, and that even if they did try, they would be hard-pressed to find a journal that would agree to publish it. However, that is not to say that negative results do not have scientific value—in fact, they can be quite useful.

(this Post content was reproduced from: http://feeds.nature.com/~r/nmeth/rss/current/~3/8NV5IIK5uRo/nmeth0110-10b, Via Nature Methods current.)

16 Jan 2010

SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building

We present SeaView version 4, a multiplatform program designed to facilitate multiple alignment and phylogenetic tree building from molecular sequence data through the use of a graphical user interface. SeaView version 4 combines all the functions of the widely used programs SeaView (in its previous versions) and Phylo_win, and expands them by adding network access to sequence databases, alignment with arbitrary algorithm, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. In relation to the wide present offer of tools and algorithms for phylogenetic analyses, SeaView is especially useful for teaching and for occasional users of such software. SeaView is freely available at http://pbil.univ-lyon1.fr/software/seaview.


(this Post content was reproduced from: http://mbe.oxfordjournals.org/cgi/content/short/27/2/221?rss=1, Via Molecular Biology and Evolution.)

Gene prioritization and clustering by multi-view text mining

Background:
Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model.
Results:
We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are constructed on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared.
Conclusions:
In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.
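As a rough picture of what integrating views means for gene prioritization, the sketch below combines per-vocabulary gene scores by averaging ranks. This is a deliberately simple stand-in and not the multi-source learning algorithms the paper evaluates; the function name and the toy scores are assumptions.

```python
def average_rank(views):
    """Combine several views (dicts of gene -> score, higher is better)
    by averaging each gene's rank across views. Only genes scored in
    every view are kept. Returns genes ordered best-first."""
    genes = set(views[0])
    for view in views[1:]:
        genes &= set(view)
    totals = {g: 0.0 for g in genes}
    for view in views:
        ordered = sorted(genes, key=lambda g: view[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            totals[gene] += rank
    # Ties on mean rank are broken alphabetically for reproducibility
    return sorted(genes, key=lambda g: (totals[g] / len(views), g))


# Hypothetical scores from three vocabularies
views = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.7, "B": 0.8, "C": 0.2},
    {"A": 0.6, "B": 0.3, "C": 0.4},
]
prioritized = average_rank(views)
```

No single view has to be trusted in advance, which matches the situation where the relevance of each vocabulary to the task is unknown.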

(this Post content was reproduced from: http://www.biomedcentral.com/1471-2105/11/28, Via BMC Bioinformatics - Latest articles.)

miRMaid: a unified programming interface for microRNA data resources

Background:
MicroRNAs (miRNAs) are endogenous small RNAs that play a key role in post-transcriptional regulation of gene expression in animals and plants. The number of known miRNAs has increased rapidly over the years. The current release (version 14.0) of miRBase, the central online repository for miRNA annotation, comprises over 10,000 miRNA precursors from 115 different species. Furthermore, a large number of decentralized online resources are now available, each contributing important miRNA annotation and information.
Results:
We have developed a software framework, designated here as miRMaid, with the goal of integrating miRNA data resources in a uniform web service interface that can be accessed and queried by researchers and, most importantly, by computers. miRMaid is built around data from miRBase and is designed to follow the official miRBase data releases. It exposes miRBase data as inter-connected web services. Third-party miRNA data resources can be modularly integrated as miRMaid plugins or they can loosely couple with miRMaid as individual entities in the World Wide Web. miRMaid is available as a public web service but is also easily installed as a local application. The software framework is freely available under the LGPL open source license for academic and commercial use.
Conclusion:
miRMaid is an intuitive and modular software platform designed to unify miRBase and independent miRNA data resources. It enables miRNA researchers to computationally address complex questions involving the multitude of miRNA data resources. Furthermore, miRMaid constitutes a basic framework for further programming in which microRNA-interested bioinformaticians can readily develop their own tools and data sources.

(this Post content was reproduced from: http://www.biomedcentral.com/1471-2105/11/29, Via BMC Bioinformatics - Latest articles.)

mimiRNA: a microRNA expression profiler and classification resource designed to identify functional correlations between microRNAs and their targets

Motivation: microRNAs (miRNAs) are short non-coding RNAs that regulate gene expression by inhibiting target mRNA genes. Their tissue- and disease-specific expression patterns have immense therapeutic and diagnostic potential. To understand these patterns, a reliable compilation of miRNA and mRNA expression data is required to compare multiple tissue types. Moreover, with the appropriate statistical tools, such a resource could be interrogated to discover functionally related miRNA–mRNA pairs.


Results: We have developed mimiRNA, an online resource that integrates expression data from 1483 samples and permits visualization of the expression of 635 human miRNAs across 188 different tissues or cell types. mimiRNA incorporates a novel sample classification algorithm, ExParser, that groups identical miRNA or mRNA experiments from separate sources. This enables mimiRNA to provide reliable expression profiles and to discover functional relations between miRNAs and mRNAs such as miRNA targets. Additionally, mimiRNA incorporates a decision tree algorithm to discover distinguishing miRNA features between two tissue or cell types. We validate the efficacy of our resource on independent experimental data and through biologically relevant analyses.


Availability: http://mimirna.centenary.org.au


Contact: j.rasko@centenary.org.au


Supplementary information: Supplementary data are available at Bioinformatics online.
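One way to look for functionally related miRNA–mRNA pairs in matched expression profiles is to search for strong negative correlation across tissues, since a repressing miRNA would be expected to be high where its target is low. The sketch below does this with a plain Pearson coefficient. Whether mimiRNA uses exactly this statistic is not stated in the abstract, so the approach, the cutoff and all names here are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def anticorrelated_pairs(mirna_expr, mrna_expr, cutoff=-0.8):
    """Return (miRNA, mRNA, r) triples with r <= cutoff, most negative first.
    Both inputs map a name to expression values over the same ordered tissues."""
    pairs = []
    for mir, mir_profile in mirna_expr.items():
        for gene, gene_profile in mrna_expr.items():
            r = pearson(mir_profile, gene_profile)
            if r <= cutoff:
                pairs.append((mir, gene, r))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical expression values across four tissues
mirnas = {"miR-A": [1, 2, 3, 4]}
mrnas = {"GENE1": [8, 6, 4, 2], "GENE2": [1, 2, 3, 4]}
candidates = anticorrelated_pairs(mirnas, mrnas)
```

Correlation alone does not establish targeting; candidates found this way would still need sequence-based target prediction or experimental support.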


(this Post content was reproduced from: http://bioinformatics.oxfordjournals.org/cgi/content/short/26/2/223?rss=1, Via Bioinformatics - current issue.)