25 Oct 2009

The IntAct molecular interaction database in 2010

IntAct is an open-source, open data molecular interaction database and toolkit. Data is abstracted from the literature or from direct data depositions by expert curators following a deep annotation model providing a high level of detail. As of September 2009, IntAct contains over 200,000 curated binary interaction evidences. In response to the growing data volume and user requests, IntAct now provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results are provided at any stage in a simplified, tabular view. Specialized views then allow ‘zooming in’ on the full annotation of interactions, interactors and their properties. IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.
(source URL, Via NAR - Advance Access.)
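Searching with hierarchical controlled vocabularies usually means that a query on a broad term also matches every more specific term beneath it. The sketch below shows that expansion on a tiny, hypothetical vocabulary and a few made-up evidence rows; the term names, the record layout and the `search` helper are illustrative assumptions, not IntAct's schema or API.

```python
from collections import deque

# Hypothetical toy vocabulary: child term -> list of parent terms (not real PSI-MI IDs).
PARENTS = {
    "affinity technique": ["experimental method"],
    "pull down": ["affinity technique"],
    "coimmunoprecipitation": ["affinity technique"],
    "two hybrid": ["experimental method"],
}

def descendants(term, parents):
    """Return the term itself plus every term that has it as an ancestor (breadth-first)."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    seen, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical evidence rows: (interactor A, interactor B, detection method).
EVIDENCE = [
    ("P1", "P2", "pull down"),
    ("P1", "P3", "two hybrid"),
    ("P2", "P4", "coimmunoprecipitation"),
]

def search(evidence, method, parents):
    """Keep evidence whose method is the query term or any term below it in the hierarchy."""
    allowed = descendants(method, parents)
    return [row for row in evidence if row[2] in allowed]

print(search(EVIDENCE, "affinity technique", PARENTS))
# -> [('P1', 'P2', 'pull down'), ('P2', 'P4', 'coimmunoprecipitation')]
```

Refining the query then amounts to swapping in a narrower term, which shrinks the allowed set.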

miRGen 2.0: a database of microRNA genomic information and regulation

MicroRNAs are small, non-protein coding RNA molecules known to regulate the expression of genes by binding to the 3' UTR of mRNAs. MicroRNAs are produced from longer transcripts, which can code for more than one mature miRNA. miRGen 2.0 is a database that aims to provide comprehensive information about the position of human and mouse microRNA coding transcripts and their regulation by transcription factors, including a unique compilation of both predicted and experimentally supported data. Expression profiles of microRNAs in several tissues and cell lines, single nucleotide polymorphism locations, microRNA target prediction on protein coding genes and mapping of miRNA targets of co-regulated miRNAs on biological pathways are also integrated into the database and user interface. The miRGen database will be continuously maintained and freely available at http://diana.cslab.ece.ntua.gr/miRGen/.
(source URL, Via NAR - Advance Access.)
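Because one longer transcript can carry several mature miRNAs, a basic step in relating miRNAs to their coding transcripts is checking which miRNA coordinates fall inside which transcript on the same strand. The sketch below does this with simple interval containment; all coordinates and names are hypothetical, and this is not miRGen's actual pipeline.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    strand: str

def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same chromosome and strand."""
    return (outer.chrom == inner.chrom and outer.strand == inner.strand
            and outer.start <= inner.start and inner.end <= outer.end)

def host_transcripts(transcripts, mirnas):
    """Map each transcript name to the names of the mature miRNAs it contains."""
    return {t.name: [m.name for m in mirnas if contains(t, m)] for t in transcripts}

# Hypothetical toy coordinates.
TRANSCRIPTS = [
    Interval("tx1", "chr1", 1000, 5000, "+"),
    Interval("tx2", "chr1", 8000, 9000, "-"),
]
MIRNAS = [
    Interval("mirA", "chr1", 1200, 1222, "+"),
    Interval("mirB", "chr1", 1500, 1522, "+"),
    Interval("mirC", "chr1", 8100, 8122, "-"),
    Interval("mirD", "chr1", 8200, 8222, "+"),  # inside tx2's span but on the other strand
]

hosts = host_transcripts(TRANSCRIPTS, MIRNAS)
polycistronic = [t for t, ms in hosts.items() if len(ms) > 1]
print(hosts)          # -> {'tx1': ['mirA', 'mirB'], 'tx2': ['mirC']}
print(polycistronic)  # -> ['tx1']
```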

SALAD database: a motif-based database of protein annotations for plant comparative genomics

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.
(source URL, Via NAR - Advance Access.)
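The SALAD abstract says each protein group was clustered by pairwise scoring of motif patterns but does not spell out the score. As a hypothetical stand-in, the sketch below treats each protein as an ordered list of motif IDs, scores a pair by the length of their longest common subsequence (LCS), and turns that into a distance matrix that a tree-building step could consume. The scoring rule and the toy patterns are assumptions, not SALAD's method.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two motif-ID lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def motif_similarity(a, b):
    """Similarity in [0, 1]: 2 * LCS / (len(a) + len(b)); identical patterns score 1."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))

def distance_matrix(patterns):
    """Return protein names and a symmetric matrix of 1 - similarity."""
    names = list(patterns)
    return names, [[1 - motif_similarity(patterns[p], patterns[q]) for q in names] for p in names]

# Hypothetical motif patterns (motif IDs in N- to C-terminal order).
names, dm = distance_matrix({"protA": [1, 2, 3, 4], "protB": [1, 2, 4], "protC": [5, 6]})
for name, row in zip(names, dm):
    print(name, [round(d, 3) for d in row])
```

Using an order-aware score rather than a plain set overlap keeps proteins with the same motifs in a different arrangement apart, which matters if motif order carries evolutionary signal.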

18 Oct 2009

Metastamir: The Field of Metastasis-Regulatory microRNA Is Spreading

Despite advancements in knowledge from more than a century of metastasis research, the genetic programs and molecular mechanisms required for cancer metastasis are still incompletely understood. Genes that specifically regulate the process of metastasis are useful tools to elucidate molecular mechanisms and may become markers and/or targets for antimetastatic therapy. Recently, several noncoding regulatory RNA genes, microRNAs (miRNAs), were identified that play roles in various steps of metastasis, some without obvious roles in tumorigenesis. Understanding how these metastasis-associated miRNA, which we term metastamir, are involved in metastasis will help identify possible biomarkers or targets for the most lethal attribute of cancer: metastasis. [Cancer Res 2009;69(19):7495–8]
(source URL, Via Cancer Research recent issues.)

20 Sep 2009

An improved Huffman coding method for archiving text, images, and music characters in DNA


What about storing all your experimental results in BACs or YACs? Imagine scientists moving from lab to lab with a few Eppendorf tubes instead of heavy paper archives!

:-)

(BioTechniques®)
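The post gives only the paper's title, so the sketch below is not the authors' “improved” scheme. It is a plain textbook quaternary Huffman code: every code word is a string over the four bases A, C, G and T, and frequent characters get shorter words. The helper names are hypothetical, and the code ignores practical DNA constraints such as avoiding long runs of a single base.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ACGT"

def quaternary_huffman(text):
    """Build a prefix-free code over {A, C, G, T} from character frequencies in `text`."""
    tie = count()  # unique tie-breaker so heapq never compares tree nodes
    heap = [(f, next(tie), ch) for ch, f in Counter(text).items()]
    # Pad with zero-weight dummy leaves so every merge takes exactly four nodes: (n - 1) % 3 == 0.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(4)]
        heapq.heappush(heap, (sum(c[0] for c in children), next(tie), [c[2] for c in children]))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):          # internal node: one child per base
            for base, child in zip(BASES, node):
                walk(child, prefix + base)
        elif node is not None:              # real character (dummies are skipped)
            codes[node] = prefix or "A"     # a one-symbol text still needs one base

    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(dna, codes):
    """Read bases until the buffer matches a code word; works because the code is prefix-free."""
    reverse = {v: k for k, v in codes.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

message = "store lab notebooks in DNA"
codes = quaternary_huffman(message)
dna = encode(message, codes)
print(dna, len(dna), "bases")
assert decode(dna, codes) == message
```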

5 Sep 2009

HITS-CLIP Unravels microRNA-mRNA Interactions

MicroRNAs (miRNAs) are short (18-26 nt) sequences that act as post-transcriptional repressors of gene expression. Over 700 miRNAs have been reported in the human genome; each is believed to bind directly to many mRNAs to regulate their translation or stability. Thus, miRNAs represent a key regulatory mechanism affecting numerous cellular activities, and are of particular interest in cancer research. Understanding the complex relationships between miRNAs and mRNAs remains challenging, however, and computational approaches alone have been largely unsuccessful.


HITS-CLIP: Isolation and Sequencing of Argonaute-miRNA-mRNA Complexes


Enter HITS-CLIP, a new approach that applies high-throughput sequencing to RNAs isolated by crosslinking immunoprecipitation. In essence, UV irradiation is used to cross-link protein-RNA complexes, which are then stringently purified. Massively parallel sequencing then yields all of the RNA “tags” bound by the protein of interest.



Ago-miRNA-mRNA Complex (Image Credit: Nature 460: 479-486, 2009)


In a recent Nature paper, Chi et al. used HITS-CLIP to isolate RNA bound by the Argonaute protein (Ago), which mediates miRNA-mRNA interaction (see figure). The purified complexes showed two different modal sizes (110 kDa and 130 kDa), suggesting that Ago (97 kDa) was crosslinked to two different RNA species: presumably miRNAs (small) and the mRNAs they were targeting (large).


The authors applied Illumina high-throughput sequencing to characterize Ago-bound miRNAs and the mRNA “tags” to which they were linked. With relatively straightforward bioinformatics approaches, it was possible to cross-reference expressed miRNAs with complementary sequences in the mRNA tags. The resulting “ternary map” of miRNA-mRNA interaction sites yields a wealth of information about this post-transcriptional regulatory mechanism.
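Cross-referencing miRNAs with mRNA tags can be illustrated by a seed match: take the miRNA seed (here positions 2-8, one common definition), reverse-complement it, and look for that site in each tag. The sketch below uses hypothetical miRNA and tag sequences in the RNA alphabet (sequencing reads written with T would need T converted to U first); it is a minimal illustration, not the authors' pipeline.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, start=2, end=8):
    """Target-site string for the seed (1-based positions start..end): its reverse complement."""
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]

def match_tags(mirnas, tags):
    """Map each tag ID to the miRNAs whose seed site occurs somewhere in the tag sequence."""
    sites = {name: seed_site(seq) for name, seq in mirnas.items()}
    return {tag_id: [name for name, site in sites.items() if site in tag_seq]
            for tag_id, tag_seq in tags.items()}

# Hypothetical sequences, 5' to 3'.
MIRNAS = {"mirX": "UGAGCUAUCCGAUCGAUCGAUC", "mirY": "UCCAUGGACUUAGCAUGCAUCG"}
TAGS = {"tag1": "GGAUAGCUCAAA", "tag2": "CCUCCAUGGUUU", "tag3": "AAAAAAAAAA"}

print(seed_site(MIRNAS["mirX"]))  # -> AUAGCUC
print(match_tags(MIRNAS, TAGS))   # -> {'tag1': ['mirX'], 'tag2': ['mirY'], 'tag3': []}
```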


Decoding miRNA-mRNA Interaction


The authors identified 454 unique miRNAs crosslinked to Ago in the mouse brain; miR-30e was the most abundant species, representing 14% of all miRNA tags. In silico clustering and normalization of the messenger RNA tags yielded 1,463 robust clusters from 829 different brain transcripts.
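Clustering mRNA tags typically starts by merging reads that overlap on the genome and then discarding clusters supported by too few reads. The sketch below shows only that merge-and-filter step on hypothetical coordinates; the `min_tags` threshold is an assumption, strand is ignored for brevity, and the paper's normalization and robustness criteria are not modeled.

```python
def cluster_tags(tags, min_tags=2):
    """Merge overlapping (chrom, start, end) tags, half-open coordinates, into clusters.

    Returns clusters supported by at least `min_tags` tags. Strand is ignored here.
    """
    clusters = []
    for chrom, start, end in sorted(tags):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == chrom and start < last["end"]:
            last["end"] = max(last["end"], end)  # extend the current cluster
            last["n"] += 1
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "n": 1})
    return [c for c in clusters if c["n"] >= min_tags]

# Hypothetical tag coordinates.
TAG_COORDS = [
    ("chr1", 100, 140), ("chr1", 120, 160), ("chr1", 150, 190),
    ("chr1", 500, 540),
    ("chr2", 100, 140), ("chr2", 110, 150),
]
for c in cluster_tags(TAG_COORDS):
    print(c)
```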



Locations of Ago-bound mRNA tags (Image Credit: Nature 460: 479-486, 2009)


When these tags were overlaid with gene annotations, several patterns emerged. As expected, a substantial portion (40%) of Ago-bound tags were in 3′ UTRs, where miRNA activity is known to be highly effective. Some 8% (one-fifth of the 40%) were actually outside the annotated UTR but less than 10 kb downstream, regions likely to harbor unannotated 3′ UTRs.


Unsurprisingly, very few Ago-bound tags were in 5′ UTRs.  However, a substantial fraction of tags fell in coding sequences (25%), introns (12%), and non-coding RNAs (4%), suggesting that miRNA activity occurs in these regions as well.  Another 6% of tags were in intergenic regions, possibly in as-yet-unannotated transcripts.  These unexpected locations of miRNA binding may offer additional insights into the mechanisms of miRNA regulation.
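Overlaying tags on annotations amounts to assigning each tag position to a gene region and tallying the fractions. The sketch below does this for a single hypothetical plus-strand gene model with a 10 kb downstream window; it ignores minus-strand genes and non-coding RNAs, and the gene model and positions are made up for illustration.

```python
from collections import Counter

# Hypothetical plus-strand gene model; 0-based, half-open coordinates.
GENE = {
    "tx_start": 1000, "tx_end": 5000,
    "cds_start": 1200, "cds_end": 4000,
    "exons": [(1000, 1500), (2500, 3000), (3800, 5000)],
}

def classify(pos, gene, downstream_window=10_000):
    """Assign one tag position to a region of a single plus-strand gene model."""
    if pos < gene["tx_start"]:
        return "intergenic"
    if pos >= gene["tx_end"]:
        # Just past the annotated end: a candidate unannotated 3' UTR extension.
        return "downstream" if pos < gene["tx_end"] + downstream_window else "intergenic"
    if not any(s <= pos < e for s, e in gene["exons"]):
        return "intron"
    if pos < gene["cds_start"]:
        return "5'UTR"
    if pos >= gene["cds_end"]:
        return "3'UTR"
    return "CDS"

def region_fractions(positions, gene):
    """Fraction of tag positions falling in each region."""
    counts = Counter(classify(p, gene) for p in positions)
    return {region: n / len(positions) for region, n in counts.items()}

TAG_POSITIONS = [1100, 1300, 2000, 2700, 3900, 4500, 4800, 6000, 20000, 500]
print(region_fractions(TAG_POSITIONS, GENE))
```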


Next, the authors sought to define the Ago-mRNA “footprint” containing the majority of tags. The distribution of tags in a defined cluster, at least in their figure, looks like a bell curve with a sharp peak in the middle. About 95% of the time, Ago bound within a 45-62-nucleotide window around this peak, so the authors defined this region as the average Ago-miRNA footprint. Linear regression analysis of all 6-8-base motifs in clusters yielded numerous “enriched” seed sequences; the most prevalent corresponded to the binding site of miR-124, a well-known brain-specific miRNA. Indeed, Ago-mRNA footprints were rich in miRNA binding sites, suggesting that this approach may predict active sites with far better specificity than other methods.
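Two ideas from this paragraph can be sketched in a few lines: the footprint as the shortest window covering a given fraction of tag positions, and motif enrichment as a comparison of k-mer frequencies in footprints against background. The enrichment score below is a simple log2 frequency ratio with pseudocounts, swapped in for illustration; it is not the linear regression the authors used, and all sequences and positions are hypothetical.

```python
import math
from collections import Counter

def footprint_width(positions, fraction=0.95):
    """Width (in nt, inclusive) of the shortest window holding at least `fraction` of the tags."""
    pts = sorted(positions)
    k = math.ceil(fraction * len(pts))  # number of tags the window must cover
    return min(pts[i + k - 1] - pts[i] + 1 for i in range(len(pts) - k + 1))

def kmer_counts(seqs, k):
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enriched_kmers(footprints, background, k=6, pseudo=1.0, top=5):
    """Rank footprint k-mers by log2(frequency in footprints / frequency in background)."""
    fg, bg = kmer_counts(footprints, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    score = {
        kmer: math.log2(((fg[kmer] + pseudo) / (fg_total + pseudo))
                        / ((bg[kmer] + pseudo) / (bg_total + pseudo)))
        for kmer in fg
    }
    return sorted(score, key=score.get, reverse=True)[:top]

# Hypothetical tag positions within one cluster, and toy footprint/background sequences.
print(footprint_width([0, 10, 11, 12, 13, 14, 15, 16, 17, 40], fraction=0.75))
print(enriched_kmers(["AAGUGCCUAA", "CCGUGCCUGG"], ["AAAAAAAAAA", "CCCCCCCCCC"], top=1))
```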


HITS-CLIP Implications


By reducing the search space for miRNA binding sites to a 45-62-nucleotide Ago footprint, HITS-CLIP offers a powerful complement to bioinformatic methods for miRNA binding site prediction. Computational approaches alone are known to have high false-positive rates, whereas the authors estimate a false-positive rate of just 15% for HITS-CLIP. The new method offers dramatic improvement for transcripts with highly conserved 3′ UTRs, which often have many “predicted” miRNA binding sites because so many computational methods rely on conservation. Analysis of the HITS-CLIP ternary map revealed that real miRNA-mRNA binding events are very specific, with an average of just 2.6 Ago-mRNA clusters per regulated transcript. Whereas thousands of binding sites are predicted computationally, each miRNA bound an average of 655 targets. These results suggest that miRNA selectivity is much higher than previously believed. Yet Ago-mRNA clusters seemed to show no apparent sequence preference (data not shown), so it is likely that other RNA-binding proteins are involved.


Thus, this study sets the stage for large-scale, genome-wide RNA-protein maps that include other proteins, tissues, and species, which should yield an unprecedented level of understanding of this complex regulatory process.


References

Chi SW, Zang JB, Mele A, & Darnell RB (2009). Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 460 (7254), 479-86 PMID: 19536157



(source URL, Via MassGenomics.)

Rip exposed: How ectodomain shedding regulates the proteolytic processing of transmembrane substrates [Commentary]

Regulated intramembrane proteolysis (Rip) controls a wide variety of cellular mechanisms such as cholesterol homeostasis, immune surveillance, cellular signaling, and β-amyloid formation in Alzheimer's disease (1). Rip of substrates is mediated by several families of intramembranously cleaving proteases (I-CLiPs), all of which perform the unique chemistry of hydrolysis within the hydrophobic lipid bilayer (2). There are four known families of I-CLiPs, each named after the protease that typifies it: site-2 protease (S2P) metalloproteases, the γ-secretase and signal peptide peptidase (SPP) aspartyl proteases, and the rhomboid serine proteases. Rip cleavage of transmembrane substrates by S2P, SPP, and γ-secretase is preceded and regulated by an initial distinct cleavage in a process termed “ectodomain shedding” (Fig. 1). Why ectodomain shedding is necessary in most cases of Rip is not understood. An article in this issue of PNAS (3) has shed new light on this important question.


(source URL, Via PNAS recent.)

22 Aug 2009

PiSQRD: a web server for decomposing proteins into quasi-rigid dynamical domains

Summary: The PiSQRD web resource can be used to subdivide protein structures into quasi-rigid dynamical domains. The latter are groups of amino acids behaving as approximately rigid units in the course of protein equilibrium fluctuations. The PiSQRD server takes as input a biomolecular structure and the desired fraction of protein internal fluctuations that must be accounted for by the relative rigid-body motion of the dynamical domains. Next, the lowest-energy modes of fluctuation of the protein are calculated (or optionally supplied by the user) and used to identify the rigid subunits. The resulting optimal subdivision is returned through a web page containing both interactive graphics and detailed data output.

Availability: The PiSQRD web server, which requires Java, is available free of charge for academic users at the address: http://pisqrd.escience-lab.org.

(source URL, Via Bioinformatics - Advance Access.)
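PiSQRD's own procedure optimizes the fraction of fluctuations explained by rigid-body motion and can return several domains. As a much simpler stand-in heuristic, the sketch below builds a Gaussian network model (GNM) contact matrix from C-alpha coordinates with a 7 Å cutoff, a common GNM choice, finds the slowest non-trivial mode by power iteration, and splits residues into two groups by the sign of that mode. It is pure Python to avoid dependencies, assumes a single connected contact network, and uses a hypothetical toy chain.

```python
import math

def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff (contact Laplacian) matrix from C-alpha coordinates; cutoff in angstroms."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
    for i in range(n):
        gamma[i][i] = -sum(gamma[i])  # diagonal = number of contacts
    return gamma

def slowest_mode(gamma, iterations=2000):
    """Eigenvector of the smallest non-zero eigenvalue via power iteration on (c*I - gamma).

    The all-ones (zero-eigenvalue) mode is projected out every step; assumes a connected graph.
    """
    n = len(gamma)
    c = 2 * max(gamma[i][i] for i in range(n)) + 1.0  # exceeds the largest eigenvalue of gamma
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]
        w = [c * v[i] - sum(gamma[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def split_domains(coords, cutoff=7.0):
    """Two-domain split: label each residue 0 or 1 by the sign of its slowest-mode component."""
    mode = slowest_mode(kirchhoff(coords, cutoff))
    return [0 if x >= 0 else 1 for x in mode]

# Hypothetical toy chain: 10 residues spaced 3.8 angstroms apart along x.
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
print(split_domains(chain))
```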

18 Aug 2009

Mining Biological Pathways Using WikiPathways Web Services

WikiPathways is a platform for creating, updating, and sharing biological pathways [1]. Pathways can be edited and downloaded using the wiki-style website. Here we present a SOAP web service that provides programmatic access to WikiPathways that is complementary to the website. We describe the functionality that this web service offers and discuss several use cases in detail. Exposing WikiPathways through a web service opens up new ways of utilizing pathway information and assisting the community curation process.

(source URL)
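A SOAP web service is called by posting an XML envelope whose body names the operation and its parameters. The sketch below only builds such a SOAP 1.1 envelope with the standard library; the operation name `exampleSearch`, the namespace `http://example.org/pathways` and the `query` parameter are placeholders, not names taken from the WikiPathways WSDL, and sending the request to a real endpoint is not shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_ENV)

def soap_request(operation, params, service_ns):
    """Build Envelope > Body > <operation> with one unqualified child element per parameter."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Placeholder operation, parameter and namespace (hypothetical).
print(soap_request("exampleSearch", {"query": "apoptosis"}, "http://example.org/pathways"))
```

In practice the operation and parameter names would come from the service's WSDL, which is also what lets generic SOAP client libraries generate these envelopes automatically.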

16 Aug 2009

Mapping the human membrane proteome: a majority of the human membrane proteins can be classified according to function and evolutionary origin

Background:
Membrane proteins form key nodes in mediating the cell's interaction with the surroundings, which is one of the main reasons why the majority of drug targets are membrane proteins.
Results:
Here we mined the human proteome and identified the membrane proteome subset using three prediction tools for alpha-helices: Phobius, TMHMM, and SOSUI. This dataset was reduced to a non-redundant set by aligning it to the human genome and then clustered with our own interactive implementation of the ISODATA algorithm. The genes were classified and each protein group was manually curated, virtually evaluating each sequence of the clusters, applying systematic comparisons with a range of databases and other resources. We identified 6,718 human membrane proteins and classified the majority of them into 234 families of which 151 belong to the three major functional groups: receptors (63 groups, 1,352 members), transporters (89 groups, 817 members) or enzymes (7 groups, 533 members). Also, 74 miscellaneous groups with 697 members were determined. Interestingly, we find that 41% of the membrane proteins are singlets with no apparent affiliation or identity to any human protein family. Our results identify major differences between the human membrane proteome and the ones in unicellular organisms and we also show a strong bias towards certain membrane topologies for different functional classes: 77% of all transporters have more than six helices while 60% of proteins with an enzymatic function and 88% of receptors that are not GPCRs have only a single membrane-spanning alpha-helix. Further, we have identified and characterized new gene families and novel members of existing families.
Conclusion:
Here we present, to our knowledge, the most detailed roadmap of gene numbers and families, which is an important step towards an overall classification of the entire human proteome. We estimate that 27% of the total human proteome consists of alpha-helical transmembrane proteins and provide an extended classification together with in-depth investigations of the membrane proteome's functional, structural, and evolutionary features.
(source URL, Via BMC Biology - Latest articles.)
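The abstract names three helix predictors but does not say here how their calls were combined. The sketch below assumes a simple majority rule (a protein counts as membrane if at least two of the three tools predict one or more transmembrane helices), records the median predicted helix count, and bins it into the topology classes the abstract uses. The rule, the helper names and the per-protein predictions are all hypothetical.

```python
from statistics import median

TOOLS = ("Phobius", "TMHMM", "SOSUI")

def consensus_membrane(predictions, min_votes=2):
    """predictions: protein -> {tool: predicted TM-helix count}.

    Keep proteins that at least `min_votes` tools call membrane (>= 1 helix),
    mapping each to the median helix count across the three tools.
    """
    result = {}
    for protein, counts in predictions.items():
        votes = sum(1 for tool in TOOLS if counts.get(tool, 0) >= 1)
        if votes >= min_votes:
            result[protein] = median(counts.get(tool, 0) for tool in TOOLS)
    return result

def topology_class(helices):
    """Bin a helix count into the classes mentioned in the abstract."""
    if helices == 1:
        return "single-pass"
    return "more than six" if helices > 6 else "2-6"

# Hypothetical per-tool predictions.
PREDICTIONS = {
    "T1": {"Phobius": 12, "TMHMM": 12, "SOSUI": 11},
    "R1": {"Phobius": 1, "TMHMM": 1, "SOSUI": 0},
    "S1": {"Phobius": 0, "TMHMM": 0, "SOSUI": 1},
    "X1": {"Phobius": 3, "TMHMM": 4, "SOSUI": 0},
}
membrane = consensus_membrane(PREDICTIONS)
print({p: topology_class(h) for p, h in membrane.items()})
```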