Frequently Asked Questions

1. How to search for protein interactions?

    If you are interested in one protein and want to know what other proteins it may interact with, you may search the Human Interactome Resource for this protein by keyword search or sequence search. In keyword search, a protein may be queried by its HGNC ID, NCBI Entrez ID, NCBI RefSeq accession, Ensembl gene ID, UniProt accession, Approved Symbol, or Synonyms. The interactions involving this protein will be listed on the Protein Information page.

    If you have a list of proteins and want to know whether there are interactions between them, you can paste this list of proteins into the input box in the "Search Interaction" page and select "Show only pair-wise interactions between the specified proteins". This search function works only with HGNC IDs. We provide an "ID Mapping Tool" to convert other IDs to HGNC IDs.

    If you have a list of proteins and want to know all the interactions involving these proteins, you can paste this list of proteins into the input box in the "Search Interaction" page and select "Show all interactions involving the specified proteins". This search function works only with HGNC IDs. We provide an "ID Mapping Tool" to convert other IDs to HGNC IDs.
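    The difference between the two list-search options can be illustrated with a minimal sketch. The function name and the toy identifiers below are hypothetical and are not part of the website; the sketch only shows the filtering logic implied by the two option labels.

```python
def filter_interactions(interactions, query, pairwise_only):
    """Select interactions relative to a list of query proteins.

    pairwise_only=True  keeps interactions whose two partners are both queried
                        ("Show only pair-wise interactions ...").
    pairwise_only=False keeps interactions with at least one queried partner
                        ("Show all interactions involving ...").
    """
    query = set(query)
    if pairwise_only:
        return [(a, b) for a, b in interactions if a in query and b in query]
    return [(a, b) for a, b in interactions if a in query or b in query]


# Hypothetical toy data; the identifiers are placeholders, not real HGNC IDs.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
query = ["P1", "P2", "P3"]

pairwise = filter_interactions(interactions, query, pairwise_only=True)
involving = filter_interactions(interactions, query, pairwise_only=False)
```

    In this toy case the pairwise option returns two interactions, while the second option also returns the interaction between P3 and the unlisted protein P4.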

2. How to search for biological process linkages?

    If you are interested in one biological process and want to find what other biological processes are functionally linked to it, you may search the GO biological process linkage network for this biological process by its name or GO accession. This function is provided by the "Search GO Biological Process" page.

    If you have a list of biological processes and want to know whether there are functional linkages between them, you can paste this list of GO biological processes into the input box in the "Search Functional Linkages" page and select "Show only pairwise functional linkages between the specified biological processes".

    If you have a list of biological processes and want to know all other biological processes that are functionally linked to them, you can paste this list of GO biological processes into the input box in the "Search Functional Linkages" page and select "Show all functional linkages involving the specified biological processes".

3. How to search for predicted protein functions?

    Based on the Human Interactome Resource, the biological processes that a protein may participate in are predicted using the "guilt-by-association" strategy. You can search for the predicted biological processes for a protein in the "Search Predicted Biological Process Annotations" page. A protein may be queried by its HGNC ID, NCBI Entrez ID, NCBI RefSeq accession, Ensembl gene ID, UniProt accession, Approved Symbol, or Synonyms.

4. How to perform a gene set linkage analysis?

    Molecular phenotypes observed in profiling experiments are sets of "significant genes". Identifying the functional linkages between phenotypic gene sets and well-established biological processes may help to connect the molecular phenotype to our knowledge framework, providing insights into the nature, cause and functional implication of the molecular phenotype.

    Functional linkages between a custom gene set and established GO biological processes can be analyzed in the "Gene Set Linkage Analysis" page. The custom gene set needs to be submitted in a text file. A gene set text file starts with the name of the gene set, which is marked by a "#". The gene set name is followed by a list of tab-delimited or newline-delimited HGNC IDs for the genes. We provide an "ID Mapping Tool" to convert other gene IDs to HGNC IDs. Two example gene set text files are given below.

Single gene set example 1



Single gene set example 2


    If you have a list of gene sets and want to identify the functional linkages between them, you can upload this list of gene sets in the "Multiple Gene Sets" tab. The custom gene set list needs to be submitted in a text file. Each gene set starts with the name of the gene set, which is marked by a "#". Each gene set name is followed by the tab-delimited HGNC IDs for the genes in this gene set. We provide an "ID Mapping Tool" to convert other gene IDs to HGNC IDs. An example gene set list file is given below.

Multiple gene sets example
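    To check a file before uploading, the format described above can be parsed with a few lines of Python. This is a minimal sketch, not the server's own parser: the function name is hypothetical, the identifiers in the sample text are placeholders rather than real HGNC IDs, and it assumes the set name follows the "#" on the same line.

```python
def parse_gene_sets(text):
    """Parse gene set text into {set name: [gene IDs]}.

    A line starting with "#" opens a new gene set; the lines that follow
    hold tab-delimited and/or newline-delimited gene IDs for that set.
    """
    gene_sets = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            current = line[1:].strip()
            gene_sets[current] = []
        elif current is None:
            raise ValueError("gene IDs found before any '#' gene set name")
        else:
            ids = [tok.strip() for tok in line.split("\t") if tok.strip()]
            gene_sets[current].extend(ids)
    return gene_sets


# Placeholder content: two gene sets, mixing tab and newline delimiters.
sample_text = "#set one\nID1\tID2\nID3\n#set two\nID4\tID5\n"
parsed = parse_gene_sets(sample_text)
```

    A file with a single "#" line is the single gene set case; a file with several "#" lines corresponds to the "Multiple Gene Sets" tab.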

    After successful submission of a gene set text file, the gene set linkage analysis server will assign a job ID. Normally, users will receive the analysis results via email. In case there is a problem with email, users can also retrieve the results by job ID in the "Gene Set Linkage Analysis" page.

5. How to interpret the significance of a gene set linkage?

    GSLA relies on a test of two hypotheses to detect a significant functional linkage between two biological processes. The first hypothesis (Q1) expects the interaction density between genes in the two sets to be higher than the interaction density between random genes. The second hypothesis (Q2) expects the inter-gene-set interaction density observed in the biologically correct interactome to be higher than the inter-gene-set interaction densities observed in topologically randomized interactomes consisting of the same nodes, with each node having the same number of neighbors. In other words, Q1 verifies the strength of a functional linkage between two gene sets, whereas Q2 verifies that the observed functional linkage is a result of the biologically correct network topology (i.e., our knowledge about molecular mechanisms), rather than a result of the gene compositions of these two sets. In an interactome, some genes known as hubs have considerably more neighbors than other genes. Gene sets that include many hubs are likely to connect with other gene sets through large numbers of inter-gene-set interactions. Therefore, Q2 is used to ensure the "biological significance" of the detected functional linkages by removing the confounding factor of gene composition. Q1 and Q2 are related but different hypotheses. Their complementarity increases the sensitivity and specificity of GSLA.

    In practice, functional linkages between two gene sets that are worthy of experimental investigation are expected to be much stronger than those linkages that marginally pass the significance threshold. Therefore, a stringent density-based cutoff was used for Q1 (density ≥ 0.01), which corresponded to a p-value cutoff of p < 10^-10 to reject the null hypothesis that the observed inter-gene-set interaction density is equal to that expected between random genes. This density criterion was used because it has a direct biological interpretation and because the p value decreases monotonically as the density increases. For Q2, a cutoff of p < 10^-4 was used to reject the null hypothesis that the observed inter-gene-set interaction density can also be observed in topologically randomized interaction networks.
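    The two tests can be sketched on a toy network. This is an illustration under stated assumptions, not the GSLA implementation: density is assumed here to be the fraction of possible inter-set gene pairs that interact, the randomization is a standard degree-preserving double edge swap, and the Q2 p value is estimated empirically. All function names and the toy network are hypothetical.

```python
import random
from collections import Counter


def inter_set_density(edges, set_a, set_b):
    """Assumed definition: interacting A-B pairs / all possible A-B pairs."""
    pairs = {frozenset((a, b)) for a in set_a for b in set_b if a != b}
    return len(pairs & edges) / len(pairs) if pairs else 0.0


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire with double edge swaps: (a,b),(c,d) -> (a,d),(c,b).

    Every node keeps its number of neighbors; swaps that would create a
    self-loop or a duplicate edge are skipped.
    """
    edge_list = [tuple(e) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:
            continue  # self-loop
        if new1 in edge_set or new2 in edge_set:
            continue  # duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set


def q2_empirical_p(edges, set_a, set_b, n_random=200, seed=0):
    """Share of randomized networks whose density reaches the observed one."""
    rng = random.Random(seed)
    observed = inter_set_density(edges, set_a, set_b)
    hits = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        if inter_set_density(shuffled, set_a, set_b) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)  # add-one estimate, never exactly 0


def degrees(edges):
    """Number of neighbors of each node."""
    return Counter(node for edge in edges for node in edge)


# Hypothetical toy network with undirected edges stored as frozensets.
toy_edges = {frozenset(e) for e in [
    ("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
    ("a2", "x"), ("b2", "y"), ("x", "y"), ("x", "z"),
]}
set_a, set_b = {"a1", "a2"}, {"b1", "b2"}

density = inter_set_density(toy_edges, set_a, set_b)  # 3 of 4 pairs interact
p_q2 = q2_empirical_p(toy_edges, set_a, set_b)
```

    Note that an empirical p value can never be smaller than 1 / (n_random + 1), so resolving a cutoff of p < 10^-4 this way would require more than 10^4 randomized networks.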

6. What is the semantic similarity between two biological processes?

    The semantic similarity between two GO biological processes was calculated as described in Wang et al. (2007). This approach encodes a GO term’s semantics (biological meanings) into a numeric value by aggregating the semantic contributions of its ancestor terms (including the term itself) in the GO graph and, in turn, uses an algorithm to measure the semantic similarity between two GO terms. For a more detailed description of the algorithm, please see Wang et al. (2007). For example, the similarity between "cellular response to retinoic acid" (GO:0071300) and "female gonad development" (GO:0008585) is 0.0546; the similarity between "platelet dense granule organization" (GO:0060155) and "cellular membrane organization" (GO:0016044) is 0.4412; and the similarity between "attachment of GPI anchor to protein" (GO:0016255) and "GPI anchor biosynthetic process" (GO:0006506) is 0.9649.
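    The calculation can be sketched on a toy graph. The sketch follows the measure as published by Wang et al. (2007), with semantic contribution factors of 0.8 for "is_a" and 0.6 for "part_of" edges; whether this website used exactly these factors is an assumption. The four-term graph and the function names are hypothetical, so the result does not correspond to any of the GO term pairs quoted above.

```python
# Semantic contribution factors proposed by Wang et al. (2007).
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}


def s_values(term, parents):
    """S-values of `term` and all of its ancestors.

    `parents` maps a term to a list of (parent term, relation) pairs.
    The term itself contributes 1.0; an ancestor receives the largest
    product of edge weights over all paths leading from it to the term.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            contribution = WEIGHTS[relation] * s[child]
            if contribution > s.get(parent, 0.0):
                s[parent] = contribution
                stack.append(parent)  # propagate the improved value upward
    return s


def wang_similarity(term_a, term_b, parents):
    """Shared-ancestor S-values divided by the total semantic values."""
    sa = s_values(term_a, parents)
    sb = s_values(term_b, parents)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


# Hypothetical toy graph: A is_a X, B part_of X, X is_a root.
toy_parents = {
    "A": [("X", "is_a")],
    "B": [("X", "part_of")],
    "X": [("root", "is_a")],
}

similarity_ab = wang_similarity("A", "B", toy_parents)
```

    In the toy graph, the S-values are 1, 0.8 and 0.64 for A, X and root as seen from A, and 1, 0.6 and 0.48 for B, X and root as seen from B, so the similarity is 2.52 / 4.52.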

7. How to interpret the significance of a predicted interaction?

    The SVM score is an indicator of the prediction confidence. It is the actual output of the SVM prediction function. In general, the higher the score, the more confident the prediction. However, this score cannot be directly translated into the probability of a true interaction. An SVM score of less than 0.5 means a non-interaction prediction. As the prediction model can only recognize 11.4% of the true interactions, many experimentally reported interactions have SVM scores of less than 0.5.

8. How to interpret the significance of a predicted protein function?

    To predict the biological processes for a gene, the annotations of the target gene were first removed from the GO annotation file. The GO Term Finder tool (Boyle et al., 2004) was then used to identify annotation terms significantly enriched in the neighbors of the target gene. The GO Term Finder tool calculates a p value for each GO term, under the assumption that the distribution of GO terms in a random set of genes follows the hypergeometric distribution.
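    The hypergeometric assumption can be made concrete with a small calculation. This is a sketch of the underlying test only; it does not reproduce the GO Term Finder tool or any multiple-testing correction it may apply, and the function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric variable X.

    population: number of genes in the background
    annotated:  background genes annotated with the GO term
    sample:     number of neighbors of the target gene
    hits:       neighbors annotated with the GO term
    """
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)


# Hypothetical numbers: 10 background genes, 5 carry the term,
# and both of the target gene's 2 neighbors carry it.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=2, hits=2)
```

    Here the p value equals C(5,2) / C(10,2) = 10 / 45, the chance that two genes drawn at random both carry the term.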

9. How to use the network visualizer?

    Cytoscape Web is embedded in many web pages to display graphical networks. It is Flash-based software that interacts with users via a JavaScript API. Therefore, for this network visualizer to work properly, users need to enable JavaScript support and install Adobe Flash Player.

    The network visualizer displays a graph consisting of user-specified interactions or biological process linkages. A single click on a node or edge selects it. Clicking the "Info/Tips" button displays detailed information about the current selection. Hovering the mouse over a node or edge displays a concise annotation. Double-clicking a node expands the network to include its neighbors. Right-clicking a node or edge brings up a context-sensitive menu for network manipulation. For edges, one useful right-click menu function is "Add this edge to My Collection". Users can change the layout of the network by clicking the "Layout" button and choosing a suitable layout algorithm. Alternatively, users can simply drag nodes to the desired places.

10. How to export an interaction network?

    Clicking the "Export" button in the network visualizer allows you to select an output file format and export the interactions. For molecular interactions, the supported export formats include the Microsoft Excel CSV format, the Cytoscape SIF format, the GraphML format, PSI-MI 2.5 XML, and our own XML format. For biological process linkages, the only supported export format is the tab-delimited text format. In addition, the network visualizer can save the network as a PNG or PDF image.

11. What is "My Collection"?

    "My Collection" is a temporary, personalized storage place where you can store molecular interactions or biological process linkages of interest, and then analyze them or export them as a single data file. Please note that interactions or linkages in "My Collection" are stored in local browser cookies. This means that you need to enable cookies for this website, and that the collection will be erased as soon as you close the browser. The "My Collection" page is here.

12. How to cite us?

    Xi Zhou, Pengcheng Chen, Qiang Wei, Xueling Shen and Xin Chen. Human interactome resource and gene set linkage analysis for the functional interpretation of biologically meaningful gene sets. Bioinformatics (2013) 29 (16): 2024-2031.