Presaging Critical Residues in Protein interfaces-Web Server (PCRPi-W): A Web Server to Chart Hot Spots in Protein Interfaces

Joan Segura Mora; Salam A. Assi; Narcis Fernandez-Fuentes

doi:10.1371/journal.pone.0012352

Abstract

Background

It is well established that only a subset of the residues that mediate protein-protein interactions (PPIs), the so-called hot spot, contributes most of the total binding energy. Identifying hot spots is therefore an important question with clear applications in drug discovery and protein design. The experimental identification of hot spots is, however, a lengthy and costly process, so there is interest in computational tools that can complement and guide experimental efforts.

Principal Findings

Here, we present Presaging Critical Residues in Protein interfaces-Web server (http://www.bioinsilico.org/PCRPi), a web server that implements a recently described and highly accurate computational tool designed to predict critical residues in protein interfaces: PCRPi. PCRPi integrates structural, energetic, and evolutionary-based measures using Bayesian Networks (BNs).

Conclusions

PCRPi-W has been designed to give the broad scientific community easy and convenient access to the method. Predictions are readily available for download or presented in a web page that includes, among other information, links to relevant files, sequence information, and a Jmol applet to visualize and analyze the predictions in the context of the protein structure.

Citation: Segura Mora J, Assi SA, Fernandez-Fuentes N (2010) Presaging Critical Residues in Protein interfaces-Web Server (PCRPi-W): A Web Server to Chart Hot Spots in Protein Interfaces. PLoS ONE 5(8): e12352. https://doi.org/10.1371/journal.pone.0012352

Editor: Ashley M. Buckle, Monash University, Australia

Received: April 21, 2010; Accepted: July 28, 2010; Published: August 23, 2010

Copyright: © 2010 Segura Mora et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Research Councils United Kingdom Academic Fellow scheme (to NFF), Wellcome ViP award (to SAS), and an internal scholarship awarded by the Leeds Institute of Molecular Medicine (to JSM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Cellular tasks require highly precise and regulated communication between proteins. Whether a protein is part of a metabolic pathway, an intermediate signalling effector, part of the transcription machinery, or a component of the cytoskeleton, to mention just some examples, it acts as part of a complex rather than as an isolated unit. Thus, protein-protein interactions (PPIs) are ubiquitous in biology and offer enormous potential for the discovery of novel therapeutic agents able to modulate them.

The analysis of protein complexes of known tertiary structure has shown that protein interfaces are large, typically between 1500 and 2000 Å² [1], [2], involve many intermolecular contacts (10 to 30 side chains per protein on average), and are usually flat and lack defining physicochemical traits. For that reason, the identification of small molecules that can act as modulators of PPIs is widely regarded as a formidable goal. However, as recently reviewed by Wells and McClendon [3] (and references therein), exciting new data indicate that disruption of protein associations using small molecules is possible.

Part of the recent success in modulating PPIs with small molecules has come from directly targeting the most important region, or hot spot, of the protein interface. The concept of hot spots in protein interfaces originates from the pioneering work of Clackson and Wells [4], which, together with subsequent studies, showed that most of the binding energy in protein-protein associations can be ascribed to a small and complementary set of interfacial residues (a hot spot) surrounded by weaker interactions.

The experimental identification of hot spots in protein interfaces by alanine scanning [5], alanine shaving [6], or residue grafting [6] is a lengthy, labour-intensive, and costly process. Computational tools can help and guide experimental efforts. We recently developed a novel computational tool, Presaging Critical Residues in Protein interfaces (PCRPi), that proved to be highly accurate and competitive with current computational methods [7]. In this paper, we present the implementation of the method as a web application that gives the scientific community convenient and easy access to it. The web application has been designed with a wide range of potential users in mind, so it has a user-friendly and straightforward interface with a minimal number of tunable parameters. Predictions are readily available for download or presented in a web page that offers a number of functionalities, such as a Jmol applet to visualize and analyze the predictions in the context of the protein structure.

Results and Discussion

Submitting a task

Running a prediction on PCRPi-W is straightforward. On the submission web page (Figure 1, panel A), users submit the coordinates of the protein complex of interest, either by typing its PDB code in a text box to select it from a locally mirrored Protein Data Bank (PDB) database [8] or by uploading the coordinates (PDB format only), and then select the chain identifier of the protein of interest. Under advanced options, users can choose the type of BN and the training set (see below).

Figure 1. Several screenshots of PCRPi-W.

The home web page of the server is the submission web page (A). Upon submission, a temporary web page (B) reports a unique job identification code and a link to the results web page that users can bookmark to retrieve their results when available. The results web page (C) provides a number of links, among them a link to download the list of predicted hot spot residues (D) and a link to visualize the protein complex colored by prediction probabilities using a Jmol applet (E).

https://doi.org/10.1371/journal.pone.0012352.g001

Prior to prediction, structures undergo a set of quality checks. If atoms have alternative locations or rotamers, only the first occurring one is kept. If residues have insertion codes, the distance to neighboring residues is calculated and the residue is discarded if it is structurally equivalent to a neighbor. Side chains with missing atoms are reconstructed using SCWRL 4.0 [9], an important step because energy calculations are strongly affected by missing atoms. Finally, the length of each protein is checked, and those shorter than 40 residues are discarded. The result is a modified version of the original coordinate file, the remediated coordinates file, which is used as input for the prediction and can be downloaded from the results web page. Changes to the original coordinates (if any) are recorded in the log file (see Retrieving and visualizing results below).
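
As an illustration only, the first and last of these checks could be sketched as follows. This is a minimal sketch that assumes standard fixed-column PDB records; the function names are hypothetical, the length check is applied per chain as an assumption, and none of this is the code actually run by PCRPi-W.

```python
MIN_LENGTH = 40  # residue threshold stated in the text


def keep_first_altloc(pdb_lines):
    """Keep, for every atom, only the first alternative location encountered."""
    first_seen = {}
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            kept.append(line)
            continue
        altloc = line[16]  # column 17: alternative location indicator
        if altloc == " ":
            kept.append(line)
            continue
        # Identify the atom by chain, residue number + insertion code, atom name.
        key = (line[21], line[22:27], line[12:16])
        if first_seen.setdefault(key, altloc) == altloc:
            # Illustrative choice: clear the altloc flag on the retained atom.
            kept.append(line[:16] + " " + line[17:])
    return kept


def chain_lengths(pdb_lines):
    """Count distinct residues per chain from ATOM records."""
    residues = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            residues.setdefault(line[21], set()).add(line[22:27])
    return {chain: len(ids) for chain, ids in residues.items()}


def drop_short_chains(pdb_lines, min_length=MIN_LENGTH):
    """Discard ATOM records of chains shorter than min_length residues."""
    lengths = chain_lengths(pdb_lines)
    return [
        line
        for line in pdb_lines
        if not (line.startswith("ATOM") and lengths[line[21]] < min_length)
    ]
```

The insertion-code check and the side-chain reconstruction with SCWRL are deliberately left out of this sketch, since the text does not give enough detail to reproduce them.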

PCRPi-W features two types of BN, naïve and expert, that can be trained on two different datasets: Ab+ and Ab− (Figure 2). More information about the structure of the BNs and the composition of the training sets can be found in the help web page of the server or in the original publication describing the method [7]. By default, PCRPi-W runs the prediction using the naïve BN trained on the Ab+ dataset, although both the BN type and the training set are tunable parameters, and users can select those that best suit their needs. If an e-mail address is given at submission, the user is notified by e-mail once the job has finished; the message includes a hyperlink to the results web page (the hyperlink is also shown upon submission for bookmarking purposes; Figure 1, panel B). PCRPi-W assigns a unique job identifier to each submitted job (e.g. PCRPi_cA8r0nAz0). This job identifier can be used to check the status of the submission (i.e. in queue, running, finished) and to retrieve the results by typing it in the ‘Job ID’ field on the submission web page.

Figure 2. General overview of the prediction process.

PCRPi combines seven different measures using BNs and outputs a probability. The input variables are: IE, TOP, BE, CON, 3DCON, ANCCON, and ANC3DCON. There are two different training datasets, Ab+ and Ab−, and three different BNs that can be invoked during the prediction: a naïve BN and two training dataset-specific expert BNs. For more information regarding the PCRPi method and its input variables, refer to the original publication describing the method [7].

https://doi.org/10.1371/journal.pone.0012352.g002

Jobs are handled by a queuing system and, if there are no competing jobs, typically take a few minutes to complete; larger protein complexes featuring large or multiple interfaces can take up to one hour. The most time-consuming steps are the estimation of the binding free energy, which requires long and intensive computation for large interfaces and protein complexes, and the sequence search and calculation of sequence profiles for the evolutionary-based measures.

Retrieving and visualizing results

PCRPi-W returns a list of interface residues sorted by probability (Figure 1, panels C and D) and several links to download files used or generated during the prediction. A successful prediction generates the following files: a file that contains the original coordinates as uploaded by the user or as in the PDB; the remediated version of the coordinates file (see Submitting a task above); a modified version of the input coordinates in which the B-factor field has been replaced by a value equal to the prediction probability times 100 (facilitating analysis of predictions in molecular visualization programs such as PyMOL [10]); a list of interface residues sorted by probability; a file detailing the atomic interactions of the interface residues as defined by the CSU program [11] (atomic interactions can also be visualized in the context of the structure using a Jmol (http://www.jmol.org) applet, see next); and a log file that records the entire prediction process and can be examined if errors are reported.
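
The B-factor substitution described above can be illustrated with a short sketch. It assumes standard fixed-column PDB records (B-factor in columns 61 to 66) and a hypothetical dictionary of probabilities keyed by chain and residue number; insertion codes are ignored and residues without a prediction are given 0.00, both of which are choices made for this example rather than documented behaviour of the server.

```python
def write_probabilities_to_bfactor(pdb_lines, probabilities):
    """Return PDB lines whose B-factor field holds probability * 100.

    probabilities: dict mapping (chain_id, residue_number) -> value in [0, 1].
    Residues absent from the dict receive 0.00 (an assumption of this sketch).
    """
    out = []
    for line in pdb_lines:
        body = line.rstrip("\n")
        if body.startswith(("ATOM", "HETATM")) and len(body) >= 54:
            key = (body[21], int(body[22:26]))  # chain, residue number
            score = probabilities.get(key, 0.0) * 100.0
            # Columns 61-66 (0-based slice 60:66) hold the temperature factor.
            body = body[:60].ljust(60) + f"{score:6.2f}" + body[66:]
        out.append(body)
    return out
```

A file written this way could then be coloured by the B-factor column in a molecular viewer, for example with the `spectrum b` command in PyMOL.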

Other elements shown in the results web page are the mapping of predictions onto the protein sequence and a Jmol applet that allows visualization of the structure of the complex and of the mapped predictions. The Jmol applet includes a clickable list of protein chains and residues sorted by probability (Figure 1, panel E), which facilitates the visualization and selection of interface residues and predictions. When a residue is selected, it is highlighted in ball-and-stick representation and its atomic interactions with neighbouring residues are shown.

Possible bottlenecks

Occasionally, PCRPi-W may fail to provide a prediction. The main reason is usually that the coordinates file contains only one protein chain or, if it contains more than one, that the chains do not interact, i.e. there are no atomic interactions between them. In this case, the interface(s) cannot be located and the program fails. More rarely, errors can occur along the prediction process, such as problems during free energy calculations or when deriving evolutionary-based measures (e.g. PSI-BLAST [12] fails to find homologous sequences with significant E-values). As described above, a log file is available for users to download and examine to understand the reason(s) for any reported error(s). In addition, users can contact the authors via e-mail for further support.
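
Users who want to rule out the most common failure before submitting could run a rough pre-check such as the one sketched below. The server defines interfaces with the CSU program [11]; this sketch substitutes a plain interatomic distance cutoff, and the 4.0 Å value and the function names are assumptions made for illustration, so a positive result here does not guarantee that the server will locate an interface.

```python
import math
from itertools import combinations


def read_atoms(pdb_lines):
    """Collect ATOM coordinates per chain from fixed-column PDB records."""
    atoms = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.setdefault(line[21], []).append(xyz)
    return atoms


def interacting_chain_pairs(pdb_lines, cutoff=4.0):
    """List chain pairs with at least one atom pair within `cutoff` angstroms.

    A crude stand-in for a proper contact analysis; cost is quadratic in the
    number of atoms, which is acceptable for a one-off check.
    """
    atoms = read_atoms(pdb_lines)
    pairs = []
    for a, b in combinations(sorted(atoms), 2):
        if any(math.dist(p, q) <= cutoff for p in atoms[a] for q in atoms[b]):
            pairs.append((a, b))
    return pairs
```

An empty list means that the file contains a single chain or that no two chains come within the cutoff of each other.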

Availability and Future Directions

The PCRPi-W server is freely available to the scientific community, upon registration, at http://www.bioinsilico.org/PCRPi. Besides submitting tasks to the server, users can browse extensive documentation, access related resources available online, and download the benchmark and training datasets.

Methods

Prediction algorithm

Several features characterize the residues that are part of a hot spot, and these have been exploited in the past for prediction purposes. These features can be broadly grouped into three categories depending on the nature of the data: energetic, structural, and evolutionary (e.g. sequence conservation). Although these features are useful, it has been shown that, individually, they cannot unambiguously define hot spots [13]. PCRPi [7] overcomes this limitation by combining a set of seven different measures that account for energetic, structural, and evolutionary-based information (Figure 2). Individual measures are combined into a unique probabilistic framework by using Bayesian Networks (BNs) [14], [15].
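
To give a sense of how a naïve BN turns several measures into one probability, the sketch below applies Bayes' rule under the assumption that the measures are conditionally independent given the class. It is a generic illustration, not the PCRPi model: the prior, the discretized states, and every conditional probability are hypothetical, and the actual network structures and trained parameters are described in the original publication [7].

```python
def naive_posterior(prior_hot, likelihoods, evidence):
    """Posterior probability that a residue is critical, naive Bayes style.

    prior_hot:   P(critical) before seeing any evidence.
    likelihoods: {measure: {state: (P(state | critical), P(state | not critical))}}
    evidence:    {measure: observed state}
    """
    p_hot = prior_hot
    p_not = 1.0 - prior_hot
    for measure, state in evidence.items():
        given_hot, given_not = likelihoods[measure][state]
        p_hot *= given_hot
        p_not *= given_not
    return p_hot / (p_hot + p_not)


# Hypothetical tables for two of the seven measures named in Figure 2.
EXAMPLE_LIKELIHOODS = {
    "BE": {"high": (0.8, 0.2), "low": (0.2, 0.8)},
    "CON": {"high": (0.6, 0.4), "low": (0.4, 0.6)},
}

if __name__ == "__main__":
    p = naive_posterior(0.5, EXAMPLE_LIKELIHOODS, {"BE": "high", "CON": "high"})
    print(f"posterior = {p:.3f}")
```

An expert BN would differ by allowing dependencies between measures, which this independence-based sketch cannot represent.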

The performance of PCRPi was benchmarked on two independent datasets [7]. The first was composed of 25 protein complexes comprising 636 interface residues, 300 of which had been validated as critical or non-critical by experimental means reported in the scientific literature. The second was the protein complex formed by HRAS and a VH domain of an Fv antibody [16]. In both cases PCRPi delivered highly accurate and consistent predictions. Moreover, in a head-to-head comparison with other available computational tools using the same test set, PCRPi predictions were superior in terms of precision, recall, and F1-scores (Table 1).
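
For readers less familiar with these metrics, the sketch below computes precision, recall, and F1 from a set of predicted critical residues and a set of experimentally validated ones, using the standard definitions. The residue labels in the usage note are invented for illustration and are not taken from the benchmark.

```python
def precision_recall_f1(predicted, actual):
    """Standard precision, recall, and F1 for two collections of residue labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With four hypothetical predictions, two of which are among three validated hot spot residues, precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7.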

Table 1. Comparison of different methods for the prediction of critical residues in protein interfaces using a BID derived dataset as described in Tuncbag et al. [18].

https://doi.org/10.1371/journal.pone.0012352.t001

Design, implementation and use of PCRPi-W

PCRPi-W is implemented on an Apache server running on a Red Hat® Enterprise Linux-based operating system. The web interface is coded in CGI Perl and JavaScript. PCRPi-W modules and accessory scripts are coded in Perl, Fortran, and C++. The databases required by the server, namely PDB [8] and the NCBI non-redundant (NR) protein sequence database [17], are mirrored locally and updated weekly. All queries are submitted to a queuing system that dispatches the tasks to a computer farm. Results are displayed in HTML format, and an e-mail containing a hyperlink to the results web page is sent to the user.

Acknowledgments

NFF thanks Dr. Gendra for critical reading of, and insightful comments on, the manuscript, and Ms Martina and Ms Daniela G Fernandez for continuing inspiration and motivation. The authors acknowledge constructive comments from anonymous reviewers.

Author Contributions

Conceived and designed the experiments: NFF. Performed the experiments: JSM SAA. Analyzed the data: NFF. Contributed reagents/materials/analysis tools: JSM SAA NFF. Wrote the paper: JSM SAA NFF.

References

1. Jones S, Thornton JM (1996) Principles of protein-protein interactions. Proc Natl Acad Sci USA 93: 13.
2. Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein-protein recognition sites. J Mol Biol 285: 2177.
3. Wells JA, McClendon CL (2007) Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 450: 1001–1009.
4. Clackson T, Wells JA (1995) A Hot Spot of Binding Energy in a Hormone-Receptor Interface. Science 267: 383–386.
5. Wells JA (1991) Systematic mutational analyses of protein-protein interfaces. Methods Enzymol 202: 390–411.
6. Jin L, Wells JA (1994) Dissecting the energetics of an antibody-antigen interface by alanine shaving and molecular grafting. Protein Sci 3: 2351–2357.
7. Assi SA, Tanaka T, Rabbitts TH, Fernandez-Fuentes N (2009) PCRPi: Presaging Critical Residues in Protein interfaces, a new computational tool to chart hot spots in protein interfaces. Nucleic Acids Res 38(6): e86.
8. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, et al. (2002) The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 58: 899–907.
9. Wang Q, Canutescu AA, Dunbrack RL Jr (2008) SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling. Nat Protoc 3: 1832–1847.
10. http://www.pymol.org/ (last accessed 2010).
11. Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M (1999) Automated analysis of interatomic contacts in proteins. Bioinformatics 15: 327–332.
12. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389.
13. DeLano WL (2002) Unraveling hot spots in binding interfaces: progress and challenges. Curr Opin Struct Biol 12: 14–20.
14. Pearl J (1988) Probabilistic Reasoning in Intelligent Systems. San Francisco: Morgan Kaufmann Publishers.
15. Jordan M (1998) Learning in Graphical Models. London: The MIT Press.
16. Tanaka T, Williams RL, Rabbitts TH (2007) Tumour prevention by a single antibody domain targeting the interaction of signal transduction proteins with RAS. EMBO J 26: 3250–3259.
17. Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61–65.
18. Tuncbag N, Gursoy A, Keskin O (2009) Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy. Bioinformatics 25: 1513–1520.
19. Guerois R, Nielsen JE, Serrano L (2002) Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 320: 369–387.

