Comparability of Mixed IC50 Data – A Statistical Analysis

Abstract

The biochemical half maximal inhibitory concentration (IC50) is the most commonly used metric for on-target activity in lead optimization. It is used to guide lead optimization and to build large-scale chemogenomics analyses as well as off-target activity and toxicity models based on public data. However, the use of public biochemical IC50 data is problematic, because these values are assay specific and comparable only under certain conditions. For large-scale analyses it is not feasible to check each data entry manually, and it is very tempting to mix all available IC50 values from public databases even if assay information is not reported. As previously reported for the analysis of Ki data, we first analyzed the types of errors, the redundancy and the variability that can be found in the ChEMBL IC50 data. To assess the variability of IC50 data independently measured in two different laboratories, we searched ChEMBL for series of at least ten IC50 values measured on identical protein-ligand systems against the same target. Since a sufficient number of such cases is not available, the variability of IC50 data was instead assessed by comparing all pairs of independent IC50 measurements on identical protein-ligand systems. The standard deviation of IC50 data is only 25% larger than the standard deviation of Ki data, suggesting that mixing IC50 data from different assays, even without knowing the assay conditions in detail, only adds a moderate amount of noise to the overall data. As expected, the standard deviation of public ChEMBL IC50 data is larger than the standard deviation of in-house intra-laboratory/inter-day IC50 data. Augmenting mixed public IC50 data with public Ki data does not deteriorate the quality of the mixed IC50 data, if the Ki values are corrected by an offset. For a broad dataset such as ChEMBL, a Ki-to-IC50 conversion factor of 2 was found to be the most reasonable.

Introduction

Public collections of IC50 data (the half maximal inhibitory concentrations of ligands on their protein targets) represent a wealth of knowledge on bioactivity with growing importance. One of the major databases of public bioactivities for small molecules is ChEMBL, [1] which currently contains roughly three times more IC50 values than Ki values. It has been shown that the gap between the number of IC50 and Ki values is still increasing. [2] Proper usage of IC50 data facilitates the development of useful methods for drug discovery. Examples of such applications are the global mapping of pharmacological space by Paolini and co-workers, [3] the Similarity Ensemble Approach (SEA), [4] the Bayesian models for adverse drug reactions by Bender and coworkers, [5] the models used for polypharmacological optimization by Hopkins et al., [6] and the kinome-wide activity modeling studies by Schuerer and Muskal. [7] These methods can be used to predict off-target effects based on heterogeneous public activity data and chemical similarity analysis. Usually, public off-target toxicity models like human Ether-à-go-go-Related Gene (hERG) [8] and cytochrome P450 (CYP) models [9], [10] are based and validated on mixed public IC50 data, since there is not enough public data available that originates from one single assay.

In contrast to Ki values, IC50 data is assay specific. For the simplest typical case of competitive monosubstrate enzyme inhibition, Ki can be calculated from the IC50 according to the Cheng-Prusoff equation:

$$K_i = \frac{IC_{50}}{1 + \frac{[S]}{K_m}}$$

where $[S]$ is the substrate concentration and $K_m$ is the Michaelis-Menten constant of the substrate. [11] Under the same assay conditions, the measured IC50 values of the same inhibitor or of two different inhibitors (1 and 2 below) with the same mechanism of action can be compared, since the assay-dependent term cancels:

$$\frac{IC_{50,1}}{IC_{50,2}} = \frac{K_{i,1}\left(1 + \frac{[S]}{K_m}\right)}{K_{i,2}\left(1 + \frac{[S]}{K_m}\right)} = \frac{K_{i,1}}{K_{i,2}}$$
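As a minimal numerical illustration of this relationship (not part of the original analysis; the example concentrations are hypothetical):

```python
def ki_from_ic50(ic50, substrate_conc, km):
    """Cheng-Prusoff relation for competitive, monosubstrate inhibition:
    Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + substrate_conc / km)

# Hypothetical example: a 100 nM IC50 measured at [S] = Km gives Ki = 50 nM,
# i.e. the factor-of-2 conversion discussed later in this article.
print(ki_from_ic50(100e-9, substrate_conc=1e-6, km=1e-6))  # 5e-08
```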

The problem is that assay details are not reported in public bioactivity databases. Recently, Zdrazil et al. analyzed human P-glycoprotein bioassay data from the ChEMBL and TP-search databases. [12] They explored whether such data, determined in different assays, can be combined with each other. Their study indicates that for inhibitors of human P-glycoprotein this is possible under certain conditions: i.e., data coming from the same type of assay, the same cell lines, and the same fluorescent or radiolabeled substrates with overlapping binding sites. However, they point out that it is currently not possible to extract such data in an automated fashion from the current public databases. Effort invested in annotating assay details would make safe data integration easier and thus increase the usefulness of these huge, freely available data repositories.

In this manuscript we report an estimate of the error introduced by mixing public IC50 data from different laboratories, and of how this error can affect the ability to draw scientifically sound conclusions from such data. By using the same statistical technique that we previously introduced to determine the experimental uncertainty of heterogeneous public Ki data, [13] we analyze the variability of all pairs of biochemical IC50 measurements on the same protein-ligand system, independently of assay details.

In the following, we first describe our attempts to extract from ChEMBL a set of at least ten IC50 values that have been independently measured in two comparable assays. Since all identified sets of measurements turn out to be either not independent or otherwise faulty, we instead analyze the standard deviation of all truly independent pairs of IC50 values available from ChEMBL. Dubious entries and the filters used to spot and remove faulty entries are described in detail. For the remaining pairs of measurements, the original publications of protein-ligand systems showing various ranges of IC50 differences were inspected in order to gain an impression of which activity differences are due to database errors and which are due to variations in assay conditions. We then fitted a Gaussian distribution to the distribution of IC50 differences to estimate the standard deviation of valid pairs of independent IC50 measurements. By comparing the IC50 standard deviation to the equivalent Ki standard deviation, we can estimate the variability of heterogeneous IC50 data. The average difference between Ki and IC50 values and their correlation are assessed. Moreover, the effect of mixing Ki and IC50 values in order to enlarge the dataset was evaluated. Lastly, we analyze whether the variability of IC50 values depends on simple ligand properties such as molecular weight (MW) and the calculated octanol-water partition coefficient (logP).

Materials and Methods

Dataset Preparation

All measurements were extracted from the ChEMBL database version 14. It is currently the largest public database of bioactivities extracted from the literature. BindingDB [14] is similar in size, but has a significant overlap with ChEMBL, with most of its values being copied from ChEMBL.

The raw data was filtered in order to remove erroneous entries as described earlier. [13] Generally, all analyses presented here are based on multiple affinity measurements of the same protein-ligand system. The filtering steps were the following:

  1. Remove all data from reviews, since this is not original data.
  2. Remove all unclear measurements (i.e. Unit not M, mM, µM, nM, pM, fM; qualified values (“<” or “>”); extremely high (pActivity >15) or extremely low (pActivity <2) values).
  3. Remove younger entry for exactly the same value reported twice (younger paper cites older paper).
  4. Remove younger entry for very close values reported twice (difference in pActivity <0.02: younger paper cites older paper and rounds).
  5. Remove both entries if their difference is exactly 3, 6, or 9. These are citations with unit-conversion errors.
  6. Remove entries for which the authors could not be extracted from PubMed.
  7. Only keep pairs where the name overlap of the authors is zero to make sure that measurements are from different laboratories.

After each step, protein-ligand systems that had only one measurement entry left (singletons) were removed. All affinities were converted to their negative logarithm, pActivity (e.g. pIC50 or pKi), with the molar concentration (M) as the base unit (e.g. 1 µM is converted to a pActivity of 6).
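As an illustration only (a sketch, not the original scripts, which are provided in Archive S1), the unit filtering, the pActivity conversion and the singleton removal could be implemented in Python roughly as follows; the column names (system, value, unit, relation) are hypothetical:

```python
import numpy as np
import pandas as pd

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12, "fM": 1e-15}


def to_pactivity(value, unit):
    """pActivity = -log10(concentration in mol/L); e.g. (1, 'uM') -> 6.0."""
    return -np.log10(value * UNIT_TO_MOLAR[unit])


def basic_filters(df):
    """Apply step 2 of the filtering protocol and remove singletons.

    Hypothetical columns: 'system' (protein-ligand key), 'value', 'unit',
    'relation' ('=', '<', '>'). Steps 3-7 (duplicate citations, unit-shift
    errors, author overlap) would be applied per system on top of this.
    """
    df = df[df["unit"].isin(list(UNIT_TO_MOLAR)) & (df["relation"] == "=")].copy()
    df["pactivity"] = [to_pactivity(v, u) for v, u in zip(df["value"], df["unit"])]
    df = df[df["pactivity"].between(2, 15)]                # drop implausible values
    counts = df.groupby("system")["pactivity"].transform("size")
    return df[counts > 1]                                   # remove singletons


# Toy usage with made-up data: system "B" drops out as a singleton.
raw = pd.DataFrame({
    "system": ["A", "A", "B"],
    "value": [1.0, 2.0, 50.0],
    "unit": ["uM", "uM", "nM"],
    "relation": ["=", "=", "="],
})
print(basic_filters(raw))
```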

In ChEMBL a confidence score is available for each bioactivity entry. According to the ChEMBL homepage, a confidence score of nine is the highest, a confidence score of four or more indicates a biochemical measurement, and a confidence score below four indicates a cellular measurement. For the IC50 analysis, two sets of data were generated: Set1 contains all data with a confidence score of four or more, Set2 contains only data with the highest confidence score of nine. Since it turned out that there is no difference in variability between Set1 and Set2, here we only report results for Set1.

From the initially available 616,555 IC50 values with a confidence score greater than or equal to four, 10,895 IC50 values for 3,480 protein-ligand systems remained, yielding 20,356 pairs of independent measurements. Overall, the numbers of protein-ligand systems and of individual IC50 data points available for comparison were reduced by 94% and 93%, respectively. The filtering statistics are shown in Table 1.

Table 1. Filtering statistics for extracting independent pairs of IC50 measurements on identical systems.

https://doi.org/10.1371/journal.pone.0061007.t001

Metrics for Evaluating the Distribution of Errors

We analyze the distribution of the differences between two affinity measurements on the same protein-ligand system using the Standard Deviation (σ), the Mean Unsigned (Absolute) Error (MUE), the Median Unsigned Error (MedUE) and the squared Pearson's correlation coefficient (R2Pearson = R2). They are defined as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{pub,i,1}-y_{pub,i,2}\right)^{2}}$$

$$MUE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{pub,i,1}-y_{pub,i,2}\right|$$

$$MedUE = \operatorname{median}_{i}\left|y_{pub,i,1}-y_{pub,i,2}\right|$$

$$R^{2}_{Pearson} = R^{2} = \frac{\left[\sum_{i=1}^{n}\left(y_{pub,i,1}-\bar{y}_{pub}\right)\left(y_{pub,i,2}-\bar{y}_{pub}\right)\right]^{2}}{\sum_{i=1}^{n}\left(y_{pub,i,1}-\bar{y}_{pub}\right)^{2}\sum_{i=1}^{n}\left(y_{pub,i,2}-\bar{y}_{pub}\right)^{2}}$$

with n being the number of pairs of measurements considered, y_pub,i,1 and y_pub,i,2 being the two published values of pair i, and ȳ_pub being the average of all measured values. If more than two measurements are available for a given protein-ligand system, all possible pairs are generated. The order of y_pub,i,1 and y_pub,i,2 has to be scrambled in order not to bias the calculation of R2Pearson and σ. As we have shown earlier, [13] MUE, MedUE and σ calculated from pairs of measurements are overestimated by a factor of √2. Therefore MUE, MedUE and σ calculated from pairs of measurements were divided by √2.
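A minimal Python sketch of these metrics, including the scrambling of the pair order and the √2 correction, is shown below (an illustration with simulated data; the original analysis scripts are provided in Archive S1):

```python
import numpy as np


def pair_metrics(y1, y2, seed=0):
    """Return sigma, MUE, MedUE and Pearson R^2 for paired measurements y1[i], y2[i].

    The order within each pair is scrambled so that neither column is
    systematically 'first'; sigma, MUE and MedUE are divided by sqrt(2)
    because each pair contains two independently noisy measurements.
    """
    rng = np.random.default_rng(seed)
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    swap = rng.random(len(y1)) < 0.5
    a = np.where(swap, y2, y1)
    b = np.where(swap, y1, y2)

    diff = a - b
    sigma = np.sqrt(np.mean(diff ** 2)) / np.sqrt(2)
    mue = np.mean(np.abs(diff)) / np.sqrt(2)
    medue = np.median(np.abs(diff)) / np.sqrt(2)
    r2 = np.corrcoef(a, b)[0, 1] ** 2
    return sigma, mue, medue, r2


# Hypothetical usage: simulated pairs sharing a true pIC50 per system,
# each measured with 0.5 log units of noise.
rng = np.random.default_rng(1)
true = rng.uniform(4, 9, 1000)
m1 = true + rng.normal(0, 0.5, 1000)
m2 = true + rng.normal(0, 0.5, 1000)
print(pair_metrics(m1, m2))   # sigma and MUE should come out near 0.5 and 0.4
```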

Raw data was extracted from ChEMBL14 using MySQL statements. Filtering and pairing of measurements were done using Python 2.7. The statistical analysis was carried out using R version 2.15.1. [15] All R-, Python- and MySQL-scripts used including detailed instructions on how to repeat the work can be found in the Archive S1.

Results

In order to assess the comparability of IC50 values, we first extracted from the whole of ChEMBL all series of compounds that have been measured against the same protein target in two independent assays. There were twelve series of ten or more compounds whose activity on the same target had been measured in different assays. An overview of the different series is given in the Supporting Information (Table S1, Text S1–S2 and Figures S1–S2). However, eleven out of the twelve series had overlapping authors, and the single independently measured series was incorrectly annotated in the database.

Since it is not possible to find independently measured sets of at least ten IC50 values for the same target, the IC50 variability was determined differently. In the following, we analyze the IC50 data using an approach that we have previously introduced for analyzing the reproducibility of heterogeneous Ki data. All pairs of identical protein-ligand systems with independently measured IC50 values were extracted from ChEMBL and the variability of the differences between the pairs of measurements was calculated.

The distribution of pIC50 values is shown in Figure 1. The distribution of measured values is slightly skewed to the left with a maximum of roughly 30% of all pIC50 values reported between 7.0 and 8.0.

Figure 1. Distribution of the 9,465 pIC50 values for protein-ligand systems with independent multiple measurements.

https://doi.org/10.1371/journal.pone.0061007.g001

The distribution of ΔpIC50 values and the distribution of the number of independent measurements per protein-ligand system are shown in Figures 2 and 3. Roughly 70% of all ΔpIC50’s are smaller than one log unit.

Figure 2. Distribution of the 16,844 pairs of ΔpIC50 values for protein-ligand systems with independent multiple measurements.

The largest ΔpIC50 is 7.7 log units.

https://doi.org/10.1371/journal.pone.0061007.g002

Figure 3. Number of published independent values per protein-ligand system.

https://doi.org/10.1371/journal.pone.0061007.g003

Most systems with multiple independent measurements have two or three independent measurements. The most frequently measured system is celecoxib on cyclooxygenase-2 with 30 independently measured IC50 values.

Sets of ten pairs of measurements for seven ranges of ΔpIC50 were closely inspected. The selected ΔpIC50 ranges span the whole range of observed ΔpIC50 values (see Figure 2). The values of 3.2 and 1.1 were chosen to avoid pairs that could be combinations of citations of previous values and unit transcription errors. The findings are summarized in Table 2.

Table 2. Errors found for samples of pairs of measurements with specific differences in measured pIC50.

https://doi.org/10.1371/journal.pone.0061007.t002

We found that very high differences in pIC50 (ΔpIC50>2.5) were in most cases due to annotation errors. Some measurements had wrong units assigned (unit error). The receptor subtype was sometimes incorrectly assigned or not assigned at all (receptor subtype error). Other errors come from wrong stereoisomers of ligands (stereochemistry error), cellular assays assigned as biochemical assays (cellular assay error), incorrect target annotations (target error) and erroneous values extracted from original publications (value error).

Unit errors are the most common type of error. Receptor subtype errors occur most often for older publications (e.g., papers from the 1980s with published IC50 values for dopamine receptors, opioid receptors, and monoamine oxidases in general, i.e. without distinguishing the subtypes). This data is mixed with the subtype-specific data in ChEMBL. Stereochemistry errors occur when the stereochemistry is wrongly extracted from the original literature. Cellular assay errors occur when the reported IC50 values have been measured in a cellular assay, despite being associated with a confidence score of four or more (see the Dataset Preparation section).

Pairs with small ΔpIC50's can also be composed of erroneously reported IC50 data. For example, the group of pairs with ΔpIC50 = 0.05 contains one case where the IC50 extracted from the literature is incorrect: the original manuscript gives an activity range, whereas the ChEMBL database reports only one end of the range with an equals sign. Another, smaller set of problems comes from retracted original publications (for example, the original publication [16], publishing an IC50 value for the compound with ChEMBL ID CHEMBL266497 on aldose reductase (CHEMBL2622), was retracted). Considering the number of invalid pairs among the ten inspected for each of the seven ΔpIC50 ranges, there is a high probability that pairs with ΔpIC50≥2.5 contain errors in the database or in the original publication.

A plot of all pairs of pIC50 values is shown in Figure 4. The correlation coefficient for the raw extracted data is R2 = 0.40. After excluding a major part of the invalid pairs by removing all pairs with ΔpIC50≥2.5, the correlation coefficient rises to R2 = 0.53.

Figure 4. All Pairs of pIC50 values extracted from ChEMBL.

The two outer diagonal lines indicate the 2.5 log unit threshold, outside which the probability for finding faulty pairs of measurements is very high. The extreme disagreements are all due to clear errors.

https://doi.org/10.1371/journal.pone.0061007.g004

We also calculated the standard deviation σ of all ΔpIC50 and ΔpKi values between 0.05 (lower threshold) and a variable upper threshold (1.5, 2.0 and 2.5) by fitting the data to a Gaussian distribution. The lower threshold of 0.05 was selected to remove pairs which were just rounded duplicates. The standard deviations obtained for the ΔpIC50 and ΔpKi distributions are shown in Table 3. The fitted Gaussian and the raw distributions for ΔpIC50’s and ΔpKi’s with an upper threshold of 2.0 are shown in Figure 5.
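A minimal Python sketch of this fitting step, assuming the ΔpActivity values are supplied as absolute differences and that SciPy is available (the published analysis used the R scripts in Archive S1):

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, sigma):
    """Zero-centred Gaussian used to model the inner part of the Delta distribution."""
    return amplitude * np.exp(-x ** 2 / (2.0 * sigma ** 2))


def fit_inner_sigma(abs_deltas, lower=0.05, upper=2.0, bins=20):
    """Fit a zero-centred Gaussian shape to the histogram of |Delta pActivity|
    values between `lower` and `upper` and return the fitted sigma."""
    abs_deltas = np.asarray(abs_deltas, dtype=float)
    sel = abs_deltas[(abs_deltas > lower) & (abs_deltas < upper)]
    counts, edges = np.histogram(sel, bins=bins, range=(lower, upper))
    centers = 0.5 * (edges[:-1] + edges[1:])
    (amp, sigma), _ = curve_fit(gaussian, centers, counts, p0=(counts.max(), 1.0))
    return abs(sigma)


# Hypothetical usage: synthetic |Delta| values drawn from a Gaussian with sigma = 0.9
rng = np.random.default_rng(0)
print(fit_inner_sigma(np.abs(rng.normal(0.0, 0.9, 20000))))  # close to 0.9
```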

Figure 5. Fitted Gaussian distribution of ΔpIC50 (red) and ΔpKi (black).

The Gaussian distributions shown were fitted to all ΔpActivity values with an upper threshold ΔpActivity = 2.0. Standard deviations for the fitted Gaussian distributions are σpIC50 = 0.87 and σpKi = 0.69. Note that since the σ here is calculated from pairs of measurements each containing experimental uncertainty and other sources of variability, it has to be divided by √2 in order to obtain the true σ of the individual measurements [13].

https://doi.org/10.1371/journal.pone.0061007.g005

Table 3. Standard deviation of a Gaussian distribution fitted to the inner part of the distribution of ΔpIC50 and ΔpKi.

https://doi.org/10.1371/journal.pone.0061007.t003

The standard deviation of the ΔpIC50 data is consistently 21–26% larger than the standard deviation of the ΔpKi data. After dividing by √2, the σ of the Gaussian distribution fitted to all ΔpKi values <2.5 becomes 0.47 (slightly lower than the σ of 0.54 previously calculated for heterogeneous pKi data from ChEMBL version 12 without an upper threshold for ΔpKi [13]). Since σ, MUE, and MedUE are proportional to each other in Gaussian distributions, we can estimate σ, MUE and MedUE for the IC50 data to be 21–26% larger than the same metrics for pKi data, yielding σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43 (when using a factor of +25% for converting pKi data to pIC50 data).
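As a rough numeric illustration of these proportionality relations (for a zero-mean Gaussian, MUE = σ·√(2/π) and MedUE ≈ 0.674·σ; the published MUE and MedUE were obtained by scaling the corresponding pKi metrics by +25% and therefore differ slightly from this idealized check):

```python
import math

sigma_pIC50 = 1.25 * 0.54             # published sigma_pKi scaled by +25%  ->  ~0.68

mue_gauss = sigma_pIC50 * math.sqrt(2.0 / math.pi)   # Gaussian relation for MUE
medue_gauss = sigma_pIC50 * 0.6745                   # Gaussian relation for MedUE

print(round(sigma_pIC50, 2), round(mue_gauss, 2), round(medue_gauss, 2))  # 0.68 0.54 0.46
```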

In order to test the alternative approach of directly obtaining quality metrics from the data, we calculated the quality metrics from the ΔpIC50 data with an upper threshold of ΔpIC50 = 2.5. Here, σpIC50 = 0.68, MUEpIC50 = 0.54 and MedUE pIC50 = 0.43 are obtained. These values are very similar to the values obtained from comparing fitted Gaussian distributions and indicate that the erroneous pairs of measurements do not have a large effect on the overall result.

Similar results were obtained when considering only IC50 data with a ChEMBL confidence score of nine (data not shown). As ChEMBL contains data from both human input and automatic extraction processes, we also checked whether there was a difference between the two. As with the confidence score filtering, the results were similar for both data types.

We checked whether the ΔpIC50 depends on the overall activity measured or on physicochemical ligand properties like logP, logD, molecular weight (MW), polar surface area (PSA), the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD) or the number of rotatable bonds. Boxplots of all these properties versus the ΔpIC50 are shown in Figure 6. The ΔpIC50's depend neither on the average measured pIC50 nor on any of the ligand properties examined.

Figure 6. ΔpIC50 versus average pIC50 measured, logP, logD, polar surface area, molecular weight, number of hydrogen bond acceptors, number of hydrogen bond donors and number of rotatable bonds.

The numbers above the boxplots indicate the number of ΔpIC50 values falling into each bin. Some boxplots are truncated at the very low and high ends because the low number of samples per bin makes the boxplot uninformative.

https://doi.org/10.1371/journal.pone.0061007.g006

We also examined whether the ΔpIC50 depends on the combination of average activity and logP, since one might expect large deviations in measured pIC50’s for compounds with low activity and high logP due to solubility issues. Here we also did not find a clear trend (Figure S3).

Can ChEMBL Ki and IC50 Data be Mixed?

Empirical statistical models and SAR interpretations improve with the amount of data. Above, we have shown that the variability of heterogeneous IC50 data is roughly 25% worse than that of Ki data. Therefore it is not advisable to add IC50 data to Ki data, as this would lower the quality of the data. However, since there is much more IC50 data than Ki data available, it is interesting to see what happens when the IC50 dataset is augmented with additional Ki data. Figure 7 shows the distribution of pKi and pIC50 data extracted from ChEMBL with the filters listed in Table 1. Overall, pIC50 and pKi data show a similar distribution, with the pKi data slightly shifted towards higher values.

Figure 7. Distribution of published pIC50 (dark grey) and pKi (light grey) values for protein-ligand systems with multiple independent measurements.

https://doi.org/10.1371/journal.pone.0061007.g007

For identical protein-ligand systems, we extracted all pairs of pKi and pIC50 data that have passed the filters individually. This yields 11,556 pairs of measurements on 670 protein-ligand systems. A plot of measured pIC50 versus pKi is shown in Figure 8.

Figure 8. Measured pKi versus measured pIC50 for identical protein-ligand systems.

https://doi.org/10.1371/journal.pone.0061007.g008

Based on the Cheng-Prusoff equation and under the assumption of a competitive mechanism of action, pKi values are larger than or equal to pIC50 values. However, due to unknown mechanisms of action, experimental uncertainty and some database annotation errors, there is a significant number of pairs where the pIC50 is larger than the pKi. On average, the measured pKi values are 0.355 log units larger than the measured pIC50 values, corresponding to a factor of 2.3. A factor of 2 agrees with balanced assay conditions in which the substrate concentration is equal to the Km value; such conditions are often used to allow the detection of inhibitors with different mechanisms of action.
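A brief numeric check of these two factors (an illustration, not part of the original analysis):

```python
# Cheng-Prusoff under balanced assay conditions ([S] = Km):
# IC50 = Ki * (1 + [S]/Km) = 2 * Ki, i.e. a conversion factor of exactly 2.
cheng_prusoff_factor = 1.0 + 1.0

# Average offset observed in ChEMBL: pKi - pIC50 = 0.355 log units.
observed_factor = 10 ** 0.355          # ~2.26, i.e. roughly a factor of 2.3

print(cheng_prusoff_factor, round(observed_factor, 2))   # 2.0 2.26
```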

After subtracting 0.35 log units from the pKi values and correcting by √2, pKi and pIC50 values agree with an R2 = 0.46, σ = 0.68, MUE = 0.54 and MedUE = 0.43. The standard deviations of Gaussian distributions fitted to the inner part with an upper threshold of 1.5, 2.0 and 2.5 ΔpActivity units are 0.79, 0.83, and 0.85.

Overall, this is close to, or even slightly better than, the agreement obtained for pIC50 values with themselves. Therefore we can conclude that pKi values can be used to augment pIC50 values without any loss of quality, if they are corrected by an offset. In the absence of assay information, the best guess for the Ki-to-IC50 conversion factor is the average offset calculated from the heterogeneous ChEMBL data, i.e. a factor of 2.3, corresponding to 0.35 pActivity units.
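In practice, the augmentation simply amounts to shifting the pKi values by this offset before pooling them with the pIC50 data; a minimal pandas sketch with hypothetical toy tables:

```python
import pandas as pd

KI_TO_IC50_OFFSET = 0.35   # average pKi - pIC50 offset found in ChEMBL

# Hypothetical tables holding pActivity values for some protein-ligand systems.
df_ic50 = pd.DataFrame({"system": ["A", "B"], "pactivity": [6.2, 7.8]})
df_ki = pd.DataFrame({"system": ["A", "C"], "pactivity": [6.9, 8.4]})

# Shift pKi down by the offset so it lives on the (pseudo-)pIC50 scale, then pool.
df_ki_as_ic50 = df_ki.assign(pactivity=df_ki["pactivity"] - KI_TO_IC50_OFFSET)
augmented = pd.concat([df_ic50, df_ki_as_ic50], ignore_index=True)
print(augmented)
```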

Discussion

In this contribution we show how the comparability of IC50 data can be analyzed using the public ChEMBL database. We find that, when comparing all independently measured pIC50 data, the variability found for pIC50 data is approximately 25% larger than the variability found for pKi data, with σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43. These values correspond to the most probable variability of pIC50 data mixed from different (unknown) assays.

We want to stress that pIC50 data from different assays can only be compared under certain conditions. However, as discussed in the Introduction, such comparisons are nevertheless common in large-scale data analysis. A standard deviation of 0.68 corresponds to a factor of 4.8, meaning that 68.2% of all IC50 measurements agree within a factor of 4.8, even when measured in different laboratories under potentially different assay conditions. One reason why the variability of IC50 data is only moderately higher than the variability of Ki data might be that, in practice, most of the IC50 assays may have been run using very similar assay protocols. Unfortunately, the assay descriptions available within ChEMBL are too terse to permit analyzing this any further.

IC50 values measured in the same laboratory usually show a better reproducibility. From our in-house database, we extracted series of reference pIC50 values measured for assay standards. The plots in Figure 9 show the pIC50 values measured for rolipram on PDE4D and cilostamide on PDE3. The standard deviations of the pIC50 values are σ = 0.22 for rolipram/PDE4D and σ = 0.17 for cilostamide/PDE3.

Figure 9. Variation of measured pIC50 values over time for rolipram/PDE4D and cilostamide/PDE3.

https://doi.org/10.1371/journal.pone.0061007.g009

There is some variation over time, which could indicate changes in the assay conditions and solution handling. We also tried to find public series of at least ten compounds that have been measured in independent parallel assays. However, no such series exists within ChEMBL, as all the series we found were either measured in the same laboratory or had a mistakenly annotated target protein.

For extracting the pairs of IC50 data that are indeed independently measured on the same protein-ligand system, we applied a set of filters that we had previously used to filter and analyze Ki data. Here, the filters removed more than 90% of the IC50 data erroneously assumed to be independent measurements on the same protein-ligand system. When inspecting the remaining 20,356 pairs of measurements from 3,480 protein-ligand systems, we found that there is still a number of invalid pairs, especially but not limited to the pairs with larger ΔpIC50. The main errors we found were unit transcription errors, wrong annotation of the receptor subtype, and annotation of cellular assays as biochemical assays. More rarely occurring errors were wrongly assigned stereochemistry, values and protein targets. These errors cannot be detected automatically and have to be manually curated out of the database over time [17].

In contrast to our previous study of Ki values, we observed a larger number of invalid pairs even for ΔpIC50 values smaller than approximately 2.5. To reduce the impact of these hard-to-find cases, we applied a different strategy to determine the variability of the true pairs. By fitting a Gaussian distribution to the central part of the distribution, we were able to compare the variability of the pIC50 data to the variability of the pKi data. We found that the ratio between pKi and pIC50 variability is relatively stable at 21–26% when varying the upper threshold for fitting the Gaussian distribution between 1.5 and 2.5 ΔpActivity units. Using this approach, we were able to estimate the variability of the IC50 data from the variability of the Ki data.

ChEMBL has a confidence score assigned to each activity value. The confidence score indicates how much the ChEMBL authors trust the reported value. Confidence scores below four indicate that the assay was a cellular assay, whereas confidence scores between four and nine indicate biochemical assays. In this study, we used all values that had a confidence score of at least four. We also exclusively used the most confident data with a confidence score of nine, but the results did not change. We also examined whether there is a difference between data annotated as “autocurated” and data annotated as “expert” data. In this experiment, we also did not find any significant difference. The availability of assay descriptions within ChEMBL would have allowed us to analyze whether specific assay types are statistically more comparable than others, or whether the variability of pIC50 is lower within comparable assays. However, such information is not easily added to the database, because this would require detailed assay ontologies, and assay details are often missing from the original literature as well.

One might assume that higher IC50 values show a larger variability than, for example, single-digit µM IC50 values because of solubility limits. However, our analysis shows that on average this is clearly not the case. Moreover, the variability does not depend on any specific ligand properties such as logP, MW, PSA etc.

While the quality of pure Ki datasets would be reduced by adding IC50 data, we have shown that augmenting IC50 datasets with Ki data does not deteriorate the quality, if the Ki data is corrected by an offset. We found that pKi values reported in ChEMBL are on average 0.35 log units higher than pIC50 values, which corresponds to a factor of 2.3. The IC50-to-Ki conversion factor is exactly 2.0 in competitive monosubstrate IC50 inhibition assays, if the substrate concentration is set equal to its Km value. This factor is close to the average difference between pKi and pIC50 values in ChEMBL, and therefore, in the absence of any further assay-specific knowledge, a factor of 2.0 is the most probable conversion factor to convert Ki values into IC50 values.

Summary and Conclusions

In this contribution, we present an analysis of the comparability of public heterogeneous IC50 data. We find that the agreement of independently measured biochemical IC50 values is only 23–30% worse than the agreement of pKi data, irrespective of the assay conditions and assay types used. For heterogeneous biochemical pIC50 data, we find a variability with σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43. Although IC50 values obtained under different assay conditions should theoretically not be comparable, comparing them is common practice in analyzing large-scale off-target and toxicity datasets. Our analysis quantitatively assesses the consequences of doing so. We believe that this knowledge is important for everybody who decides to work with IC50 data from various heterogeneous sources. We also show that Ki data can be used to augment IC50 datasets without any loss of quality if corrected by a factor of 2, which is the conversion factor most frequently found by comparing IC50/Ki values in ChEMBL for the same protein-ligand systems.

Nevertheless, public IC50 data extracted from ChEMBL14 is quite error prone. The most common errors we found are unit conversion errors, receptor subtype errors and errors in mixing up biochemical and cellular assays. The data quality is good enough to build large-scale fishing tools, where errors partially cancel each other out, but for detailed SAR analysis and for methods based on individual or very few data points, such as activity cliff or matched pair analysis, it is mandatory to go back to the original literature and ensure that the values are correctly annotated and comparable.

This work augments our previous work, in which we focused on the experimental uncertainty of heterogeneous public Ki data. As we have previously stated, it is likely that data quality will rise over time through continuous iterative improvement of the large databases such as ChEMBL and BindingDB. In a different branch of affinity databases, smaller high-quality affinity collections, potentially combined with other physicochemical data or structural knowledge, are being built up (see, for example, the CSARdock challenge [18], [19]). It will also be interesting to see what the reproducibility of such high-quality data is going to be.

It is surprising that we did not find in ChEMBL a single set of at least ten inhibitors for which IC50 values on the same target have been independently measured by different laboratories, nor a scientific contribution in the literature addressing the comparison of heterogeneous IC50 values. Due to the scarcity of details about the experimental assay setup in both original publications and current large activity databases, it is not possible to systematically analyze the reproducibility of IC50 data for the same assay or for various assay types under the same conditions. Using in-house data, we were able to estimate the intra-laboratory reproducibility of IC50 values for the same assay under the same conditions.

We hope that this article increases the awareness of the noise added by blindly mixing public IC50 values during the data selection process for SAR analysis and QSAR models, and of its impact in limiting the maximum achievable performance of these techniques.

Supporting Information

Figure S1.

Agreement of IC50 values for two dopamine transporter assays, measured in the same laboratory. Here the pairs of measurements agree quite well with an R2 of 0.70 and a mean error of 0.29. According to the assay description of the primary literature, the assay conditions have been the same. The same is true for the norepinephrine transporter assay (R2 = 0.73, MUE = 0.29).

https://doi.org/10.1371/journal.pone.0061007.s001

(DOCX)

Figure S2.

Agreement of IC50 values for two Rattus norvegicus dihydrofolate reductase assays, measured in the same laboratory. Although the assays have been run in the same lab on DHFR from the same species, the IC50 values of Rattus norvegicus DHFR agree only with R2 = 0.25 and MUE = 0.61.

https://doi.org/10.1371/journal.pone.0061007.s002

(DOCX)

Figure S3.

Median ΔpIC50, binned according to average activity and logP. The numbers indicate the number of entries per bin. We do not see a clear trend in this plot.

https://doi.org/10.1371/journal.pone.0061007.s003

(DOCX)

Table S1.

All series where more than ten compounds have been measured in two parallel assays.

https://doi.org/10.1371/journal.pone.0061007.s004

(DOCX)

Archive S1.

Python- and R-scripts to repeat the analysis.

https://doi.org/10.1371/journal.pone.0061007.s006

(GZ)

Author Contributions

Conceived and designed the experiments: TK CK AV PG. Performed the experiments: TK CK. Analyzed the data: TK CK AV PG. Contributed reagents/materials/analysis tools: TK CK. Wrote the paper: TK CK AV PG.

References

  1. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, et al. (2011) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40: D1100–D1107.
  2. Hu Y, Bajorath J (2012) Growth of Ligand–Target Interaction Data in ChEMBL Is Associated with Increasing and Activity Measurement-Dependent Compound Promiscuity. J Chem Inf Model 52: 2550–2558.
  3. Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL (2006) Global mapping of pharmacological space. Nat Biotechnol 24: 805–815.
  4. Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, et al. (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25: 197–206.
  5. Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, et al. (2007) Analysis of Pharmacology Data and the Prediction of Adverse Drug Reactions and Off-Target Effects from Chemical Structure. ChemMedChem 2: 861–873.
  6. Besnard J, Ruda GF, Setola V, Abecassis K, Rodriguiz RM, et al. (2012) Automated design of ligands to polypharmacological profiles. Nature 492: 215–220.
  7. Schürer SC, Muskal SM (2013) Kinome-wide Activity Modeling from Diverse Public High-Quality Data Sets. J Chem Inf Model 53: 27–38.
  8. Kramer C, Beck B, Kriegl JM, Clark T (2008) A Composite Model for hERG Blockade. ChemMedChem 3: 254–265.
  9. Kirchmair J, Williamson MJ, Tyzack JD, Tan L, Bond PJ, et al. (2012) Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J Chem Inf Model 52: 617–648.
  10. McCarren P, Bebernitz GR, Gedeck P, Glowienke S, Grondine MS, et al. (2011) Avoidance of the Ames test liability for aryl-amines via computation. Bioorg Med Chem 19: 3173–3182.
  11. Cheng Y, Prusoff WH (1973) Relationship between the inhibition constant (K1) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol 22: 3099–3108.
  12. Zdrazil B, Pinto M, Vasanthanathan P, Williams AJ, Balderud LZ, et al. (2012) Annotating Human P-Glycoprotein Bioassay Data. Mol Inform 31: 599–609.
  13. Kramer C, Kalliokoski T, Gedeck P, Vulpetti A (2012) The Experimental Uncertainty of Heterogeneous Public Ki Data. J Med Chem 55: 5165–5173.
  14. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007) BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35: D198–D201.
  15. R Core Team (2012) R: A Language and Environment for Statistical Computing. Vienna, Austria. Available: http://www.R-project.org.
  16. Sahoo PK, Behera P (2010) Synthesis and biological evaluation of [1,2,4]triazino[4,3-a]benzimidazole acetic acid derivatives as selective aldose reductase inhibitors. Eur J Med Chem 45: 909–914.
  17. Kramer C, Lewis R (2012) QSARs, data and error in the modern age of drug discovery. Curr Top Med Chem 12: 1896–1902.
  18. Dunbar JB, Smith RD, Yang C-Y, Ung PM-U, Lexa KW, et al. (2011) CSAR Benchmark Exercise of 2010: Selection of the Protein–Ligand Complexes. J Chem Inf Model 51: 2036–2046.
  19. Smith RD, Dunbar JB, Ung PM-U, Esposito EX, Yang C-Y, et al. (2011) CSAR Benchmark Exercise of 2010: Combined Evaluation Across All Submitted Scoring Functions. J Chem Inf Model 51: 2115–2131.