Comparability of Mixed IC50 Data – A Statistical Analysis

Abstract

The biochemical half maximal inhibitory concentration (IC50) is the most commonly used metric for on-target activity in lead optimization. It is used to guide lead optimization and to build large-scale chemogenomics analyses as well as off-target activity and toxicity models based on public data. However, the use of public biochemical IC50 data is problematic, because these values are assay specific and comparable only under certain conditions. For large-scale analyses it is not feasible to check each data entry manually, and it is very tempting to mix all available IC50 values from public databases even if assay information is not reported. As previously reported for the analysis of Ki data, we first analyzed the types of errors, the redundancy and the variability that can be found in the ChEMBL IC50 data. To assess the variability of IC50 data independently measured in two different laboratories, we searched ChEMBL for series of at least ten IC50 values measured on identical protein-ligand systems against the same target. Since a sufficient number of such cases is not available, the variability of IC50 data was instead assessed by comparing all pairs of independent IC50 measurements on identical protein-ligand systems. The standard deviation of IC50 data is only 25% larger than the standard deviation of Ki data, suggesting that mixing IC50 data from different assays, even without knowing the assay conditions in detail, only adds a moderate amount of noise to the overall data. As expected, the standard deviation of public ChEMBL IC50 data is larger than the standard deviation of in-house intra-laboratory/inter-day IC50 data. Augmenting mixed public IC50 data with public Ki data does not deteriorate the quality of the mixed IC50 data, if the Ki values are corrected by an offset. For a broad dataset such as ChEMBL, a Ki-to-IC50 conversion factor of 2 was found to be the most reasonable.

Introduction

Public collections of IC50 data (the half maximal inhibitory concentrations of ligands on their protein targets) represent a wealth of knowledge on bioactivity with growing importance. One of the major databases of public bioactivities for small molecules is ChEMBL, [1] which currently contains roughly three times more IC50 values than Ki values. It has been shown that the gap between the number of IC50 and Ki values is still increasing. [2] Proper usage of IC50 data facilitates the development of useful methods for drug discovery. Examples of such applications are the global mapping of pharmacological space by Paolini and co-workers, [3] the Similarity Ensemble Approach (SEA), [4] the Bayesian models for adverse drug reactions by Bender and coworkers, [5] the models used for polypharmacological optimization by Hopkins et al., [6] and the kinome-wide activity modeling studies by Schuerer and Muskal. [7] These methods can be used to predict off-target effects based on heterogeneous public activity data and chemical similarity analysis. Usually, public off-target toxicity models like human Ether-à-go-go-Related Gene (hERG) [8] and cytochrome P450 (CYP) models [9], [10] are based and validated on mixed public IC50 data, since there is not enough public data available that originates from one single assay.

In contrast to Ki values, IC50 data is assay specific. For the simplest typical case of competitive monosubstrate enzyme inhibition, Ki can be calculated from the IC50 according to the Cheng-Prusoff equation:

$$K_i = \frac{IC_{50}}{1 + \frac{[S]}{K_m}}$$

where $[S]$ is the substrate concentration and $K_m$ is the Michaelis-Menten constant of the substrate. [11] Under the same assay conditions, the measured IC50 values of the same inhibitor or of two different inhibitors (1 and 2 below) with the same mechanism of action can be compared, since the assay-dependent term cancels:

$$\frac{IC_{50,1}}{IC_{50,2}} = \frac{K_{i,1}\left(1 + \frac{[S]}{K_m}\right)}{K_{i,2}\left(1 + \frac{[S]}{K_m}\right)} = \frac{K_{i,1}}{K_{i,2}}$$
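As a minimal numerical illustration of this relationship (not part of the original analysis; the example concentrations are hypothetical):

```python
def ki_from_ic50(ic50, substrate_conc, km):
    """Cheng-Prusoff relation for competitive, monosubstrate inhibition:
    Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + substrate_conc / km)

# Hypothetical example: a 100 nM IC50 measured at [S] = Km gives Ki = 50 nM,
# i.e. the factor-of-2 conversion discussed later in this article.
print(ki_from_ic50(100e-9, substrate_conc=1e-6, km=1e-6))  # 5e-08
```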

The problem is that assay details are not reported in public bioactivity databases. Recently, Zdrazil et al. analyzed human P-glycoprotein bioassay data from the ChEMBL and TP-search databases. [12] They explored whether such data, determined in different assays, can be combined with each other. Their study indicates that for inhibitors of human P-glycoprotein this is possible under certain conditions: i.e., data coming from the same type of assay, the same cell lines, and the same fluorescent or radiolabeled substrates with overlapping binding sites. However, they point out that it is currently not possible to extract such data in an automated fashion from the current public databases. Effort invested in annotating assay details would make safe data integration easier and thus increase the usefulness of these huge, freely available data repositories.

In this manuscript we report an estimate of the error introduced by mixing public IC50 data from different laboratories, and of how this error can affect the ability to draw scientifically sound conclusions from such data. By using the same statistical technique that we previously introduced to determine the experimental uncertainty of heterogeneous public Ki data, [13] we analyze the variability of all pairs of biochemical IC50 measurements on the same protein-ligand system, independently of assay details.

In the following, we first describe our attempts to extract from ChEMBL a set of at least ten IC50 values that have been independently measured in two comparable assays. Since all identified sets of measurements turn out to be either not independent or otherwise faulty, we instead analyze the standard deviation of all truly independent pairs of IC50 values available from ChEMBL. Dubious entries and the filters used to spot and remove faulty entries are described in detail. For the remaining pairs of measurements, the original publications of protein-ligand systems showing various ranges of IC50 differences were inspected in order to gain an impression of which activity differences are due to database errors and which are due to variations in assay conditions. We then fitted a Gaussian distribution to the distribution of IC50 differences to estimate the standard deviation of valid pairs of independent IC50 measurements. By comparing the IC50 standard deviation to the equivalent Ki standard deviation, we can estimate the variability of heterogeneous IC50 data. The average difference between Ki and IC50 values and their correlation are assessed. Moreover, the effect of mixing Ki and IC50 values in order to enlarge the dataset was evaluated. Lastly, we analyze whether the variability of IC50 values depends on simple ligand properties such as molecular weight (MW) and the calculated octanol-water partition coefficient (logP).

Materials and Methods

Dataset Preparation

All measurements were extracted from the ChEMBL database version 14. It is currently the largest public database of bioactivities extracted from the literature. BindingDB [14] is similar in size, but has a significant overlap with ChEMBL, with most of its values being copied from ChEMBL.

The raw data was filtered in order to remove erroneous entries as described earlier. [13] Generally, all analyses presented here are based on multiple affinity measurements of the same protein-ligand system. The filtering steps were the following:

  1. Remove all data from reviews, since this is not original data.
  2. Remove all unclear measurements (i.e. Unit not M, mM, µM, nM, pM, fM; qualified values (“<” or “>”); extremely high (pActivity >15) or extremely low (pActivity <2) values).
  3. Remove younger entry for exactly the same value reported twice (younger paper cites older paper).
  4. Remove younger entry for very close values reported twice (difference in pActivity <0.02: younger paper cites older paper and rounds).
  5. Remove both entries if their difference is exactly 3, 6, or 9. These are citations with unit-conversion errors.
  6. Remove entries for which the authors could not be extracted from PubMed.
  7. Only keep pairs where the name overlap of the authors is zero to make sure that measurements are from different laboratories.

After each step, protein-ligand systems that had only one measurement entry left (singletons) were removed. All affinities were converted to their negative logarithm, pActivity (e.g. pIC50 or pKi), with the molar concentration (M) as the base unit (e.g. 1 µM is converted to a pActivity of 6).
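As an illustration only (a sketch, not the original scripts, which are provided in Archive S1), the unit filtering, the pActivity conversion and the singleton removal could be implemented in Python roughly as follows; the column names (system, value, unit, relation) are hypothetical:

```python
import numpy as np
import pandas as pd

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12, "fM": 1e-15}


def to_pactivity(value, unit):
    """pActivity = -log10(concentration in mol/L); e.g. (1, 'uM') -> 6.0."""
    return -np.log10(value * UNIT_TO_MOLAR[unit])


def basic_filters(df):
    """Apply step 2 of the filtering protocol and remove singletons.

    Hypothetical columns: 'system' (protein-ligand key), 'value', 'unit',
    'relation' ('=', '<', '>'). Steps 3-7 (duplicate citations, unit-shift
    errors, author overlap) would be applied per system on top of this.
    """
    df = df[df["unit"].isin(list(UNIT_TO_MOLAR)) & (df["relation"] == "=")].copy()
    df["pactivity"] = [to_pactivity(v, u) for v, u in zip(df["value"], df["unit"])]
    df = df[df["pactivity"].between(2, 15)]                # drop implausible values
    counts = df.groupby("system")["pactivity"].transform("size")
    return df[counts > 1]                                   # remove singletons


# Toy usage with made-up data: system "B" drops out as a singleton.
raw = pd.DataFrame({
    "system": ["A", "A", "B"],
    "value": [1.0, 2.0, 50.0],
    "unit": ["uM", "uM", "nM"],
    "relation": ["=", "=", "="],
})
print(basic_filters(raw))
```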

In ChEMBL a confidence score is available for each bioactivity entry. According to the ChEMBL homepage, a confidence score of nine is the highest, a confidence score of four or more indicates a biochemical measurement, and a confidence score below four indicates a cellular measurement. For the IC50 analysis, two sets of data were generated: Set1 contains all data with a confidence score of four or more, Set2 contains only data with the highest confidence score of nine. Since it turned out that there is no difference in variability between Set1 and Set2, here we only report results for Set1.

From the initially available 616,555 IC50 values with a confidence score greater than or equal to four, 10,895 IC50 values for 3,480 protein-ligand systems remained, yielding 20,356 pairs of independent measurements. Overall, the numbers of protein-ligand systems and of individual IC50 data points available for comparison were reduced by 94% and 93%, respectively. The filtering statistics are shown in Table 1.

Table 1. Filtering statistics for extracting independent pairs of IC50 measurements on identical systems.

https://doi.org/10.1371/journal.pone.0061007.t001

Metrics for Evaluating the Distribution of Errors

We analyze the distribution of the differences between two affinity measurements on the same protein-ligand system using the Standard Deviation (σ), the Mean Unsigned (Absolute) Error (MUE), the Median Unsigned Error (MedUE) and the squared Pearson's correlation coefficient (R2Pearson = R2). They are defined as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{pub,i,1}-y_{pub,i,2}\right)^{2}}$$

$$MUE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{pub,i,1}-y_{pub,i,2}\right|$$

$$MedUE = \operatorname{median}_{i}\left|y_{pub,i,1}-y_{pub,i,2}\right|$$

$$R^{2}_{Pearson} = R^{2} = \frac{\left[\sum_{i=1}^{n}\left(y_{pub,i,1}-\bar{y}_{pub}\right)\left(y_{pub,i,2}-\bar{y}_{pub}\right)\right]^{2}}{\sum_{i=1}^{n}\left(y_{pub,i,1}-\bar{y}_{pub}\right)^{2}\sum_{i=1}^{n}\left(y_{pub,i,2}-\bar{y}_{pub}\right)^{2}}$$

with n being the number of pairs of measurements considered, y_pub,i,1 and y_pub,i,2 being the two published values of pair i, and ȳ_pub being the average of all measured values. If more than two measurements are available for a given protein-ligand system, all possible pairs are generated. The order of y_pub,i,1 and y_pub,i,2 has to be scrambled in order not to bias the calculation of R2Pearson and σ. As we have shown earlier, [13] MUE, MedUE and σ calculated from pairs of measurements are overestimated by a factor of √2. Therefore MUE, MedUE and σ calculated from pairs of measurements were divided by √2.
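A minimal Python sketch of these metrics, including the scrambling of the pair order and the √2 correction, is shown below (an illustration with simulated data; the original analysis scripts are provided in Archive S1):

```python
import numpy as np


def pair_metrics(y1, y2, seed=0):
    """Return sigma, MUE, MedUE and Pearson R^2 for paired measurements y1[i], y2[i].

    The order within each pair is scrambled so that neither column is
    systematically 'first'; sigma, MUE and MedUE are divided by sqrt(2)
    because each pair contains two independently noisy measurements.
    """
    rng = np.random.default_rng(seed)
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    swap = rng.random(len(y1)) < 0.5
    a = np.where(swap, y2, y1)
    b = np.where(swap, y1, y2)

    diff = a - b
    sigma = np.sqrt(np.mean(diff ** 2)) / np.sqrt(2)
    mue = np.mean(np.abs(diff)) / np.sqrt(2)
    medue = np.median(np.abs(diff)) / np.sqrt(2)
    r2 = np.corrcoef(a, b)[0, 1] ** 2
    return sigma, mue, medue, r2


# Hypothetical usage: simulated pairs sharing a true pIC50 per system,
# each measured with 0.5 log units of noise.
rng = np.random.default_rng(1)
true = rng.uniform(4, 9, 1000)
m1 = true + rng.normal(0, 0.5, 1000)
m2 = true + rng.normal(0, 0.5, 1000)
print(pair_metrics(m1, m2))   # sigma and MUE should come out near 0.5 and 0.4
```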

Raw data was extracted from ChEMBL14 using MySQL statements. Filtering and pairing of measurements were done using Python 2.7. The statistical analysis was carried out using R version 2.15.1. [15] All R-, Python- and MySQL-scripts used including detailed instructions on how to repeat the work can be found in the Archive S1.

Results

In order to assess the comparability of IC50 values, we first extracted from the whole of ChEMBL all series of compounds that have been measured against the same protein target in two independent assays. There were twelve series of ten or more compounds whose activity on the same target had been measured in different assays. An overview of the different series is given in the Supporting Information (Table S1, Text S1–S2 and Figures S1–S2). However, eleven out of the twelve series had overlapping authors, and the single independently measured series was incorrectly annotated in the database.

Since it is not possible to find independently measured sets of at least ten IC50 values for the same target, the IC50 variability was determined differently. In the following, we analyze the IC50 data using an approach that we have previously introduced for analyzing the reproducibility of heterogeneous Ki data. All pairs of identical protein-ligand systems with independently measured IC50 values were extracted from ChEMBL and the variability of the differences between the pairs of measurements was calculated.

The distribution of pIC50 values is shown in Figure 1. The distribution of measured values is slightly skewed to the left with a maximum of roughly 30% of all pIC50 values reported between 7.0 and 8.0.

Figure 1. Distribution of the 9,465 pIC50 values for protein-ligand systems with independent multiple measurements.

https://doi.org/10.1371/journal.pone.0061007.g001

The distribution of ΔpIC50 values and the distribution of the number of independent measurements per protein-ligand system are shown in Figures 2 and 3. Roughly 70% of all ΔpIC50’s are smaller than one log unit.

Figure 2. Distribution of the 16,844 pairs of ΔpIC50 values for protein-ligand systems with independent multiple measurements.

The largest ΔpIC50 is 7.7 log units.

https://doi.org/10.1371/journal.pone.0061007.g002

Figure 3. Number of published independent values per protein-ligand system.

https://doi.org/10.1371/journal.pone.0061007.g003

Most systems with multiple independent measurements have two or three independent measurements. The most frequently measured system is celecoxib on cyclooxygenase-2 with 30 independently measured IC50 values.

Sets of ten pairs of measurements for seven ranges of ΔpIC50 were closely inspected. The selected ΔpIC50 ranges span the whole range of observed ΔpIC50 values (see Figure 2). The values of 3.2 and 1.1 were chosen to avoid pairs that could be combinations of citations of previous values and unit transcription errors. The findings are summarized in Table 2.

Table 2. Errors found for samples of pairs of measurements with specific differences in measured pIC50.

https://doi.org/10.1371/journal.pone.0061007.t002

We found that very high differences in pIC50 (ΔpIC50>2.5) were in most cases due to annotation errors. Some measurements had wrong units assigned (unit error). The receptor subtype was sometimes incorrectly assigned or not assigned at all (receptor subtype error). Other errors come from wrong stereoisomers of ligands (stereochemistry error), cellular assays assigned as biochemical assays (cellular assay error), incorrect target annotations (target error) and erroneous values extracted from original publications (value error).

Unit errors are the most common type of error. Receptor subtype errors occur most often for older publications (e.g., papers from the 1980s with published IC50 values for dopamine receptors, opioid receptors, and monoamine oxidases in general, i.e. without distinguishing the subtypes). This data is mixed with the subtype-specific data in ChEMBL. Stereochemistry errors occur when the stereochemistry is wrongly extracted from the original literature. Cellular assay errors occur when the reported IC50 values have been measured in a cellular assay, despite being associated with a confidence score of four or more (see the Dataset Preparation section).

Pairs with small ΔpIC50's can also be composed of erroneously reported IC50 data. For example, the group of pairs with ΔpIC50 = 0.05 contains one case where the IC50 extracted from the literature is incorrect: the original manuscript gives an activity range, whereas the ChEMBL database reports only one end of the range with an equals sign. Another, smaller set of problems comes from retracted original publications (for example, the original publication [16], publishing an IC50 value for the compound with ChEMBL ID CHEMBL266497 on aldose reductase (CHEMBL2622), was retracted). Considering the number of invalid pairs among the ten inspected for each of the seven ΔpIC50 ranges, there is a high probability that pairs with ΔpIC50≥2.5 contain errors in the database or in the original publication.

A plot of all pairs of pIC50 values is shown in Figure 4. The correlation coefficient for the raw extracted data is R2 = 0.40. After excluding a major part of the invalid pairs by removing all pairs with ΔpIC50≥2.5, the correlation coefficient rises to R2 = 0.53.

Figure 4. All Pairs of pIC50 values extracted from ChEMBL.

The two outer diagonal lines indicate the 2.5 log unit threshold, outside which the probability for finding faulty pairs of measurements is very high. The extreme disagreements are all due to clear errors.

https://doi.org/10.1371/journal.pone.0061007.g004

We also calculated the standard deviation σ of all ΔpIC50 and ΔpKi values between 0.05 (lower threshold) and a variable upper threshold (1.5, 2.0 and 2.5) by fitting the data to a Gaussian distribution. The lower threshold of 0.05 was selected to remove pairs which were just rounded duplicates. The standard deviations obtained for the ΔpIC50 and ΔpKi distributions are shown in Table 3. The fitted Gaussian and the raw distributions for ΔpIC50’s and ΔpKi’s with an upper threshold of 2.0 are shown in Figure 5.
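A minimal Python sketch of this fitting step, assuming the ΔpActivity values are supplied as absolute differences and that SciPy is available (the published analysis used the R scripts in Archive S1):

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, sigma):
    """Zero-centred Gaussian used to model the inner part of the Delta distribution."""
    return amplitude * np.exp(-x ** 2 / (2.0 * sigma ** 2))


def fit_inner_sigma(abs_deltas, lower=0.05, upper=2.0, bins=20):
    """Fit a zero-centred Gaussian shape to the histogram of |Delta pActivity|
    values between `lower` and `upper` and return the fitted sigma."""
    abs_deltas = np.asarray(abs_deltas, dtype=float)
    sel = abs_deltas[(abs_deltas > lower) & (abs_deltas < upper)]
    counts, edges = np.histogram(sel, bins=bins, range=(lower, upper))
    centers = 0.5 * (edges[:-1] + edges[1:])
    (amp, sigma), _ = curve_fit(gaussian, centers, counts, p0=(counts.max(), 1.0))
    return abs(sigma)


# Hypothetical usage: synthetic |Delta| values drawn from a Gaussian with sigma = 0.9
rng = np.random.default_rng(0)
print(fit_inner_sigma(np.abs(rng.normal(0.0, 0.9, 20000))))  # close to 0.9
```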

Figure 5. Fitted Gaussian distribution of ΔpIC50 (red) and ΔpKi (black).

The Gaussian distributions shown were fitted to all ΔpActivity values with an upper threshold ΔpActivity = 2.0. Standard deviations for the fitted Gaussian distributions are σpIC50 = 0.87 and σpKi = 0.69. Note that since the σ here is calculated from pairs of measurements each containing experimental uncertainty and other sources of variability, it has to be divided by √2 in order to obtain the true σ of the individual measurements [13].

https://doi.org/10.1371/journal.pone.0061007.g005

Table 3. Standard deviation of a Gaussian distribution fitted to the inner part of the distribution of ΔpIC50 and ΔpKi.

https://doi.org/10.1371/journal.pone.0061007.t003

The standard deviation of the ΔpIC50 data is consistently 21–26% larger than the standard deviation of the ΔpKi data. After dividing by √2, the σ of the Gaussian distribution fitted to all ΔpKi values <2.5 becomes 0.47 (slightly lower than the σ of 0.54 previously calculated for heterogeneous pKi data from ChEMBL version 12 without an upper threshold for ΔpKi [13]). Since σ, MUE, and MedUE are proportional to each other in Gaussian distributions, we can estimate σ, MUE and MedUE for the IC50 data to be 21–26% larger than the same metrics for pKi data, yielding σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43 (when using a factor of +25% for converting pKi data to pIC50 data).
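As a rough numeric illustration of these proportionality relations (for a zero-mean Gaussian, MUE = σ·√(2/π) and MedUE ≈ 0.674·σ; the published MUE and MedUE were obtained by scaling the corresponding pKi metrics by +25% and therefore differ slightly from this idealized check):

```python
import math

sigma_pIC50 = 1.25 * 0.54             # published sigma_pKi scaled by +25%  ->  ~0.68

mue_gauss = sigma_pIC50 * math.sqrt(2.0 / math.pi)   # Gaussian relation for MUE
medue_gauss = sigma_pIC50 * 0.6745                   # Gaussian relation for MedUE

print(round(sigma_pIC50, 2), round(mue_gauss, 2), round(medue_gauss, 2))  # 0.68 0.54 0.46
```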

In order to test the alternative approach of directly obtaining quality metrics from the data, we calculated the quality metrics from the ΔpIC50 data with an upper threshold of ΔpIC50 = 2.5. Here, σpIC50 = 0.68, MUEpIC50 = 0.54 and MedUE pIC50 = 0.43 are obtained. These values are very similar to the values obtained from comparing fitted Gaussian distributions and indicate that the erroneous pairs of measurements do not have a large effect on the overall result.

Similar results were obtained when considering only IC50 data with a ChEMBL confidence score of nine (data not shown). As ChEMBL contains data from both human input and automatic extraction processes, we also checked whether there was a difference between the two. As with the confidence score filtering, the results were similar for both data types.

We checked whether the ΔpIC50 depends on the overall activity measured or on physicochemical ligand properties like logP, logD, molecular weight (MW), polar surface area (PSA), the number of hydrogen bond acceptors (HBA), the number of hydrogen bond donors (HBD) or the number of rotatable bonds. Boxplots of all these properties versus the ΔpIC50 are shown in Figure 6. The ΔpIC50's depend neither on the average measured pIC50 nor on any of the ligand properties examined.

Figure 6. ΔpIC50 versus average pIC50 measured, logP, logD, polar surface area, molecular weight, number of hydrogen bond acceptors, number of hydrogen bond donors and number of rotatable bonds.

The numbers above the boxplots indicate the number of ΔpIC50 values falling into each bin. Some boxplots are truncated at the very low and high ends because the low number of samples per bin makes the boxplot uninformative.

https://doi.org/10.1371/journal.pone.0061007.g006

We also examined whether the ΔpIC50 depends on the combination of average activity and logP, since one might expect large deviations in measured pIC50’s for compounds with low activity and high logP due to solubility issues. Here we also did not find a clear trend (Figure S3).

Can ChEMBL Ki and IC50 Data be Mixed?

Empirical statistical models and SAR interpretations improve with the amount of data. Above, we have shown that the variability of heterogeneous IC50 data is roughly 25% worse than that of Ki data. Therefore it is not advisable to add IC50 data to Ki data, as this would lower the quality of the data. However, since there is much more IC50 data than Ki data available, it is interesting to see what happens when the IC50 dataset is augmented with additional Ki data. Figure 7 shows the distribution of pKi and pIC50 data extracted from ChEMBL with the filters listed in Table 1. Overall, pIC50 and pKi data show a similar distribution, with the pKi data slightly shifted towards higher values.

Figure 7. Distribution of published pIC50 (dark grey) and pKi (light grey) values for protein-ligand systems with multiple independent measurements.

https://doi.org/10.1371/journal.pone.0061007.g007

For identical protein-ligand systems, we extracted all pairs of pKi and pIC50 data that have passed the filters individually. This yields 11,556 pairs of measurements on 670 protein-ligand systems. A plot of measured pIC50 versus pKi is shown in Figure 8.

Figure 8. Measured pKi versus measured pIC50 for identical protein-ligand systems.

https://doi.org/10.1371/journal.pone.0061007.g008

Based on the Cheng-Prusoff equation and under the assumption of a competitive mechanism of action, pKi values are larger than or equal to pIC50 values. However, due to unknown mechanisms of action, experimental uncertainty and some database annotation errors, there is a significant number of pairs where the pIC50 is larger than the pKi. On average, the measured pKi values are 0.355 log units larger than the measured pIC50 values, corresponding to a factor of 2.3. A factor of 2 agrees with balanced assay conditions in which the substrate concentration is equal to the Km value; such conditions are often used to allow the detection of inhibitors with different mechanisms of action.
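A brief numeric check of these two factors (an illustration, not part of the original analysis):

```python
# Cheng-Prusoff under balanced assay conditions ([S] = Km):
# IC50 = Ki * (1 + [S]/Km) = 2 * Ki, i.e. a conversion factor of exactly 2.
cheng_prusoff_factor = 1.0 + 1.0

# Average offset observed in ChEMBL: pKi - pIC50 = 0.355 log units.
observed_factor = 10 ** 0.355          # ~2.26, i.e. roughly a factor of 2.3

print(cheng_prusoff_factor, round(observed_factor, 2))   # 2.0 2.26
```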

After subtracting 0.35 log units from the pKi values and correcting by √2, pKi and pIC50 values agree with an R2 = 0.46, σ = 0.68, MUE = 0.54 and MedUE = 0.43. The standard deviations of Gaussian distributions fitted to the inner part with an upper threshold of 1.5, 2.0 and 2.5 ΔpActivity units are 0.79, 0.83, and 0.85.

Overall, this is close to, or even slightly better than, the agreement obtained for pIC50 values with themselves. Therefore we can conclude that pKi values can be used to augment pIC50 values without any loss of quality, if they are corrected by an offset. In the absence of assay information, the best guess for the Ki-to-IC50 conversion factor is the average offset calculated from the heterogeneous ChEMBL data, i.e. a factor of 2.3, corresponding to 0.35 pActivity units.
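In practice, the augmentation simply amounts to shifting the pKi values by this offset before pooling them with the pIC50 data; a minimal pandas sketch with hypothetical toy tables:

```python
import pandas as pd

KI_TO_IC50_OFFSET = 0.35   # average pKi - pIC50 offset found in ChEMBL

# Hypothetical tables holding pActivity values for some protein-ligand systems.
df_ic50 = pd.DataFrame({"system": ["A", "B"], "pactivity": [6.2, 7.8]})
df_ki = pd.DataFrame({"system": ["A", "C"], "pactivity": [6.9, 8.4]})

# Shift pKi down by the offset so it lives on the (pseudo-)pIC50 scale, then pool.
df_ki_as_ic50 = df_ki.assign(pactivity=df_ki["pactivity"] - KI_TO_IC50_OFFSET)
augmented = pd.concat([df_ic50, df_ki_as_ic50], ignore_index=True)
print(augmented)
```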

Discussion

In this contribution we show how the comparability of IC50 data can be analyzed using the public ChEMBL database. We find that, when comparing all independently measured pIC50 data, the variability found for pIC50 data is approximately 25% larger than the variability found for pKi data, with σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43. These values correspond to the most probable variability of pIC50 data mixed from different (unknown) assays.

We want to stress that pIC50 data from different assays can only be compared under certain conditions. However, as discussed in the Introduction, such comparisons are nevertheless common in large-scale data analysis. A standard deviation of 0.68 corresponds to a factor of 4.8, meaning that 68.2% of all IC50 measurements agree within a factor of 4.8, even when measured in different laboratories under potentially different assay conditions. One reason why the variability of IC50 data is only moderately higher than the variability of Ki data might be that, in practice, most of the IC50 assays may have been run using very similar assay protocols. Unfortunately, the assay descriptions available within ChEMBL are too terse to permit analyzing this any further.

IC50 values measured in the same laboratory usually show a better reproducibility. From our in-house database, we extracted series of reference pIC50 values measured for assay standards. The plots in Figure 9 show the pIC50 values measured for rolipram on PDE4D and cilostamide on PDE3. The standard deviations of the pIC50 values are σ = 0.22 for rolipram/PDE4D and σ = 0.17 for cilostamide/PDE3.

Figure 9. Variation of measured pIC50 values over time for rolipram/PDE4D and cilostamide/PDE3.

https://doi.org/10.1371/journal.pone.0061007.g009

There is some variation over time, which could indicate changes in the assay conditions and solution handling. We also tried to find public series of at least ten compounds that have been measured in independent parallel assays. However, no such series exists within ChEMBL, as all the series we found were either measured in the same laboratory or had a mistakenly annotated target protein.

For extracting the pairs of IC50 data that are indeed independently measured on the same protein-ligand system, we applied a set of filters that we had previously used to filter and analyze Ki data. Here, the filters removed more than 90% of the IC50 data erroneously assumed to be independent measurements on the same protein-ligand system. When inspecting the remaining 20,356 pairs of measurements from 3,480 protein-ligand systems, we found that there is still a number of invalid pairs, especially but not limited to the pairs with larger ΔpIC50. The main errors we found were unit transcription errors, wrong annotation of the receptor subtype, and annotation of cellular assays as biochemical assays. More rarely occurring errors were wrongly assigned stereochemistry, values and protein targets. These errors cannot be detected automatically and have to be manually curated out of the database over time [17].

In contrast to our previous study of Ki values, we observed a larger number of invalid pairs even for ΔpIC50 values smaller than approximately 2.5. To reduce the impact of these hard-to-find cases, we applied a different strategy to determine the variability of the true pairs. By fitting a Gaussian distribution to the central part of the distribution, we were able to compare the variability of the pIC50 data to the variability of the pKi data. We found that the ratio between pKi and pIC50 variability is relatively stable at 21–26% when varying the upper threshold for fitting the Gaussian distribution between 1.5 and 2.5 ΔpActivity units. Using this approach, we were able to estimate the variability of the IC50 data from the variability of the Ki data.

ChEMBL has a confidence score assigned to each activity value. The confidence score indicates how much the ChEMBL authors trust the reported value. Confidence scores below four indicate that the assay was a cellular assay, whereas confidence scores between four and nine indicate biochemical assays. In this study, we used all values that had a confidence score of at least four. We also exclusively used the most confident data with a confidence score of nine, but the results did not change. We also examined whether there is a difference between data annotated as “autocurated” and data annotated as “expert” data. In this experiment, we also did not find any significant difference. The availability of assay descriptions within ChEMBL would have allowed us to analyze whether specific assay types are statistically more comparable than others, or whether the variability of pIC50 is lower within comparable assays. However, such information is not easily added to the database, because this would require detailed assay ontologies, and assay details are often missing from the original literature as well.

One might assume that higher IC50 values show a larger variability than, for example, single-digit µM IC50 values because of solubility limits. However, our analysis shows that on average this is clearly not the case. Moreover, the variability does not depend on any specific ligand properties such as logP, MW, PSA etc.

While the quality of pure Ki datasets would be reduced by adding IC50 data, we have shown that augmenting IC50 datasets with Ki data does not deteriorate the quality, if the Ki data is corrected by an offset. We found that pKi values reported in ChEMBL are on average 0.35 log units higher than pIC50 values, which corresponds to a factor of 2.3. The IC50-to-Ki conversion factor is exactly 2.0 in competitive monosubstrate IC50 inhibition assays, if the substrate concentration is set equal to its Km value. This factor is close to the average difference between pKi and pIC50 values in ChEMBL, and therefore, in the absence of any further assay-specific knowledge, a factor of 2.0 is the most probable conversion factor to convert Ki values into IC50 values.

Summary and Conclusions

In this contribution, we present an analysis of the comparability of public heterogeneous IC50 data. We find that the agreement of independently measured biochemical IC50 values is only 23–30% worse than the agreement of pKi data, irrespective of the assay conditions and assay types used. For heterogeneous biochemical pIC50 data, we find a variability with σpIC50 = 0.68, MUEpIC50 = 0.55 and MedUEpIC50 = 0.43. Although IC50 values obtained under different assay conditions should theoretically not be comparable, comparing them is common practice in analyzing large-scale off-target and toxicity datasets. Our analysis quantitatively assesses the consequences of doing so. We believe that this knowledge is important for everybody who decides to work with IC50 data from various heterogeneous sources. We also show that Ki data can be used to augment IC50 datasets without any loss of quality if corrected by a factor of 2, which is the conversion factor most frequently found by comparing IC50/Ki values in ChEMBL for the same protein-ligand systems.

Nevertheless, public IC50 data extracted from ChEMBL14 is quite error prone. The most common errors we found are unit conversion errors, receptor subtype errors and errors in mixing up biochemical and cellular assays. The data quality is good enough to build large-scale fishing tools, where errors partially cancel each other out, but for detailed SAR analysis and for methods based on individual or very few data points, such as activity cliff or matched pair analysis, it is mandatory to go back to the original literature and ensure that the values are correctly annotated and comparable.

This work augments our previous work, in which we focused on the experimental uncertainty of heterogeneous public Ki data. As we have previously stated, it is likely that data quality will rise over time through continuous iterative improvement of the large databases such as ChEMBL and BindingDB. In a different branch of affinity databases, smaller high-quality affinity collections, potentially combined with other physicochemical data or structural knowledge, are being built up (see, for example, the CSARdock challenge [18], [19]). It will also be interesting to see what the reproducibility of such high-quality data is going to be.

It is surprising that we did not find in ChEMBL a single set of at least ten inhibitors for which IC50 values on the same target have been independently measured by different laboratories, nor a scientific contribution in the literature addressing the comparison of heterogeneous IC50 values. Due to the scarcity of details about the experimental assay setup in both original publications and current large activity databases, it is not possible to systematically analyze the reproducibility of IC50 data for the same assay or for various assay types under the same conditions. Using in-house data, we were able to estimate the intra-laboratory reproducibility of IC50 values for the same assay under the same conditions.

We hope that this article increases the awareness of the noise added by blindly mixing public IC50 values during the data selection process for SAR analysis and QSAR models, and of its impact in limiting the maximum achievable performance of these techniques.

Supporting Information

Figure S1.

Agreement of IC50 values for two dopamine transporter assays, measured in the same laboratory. Here the pairs of measurements agree quite well with an R2 of 0.70 and a mean error of 0.29. According to the assay description of the primary literature, the assay conditions have been the same. The same is true for the norepinephrine transporter assay (R2 = 0.73, MUE = 0.29).

https://doi.org/10.1371/journal.pone.0061007.s001

(DOCX)

Figure S2.

Agreement of IC50 values for two Rattus norvegicus dihydrofolate reductase assays, measured in the same laboratory. Although the assays have been run in the same lab on DHFR from the same species, the IC50 values of Rattus norvegicus DHFR agree only with R2 = 0.25 and MUE = 0.61.

https://doi.org/10.1371/journal.pone.0061007.s002

(DOCX)

Figure S3.

Median ΔpIC50, binned according to average activity and logP. The numbers indicate the number of entries per bin. We do not see a clear trend in this plot.

https://doi.org/10.1371/journal.pone.0061007.s003

(DOCX)

Table S1.

All series where more than ten compounds have been measured in two parallel assays.

https://doi.org/10.1371/journal.pone.0061007.s004

(DOCX)

Archive S1.

Python- and R-scripts to repeat the analysis.

https://doi.org/10.1371/journal.pone.0061007.s006

(GZ)

Author Contributions

Conceived and designed the experiments: TK CK AV PG. Performed the experiments: TK CK. Analyzed the data: TK CK AV PG. Contributed reagents/materials/analysis tools: TK CK. Wrote the paper: TK CK AV PG.

References

  1. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, et al. (2011) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40: D1100–D1107.
  2. Hu Y, Bajorath J (2012) Growth of Ligand–Target Interaction Data in ChEMBL Is Associated with Increasing and Activity Measurement-Dependent Compound Promiscuity. J Chem Inf Model 52: 2550–2558.
  3. Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL (2006) Global mapping of pharmacological space. Nat Biotechnol 24: 805–815.
  4. Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, et al. (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25: 197–206.
  5. Bender A, Scheiber J, Glick M, Davies JW, Azzaoui K, et al. (2007) Analysis of Pharmacology Data and the Prediction of Adverse Drug Reactions and Off-Target Effects from Chemical Structure. ChemMedChem 2: 861–873.
  6. Besnard J, Ruda GF, Setola V, Abecassis K, Rodriguiz RM, et al. (2012) Automated design of ligands to polypharmacological profiles. Nature 492: 215–220.
  7. Schürer SC, Muskal SM (2013) Kinome-wide Activity Modeling from Diverse Public High-Quality Data Sets. J Chem Inf Model 53: 27–38.
  8. Kramer C, Beck B, Kriegl JM, Clark T (2008) A Composite Model for hERG Blockade. ChemMedChem 3: 254–265.
  9. Kirchmair J, Williamson MJ, Tyzack JD, Tan L, Bond PJ, et al. (2012) Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J Chem Inf Model 52: 617–648.
  10. McCarren P, Bebernitz GR, Gedeck P, Glowienke S, Grondine MS, et al. (2011) Avoidance of the Ames test liability for aryl-amines via computation. Bioorg Med Chem 19: 3173–3182.
  11. Cheng Y, Prusoff WH (1973) Relationship between the inhibition constant (K1) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol 22: 3099–3108.
  12. Zdrazil B, Pinto M, Vasanthanathan P, Williams AJ, Balderud LZ, et al. (2012) Annotating Human P-Glycoprotein Bioassay Data. Mol Inform 31: 599–609.
  13. Kramer C, Kalliokoski T, Gedeck P, Vulpetti A (2012) The Experimental Uncertainty of Heterogeneous Public Ki Data. J Med Chem 55: 5165–5173.
  14. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK (2007) BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35: D198–D201.
  15. R Core Team (2012) R: A Language and Environment for Statistical Computing. Vienna, Austria. Available: http://www.R-project.org.
  16. Sahoo PK, Behera P (2010) Synthesis and biological evaluation of [1,2,4]triazino[4,3-a]benzimidazole acetic acid derivatives as selective aldose reductase inhibitors. Eur J Med Chem 45: 909–914.
  17. Kramer C, Lewis R (2012) QSARs, data and error in the modern age of drug discovery. Curr Top Med Chem 12: 1896–1902.
  18. Dunbar JB, Smith RD, Yang C-Y, Ung PM-U, Lexa KW, et al. (2011) CSAR Benchmark Exercise of 2010: Selection of the Protein–Ligand Complexes. J Chem Inf Model 51: 2036–2046.
  19. Smith RD, Dunbar JB, Ung PM-U, Esposito EX, Yang C-Y, et al. (2011) CSAR Benchmark Exercise of 2010: Combined Evaluation Across All Submitted Scoring Functions. J Chem Inf Model 51: 2115–2131.