
A robust Pearson correlation test for a general point null using a surrogate bootstrap distribution

Abstract

In this note we present a robust bootstrap test with good Type I error control for testing the general hypothesis H0: ρ = ρ0. In order to carry out this test we use what is termed a surrogate bootstrap distribution. The test was inspired by the studentized permutation test for testing H0: ρ = 0, which was proven to be exact in certain scenarios and asymptotically correct overall. We show that the bootstrap-based test is robust to a variety of distributional scenarios in terms of proper Type I error control.

Introduction

The term “correlation” was introduced by Galton as a synonym for regression and the term “coefficient of correlation” was first used by Edgeworth, who also derived the first sample estimator of linear association [1], [2]. Estimation and testing based on the correlation coefficient is one of the most used approaches towards examining the association between two continuous variables. Even though Edgeworth first derived the estimator of the sample correlation coefficient, it was Karl Pearson who is credited with much of its development via his 1896 paper [3], which is why we often refer to the “Pearson Correlation Coefficient” as given by the form (1) ρ = μXY/(σXσY), where −1 < ρ < 1, μXY is the covariance between two random variables X and Y, σX is the standard deviation of X and σY is the standard deviation of Y.

Throughout this note let (X1, Y1), (X2, Y2), ⋯, (Xn, Yn) be n paired observations from a non-degenerate joint distribution FXY(x, y). The well-known sample estimator for ρ at (1) is given as (2) r = Σi(Xi − X̄)(Yi − Ȳ)/√(Σi(Xi − X̄)² Σi(Yi − Ȳ)²), where X̄ = n⁻¹ΣiXi and Ȳ = n⁻¹ΣiYi. The estimator r is also well known to be the maximum likelihood estimator for the bivariate normal correlation parameter.
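The estimator at (2) is straightforward to compute directly; the following minimal sketch (the helper name `pearson_r` is ours) implements the formula and cross-checks it against NumPy's built-in `corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation coefficient as in Eq (2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # centered observations
    return (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
r = pearson_r(x, y)
# agrees with NumPy's implementation
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```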

The sampling distribution for r at (2) under bivariate normality was derived by R. A. Fisher [4]. For the case when ρ = 0, “Student” [5] surmised that t = r√(n − 2)/√(1 − r²) has a t-distribution with n − 2 degrees-of-freedom. The t-distribution based test, assuming bivariate normality, for testing the hypothesis H0: ρ = 0 versus H1: ρ ≠ 0 is generally what is found in most statistical software packages. For the more general bivariate normal case when ρ ≠ 0 the exact distribution for r at (2) is a bit more unwieldy [1]. Remarkably, Fisher discovered a large sample asymptotic normal variance stabilized approximation based on what is now known as Fisher’s z-transformation, which is given as (3) z = (1/2)log((1 + r)/(1 − r)). This transformed variable tends to normality much faster than the classic distribution free based asymptotic normal approximation. The first order variance of z at (3) is 1/(n − 3). Various refinements of Fisher’s work have been carried out over the years in terms of both bias reduction and refined variance expressions [1]. For the more general test of H0: ρ = ρ0 (including H0: ρ = 0 versus one-sided alternatives) Fisher’s z-transformation approach, again under bivariate normality assumptions, is what most current statistical software packages employ.
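A sketch of the Fisher z-based test of H0: ρ = ρ0 versus H1: ρ > ρ0 described above (the helper name `fisher_z_test` is ours; SciPy is assumed available for the normal CDF):

```python
import numpy as np
from scipy.stats import norm

def fisher_z_test(x, y, rho0=0.0):
    """One-sided test of H0: rho = rho0 vs H1: rho > rho0 via Fisher's z.
    Relies on the bivariate normality assumption; the first order
    variance of z is 1/(n - 3)."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    # arctanh(r) = (1/2) log((1 + r) / (1 - r)), i.e. Eq (3)
    stat = (np.arctanh(r) - np.arctanh(rho0)) * np.sqrt(n - 3)
    return r, 1.0 - norm.cdf(stat)
```

For strongly correlated data the one-sided p-value is essentially zero, while for independent data it is roughly uniform on (0, 1).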

Alternatively, the “distribution free” large sample approximation for the sampling distribution of r may be derived using the multivariate delta method as outlined in Serfling [6], page 126, by first noting that we can express the sample estimator as r = g(V̄), where (4) V̄ = n⁻¹Σi(Xi, Yi, X²i, Y²i, XiYi)′, such that as per Serfling [6] we have (5) √n(r − ρ) →d N(0, dΣd′). For expression (5), Σ is the 5 × 5 variance-covariance matrix for the vector of the components of the summands of V̄ given by (X, Y, X², Y², XY)′. The elements of d, the gradient of g evaluated at the population moments, will play a key role in our application and are given at (6). More refined approximations for the sampling distribution of r via Edgeworth techniques exist for non-normal populations [7], [8].

There have been several investigations examining the robustness of the bivariate normality assumptions as they pertain to Type I error control for testing H0: ρ = ρ0 using Fisher’s z-transformation, or for the more specific case of H0: ρ = 0 using Student’s t distribution. One of the earliest of these investigations is due to Pearson [9], who concluded that “the results suggest that the normal bivariate surface can be mutilated and distorted to a remarkable degree without affecting the frequency distribution of r”, which has since been disproved [7]. Havlicek and Peterson [10] also incorrectly concluded “that the Pearson r is insensitive to extreme violations of the basic assumptions of normality.” There have been a substantial number of simulation based studies, primarily in the psychology literature, that note when tests about the correlation coefficient that incorrectly assume bivariate normality tend to fail [11]–[15].

We will repeat the simulation study of DiCiccio and Romano [16] and re-examine the Type I error control for testing H0: ρ = ρ0 based on Fisher’s z-transformation, the classic large sample approximation and our new bootstrap approach. To the best of our knowledge there does not seem to be an extensive simulation study of the straightforward asymptotic approximation at (5) obtained by using the corresponding moment estimators in place of their population counterparts and setting ρ = ρ0 under the null. DiCiccio and Romano [16] do include results for the specific case of testing H0: ρ = 0.

As an alternative to either assuming bivariate normality or using a large sample asymptotic approximation, one may consider a permutation test approach for the specific test H0: ρ = 0 [17]. The key sticking point that is often overlooked by practitioners is that the permutation test is only exact in terms of Type I error control when using the metric ρ to test H0: FXY = FXFY. Only under specific distributional assumptions, e.g. bivariate normality or certain families of elliptical distributions, is the permutation test exact for testing H0: ρ = 0. Heuristic investigations of permutation testing about the correlation coefficient go back several decades [18]. Even recently, Tuǧran et al. [19] make the same mistakes of the past when drawing conclusions regarding the permutation test’s applicability under normality versus non-normality.

It was only very recently that DiCiccio and Romano [16] provided a careful theoretical treatment of the subject, summarizing the methodologies and providing guidance for the most appropriate permutation approach for testing H0: ρ = 0. In their work, DiCiccio and Romano [16] provide a novel method of obtaining a permutation test that is exact when testing H0: ρ = 0 is equivalent to testing H0: FXY = FX FY and asymptotically controls the Type I error when testing H0: ρ = 0 is not equivalent to testing H0: FXY = FX FY. The key to their approach is standardizing the test statistic using the large sample variance at (5) under H0: ρ = 0 such that as n → ∞ the quantiles of the studentized test statistic for the permutation distribution and the true sampling distribution “converge almost surely to the corresponding quantiles of the standard normal distribution.” In fact, the specific form of the standardization they utilize for testing H0: ρ = 0 can be traced back to Tschuprow [20].
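A minimal sketch of a studentized permutation test for H0: ρ = 0 in the spirit of DiCiccio and Romano [16] (our own illustrative implementation, not their exact code): the statistic √n r/τ̂ uses the plug-in variance τ̂² = m22/(m20 m02) computed from centered sample moments, and the observed statistic is compared against its permutation distribution.

```python
import numpy as np

def studentized_stat(x, y):
    """sqrt(n) * r / tau-hat, with tau-hat^2 = m22 / (m20 * m02)."""
    xc, yc = x - x.mean(), y - y.mean()
    m20 = (xc**2).mean()
    m02 = (yc**2).mean()
    m11 = (xc * yc).mean()
    m22 = (xc**2 * yc**2).mean()
    r = m11 / np.sqrt(m20 * m02)
    tau2 = m22 / (m20 * m02)
    return np.sqrt(len(x)) * r / np.sqrt(tau2)

def permutation_test(x, y, B=1000, rng=None):
    """One-sided p-value for H0: rho = 0 vs H1: rho > 0 by permuting y."""
    rng = np.random.default_rng(rng)
    t_obs = studentized_stat(x, y)
    perm = [studentized_stat(x, rng.permutation(y)) for _ in range(B)]
    # add-one correction keeps the p-value strictly positive
    return (1 + sum(t >= t_obs for t in perm)) / (B + 1)
```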

As an overall strategy for testing H0: ρ = 0 the permutation testing approach that appears in DiCiccio and Romano [16] appears to be the most robust strategy to date in terms of controlling Type I error rates across a variety of distributional assumptions. In particular it is clear that the bivariate normality assumption relied upon by many statistical software packages when testing H0: ρ = 0 is not particularly robust. As noted by R. C. Geary [21]: “Normality is a myth; there never was, and never will be, a normal distribution.” One might extend this quote even more strongly to bivariate normality and its existence in real-life data analysis problems involving tests about the correlation coefficient.

In this note we build upon the work of DiCiccio and Romano [16] by introducing a bootstrap-like resampling approach for testing the general null hypothesis H0: ρ = ρ0, which has virtually the same properties as the permutation approach for testing H0: ρ = 0. The general difference between our approach and the approach of DiCiccio and Romano [16] is that the permutation testing approach is essentially a sampling without replacement method under the null hypothesis, while our approach samples with replacement under the null hypothesis. This subtle difference will allow us to develop a more general testing approach for H0: ρ = ρ0. Both approaches rely on a properly standardized test statistic based on (5) under H0 such that the Type I error is controlled exactly for certain scenarios and asymptotically in general. The test can also be inverted to provide a confidence interval for ρ as an alternative to the more standard bootstrap confidence interval methods [22].

Materials and methods

The main thrust of the bootstrap correlation test of H0: ρ = ρ0 is to generalize the approach of DiCiccio and Romano [16] by noting that we can approximate the distribution function of the sample correlation estimator given FXY|H0 using a surrogate distribution function. Bootstrap samples will be drawn from the surrogate distribution function FST|H0 ≈ FXY|H0, which is described in detail below.

In terms of the preliminaries let (X1, Y1), (X2, Y2), ⋯, (Xn, Yn) be an i.i.d. bivariate sample from an absolutely continuous distribution FXY with marginal distributions denoted FX and FY. In terms of our application let the 2 × 2 positive definite correlation matrix for the standardized variables be given as (7) Γ = [1 ρ; ρ 1]. Next, represent the Cholesky decomposition of the 2 × 2 matrix Γ as (8) Γ = AA′, where A = [1 0; ρ √(1 − ρ²)] is lower triangular, such that A⁻¹ is defined. The Cholesky decomposition is a key component of the bootstrap test that we propose below.

Denote the n × 2 matrix of standardized observations as (U⋮V), where (9) Ui = (Xi − X̄)/sX and (10) Vi = (Yi − Ȳ)/sY, respectively, with X̄, Ȳ, sX and sY the usual sample means and standard deviations. Apply the transformation (11) (S⋮T) = (U⋮V)A′, where A is the decomposed matrix at (8), more specifically stated as (12) A = [1 0; ρ √(1 − ρ²)], such that the transformed observations are given as Si = Ui and Ti = ρUi + √(1 − ρ²)Vi, i = 1, 2, ⋯, n.
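The Cholesky-based transformation can be verified numerically. The sketch below (illustrative only, using standard normal pairs) confirms that the lower triangular factor of the 2 × 2 correlation matrix induces the target correlation in initially independent standardized variables:

```python
import numpy as np

rho0 = 0.6
# Cholesky factor of the 2x2 correlation matrix Gamma = [[1, rho0], [rho0, 1]]
A = np.linalg.cholesky(np.array([[1.0, rho0], [rho0, 1.0]]))
# the factor has the closed form [[1, 0], [rho0, sqrt(1 - rho0^2)]]
assert np.allclose(A, [[1.0, 0.0], [rho0, np.sqrt(1 - rho0**2)]])

rng = np.random.default_rng(1)
U = rng.normal(size=100_000)   # independent, standardized
V = rng.normal(size=100_000)
# S_i = U_i and T_i = rho0 * U_i + sqrt(1 - rho0^2) * V_i
S = U
T = rho0 * U + np.sqrt(1 - rho0**2) * V
# the induced sample correlation is approximately rho0
```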

Next denote σ²XY = dXYΣXYd′XY, where ΣXY is the 5 × 5 variance-covariance matrix for the vector (X, Y, X², Y², XY)′ given earlier at (5) and dXY is the 1 × 5 vector defined at (6). For the transformed variables similarly denote σ²ST = dSTΣSTd′ST, where ΣST is the 5 × 5 variance-covariance matrix for the vector (S, T, S², T², ST)′ and the vector dST is defined similarly to dXY at (6), now using the moments based on S and T in place of X and Y.

Under H0: ρ = ρ0 the parameter ρ is “known”. The estimator for the variance of rXY will be denoted as (13) σ̂²XY = d̂XYΣ̂XYd̂′XY, where population moments are replaced with their corresponding sample moment counterparts throughout d̂XY and Σ̂XY, and ρ is set to ρ0. In a similar fashion denote the estimator for the variance of rST as (14) σ̂²ST = d̂STΣ̂STd̂′ST.
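A plug-in variance estimator of the kind at (13) can be sketched via the delta method. The closed form below is our own algebraic reduction of dΣd′ to central moments λab of the standardized variables (it is not copied from the paper, though it agrees with the bivariate normal value (1 − ρ²)² as a check); under H0 the value ρ = ρ0 is substituted:

```python
import numpy as np

def asym_var_r(x, y, rho):
    """Plug-in delta-method estimate of n * Var(r):
    lam22 + (rho^2/4)(lam40 + lam04 + 2*lam22) - rho*(lam31 + lam13),
    where lam_ab are central moments of the standardized variables.
    Under bivariate normality this reduces to (1 - rho^2)^2."""
    u = (x - x.mean()) / x.std()
    v = (y - y.mean()) / y.std()
    lam = lambda a, b: np.mean(u**a * v**b)
    return (lam(2, 2)
            + (rho**2 / 4) * (lam(4, 0) + lam(0, 4) + 2 * lam(2, 2))
            - rho * (lam(3, 1) + lam(1, 3)))
```

At ρ = 0 the expression collapses to λ22, the studentizing variance used by DiCiccio and Romano [16].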

Theorem 2.1 Under H0: ρ = ρ0 assume (X1, Y1), (X2, Y2), ⋯, (Xn, Yn) are an i.i.d. bivariate sample from an absolutely continuous distribution FXY. Also, assume the requisite moments are finite and σ²XY > 0. Then as n → ∞ (15) √n(rXY − ρ0)/σ̂XY →d N(0, 1). The result follows straightforwardly by noting that the sample moments used to calculate the estimator σ̂²XY are all averages and as n → ∞ all converge to their constant population counterparts. Hence, σ̂²XY →p σ²XY under the conditions outlined above as n → ∞. Noting that √n(rXY − ρ0) →d N(0, σ²XY) as per Serfling [6] page 126 completes the argument using Slutsky’s Theorem.

Theorem 2.2 Under H0: ρ = ρ0 assume (S1, T1), (S2, T2), ⋯, (Sn, Tn) defined at (11) are an i.i.d. bivariate sample from an absolutely continuous distribution FST. Also, assume the requisite moments are finite, σ²ST > 0, and that μX, σX, μY and σY are known. Then as n → ∞ (16) √n(rST − ρ0)/σ̂ST →d N(0, 1). The result follows using the same arguments as in Theorem 2.1 given the key feature that μX, σX, μY and σY are known quantities.

Comment 2.1 Under Theorem 2.1 and Theorem 2.2 we see that both estimators σ̂²XY and σ̂²ST have the same large sample distributions, with expectations converging to the common value σ²XY = σ²ST.

Comment 2.2 Relative to Theorem 2.2 we can generate bootstrap samples denoted as the pairs (S*i, T*i) under the null hypothesis H0: ρ = ρ0. This is done by first drawing independent bootstrap samples from the marginal empirical distribution functions F̂X and F̂Y and applying an empirical version of the transformation at (11), i.e. replacing μX, σX, μY and σY with their respective sample counterparts. Essentially we are drawing nonparametric bootstrap samples in order to estimate the bootstrap distribution of the sample variance for rST with the scale based on the underlying properties of the distribution FST|H0. We can then rescale based on the sample variance for rXY in order to have an asymptotically valid testing procedure. More specifically we can follow the steps below to test H0: ρ = ρ0 versus H1: ρ > ρ0 as an extension of the rescaling ideas put forward by DiCiccio and Romano [16] for testing the specific hypothesis H0: ρ = 0. Note also that the procedure we propose can be modified easily to handle the alternatives H1: ρ < ρ0 and H1: ρ ≠ ρ0.

Our testing algorithm now follows these steps:

  1. Calculate the Pearson sample correlation rXY (the subscript is added for notational convenience), which has the same functional form as r given at (2), and its corresponding variance estimate under H0 given at (13).
  2. Draw samples of size n independently and with replacement from each marginal empirical distribution F̂X and F̂Y and denote the pairs of independently drawn observations as (X*i, Y*i), i = 1, 2, ⋯, n.
  3. Standardize each observation using the observed sample means and sample standard deviations as U*i = (X*i − X̄)/sX and V*i = (Y*i − Ȳ)/sY.
  4. Denote the n × 2 matrix of standardized bootstrap observations as (U*⋮V*).
  5. Apply the transformation (S*⋮T*) = (U*⋮V*)A′0, where A0 is the decomposed matrix at (8) with ρ replaced by ρ0, more specifically stated as (17) A0 = [1 0; ρ0 √(1 − ρ0²)], such that the correlated bootstrap resampled paired values are given as S*i = U*i and T*i = ρ0U*i + √(1 − ρ0²)V*i, i = 1, 2, ⋯, n.
  6. Calculate r*ST using the standard sample correlation formula (2) and the corresponding variance estimate at (14) under H0 using the correlated values (S*i, T*i), i = 1, 2, ⋯, n, to generate an empirical bootstrap null distribution of observations for a single bootstrap replicate.
  7. Calculate the bootstrap rescaled and bias corrected bootstrap correlation estimate under H0 as .
  8. Repeat steps 2–7 B times (usually B = 500 or B = 1000 is sufficient).
  9. Calculate the bootstrap estimated p-value as the proportion of rescaled bootstrap estimates at least as large as rXY, i.e. as , where I denotes the standard indicator function.
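The steps above can be sketched in Python as follows. This is a sketch under our reading of Steps 7 and 9: we take the rescaled value to be ρ0 + (r*ST − ρ0)·σ̂XY/σ̂*ST, which makes the comparison in Step 9 equivalent to comparing studentized statistics. The variance function is our own moment-based reduction of the delta-method form at (13)–(14), and all function names are ours:

```python
import numpy as np

def rvar(x, y, rho):
    """Plug-in delta-method estimate of n * Var(r) in central moments
    of the standardized variables (reduces to (1-rho^2)^2 under normality)."""
    u = (x - x.mean()) / x.std()
    v = (y - y.mean()) / y.std()
    lam = lambda a, b: np.mean(u**a * v**b)
    return (lam(2, 2) + (rho**2 / 4) * (lam(4, 0) + lam(0, 4) + 2 * lam(2, 2))
            - rho * (lam(3, 1) + lam(1, 3)))

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

def bootstrap_corr_test(x, y, rho0, B=500, rng=None):
    """One-sided surrogate-bootstrap p-value for H0: rho = rho0 vs H1: rho > rho0."""
    rng = np.random.default_rng(rng)
    n = len(x)
    r_xy = corr(x, y)                            # Step 1
    se_xy = np.sqrt(rvar(x, y, rho0) / n)
    a = np.sqrt(1 - rho0**2)                     # from the Cholesky factor A0
    r_tilde = np.empty(B)
    for b in range(B):
        xs = rng.choice(x, n, replace=True)      # Step 2: independent marginal draws
        ys = rng.choice(y, n, replace=True)
        u = (xs - xs.mean()) / xs.std()          # Step 3
        v = (ys - ys.mean()) / ys.std()
        s, t = u, rho0 * u + a * v               # Step 5: correlation rho0 imposed
        r_st = corr(s, t)                        # Step 6
        se_st = np.sqrt(rvar(s, t, rho0) / n)
        # Step 7 (our reading): rescaled bootstrap value under H0
        r_tilde[b] = rho0 + (r_st - rho0) * se_xy / se_st
    return np.mean(r_tilde >= r_xy)              # Step 9
```

Under a strong alternative the p-value is near zero; when ρ0 matches the true correlation the p-value is not systematically small.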

An outline for the large sample validity of the test can be examined via Edgeworth expansion techniques; see [1] for an overview of Edgeworth expansion methods. Towards this end note that under H0: ρ = ρ0 we have (18) and (19), where rST is defined through the marginal distributions FX and FY and the transformation at Step 5 in our algorithm, and ϕ(⋅) and Φ(⋅) are the standard normal density and cumulative distribution functions, respectively. Eqs (18) and (19) yield the expression for the Kolmogorov-Smirnov distance as (20), where the polynomials pi depend on FST in a smooth way. If the distance between the two expansions is asymptotically negligible then expression (20) holds. Again, recall that rST is defined through the marginal distributions FX and FY and the transformation at Step 5 in our algorithm, where using standard techniques it is straightforward to show that the sample moments converge, i.e. sX → σX and sY → σY, where μX, σX, μY and σY are constants. In other words, the distribution of rST under the assumptions provided is consistent with the original distribution of rXY as per Theorem 2.1.

Comment 2.3. For the special and important case H0: ρ = 0 the only practical difference between our approach and the permutation based method of DiCiccio and Romano [16] is that σ̂²ST is recalculated for each bootstrap replicate due to the sampling-with-replacement mechanism, whereas the permutation variance estimate is fixed across all permutations under the null. We do show, however, in our simulation study that the bootstrap test for H0: ρ = 0 does appear to have some cases where it works slightly better than the permutation test, likely due to a finer grid of values in the with-replacement resampling scheme as compared to sampling without replacement.

Results

For our simulation study we tested H0: ρ = ρ0 versus H1: ρ > ρ0 for ρ0 = 0.0, 0.3 and 0.6 with sample sizes n = 10, 25, 50, 100, 200. Each simulation utilized 10,000 Monte Carlo replications and the bootstrap resampling algorithm used B = 500 resamples. We compared Type I error control between the studentized permutation test of DiCiccio and Romano [16], the new bootstrap test, Fisher’s z-transform and the straight large sample approximation given by Eq (5) using sample moment estimators for the variance estimation. The tests were examined for Type I error control at α = 0.05 as was performed by DiCiccio and Romano [16].

We utilized the same marginal distributions for FX and FY from DiCiccio and Romano [16] page 1218 for the case H0: ρ = 0 for our simulation study:

  1. Multivariate normal (MVN) with mean zero and identity covariance.
  2. Multivariate t-distribution (Multivariate t5) with 5 degrees of freedom.
  3. Exponential, given as (X, Y) = rSu, where u is uniformly distributed on the two dimensional unit circle.
  4. Circular given as the uniform distribution on a two dimensional unit circle.
  5. t4.1 where X = W + Z and Y = W − Z, where W and Z are i.i.d. t4.1 random variables.

One can see DiCiccio and Romano [16] for more detail regarding these specific distributions.

For the general case H0: ρ = ρ0 we utilized the marginal distributions above, standardized the marginal variables X and Y to mean = 0 and variance = 1 and applied the transformation at (11) under H0. The results are contained in Tables 1–3.

Table 1. Rejection Probabilities at α = 0.05 for bootstrap, asymptotic and Fisher’s z tests for Ho: ρ = ρ0 versus H1: ρ > ρ0 using the sample correlation coefficient.

https://doi.org/10.1371/journal.pone.0216287.t001

Table 2. Rejection Probabilities at α = 0.05 for bootstrap, asymptotic and Fisher’s z tests for Ho: ρ = ρ0 versus H1: ρ > ρ0 using the sample correlation coefficient.

https://doi.org/10.1371/journal.pone.0216287.t002

Table 3. Rejection Probabilities at α = 0.05 for bootstrap, asymptotic and Fisher’s z tests for Ho: ρ = ρ0 versus H1: ρ > ρ0 using the sample correlation coefficient.

https://doi.org/10.1371/journal.pone.0216287.t003

The first striking point to make is how inflated the Type I error can be for the test based on Fisher’s z-transformation. For the t4.1 distribution at H0: ρ = 0.3 we see an estimated Type I error rate of 0.1847 at n = 200. In the other direction the test based on Fisher’s z-transformation can also have a much lower than anticipated Type I error rate, e.g. see H0: ρ = 0.3, n = 100 for the circular distribution.

In terms of the straight large sample approximation [6] we see that it performs fairly well across all distributions for samples of size n ≥ 25, but has inflated Type I error rates for n = 10, usually twice the desired α.

For the specific case H0: ρ0 = 0 we see that the bootstrap test and permutation test are comparable, with the bootstrap test actually subtly outperforming the permutation test, even for large sample sizes, for certain scenarios. The bootstrap test and asymptotic tests also have similar properties for large sample sizes, thus heuristically validating our theoretical arguments from the previous section. The bootstrap test appears to work well across all scenarios in terms of Type I error control and thus would appear to be a robust method for testing the general hypothesis H0: ρ = ρ0.

In Table 4 we compared the power of the bootstrap test versus the exact Student’s t-test under multivariate normality assumptions. We see that the bootstrap test compares favorably to a test based on optimal assumptions and one that is limited by the form of the null hypothesis, i.e. the exact Student t-test for testing H0: ρ0 = 0 is specific to this form of the null hypothesis and does not generalize to other values of ρ0. We see from Table 4 that the exact test has a slight power advantage for small samples (n = 10), which diminishes rapidly as a function of larger sample sizes. A similar analogy would be found comparing the Wilcoxon rank-sum test versus the two-sample t-test in terms of efficiency, where the two-sample t-test is more efficient than the Wilcoxon rank-sum test only under the strict and often unrealistic assumption of normality.

Table 4. Power comparisons at α = 0.05 for bootstrap versus exact Student’s t-test for Ho: ρ = 0 versus H1: ρ > 0 using the sample correlation coefficient.

https://doi.org/10.1371/journal.pone.0216287.t004

Example

As a straightforward example of our approach we tested H0: ρ = ρ0 versus H1: ρ > ρ0 for ρ0 = 0 and ρ0 = 0.3 using lactate levels measured in the blood and the cerebrospinal fluid (CSF) in 13 female subjects [23]. The data are provided in Table 5. A scatterplot of the paired data is given in Fig 1.

Fig 1. Scatterplot of blood lactate levels versus CSF lactate levels.

https://doi.org/10.1371/journal.pone.0216287.g001

Table 5. Example blood and CSF lactate levels on n = 13 female subjects.

https://doi.org/10.1371/journal.pone.0216287.t005

We used SAS 9.4 (SAS Institute, Cary, NC) to test the marginal normality of the data using the Shapiro-Wilk test, the multivariate skewness and kurtosis relative to a bivariate normal distribution using the built-in Mardia tests, and the overall test of bivariate normality using the Henze-Zirkler T test. The results from SAS are given in Table 6. We see that none of the tests rejects a given aspect of normality at α = 0.05. We do note that there is low power to do so given n = 13.

The estimated correlation between blood and CSF lactate levels was . For the test H0: ρ = 0 versus H1: ρ > 0 the p-values were 0.0198, 0.0398, 0.0770 and 0.0628 for the Fisher’s z transformation based test, the large sample test, the bootstrap test and the studentized permutation test of DiCiccio and Romano [16], respectively. The bootstrap and permutation tests used 5,000 resamples. If we were testing at level α = 0.05 there is agreement between the permutation and bootstrap tests in not rejecting H0, while the tests based on Fisher’s z transformation and the large sample approximation indicate rejection of H0. Given the small sample size and the results of our simulation study, we would recommend the permutation or bootstrap p-values as likely the more useful calculations in terms of drawing the correct conclusion, particularly given the closeness of the Fisher’s z transformation based p-value and the large sample approximation p-value, both from procedures that we demonstrated did not control the Type I error in small samples.

For the test H0: ρ = 0.3 versus H1: ρ > 0.3 the p-values were 0.149, 0.1399, 0.1792 and not applicable (the permutation test does not generalize) for the Fisher’s z based test, the large sample test, the bootstrap test and the studentized permutation test of DiCiccio and Romano [16], respectively. None of the three applicable tests rejects H0 at α = 0.05.

Discussion and conclusion

In this note we present a robust bootstrap test with good Type I error control for testing the general hypothesis H0: ρ = ρ0. The test was inspired by the studentized permutation given by DiCiccio and Romano [16] for testing H0: ρ = 0, which was proven to be exact in certain scenarios and asymptotically correct overall. We believe that statistical software packages should employ both the bootstrap test and the studentized permutation given by DiCiccio and Romano [16] as the default tests over tests based on bivariate normal assumptions. In addition, it should be noted that the bootstrap test can be inverted to form a confidence interval about ρ. This will be examined in our future work.

Acknowledgments

This work was supported by Roswell Park Cancer Institute and National Cancer Institute (NCI) grant P30CA016056, National Science Foundation under grant NSF IIS-1514204 and NRG Oncology Statistical and Data Management Center grant U10CA180822 and IOTN Moonshot grant U24CA232979-01. We wish to thank the Academic Editor and two reviewers for their thoughtful critiques, which led to an improved version of this paper.

References

  1. Kendall M. and Stuart A. (1979) The Advanced Theory of Statistics. Charles Griffin & Company Limited, Bucks, England.
  2. Stigler S. M. (1986) The History of Statistics: The Measurement of Uncertainty before 1900. The Belknap Press of Harvard University Press, Cambridge, MA.
  3. Pearson K. (1896) Mathematical contributions to the theory of evolution, III: regression, heredity and panmixia. Philosophical Transactions of the Royal Society of London (A) 187 253–318.
  4. Fisher R. A. (1915) Frequency-distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507–521.
  5. “Student” (1908) On the probable error of a correlation coefficient. Biometrika 6 302–310.
  6. Serfling R. J. (1980) Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York, NY.
  7. Gayen A. K. (1951) The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of Any Size Drawn from Non-Normal Universes. Biometrika 38 219–247. pmid:14848124
  8. Ogasawara H. (2006) Asymptotic expansion of the sample correlation coefficient under nonnormality. Computational Statistics & Data Analysis 50 891–910.
  9. Pearson E. S. (1929) Some notes on sampling tests with two variables. Biometrika 21 337–360.
  10. Havlicek L. L. and Peterson N. L. (1977) Effect of the Violation of Assumptions Upon Significance Levels of the Pearson r. Psychological Bulletin 84 373–377.
  11. Berry K. J. and Mielke P. W. Jr. (2000) A Monte Carlo investigation of the Fisher Z transformation for normal and nonnormal distributions. Psychological Reports 87 1101–1114. pmid:11272750
  12. Bishara A. J. and Hittner J. B. (2012) Testing the significance of a correlation with non-normal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods 17 399–417. pmid:22563845
  13. Bishara A. J. and Hittner J. B. (2015) Reducing Bias and Error in the Correlation Coefficient Due to Nonnormality. Educational and Psychological Measurement 75 785–804. pmid:29795841
  14. Bishara A. J. and Hittner J. B. (2017) Confidence intervals for correlations when data are not normal. Behavior Research Methods 49 294–309.
  15. Edgell S. E. and Noon S. M. (1984) Effect of Violation of Normality on the t Test of the Correlation Coefficient. Psychological Bulletin 95 576–583.
  16. DiCiccio C. J. and Romano J. P. (2017) Robust Permutation Tests For Correlation And Regression Coefficients. Journal of the American Statistical Association 112 1211–1220.
  17. Lehmann E. L. (1991) Testing Statistical Hypotheses, 2nd Edition. Wadsworth & Brooks/Cole, Belmont, CA.
  18. Hayes A. F. (1996) Permutation Test Is Not Distribution-Free: Testing H0: ρ = 0. Psychological Methods 1 184–198.
  19. Tuǧran E., Kocak M., Mirtagioǧlu H., Yiǧit S. and Mendes M. (2015) A Simulation Based Comparison of Correlation Coefficients with Regard to Type I Error Rate and Power. Journal of Data Analysis and Information Processing 3 87–101.
  20. Tschuprow A. A. (1925) Grundbegriffe und Grundprobleme der Korrelationstheorie. Leipzig: Teubner (English translation as The Mathematical Theory of Correlation, William Hodge and Co. Ltd., 1939).
  21. Geary R. C. (1947) Testing for normality. Biometrika 34 209–242. pmid:18918691
  22. Lee W.-C. and Rogers J. L. (1998) Bootstrapping Correlation Coefficients Using Univariate and Bivariate Sampling. Psychological Methods 3 91–103.
  23. Neiberger R. E., George J. C., Perkins L., Theriaque D. W., Hutson A. D., Stacpoole P. W. (2002) Renal Manifestations of Congenital Lactic Acidosis. American Journal of Kidney Diseases 39 12–23. pmid:11774096