Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

Xinge Jessie Jeng; Zhongyin John Daye; Wenbin Lu; Jung-Ying Tzeng

doi:10.1371/journal.pcbi.1004993

Abstract

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene level instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants does not easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni correction and false discovery rate (FDR) control, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.

Author Summary

Next-generation sequencing technologies have allowed genetic association studies of complex traits at the single base-pair resolution, where most genetic variants have extremely low mutation frequencies. These rare variants have been the focus of modern statistical-computational genomics due to their potential to explain missing disease heritability. The identification of individual rare variants associated with diseases can provide new biological insights and enable the precise delineation of disease mechanisms. However, due to the extreme rarity of mutations and large numbers of variants, significances of causative variants tend to be mixed inseparably with a few noncausative ones, and standard multiple testing procedures controlling for false positives fail to provide a meaningful way to include a large proportion of the causative variants. To address the challenge of detecting weak biological signals, we propose a novel statistical procedure, based on false-negative control, to provide a practical approach for variant inclusion in large-scale sequencing studies. By determining those variants that can be confidently dispatched as noncausative, the proposed procedure offers an objective selection of a modest number of potentially causative variants at the single-locus level. Results can be further prioritized or used to infer disease-associated genes with annotation information.

Citation: Jeng XJ, Daye ZJ, Lu W, Tzeng J-Y (2016) Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level. PLoS Comput Biol 12(6): e1004993. https://doi.org/10.1371/journal.pcbi.1004993

Editor: Predrag Radivojac, Indiana University, UNITED STATES

Received: October 20, 2015; Accepted: May 21, 2016; Published: June 29, 2016

Copyright: © 2016 Jeng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: (1) Simulated data: the relevant R code is provided in the Supporting Information (S1 File). (2) CoLaus data: access to the CoLaus data can be requested from the database of Genotypes and Phenotypes (dbGaP) (dbGaP Study Accession: phs000145.v4.p2) at the NCBI website: http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000145.v4.p2.

Funding: This work was supported by the National Institutes of Health (http://www.nih.gov/) grants P01 CA142538 (to WL, JYT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Methods paper.

Introduction

Recent advances in next-generation sequencing (NGS) technologies have extended the focus of genetic studies of complex traits from that of common to rare variants. Having low minor allele frequencies (MAFs), usually defined to be less than 1% to 5%, rare variants are often evolved from recent mutations that have not yet been subjected to the pruning mechanism of natural selection and can potentially retain a larger proportion of inheritable variability than common variants. [1–5] Recent studies have already implicated the relevance of rare variants on several complex traits. [6–13]

Despite its potential to uncover genetic factors contributing to missing disease heritability, the analysis of rare variants association studies bears fundamental challenges. As only a small proportion of samples may carry variant alleles at each locus, associations of individual rare variants are often underpowered. [1, 14, 15] Moreover, the number of candidate variants can be extremely large in high-throughput sequencing studies, in which available multiple testing strategies may impose excessively severe corrections, preventing the selection of potentially causal variants. [16]

Recent proposals for rare variants association analysis often resort to collapsing or pooling multiple variants in a gene or pathway. Examples include the combined multivariate collapsing (CMC) [17], cohort allelic sum (CAST) [18], C-alpha [19], sum of squared scores [20–23], sequence kernel association (SKAT) [24], quality-weighted multivariate score association (qMSAT) [25], and similarity-based regression (simReg) [26] tests. The strategy increases power by aggregating effects of low-frequency variants and decreasing data dimension in multiple testing. It has been successfully applied in several applications that identified functional regions that may contain potentially relevant rare variants. [17–20, 23–26]

Nonetheless, variants-pooling tests that aggregate over a gene or pathway do not provide information at the individual locus and are ill-equipped to tap the full potential of NGS data in identifying causative mutations at the single-nucleotide resolution. Pinpointing potentially causal variants is a critical goal of genomic studies because such information would facilitate precise delineation of molecular mechanisms and functions of genetic factors on diseases. [27] Moreover, studies have shown that pooling over multiple variants may result in reduced power, as the inclusion of many noncausal variants may dilute the effects of relevant variants on a trait. [28–30] Thus, pooling over multiple variants can sometimes be inadequate for the identification of functional genomic regions.

On the other hand, analysis of individual rare variants can provide practical advantages. Information on single-variant association can be used to pinpoint a small number of potentially causal variants for follow-up studies to facilitate the precise characterization of functions via molecular modeling and genetic experimentation, which are often too expensive and time consuming to conduct for all variants in a gene. [27] Further, single-variant results can be utilized a posteriori to objectively infer disease-related genes or pathways by comparing with annotation and functional databases. [31–34] This is useful as gene-level results can oftentimes be uninformative when the significance of a few causal variants is diluted by a large number of noncausal ones in the same gene. In the Results section, we will illustrate both strategies for applying single-variant results using the CoLaus data set.

Genome-wide association (GWA) studies, as the pre-eminent means for genetic discovery over the last decade, have largely relied on statistical genomic tools that can identify common variants at the individual single-nucleotide polymorphism (SNP) level. [35] Standard procedures for GWA studies evaluate each variant individually. [36, 37] Potentially causal variants are identified by multiple-testing control on significances at each locus. The simplest strategy for multiple testing utilizes the Bonferroni correction that controls family-wise error rate, or the probability of having one or more false positives. [38] However, the Bonferroni correction can often be too conservative for GWA studies under the presence of thousands of SNPs. [39] To address this issue, the false discovery rate (FDR) is often utilized that provides a more liberal criterion by controlling the expected proportion instead of the presence of false positives. [40–42]

Despite being extremely successful for common variants in GWA studies [43–46], procedures based on false-positive control are often underpowered in NGS studies involving rare variants (as illustrated in Fig 1). New approaches are needed to provide a meaningful way for powerful variants selection in large-scale sequencing studies. Fig 1 compares the statistical landscape of rare variants analysis in NGS studies with that of common variants in GWA studies. In GWA studies, we observe three regions of statistical inference: the Signals (“S”) region where strongly associated variants can be readily identified by controlling false positives, the Noise (“N”) region where noncausal variants can be identified by controlling false negatives, and the indistinguishable (“I”) region where causal and noncausal variants are inextricably mixed. [47, 48] We have recently developed theoretical characterizations for the three regions in high-dimensional data analysis. [49] In NGS studies with rare variants, the Signals region tends to be very narrow and can often degenerate due to extremely low MAF and high dimensionality. Consequently, few causal variants can be identified by evaluating false positives, and results can be very unstable due to random perturbations of noncausal variants.


Fig 1. Illustrations of regions of statistical inference for GWA and NGS studies.

The Signals (“S”), Indistinguishable (“I”), and Noise (“N”) regions are shown. False-positive control allows the selection of variants in the Signals region, whereas false-negative control selects from both the Signals and Indistinguishable regions. In NGS studies with rare variants, the Signals region often degenerates due to extremely low MAF and high dimensionality.

https://doi.org/10.1371/journal.pcbi.1004993.g001

To address the challenge of rare variants association analysis at the single-locus level, we propose the Adaptive False-Negative Control (AFNC) procedure in order to allow a large proportion of causal variants to be retained with high probability. Specifically, the AFNC applies a novel metric called the signal missing rate (Eq 2), defined as the probability of having a nontrivial proportion of false negatives among all causal variants (i.e., FN/s in Table 1), to achieve informative variant selection by controlling the signal missing rate to be small (see Methods section). That is, AFNC seeks to determine those variants that can be confidently dispatched as noncausal and identifies variants from both the Signals and Indistinguishable regions. The results can provide informative inference in NGS studies where the Signals region is very small or degenerate (Fig 1).


Table 1. Classifications of variants under multiple testing control.

https://doi.org/10.1371/journal.pcbi.1004993.t001

We note that this is quite different from classical methods that control false positives. For example, the Bonferroni correction controls for the presence of any false positives (i.e., FP ≥ 1), whereas the FDR controls for the expectation of the proportion FP/R when R > 0 (see Table 1). Neither of these involves the number of causal variants s; thus, they cannot be used for controlling the proportion of causal variants selected. On the other hand, the AFNC, based on the proportion FN/s or 1 − TP/s, allows powerful variants selection by controlling the type II error or 1 − statistical power. Although there may exist a corresponding control level for the FDR (albeit very large) that can include the variants selected by the AFNC at a given false-negative control level (see Results section), this corresponding FDR control level is not known a priori and is expected to vary haphazardly across different studies. An arbitrarily assigned FDR control level would be inefficient for controlling false negatives in NGS studies, as it can over- or under-select uncontrollably depending on the size of the Noise region. A corresponding control level usually does not exist for the stringent Bonferroni selection in large-scale sequencing studies (see Results section).
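In the notation of Table 1, the three criteria contrasted above can be placed side by side. The display below is a paraphrase for orientation, not the paper's own equations: ε is a placeholder introduced here for the "nontrivial proportion" in the signal missing rate (the exact form is Eq 2 in the Methods), and α, q, and β denote generic control levels.

```latex
\begin{align*}
\text{Bonferroni (family-wise error):} \quad & P(\mathrm{FP} \ge 1) \le \alpha, \\
\text{FDR:} \quad & E\!\left[\frac{\mathrm{FP}}{\max(R, 1)}\right] \le q, \\
\text{AFNC (signal missing rate, assumed form):} \quad & P\!\left(\frac{\mathrm{FN}}{s} > \varepsilon\right) \le \beta .
\end{align*}
```

Only the last criterion involves s, which is why the first two cannot by themselves guarantee that a given proportion of causal variants is retained.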

The AFNC provides a general framework that can accommodate for a wide spectrum of models and test statistics, that may include biological prior knowledge and global genotype information (see Methods section). Moreover, it readily adapts to the quality of statistical tests employed. With improved quality of statistical tests, the Indistinguishable region (see Fig 1) narrows, and the AFNC can, in turn, select a smaller set of potentially causal variants. Extensive studies (see Results section) demonstrate that the AFNC can identify a modest number of potentially causal variants while avoiding a deluge of noncausal ones for follow-up analyses that focus on targeted variants. Our proposal employs recent developments in ultra high-dimensional statistical inference to derive a data-driven procedure that can readily adapt to the underlying sparsity and effect sizes of the data. [50–53] It readily controls type I error rates (see Results section). In addition, it is computationally very efficient and can be applicable for whole-genome sequencing (WGS) and whole-exome sequencing (WES) studies.

Results

The AFNC provides a general framework for including a high proportion of causal variants. It can accommodate a spectrum of models and significance tests. The procedure (detailed in the Methods section) consists of three major steps: (i) based on a given model and significance test, obtain the test statistics and their p-values for each of the d variants and order them, (ii) estimate the signal proportion among the d variants (denoted by π̂) using Eq 4, and (iii) compute the AFNC cut-off position by controlling the signal missing rate at level β using Eq 3 and report the top variants as potentially causal. The AFNC is designed to allow researchers to select a modest number of potential variants while encompassing the causal ones with high confidence. Below we use simulation studies and data applications to illustrate the utility of AFNC.

Simulation studies

Simulation designs.

We obtained 10,000 haplotypes for a 25Mb region simulated by COSI 1.2 (http://www.broadinstitute.org/~sfs/cosi) according to a coalescent model that emulates the linkage-disequilibrium (LD) pattern and history of the European population using default parameters. [54] For each subject i, i = 1, ⋯, n, we randomly drew two haplotypes with replacement from the 10,000 haplotypes to form its genotypes G_ij across variants j = 1, ⋯, d, where we assumed an additive genetic model such that G_ij ∈ {0, 1, 2} is the number of minor alleles at locus j. For an experiment with sample size n, we focused on evaluating rare variants with MAF < 1/√(2n), where the threshold was derived from statistical theory and has been employed in providing principled demarcations of rare and common variants in recent literature. [52, 53, 55] It incorporates sample-size information of individual experiments to determine if a variant is rare. For example, a variant with 1% MAF will be considered rare in an experiment when n = 2000 (1/√4000 ≈ 0.016) and common when n = 10,000 (1/√20,000 ≈ 0.007). There were at least 250,000 rare variants with MAF < 1/√(2n) for randomly generated data at sample sizes n = 1000, 2500, 5000, 7500, and 10,000. These variants were truncated to obtain subsets of the data with different numbers of total variants d in various simulation scenarios. We randomly generated phenotypes in each experiment from the Normal distribution Y_i ∼ N(Σ_{j=1}^{s} A_j G_ij, σ²), where s is the number of causal variants, A_j is the effect size of the jth locus, and σ is the noise level fixed at 1. We selected the first s variants as causal so that the causal variants in different simulation scenarios are nested. As in previous studies, we set the effect sizes A_j = C⋅|log₁₀(MAF_j)| for variants j = 1, …, s and 0 otherwise. [24] Thus, a continuum of effect sizes can be shown by varying the effect-size multiplier C.
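The data-generating steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' R code (S1 File): the haplotype pool is filled with independent random sites as a stand-in for COSI output (so it has no LD), the rare-variant cutoff is assumed to be 1/√(2n), which agrees with the 1% MAF example in the text, and the function name `simulate_study` and its parameters are hypothetical.

```python
import numpy as np

def simulate_study(n, d, s, C, n_hap=10_000, sigma=1.0, seed=0):
    """Toy version of the simulation design (placeholder haplotypes, no LD)."""
    rng = np.random.default_rng(seed)

    # Stand-in haplotype pool: independent sites with small allele frequencies.
    # The paper instead draws from coalescent haplotypes simulated by COSI.
    freqs = rng.uniform(0.0005, 0.02, size=4 * d)
    haps = (rng.random((n_hap, freqs.size)) < freqs).astype(np.int8)

    # Each subject draws two haplotypes with replacement; the genotype is the
    # minor-allele count in {0, 1, 2} (additive coding).
    idx = rng.integers(0, n_hap, size=(n, 2))
    G = haps[idx[:, 0]] + haps[idx[:, 1]]

    # Keep polymorphic variants below the assumed cutoff MAF < 1/sqrt(2n),
    # then truncate to the first d of them.
    maf = G.mean(axis=0) / 2.0
    rare = (maf > 0) & (maf < 1.0 / np.sqrt(2 * n))
    if rare.sum() < d:
        raise ValueError("not enough rare variants; enlarge the candidate pool")
    keep = np.flatnonzero(rare)[:d]
    G, maf = G[:, keep], maf[keep]

    # The first s variants are causal with A_j = C * |log10(MAF_j)|; others are 0.
    A = np.zeros(d)
    A[:s] = C * np.abs(np.log10(maf[:s]))

    # Phenotype: Y_i ~ N(sum_j A_j * G_ij, sigma^2).
    y = G @ A + sigma * rng.standard_normal(n)
    return G, y, maf, A
```

For example, `simulate_study(n=2000, d=1000, s=50, C=0.5, n_hap=2000)` returns a genotype matrix, a phenotype vector, the sample MAFs, and the effect sizes. Because the first s retained variants are always the causal ones, scenarios with different s are nested, as in the design above.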

The AFNC was compared with the Bonferroni and FDR controls, which are the most commonly used procedures for adjusting multiplicity in genomic studies. Bonferroni controls the family-wise type I error [38], whereas FDR controls the expected proportion of false positives among all discoveries [41]. Both essentially focus on the control of false positives with FDR being less stringent than the Bonferroni. The Bonferroni and FDR threshold levels were both set at 0.05. The AFNC threshold levels were set at a false-positive rate of α = 0.05 and a false-negative rate of β = 0.1. When estimating π in Step (ii) of AFNC (Eq 4), the c_d values, obtained from Eq 5, are 0.0488, 0.0305, 0.0150, and 0.0095 for d = 10,000, 25,000, 100,000, and 250,000, respectively, based on M = 10,000 randomly generated samples under the global null hypothesis of no causal variants.
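For reference, the two false-positive control procedures used as comparators can be sketched as follows. The FDR rule shown is the Benjamini-Hochberg step-up procedure, which is the usual way FDR control is implemented; that the comparisons here used exactly this variant is an assumption, and the function names are illustrative.

```python
def bonferroni_select(pvalues, alpha=0.05):
    """Indices of variants with p <= alpha / d (family-wise error control)."""
    d = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / d]

def bh_select(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * q / d."""
    d = len(pvalues)
    order = sorted(range(d), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / d:
            k = rank  # keep the largest rank that passes
    return sorted(order[:k])
```

Both rules depend only on the p-values and the number of tests d. Neither uses an estimate of the number of causal variants, which is the contrast with the AFNC drawn in the Introduction.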

For succinct presentation, we compared the AFNC with the Bonferroni and FDR using the Wald test. In the following, we illustrate that the AFNC can perform well, even though significance rankings based on the Wald test may not be optimal. Performances were comprehensively evaluated via sensitivity, specificity, and g-measure [56], and success rates of inclusion of a high proportion of causal rare variants. Sensitivity is defined as the proportion of causal variants that were correctly identified and provides the empirical power for s > 0 causal variants. Specificity is the proportion of noncausal variants that were correctly rejected. Under the global null hypothesis when all variants are noncausal (i.e., C = 0), the empirical type I error rate or false-positive rate is defined as 1 − specificity. The g-measure, defined as √(sensitivity × specificity), is a composite performance measure of overall variant selection. [56, 57] A g-measure close to 1 indicates accurate variant selection, and a g-measure close to 0 implies that few causal variants or too many noncausal ones are selected, or both. Each experimental scenario was randomly simulated 100 times. Median results are shown for sensitivity, specificity, and g-measure, whereas success rates of inclusion of at least a given proportion of causal variants were computed based on the 100 repetitions.
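The performance measures above can be computed from a selected set and the known causal set as in the following sketch. It assumes the g-measure is the geometric mean of sensitivity and specificity, which is the usual definition and matches the described behavior near 0 and 1; the function name and return format are illustrative.

```python
import math

def selection_metrics(selected, causal, d):
    """Sensitivity, specificity, and g-measure for one simulated data set."""
    selected, causal = set(selected), set(causal)
    tp = len(selected & causal)   # causal variants selected
    fp = len(selected - causal)   # noncausal variants selected
    fn = len(causal - selected)   # causal variants missed
    tn = d - tp - fp - fn         # noncausal variants not selected

    # Sensitivity is undefined under the global null (no causal variants).
    sens = tp / len(causal) if causal else float("nan")
    spec = tn / (d - len(causal)) if d > len(causal) else float("nan")
    return {
        "sensitivity": sens,
        "specificity": spec,
        "type_I_error": 1.0 - spec,            # meaningful when C = 0
        "g_measure": math.sqrt(sens * spec),   # assumed geometric-mean form
    }
```

In a simulation study these values would be computed for each of the 100 repetitions and summarized by their medians.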

Comparison across different effect sizes and numbers of variants.

We evaluated performances across varying numbers of total variants d and effect-size multipliers C. We considered s = 50 variants, which are causal when C ≠ 0. Experiments were conducted with n = 2000 samples.

Fig 2 presents results of sensitivity, specificity, and g-measure. The AFNC consistently dominates the FDR and Bonferroni across numbers of variants d and effect-size multipliers C in terms of sensitivity or empirical power for C ≠ 0. Success rates of including at least a given proportion of the s causal variants are presented in S1 Fig. AFNC successfully selects at least 75% of causal variants when C is relatively large, whereas FDR and Bonferroni usually cannot select a large proportion of causal variants, especially for large d. In fact, the Bonferroni fails to select more than 75% of causal variants in all scenarios. This suggests the advantage of considering false-negative control procedures over false-positive ones for including causal rare variants.


Fig 2. Comparisons across varying effect sizes and numbers of variants at s = 50.

Performance of AFNC, FDR, and Bonferroni is evaluated in terms of sensitivity, specificity, and g-measure. Results are shown for s = 50 causal variants when C ≠ 0 and n = 2000 samples.

https://doi.org/10.1371/journal.pcbi.1004993.g002

AFNC underperforms the Bonferroni and FDR in terms of specificity in Fig 2. Nonetheless, AFNC consistently dominates the Bonferroni and FDR in terms of overall performances with the g-measure, especially at d large. This suggests that the AFNC can improve overall variant-selection performance in large-scale sequencing studies. Specifically, the AFNC, at the cost of mildly increased but controlled false positives, provides dramatic reduction in the number of candidate variants while retaining a high proportion of causal ones for follow-up analysis. However, variant screening with the AFNC comes with a cost. Although AFNC selects a small proportion of variants, the actual number of selected variants can be large in high dimensions, which can result in severely lower precision (i.e., the proportion of true positives among those selected, TP/R) compared with the Bonferroni and FDR.

Table 2 presents empirical type I error rates at the global null hypothesis C = 0 when no variants are causal. The AFNC is shown to control type I error rates well at below α = 0.05. This is due to the adaptivity of the AFNC procedure, which allows it to accommodate varying proportions of causal variants (see Methods section). On the other hand, Bonferroni and FDR have type I error rates at 0, suggesting that they are much too conservative for rare-variant association studies.


Table 2. Empirical type I error rates across varying numbers of variants.

https://doi.org/10.1371/journal.pcbi.1004993.t002

We repeated the same evaluation with s = 25 variants, which are causal when C ≠ 0. Results are presented in S2 Fig (for sensitivity, specificity, and g-measure) and S3 Fig (for success rates of inclusion). The relative performance among AFNC, FDR, and Bonferroni is similar to what is observed for s = 50.

Comparison across different sample sizes and numbers of causal variants.

We compared performances across different sample sizes n and numbers of causal variants s. An effect-size multiplier C = 0.5 is considered at d = 100,000 total number of variants.

Fig 3 shows that the AFNC consistently outperforms the FDR and Bonferroni across numbers of causal variants s and sample sizes n in terms of sensitivity or empirical power. Success rates of inclusion are shown in S4 Fig, where the AFNC can select at least 75% of causal variants for sample size n large. The FDR and Bonferroni usually select a small proportion of causal variants with the Bonferroni consistently selecting less than 50% of causal variants in nearly all scenarios. Due to low MAFs, selection of causal variants is more difficult for rare variants at small sample sizes. For example, at n ≤ 2500, the procedures usually cannot identify more than 90% of all causal variants. Fig 3 shows that the AFNC dominates the FDR and Bonferroni for overall variant selection in terms of g-measure with underperformance in terms of specificity. Moreover, S1 Table presents empirical type I error rates at varying sample sizes n, where the AFNC is shown to control type I error rates at 0.05 while the FDR and Bonferroni are overwhelmingly over-conservative with type I error rates at 0. S5 and S6 Figs further present results at C = 0.25, where the AFNC is shown to be even more advantageous at smaller effect sizes.


Fig 3. Comparisons across varying sample sizes and numbers of causal variants at C = 0.5.

Performance of AFNC, FDR, and Bonferroni is evaluated in terms of sensitivity, specificity, and g-measure. Results are shown for the effect-size multiplier C = 0.5 and d = 100,000 variants.

https://doi.org/10.1371/journal.pcbi.1004993.g003

Analysis of CoLaus cardiovascular diseases dataset

We considered the Cohorte Lausannoise (CoLaus) sequence study [58–61], where almost 6000 unrelated Caucasian residents of Lausanne, Switzerland were assessed for risk factors of cardiovascular diseases (CVD). Targeted sequencing genotypes on 202 drug-targeted genes (human genome build 36) were obtained for n = 1769 of these subjects. Cholesterol levels were collected for each subject to evaluate risk of CVD, along with 12 clinical factors: age, gender, and 10 ethnicity covariates using the first 10 principal components [62]. We considered d = 9665 autosomal rare variants from the sequencing study with MAF < 1/√(2n).

For each variant, a t-statistic was obtained by linear association with log cholesterol levels as the response while adjusting for the 12 clinical covariates. The AFNC, FDR, and Bonferroni were then applied to the significances of the t-statistics to identify potentially causal variants. At threshold levels of 0.05, Bonferroni and FDR only identified 4 variants. At α = 0.05 and β = 0.1, AFNC identified 56 candidate rare variants. The AFNC algorithm obtained c_d = 0.0494 based on M = 10,000 randomly generated samples under the global null of no causal variants (Eq 5), from which the signal proportion was estimated (Eq 4). As CVD tends to be influenced by multiple factors [63, 64] and the study focused on genes having clinical relevance, one expects a larger number of causal variants than those identified by the FDR and Bonferroni. Our estimated number of signals suggests that at least 18 variants need to be selected, and potentially many more because signals are dispersed in the Indistinguishable region, to encompass a high proportion of causal variants. That is, false-positive control procedures can be much too conservative in NGS studies, where the Signals region tends to be degenerate (see Fig 1). In the following, we illustrate potential applications of the AFNC for pinpointing individual variants in candidate genes and inferring disease-related genes with annotation information.
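The per-variant association step described above (one linear regression per variant, adjusted for covariates) can be sketched as follows. This is an illustrative implementation rather than the code used in the study: it residualizes the trait and the genotypes on the covariates once (the Frisch-Waugh-Lovell identity) instead of fitting d separate regressions, and it uses a standard normal reference distribution for the p-values, which is an approximation to the t distribution that is reasonable for samples in the thousands. The function name is hypothetical, and every variant is assumed to be polymorphic and not collinear with the covariates.

```python
import math
import numpy as np

def single_variant_pvalues(G, y, X=None):
    """t-statistics and two-sided p-values from y ~ covariates + G[:, j]."""
    n, d = G.shape
    # Design for the adjustment: intercept plus optional covariates.
    Z = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])

    # Project the covariates out of both the trait and the genotypes.
    Q, _ = np.linalg.qr(Z)
    y_r = y - Q @ (Q.T @ y)
    G_r = G - Q @ (Q.T @ G)

    sxx = (G_r ** 2).sum(axis=0)   # residual genotype sum of squares
    sxy = G_r.T @ y_r
    beta = sxy / sxx               # per-variant effect estimates

    df = n - Z.shape[1] - 1        # residual degrees of freedom
    rss = (y_r ** 2).sum() - beta * sxy
    t = beta / np.sqrt(rss / df / sxx)

    # Normal approximation: p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    return t, p
```

With `t, p = single_variant_pvalues(G, y, X)`, the ranking required by step (i) of the procedure is `np.argsort(p)`. In the CoLaus analysis, `y` would be log cholesterol and `X` the 12 clinical covariates.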

Pinpointing individual variants in candidate genes for follow-up analysis.

To obtain a set of candidate genes, we conducted gene-based analysis using the SKAT with the linear kernel and variant weights 1/MAF. [24] The SKAT performs gene-level analyses via variance component test. The SKAT with the linear kernel is equivalent to the SimReg [26] and the sum of squared scores [20–23] tests. Gene-based analysis did not identify any significant gene when controlling the FDR at the 0.05 threshold level. For illustrative purposes, we focused on the top 5 genes (APH1A, TRPM8, SLC10A2, SP110, SIRT6) with gene-set p-values <0.01. These genes have been related to CVD in the literature. [65–74]

Table 3 presents variants selected in the top 5 candidate genes by the AFNC, along with their p-values and annotation information. The Bonferroni and FDR only selected 2 variants, chr1_148504677 from APH1A and chr2_234559154 from TRPM8. They did not identify any variant from SP110 and SIRT6. Both are relevant genes, where SP110 has been associated with venous obstruction [67] and SIRT6 has been known for its therapeutic potential towards the prevention of CVD [72–74]. Moreover, TRPM8, from which the FDR and Bonferroni only identified a single variant, regulates functions of the pulmonary artery via complex systems. [68–70] No individual variants were selected from SLC10A2, whose most significant variant has a p-value of 6.32 × 10⁻³.


Table 3. Annotation of AFNC-selected variants of candidate genes in the analysis of CoLaus data.

https://doi.org/10.1371/journal.pcbi.1004993.t003

The AFNC, based on global hypothesis tests, provides an objective selection of a modest number of potentially causal variants at the single-locus level. Investigators may further prioritize variants using annotation information. For example, in Table 3, one may first target variants at non-synonymous coding and splice sites that can disrupt protein structures before analyzing 3’/5’ UTR and downstream/upstream variants that may regulate gene expression. [75] Synonymous coding and intron variants may also impact gene expression, protein folding, and fitness. [76–78] Nonetheless, they are usually considered as low-priority and may represent irrelevant variants that were mixed indistinguishably with the causal ones due to extremely low MAF and high dimensionality.

Inferring disease-related genes with single-variant results.

Gene-based analysis using variants pooling can sometimes result in limited power due to the inclusion of many noncausal variants. For example, gene-set analysis using the SKAT did not identify any candidate genes in this study when controlling the FDR at the 0.05 level on gene-set p-values for risk of CVD, a multifaceted disease. To provide an alternative approach, we consider the utilization of single-variant results to infer candidate genes. Specifically, among the 56 AFNC variants, we further focus on non-synonymous and splice-site variants that are often considered as prime candidates for causal variants due to their capacity to influence protein coding and structure. [75] Table 4 presents non-synonymous and splice-site variants selected. The Bonferroni and FDR only selected a single variant, chr2_234559154 from TRPM8, whereas the AFNC selected 16 variants from 14 genes. The number of non-synonymous and splice-site variants selected by AFNC is of the same order of magnitude as our estimated number of causal variants. SP110 and TRPM8, which each contain 2 AFNC-selected non-synonymous and splice-site variants, have been related to venous obstruction [67] and pulmonary functions [68–70], respectively. Moreover, genes with an AFNC-selected non-synonymous or splice-site variant have been associated with CVD (BRD2 [79], CNR2 [80–82], KCNN4 [83–86], MME [87, 88], NLRP1 [89], SDHB [90], TACR3 [91, 92], TNNI3K [93–95]) or related conditions, such as diabetes (CLEC16A [96]), obesity (OPRM1 [97, 98]), chronic obstructive pulmonary disease (PDE4A [99–102]), and diabetic peripheral neuropathy (SCN9A [103, 104]). The full annotation of FDR- and AFNC-selected variants is shown in S2 Table.


Table 4. Annotation of AFNC-selected non-synonymous and splice-site variants in the analysis of CoLaus data.

https://doi.org/10.1371/journal.pcbi.1004993.t004

Comparison with Bonferroni and FDR at varying control levels.

Table 5 presents numbers of variants selected by the Bonferroni and FDR at different control levels. The Bonferroni, based on the stringent family-wise type I error rate, cannot select more than 10 variants even at the maximum control level of 1. That is, when more than 10 variants are selected, a false positive will almost surely be included. In this particular analysis, FDR at the 0.55 control level can select the 56 variants obtained by the AFNC at α = 0.05 and β = 0.1. However, we note that the FDR control level corresponding to the AFNC is not invariant and can vary dramatically across different studies. Intuitively, a larger (or smaller) FDR control level would be needed when the Indistinguishable region is larger (or smaller) (see Fig 1), and this cannot be determined a priori.
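The comparison in Table 5 rests on two simple calculations, sketched below under the assumption that FDR control is carried out with the Benjamini-Hochberg step-up rule (the function names are illustrative). The first returns the smallest FDR level at which a given number of top-ranked variants would be selected; the second counts how many variants the Bonferroni rule selects at a given level, including the maximum level of 1.

```python
def min_bh_level(pvalues, k):
    """Smallest BH control level q at which the k smallest p-values are rejected.

    BH rejects the top k exactly when some rank j >= k satisfies
    p_(j) <= j * q / d, so the answer is min over j >= k of d * p_(j) / j.
    """
    d = len(pvalues)
    p_sorted = sorted(pvalues)
    return min(d * p_sorted[j - 1] / j for j in range(k, d + 1))

def bonferroni_count(pvalues, level=1.0):
    """Number of variants with p <= level / d."""
    d = len(pvalues)
    return sum(p <= level / d for p in pvalues)
```

Applied to the ordered p-values of a study, `min_bh_level(pvalues, k)` with k set to the AFNC selection size gives the FDR level that would have been needed to reach the same set. That level is only available after the AFNC has been run, which is the point made above about it not being known a priori.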

Table 5. Number of variants selected in the analysis of CoLaus data at different control levels.

https://doi.org/10.1371/journal.pcbi.1004993.t005

Discussion

We have proposed a novel bioinformatic approach that allows the identification of individual rare variants in large-scale sequencing association studies. Extensive studies based on simulated data generated with COSI [54] at realistic population parameters have been used to compare our method with the Bonferroni and FDR across various scenarios. Results have suggested that the AFNC can provide informative variant selection by including a large proportion of causal variants while avoiding a deluge of noncausal ones. On the other hand, the Bonferroni and FDR are shown to be excessively over-conservative under extremely low MAFs and high dimensionality. Analyses of the CoLaus dataset for cardiovascular diseases using the AFNC have pinpointed individual variants most responsible for explaining significances of genes identified in gene-level aggregation tests. Moreover, single-variant results have been successfully applied to objectively infer potentially relevant genes when cross-referenced with annotation information. The R package ‘AFNC’ for performing the AFNC is publicly and freely available at https://github.com/zjdaye/AFNC or http://sites.google.com/site/zhongyindaye/software.

The AFNC provides a unified framework that accommodates a wide spectrum of models, test statistics, and data scenarios. For succinct presentation, we focused in this paper on quantitative traits using p-values obtained from linear association tests. The AFNC can be easily adapted for case-control studies [23–25, 105], family-structured data [106, 107], and many other scenarios. Moreover, empirical p-values, such as those from permutation or bootstrap, can be employed for improved significance ranking [108]. The performance results of the AFNC shown in this paper, which use p-values based on associations with quantitative traits, can be expected to extend to p-values obtained under a spectrum of models and data scenarios. Moreover, the analysis of large-scale genomic data is a dynamic and fast-evolving field. The AFNC, which readily adapts to the quality of the statistical tests employed, will be able to provide increasingly efficient inclusion of causal variants as ever more accurate and computationally efficient means for assessing significances are developed.

A few very recent works have sought to identify individual rare variants by incorporating prior-knowledge information in statistical inference [109, 110]. These methods typically upweight individual variants predicted to be most likely causal based on prior GWA studies, functional annotation, sequence conservation, and other computational means. The AFNC can be readily utilized with models and test statistics that incorporate biological prior knowledge. In the Results section, we illustrated an alternative way to incorporate this bioinformatic knowledge. Specifically, we started with an agnostic interrogation of each variant and obtained a set of statistically promising variants using AFNC. We then compared the selected variants with prior-knowledge information to allow investigators to form educated hypotheses in designing follow-up studies. Statistically promising variants, which are selected objectively by AFNC, can also be explored in follow-up studies without comparison with annotation information, such as when prior knowledge is unavailable for novel variants or is believed to be inaccurate.

Due to extremely low MAFs, rare variants do not usually exhibit strong linkage disequilibrium [1, 111]. Thus, we designed the AFNC for rare variants association studies, in which dependence among test statistics is assumed to be weak. The AFNC procedure is also applicable when causal variants are dependent but noncausal variants are independent [112]. In other applications where noncausal genetic factors are expected to be strongly dependent, the AFNC procedure can be adapted to account for arbitrary dependence using several recent techniques for multiple testing [113, 114].

One potential limitation of AFNC is that it may underperform when the signal intensity of the causal variants is too low. The signal intensity of a causal variant depends on the effect sizes and sample size. As shown in Figs 2 and 3, the sensitivity of AFNC deteriorates as effect size or sample size becomes smaller. Indeed, low effect sizes and small sample size are fatal limitations to all methods. In single-variant analysis of rare variants, such challenges may arise from identifying the extremely rare causal variants (e.g., singletons in the data). Although effect size is believed to be high for rare causal variants, the overall signal intensity may still be low given the extremely low sample size. Under this scenario, gene-based tests coupled with functional annotation would have better potential to identify these causal variants. Therefore, gene-based tests, functional annotation and AFNC should be used in an integrated fashion in the detection of rare causal variants: as we have illustrated in our analysis of the CoLaus data, AFNC coupled with gene-based tests can help to pinpoint potential causal variants that lead to gene-level significance; AFNC coupled with functional annotation can help to identify causal genes that are insignificant at gene level due to a few causal variants mixed with a large number of noncausal variants; finally, gene-based tests coupled with functional annotation can facilitate the identification of extremely rare causal variants.

Recent developments in the multiple testing literature have introduced the false nondiscovery rate (Fndr) [115–117]. We note that this is quite different from the AFNC control procedure. The Fndr controls the expectation of the proportion FN/(d − R), which does not involve the number of causal variants s (see Table 1). Moreover, this is not a sensitive measure and will be very close to zero in large-scale NGS studies, as the number of variants that are not selected, d − R, will be very large. On the other hand, the AFNC, based on the proportion FN/s, allows robust variant selection in large-scale sequencing studies, as the number of causal variants s is expected to be small and the proportion FN/s is sensitive to changes in the number of false negatives. In S7 Fig, we compared the AFNC with the Fndr at a threshold level of β = 0.1. Results suggest that the AFNC dominates the Fndr in terms of overall performance as measured by the g-measure, and that the Fndr performs poorly in terms of specificity.
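A small numerical illustration makes the contrast between the two proportions concrete. The sketch below uses hypothetical counts (the values of d, s, R and FN are made up for illustration and are not taken from the simulations in this paper), and the helper names are ours.

```python
def nondiscovery_proportion(fn, d, r):
    """FN / (d - R): false negatives among the variants that are not selected."""
    return fn / (d - r) if d > r else 0.0


def missed_signal_proportion(fn, s):
    """FN / s: false negatives relative to the number of causal variants."""
    return fn / s if s > 0 else 0.0


d, s, r = 100_000, 50, 60  # hypothetical: variants, causal variants, selected variants
for fn in (5, 20, 40):     # hypothetical numbers of missed causal variants
    print(fn, nondiscovery_proportion(fn, d, r), missed_signal_proportion(fn, s))
```

With these counts, FN/(d − R) stays below 0.001 even when 40 of the 50 causal variants are missed, while FN/s moves from 0.1 to 0.8.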

Innovative technological advances have imposed new bioinformatic and statistical challenges by introducing genomic data at ever-increasing resolution and dimensions. The proliferation of GWA studies in the last decade has largely led to the development and adaptation of the FDR as a conventional genomic tool [42–46]. In this paper, we introduced the AFNC to enable the identification of rare variants in large-scale sequencing studies. It is computationally efficient for applications in WGS and WES studies and can provide informative results for investigators charged with the task of analyzing large-scale sequencing studies.

Methods

Adaptive false-negative control of individual rare variants

The proposed procedure is general and can accommodate a spectrum of models and significance tests. Suppose that we have test statistics for each variant, $T_1(G, Z), T_2(G, Z), \ldots, T_d(G, Z)$, based on subjects $i = 1, 2, \ldots, n$, such that $G = \{G_{ij}\}$ is the matrix of genotypes across all variants $j = 1, 2, \ldots, d$ and $Z = \{Z_{ik}\}$ is the matrix of additional covariates, covering various clinical factors and prior biological knowledge, $k = 1, \ldots, K$. Examples for $T_j(G, Z)$ include the classical t-test statistic, which depends only on genotypes of the jth variant, and the local FDR statistic, which utilizes genotypes across all variants in an empirical Bayes construction [108]. Further, prior knowledge from functional annotation can be incorporated, such as by using a generalized linear mixed-effects model [110]. We assume that the test statistic $T_j(G, Z)$ for $j = 1, 2, \ldots, d$ is drawn from the mixture distribution

$$T_j(G, Z) \sim (1 - \pi) F_0 + \pi F_1, \tag{1}$$

where $\pi = s/d$ is the signal proportion, $s$ is the number of causal variants, $F_0$ is the null distribution of $T_j(G, Z)$ when the jth variant is noncausal, and $F_1$ is the alternative distribution when the jth variant is causal [52, 53, 118]. Let $T_{(1)}(G, Z) \ge T_{(2)}(G, Z) \ge \ldots \ge T_{(d)}(G, Z)$ be the ordered test statistics at decreasing significances.

To evaluate false negatives in NGS studies, we introduce the signal missing rate (SMR) for selecting the top j ranked variants as

$$\mathrm{SMR}^{\epsilon}(j) = P\left(\mathrm{FN}(j)/s > \epsilon\right), \tag{2}$$

where FN(j) is the number of causal variants missed by selecting the top j ranked variants and ϵ > 0 is a small constant. The SMR can be interpreted as the probability of missing more than a small proportion ϵ of the causal variants when the top j ranked variants are selected. By controlling the SMR, potentially causal variants can be included from both the Signals and Indistinguishable regions while dispensing with a very large number of irrelevant variants in the Noise region (see Fig 1). Compared to another possible measure of false negatives, P(FN(j) > 0), SMR provides a more liberal control as it allows some, instead of zero, false negatives. SMR is also substantially different from the control of the false nondiscovery rate (Fndr), which is an analog of FDR in terms of false negatives. Fndr is defined as the expectation of the proportion of false negatives among the accepted null hypotheses [115, 119]. In the analysis of data with extremely high dimensions and a relatively small number of causal variants, Fndr is very close to zero and hence not an informative measure.
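The SMR can be illustrated empirically when the causal status of each ranked variant is known, as in a simulation. The following is a minimal sketch under that assumption; the function names and the three toy replicates are hypothetical.

```python
def false_negatives(ranked_is_causal, j):
    """FN(j): causal variants ranked below the top j positions."""
    return sum(ranked_is_causal[j:])


def empirical_smr(replicates, j, eps):
    """Fraction of replicates in which FN(j) / s exceeds eps."""
    misses = 0
    for ranked_is_causal in replicates:
        s = sum(ranked_is_causal)
        if s > 0 and false_negatives(ranked_is_causal, j) / s > eps:
            misses += 1
    return misses / len(replicates)


# Three toy replicates (hypothetical): rankings of 8 variants, 4 of them causal (1).
toy_replicates = [
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
print(empirical_smr(toy_replicates, j=5, eps=0.1))
```

Selecting the top 5 variants misses one causal variant in the second replicate only, so the empirical SMR at ϵ = 0.1 is one third; selecting all 8 variants brings it to zero.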

To provide informative analysis of rare variants in NGS studies, we propose the adaptive false-negative control (AFNC) procedure as follows.

1. Obtain ordered p-values from the test statistics $T_{(1)}(G, Z) \ge T_{(2)}(G, Z) \ge \ldots \ge T_{(d)}(G, Z)$ such that $p_{(1)} \le p_{(2)} \le \ldots \le p_{(d)}$.
2. Compute an estimate $\hat{\pi}$ of the signal proportion and compute $\hat{s} = \hat{\pi} d$.
3. Retain the top $t^{*}$ variants with
$$t^{*} = \hat{s} + \min\left\{ j \ge 1 : p_{(\hat{s}+j)} \le Q^{-1}_{(j)}(\beta) \right\} \cdot 1\{\hat{s} > t_\alpha\}, \tag{3}$$
where $1\{\cdot\}$ is the indicator function; $Q^{-1}_{(j)}$ is the inverse cumulative distribution function of $p^{0}_{(j)}$, the jth ordered p-value among the null (i.e., noncausal) variants; $p^{0}_{(j)}$ follows the Beta distribution with parameters $j$ and $d - \hat{s} - j + 1$; $t_\alpha$ is the cut-off position of the Bonferroni procedure at the α significance level; and β is a pre-fixed level for controlling the signal missing rate. We set α and β at conventional levels of 0.05 and 0.1, respectively, in this paper. A smaller value of β corresponds to more stringent control of false negatives.

Step 2. Estimating π.

To estimate the signal proportion π in Step 2, we employ the efficient estimator of [50], based on empirical processes of p-values,

$$\hat{\pi} = \max_{1 \le j \le c_0 d} \frac{j/d - p_{(j)} - c_d \sqrt{p_{(j)}\,(1 - p_{(j)})}}{1 - p_{(j)}}, \tag{4}$$

where $0 < c_0 \le 1$ is pre-fixed to accelerate the algorithm for large d by searching through only a $c_0$ fraction of the ranked variants. Conceptually, Eq (4) seeks the largest difference between the observed, ordered p-value (i.e., $p_{(j)}$) and the expected quantile under the global null (i.e., $j/d$). The largest difference typically occurs among the top proportion of the ranked p-values, as causal variants tend to have small p-values. To ensure that we look through a sufficient number of the top $c_0 d$ ordered variants (so that the speed-up has little impact on the results), we set $c_0 d$ to at least 5000 or d/10, that is, $c_0 d = \max\{5000, d/10\}$. The quantity $c_d > 0$ is pre-computed empirically to control the Type I error rate under the global null hypothesis that no causal variants exist. Specifically, we randomly simulate M sets of p-values, $p^{m}_{1}, \ldots, p^{m}_{d}$, from the uniform distribution under the global null hypothesis for $m = 1, 2, \ldots, M$. For set m, we order the p-values to obtain $p^{m}_{(1)} \le \ldots \le p^{m}_{(d)}$, standardize them, and compute $V_m$ by taking the maximum, i.e.,

$$V_m = \max_{1 \le j \le c_0 d} \frac{j/d - p^{m}_{(j)}}{\sqrt{p^{m}_{(j)}\,(1 - p^{m}_{(j)})}}. \tag{5}$$

Then, $c_d$ is obtained as the (1 − α) quantile of the extreme values $V_m$. Estimation of the signal proportion has been rigorously evaluated in the statistical literature [50, 51, 120]. In particular, under high dimensionality, statistical consistency of the estimator in Eq 4 does not depend on strict normality assumptions, and the estimator can be expected to perform well even when the proportion of causal variants π is very small [50]. It readily adapts to the underlying sparsity of the data in large-scale association studies.
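The estimation step can be sketched in code. This is a minimal illustration that assumes a one-sided form of the estimator of [50] with bounding function sqrt(p(1 − p)); the function names, the truncation of the estimate at zero, the cap of c₀d at d, and the small number of simulated null sets are assumptions made for the sketch and do not reproduce the ‘AFNC’ R package.

```python
import math
import random


def search_depth(d):
    """c0 * d = max{5000, d / 10}, capped at d (the cap is an assumption for small d)."""
    return min(d, max(5000, d // 10))


def standardized_max(sorted_p, depth):
    """Max over the top `depth` ranks of (j/d - p_(j)) / sqrt(p_(j) (1 - p_(j)))."""
    d = len(sorted_p)
    best = -math.inf
    for j in range(1, depth + 1):
        p = sorted_p[j - 1]
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid division by zero
            best = max(best, (j / d - p) / math.sqrt(p * (1.0 - p)))
    return best


def simulate_c_d(d, depth, alpha=0.05, n_sets=200, seed=0):
    """(1 - alpha) quantile of the standardized maximum under the global null."""
    rng = random.Random(seed)
    maxima = sorted(
        standardized_max(sorted(rng.random() for _ in range(d)), depth)
        for _ in range(n_sets)
    )
    return maxima[min(n_sets - 1, math.ceil((1.0 - alpha) * n_sets) - 1)]


def estimate_pi(pvals, c_d, depth):
    """Signal-proportion estimate, truncated below at zero (an assumption)."""
    d = len(pvals)
    sorted_p = sorted(pvals)
    best = 0.0
    for j in range(1, depth + 1):
        p = sorted_p[j - 1]
        if 0.0 < p < 1.0:
            value = (j / d - p - c_d * math.sqrt(p * (1.0 - p))) / (1.0 - p)
            best = max(best, value)
    return best


# Toy run (hypothetical data): 50 strong signals among d = 2000 p-values.
rng = random.Random(1)
toy_pvals = [1e-8] * 50 + [rng.random() for _ in range(1950)]
depth = 200  # shallow search for the toy example; search_depth(d) follows the rule in the text
c_d = simulate_c_d(d=2000, depth=depth, n_sets=50)
print(round(estimate_pi(toy_pvals, c_d, depth), 4))
```

In the toy run the true signal proportion is 50/2000 = 0.025; the estimate is expected to land near, and typically slightly below, that value because of the subtracted bounding term.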

Step 3. Obtaining the AFNC cut point $t^{*}$.

The AFNC procedure evaluates statistical significance along the ordered p-values and retains the top $t^{*}$ variants of Eq 3 as important variants. When $\hat{s} \le t_\alpha$, Eq 3 simplifies to $t^{*} = \hat{s}$ (which is $\le t_\alpha$). In this case, if $\hat{s} > 0$, the Bonferroni cut-off position $t_\alpha$ already encompasses the estimated number of causal variants. Such scenarios occur when the effect sizes are so strong that the Indistinguishable region in Fig 1 degenerates and nearly all causal variants can be identified in the Signals region. If $\hat{s} = 0$, all variants are expected to be noncausal, which occurs under the global null hypothesis when both the Signals and Indistinguishable regions degenerate.

The more interesting scenario of $\hat{s} > t_\alpha$ occurs in NGS studies of rare variants when the Signals region is very small or degenerates and the Indistinguishable region may harbor causal variants. In this case, we need to search further along the ordered test statistics, bypass some of the noncausal variants in the Indistinguishable region, and then stop when the number of false negatives is small relative to the total number of causal variants. The search starts at $\hat{s}$ and ends at $\hat{s} + j$ for the smallest j such that the observed p-value, $p_{(\hat{s}+j)}$, is no greater than the β-th quantile of the j-th ordered p-value, $p^{0}_{(j)}$, among the null variants. The rationale is that when not all causal variants rank before $\hat{s} + j$, the number of noncausal variants among the top $\hat{s} + j$ variants, denoted by $j'$, would be greater than j. Then the observed $p_{(\hat{s}+j)}$, which is in essence $p^{0}_{(j')}$, would be greater than $p^{0}_{(j)}$. In other words, $p_{(\hat{s}+j)} > p^{0}_{(j)}$ is implied by the event that the top $\hat{s} + j$ variants still do not contain all causal variants. Therefore, our search should continue until the first time $p_{(\hat{s}+j)} \le p^{0}_{(j)}$. In the extremely ideal case, one would wish to stop exactly at that point. In real practice, we set $t^{*}$ by looking for the j such that $p_{(\hat{s}+j)}$ is less than or equal to the β-th quantile of $p^{0}_{(j)}$ to achieve a better balance between a small false-negative proportion and a reasonable total number of variants selected. When this event occurs (i.e., $p_{(\hat{s}+j)}$ ≤ β-th quantile of $p^{0}_{(j)}$), the AFNC threshold $t^{*}$ asymptotically controls SMR^ϵ at level β for an arbitrarily small constant ϵ (i.e., ϵ does not change with the total number of variants d).

In summary, using the cut-off position $t^{*}$, AFNC can adaptively encompass a large proportion (1 − ϵ) of the causal variants with high probability (≈1 − β). In the case where the causal and noncausal variants are better separated, $t^{*}$ of AFNC will become closer to the Bonferroni cut-off position t_α. The AFNC procedure controls the signal missing rate with any consistent estimator of π (and in this paper, we employ the estimator of Eq 4). Finally, our procedure has a very low computational complexity, O(d log d), and can be applied under extremely high dimensionality for WGS and WES studies.
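The cut-point search can be put together in a short sketch, written as we read the description of Step 3: the Bonferroni position t_α is counted first, and when the estimated number of causal variants exceeds it, the search moves down the ranking until the observed p-value falls at or below the β-th quantile of the corresponding null order statistic. The quantile comparison is carried out through the equivalent binomial tail probability, the estimated number of causal variants is assumed to be supplied as an integer, and all function names are hypothetical. The loop is quadratic in the worst case and is meant for illustration only, not as the O(d log d) implementation of the ‘AFNC’ package.

```python
import math


def null_order_stat_cdf(x, j, m):
    """P(j-th smallest of m independent Uniform(0, 1) p-values <= x).

    The j-th order statistic is Beta(j, m - j + 1), and its CDF at x equals
    the binomial tail P(Binomial(m, x) >= j).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_x, log_1mx = math.log(x), math.log1p(-x)
    lower_tail = 0.0
    for k in range(j):  # P(Binomial(m, x) <= j - 1); each term is built in log space
        log_term = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                    + k * log_x + (m - k) * log_1mx)
        lower_tail += math.exp(log_term)
    return min(1.0, max(0.0, 1.0 - lower_tail))


def afnc_cutoff(pvals, s_hat, alpha=0.05, beta=0.1):
    """Number of top-ranked variants to retain (illustrative reading of Step 3)."""
    d = len(pvals)
    sorted_p = sorted(pvals)
    t_alpha = sum(1 for p in sorted_p if p <= alpha / d)  # Bonferroni position
    if s_hat <= t_alpha:
        return s_hat
    m = d - s_hat  # estimated number of noncausal variants
    for j in range(1, m + 1):
        # p <= beta-quantile of Beta(j, m - j + 1)  <=>  CDF(p) <= beta
        if null_order_stat_cdf(sorted_p[s_hat + j - 1], j, m) <= beta:
            return s_hat + j
    return d  # assumption: keep everything if the stopping rule never fires


# Toy input (hypothetical): six small p-values above the Bonferroni threshold,
# followed by 94 p-values spread over (0, 1).
toy_pvals = [0.0006, 0.0007, 0.0008, 0.0009, 0.0010, 0.00105]
toy_pvals += [(i + 1) / 95 for i in range(94)]
print(afnc_cutoff(toy_pvals, s_hat=5))
```

Evaluating the order-statistic CDF at the observed p-value avoids inverting the Beta distribution, which keeps the sketch within the standard library; a production implementation would more likely call a dedicated Beta quantile routine.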

Supporting Information

S1 Text. Derivation of signal missing rate control.

We measure the false negatives using the signal missing rate (SMR) and show that SMR for $t^{*}$ can be asymptotically controlled at level β.

https://doi.org/10.1371/journal.pcbi.1004993.s001

(PDF)

S1 Fig. Inclusion rate of causal variants across varying effect sizes and numbers of variants at s = 50.

Success rates of including at least 50%, 75%, 90%, and 95% of s variants are examined. Results are shown for s = 50 number of causal variants when C ≠ 0 and n = 2000 number of samples.

https://doi.org/10.1371/journal.pcbi.1004993.s002

(PDF)

S2 Fig. Comparisons across varying effect sizes and numbers of variants at s = 25.

Performance of AFNC, FDR, and Bonferroni is evaluated in terms of sensitivity, specificity, and g-measure. Results are shown for s = 25 number of causal variants when C ≠ 0 and n = 2000 number of samples.

https://doi.org/10.1371/journal.pcbi.1004993.s003

(PDF)

S3 Fig. Inclusion rate of causal variants across varying effect sizes and numbers of variants at s = 25.

Success rates of including at least 50%, 75%, 90%, and 95% of s variants are examined. Results are shown for s = 25 number of causal variants when C ≠ 0 and n = 2000 number of samples.

https://doi.org/10.1371/journal.pcbi.1004993.s004

(PDF)

S4 Fig. Inclusion rate of causal variants across sample sizes and numbers of causal variants at C = 0.5.

Success rates of including at least 50%, 75%, 90%, and 95% of s variants are examined. Results are shown for the effect-size multiplier C = 0.5 and d = 100,000 number of variants.

https://doi.org/10.1371/journal.pcbi.1004993.s005

(PDF)

S5 Fig. Comparisons across varying sample sizes and numbers of causal variants at C = 0.25.

Performance of AFNC, FDR, and Bonferroni is evaluated in terms of sensitivity, specificity, and g-measure. Results are shown for the effect-size multiplier C = 0.25 and d = 100,000 number of variants.

https://doi.org/10.1371/journal.pcbi.1004993.s006

(PDF)

S6 Fig. Inclusion rate of causal variants across sample sizes and numbers of causal variants at C = 0.25.

Success rates of including at least 50%, 75%, 90%, and 95% of s variants are examined. Results are shown for the effect-size multiplier C = 0.25 and d = 100,000 number of variants.

https://doi.org/10.1371/journal.pcbi.1004993.s007

(PDF)

S7 Fig. Comparisons across varying effect sizes and numbers of variants at s = 50 with the Fndr.

Performance of AFNC, FDR, and Bonferroni is compared with that of the Fndr in terms of sensitivity, specificity, and g-measure. Results are shown for s = 50 number of causal variants when C ≠ 0 and n = 2000 number of samples.

https://doi.org/10.1371/journal.pcbi.1004993.s008

(PDF)

S1 Table. Empirical type I error rates across varying sample sizes.

Standard errors are included in parentheses. Results are shown for d = 100,000 number of variants.

https://doi.org/10.1371/journal.pcbi.1004993.s009

(PDF)

S2 Table. Full annotation of AFNC-selected variants in the analysis of CoLaus data.

Gene-set p-values are computed using the SKAT. Genes are sorted in alphabetic order, and variants are sorted by their individual p-values among each gene. Variants marked with (*) are also selected by the FDR.

https://doi.org/10.1371/journal.pcbi.1004993.s010

(PDF)

S1 File. Files for simulations and analysis of CoLaus data.

File “simulation_code.R” contains R code for simulations. SNPs used to generate phenotypes at n = 2000 are included as “snp.n2000.RData”. File “CoLaus_code.R” contains R code for the analysis of CoLaus data.

https://doi.org/10.1371/journal.pcbi.1004993.s011

(ZIP)

S1 Dataset. Single-locus and gene-level p-values used in the analysis of CoLaus data.

Dataset “single_locus_pvalues.txt” contains variant-level p-values used in the analysis of the CoLaus data. Dataset “gene_level_pvalues.txt” contains gene-level p-values computed from the SKAT.

https://doi.org/10.1371/journal.pcbi.1004993.s012

(ZIP)

Acknowledgments

The authors thank Drs. Peter Vollenweider and Gerard Waeber, PIs of the CoLaus study, and Drs. Meg Ehm and Matthew Nelson, collaborators at GlaxoSmithKline for providing the CoLaus phenotype and sequence data.

Author Contributions

Conceived and designed the experiments: XJJ ZJD WL JYT. Performed the experiments: XJJ ZJD JYT. Analyzed the data: XJJ ZJD JYT. Contributed reagents/materials/analysis tools: XJJ ZJD JYT. Wrote the paper: XJJ ZJD WL JYT.

References

1. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–137. pmid:11404818
2. Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet. 2007;80:727–739. pmid:17357078
3. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40:695–701. pmid:18509313
4. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456:18–21. pmid:18987709
5. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. pmid:19812666
6. Cohen JC, Kiss RS, Pertsemlidis A, Marcel YL, McPherson R, Hobbs HH. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004;305:869–872. pmid:15297675
7. Cohen JC, Boerwinkle E, Mosley TH Jr, Hobbs HH. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N Engl J Med. 2006;354:1264–1272. pmid:16554528
8. Ahituv N, Kavaslar N, Schackwitz W, Ustaszewska A, Martin J, Hebert S, et al. Medical sequencing at the extremes of human body mass. Am J Hum Genet. 2007;80:779–791. pmid:17357083
9. Romeo S, Pennacchio LA, Fu Y, Boerwinkle E, Tybjaerg-Hansen A, Hobbs HH, et al. Population-based resequencing of ANGPTL4 uncovers variations that reduce triglycerides and increase HDL. Nat Genet. 2007;39:513–516. pmid:17322881
10. Ji W, Foo JN, O’Roak BJ, Zhao H, Larson MG, Simon DB, et al. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet. 2008;40:592–599. pmid:18391953
11. Romeo S, Yin W, Kozlitina J, Pennacchio LA, Boerwinkle E, Hobbs HH, et al. Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J Clin Invest. 2009;119:70–79. pmid:19075393
12. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science. 2009;324:387–389. pmid:19264985
13. Holm H, Gudbjartsson DF, Sulem P, Masson G, Helgadottir HT, Zanon C, et al. A rare variant in MYH6 is associated with high risk of sick sinus syndrome. Nat Genet. 2011;43:316–320. pmid:21378987
14. McClellan J, King MC. Genetic heterogeneity in human disease. Cell. 2010;141:210–217. pmid:20403315
15. Ionita-Laza I, Cho MH, Laird NM. Statistical challenges in sequence-based association studies with population- and family-based designs. Statistics in Biosciences. 2013;5:54–70.
16. Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet. 2011;12:628–640. pmid:21850043
17. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83:311–21. pmid:18691683
18. Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615:28–56. pmid:17101154
19. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, et al. Testing for an Unusual Distribution of Rare Variants. PLoS Genetics. 2011;7(3):e1001322. pmid:21408211
20. Chapman J, Whittaker J. Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol. 2008;32:560–566. pmid:18428428
21. Pan W. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet Epidemiol. 2009;33:497–507. pmid:19170135
22. Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011;35:606–19.
23. Lin DY, Tang ZZ. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet. 2011;89:354–67. pmid:21885029
24. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). Am J Hum Genet. 2011;89:82–93. pmid:21737059
25. Daye ZJ, Li H, Wei Z. A powerful test for multiple rare variants association studies that incorporates sequencing qualities. Nucleic Acids Res. 2012;40:e60. pmid:22262732
26. Tzeng JY, Zhang D, Pongpanich M, Smith C, McCarthy MI, Sale MM, et al. Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression. Am J Hum Genet. 2011;89:277–288.
27. Sunyaev SR. Inferring causality and functional significance of human coding DNA variants. Hum Mol Genet. 2012;21(R1):R10–17. pmid:22990389
28. Kinnamon DD, Hershberger RE, Martin ER. Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants. PLoS One. 2012;7:e30238. pmid:22363423
29. Barnett I. SNP-set Tests for Sequencing and Genome-Wide Association Studies. Harvard University; 2014.
30. Pan W, Kim J, Zhang Y, Shen X, Wei P. A powerful and adaptive association test for rare variants. Genetics. 2014;197:1081–95. pmid:24831820
31. Yuan HY, Chiou JJ, Tseng WH, Liu CH, Liu CK, Lin YJ, et al. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006;34:W635–W641. pmid:16845089
32. Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PI. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008;24:2938–2939. pmid:18974171
33. Lee PH, Shatkay H. F-SNP: computationally predicted functional SNPs for disease association studies. Nucleic Acids Res. 2008;36:D820–D824. pmid:17986460
34. Zhang K, Chang S, Cui S, Guo L, Zhang L, Wang J. ICSNPathway: identify candidate causal SNPs and pathways from genome-wide association study by one analytical framework. Nucleic Acids Res. 2011;39:W437–43. pmid:21622953
35. Hindorff LA, Junkins HA, Hall PN, Mehta JP, Manolio TA. A Catalog of Published Genome-Wide Association Studies; 2011. Available at: www.genome.gov/gwastudies. Accessed July 15, 2011.
36. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. pmid:17701901
37. Agresti A. Categorical Data Analysis. 2nd ed. Gainesville, FL: John Wiley & Sons; 2002.
38. Dunn OJ. Multiple Comparisons Among Means. J American Statistical Association. 1961;56:52–64.
39. Bush WS, Moore JH. Chapter 11: Genome-wide association studies. PLoS Comput Biol. 2012;8:e1002822. pmid:23300413
40. Hochberg Y, Benjamini Y. More powerful procedures for multiple significance testing. Stat Med. 1990;9:811–8. pmid:2218183
41. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B. 1995;57:289–300.
42. Storey J. A direct approach to false discovery rates. J Royal Stat Soc B. 2002;64:479–498.
43. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–9445. pmid:12883005
44. Dudbridge F, Gusnanto A, Koeleman BP. Detecting multiple associations in genome-wide studies. Hum Genomics. 2006;2:310–7.
45. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7:781–91. pmid:16983374
46. van den Oord EJ. Controlling false discoveries in genetic studies. American journal of medical genetics, Part B, Neuropsychiatric genetics. 2008;147B:637–644.
47. Jeske D, Liu Z, Bent E, Borneman J. Classification rules that include neutral zones and their application to microbial community profiling. Communication in Statistics—Theory and Methods. 2007;36:1965–1980.
48. Drton M, Perlman MD. A SINful approach to Gaussian graphical model selection. J Statistical Planning and Inference. 2008;138:1179–1200.
49. Jeng XJ. Identification of signal, noise, and indistinguishable subsets in high-dimensional data analysis. arXiv. 2013;stat.ME:1305.0220.
50. Meinshausen N, Rice J. Estimating the proportion of false null hypotheses among a large number of independent tested hypotheses. Ann Statist. 2006;34:373–393.
51. Jin J, Cai T. Estimating the null and the proportion of non-null effects in large-scale multiple comparisons. J American Statistical Association. 2007;102:495–506.
52. Cai T, Jeng XJ, Jin J. Optimal detection of heterogeneous and heteroscedastic mixtures. J Royal Stat Soc B. 2011;73:629–662.
53. Jeng XJ, Cai T, Li H. Simultaneous Discovery of Rare and Common Segment Variants. Biometrika. 2013;100:157–172. pmid:23825436
54. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005;15:1576–83. pmid:16251467
55. Ionita-Laza I, Lee S, Makarov V, Buxbaum J, Lin X. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet. 2013;92:841–853. pmid:23684009
56. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J Machine Learning Technologies. 2011;2:37–63.
57. Sokolova M, Japkowicz N, Szpakowicz S. Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation. In: Sattar A, Kang BH, editors. AI 2006: Advances in Artificial Intelligence. Berlin: Springer-Verlag; 2006.
58. Firmann M, Mayor V, Vidal PM, Bochud M, Pecoud A, Hayoz D, et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord. 2008;8:6.
59. Nelson MR, Wegmann D, Ehm MG, Kessner D, Jean PS, Verzilli C, et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012;337:100–104. pmid:22604722
60. Song K, Nelson MR, Aponte J, Manas ES, Bacanu SA, Yuan X, et al. Sequencing of Lp-PLA2-encoding PLA2G7 gene in 2000 Europeans reveals several rare loss-of-function mutations. Pharmacogenomics J. 2012;12:425–31. pmid:21606947
61. Warren LL, Li L, Nelson MR, Ehm MG, Shen J, Fraser DJ, et al. Deep resequencing unveils genetic architecture of ADIPOQ and identifies a novel low-frequency variant strongly associated with adiponectin variation. Diabetes. 2012;61:1297–301. pmid:22403302
62. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association. Nat Genet. 2006;38:904–909. pmid:16862161
63. Durrington P. Dyslipidaemia. Lancet. 2003;362:717–731. pmid:12957096
64. Kelly M, Semsarian C. Multiple mutations in genetic cardiovascular disease: a marker of disease severity? Circ Cardiovasc Genet. 2009;2:182–190.
65. van Loo KM, Dejaegere T, van Zweeden M, van Schijndel JE, Wijmenga C, Trip MD, et al. Male-specific association between a gamma-secretase polymorphism and premature coronary atherosclerosis. PLoS One. 2008;3(11):e3662. pmid:18987747
66. Serneels L, Dejaegere T, Craessaerts K, Horre K, Jorissen E, Tousseyn T, et al. Differential contribution of the three Aph1 genes to gamma-secretase activity in vivo. Proc Natl Acad Sci U S A. 2005;102:1719–24. pmid:15665098
67. Roscioli T, Cliffe ST, Bloch DB, Bell CG, Mullan G, Taylor PJ, et al. Mutations in the gene encoding the PML nuclear body protein Sp110 are associated with immunodeficiency and hepatic veno-occlusive disease. Nat Genet. 2006;38:620–2. pmid:16648851
68. Liu XR, Liu Q, Chen GY, Hu Y, Sham JS, Lin MJ. Down-regulation of TRPM8 in pulmonary arteries of pulmonary hypertensive rats. Cell Physiol Biochem. 2013;31:892–904. pmid:23817166
69. Fernandez JA, Skryma R, Bidaux G, Magleby KL, Scholfield CN, McGeown JG, et al. Short isoforms of the cold receptor TRPM8 inhibit channel gating by mimicking heat action rather than chemical inhibitors. J Biol Chem. 2012;287:2963–70. pmid:22128172
70. Yang XR, Lin MJ, McIntosh LS, Sham JS. Functional expression of transient receptor potential melastatin- and vanilloid-related channels in pulmonary arterial and aortic smooth muscle. Am J Physiol Lung Cell Mol Physiol. 2006;290:L1267–76. pmid:16399784
71. Out C, Dikkers A, Laskewitz A, Boverhof R, van der Ley C, Kema IP, et al. Prednisolone increases enterohepatic cycling of bile acids by induction of Asbt and promotes reverse cholesterol transport. J Hepatol. 2014;61:351–7. pmid:24681341
72. Beauharnois JM, Bolivar BE, Welch JT. Sirtuin 6: a review of biological effects and potential therapeutic properties. Mol Biosyst. 2013;9:1789–806. pmid:23592245
73. Webster KA. A sirtuin link between metabolism and heart disease. Nat Med. 2012;18:1617–9. pmid:23135512
74. Sundaresan NR, Vasudevan P, Zhong L, Kim G, Samant S, Parekh V, et al. The sirtuin SIRT6 blocks IGF-Akt signaling and development of cardiac hypertrophy by targeting c-Jun. Nat Med. 2012;18:1643–50. pmid:23086477
75. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. 5th ed. New York: Garland Science; 2007.
76. Bailey SF, Hinz A, Kassen R. Adaptive synonymous mutations in an experimentally evolved Pseudomonas fluorescens population. Nat Commun. 2014;5:4076. pmid:24912567
77. Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C. Exposing synonymous mutations. Trends Genet. 2014;30:308–21. pmid:24954581
78. Goebels C, Thonn A, Gonzalez-Hilarion S, Rolland O, Moyrand F, Beilharz TH, et al. Introns regulate gene expression in Cryptococcus neoformans in a Pab2p dependent pathway. PLoS Genet. 2013;9:e1003686. pmid:23966870
79. Spiltoir JI, Stratton MS, Cavasin MA, Demos-Davies K, Reid BG, Qi J, et al. BET acetyl-lysine binding proteins control pathological cardiac hypertrophy. J Mol Cell Cardiol. 2013;63:175–9. pmid:23939492
80. Duerr GD, Heinemann JC, Suchan G, Kolobara E, Wenzel D, Geisen C, et al. The endocannabinoid-CB2 receptor axis protects the ischemic heart at the early stage of cardiomyopathy. Basic Res Cardiol. 2014;109:425. pmid:24980781
81. Gonzalez C, Herradon E, Abalo R, Vera G, Perez-Nievas BG, Leza JC, et al. Cannabinoid/agonist WIN 55,212-2 reduces cardiac ischaemia-reperfusion injury in Zucker diabetic fatty rats: role of CB2 receptors and iNOS/eNOS. Diabetes Metab Res Rev. 2011;1:244–54.
82. Ford WR, Honan SA, White R, Hiley CR. Evidence of a novel site mediating anandamide-induced negative inotropic and coronary vasodilatator responses in rat isolated hearts. Br J Pharmacol. 2002;1:244–54.
83. Bi D, Toyama K, Lemaitre V, Takai J, Fan F, Jenkins DP, et al. The intermediate conductance calcium-activated potassium channel KCa3.1 regulates vascular smooth muscle cell proliferation via controlling calcium-dependent signaling. J Biol Chem. 2013;288:15843–53. pmid:23609438
84. Kohler R. Single-nucleotide polymorphisms in vascular Ca2+-activated K+-channel genes and cardiovascular disease. Pflugers Arch. 2010;460:343–51. pmid:20043229
85. Toyama K, Wulff H, Chandy KG, Azam P, Raman G, Saito T, et al. The intermediate-conductance calcium-activated potassium channel KCa3.1 contributes to atherogenesis in mice and humans. J Clin Invest. 2008;118:3025–37. pmid:18688283
86. Yamaguchi M, Nakayama T, Fu Z, Naganuma T, Sato N, Soma M, et al. Relationship between haplotypes of KCNN4 gene and susceptibility to human vascular diseases in Japanese. Med Sci Monit. 2009;15:CR389–97. pmid:19644414
87. Pereira NL, Aksoy P, Moon I, Peng Y, Redfield MM, Burnett JC, et al. Natriuretic peptide pharmacogenetics: membrane metallo-endopeptidase (MME): common gene sequence variation, functional characterization and degradation. J Mol Cell Cardiol. 2010;49:864–74. pmid:20692264
88. Munagala VK, Burnett JC, Redfield MM. The natriuretic peptides in cardiovascular medicine. Curr Probl Cardiol. 2004;29:707–69. pmid:15550914
89. Garg NJ. Inflammasomes in cardiovascular diseases. Am J Cardiovasc Dis. 2011;1:244–54. pmid:22254202
90. Tang Y, Mi C, Liu J, Gao F, Long J. Compromised mitochondrial remodeling in compensatory hypertrophied myocardium of spontaneously hypertensive rat. Cardiovasc Pathol. 2014;23:101–6. pmid:24388463
91. Walsh DA, McWilliams DF. Tachykinins and the cardiovascular system. Curr Drug Targets. 2006;7:1031–42. pmid:16918331
92. Hoover DB, Chang Y, Hancock JC, Zhang L. Actions of tachykinins within the heart and their relevance to cardiovascular disease. Jpn J Pharmacol. 2000;84:367–73. pmid:11202607
93. Tang H, Xiao K, Mao L, Rockman HA, Marchuk DA. Overexpression of TNNI3K, a cardiac-specific MAPKKK, promotes cardiac dysfunction. J Mol Cell Cardiol. 2013;54:101–11. pmid:23085512
94. Wheeler FC, Tang H, Marks OA, Hadnott TN, Chu PL, Mao L, et al. Tnni3k modifies disease progression in murine models of cardiomyopathy. PLoS Genet. 2009;5:e1000647. pmid:19763165
95. Theis JL, Zimmermann MT, Larsen BT, Rybakova IN, Long PA, Evans JM, et al. TNNI3K mutation in familial syndrome of conduction system disease, atrial tachyarrhythmia and dilated cardiomyopathy. Hum Mol Genet. 2014;23:5793–804. pmid:24925317
96. Zoledziewska M, Costa G, Pitzalis M, Cocco E, Melis C, Moi L, et al. Variation within the CLEC16A gene shows consistent disease association with both multiple sclerosis and type 1 diabetes in Sardinia. Genes Immun. 2009;10:15–7. pmid:18946483
97. Fox CS, Heard-Costa NL, Wilson PW, Levy D, D’Agostino RB, Atwood LD. Genome-wide linkage to chromosome 6 for waist circumference in the Framingham Heart Study. Diabetes. 2004;53:1399–402. pmid:15111512
98. Lee KW, Abrahamowicz M, Leonard GT, Richer L, Perron M, Veillette S, et al. Prenatal exposure to cigarette smoke interacts with OPRM1 to modulate dietary preference for fat. J Psychiatry Neurosci. 2015;40:38–45. pmid:25266401
99. Decramer M, Janssens W, Miravitlles M. Chronic obstructive pulmonary disease. Lancet. 2012;379:1341–51. pmid:22314182
100. Currie GP, Butler CA, Anderson WJ, Skinner C. Phosphodiesterase 4 inhibitors in chronic obstructive pulmonary disease: a new approach to oral treatment. Br J Clin Pharmacol. 2008;65:803–10. pmid:18341675
101. Giembycz MA. Phosphodiesterase-4: selective and dual-specificity inhibitors for the therapy of chronic obstructive pulmonary disease. Proc Am Thorac Soc. 2005;2:326–33. pmid:16267357
102. Giembycz MA. Cilomilast: a second generation phosphodiesterase 4 inhibitor for asthma and chronic obstructive pulmonary disease. Expert Opin Investig Drugs. 2001;10:1361–79. pmid:11772257
103. Li QS, Cheng P, Favis R, Wickenden A, Romano G, Wang H. SCN9A Variants may be Implicated in Neuropathic Pain Associated with Diabetic Peripheral Neuropathy and Pain Severity. Clin J Pain. 2015;
104. Huang Y, Zang Y, Zhou L, Gui W, Liu X, Zhong Y. The role of TNF-alpha/NF-kappa B pathway on the up-regulation of voltage-gated sodium channel Nav1.7 in DRG neurons of rats with diabetic neuropathy. Neurochem Int. 2014;75:112–9. pmid:24893330
105. Liu DJ, Leal SM. A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions. PLoS Genetics. 2011;6:e1001156.
106. Schifano ED, Epstein MP, Bielak LF, Jhun MA, Kardia SL, Peyser PA, et al. SNP Set Association Analysis for Familial Data. Genet Epidemiol. 2012;36:797–810. pmid:22968922
107. Oualkacha K, Dastani Z, Li R, Cingolani PE, Spector TD, Hammond CJ, et al. Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness. Genet Epidemiol. 2013;37:366–376. pmid:23529756
108. Efron B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J American Statistical Association. 2004;99:96–104.
109. Long N, Dickson SP, Maia JM, Kim HS, Zhu Q, Allen AS. Leveraging prior information to detect causal variants via multi-variant regression. PLoS Comput Biol. 2013;9(6):e1003093. pmid:23762022
110. Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 2014;10(12):e1004729. pmid:25502226
111. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet. 2002;11:2417–2423. pmid:12351577
112. Logan BR, Geliazkova MP, Rowe DB. An evaluation of spatial thresholding techniques in fMRI analysis. Hum Brain Mapp. 2008;29:1379–1389. pmid:18064589
113. Fan J, Han X, Gu W. Control of the false discovery rate under arbitrary covariance dependence. J American Statistical Association. 2012;107:1019–1045.
114. Friguet C, Kloareg M, Causeur D. A Factor Model Approach to Multiple Testing Under Dependence. J American Statistical Association. 2009;104:1406–15.
115. Genovese C, Wasserman L. Operating characteristics and extensions of the false discovery rate. J Royal Stat Soc B. 2002;64:499–517.
116. Sarkar SK. FDR-controlling stepwise procedure and their false negatives rates. J Statistical Planning and Inference. 2004;125:119–137.
117. Strimmer K. A unified approach to false discovery rate estimation. BMC Bioinformatics. 2008;9. pmid:18613966
118. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J Royal Stat Soc B. 2004;66:187–205.
119. Sarkar SK. False discovery and false nondiscovery rates in single-step multiple testing procedures. The Annals of Statistics. 2006;34:394–415.
120. Cai T, Jin J, Low M. Estimation and Confidence Sets For Sparse Normal Mixtures. Ann Statist. 2007;35:2421–2449.

[ref52] 52. Cai T, Jeng XJ, Jin J. Optimal detection of heterogeneous and heteroscedastic mixtures. J Royal Stat Soc B. 2011;73:629–662.
View Article
Google Scholar

[184] View Article

[185] Google Scholar

[ref53] 53. Jeng XJ, Cai T, Li H. Simultaneous Discovery of Rare and Common Segment Variants. Biometrika. 2013;100:157–172. pmid:23825436
View Article
PubMed/NCBI
Google Scholar

[187] View Article

[188] PubMed/NCBI

[189] Google Scholar

[ref54] 54. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005;15:1576–83. pmid:16251467
View Article
PubMed/NCBI
Google Scholar

[191] View Article

[192] PubMed/NCBI

[193] Google Scholar

[ref55] 55. Ionita-Laza I, Lee S, Makarov V, Buxbaum J, Lin X. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet. 2013;92:841–853. pmid:23684009
View Article
PubMed/NCBI
Google Scholar

[195] View Article

[196] PubMed/NCBI

[197] Google Scholar

[ref56] 56. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J Machine Learning Technologies. 2011;2:37–63.
View Article
Google Scholar

[199] View Article

[200] Google Scholar

[ref57] 57. Sokolova M, Japkowicz N, Szpakowicz S. Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation. In: Sattar A, Kang BH, editors. AI 2006: Advances in Artifical Intelligence. Berlin: Springer-Verlag; 2006.

[ref58] 58. Firmann M, Mayor V, Vidal PM, Bochud M, Pecoud A, Hayoz D, et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord. 2008;17:8:6.
View Article
Google Scholar

[203] View Article

[204] Google Scholar

[ref59] 59. Nelson MR, Wegmann D, Ehm MG, Kessner D, Jean PS, Verzilli C, et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012;337:100–104. pmid:22604722
View Article
PubMed/NCBI
Google Scholar

[206] View Article

[207] PubMed/NCBI

[208] Google Scholar

[ref60] 60. Song K, Nelson MR, Aponte J, Manas ES, Bacanu SA, Yuan X, et al. Sequencing of Lp-PLA2-encoding PLA2G7 gene in 2000 Europeans reveals several rare loss-of-function mutations. Pharmacogenomics J. 2012;12:425–31. pmid:21606947
View Article
PubMed/NCBI
Google Scholar

[210] View Article

[211] PubMed/NCBI

[212] Google Scholar

[ref61] 61. Warren LL, Li L, Nelson MR, Ehm MG, Shen J, Fraser DJ, et al. Deep resequencing unveils genetic architecture of ADIPOQ and identifies a novel low-frequency variant strongly associated with adiponectin variation. Diabetes. 2012;61:1297–301. pmid:22403302
View Article
PubMed/NCBI
Google Scholar

[214] View Article

[215] PubMed/NCBI

[216] Google Scholar

[ref62] 62. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association. Nat Genet. 2006;38:904–909. pmid:16862161
View Article
PubMed/NCBI
Google Scholar

[218] View Article

[219] PubMed/NCBI

[220] Google Scholar

[ref63] 63. Durrington P. Dyslipidaemia. Lancet. 2003;362:717–731. pmid:12957096
View Article
PubMed/NCBI
Google Scholar

[222] View Article

[223] PubMed/NCBI

[224] Google Scholar

[ref64] 64. Kelly M, Semsarian C. Multiple mutations in genetic cardiovascular disease: a marker of disease severity? Circ Cardiovasc Genet. 2009;2:182–190.
View Article
Google Scholar

[226] View Article

[227] Google Scholar

[ref65] 65. van Loo KM, Dejaegere T, van Zweeden M, van Schijndel JE, Wijmenga C, Trip MD, et al. Male-specific association between a gamma-secretase polymorphism and premature coronary atherosclerosis. PLoS One. 2008;3(11):e3662. pmid:18987747
View Article
PubMed/NCBI
Google Scholar

[229] View Article

[230] PubMed/NCBI

[231] Google Scholar

[ref66] 66. Serneels L, Dejaegere T, Craessaerts K, Horre K, Jorissen E, Tousseyn T, et al. Differential contribution of the three Aph1 genes to gamma-secretase activity in vivo. Proc Natl Acad Sci U S A. 2005;102:1719–24. pmid:15665098
View Article
PubMed/NCBI
Google Scholar

[233] View Article

[234] PubMed/NCBI

[235] Google Scholar

[ref67] 67. Roscioli T, Cliffe ST, Bloch DB, Bell CG, Mullan G, Taylor PJ, et al. Mutations in the gene encoding the PML nuclear body protein Sp110 are associated with immunodeficiency and hepatic veno-occlusive disease. Nat Genet. 2006;38:620–2. pmid:16648851

[ref68] 68. Liu XR, Liu Q, Chen GY, Hu Y, Sham JS, Lin MJ. Down-regulation of TRPM8 in pulmonary arteries of pulmonary hypertensive rats. Cell Physiol Biochem. 2013;31:892–904. pmid:23817166

[ref69] 69. Fernandez JA, Skryma R, Bidaux G, Magleby KL, Scholfield CN, McGeown JG, et al. Short isoforms of the cold receptor TRPM8 inhibit channel gating by mimicking heat action rather than chemical inhibitors. J Biol Chem. 2012;287:2963–70. pmid:22128172

[ref70] 70. Yang XR, Lin MJ, McIntosh LS, Sham JS. Functional expression of transient receptor potential melastatin- and vanilloid-related channels in pulmonary arterial and aortic smooth muscle. Am J Physiol Lung Cell Mol Physiol. 2006;290:L1267–76. pmid:16399784

[ref71] 71. Out C, Dikkers A, Laskewitz A, Boverhof R, van der Ley C, Kema IP, et al. Prednisolone increases enterohepatic cycling of bile acids by induction of Asbt and promotes reverse cholesterol transport. J Hepatol. 2014;61:351–7. pmid:24681341

[ref72] 72. Beauharnois JM, Bolivar BE, Welch JT. Sirtuin 6: a review of biological effects and potential therapeutic properties. Mol Biosyst. 2013;9:1789–806. pmid:23592245

[ref73] 73. Webster KA. A sirtuin link between metabolism and heart disease. Nat Med. 2012;18:1617–9. pmid:23135512

[ref74] 74. Sundaresan NR, Vasudevan P, Zhong L, Kim G, Samant S, Parekh V, et al. The sirtuin SIRT6 blocks IGF-Akt signaling and development of cardiac hypertrophy by targeting c-Jun. Nat Med. 2012;18:1643–50. pmid:23086477

[ref75] 75. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. 5th ed. New York: Garland Science; 2007.

[ref76] 76. Bailey SF, Hinz A, Kassen R. Adaptive synonymous mutations in an experimentally evolved Pseudomonas fluorescens population. Nat Commun. 2014;5:4076. pmid:24912567

[ref77] 77. Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C. Exposing synonymous mutations. Trends Genet. 2014;30:308–21. pmid:24954581

[ref78] 78. Goebels C, Thonn A, Gonzalez-Hilarion S, Rolland O, Moyrand F, Beilharz TH, et al. Introns regulate gene expression in Cryptococcus neoformans in a Pab2p dependent pathway. PLoS Genet. 2013;9:e1003686. pmid:23966870

[ref79] 79. Spiltoir JI, Stratton MS, Cavasin MA, Demos-Davies K, Reid BG, Qi J, et al. BET acetyl-lysine binding proteins control pathological cardiac hypertrophy. J Mol Cell Cardiol. 2013;63:175–9. pmid:23939492

[ref80] 80. Duerr GD, Heinemann JC, Suchan G, Kolobara E, Wenzel D, Geisen C, et al. The endocannabinoid-CB2 receptor axis protects the ischemic heart at the early stage of cardiomyopathy. Basic Res Cardiol. 2014;109:425. pmid:24980781

[ref81] 81. Gonzalez C, Herradon E, Abalo R, Vera G, Perez-Nievas BG, Leza JC, et al. Cannabinoid/agonist WIN 55,212-2 reduces cardiac ischaemia-reperfusion injury in Zucker diabetic fatty rats: role of CB2 receptors and iNOS/eNOS. Diabetes Metab Res Rev. 2011;1:244–54.

[ref82] 82. Ford WR, Honan SA, White R, Hiley CR. Evidence of a novel site mediating anandamide-induced negative inotropic and coronary vasodilatator responses in rat isolated hearts. Br J Pharmacol. 2002;1:244–54.

[ref83] 83. Bi D, Toyama K, Lemaitre V, Takai J, Fan F, Jenkins DP, et al. The intermediate conductance calcium-activated potassium channel KCa3.1 regulates vascular smooth muscle cell proliferation via controlling calcium-dependent signaling. J Biol Chem. 2013;288:15843–53. pmid:23609438

[ref84] 84. Kohler R. Single-nucleotide polymorphisms in vascular Ca2+-activated K+-channel genes and cardiovascular disease. Pflugers Arch. 2010;460:343–51. pmid:20043229

[ref85] 85. Toyama K, Wulff H, Chandy KG, Azam P, Raman G, Saito T, et al. The intermediate-conductance calcium-activated potassium channel KCa3.1 contributes to atherogenesis in mice and humans. J Clin Invest. 2008;118:3025–37. pmid:18688283

[ref86] 86. Yamaguchi M, Nakayama T, Fu Z, Naganuma T, Sato N, Soma M, et al. Relationship between haplotypes of KCNN4 gene and susceptibility to human vascular diseases in Japanese. Med Sci Monit. 2009;15:CR389–97. pmid:19644414

[ref87] 87. Pereira NL, Aksoy P, Moon I, Peng Y, Redfield MM, Burnett JC, et al. Natriuretic peptide pharmacogenetics: membrane metallo-endopeptidase (MME): common gene sequence variation, functional characterization and degradation. J Mol Cell Cardiol. 2010;49:864–74. pmid:20692264

[ref88] 88. Munagala VK, Burnett JC, Redfield MM. The natriuretic peptides in cardiovascular medicine. Curr Probl Cardiol. 2004;29:707–69. pmid:15550914

[ref89] 89. Garg NJ. Inflammasomes in cardiovascular diseases. Am J Cardiovasc Dis. 2011;1:244–54. pmid:22254202

[ref90] 90. Tang Y, Mi C, Liu J, Gao F, Long J. Compromised mitochondrial remodeling in compensatory hypertrophied myocardium of spontaneously hypertensive rat. Cardiovasc Pathol. 2014;23:101–6. pmid:24388463

[ref91] 91. Walsh DA, McWilliams DF. Tachykinins and the cardiovascular system. Curr Drug Targets. 2006;7:1031–42. pmid:16918331

[ref92] 92. Hoover DB, Chang Y, Hancock JC, Zhang L. Actions of tachykinins within the heart and their relevance to cardiovascular disease. Jpn J Pharmacol. 2000;84:367–73. pmid:11202607

[ref93] 93. Tang H, Xiao K, Mao L, Rockman HA, Marchuk DA. Overexpression of TNNI3K, a cardiac-specific MAPKKK, promotes cardiac dysfunction. J Mol Cell Cardiol. 2013;54:101–11. pmid:23085512

[ref94] 94. Wheeler FC, Tang H, Marks OA, Hadnott TN, Chu PL, Mao L, et al. Tnni3k modifies disease progression in murine models of cardiomyopathy. PLoS Genet. 2009;5:e1000647. pmid:19763165

[ref95] 95. Theis JL, Zimmermann MT, Larsen BT, Rybakova IN, Long PA, Evans JM, et al. TNNI3K mutation in familial syndrome of conduction system disease, atrial tachyarrhythmia and dilated cardiomyopathy. Hum Mol Genet. 2014;23:5793–804. pmid:24925317

[ref96] 96. Zoledziewska M, Costa G, Pitzalis M, Cocco E, Melis C, Moi L, et al. Variation within the CLEC16A gene shows consistent disease association with both multiple sclerosis and type 1 diabetes in Sardinia. Genes Immun. 2009;10:15–7. pmid:18946483

[ref97] 97. Fox CS, Heard-Costa NL, Wilson PW, Levy D, D’Agostino RB, Atwood LD. Genome-wide linkage to chromosome 6 for waist circumference in the Framingham Heart Study. Diabetes. 2004;53:1399–402. pmid:15111512

[ref98] 98. Lee KW, Abrahamowicz M, Leonard GT, Richer L, Perron M, Veillette S, et al. Prenatal exposure to cigarette smoke interacts with OPRM1 to modulate dietary preference for fat. J Psychiatry Neurosci. 2015;40:38–45. pmid:25266401

[ref99] 99. Decramer M, Janssens W, Miravitlles M. Chronic obstructive pulmonary disease. Lancet. 2012;379:1341–51. pmid:22314182

[ref100] 100. Currie GP, Butler CA, Anderson WJ, Skinner C. Phosphodiesterase 4 inhibitors in chronic obstructive pulmonary disease: a new approach to oral treatment. Br J Clin Pharmacol. 2008;65:803–10. pmid:18341675

[ref101] 101. Giembycz MA. Phosphodiesterase-4: selective and dual-specificity inhibitors for the therapy of chronic obstructive pulmonary disease. Proc Am Thorac Soc. 2005;2:326–33. pmid:16267357

[ref102] 102. Giembycz MA. Cilomilast: a second generation phosphodiesterase 4 inhibitor for asthma and chronic obstructive pulmonary disease. Expert Opin Investig Drugs. 2001;10:1361–79. pmid:11772257

[ref103] 103. Li QS, Cheng P, Favis R, Wickenden A, Romano G, Wang H. SCN9A Variants may be Implicated in Neuropathic Pain Associated with Diabetic Peripheral Neuropathy and Pain Severity. Clin J Pain. 2015;

[ref104] 104. Huang Y, Zang Y, Zhou L, Gui W, Liu X, Zhong Y. The role of TNF-alpha/NF-kappa B pathway on the up-regulation of voltage-gated sodium channel Nav1.7 in DRG neurons of rats with diabetic neuropathy. Neurochem Int. 2014;75:112–9. pmid:24893330

[ref105] 105. Liu DJ, Leal SM. A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions. PLoS Genet. 2011;6:e1001156.

[ref106] 106. Schifano ED, Epstein MP, Bielak LF, Jhun MA, Kardia SL, Peyser PA, et al. SNP Set Association Analysis for Familial Data. Genet Epidemiol. 2012;36:797–810. pmid:22968922

[ref107] 107. Oualkacha K, Dastani Z, Li R, Cingolani PE, Spector TD, Hammond CJ, et al. Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness. Genet Epidemiol. 2013;37:366–376. pmid:23529756

[ref108] 108. Efron B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J American Statistical Association. 2004;99:96–104.

[ref109] 109. Long N, Dickson SP, Maia JM, Kim HS, Zhu Q, Allen AS. Leveraging prior information to detect causal variants via multi-variant regression. PLoS Comput Biol. 2013;9(6):e1003093. pmid:23762022

[ref110] 110. Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 2014;10(12):e1004729. pmid:25502226

[ref111] 111. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet. 2002;11:2417–2423. pmid:12351577

[ref112] 112. Logan BR, Geliazkova MP, Rowe DB. An evaluation of spatial thresholding techniques in fMRI analysis. Hum Brain Mapp. 2008;29:1379–1389. pmid:18064589

[ref113] 113. Fan J, Han X, Gu W. Control of the false discovery rate under arbitrary covariance dependence. J American Statistical Association. 2012;107:1019–1045.

[ref114] 114. Friguet C, Kloareg M, Causeur D. A Factor Model Approach to Multiple Testing Under Dependence. J American Statistical Association. 2009;104:1406–15.

[ref115] 115. Genovese C, Wasserman L. Operating characteristics and extensions of the false discovery rate. J Royal Stat Soc B. 2002;64:499–517.

[ref116] 116. Sarkar SK. FDR-controlling stepwise procedure and their false negatives rates. J Statistical Planning and Inference. 2004;125:119–137.

[ref117] 117. Strimmer K. A unified approach to false discovery rate estimation. BMC Bioinformatics. 2008;9. pmid:18613966

[ref118] 118. Storey JD, Taylor JE, Siegmund D. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J Royal Stat Soc B. 2004;66:187–205.

[ref119] 119. Sarkar SK. False discovery and false nondiscovery rates in single-step multiple testing procedures. Ann Statist. 2006;34:394–415.

[ref120] 120. Cai T, Jin J, Low M. Estimation and Confidence Sets For Sparse Normal Mixtures. Ann Statist. 2007;35:2421–2449.
