Genome Wide Association Studies for Milk Production Traits in Chinese Holstein Population

  • Li Jiang, Jianfeng Liu, Dongxiao Sun (contributed equally to this work)

  • Peipei Ma, Xiangdong Ding, Ying Yu

  • Qin Zhang (qzhang@cau.edu.cn)

    Affiliation (all authors): Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, People's Republic of China

Abstract

Genome-wide association studies (GWAS) based on high-throughput SNP genotyping technologies open a broad avenue for exploring genes associated with milk production traits in dairy cattle. To pinpoint novel quantitative trait nucleotides (QTN) across the Bos taurus genome, the present study performed a GWAS to identify genes affecting milk production traits using a current state-of-the-art SNP genotyping technology, the Illumina BovineSNP50 BeadChip. The analyses covered the five most commonly evaluated milk production traits: milk yield (MY), milk fat yield (FY), milk protein yield (PY), milk fat percentage (FP) and milk protein percentage (PP). Estimated breeding values (EBVs) of 2,093 daughters from 14 paternal half-sib families were used as phenotypes within the framework of a daughter design. Association tests between each trait and the 54K SNPs were carried out with two analysis approaches, a paternal transmission disequilibrium test (TDT)-based approach (L1-TDT) and a mixed model based regression analysis (MMRA). In total, 105 SNPs were significantly associated at the genome-wise level with one or more milk production traits. Of the 105 SNPs, 38 were detected by both methods, while four and 63 were detected only by L1-TDT and MMRA, respectively. The majority (86 out of 105) of the significant SNPs are located within reported QTL regions, and some are within or close to reported candidate genes. In particular, two SNPs, ARS-BFGL-NGS-4939 and BFGL-NGS-118998, are located close to the DGAT1 gene (160 bp apart) and within the GHR gene, respectively. Our findings not only provide confirmatory evidence for previous findings but also reveal a suite of novel SNPs associated with milk production traits, and thus form a solid basis for eventually unraveling the causal mutations for milk production traits in dairy cattle.

Introduction

Over the last decades, advances in DNA-based marker technology have made it possible to identify genome regions (namely quantitative trait loci, QTL) underlying complex traits such as milk yield in dairy cattle. Compared with traditional animal breeding programmes that rely solely on phenotype and pedigree information, incorporating detected QTL into genetic evaluation has great potential to enhance selection accuracy and hence expedite the genetic improvement of animal productivity.

In dairy cattle, since the seminal work on QTL mapping by Georges et al. [1], a large number of articles have been published on the detection of QTL for milk production traits. So far, a total of 1,137 QTL for milk production traits have been reported via genome scans based on marker-QTL linkage analyses (http://www.animalgenome.org/QTLdb/cattle.html, May 22, 2010). The limitations of QTL mapping using linkage analysis (LA) and/or linkage disequilibrium (LD) [2] based on panels of low to moderate density markers have been well documented previously [3], [4]. In the past decades, only a few strong candidate genes with potential effects on milk production traits, i.e., the DGAT1 gene [5] and the GHR gene [6], have been identified and/or functionally confirmed from QTL linkage analyses and fine mapping studies.

With the advent of genome-wide panels of single nucleotide polymorphisms (SNPs), SNPs have been widely used for the detection and localization of QTL for complex traits in many species [7], and have proved powerful and useful in the identification of causal mutations associated with economically important traits in livestock [8], [9], [10], [11], [12], [13], [14], [15] as well as human diseases [16], [17], [18], [19]. Most recently, with the maturing of genome sequencing and high-throughput SNP genotyping technologies, genome-wide association studies (GWAS) have become practical for exploring genes associated with complex traits. Compared with the traditional QTL mapping strategy, GWAS offers major advantages both in power to detect causal variants with modest effects and in defining narrower genomic regions harboring causal variants [20]. GWAS has been widely accepted as a primary approach for gene finding and has achieved great success in identifying genes conferring modest disease risks in humans. However, only a few GWAS focusing on identifying genes for milk production traits have been performed [21], [22]. Furthermore, a common limitation of these studies is that low-density SNP markers were employed in the analyses, which decreases the power to capture causal genes.

Motivated by the search for novel causal variants for milk production traits beyond previous findings from traditional linkage studies, the present study performed a GWAS to detect potential causal genetic variants for milk production traits using the Illumina BovineSNP50 BeadChip. The identified SNP loci may serve as a preliminary foundation for further replication studies and for eventually unraveling the causal mutations for milk production traits in dairy cattle.

Materials and Methods

The blood samples were collected along with the regular quarantine inspection of the farms, so no ethical approval was required for this study.

Animal resource

A daughter design was employed in this study. In total, 2,093 daughters and their 14 corresponding sires made up the study population. The number of daughters per sire ranged from 83 to 358, with an average of 150. These daughters were from 15 Holstein cattle farms in Beijing, China, where regular and standard performance testing (dairy herd improvement, DHI) has been conducted since 1999. The official up-to-date estimated breeding values (EBVs) of five milk production traits, namely milk yield (MY), fat yield (FY), protein yield (PY), fat percentage (FP), and protein percentage (PP), were used as phenotypes in this study. These EBVs were obtained based on a multiple trait random regression test-day model [23] using the software RUNGE provided by the Canadian Dairy Network (CDN) (http://www.cdn.ca). The descriptive statistics of these EBVs for the five traits as well as the average reliabilities of EBVs of the 2,093 daughters are presented in Table 1. Notably, the program RUNGE gave two sets of accuracies of EBVs for the five milk production traits: one for milk yield (MY) and the other for the four milk content traits (FY, PY, FP, and PP). This is because the amount of information used for calculating EBVs differed between MY and the four milk content traits, while all four milk content traits were based on the same amount of information.

Table 1. Descriptive statistics of EBVs and the accuracy of five milk production traits for 2,093 daughters.

https://doi.org/10.1371/journal.pone.0013661.t001

Genotyping

DNA was extracted from blood samples of the daughters and semen samples of the sires using routine procedures. DNA was quantified and genotyped using the Illumina BovineSNP50 BeadChip containing 54,001 SNPs, which is a multi-sample genotyping panel powered by Illumina's Infinium® II Assay. Features of the Illumina BovineSNP50 BeadChip have been detailed previously [24]. All samples were genotyped using BEADSTUDIO (Illumina) and a custom cluster file developed from the 2,180 samples.

Genotype quality control

To assess the technical reliability of the genotyping panel, a randomly selected DNA sample was genotyped twice and over 99% identity of called genotypes (two mismatches) was obtained. This demonstrates the technically robust feature of the 50K SNP BeadChip panel employed herein.

The quality control procedure can be largely split into two categories, including individual exclusion and SNP removal, as follows:

Firstly, an individual was excluded from the analyses if it had more than 10% missing genotypes or its SNP genotypes had a Mendelian error rate above 2%. For the second criterion, for each sire-daughter pair, we randomly chose 10,000 genotyped SNP loci at which both the sire and the daughter were homozygous. Because the maternal genotype is unavailable, a Mendelian error is recorded when the two homozygous genotypes differ. Accordingly, if more than 200 out of the 10,000 SNPs showed Mendelian errors, the daughter was removed from the sample.
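The individual-level check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: genotypes are assumed to be coded as allele counts 0, 1, 2 with `None` for a missing call, and the function names are hypothetical.

```python
import random

def mendelian_error_rate(sire, daughter, n_loci=10_000, seed=0):
    """Fraction of sampled loci where sire and daughter are opposite homozygotes.

    Genotypes are assumed to be coded 0, 1, 2 (allele counts); None = missing.
    """
    # Keep only loci at which both animals are homozygous (0 or 2).
    informative = [
        (s, d) for s, d in zip(sire, daughter)
        if s in (0, 2) and d in (0, 2)
    ]
    if not informative:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(informative, min(n_loci, len(informative)))
    # Without the dam's genotype, only opposite homozygotes are detectable errors.
    errors = sum(1 for s, d in sample if s != d)
    return errors / len(sample)

def passes_individual_qc(sire, daughter, max_missing=0.10, max_error=0.02):
    """Apply the two exclusion criteria: >10% missing or >2% Mendelian errors."""
    missing = sum(g is None for g in daughter) / len(daughter)
    return (missing <= max_missing
            and mendelian_error_rate(sire, daughter) <= max_error)
```

With 10,000 sampled loci, the 2% threshold corresponds to the 200-error cutoff stated above.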

Secondly, a SNP was removed if (1) its call rate was less than 90%, or (2) its minor allele frequency (MAF) was less than 3%, or (3) it departed severely from Hardy-Weinberg equilibrium (HWE) with a P value lower than $10^{-6}$, or (4) its minor genotype was carried by fewer than five individuals.
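The four SNP-level criteria could be applied per locus roughly as below. This is a sketch under assumptions: the HWE test is taken to be a 1-df chi-square goodness-of-fit test (the article does not name the test), "minor genotype" is read as the least frequent of the three genotype classes, and the function names are hypothetical.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic locus: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.90, min_maf=0.03,
                  min_hwe_p=1e-6, min_minor_genotype=5):
    """Return True if the SNP survives all four filters."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0 or called / total < min_call_rate:
        return False
    p = (2 * n_aa + n_ab) / (2 * called)
    if min(p, 1.0 - p) < min_maf:
        return False
    if hwe_p_value(n_aa, n_ab, n_bb) < min_hwe_p:
        return False
    # Assumption: "minor genotype" = least frequent of the three genotype classes.
    return min(n_aa, n_ab, n_bb) >= min_minor_genotype
```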

After the quality control procedures, 73 daughters with >10% missing genotypes and 205 daughters with a Mendelian error rate above 2% were excluded, leaving 1,815 daughters for the association analysis. On the SNP side, we removed 1,218 SNPs with <90% genotype call rate, 11,008 SNPs with a MAF <0.03, 482 SNPs with extreme HWE statistics (P<$10^{-6}$), and 1,073 SNPs whose minor genotype was carried by fewer than five individuals. Eventually, 40,220 SNPs (74.5%) passed these quality control filters. The distribution of the remaining SNPs after filtering and the average distances between adjacent SNPs on each chromosome are given in Table 2. In addition, for the L1-TDT analyses, we excluded a further 1,057 SNPs for which all paternal genotypes were homozygous, so that 39,163 SNPs were finally utilized.
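The counts reported above are mutually consistent, which a short tally confirms. The variable names are illustrative; all numbers are taken from the text.

```python
# Individuals: start from 2,093 genotyped daughters.
daughters_genotyped = 2093
removed_missing, removed_mendelian = 73, 205
daughters_kept = daughters_genotyped - removed_missing - removed_mendelian

# SNPs: start from the 54,001 SNPs on the chip.
snps_on_chip = 54001
removed = {
    "call rate < 90%": 1218,
    "MAF < 0.03": 11008,
    "HWE P < 1e-6": 482,
    "minor genotype < 5 individuals": 1073,
}
snps_kept = snps_on_chip - sum(removed.values())
snps_l1tdt = snps_kept - 1057  # sires all homozygous: uninformative for L1-TDT

print(daughters_kept, snps_kept,
      round(100 * snps_kept / snps_on_chip, 1), snps_l1tdt)
# prints: 1815 40220 74.5 39163
```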

Table 2. Distributions of SNPs after quality control and the average distances between adjacent SNPs on each chromosome.

https://doi.org/10.1371/journal.pone.0013661.t002

Statistical analyses

Two methods are adopted to perform GWAS in our studies as follows:

TDT-based single locus regression analyses (L1-TDT).

L1-TDT is a TDT-based association procedure [25] that is specifically suited to situations where only a single parent, rather than both parents, is genotyped for TDT analyses. As only the genotypes of bulls and their daughters are available within the framework of a daughter design, we employed it to explore associations between phenotypes and SNP allele transmissions from bulls to their daughters within sire families. Under these circumstances, a phenotypic observation, i.e., the EBV considered herein, can be modeled by a within-family SNP effect due to transmission disequilibrium of the SNP alleles together with the effect of the sire corresponding to each half-sib family. For each milk production trait, the model is:

$$y_{ij} = \mu + s_i + \beta\,TDS_{ij} + e_{ij} \qquad (1)$$

where $y_{ij}$ is the EBV of the $j$th daughter of sire $i$, $\mu$ is the overall mean, $s_i$ is the fixed effect of sire $i$, $TDS_{ij}$ is an indicator variable with a value of −1, 0 or 1 that indicates the transmission of a specific SNP allele from sire $i$ to his $j$th daughter, determined according to [26], $\beta$ is the regression coefficient (or the substitution effect of the SNP), and $e_{ij}$ is the residual error. For each SNP, $\beta$ is estimated via a weighted least squares analysis with the weights equal to $1/REL_{ij}$, where $REL_{ij}$ is the reliability of the EBV of daughter $j$ in family $i$. The association between the SNP and the trait is tested via the F-test.
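A weighted least squares fit of model (1) for one SNP can be sketched as below. This is not the authors' Fortran 95 program; it is a minimal illustration that absorbs the mean and sire effects by weighted within-family centring, takes the transmission indicator as given (its derivation follows [26] and is not reproduced here), and leaves the choice of weight to the caller (the text states $1/REL_{ij}$). The function name and record layout are assumptions.

```python
from collections import defaultdict

def l1_tdt_f_test(records):
    """Weighted within-sire regression of EBV on the transmission indicator.

    records: iterable of (sire_id, ebv, tds, weight) tuples.
    Returns (beta_hat, f_statistic, df_residual) for H0: beta = 0.
    """
    records = list(records)
    # Per-sire weighted sums: [sum w, sum w*y, sum w*x].
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for sire, y, x, w in records:
        acc = sums[sire]
        acc[0] += w
        acc[1] += w * y
        acc[2] += w * x
    sxx = sxy = syy = 0.0
    for sire, y, x, w in records:
        sw, swy, swx = sums[sire]
        yc = y - swy / sw  # centre on the weighted family mean (absorbs mu + s_i)
        xc = x - swx / sw
        sxx += w * xc * xc
        sxy += w * xc * yc
        syy += w * yc * yc
    if sxx == 0.0:
        raise ValueError("TDS does not vary within any sire family")
    beta = sxy / sxx
    sse = syy - beta * sxy
    # One mean per sire family plus the slope are estimated.
    df_resid = len(records) - len(sums) - 1
    f_stat = (beta * sxy) / (sse / df_resid)  # numerator has 1 df
    return beta, f_stat, df_resid
```

The returned statistic would be compared with an F distribution with 1 and `df_resid` degrees of freedom; the P value computation is omitted to keep the sketch dependency-free.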

Mixed model based single locus regression analyses (MMRA).

Similar to the studies of [21] and [22], we performed an association test for each SNP via regression analysis based on the following linear mixed model:

$$\mathbf{y} = \mathbf{1}\mu + b\,\mathbf{x} + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad (2)$$

where $\mathbf{y}$ is the vector of EBVs of all daughters, $\mu$ is the overall mean, $b$ is the regression coefficient of EBV on SNP genotypes, $\mathbf{x}$ is the vector of SNP genotype indicators, which take the values 0, 1 or 2 corresponding to the three genotypes 11, 12 and 22 (assuming 2 is the minor allele), $\mathbf{a}$ is the vector of residual polygenic effects with $\mathbf{a} \sim N(\mathbf{0}, \mathbf{A}\sigma_a^2)$ (where $\mathbf{A}$ is the additive genetic relationship matrix and $\sigma_a^2$ is the additive variance), $\mathbf{Z}$ is the incidence matrix relating the EBVs to $\mathbf{a}$, and $\mathbf{e}$ is the vector of residual errors with $\mathbf{e} \sim N(\mathbf{0}, \mathbf{W}\sigma_e^2)$ (where $\mathbf{W}$ is a diagonal matrix with diagonal elements equal to $1/REL_{ij}$ and $\sigma_e^2$ is the residual error variance). For each SNP, the estimate of $b$ and the corresponding sampling variance can be obtained via mixed model equations (MME), and a Wald chi-squared statistic with df = 1 is constructed to examine whether the SNP is associated with the trait.
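The last step of the MMRA test, going from an estimate of b and its sampling variance to a Wald statistic, can be sketched as follows. Solving the mixed model equations themselves is not shown; the function names and the two-character genotype string format are assumptions made for illustration.

```python
import math

def genotype_indicator(genotype, minor_allele):
    """Code a genotype such as 'AG' as the number of copies of the minor allele."""
    return sum(1 for allele in genotype if allele == minor_allele)

def wald_test(b_hat, var_b):
    """Wald chi-square statistic (1 df) and its P value for H0: b = 0."""
    chi2 = b_hat * b_hat / var_b
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For instance, an estimate 1.96 standard errors away from zero gives a P value of about 0.05, as expected for a two-sided normal test.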

We employed Fortran 95 to code the computing programs for L1-TDT and MMRA and they are available upon request.

Statistical Inference

For both analyses, the Bonferroni method was adopted to adjust for multiple testing across the SNP loci tested. We declared a SNP significant at the genome-wise level if its raw P value was <0.05/N, where N is the number of SNP loci tested in the analysis.
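As a worked example, the thresholds follow directly from the SNP counts given in the quality control section (39,163 SNPs for L1-TDT and 40,220 for MMRA) and match the significance lines quoted in the Figure 1 legend. The helper name is illustrative.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold for a family-wise error rate of alpha."""
    return alpha / n_tests

for method, n_snps in (("L1-TDT", 39163), ("MMRA", 40220)):
    print(f"{method}: {bonferroni_threshold(n_snps):.2e}")
# L1-TDT: 1.28e-06
# MMRA: 1.24e-06
```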

Population stratification assessment

Confounding due to population stratification is considered a major threat to the validity of genetic association studies [27]. To check whether population stratification exists in our experimental population, we examined the distribution of the test statistics obtained from the association tests and assessed their deviation from the distribution expected when no SNP is associated with the trait of interest, using a quantile-quantile (Q-Q) plot, a routine and frequently used tool for scrutinizing population stratification in GWAS. Since only the MMRA method is not immune to potential population stratification, Q-Q plots of the MMRA test statistics were constructed for the five traits.
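The coordinates of such a Q-Q plot can be computed as sketched below, assuming the test statistics follow a chi-square distribution with 1 df under the null hypothesis, as the Wald statistic of MMRA does. The plotting positions (i + 0.5)/n are one common convention and an assumption here; the function names are hypothetical.

```python
from statistics import NormalDist

def chi2_1df_quantile(q):
    """Quantile of a chi-square (1 df), obtained as a squared normal quantile."""
    z = NormalDist().inv_cdf((1.0 + q) / 2.0)
    return z * z

def qq_points(observed_chi2):
    """Pair each sorted observed statistic with its expected null quantile."""
    obs = sorted(observed_chi2)
    n = len(obs)
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, obs))
```

Points that track the diagonal indicate no systematic inflation; stratification would typically lift the whole cloud above the line rather than only its upper tail.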

Results

Significant SNPs

The profiles of P values (in terms of −log10(p)) of all tested SNPs for the five investigated traits are shown in Fig. 1. The numbers of genome-wise significant SNPs detected by L1-TDT or MMRA for the five traits are presented in Table 3. In total, the numbers of significant SNPs detected by either L1-TDT or MMRA for MY, FY, PY, FP and PP are 20, 9, 21, 65 and 28, respectively. Since some of these SNPs are associated with more than one trait, the total number of distinct identified SNPs is 105. Of these 105 SNPs, 38 were detected by both methods, while four and 63 were detected only by L1-TDT and MMRA, respectively. In other words, with the exception of only four SNPs, all SNPs detected by L1-TDT were also detected by MMRA. The details of these significant SNPs for the five traits, including their positions in the genome, the nearest known genes and the raw P values, are given in Tables 4 through 8, respectively, and are further described as follows.
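The bookkeeping behind these counts can be checked with a few lines of arithmetic. The per-method totals printed at the end are derived here from the reported overlap and are not stated explicitly in the text.

```python
per_trait = {"MY": 20, "FY": 9, "PY": 21, "FP": 65, "PP": 28}
both, only_l1tdt, only_mmra = 38, 4, 63

trait_level_hits = sum(per_trait.values())   # SNP-trait associations
distinct_snps = both + only_l1tdt + only_mmra
detected_by_l1tdt = both + only_l1tdt        # derived, not reported directly
detected_by_mmra = both + only_mmra          # derived, not reported directly

print(trait_level_hits, distinct_snps, detected_by_l1tdt, detected_by_mmra)
# prints: 143 105 42 101
```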

Figure 1. Genome-wide plots of −log10(p-values) for association of SNP loci with five milk production traits in sequential order.

Chromosomes 1–29 and X are shown separated by color. Fig. 1-a1, 1-a2, 1-a3, 1-a4 and 1-a5 refer to plots generated by L1-TDT for MY, FY, PY, FP and PP, respectively. Fig. 1-b1, 1-b2, 1-b3, 1-b4 and 1-b5 refer to plots generated by MMRA for MY, FY, PY, FP and PP, respectively. The corresponding horizontal lines indicate the genome-wise significance levels ($-\log_{10}(1.28\times10^{-6})$ for L1-TDT and $-\log_{10}(1.24\times10^{-6})$ for MMRA).

https://doi.org/10.1371/journal.pone.0013661.g001

Table 3. Numbers of significant SNPs detected by L1-TDT and MMRA.

https://doi.org/10.1371/journal.pone.0013661.t003

Table 4. Genome-wise significant (p<0.05) SNPs for milk yield (MY).

https://doi.org/10.1371/journal.pone.0013661.t004

Table 5. Genome-wise significant (p<0.05) SNPs for fat yield (FY).

https://doi.org/10.1371/journal.pone.0013661.t005

Table 6. Genome-wise significant (p<0.05) SNPs for protein yield (PY).

https://doi.org/10.1371/journal.pone.0013661.t006

Table 7. Genome-wise significant (p<0.05) SNPs for fat percentage (FP).

https://doi.org/10.1371/journal.pone.0013661.t007

Table 8. Genome-wise significant (p<0.05) SNPs for protein percentage (PP).

https://doi.org/10.1371/journal.pone.0013661.t008

Milk Yield (MY).

As seen from Table 4, 14 out of 20 SNPs are located within a 3.63 Mb segment (between 0.07 and 3.7 Mb) on BTA14. Ten of them fall into regions that have been reported to harbor QTL for MY [5], [21], [28], [29], [30], [31], [32]. Furthermore, 6 of these SNPs lie within known genes, and the others are located 87 to 84,554 bp away from the nearest known genes.

Fat Yield (FY).

As presented in Table 5, 6 out of 9 SNPs are clustered within a 0.55 Mb segment (between 0.05 and 0.6 Mb) on BTA14. Eight of them fall in regions that have previously been reported to harbor QTL for FY [5], [28], [31], [33], [34], [35]. Furthermore, two of these SNPs lie within known genes, and the others are located 160 to 285,289 bp away from the nearest known genes.

Protein Yield (PY).

As shown in Table 6, 7 of the 21 SNPs are located within a 3.33 Mb segment (between 0.07 and 3.4 Mb) on BTA14. Further, 14 of these SNPs are within the QTL regions for PY reported in previous studies [5], [21], [28], [33], [36], [37], [38], [39]; 5 of them lie within known genes, and the others are located 87 to 385,764 bp away from the nearest known genes.

Fat Percentage (FP).

As shown in Table 7, 60 of the 65 SNPs are located within a 6.2 Mb segment (between 0.05 and 6.25 Mb) on BTA14. Fifty-three of them are located within the QTL regions for FP reported in previous studies [5], [34], [39], [40], [41], [42], [43], [44]. Further, 27 of the 65 detected SNPs lie within known genes, and the others are 71 to 560,215 bp away from the nearest known genes.

Protein Percentage (PP).

As shown in Table 8, of the 28 identified SNPs, 4, 7, and 14 SNPs are located within an 8.0 Mb segment (between 33.9 and 41.9 Mb) on BTA6, a 2.59 Mb segment (between 0.23 and 2.82 Mb) on BTA14, and a 7.9 Mb segment (between 34.0 and 41.9 Mb) on BTA20, respectively. Among these 28 SNPs, 17 are located within the QTL regions for PP identified in previous studies [5], [28], [29], [34], [42], [45]. Further, 11 of these 28 SNPs lie within known genes, and the others are 160 to 401,634 bp away from the nearest known genes.

Population stratification assessment

The Q-Q plots for the test statistics of MMRA are shown in Fig. 2-1 to 2-5. From these plots, it is apparent that the distributions of the χ² statistics generated from the association tests across the SNPs tested show no evidence of overall systematic bias. That is, the observed χ² statistics exceed the expected χ² statistics mainly for the significant SNPs, which are largely at the adjusted genome-wide significance level. The profiles of the Q-Q plots indicate that the significant SNPs identified by MMRA are unlikely to be threatened by potential population stratification.

Figure 2. Quantile-quantile (Q-Q) plots of genome-wide association results by MMRA for five milk production traits.

Under the null hypothesis of no association at any SNP locus, the points would be expected to follow the slope lines. Deviations from the slope lines correspond to loci that deviate from the null hypotheses.

https://doi.org/10.1371/journal.pone.0013661.g002

Discussion

In this study, we performed a GWA study for five milk production traits using a daughter design in a Chinese Holstein population. To our knowledge, this is one of the first GWA studies for milk production traits using the Illumina BovineSNP50 BeadChip. Two statistical methods, L1-TDT and MMRA, were implemented to analyze the association between SNPs and phenotypes. These two methods belong to two distinct analytical approaches, i.e., family-based (L1-TDT) and population-based (MMRA) approaches, respectively, both of which have been widely employed in GWAS. Comparisons between the two approaches have been conducted by many investigators [27], [46], [47], [48]. The consensus with respect to their performance is twofold. On the one hand, population-based analyses largely outperform family-based analyses in statistical power and efficiency. The power limitation of family-based analyses results from “overmatching” on genotype [49]. The much smaller number of significant SNPs detected by L1-TDT than by MMRA in this study is consistent with this in practice. On the other hand, family-based analysis guards against population admixture/stratification caused by recent migration and/or non-random mating and does not give spurious significant results, although at the expense of some loss of power [50]. The Q-Q plots for the test statistics of MMRA (Fig. 2-1 to 2-5) show no evidence of population admixture/stratification in our population. Therefore, it is safe to declare that the SNPs detected by MMRA as well as by L1-TDT have convincing associations with the traits of interest.

BTA14 has received wide attention from many investigators. Apart from the large number of QTL reported on BTA14 [34], [39], [51], [52], the well-known DGAT1 gene [5], located at ∼0.44 Mb, is generally accepted as a major gene affecting milk production traits. Bennewitz et al. [28] revisited the QTL on BTA14 and concluded that there should exist a further conditional QTL in linkage with the DGAT1 gene, and that possible epistatic effects arising from them may be an additional source of genetic variation for milk production traits. Indeed, Kaupe et al. [53] recently reported that the CYP11B1 gene, located at ∼1.33 Mb, has significant effects on MY, PY, FP and PP, and that the allele substitution effects of CYP11B1 and DGAT1 together explained more variation in milk production traits than DGAT1 alone. In our study, an apparent feature of the findings is that a large proportion of the significant SNPs (61 out of 105) are located on BTA14. Of these 61 SNPs, 59 are located within the reported QTL regions. In particular, all segments on BTA14 that harbor multiple SNPs for the five traits also harbor the DGAT1 gene, and the four segments for MY, PY, FP and PP also harbor the CYP11B1 gene. Within these segments, 13 SNPs are located very close (within 1 Mb) to the DGAT1 gene, with the closest one (ARS-BFGL-NGS-4939) only 160 bp away from it, and 14 SNPs are very close to the CYP11B1 gene, with the closest one (Hapmap25486-BTC-072553) only 8,693 bp away from it.

In addition to the SNPs on BTA14, most (27 out of 44) of the significant SNPs on other chromosomes are also located within the reported QTL regions. Further, some SNPs are also within or close (within 1Mb) to the reported candidate genes (for a summary of cattle candidate genes for milk production traits, see [54]). In particular, a SNP (BFGL-NGS-118998) located at 34,036,832 bp on BTA20 was found to fall within the GHR gene, which is also generally accepted as a functional causal gene affecting milk yield and components [5], [6]. The other SNPs include the SNPs BTA-121739-no-rs and Hapmap24324-BTC-062449 on BTA6, which are 20,591bp and 450,868bp away from the ABCG2 gene [55], respectively, and the SNP ARS-BFGL-NGS-26919 on BTA11, which is 41,562bp away from the LGB gene [56].
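The notions of a SNP lying "within" a gene or "close" to it (within 1 Mb) reduce to a simple interval distance, sketched below. The function names are hypothetical, and any gene coordinates passed in would have to come from a genome annotation, which is not given in the text.

```python
def distance_to_gene(snp_pos, gene_start, gene_end):
    """Distance in bp from a SNP to a gene interval; 0 if the SNP lies inside."""
    if gene_start > gene_end:
        gene_start, gene_end = gene_end, gene_start
    if snp_pos < gene_start:
        return gene_start - snp_pos
    if snp_pos > gene_end:
        return snp_pos - gene_end
    return 0

def is_close(snp_pos, gene_start, gene_end, window_bp=1_000_000):
    """True if the SNP is inside the gene or within window_bp of it."""
    return distance_to_gene(snp_pos, gene_start, gene_end) <= window_bp
```

With made-up coordinates, a SNP at position 460 and a gene spanning 200 to 300 are 160 bp apart, while a SNP at 250 is inside the gene and has distance 0.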

It is notable that, for either L1-TDT or MMRA, some detected SNPs are associated with phenotypic variation in multiple production traits, including the SNPs ARS-BFGL-NGS-4939, ARS-BFGL-NGS-57820, and ARS-BFGL-NGS-107379 on BTA14 (for all of the five traits), the SNPs ARS-BFGL-NGS-94706 and ARS-BFGL-NGS-34135 on BTA14 (for MY, FY, FP, and PP), the SNPs Hapmap30383-BTC-005848, Hapmap30646-BTC-002054, ARS-BFGL-NGS-100480, and UA-IFASA-6329 on BTA14 (for MY, PY, and FP), the SNP UA-IFASA-6878 on BTA14 (for MY, FP, and PP), the SNPs Hapmap52798-ss46526455, BFGL-NGS-110563, Hapmap25486-BTC-072553, and Hapmap30086-BTC-002066 on BTA14 (for MY and FP), the SNPs ARS-BFGL-NGS-91705 on BTA1, Hapmap38643-BTA-95454 on BTA3, BFGL-NGS-110018 on BTA5, and Hapmap50053-BTA-61516 on BTA26 (for MY and PY), the SNP Hapmap30381-BTC-005750 on BTA14 (for FY and FP), and the SNP Hapmap27703-BTC-053907 on BTA14 (for FP and PP). This could be explained by pleiotropic effects of these SNPs on multiple milk production traits, which lead to genetic correlations among the traits; similar results have been reported in prior studies [28], [53].

In this study, we performed GWAS one SNP at a time by regressing the observations of a single trait on either the genotypes of a SNP (MMRA) or the allele transmission patterns of a SNP from bulls to their half-sib offspring (L1-TDT). Previous studies have shown that single marker tests provide similar or greater power than haplotype-based approaches [57], [58]. In contrast to haplotype-based methods, the main advantage of the single locus test is that it requires neither information on SNP positions nor reconstruction of haplotypes across multiple SNP loci. Thus, it is the preferable method for large scale genome-wise association analyses, e.g., GWAS. We also performed GWAS separately for each of the five milk production traits, which is the most conventional strategy in current GWAS. However, the five milk production traits considered here are generally regarded as correlated and thus should share common environmental/genetic factors. A multiple trait analysis, instead of single trait analyses, may be a promising way to take correlations among these traits into consideration. Multivariate analyses have been widely adopted in linkage studies [59], [60], [61], [62], [63], and it has been generally accepted that multivariate analyses outperform univariate analyses in terms of statistical power and precision of parameter estimation [64], [65]. In the next step, an optimal multiple trait analytical strategy will be pursued to further enhance our GWA studies.

In our study, the EBVs of daughters were used as phenotypes for association analysis. Besides EBVs, yield deviations (YD) and de-regressed EBVs of individuals are also commonly used as phenotypic observations in GWAS as well as in LA and LA/LD analyses for milk production traits. Comparisons among these three kinds of phenotypes with respect to their influence on QTL mapping [66] and marker assisted selection studies [67] demonstrate that none of them has an absolute advantage over the others. We also compared EBVs and de-regressed EBVs as phenotypes for our GWAS, and the findings largely overlapped (data not shown). Therefore, only the findings from using EBVs are reported herein.

In all, the present study revealed 105 genome-wise significant SNPs for milk production traits in a Chinese dairy cattle population using two different association analysis approaches (L1-TDT and MMRA). Most of these SNPs (86 out of 105) are located within previously reported QTL regions, and some are within or close to reported candidate genes. The general consistency of the significant SNPs detected herein with the reported QTL and candidate genes, together with the agreement between the results of the two analysis approaches, provides strong support for the outcomes of this study. Our findings lay a preliminary foundation for guiding follow-up replication studies and for eventually revealing the causal mutations underlying milk production traits in dairy cattle.

Acknowledgments

We are grateful to the two reviewers for their insightful comments and constructive suggestions, which greatly improved our manuscript. We thank the Dairy Association of China for supplying the official EBVs and the Beijing Dairy Cattle Center for providing blood and semen samples.

Author Contributions

Conceived and designed the experiments: QZ. Performed the experiments: LJ DS PM. Analyzed the data: JL. Contributed reagents/materials/analysis tools: XD YY. Wrote the paper: LJ JL QZ.

References

  1. Georges M, Nielsen D, Mackinnon M, Mishra A, Okimoto R, et al. (1995) Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139: 907–920.
  2. Meuwissen TH, Goddard ME (2001) Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 33: 605–634.
  3. Goddard ME, Hayes BJ (2009) Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet 10: 381–391.
  4. Andersson L, Georges M (2004) Domestic-animal genomics: deciphering the genetics of complex traits. Nat Rev Genet 5: 202–212.
  5. Grisart B, Farnir F, Karim L, Cambisano N, Kim JJ, et al. (2004) Genetic and functional confirmation of the causality of the DGAT1 K232A quantitative trait nucleotide in affecting milk yield and composition. Proc Natl Acad Sci U S A 101: 2398–2403.
  6. Blott S, Kim JJ, Moisio S, Schmidt-Kuntzel A, Cornet A, et al. (2003) Molecular dissection of a quantitative trait locus: a phenylalanine-to-tyrosine substitution in the transmembrane domain of the bovine growth hormone receptor is associated with a major effect on milk yield and composition. Genetics 163: 253–266.
  7. Daw EW, Heath SC, Lu Y (2005) Single-nucleotide polymorphism versus microsatellite markers in a combined linkage and segregation analysis of a quantitative trait. BMC Genet 6 (Suppl 1): S32.
  8. Georges M (2007) Mapping, fine mapping, and molecular dissection of quantitative trait loci in domestic animals. Annu Rev Genomics Hum Genet 8: 131–162.
  9. Amills M, Vidal O, Varona L, Tomas A, Gil M, et al. (2005) Polymorphism of the pig 2,4-dienoyl CoA reductase 1 gene (DECR1) and its association with carcass and meat quality traits. J Anim Sci 83: 493–498.
  10. Kaminski S, Help H, Brym P, Rusc A, Wojcik E (2008) SNiPORK - a microarray of SNPs in candidate genes potentially associated with pork yield and quality - development and validation in commercial breeds. Anim Biotechnol 19: 43–69.
  11. Brym P, Kaminski S, Wojcik E (2005) Nucleotide sequence polymorphism within exon 4 of the bovine prolactin gene and its associations with milk performance traits. J Appl Genet 46: 179–185.
  12. Haegeman A, Williams JL, Law A, Van Zeveren A, Peelman LJ (2003) Mapping and SNP analysis of bovine candidate genes for meat and carcass quality. Anim Genet 34: 349–353.
  13. Vallet JL, Freking BA, Leymaster KA, Christenson RK (2005) Allelic variation in the erythropoietin receptor gene is associated with uterine capacity and litter size in swine. Anim Genet 36: 97–103.
  14. Vallet JL, Freking BA, Leymaster KA, Christenson RK (2005) Allelic variation in the secreted folate binding protein gene is associated with uterine capacity in swine. J Anim Sci 83: 1860–1867.
  15. Horin P, Osickova J, Necesankova M, Matiasovic J, Musilova P, et al. (2008) Single nucleotide polymorphisms of interleukin-1 beta related genes and their associations with infection in the horse. Dev Biol (Basel) 132: 347–351.
  16. Craig DW, Stephan DA (2005) Applications of whole-genome high-density SNP genotyping. Expert Rev Mol Diagn 5: 159–170.
  17. Coon KD, Myers AJ, Craig DW, Webster JA, Pearson JV, et al. (2007) A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease. J Clin Psychiatry 68: 613–618.
  18. Ng CC, Yew PY, Puah SM, Krishnan G, Yap LF, et al. (2009) A genome-wide association study identifies ITGA9 conferring risk of nasopharyngeal carcinoma. J Hum Genet 54: 392–397.
  19. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316: 1341–1345.
  20. Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6: 95–108.
  21. Daetwyler HD, Schenkel FS, Sargolzaei M, Robinson JA (2008) A genome scan to detect quantitative trait loci for economically important traits in Holstein cattle using two methods and a dense single nucleotide polymorphism map. J Dairy Sci 91: 3225–3236.
  22. Kolbehdari D, Wang Z, Grant JR, Murdoch B, Prasad A, et al. (2009) A whole genome scan to map QTL for milk production traits and somatic cell score in Canadian Holstein bulls. J Anim Breed Genet 126: 216–227.
  23. 23. Schaeffer LR, Jamrozik J, Kistemaker GJ, Van Doormaal BJ (2000) Experience with a test-day model. J Dairy Sci 83: 1135–1144.
  24. 24. Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, et al. (2009) Development and characterization of a high density SNP genotyping assay for cattle. PLoS One 4: e5350.
  25. 25. Kolbehdari D, Jansen GB, Schaeffer LR, Allen BO (2006) Transmission disequilibrium test for quantitative trait loci detection in livestock populations. J Anim Breed Genet 123: 191–197.
  26. 26. Sun FZ, Flanders WD, Yang QH, Zhao HY (2000) Transmission/disequilibrium tests for quantitative traits. Ann Hum Genet 64: 555–565.
  27. 27. Pearson TA, Manolio TA (2008) How to interpret a genome-wide association study. JAMA 299: 1335–1344.
  28. 28. Bennewitz J, Reinsch N, Paul S, Looft C, Kaupe B, et al. (2004) The DGAT1 K232A mutation is not solely responsible for the milk production quantitative trait locus on the bovine chromosome 14. J Dairy Sci 87: 431–442.
  29. 29. Bagnato A, Schiavini F, Rossoni A, Maltecca C, Dolezal M, et al. (2008) Quantitative trait loci affecting milk yield and protein percentage in a three-country Brown Swiss population. J Dairy Sci 91: 767–783.
  30. 30. Harder B, Bennewitz J, Reinsch N, Thaller G, Thomsen H, et al. (2006) Mapping of quantitative trait loci for lactation persistency traits in German Holstein dairy cattle. J Anim Breed Genet 123: 89–96.
  31. 31. Lund MS, Guldbrandtsen B, Buitenhuis AJ, Thomsen B, Bendixen C (2008) Detection of quantitative trait loci in Danish Holstein cattle affecting clinical mastitis, somatic cell score, udder conformation traits, and assessment of associated effects on milk yield. J Dairy Sci 91: 4028–4036.
  32. 32. Wiener P, Maclean I, Williams JL, Woolliams JA (2000) Testing for the presence of previously identified QTL for milk production traits in new populations. Anim Genet 31: 385–395.
  33. 33. Schulman NF, Sahana G, Lund MS, Viitala SM, Vilkki JH (2008) Quantitative trait loci for fertility traits in Finnish Ayrshire cattle. Genet Sel Evol 40: 195–214.
  34. 34. Heyen DW, Weller JI, Ron M, Band M, Beever JE, et al. (1999) A genome scan for QTL influencing milk production and health traits in dairy cattle. Physiol Genomics 1: 165–175.
  35. 35. Awad A, Russ I, Emmerling R, Forster M, Medugorac I (2010) Confirmation and refinement of a QTL on BTA5 affecting milk production traits in the Fleckvieh dual purpose cattle breed. Anim Genet 41: 1–11.
  36. 36. Looft C, Reinsch N, Karall-Albrecht C, Paul S, Brink M, et al. (2001) A mammary gland EST showing linkage disequilibrium to a milk production QTL on bovine Chromosome 14. Mamm Genome 12: 646–650.
  37. 37. Gao H, Fang M, Liu J, Zhang Q (2009) Bayesian shrinkage mapping for multiple QTL in half-sib families. Heredity 103: 368–376.
  38. 38. Lund MS, Sorensen P, Madsen P, Jaffrezic F (2008) Detection and modelling of time-dependent QTL in animal populations. Genet Sel Evol 40: 177–194.
  39. 39. Ashwell MS, Heyen DW, Sonstegard TS, Van Tassell CP, Da Y, et al. (2004) Detection of quantitative trait loci affecting milk production, health, and reproductive traits in Holstein cattle. J Dairy Sci 87: 468–475.
  40. 40. Rodriguez-Zas SL, Southey BR, Heyen DW, Lewin HA (2002) Detection of quantitative trait loci influencing dairy traits using a model for longitudinal data. J Dairy Sci 85: 2681–2691.
  41. 41. Viitala SM, Schulman NF, de Koning DJ, Elo K, Kinos R, et al. (2003) Quantitative trait loci affecting milk production traits in Finnish Ayrshire dairy cattle. J Dairy Sci 86: 1828–1836.
  42. 42. Boichard D, Grohs C, Bourgeois F, Cerqueira F, Faugeras R, et al. (2003) Detection of genes influencing economic traits in three French dairy cattle breeds. Genet Sel Evol 35: 77–101.
  43. 43. Bennewitz J, Reinsch N, Grohs C, Leveziel H, Malafosse A, et al. (2003) Combined analysis of data from two granddaughter designs: A simple strategy for QTL confirmation and increasing experimental power in dairy cattle. Genet Sel Evol 35: 319–338.
  44. 44. Ashwell MS, Van Tassell CP, Sonstegard TS (2001) A genome scan to identify quantitative trait loci affecting economically important traits in a US Holstein population. J Dairy Sci 84: 2535–2542.
  45. 45. Viitala S, Szyda J, Blott S, Schulman N, Lidauer M, et al. (2006) The role of the bovine growth hormone receptor and prolactin receptor genes in milk, fat and protein production in Finnish Ayrshire dairy cattle. Genetics 173: 2151–2164.
  46. 46. Benyamin B, Visscher PM, McRae AF (2009) Family-based genome-wide association studies. Pharmacogenomics 10: 181–190.
  47. 47. Gauderman WJ, Witte JS, Thomas DC (1999) Family-based association studies. J Natl Cancer Inst Monogr 31–37.
  48. 48. Little J, Bradley L, Bray MS, Clyne M, Dorman J, et al. (2002) Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Am J Epidemiol 156: 300–310.
  49. 49. Thomas DC, Haile RW, Duggan D (2005) Recent developments in genomewide association scans: a workshop summary and review. Am J Hum Genet 77: 337–345.
  50. 50. Ewens WJ, Spielman RS (1995) The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 57: 455–464.
  51. 51. Farnir F, Grisart B, Coppieters W, Riquet J, Berzi P, et al. (2002) Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161: 275–287.
  52. 52. Coppieters W, Riquet J, Arranz JJ, Berzi P, Cambisano N, et al. (1998) A QTL with major effect on milk yield and composition maps to bovine chromosome 14. Mamm Genome 9: 540–544.
  53. 53. Kaupe B, Brandt H, Prinzenberg EM, Erhardt G (2007) Joint analysis of the influence of CYP11B1 and DGAT1 genetic variation on milk production, somatic cell score, conformation, reproduction, and productive lifespan in German Holstein cattle. J Anim Sci 85: 11–21.
  54. 54. Ogorevc J, Kunej T, Razpet A, Dovc P (2009) Database of cattle candidate genes and genetic markers for milk production and mastitis. Anim Genet.
  55. 55. Cohen-Zinder M, Seroussi E, Larkin DM, Loor JJ, Everts-van der Wind A, et al. (2005) Identification of a missense mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6 affecting milk yield and composition in Holstein cattle. Genome Res 15: 936–944.
  56. 56. Kuss AW, Gogol J, Geldermann H (2003) Associations of a polymorphic AP-2 binding site in the 5′-flanking region of the bovine beta-lactoglobulin gene with milk proteins. J Dairy Sci 86: 2213–2218.
  57. 57. Zhao HH, Fernando RL, Dekkers JC (2007) Power and precision of alternate methods for linkage disequilibrium mapping of quantitative trait loci. Genetics 175: 1975–1986.
  58. 58. Grapes L, Dekkers JC, Rothschild MF, Fernando RL (2004) Comparing linkage disequilibrium-based methods for fine mapping quantitative trait loci. Genetics 166: 1561–1570.
  59. 59. Williams KJ (2009) Some things just have to be done in vivo: GPIHBP1, caloric delivery, and the generation of remnant lipoproteins. Arterioscler Thromb Vasc Biol 29: 792–795.
  60. 60. Lange C, Whittaker JC (2001) Mapping quantitative trait loci using generalized estimating equations. Genetics 159: 1325–1337.
  61. 61. Huang J, Jiang Y (2003) Genetic linkage analysis of a dichotomous trait incorporating a tightly linked quantitative trait in affected sib pairs. Am J Hum Genet 72: 949–960.
  62. 62. Allison DB, Thiel B, St Jean P, Elston RC, Infante MC, et al. (1998) Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am J Hum Genet 63: 1190–1201.
  63. 63. Liu J, Liu Y, Liu X, Deng HW (2007) Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components. Am J Hum Genet 81: 304–320.
  64. 64. Liu YZ, Pei YF, Liu JF, Yang F, Guo Y, et al. (2009) Powerful bivariate genome-wide association analyses suggest the SOX6 gene influencing both obesity and osteoporosis phenotypes in males. PLoS One 4: e6827.
  65. 65. Liu J, Pei Y, Papasian CJ, Deng HW (2009) Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol 33: 217–227.
  66. 66. Thomsen H, Reinsch N, Xu N, Looft C, Grupe S, et al. (2001) Comparison of estimated breeding values, daughter yield deviations and de-regressed proofs within a whole genome scan for QTL. J Anim Breed Genet 118: 357–370.
  67. 67. Thomsen H (2006) The choice of phenotypes for use of marker assisted selection in dairy cattle. 8th World Congress on Genetics Applied to Livestock Production. Belo Horizonte, MG, Brasil.