Research Article

Longitudinal Genome-Wide Association of Cardiovascular Disease Risk Factors in the Bogalusa Heart Study

  • Erin N. Smith,

    Affiliation: Scripps Genomic Medicine and Scripps Translational Science Institute, La Jolla, California, United States of America

  • Wei Chen,

    Affiliation: Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America

  • Mika Kähönen,

    Affiliation: Department of Clinical Physiology, University Hospital of Tampere and University of Tampere Medical School, Tampere, Finland

  • Johannes Kettunen,

    Affiliations: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom, FIMM, Institute for Molecular Medicine Finland, Helsinki, Finland

  • Terho Lehtimäki,

    Affiliation: Department of Clinical Chemistry, University Hospital of Tampere and University of Tampere Medical School, Tampere, Finland

  • Leena Peltonen,

    Affiliations: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom, FIMM, Institute for Molecular Medicine Finland, Helsinki, Finland, The Broad Institute of MIT and Harvard, Boston, Massachusetts, United States of America

  • Olli T. Raitakari,

    Affiliation: Department of Clinical Physiology, Turku University Hospital and Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland

  • Rany M. Salem,

    Affiliation: The Broad Institute of MIT and Harvard, Boston, Massachusetts, United States of America

  • Nicholas J. Schork,

    Affiliation: Scripps Genomic Medicine and Scripps Translational Science Institute, La Jolla, California, United States of America

  • Marian Shaw,

    Affiliation: Scripps Genomic Medicine and Scripps Translational Science Institute, La Jolla, California, United States of America

  • Sathanur R. Srinivasan,

    Affiliation: Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America

  • Eric J. Topol,

    Affiliation: Scripps Genomic Medicine and Scripps Translational Science Institute, La Jolla, California, United States of America

  • Jorma S. Viikari,

    Affiliation: Department of Medicine, University of Turku and Turku University Hospital, Turku, Finland

  • Gerald S. Berenson,

    Affiliation: Department of Epidemiology, Tulane University, New Orleans, Louisiana, United States of America

  • Sarah S. Murray

    Affiliation: Scripps Genomic Medicine and Scripps Translational Science Institute, La Jolla, California, United States of America

  • Published: September 09, 2010
  • DOI: 10.1371/journal.pgen.1001094

Abstract

Cardiovascular disease (CVD) is the leading cause of death worldwide. Recent genome-wide association (GWA) studies have pinpointed many loci associated with CVD risk factors in adults. It is unclear, however, whether these loci predict trait levels at all ages, whether they are associated with how a trait develops over time, or whether they could be used to screen individuals who are pre-symptomatic to provide the opportunity for preventive measures before disease onset. We completed a genome-wide association study on participants in the longitudinal Bogalusa Heart Study (BHS) and have characterized the association between genetic factors and the development of CVD risk factors from childhood to adulthood. We report 7 genome-wide significant associations involving CVD risk factors, two of which have been previously reported. Top regions were tested for replication in the Young Finns Study (YF) and two associations strongly replicated: rs247616 in CETP with HDL levels (combined P = 9.7×10^−24), and rs445925 at APOE with LDL levels (combined P = 8.7×10^−19). We show that SNPs previously identified in adult cross-sectional studies tend to show age-independent effects in the BHS with effect sizes consistent with previous reports. Previously identified variants were associated with adult trait levels above and beyond those seen in childhood; however, variants with time-dependent effects were also promising predictors. This is the first GWA study to evaluate the role of common genetic variants in the development of CVD risk factors in children as they advance through adulthood and highlights the utility of using longitudinal studies to identify genetic predictors of adult traits in children.

Author Summary

We have studied the association between genetic factors on a whole genome level and cardiovascular disease (CVD) risk factors in a population of individuals studied from childhood through adulthood. The longitudinal study design has enabled the investigation of genetic variation influencing trait values over time. We have identified DNA variants that are associated with CVD trait values consistently over time, and a second set of variants that are associated with CVD trait values in a time-dependent manner. We also show that variants previously identified in adult populations have consistent effects within our population and that these effects are usually similar across childhood through adulthood. The discovery of time-dependent variants that influence CVD trait values over time can potentially be used to screen young individuals who are pre-symptomatic and provide the opportunity for preventive measures decades before disease onset.

Introduction

Cardiovascular disease (CVD) affects over 79 million people in the United States [1], and is the leading cause of death worldwide [2]–[4]. Identifying the genetic determinants of CVD can lead to more effective diagnostics, prognostics, therapeutics, and, ultimately, preventive strategies. The best chance for prevention would be to identify risk at the earliest possible age. Genome-wide association (GWA) leveraging cross-sectional phenotypic data has been a particularly useful approach to identifying loci that influence many of the quantitative risk factors of CVD [5]–[10]; however, the use of cross-sectional data does not provide insight into how such risk factors develop over time. Longitudinal studies, particularly those that begin in childhood, allow for the identification of risk profiles of susceptible individuals before disease onset. The Bogalusa Heart Study (BHS) is a longitudinal study focused on the early natural history of CVD. The BHS began in 1973 and includes up to 9 phenotypic screenings in childhood (4–17 years of age) and up to 10 adult (18–48 years of age) cross-sectional screenings. We have conducted a longitudinal genome-wide association study on a subset of the total sample: unrelated individuals of European ancestry (N = 525) with a large number of measurements (mean number of measurements = 8, range = 4–13).

Results

Longitudinal GWA

We conducted a genome-wide association study of longitudinal measures of 12 traits measured from childhood through adulthood on participants of the BHS of European ancestry: anthropometric (height, weight, and waist circumference), blood pressure (BP) (diastolic and systolic BP), heart rate, blood lipids (low density lipoprotein cholesterol (LDL), high density lipoprotein cholesterol (HDL), total cholesterol (TC), and triglycerides), and metabolic traits (glucose and insulin). Genotyping was performed on the Illumina Human610 and HumanCVD BeadChips [11] for a total of 545,821 SNPs passing QC and allele frequency filters (see Materials and Methods). Imputation was performed using the CEU HapMap 2 as a reference population with the computer program MACH v.1.0.16 [12], providing genotype estimates for an additional 1,622,114 SNPs. For each SNP, we tested whether it had an average linear effect over time (SNP effect), and whether it entered into a time-dependent effect (SNPxAGE interaction effect), such that the genotype is associated with variation in the linear trajectory of the trait from childhood through adulthood. Both SNP and SNPxAGE effects were calculated using linear mixed models as implemented in the R nlme package [13], adjusting for age and gender.
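The distinction between a SNP effect and a SNPxAGE effect can be illustrated with a small sketch. The code below is not the mixed model used in the study; it swaps in a simpler two-stage approximation (one least-squares line per person, then a regression of each person's level and slope on genotype dosage) and omits the sex covariate. The function names, the reference age of 25, and the toy data are all assumptions made for illustration.

```python
# Two-stage approximation of the SNP and SNPxAGE tests (illustrative only;
# the study itself fitted linear mixed models with the R nlme package).

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_effects(records, ref_age=25.0):
    """records: {person_id: (genotype_dosage, [(age, trait), ...])}.

    Stage 1 fits one line per person. Stage 2 regresses the per-person
    level (evaluated at ref_age) and the per-person slope on dosage.
    """
    dosages, levels, slopes = [], [], []
    for dosage, visits in records.values():
        ages = [age for age, _ in visits]
        traits = [trait for _, trait in visits]
        intercept, slope = ols(ages, traits)
        dosages.append(dosage)
        levels.append(intercept + slope * ref_age)
        slopes.append(slope)
    snp_effect = ols(dosages, levels)[1]   # trait units per allele
    snp_by_age = ols(dosages, slopes)[1]   # trait units per year per allele
    return snp_effect, snp_by_age


# Hypothetical noise-free toy data: each allele adds 2 units to the level
# and 0.1 units per year to the slope.
toy = {}
for pid, g in enumerate([0, 1, 2, 0, 1, 2]):
    visits = [(age, 100 + 2 * g + (1.0 + 0.1 * g) * (age - 25))
              for age in (5, 10, 15, 20, 30, 40)]
    toy[pid] = (g, visits)

snp_effect, snp_by_age = two_stage_effects(toy)
```

A mixed model estimates both stages jointly and weights individuals by how much data they contribute, which is why it is preferred when the number and timing of measurements vary between participants, as they do in the BHS.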

Table 1 lists all regions showing SNP effect associations (P<10^−6) and Table 2 lists all regions showing association (P<10^−6) with SNPxAGE effects. We analyzed the regions surrounding the top associations for consistency with recombination hotspots and LD relationships (Figure S1) and provide Manhattan plots of each trait association (Figure S2). From both sets of analyses, there were 5 novel associations with a P-value less than 5×10^−8 and 6 novel regions where there were at least 10 genotyped or imputed SNPs with P<10^−5. The most significant association (rs7890572, P = 3.8×10^−10) was observed with a linear triglyceride trajectory effect (i.e., SNPxAGE effect) on the X chromosome within the IL1RAPL1 gene and near the gene encoding glycerol kinase (GK), in which mutations have been implicated in pseudo-hypertriglyceridemia, caused by high levels of glycerol creating measurement artifacts in the triglyceride assay [14]. A novel association of potential biological interest involved a SNP effect on insulin levels with variation in the CHN2 locus (rs3793275, P = 5.8×10^−9), a beta-chimerin that has recently been described as part of a fusion gene also containing the insulin receptor that was shown to be responsible for severe insulin deficiency [15]. This SNP is also associated with glucose trajectories in our dataset (SNPxAGE; P = 1.5×10^−7). In the 7q11 region, 25 SNPs are associated (P<10^−5) with diastolic BP (SNP effect; peak SNP rs709595, P = 7.0×10^−7). The calcitonin gene-related peptide receptor (CRCP) is approximately 200 kb from the top SNP, but contains SNPs that are in LD with the top SNP (see Figure S1). The calcitonin gene-related peptide is a vasodilator [16] and its receptor CRCP has been previously implicated in hypertension in a small candidate gene association study of hypertension in Japanese individuals [17].


Table 1. Top SNP effect GWA hits for 12 phenotypes.


Table 2. Top SNPxAGE effect GWA hits for 12 phenotypes.


In addition to novel associations, there were three regions showing SNP associations that have been previously identified in GWA studies: rs853773 [18] near G6PC2 was associated with a glucose SNP effect (P = 7.0×10^−7), rs247616 [5] near CETP was associated with an HDL SNP effect (P = 6.6×10^−7), and the APOE e2 SNP rs7412 [19] was associated with a genome-wide significant LDL SNP effect (P = 1.6×10^−8). A region near APOA5 that had been previously implicated in triglyceride levels showed a significant SNPxAGE effect on triglycerides in our study (rs12280753; P = 1.8×10^−8). Although the nearest gene to rs12280753 is not APOA5, this SNP was also the most strongly associated SNP in previous studies of adult triglyceride levels [5], [10], [20].

Replication in the Young Finns

We pursued replication of these findings in genotyped individuals within the Young Finns (YF) cohort, consisting of 2,442 Finnish individuals tracked from childhood through middle adulthood (ages 3–45) with three measures in young individuals (ages 3–24) and two measures in older individuals (ages 24–45). These individuals have been genotyped on a custom-built Illumina genotyping chip (670K). Using the same analysis methods, we tested whether the top SNP was associated in the YF study (Table 3). Imputed genotype dosages were used when direct genotype data was not available. For the APOE-e2 SNP rs7412, which is not in HapMap or on the 670K chip, we used the SNP with the next strongest association in the BHS (rs445925). There were two SNPs that significantly replicated beyond the multiple testing threshold (P<0.05/51 ≈ 1×10^−3): the rs247616 SNP at CETP (P = 1.7×10^−18), and rs445925 at APOE (P = 4.1×10^−15). There was no trend to replicate the direction of effect between the studies: within the SNP effects, there were 12/21 (57%, chi-square P = 0.51) markers that showed the same direction of effect, while within SNPxAGE effects, there were 14/30 (47%, chi-square P = 0.72). The samples were combined and P-values were calculated for the combined BHS and YF data, using study as a covariate (Table 3). The associations at rs247616 at CETP with HDL-cholesterol (P = 9.7×10^−24) and rs445925 at APOE with LDL-cholesterol (P = 8.7×10^−19) were strongly significant, but no other regions in the combined BHS/YF data reached genome-wide significance of P<5×10^−8.
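The replication threshold and the direction-of-effect tests can be checked with a few lines of arithmetic. The sketch below assumes that the reported chi-square P-values come from a one-degree-of-freedom goodness-of-fit test against a 50% chance of agreement, without continuity correction; the text says only "chi-square", so this is an assumption, and the function names are illustrative.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def sign_concordance_pvalue(same, total):
    """Goodness-of-fit test of `same` concordant markers out of `total`
    against the 50% concordance expected by chance."""
    expected = total / 2.0
    stat = ((same - expected) ** 2
            + ((total - same) - expected) ** 2) / expected
    return chi2_1df_pvalue(stat)


# Bonferroni threshold for the 51 SNP-trait pairs taken to replication.
bonferroni = 0.05 / 51

# Direction-of-effect agreement for SNP effects and SNPxAGE effects.
p_snp = sign_concordance_pvalue(12, 21)
p_snp_age = sign_concordance_pvalue(14, 30)
```

Under this assumption both P-values land close to the reported 0.51 and 0.72, and the threshold evaluates to roughly 9.8×10^−4, which the text rounds to 1×10^−3.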


Table 3. Replication results in the Young Finns.


Prediction of adult values given childhood values

Genetic variants will be most useful for trait prediction when they are associated with a trait above and beyond other known risk factors. In addition, the ability to predict adult trait levels in children, before disease onset, can lead to a disease prevention strategy. In longitudinal studies starting in childhood and going into adulthood, we can ask whether genetic loci are associated with the adult trait level above and beyond the trait level seen in the first measure taken in childhood. To test this hypothesis, we evaluated whether our associated markers were likely to be predictive of adult levels of the traits, after adjustment for trait levels in childhood. To account for variation in data collection, we also included the age at each of these measures as well as gender as covariates in the analysis. Within the BHS, variants that were characterized as SNPxAGE effects were more likely to be predictive of adult values after correcting for childhood values, which is expected since these variants were characterized in BHS initially (Table 4). In the YF study, however, we also saw more SNPxAGE variants associated with adult levels given childhood levels (Table 4). There were 6 variants that were associated with adult levels in the YF study at P<0.05, with 2 corresponding to the genome-wide significant SNP effects and 4 corresponding to BHS SNPxAGE variants. Only the association of rs445925 with LDL-cholesterol was strong enough to withstand correction for multiple testing. Further analysis of this observation is warranted in a larger cohort.


Table 4. Association of GWAS SNPs with adult trait levels after adjusting for childhood levels.


Previously identified markers

We assessed whether associations that have been described in previous adult cross-sectional GWA studies exhibit consistent effects over time and whether the effect sizes observed in children through middle-aged adults are consistent with those previously described. We identified 169 SNP-trait associations (see Materials and Methods) for which we had directly genotyped or imputed genotype data. We first estimated our power to detect each previous association at alpha = 0.05 under a more structured, but similar study design (i.e., 8 equally spaced measurements), given the previously reported effect size and allele frequency. Under this model, we would expect to have detected 40/169 (24%) associations at P<0.05, and we observed a similar number of SNP effects in the BHS data (32/169; 19%). We evaluated the associations across all traits together by comparing how well the previously reported effect size was recapitulated in the BHS GWA (Figure 1A). For consistency across studies and traits, if an effect size was not already expressed in terms of percent standard deviation (%SD), we converted the previously reported effect size into %SD and compared the previous effect size to the SNP effect. The previously reported effect size was a strong predictor of the SNP effect (slope = 0.47, P = 1.2×10^−21), suggesting that SNPs that have been previously identified in adult cross-sectional GWA studies are good predictors of time-averaged effects in the BHS sample.


Figure 1. Effects of previously identified variants in the BHS.

A) Effect sizes of previously identified markers are plotted against observed effects in the BHS. SNP-trait associations are plotted by shape and color to indicate trait. The size of the point indicates the power to detect an association of the magnitude previously described, and whether a point is filled in or not indicates whether the association was significant at P<0.05 in the BHS. Linear regression lines are shown, with the slope and p-value of the association between previously reported effect sizes and the observed effect sizes. B) Individuals were scored based on the effect size of each previously identified marker and are grouped and color-coded based on the decile of their score. Linear lines are linear regression estimates of the average trajectory of each decile group.


We also determined whether the same previously identified SNPs were likely to show effects on a trait over time (SNPxAGE effects). Under a simple model that assumed that all of the effect in adults is due to a locus that has no effect in childhood, we estimated power to detect such an interaction effect in a similarly structured study with 8 repeated measures. Given these assumptions, we would have expected to see 24/169 (14%) SNPxAGE associations. We observed 6/169 (3.6%) SNPs that showed SNPxAGE effects at P<0.05, indicating that effects seen in SNPs described in adult GWA studies are not due primarily to differences in effects over time, although larger studies will be required to definitively characterize this.

Composite scoring

We considered whether a composite genotype score would better predict overall CVD risk factor trajectories or time-dependent effects than any single locus. For each person and each trait, we created a score by summing the expected effect in percent standard deviation of each allele that the person carried. We then determined whether the score was associated with the trait's average value and trajectory by using this score as a predictor for each trait in a linear mixed model, adjusting for age and gender. We assessed the score's average effect across time (score effect) and whether or not there was a time-dependent effect (score*age effect). The traits HDL, LDL, total cholesterol, triglycerides, and height showed strongly significant score effects, while only triglycerides showed a score*age effect (Table 5). Longitudinal data was visualized by color-coding the individuals according to the decile of their overall score and the average linear trend of each group was plotted (LDL, Figure 1B and others in Figure S3). These results indicate that the cumulative effects of SNPs that are identified in large adult cross-sectional studies are generally age-independent effects, with an exception in triglycerides, which was the only trait to show a significant score*age effect. We additionally tested whether previously identified variants were predictive of adult levels after adjusting for childhood levels (Table 6). We saw that 25/169 (14.8%) showed association at P<0.05. These observations in the BHS data suggest that even though results from existing GWA studies demonstrate age-independent effects, they can be predictive of trait values in adults.


Table 5. P-values associated with score and score*age effects (Age and Sex adjusted).


Table 6. P-values for predictive ability of previously identified SNPs, after adjusting for first measure in childhood.


Discussion

We identified seven associations at P<5×10^−8 showing either time-averaged or time-dependent effects on CVD risk factors in the BHS, two of which have been previously characterized. Of all associations with P<10^−6, we were able to strongly replicate the association in the YF with HDL-cholesterol at CETP with a combined P = 9.7×10^−24, and LDL-cholesterol at APOE with a combined P = 8.7×10^−19. Differences that exist between the cohorts, such as birth year (15 year difference), and environmental differences could have influenced replication of the remaining SNPs. Larger discovery studies will also have better resolution and power to accurately estimate longitudinal effect sizes, likely allowing for more robust replication.

We evaluated the longitudinal effects of markers that have been previously identified in adult GWA studies. We found that previously identified markers showed time-averaged effects consistent with their reported effect size. This argues that the linear mixed model is an effective tool for modeling time-averaged effects in a GWA setting and that adult GWA studies may be capturing variation that tends to have consistent effects over time. Using a scoring approach, the overall signal from previously identified markers tended to have strong associations with time-averaged effects, but except in the case of triglycerides, did not show time-dependent effects. Previously identified markers were also likely to be associated with adult trait levels above and beyond childhood levels. Although we primarily describe time-averaged effects for previously identified markers, there may be more subtle time-dependent effects that larger studies will be better able to capture.

It is important to note that although we focused on analysis of linear trends over time, a linear model may not best capture these trends. Other approaches could be explored further, such as non-linear models when there is an a priori expectation of trait trajectory, or model-free approaches. These additional models could identify additional variants that influence trajectories, or yield more precise estimates of effect size.

Longitudinal studies are particularly suited to capturing effects that vary over time. Genetic variation that shows a time-dependent effect may help predict those who will go on to develop disease before they show symptomatic traits. The discovery of variants associated with SNPxAGE interaction effects could thus be used to screen young individuals who are pre-symptomatic and provide the opportunity for preventive measures decades before disease onset. We explored how well the markers that we identified predicted adult traits after correcting for childhood traits and suggest further study of variants with SNPxAGE effects as possibly better predictors of adult trait levels above and beyond childhood levels. These results are consistent with the idea that longitudinal studies may be a useful tool to better capture time-dependent variation that could ultimately be better predictive of future outcomes.

Materials and Methods

Ethics statement

The study was approved by the institutional review board and the ethics committee of each institution. Written informed consent was obtained from each participant in accordance with institutional requirements and the Declaration of Helsinki Principles. All subjects in the BHS gave informed consent at each examination, and for those under 18 years of age, consent of a parent/guardian was obtained. Study protocols were approved by the Institutional Review Board of the Tulane University Health Sciences Center.

The Bogalusa Heart Study (BHS)

Between 1973 and 2008, 9 cross-sectional surveys of children aged 4–17 years and 10 cross-sectional surveys of adults aged 18–48 years (Figure S4), who had been previously examined as children, were conducted for CVD risk factor examinations in Bogalusa, Louisiana. This panel design of repeated cross-sectional examinations has resulted in serial observations from childhood to adulthood. By linking the 19 surveys, 12,163 individuals have been examined, with 37,317 observations. In the ongoing Longitudinal Aging Study funded by NIH and NIA since 2000, there are 1,202 subjects who have been examined 4–14 times from childhood to adulthood and have DNA available for GWA genotyping. Based on the analysis of identity-by-state (IBS) sharing from whole genome genotyping data, we focus on a subset of 525 genotyped individuals who are of European ancestry and unrelated (229 male, 296 female). The average number of measurements per individual is 8 (range 4–13).

The Cardiovascular Risk in Young Finns Study (YF)

The YF cohort is a Finnish longitudinal population study sample on the evolution of cardiovascular risk factors from childhood to adulthood [21]. The first cross-sectional study was conducted in 1980 in five centers and included 3,596 participants in the age groups of 3, 6, 9, 12, 15, and 18, who were randomly chosen from the national population register. After baseline in 1980 these subjects have been re-examined in 1983 and 1986 as young individuals, and in 2001 and 2007 as older individuals. Genotype data for the present analysis (DNA collected in 1980, 2001 and 2007) was available for 2,442 individuals.

In the latest follow-up in 2001, a total of 2,283 participants (of which DNA is available from 2,265 participants) were examined for numerous study variables, including serum lipoproteins, glucose, insulin, obesity indices, blood pressure, life-style factors, smoking status, alcohol use and general health status.

Genotyping & QC

BHS genotyping.

We genotyped 1,202 BHS samples using the Illumina Human610 Genotyping BeadChip [22], and HumanCVD BeadChip [11]. Genotypes were called using a clustering algorithm in Illumina's BeadStudio software. Three samples on the Human610 BeadChip gave poor results (call rates <99%) and were discarded from the study. In addition, 3 samples had a different estimated gender from genotype data versus gender provided with the phenotype data and were also discarded. SNPs with call rates <90% were discarded, and SNPs with call rates between 90–95% or cluster separation score <0.3 were manually inspected and cluster positions were edited if needed. We removed approximately 30,000 SNP loci (4.9%) due to poor performance. The final average sample call rate was 99.95% for the Human610 BeadChip, and 99.32% for the CVD BeadChip. We assessed reproducibility by genotyping 29 samples in duplicate (18 known replicates, 11 blind replicates), and observed >99.99% identical genotype calls on both BeadChips. Finally we observed 99.98% genotype concordance in 12,581 overlapping SNPs between the 610 and CVD BeadChips. A final list of 545,821 unique SNPs passed QC and allele frequency filters.

YF study genotyping.

For replication, we had genome-wide SNP data from a custom Illumina BeadChip containing 670,000 SNPs and CNV probes from 2,442 YF participants (1,123 males, 1,319 females). The custom content on the custom 670K array replaced some poor performing SNPs on the Human610 BeadChip and added more CNV content, and includes 546,677 SNPs passing QC from 594,210 SNPs on the chip. The custom 670K chip shares 562,643 SNPs in common with the Illumina Human610 BeadChip. Genotypes were called using Illumina's clustering algorithm [23]. A total of 2,556 samples were genotyped. After initial clustering, we removed 2 subjects for poor call rates (CR<0.90), and 54 samples failed subsequent QC (i.e., duplicated samples, heterozygosity, low call rate, or custom SNP fingerprint genotype discrepancy). The following filters were applied to the remaining data: MAF 0.01, GENO 0.05, MIND 0.05, and HWE 1×10^−6. Three of 2,500 individuals were removed for low genotyping (MIND>0.05), 11,766 markers were excluded based on HWE test (P≤1×10^−6), 7,746 SNPs failed missingness test (GENO>0.05), 34,596 SNPs failed frequency test (MAF<0.01) and one individual failed gender check. A final list of 546,677 SNPs passed QC and allele frequency filters.
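The per-SNP frequency and missingness filters named above can be sketched as follows. This is a simplified stand-in for the PLINK-style filters rather than a reproduction of them: it covers only the MAF and GENO thresholds, leaves out the HWE test and the per-individual MIND filter, and uses a hypothetical function name and toy genotype vectors.

```python
def snp_qc(genotypes, maf_min=0.01, max_missing=0.05):
    """Apply MAF and missingness filters to one SNP.

    genotypes: list of allele counts (0, 1 or 2), with None for a
    missing call. Returns (passes, maf, missing_rate).
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return False, 0.0, missing_rate
    freq = sum(called) / (2 * len(called))   # frequency of the counted allele
    maf = min(freq, 1 - freq)                # fold to the minor allele
    passes = maf >= maf_min and missing_rate <= max_missing
    return passes, maf, missing_rate


# Hypothetical SNPs: one common and fully called, one too rare,
# one with too many missing calls.
common = [0, 1, 2, 1] * 25
rare = [0] * 99 + [1]
gappy = [1] * 90 + [None] * 10

results = {name: snp_qc(g) for name, g in
           [("common", common), ("rare", rare), ("gappy", gappy)]}
```

Only the first toy SNP passes: the second has a minor allele frequency of 0.005, below the 0.01 cut-off, and the third is missing in 10% of samples, above the 5% limit.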

Assessing cryptic relatedness.

Bogalusa participants with genotype data were filtered for relatedness. Whole-genome genotype data was used to calculate identity-by-descent (PI_HAT) values in PLINK [24]. Individuals were then removed such that no pair of individuals retained a PI_HAT value greater than 0.10. PI_HAT values were consistent with known sibling and half-sibling relationships. The final list consisted of 525 BHS individuals.
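The relatedness filter can be sketched as a pruning step over pairwise PI_HAT values. The text does not say how individuals were chosen for removal, so the greedy rule below (repeatedly drop the person involved in the most over-threshold pairs) is an assumption, as are the function name and the toy values.

```python
def prune_related(individuals, pi_hat, threshold=0.10):
    """Greedily drop individuals until no retained pair exceeds threshold.

    pi_hat: {(id_a, id_b): value} for each pair of individuals.
    Returns the set of retained individual ids.
    """
    keep = set(individuals)
    while True:
        # Count, for each retained person, their over-threshold pairs.
        counts = {}
        for (a, b), value in pi_hat.items():
            if value > threshold and a in keep and b in keep:
                counts[a] = counts.get(a, 0) + 1
                counts[b] = counts.get(b, 0) + 1
        if not counts:
            return keep
        # Remove the most connected person (ties go to the smallest id).
        worst = max(sorted(counts), key=lambda person: counts[person])
        keep.discard(worst)


# Hypothetical pairwise estimates: A is related to both B and C.
pairs = {("A", "B"): 0.50, ("A", "C"): 0.25, ("C", "D"): 0.05}
kept = prune_related(["A", "B", "C", "D"], pairs)
```

Dropping the most connected person first tends to retain more individuals than removing one member of every flagged pair independently.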

In the YF data, 546,770 SNPs and 2,496 individuals were used to generate an identity-by-descent (IBD) matrix file in PLINK [24]. There were 51 pairs of individuals with pi-hat greater than 0.2, which were flagged for possible relatedness. One individual of each pair, the one with greater missingness, was removed. The final list consisted of 2,442 YF subjects.


We imputed genotypes in genotyped BHS individuals for all HapMap (phase II, release 22) SNPs using the program MACH [12]. The best estimate of the quantitative allele dosage was used as the predictor in association tests. The CEU HapMap phased haplotypes were used as a reference (N = 60 unrelated individuals). This resulted in overall allelic error rates of 1.6%. SNPs were filtered for minor allele frequency (<5%) and r^2 with respect to genotyped SNPs (<0.30), resulting in genotype data in a total of 2,173,391 SNPs. Imputation was performed in the YF samples using MACH with the HapMap release 22 CEU haplotypes as reference.

Prediction ability

Previously identified markers were obtained through the NHGRI database [25] (accessed 5/20/09). Marker associations, alleles, and allele frequencies were verified with those reported in the original papers and corrected if required. Markers were used if the alleles at the locus provided unambiguous orientation or if the allele frequencies were different enough between A/T and C/G SNPs to distinguish which allele was the associated allele. We thus excluded any A/T or C/G SNPs with a minor allele frequency >0.4 and required that the allele frequency in the previously reported study be within 10% of the allele frequency in the BHS. We excluded studies of non-European ancestry. One SNP per cytogenetic region was used for each phenotype: the SNP with the smallest previously reported p-value.
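The marker-selection rules above can be written as a single check. The sketch below assumes that "within 10%" means an absolute allele-frequency difference of at most 0.10; the text does not say whether the comparison is absolute or relative. The function name and example values are hypothetical.

```python
# Allele pairs whose strand cannot be resolved from the alleles alone.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def usable_marker(allele1, allele2, maf, freq_reported, freq_bhs,
                  max_ambiguous_maf=0.4, max_freq_diff=0.10):
    """Return True if a previously reported SNP passes both rules.

    Rule 1: strand-ambiguous SNPs (A/T or C/G) are dropped when the
            minor allele frequency is too close to 0.5 to orient them.
    Rule 2: the reported allele frequency must agree with the BHS one.
    """
    ambiguous = frozenset((allele1, allele2)) in AMBIGUOUS
    if ambiguous and maf > max_ambiguous_maf:
        return False
    return abs(freq_reported - freq_bhs) <= max_freq_diff


checks = [
    usable_marker("A", "G", 0.45, 0.45, 0.47),  # unambiguous alleles
    usable_marker("A", "T", 0.45, 0.45, 0.47),  # ambiguous, MAF near 0.5
    usable_marker("A", "T", 0.20, 0.20, 0.22),  # ambiguous, MAF resolves it
    usable_marker("C", "T", 0.30, 0.30, 0.45),  # frequencies disagree
]
```

The MAF cut-off matters because a strand-ambiguous SNP reads the same on both strands, so allele frequency is the only way to tell which allele is the reported one, and that fails as the frequency approaches 0.5.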

Effect size was translated to percent standard deviation. If the effect size was reported in an absolute measure (e.g. cm for height), then the standard deviation from the BHS study was used. Standard deviation was calculated from the standard error of the SNP association reported in the linear mixed model. For glucose, cholesterol, and triglycerides measures, units were converted to mg/dl before converting to %SD.

A risk value was calculated for each individual based on the imputed genotype and previously reported effect size, converted to %SD. The %SD was multiplied by the allelic dosage for each SNP and summed over all the associated SNPs for each phenotype. The resulting risk value was then used as a predictor for the BHS individuals.
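The risk value described above is a dosage-weighted sum, and the decile grouping used for Figure 1B follows from ranking individuals by that sum. The sketch below uses hypothetical SNP identifiers, effect sizes, and dosages, and a simple rank-based decile rule that is an assumption rather than the authors' stated procedure.

```python
def risk_score(dosages, effects_pct_sd):
    """Sum over SNPs of allelic dosage (0-2) times reported effect (%SD)."""
    return sum(dosages[snp] * effect
               for snp, effect in effects_pct_sd.items())


def decile_groups(scores):
    """Map person id -> decile (0 = lowest tenth) by rank of score."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {pid: (rank * 10) // n for rank, pid in enumerate(ranked)}


# Hypothetical effects (%SD per allele) for one trait, and one person's
# dosages; imputed SNPs may carry fractional dosages.
effects = {"snp_a": 5.0, "snp_b": -3.0, "snp_c": 2.0}
person = {"snp_a": 2, "snp_b": 1.4, "snp_c": 0}
score = risk_score(person, effects)

# Hypothetical cohort of 20 people whose scores equal their ids.
deciles = decile_groups({pid: float(pid) for pid in range(20)})
```

Because every effect is expressed in %SD, the contributions of different SNPs are on a common scale and can be added directly within a trait.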

Genome-wide association

GWA was performed using linear mixed model regression with fixed covariates of age and sex, random slope, and random intercept. Genotypes were coded as 0, 1, or 2 when the SNP was genotyped and by dosage (scale 0–2) when imputed. Analysis was performed within the nlme package in R [13]. Covariance structures were determined by testing all spatial covariance structures (exponential, Gaussian, linear, rational quadratic, and spherical) with covariates and a sample of SNPs, and picking the structure that best fit the data as measured by the lowest AIC (Akaike Information Criterion) value. SNP and SNPxAGE interaction effects were estimated separately. Although the default nlme optimizer tended to have difficulty converging, we obtained good results by using the optim optimizer on data where all missing data was removed. The number of SNPs that converged and for which we obtained results is listed in Table S1. Analyses were performed on a compute cluster with 600,000 tests taking ~3 hrs on 64 processors.
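Choosing a covariance structure by AIC reduces to computing AIC = 2k − 2 ln L for each fitted model and taking the minimum. The sketch below shows only that selection step; the log-likelihoods and parameter counts are made-up placeholders, not values from the study, and the model fitting itself is not shown.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_structure(fits):
    """fits: {structure_name: (log_likelihood, n_params)}.
    Returns the name of the structure with the lowest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits of the same model under each spatial structure.
fits = {
    "exponential": (-1520.4, 9),
    "gaussian": (-1523.0, 9),
    "linear": (-1531.2, 9),
    "rational quadratic": (-1519.8, 9),
    "spherical": (-1525.5, 9),
}
chosen = best_structure(fits)
```

When all candidates have the same number of parameters, as in this toy example, ranking by AIC is equivalent to ranking by log-likelihood; the penalty term matters only when structures differ in complexity.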

Filtering for genomic inflation

If genomic inflation factors were inflated or deflated, we reran the GWA using the first four MDS components as covariates. If the inflation factor was still less than 0.90 or greater than 1.05, we removed the analysis. In addition, we excluded the body mass index (BMI) SNP, BMI SNPxAGE, and weight SNP analyses entirely, due to a combination of consistently inflated or deflated genomic inflation factors or a long list of highly associated SNPs.
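The genomic inflation factor is conventionally the median association chi-square statistic divided by the median of a one-degree-of-freedom chi-square distribution (about 0.455). The sketch below assumes that definition, which the text does not spell out, and applies the 0.90 to 1.05 acceptance window stated above. Function names and the toy P-values are illustrative.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda = median chi-square statistic / expected median under the null.

    Each two-sided p-value is converted back to a 1-df chi-square
    statistic. P-values must lie strictly between 0 and 1.
    """
    z = NormalDist()
    chi2 = [z.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF


def keep_analysis(lam, low=0.90, high=1.05):
    """Apply the acceptance window described in the text."""
    return low <= lam <= high


# Hypothetical inputs: evenly spaced p-values mimic a well-calibrated
# scan; squaring them shifts mass toward small p-values (inflation).
null_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p ** 2 for p in null_p]

lam_null = genomic_inflation(null_p)
lam_inflated = genomic_inflation(inflated_p)
```

In practice, extremely small P-values can round `1 - p / 2` to exactly 1.0 in floating point, so a production version would need to compute the statistic from the lower tail instead.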


Power was calculated using G*Power 3 [26]. We used the MANOVA repeated measures module with 8 repeated measures with a correlation of 0.5 between them, similar to the correlations seen in this study. We estimated power for between-factor and between-within interaction effects. Effect size (f) was calculated as
$$f = \sqrt{\frac{R^2}{1 - R^2}}$$
and $R^2$ was calculated from the allele frequencies as reported in the original associations (p and q) and the effect size in terms of %SD [27].
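
The effect-size inputs for the power calculation can be sketched numerically. The example rests on two assumptions that the text does not spell out: the additive-model variance explained is R² = 2pq·a², with a the per-allele effect in SD units, and the standard conversion f = sqrt(R² / (1 − R²)) applies. The allele frequency and effect size are hypothetical.

```python
import math


def variance_explained(p: float, effect_pct_sd: float) -> float:
    """Assumed additive model: R^2 = 2*p*q*a^2, with a in SD units."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie in (0, 1)")
    q = 1.0 - p
    a = effect_pct_sd / 100.0  # convert %SD to SD units
    return 2.0 * p * q * a ** 2


def cohens_f(r2: float) -> float:
    """Standard conversion from variance explained to Cohen's f."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return math.sqrt(r2 / (1.0 - r2))


# Hypothetical SNP: allele frequency 0.3, per-allele effect of 10 %SD.
r2 = variance_explained(0.3, 10.0)
f = cohens_f(r2)
print(f"R^2 = {r2:.4f}, f = {f:.4f}")
```

The value of f would then be entered into G*Power together with the number of repeated measures and their correlation.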

Supporting Information

Figure S1.

Regional plots of top SNP and SNPxAGE associations. Regions are ordered by phenotype and significance as in Tables 1 and 2. SNPs are indicated by a triangle (directly genotyped) or a square (imputed), and colored according to LD (r²) with the top SNP, with increasing shades of red indicating stronger LD. Blue lines indicate recombination hotspots, and RefSeq genes are indicated.


(8.26 MB PDF)

Figure S2.

Manhattan plots of GWAS results for each trait. Manhattan plots are shown for each SNP and SNPxAGE GWAS. Each point corresponds to an association, with triangles indicating directly genotyped data and circles indicating imputed data. A horizontal line is plotted at P = 10⁻⁶ and SNPs above this line are outlined in pink. These SNPs occur in Tables 1, 2, and 3. Chromosomes are plotted in alternating blue and grey. P-values greater than 0.001 are not plotted.


(8.19 MB PDF)

Figure S3.

Longitudinal profiles of cumulative score from previously identified SNPs. Individuals were scored based on the effect size of each previously identified marker as in Figure 1B. Individuals are grouped and color-coded based on the decile of their score. Trend lines were fitted by linear regression using all points from all individuals in a given decile.


(1.26 MB PDF)

Figure S4.

Age at measurement in the BHS. All exam dates that were included in the study are plotted as a function of the age of the participant at the exam date. Individuals had between 4 and 13 measurements. A single individual is highlighted in red.


(0.27 MB PDF)

Table S1.

Number of SNPs that successfully converged and produced association statistics.


(0.06 MB XLSX)


The authors would like to thank the participants of the Bogalusa Heart Study and the Young Finns Study, and their families, for their generous participation in the studies. In addition, the authors would like to thank the additional members of the i3C Consortium for their helpful discussion.

Author Contributions

Conceived and designed the experiments: ENS MK TL LP OTR RMS NJS EJT GSB SSM. Performed the experiments: ENS MK TL MS SSM. Analyzed the data: ENS JK OTR RMS SSM. Contributed reagents/materials/analysis tools: WC MK JK TL LP OTR SRS JSV GSB. Wrote the paper: ENS TL OTR NJS SSM.


  1. (2007) NHLBI morbidity and mortality chartbook. Bethesda, MD: National Heart, Lung, and Blood Institute.
  2. Bonow RO (2002) Primary prevention of cardiovascular disease: a call to action. Circulation 106: 3140–3141.
  3. Mathers CD, Loncar D (2006) Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3: e442.
  4. Murray CJ, Lopez AD (1997) Global mortality, disability, and the contribution of risk factors: Global Burden of Disease Study. Lancet 349: 1436–1442.
  5. Aulchenko YS, Ripatti S, Lindqvist I, Boomsma D, Heid IM, et al. (2009) Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat Genet 41: 47–55.
  6. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, et al. (2009) Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet 41: 56–65.
  7. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, et al. (2009) Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet 41: 666–676.
  8. Levy D, Ehret GB, Rice K, Verwoert GC, Launer LJ, et al. (2009) Genome-wide association study of blood pressure and hypertension. Nat Genet 41: 677–687.
  9. Sabatti C, Service SK, Hartikainen AL, Pouta A, Ripatti S, et al. (2009) Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 41: 35–46.
  10. Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, et al. (2008) Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet 40: 161–169.
  11. Keating BJ, Tischfield S, Murray SS, Bhangale T, Price TS, et al. (2008) Concept, design and implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic association studies. PLoS ONE 3: e3583.
  12. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316: 1341–1345.
  13. Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Core team (2009) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-93.
  14. Walmsley TA, Potter HC, George PM, Florkowski CM (2008) Pseudo-hypertriglyceridaemia: a measurement artefact due to glycerol kinase deficiency. Postgrad Med J 84: 552–554.
  15. Suliman SG, Stanik J, McCulloch LJ, Wilson N, Edghill EL, et al. (2009) Severe insulin resistance and intrauterine growth deficiency associated with haploinsufficiency for INSR and CHN2: new insights into synergistic pathways involved in growth and metabolism. Diabetes 58: 2954–2961.
  16. Brain SD, Williams TJ, Tippins JR, Morris HR, MacIntyre I (1985) Calcitonin gene-related peptide is a potent vasodilator. Nature 313: 54–56.
  17. Sano M, Kuroi N, Nakayama T, Sato N, Izumi Y, et al. (2005) Association study of calcitonin-receptor-like receptor gene in essential hypertension. Am J Hypertens 18: 403–408.
  18. Bouatia-Naji N, Bonnefond A, Cavalcanti-Proenca C, Sparso T, Holmkvist J, et al. (2009) A variant near MTNR1B is associated with increased fasting plasma glucose levels and type 2 diabetes risk. Nat Genet 41: 89–94.
  19. Bennet AM, Di Angelantonio E, Ye Z, Wensley F, Dahlin A, et al. (2007) Association of apolipoprotein E genotypes with lipid levels and coronary risk. JAMA 298: 1300–1311.
  20. Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, et al. (2008) Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet 40: 189–197.
  21. Raitakari OT, Juonala M, Ronnemaa T, Keltikangas-Jarvinen L, Rasanen L, et al. (2008) Cohort profile: the cardiovascular risk in Young Finns Study. Int J Epidemiol 37: 1220–1226.
  22. Eberle MA, Ng PC, Kuhn K, Zhou L, Peiffer DA, et al. (2007) Power to detect risk alleles using genome-wide tag SNP panels. PLoS Genet 3: 1827–1837.
  23. Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, et al. (2007) A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 23: 2741–2746.
  24. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
  25. Hindorff L, Junkins H, Mehta J, Manolio TA. Catalog of Published Genome-Wide Association Studies. Available at: Accessed 5/20/09.
  26. Faul F, Erdfelder E, Lang AG, Buchner A (2007) G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 39: 175–191.
  27. Falconer DS, MacKay TFC (1996) Introduction to Quantitative Genetics: Benjamin Cummings.