Research Article

Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration

  • Erik Corona,

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America, Lucile Packard Children's Hospital, Palo Alto, California, United States of America

  • Rong Chen,

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Lucile Packard Children's Hospital, Palo Alto, California, United States of America

  • Martin Sikora,

    Affiliation: Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America

  • Alexander A. Morgan,

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America

  • Chirag J. Patel,

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America, Lucile Packard Children's Hospital, Palo Alto, California, United States of America

  • Aditya Ramesh,

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Lucile Packard Children's Hospital, Palo Alto, California, United States of America

  • Carlos D. Bustamante,

    Affiliation: Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America

  • Atul J. Butte

    abutte@stanford.edu

    Affiliations: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America, Lucile Packard Children's Hospital, Palo Alto, California, United States of America

  • Published: May 23, 2013
  • DOI: 10.1371/journal.pgen.1003447

Reader Comments (1)


What is the Relevance of Genetic Speculation that Contradicts Observed Reality?

Posted by jkaufman1 on 28 May 2013 at 19:29 GMT

Things as they are
Are changed upon the blue guitar.
----- Wallace Stevens

We agree that an effort to combine information about evolution, migration and genetic risk of common disease is an important area of future research for genomics. However, as epidemiologists we find this paper by Corona et al. speculative to the degree that the main conclusions cannot be accepted. Although we are interested in the contribution that genomics can make to understanding disease causation, we have a more general concern that genomic studies of human health have too often lacked sufficient grounding in epidemiology.

The principal conclusion of this study is that a correlation exists between known genetic risk of T2D and geographic patterns of migration out of Africa. As the authors note, the primary objection to this conclusion is that it is in total contradiction to the observed risk. The authors provide a range of speculative explanations as to why this might be the case, but no data to support any of these hypotheses. If in fact genetic risk is too small to have an impact on disease expression, why should we be interested in it? More to the point, if it is too small to have any impact on observed disease rates, how would we ever verify if it really exists?

Although it is true that many genetic factors remain to be discovered, the emerging evidence supporting rare variants (which is one of the possibilities raised by the authors) does not look promising, at least in sequencing studies focused on exons. The other speculations are equally unsupported by evidence. We believe that it is most likely true that we have reached the asymptote of effect sizes that can be localized, and that T2D is – as originally hypothesized – polygenic and universally shared among human populations. If in fact these remaining factors include the majority of phenotypic effect, they could just as easily reverse the pattern seen, or at least neutralize it.

Consider the magnitude of the effect seen in epidemiologic studies along the continuum studied by these authors. Surveillance studies in rural Africa document T2D prevalences of less than 1%, although this rate has risen in cities to 5-8%. However, among populations in the Pacific Islands and among some Native Americans, prevalences of T2D are around 50%. The authors do acknowledge the importance of environment, yet if environment is over-determining, why should we care about average population risk? And if genetic effects were large enough to contribute to this gradient, wouldn’t they have been identified? If they are so small as to be unapparent, aren’t they of little biological significance? Of course, the argument can be made that rare effects can illuminate important pathways, but that is not the purpose of this report. The purpose of this report is to rank order population risk, implying large effects that are commonly shared. If they only rank order persons in the same population then the evolution-migration hypothesis becomes moot. Likewise, genetic risk could influence within-population risk, which would be important in its own right, but the authors here refer exclusively to aggregate risk.

In the 1980s the influential diabetes epidemiologist Kelly West published a review of the world’s literature on T2D and potential causal mechanisms. He concluded that variation in prevalence of T2D among the world’s populations could be explained by variation in obesity. That observation has stood the test of time and, as in recently published work on population-wide weight loss and a 50% reduction in T2D in Cuba, has proven to have enormous public health value (1). To make similar contributions, genomics should attempt to be more closely tied to empirical evidence about disease occurrence.

Finally, the need to avoid stigmatizing populations based on genetic risk has been much discussed. It is not difficult to imagine a media announcement based on this publication - “Genetic risk of diabetes found in African populations”. Similar claims were made for intelligence not very long ago. Not all speculation is neutral.

As an artist, Picasso helped us see the world in a new way – as in his haunting painting in blue of a man with a guitar. In our discipline new ways of seeing should reflect deeper understanding of reality, not speculations merely internal to manipulations of a data set.

1. Franco M, Bilal U, Ordunez P, Benet M, Morejon A, Caballero B, Kennelly JF, Cooper RS. Population-wide weight loss and regain in relation to diabetes and cardiovascular disease mortality: Cuba 1980-2010 BMJ 2013;318:1700.

Richard S Cooper, MD
Jay S Kaufman, PhD

No competing interests declared.

RE: What is the Relevance of Genetic Speculation that Contradicts Observed Reality?

ecoronap replied to jkaufman1 on 06 Jul 2013 at 13:22 GMT

The goal of this study was to analyze the genetic basis of human disease in the context of migration and human relationships and test for genetic risk differentiation. We successfully integrated all necessary data and methods in order to perform this analysis and successfully addressed our questions. It appears you are interested in performing a very important, but different analysis by invoking a lack of grounding in epidemiology, which would be gratuitous in the context of the proposed issues/questions.


In a population, genetic risk is assessed by studying the distribution of risk alleles. Some take the estimated effect size into account, which is what we have done in this study. Observed disease rates are expected to differ substantially from genetic risk estimates. This often raises the question of the utility of genetic risk of disease whenever one of the many articles relying on genetic risk estimates is published. I would like to point out that genetic risk does often agree with observed risk in our study, so it is not in “total contradiction”. However, this is a moot point because we should not expect genetic risk to match observed risk. The primary objection you raise to this research is based on a common misconception that observed risk and genetic risk of complex disease are expected to agree or at least not be in full contradiction. While it may appear to make sense that high genetic risk should correlate with high overall risk, if we were to perform a worldwide survey of genetic resistance against malaria, we’d find that the individuals with the lowest genetic risk have the highest overall risk. One may be tempted to dismiss such a survey as lacking utility due to the contradiction of the genetic and observed risk. This would be a mistake. Protective alleles underwent selection as a natural result of the high overall risk. Likewise, high genetic risk could easily correspond to low overall risk. It’s entirely plausible that risk variants are allowed to increase in frequency in a population where the environment is unlikely to lead to the manifestation of the disease. In such a situation, there is no fitness-decreasing penalty for the increase in frequency of risk alleles. We should not expect genetic risk to be in agreement with overall observed risk of complex disease, even in cases where the genetic component overshadows the environmental component of disease.
We have relied on computational methods for testing differentiation of multiple SNPs for decades using Wright’s Fixation Index. We have recently relied on computational methods to assess positive selection within and across populations using the integrated haplotype score (Voight et al.) and XP-EHH (Sabeti et al.). In this study, we have developed a method to detect genetic risk differentiation in the context of a worldwide human phylogeny tree. This does not require methodology grounded in epidemiology. Genetic risk by its definition excludes environmental factors, and we can verify that differentiation exists by studying the distribution of risk alleles as humans spread across the globe, similarly to how we can detect differentiation with Fst, or positive selection with iHS and XP-EHH. Focusing tools of this nature on important traits (e.g. disease) allows for the characterization of the genetic basis of disease.
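To make the two quantities discussed above concrete, the following is a minimal illustrative sketch, not the method used in the study. It assumes biallelic SNPs, equally weighted populations, an additive log-odds model for the effect-size-weighted score, and the classical heterozygosity form of Wright's Fst; the function names and the example frequencies are hypothetical.

```python
import math


def expected_risk_score(risk_allele_freqs, odds_ratios):
    """Population-average additive log-odds score (an assumed, simplified model).

    Each SNP contributes 2 * p * ln(OR): the expected number of risk alleles
    per diploid individual (2p) weighted by the log of the per-allele odds ratio.
    """
    return sum(2.0 * p * math.log(o)
               for p, o in zip(risk_allele_freqs, odds_ratios))


def wright_fst(pop_freqs):
    """Wright's Fst for one biallelic SNP: (H_T - H_S) / H_T.

    pop_freqs holds the risk-allele frequency in each population;
    populations are weighted equally (an assumption of this sketch).
    """
    n = len(pop_freqs)
    p_bar = sum(pop_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)                    # pooled heterozygosity
    h_sub = sum(2.0 * p * (1.0 - p) for p in pop_freqs) / n  # mean within-population
    if h_total == 0.0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return (h_total - h_sub) / h_total


# Hypothetical frequencies of one risk allele in two populations.
print(wright_fst([0.2, 0.8]))
# Hypothetical two-SNP score for a single population.
print(expected_risk_score([0.3, 0.6], [1.2, 1.1]))
```

Neither function uses disease prevalence as an input, which is the sense in which such measures are computed from allele distributions alone.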


The question has been raised regarding why we should care about research revolving around genetic risk if it does not have a readily apparent impact on overall disease risk. Knowledge regarding the non-random distribution of the genetic basis of disease as humans spread across the world leads to many interesting questions. For example, what is responsible for the drop in genetic risk as humans migrated to different locations over time? It is entirely plausible that addressing this question could lead to additional novel insights about complex disease. This research has been carried out to increase our understanding of fundamental principles relating to the genetics of complex disease. It is not intended to yield immediate commercial or health benefits. However, in the long term, such research contributes to and can be the basis of applied research.

It remains to be seen where the missing heritability of the majority of complex diseases can be found. Evolutionary theory suggests much of the missing heritability will be rare. My speculations regarding where they may be located are intended to be speculations and by definition are not supported by evidence. Having evidence would mean that they can be proven and would then cease to be speculations. The missing heritability does exist somewhere and regardless of where it is found, the relevance of this research would persist. There are likely hundreds of type 2 diabetes loci that are yet to be discovered which may not follow the same general trend, but they will most likely represent a different “type” of SNP, as they would reflect a different effect size and risk allele frequency. There are many potential avenues for closing the missing heritability gap, not just in T2D but across all complex diseases. These potential avenues include CNVs, rare alleles, alleles with a very small effect size, and pervasive epistatic effects. The currently known “high-confidence” variants for all complex diseases have an allele frequency and effect size combination that makes them detectable in the populations in which the GWASs were conducted. In general, SNPs below a certain risk allele frequency/effect size combination are unlikely to be detected in a GWAS. The results in this study reflect the particular category of SNPs that are detected in a GWAS. Despite the low predictive power caused by the missing heritability problem, we detected genetic risk differentiation in multiple diseases. This genetic risk differentiation establishes a non-random perturbation of the genetic risk of multiple diseases. The significance of these perturbations is important to identify and study, even if future work establishes (in the worst case) that they only occurred within SNPs of a particular risk allele frequency and effect size category. If further identified SNPs offset the genetic risk differentiation, the fact that genetic risk differentiation occurred for common SNPs with high effect size would make this study relevant and would encourage conducting further valuable investigations.


As previously established, the genetic risk of complex disease should not be expected to agree with or even correlate with observed risk. This research was inspired by a question: “Has the genetic basis of disease fluctuated more than can be explained by genetic drift as humans migrated across the globe?” This question requires neither empirical evidence about disease occurrence nor other methodology grounded in epidemiology. Inclusion of such data would be gratuitous and unnecessary to address the stated question. The question you wish to be addressed would make a fascinating separate and independent study.

I’m not sure what the research equivalent of Picasso’s “The Old Guitarist” would entail. If the implication is that this work does not represent a deeper understanding of reality, I disagree. If you’d like to explain why, in your opinion, genetic risk is fake in the sense that it fails to represent reality, I’d be happy to explain why it is indeed very real and relevant. Implying that this work is merely the result of “internal manipulations of a data set” ignores all of the work that went into the Human Genome Diversity Project. It took an enormous effort to retrieve genetic samples from diverse populations across the world. What you’re calling “manipulations of a data set” is in reality analysis of this data set. A manipulation implies that the data was altered as opposed to being analyzed. As a researcher, I would imagine that you are aware of the large difference between analyzing and “manipulating” a data set. As someone who is interested in a “deeper reality”, I fully expect you to either retract your accusation that this data set was “manipulated” or to prove that it was manipulated. As great as an artist like Picasso was and as tempted as I am to invoke Van Gogh, we should aim to make this a fact-based discussion rather than resort to empty rhetoric.

I have always found it troubling that many populations are excluded from genetic studies. You have raised a concern that results of this nature may be used to stigmatize populations. While it is always a cause for concern to have one’s research results misinterpreted, I am much more concerned with the failure to include a diverse set of populations in studies related to disease out of such concerns. It is important to include as diverse a set of worldwide populations as possible in GWASs or studies such as this one, even if the results show that a particular population has a higher risk allele frequency. Finally, as a Mexican-American researcher, it is my hope that other researchers will be more inclusive than has been suggested, as I often find it difficult to find data on my own population. I would like to encourage researchers to publish any studies addressing fundamental questions about the genetic basis of complex disease, despite the possibility that they may be misinterpreted.

No competing interests declared.