Open Access

Research Article

The Genetic Structure of Pacific Islanders

Jonathan S. Friedlaender 1 *, Françoise R. Friedlaender 2 , Floyd A. Reed 3 , Kenneth K. Kidd 4 , Judith R. Kidd 4 , Geoffrey K. Chambers 5 , Rodney A. Lea 5 , Jun-Hun Loo 6 , George Koki 7 , Jason A. Hodgson 8 ¤, D. Andrew Merriwether 8 , James L. Weber 9

1 Anthropology Department, Temple University, Philadelphia, Pennsylvania, United States of America, 2 Independent Researcher, Philadelphia, Pennsylvania, United States of America, 3 Department of Biology, University of Maryland, College Park, Maryland, United States of America, 4 Department of Genetics, Yale University, New Haven, Connecticut, United States of America, 5 School of Biological Sciences, Victoria University, Wellington, New Zealand, 6 Transfusion Medicine Laboratory, Mackay Memorial Hospital, Taipei, Taiwan, 7 Institute for Medical Research, Goroka, Eastern Highlands Province, Papua New Guinea, 8 Department of Anthropology, Binghamton University, Binghamton, New York, United States of America, 9 Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America

Abstract

Human genetic diversity in the Pacific has not been adequately sampled, particularly in Melanesia. As a result, population relationships there have been open to debate. A genome scan of autosomal markers (687 microsatellites and 203 insertions/deletions) on 952 individuals from 41 Pacific populations now provides the basis for understanding the remarkable nature of Melanesian variation, and for a more accurate comparison of these Pacific populations with previously studied groups from other regions. It also shows how textured human population variation can be in particular circumstances. Genetic diversity within individual Pacific populations is shown to be very low, while differentiation among Melanesian groups is high. Melanesian differentiation varies not only between islands, but also by island size and topographical complexity. The greatest distinctions are among the isolated groups in large island interiors, which are also the most internally homogeneous. The pattern loosely tracks language distinctions. Papuan-speaking groups are the most differentiated, and Austronesian or Oceanic-speaking groups, which tend to live along the coastlines, are more intermixed. A small “Austronesian” genetic signature (always <20%) was detected in less than half the Melanesian groups that speak Austronesian languages, and is entirely lacking in Papuan-speaking groups. Although the Polynesians are also distinctive, they tend to cluster with Micronesians, Taiwan Aborigines, and East Asians, and not Melanesians. These findings contribute to a resolution to the debates over Polynesian origins and their past interactions with Melanesians. With regard to genetics, the earlier studies had heavily relied on the evidence from single locus mitochondrial DNA or Y chromosome variation. Neither of these provided an unequivocal signal of phylogenetic relations or population intermixture proportions in the Pacific. 
Our analysis indicates the ancestors of Polynesians moved through Melanesia relatively rapidly and only intermixed to a very modest degree with the indigenous populations there.

Author Summary

The origins and current genetic relationships of Pacific Islanders have been the subjects of interest and controversy for many decades. By analyzing the variation of a large number (687) of genetic markers in almost 1,000 individuals from 41 Pacific populations, and comparing these with East Asians and others, we contribute to the clarification and resolution of many of these issues. To judge by the populations in our survey, we find that Polynesians and Micronesians have almost no genetic relation to Melanesians, but instead are strongly related to East Asians, and particularly Taiwan Aborigines. A minority of Island Melanesian populations have indications of a small shared genetic ancestry with Polynesians and Micronesians (the ones that have this tie all speak related Austronesian languages). Inland groups who speak Papuan languages are particularly divergent and internally homogeneous. The genetic divergence among Island Melanesian populations, which is neatly organized by island, island size/topography, as well as their coastal or inland locations, is remarkable for such a small region, and enlarges our understanding of the texture of contemporary human variation.

Introduction

The populations in New Guinea and the islands immediately to the east (the Bismarck and Solomons archipelagos) are well-known for their great diversity in cultures, languages, and genetics, which by a number of measures is unsurpassed for a region of this size [1]. This area is referred to as Near Oceania, as opposed to the islands farther out in the Pacific, known as Remote Oceania [2] (see Figure 1). For simplicity, we refer only to the peoples of Near Oceania as “Melanesians,” although this term ordinarily encompasses additional groups to the east as far as Fiji, who are not covered in this study. Major parts of Near Oceania were settled from Southeast Asia early in modern human prehistory, between ~50,000 and ~30,000 years before present (YBP) [3–5]. Populations were relatively isolated at this edge of the human species range for the following 25,000 years. The early settlers in Near Oceania were very small groups of hunter-gatherers. For example, New Ireland, which is more than 300 km long, is estimated to have had a pre-Neolithic carrying capacity of ~1,200 people or fewer [6]. There is evidence of sporadic, modest contact between New Guinea and the Bismarcks from 22,000 YBP, and with Bougainville/Buka in the Solomons only from ~3,300 years ago [3,7].

Figure 1. Populations Included in This Study

(A) HGDP-CEPH population locations. The two Pacific groups are boxed.

(B) Pacific population locations. Our population samples are blue; the 2 HGDP-CEPH Melanesian “Oceanic” groups are red.

doi:10.1371/journal.pgen.0040019.g001

By ~3,300 YBP [3], at least one powerful new impulse of influence had come from Austronesian speaking migrants from Island Southeast Asia, likely associated with the development of effective sailing [8], that led to the appearance of the Lapita Cultural Complex in the Bismarck Archipelago. After only a few hundred years, “Lapita People” from this area had colonized the islands in Remote Oceania as far east as Tonga and Samoa, where Polynesian culture then developed [9].

The distribution and relations of Pacific language families reflect ancient settlement. Austronesian is a widespread and clearly defined linguistic family with more than 1,000 member languages, which has its greatest diversity, and likely origin, in Taiwan ~4,000–5,000 years ago [10]. Some basic phylogenetic relations within Austronesian are sketched in Figure S1. All Austronesian languages spoken outside Taiwan belong to the Malayo-Polynesian branch, and almost all the Malayo-Polynesian languages of Oceania belong to the Oceanic branch. It is Proto Oceanic, the immediate ancestor of the Oceanic languages, that is associated with an early phase of the Lapita Cultural Complex. Proto Oceanic split into a number of branches as its descendants spread across Remote Oceania, including Proto Nuclear Micronesian and Proto Polynesian (a branch of Central Oceanic).

Almost all the other indigenous languages of Oceania are referred to as non-Austronesian, or Papuan. Most Papuan languages are found in New Guinea, with the remainder in nearby islands. This is a residual category of ~800 languages. Most of these can be assigned to more than 20 different language families, but these families cannot be shown to be related on present evidence. There remain a number of “Papuan” isolates that cannot be grouped at all [11]. Trans New Guinea is the largest Papuan language family. It consists of ~400 languages and dates to 6,000 to 10,000 YBP [12]. Other Papuan families, including the ones in the Bismarck and Solomon archipelagos, probably also go back at least to this period [13–15]. While it is reasonable to assume these different Papuan families had common origins further back in time, any evidence of such ties that is recoverable with standard methods of historical linguistics has been erased over the millennia. The concentration and number of these apparently unrelated language families and isolates is unsurpassed in any other region of the world [15].

Analyses of genetic variation at some informative loci, particularly the mitochondrial DNA (mtDNA) (reviewed in [16,17–19]), non-recombining Y-chromosome markers (NRY) (reviewed in [19,20]), and a small set of autosomal microsatellites [21] have provided divergent impressions of the population genetic structure of both Near and Remote Oceania. Because they have ¼ the effective sample size of autosomal markers, the mtDNA and NRY haplotypes have been particularly subject to the effects of random genetic drift, and each autosomal marker, no matter how informative, still represents a minute fraction of the total genetic variation among populations. Even so, these data have shown that the genetic variation in Near Oceanic populations is considerably greater than in Remote Oceanic ones, and that there is a cluster of haplogroups that developed in particular islands of Near Oceania between approximately 50,000 and 30,000 years ago.

However, a number of unresolved issues remain concerning the proper interpretation of these and other data that a comprehensive genomic sampling of neutral biparental markers across Pacific populations should clarify. A list of these includes: 1) to whom are these diverse Melanesian populations most closely related outside this region (East or South Asians, or perhaps even Africans, whom they physically resemble)? 2) how does the genetic diversity and differentiation of Near Oceanic populations compare with those in other regions? 3) is there a clear organization of the variation among groups in Near Oceania (i.e., either by language, by island, or distance from major dispersal centers)? 4) is there a genetic signature of Aboriginal Taiwanese/Southeast Asian or Polynesian influence in Melanesian populations, especially in the Bismarcks, where the Lapita Cultural Complex developed? and 5) are Polynesians more closely related to Asian/Aboriginal Taiwanese populations or to Melanesians?

Here we report the analysis of 687 microsatellite and 203 insertion/deletion (indel) polymorphisms in 952 individuals from 41 Pacific populations, primarily in the Bismarck Archipelago and Bougainville Island, and also including select sample sets from New Guinea, Aboriginal Taiwan, Micronesia, and Polynesia. The results show the reduced internal variation of Near Oceanic Melanesian populations and the remarkable divergence among them, and how this divergence is influenced by island size and topography, and is also correlated with language affiliation. We also detected a very small but clear genetic signature of “Asian/Polynesian” intermixture in certain Austronesian (Oceanic)-speaking populations in the region (by “genetic signature,” we mean an ancestral proportion in some groups inferred by the STRUCTURE analysis that predominates in another ancestral grouping). For global context, these data were compared with data from the Centre d'Etude du Polymorphisme Humain human genome diversity panel (HGDP-CEPH), composed of cell lines [22–24], especially its subset from East Asia. Figure 1A shows how undersampled the Pacific populations had been in the HGDP-CEPH dataset (as well as its emphasis on particular regions of Asia), and Figure 1B shows the distribution of our Pacific population samples, with its intensive coverage in Near Oceania.

Results

Our sampling strategy concentrated on Papuan-speaking populations and their immediate Oceanic-speaking neighbors from the islands immediately to the east of New Guinea, in what is called Northern Island Melanesia, consisting of the Bismarck and Solomon Archipelagos (see Figure 1B). The three largest islands of the region were most intensively sampled—New Britain, New Ireland, and Bougainville—along with two nearby smaller islands (New Hanover and Mussau). Additional Pacific samples came from New Guinea (one set from the lowland Sepik region and one set from the Eastern Highlands), Micronesia (primarily from Belau), Polynesia (Samoans and one New Zealand Mãori group), and aboriginal Taiwan (Amis and the Taroko, a mountain Atayal group). The details of the sample locations and language family affiliations are given in Table S1 and in the Methods section.

The Global Context

Figure 2 shows the estimated values of θ (θ̂) calculated from expected heterozygosity (He) arranged from highest to lowest values, combining our Pacific populations and the HGDP-CEPH global set (the values of θ̂, He, and the average number of alleles per locus are given in Table S1). From Ohta and Kimura [25], under a stepwise model, the expected relationship between θ and heterozygosity (H) is

$$H = 1 - \frac{1}{\sqrt{1 + 2\theta}}$$

which rearranges to

$$\hat{\theta} = \frac{1}{2}\left[\frac{1}{(1 - H)^{2}} - 1\right]$$

Figure 2. Population Diversity

Values of θ̂ for the HGDP-CEPH and Pacific datasets, for 687 microsatellites. Populations are ordered by their declining values of θ̂, but systematic regional distinctions are indicated by vertical lines. Conglomerate groups tend to have higher values than nearby populations (Bantu South, Sepik, Highlands, Micronesia, Samoa, and Columbia). Papuan-speaking groups are in bold italics; the Melanesian inland/shore distinction is indicated by the two shades of orange. Abbreviated names are spelled out in Table S1.

doi:10.1371/journal.pgen.0040019.g002

For autosomal loci, θ is defined as θ = 4Neμ, where Ne is defined as the effective population size and μ is the per generation mutation rate. Assuming the mutation rate is constant across populations and that the stepwise mutation model is appropriate, θ̂ provides an estimate that is linearly correlated with effective population size. In contrast, H asymptotically approaches a value of 1 as the effective population size increases. Therefore, the use of θ̂ is more appropriate to represent differences in effective population sizes among populations (e.g., a θ ratio of 2 between two populations indicates twice the effective population size between the populations, while an H ratio of 2 does not).
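
As an illustration of why θ̂ rather than H is the more natural scale for comparing effective population sizes, the following minimal Python sketch converts between the two. It assumes the standard stepwise-model relation H = 1 − 1/√(1 + 2θ); the function names and the heterozygosity values in the loop are hypothetical and are not taken from Table S1.

```python
import math


def expected_heterozygosity(theta: float) -> float:
    """Stepwise-model expectation: H = 1 - 1/sqrt(1 + 2*theta)."""
    if theta < 0.0:
        raise ValueError("theta must be non-negative")
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


def theta_from_heterozygosity(h: float) -> float:
    """Inverse of the relation above: theta = (1/(1 - H)^2 - 1) / 2."""
    if not 0.0 <= h < 1.0:
        raise ValueError("heterozygosity must lie in [0, 1)")
    return (1.0 / (1.0 - h) ** 2 - 1.0) / 2.0


if __name__ == "__main__":
    # Illustrative heterozygosities only (assumed values, not study data).
    for h in (0.60, 0.70, 0.80):
        print(f"H = {h:.2f} -> theta = {theta_from_heterozygosity(h):.3f}")
```

Because H saturates toward 1, equal steps in H correspond to increasingly large steps in θ, which is the reason a ratio of θ values can be read as a ratio of effective population sizes while a ratio of H values cannot.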

The pattern of variation in Figure 2 is consistent with a series of successive founder effects that modern humans underwent in their expansions out of Africa (also shown by [26]). African populations have the highest values, followed in order by Europe/Central Asians, East Asians, Melanesians, and Native Americans. All the Pacific populations ranked together in a narrow band towards the low end of θ̂ values (between 4.8 and 2.9). Within the Melanesian set, inland populations generally had lower values of θ̂ than shore-dwelling groups, as shown. The three non-Pacific groups in the range between 4.8 and 2.9 were the Maya, Columbia, and Lahu. The Maya are known to have some European ancestry, which would explain their relatively high θ̂ for a Native American group; and the Lahu are an Asian population that was subject to particularly strong random genetic drift [24]. Columbia and other conglomerate groups made up of individuals from different populations (e.g., Bantu South, Sepik, Highlands, Micronesia, and Samoa) consistently had higher values of θ̂ than related groups. This combining of groups has caused inflated levels of diversity and effective population size estimates (i.e., there is more variation in a combined sample set than is typically contained in one from a clearly defined population).

Ramachandran et al. [26] investigated the correlation between geographic distance and genetic differentiation, as measured by pairwise FST, in the global HGDP-CEPH dataset. They found a linear relationship, with major deviations from the fitted line that they considered consistent with admixture or extreme isolation. We analyzed this correlation by major region, adding our expanded Pacific dataset. The results (Figure 3) show how heterogeneous the linear correlations and distributions are from region to region. The sampled Melanesian populations were distributed across a comparatively small geographic area, but their range of pairwise FST values was extremely large. Only the Native American groups had an equivalent range of FST values, but these were unreliable since there were only five American populations distributed across very large distances.
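
A comparison of this kind can be sketched in a few lines of Python. The example below is only a sketch under stated assumptions: it uses a plain great-circle (haversine) distance and an ordinary least-squares line, and the site coordinates and FST values are invented for illustration. Published analyses often route distances through land waypoints, which is not done here.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Hypothetical sites (lat, lon) and pairwise FST values.
    sites = {"A": (-5.5, 150.0), "B": (-3.3, 152.0), "C": (-6.2, 155.5)}
    fst = {("A", "B"): 0.04, ("A", "C"): 0.09, ("B", "C"): 0.06}
    dists, values = [], []
    for p, q in combinations(sorted(sites), 2):
        dists.append(great_circle_km(*sites[p], *sites[q]))
        values.append(fst[(p, q)])
    slope, intercept = fit_line(dists, values)
    print(f"slope per km = {slope:.6f}, intercept = {intercept:.4f}")
```

Pairwise values are not independent observations, so any significance test on such a fit would normally use a permutation approach (for example a Mantel test) rather than the usual regression standard errors.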

Figure 3. Genetic versus Geographic Distances within Continents

Regional correlations between FST and geographic distance for population pairs.

doi:10.1371/journal.pgen.0040019.g003

To quantify the degree of variation within and among populations, an analysis of molecular variance (AMOVA) for the Pacific materials plus the HGDP-CEPH dataset was performed, with the results shown in Table 1. The global AMOVA results first presented in [24] for the HGDP-CEPH dataset were based on 377 microsatellites, included some first degree relatives, and included only two “Oceanic” populations (from the Nasioi of Bougainville and highland New Guinea). In the current analysis based on 687 microsatellites, the Americas had the highest among-population variation component, followed in order by Melanesia, Africa, Asia, and Europe. This pattern follows directly from their ranking in population heterozygosities or θ̂ [27].

Table 1.

Analysis of Molecular Variance (AMOVA) for 687 Microsatellites for Major Regions (HGDP-CEPH plus Pacific)

doi:10.1371/journal.pgen.0040019.t001

As shown in Table 2, the microsatellite variation in Melanesia (New Guinea, New Britain, New Ireland, and Bougainville) was apportioned first by language group and then by island. While population variation among the different islands was considerable (refer to the 95% confidence interval), within-island variation among populations was more than three times greater. This was primarily due to the extensive variation within New Britain (with a 5% internal variance component), followed by Bougainville (3.7%), and New Ireland (2%, see Table S2). The variation among the three New Guinea samples in our series was lower, most likely because of their less rigorous population definitions (see the Methods section for sampling details).

Table 2.

Analysis of Molecular Variance (AMOVA) for 687 Microsatellites for Island Melanesia (partitioned by Island and by Language Group)

doi:10.1371/journal.pgen.0040019.t002

Apportioning the molecular variance by language group (between Oceanic speaking and Papuan speaking populations) only accounted for 0.2 % of the total, which, as indicated by the very small 95% confidence interval, was still significant. Since the two language categories are scattered across the islands, geography and intermixture will confound possible language effects. While the microsatellite variation among the Oceanic-speaking populations was significant, it was much greater among the Papuan-speaking populations (many of which are located in the mountainous interiors of the larger islands).

To investigate individual and population similarities, we applied a Bayesian model-based clustering algorithm implemented in the STRUCTURE program [28] to our Pacific dataset combined with the HGDP-CEPH panel (also genotyped by the Marshfield Clinic). This program identifies groups of individuals who have similar allele frequency profiles. The great advantage of this clustering approach is that it avoids a priori population classifications, and instead estimates the shared population ancestry of individuals based solely on their genotypes under an assumption of Hardy-Weinberg equilibrium and linkage equilibrium in ancestral populations. It infers individual proportions of ancestry from K clusters, where K is specified in advance and corresponds to the number of posited ancestral populations; K can be varied across independent runs. Individuals can be assigned admixture estimates from multiple ancestral populations, with the admixture estimates summing to 1 across these population clusters.
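
The admixture idea behind this kind of clustering can be illustrated with a small Python sketch. This is not the STRUCTURE sampler itself, which estimates cluster allele frequencies and ancestry proportions jointly by MCMC. It only shows, under the same Hardy-Weinberg and linkage-equilibrium assumptions, how the likelihood of one individual's genotype depends on a candidate ancestry vector q whose entries sum to 1. All allele labels, frequencies, and function names are hypothetical.

```python
import math


def genotype_log_likelihood(genotype, q, freqs):
    """
    Log-likelihood of a diploid multilocus genotype under an admixture model.

    genotype: list of (allele_1, allele_2) tuples, one per locus
    q:        ancestry proportions over K clusters (must sum to 1)
    freqs:    freqs[k][locus] maps allele -> frequency in cluster k
    Loci and the two allele copies are treated as independent draws.
    """
    if abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    total = 0.0
    for locus, alleles in enumerate(genotype):
        for allele in alleles:
            # Mixture frequency: each copy comes from cluster k with prob q[k].
            p = sum(qk * freqs[k][locus].get(allele, 0.0)
                    for k, qk in enumerate(q))
            # Floor avoids log(0) for alleles unseen in the chosen clusters.
            total += math.log(max(p, 1e-300))
    return total


if __name__ == "__main__":
    # Two hypothetical clusters, two loci (assumed frequencies).
    freqs = [
        [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],
        [{"A": 0.2, "B": 0.8}, {"C": 0.1, "D": 0.9}],
    ]
    genotype = [("A", "A"), ("C", "D")]
    for q in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        ll = genotype_log_likelihood(genotype, q, freqs)
        print(f"q = {q} -> log-likelihood = {ll:.3f}")
```

Comparing the log-likelihood across candidate q vectors shows how intermediate ancestry proportions can fit an individual better than assignment to any single cluster.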

Figure 4 presents the STRUCTURE analysis of our Pacific dataset plus the HGDP-CEPH Panel for 687 microsatellites and 203 indels on the 22 autosomes, on a total of 1,893 individuals from 91 populations. Each increase in K split a cluster that had been defined in an earlier run, and individuals from the same populations had very similar membership coefficients in the inferred clusters. Details of the STRUCTURE results are provided in Table S3. Inclusion of our large Pacific dataset altered the sequence of splitting, but did not change the five major global clusters that had previously been identified with a smaller set of microsatellites: Sub-Saharan Africa, Western Eurasia, East Asia, “Oceania,” and the Americas [24]. The Taiwan Aborigines clustered with East Asia, while Polynesians and Micronesians had a mixed position between East Asians and Melanesians (“Oceania”). The Mãori had the suggestion of a minor proportion of European admixture, which had been indicated by the donors themselves.

Figure 4. Global Population Structure

STRUCTURE analysis of the Pacific and HGDP-CEPH sets combined, for 687 microsatellites and 203 indels over 91 populations encompassing 1,893 samples (20,000/10,000 burnin/MCMC). Each vertical line represents an individual. The colors represent the proportion of inferred ancestry from K ancestral populations.

doi:10.1371/journal.pgen.0040019.g004

There was a small but consistent “Asian/Polynesian” admixture estimate in specific Melanesian groups. Because clustering after K = 6 mostly involved Near Oceanic populations, we stopped the combined global analysis there, and analyzed the Pacific subset separately thereafter.

An unrooted neighbor-joining tree for the same HGDP-CEPH and Pacific samples, excluding the indels, was calculated from a matrix of pairwise FST “coancestry” distances (similar to Reynolds' D [29], see Table S4), and is shown in Figure 5. For comparison, the cluster colors for the K = 6 STRUCTURE run were superimposed on the tree. The results were compatible with the clusters identified with STRUCTURE. Branch lengths varied inversely with values of θ̂ or expected heterozygosity, so that populations with the longest branch lengths had the lowest values of θ̂. The longest branches belonged to the Native American and separate Melanesian groups. As with the STRUCTURE results, this unrooted FST-based tree had Melanesians, East Asians, and Native Americans at the opposite end of the human tree from Africans and Europeans. Trees based on other population pairwise genetic distance matrices (Nei's chord distance [30], (δμ)² [31], the proportion of shared alleles [32], and Cavalli-Sforza and Edwards' chord distance [33]) also indicated relatively large distances between Africans and Melanesians, and also consistently placed the Taiwan Aborigines between the East Asians and Polynesians/Micronesians (Figure S2).
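
To make the tree-building step more concrete, the Python sketch below converts pairwise FST values to a Reynolds-style coancestry distance, D = −ln(1 − FST), and then applies the standard neighbor-joining selection criterion to pick the first pair of populations to join. It is a sketch under assumptions, not the procedure used for Figure 5: the exact distance in Table S4 may differ, only the first joining step is shown, and the population labels and FST values are invented.

```python
import math
from itertools import combinations


def coancestry_distance(fst: float) -> float:
    """Reynolds-style distance D = -ln(1 - FST), for 0 <= FST < 1."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


def first_nj_join(names, dist):
    """
    Return the pair of taxa that neighbor-joining would merge first.

    Uses Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    and picks the pair with the smallest Q.
    """
    n = len(names)
    row_sums = [sum(row) for row in dist]
    best = None
    for i, j in combinations(range(n), 2):
        q_ij = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best is None or q_ij < best[0]:
            best = (q_ij, names[i], names[j])
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical populations and pairwise FST values (not study data).
    names = ["P1", "P2", "P3", "P4", "P5"]
    fst = {
        ("P1", "P2"): 0.02, ("P1", "P3"): 0.10, ("P1", "P4"): 0.12,
        ("P1", "P5"): 0.15, ("P2", "P3"): 0.11, ("P2", "P4"): 0.13,
        ("P2", "P5"): 0.16, ("P3", "P4"): 0.03, ("P3", "P5"): 0.09,
        ("P4", "P5"): 0.10,
    }
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for (a, b), value in fst.items():
        i, j = names.index(a), names.index(b)
        dist[i][j] = dist[j][i] = coancestry_distance(value)
    print("first pair joined:", first_nj_join(names, dist))
```

A full neighbor-joining run would repeat this step, replacing each joined pair with a new internal node and recomputing distances until the tree is resolved.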

Figure 5. Global Population Tree

Neighbor-joining FST-based tree for the Pacific and HGDP-CEPH combined datasets (687 microsatellites). Superimposed colors are from the STRUCTURE analysis at K = 6 (also shown).

doi:10.1371/journal.pgen.0040019.g005

The Pacific

We performed STRUCTURE analyses on a combined East Asia–Pacific dataset to explore in detail the relationships among Melanesians, Polynesians, Taiwan Aborigines, and East Asians, and to clarify the role of intermixture there. The samples included in this analysis were our Pacific set of 40 groups, and from the HGDP-CEPH panel, the “Papuans,” (identified here as “Highlands”), the East Asians, and French (the French were included to identify European admixture). The STRUCTURE results are shown in Figure 6, and the details on their reproducibility in Table S5. At K = 2 and K = 3, the Asia-Pacific clusterings mirrored the first five runs of the global comparison. Bougainville formed a cluster contrasting with central New Britain at K = 3; the New Guinea groups separated at K = 4; and a central New Britain cluster split at K = 5. Then, at K = 6, a Polynesian cluster appeared, centered on the Mãori, with high ancestral proportions for the Samoan and Micronesian samples as well as the Taiwanese Aborigines. The former “East Asian” ancestral proportion in Melanesian populations converted almost entirely to “Polynesian” in this run. At K = 7, 8, and 9, more Melanesian clusters formed in New Britain and New Ireland. All but one of the Melanesian cluster foci are Papuan-speaking groups, primarily located in the interiors of the large islands (see Figures 7 and 8). The Mamusi, who are Oceanic-speaking neighbors of the Ata, are the exception. There is reason to suspect the Mamusi were originally a Papuan-speaking group (perhaps Ata speakers) who adopted an Oceanic language [34]. At K = 10, the “Europeans” were finally identified as a separate cluster. As shown in Table S5, runs at K = 11 and above became unstable and were not reproducible.

Figure 6. Pacific Population Structure

STRUCTURE analysis of the Pacific, HGDP-CEPH East Asia, and European (French) groups (687 microsatellites and 203 indels, 20,000/10,000 burnin/MCMC). Results are given from K = 2 to K = 10. Each vertical line represents an individual. The colors represent the proportion of inferred ancestry from K ancestral populations.

doi:10.1371/journal.pgen.0040019.g006

Figure 7. Pacific Population Structure Details

Individual and (below) mean population assignments at K = 10 for the Pacific, HGDP-CEPH East Asia, and French. Purple arrows denote the eight Oceanic-speaking populations with an “Austronesian” assignment signature above 5%. Papuan-speaking group names are in bold italics. Asterisks denote inland groups. Populations are arranged geographically, approximately from west to east.

doi:10.1371/journal.pgen.0040019.g007

Figure 8. The Geographic Patterning of STRUCTURE Results

Distribution of cluster assignment percentages (in pie-charts) among Northern Island Melanesian populations for K=10. Oceanic-speaking regions are stippled; the different Papuan-speaking regions have stripes or grid marks. Papuan-speaking group names are in bold italics. Inland group locations are dark orange dots; shore group locations are light orange dots. Baining (Mali) and Baining (Kaket) are two dialects; elsewhere, the two Kaket-speaking locales are identified (Rangulit and Malasait), as is Marabu (Mali-speakers).

doi:10.1371/journal.pgen.0040019.g008

The approximate percentage of “European” admixture is best seen in Figure 7, which gives average ancestral proportions by population. In the Mãori, the “European” ancestry was ~12%, and for Samoans it was ~5%. The Samoan and Micronesian results also suggested minor ties with East Asians and also Melanesians, specifically the “New Ireland” cluster (a number of Lapita sites have been found in the vicinity of New Ireland [3]). The Micronesians had low levels of inferred ancestry shared with populations in New Guinea, which is not far from Belau, where most of the Micronesian samples are from. This relationship is echoed in mtDNA results as well [35]. The typical ancestral proportions by population for a majority rule run are given in Table S6. As seen in Table S5, 15 out of 20 STRUCTURE runs on our Pacific dataset at K = 10 produced essentially the same group ancestry proportions as shown in Figures 6 and 7, with individual similarity coefficients ranging from 0.90 to 0.96, so these results are quite reproducible.

As in the global comparison, an “East Asian/Polynesian” estimated ancestry proportion for a number of Melanesian populations only occurred at frequencies of >5% in certain Oceanic-speaking (Austronesian) groups, and it is hereafter referred to as the “Austronesian” genetic signature. In Figure 7, the purple arrows point to those Oceanic-speaking groups in our Melanesian sample set that have this clear “Austronesian” signature. The probabilities were highest in the Kove and Saposa (just below 20%), followed by the Mussau at 15%, with the Teop, Mangseng, Nakanai (Bileki), Melamela, and Tigak having lower “Austronesian” signatures. In these Oceanic-speaking populations, the “Austronesian” ancestral assignment proportions never ranked higher than third, indicating their comparatively intermixed, and predominantly Papuan, genetic nature.

As a check on these results, particularly to verify the relationships of the Polynesians and Micronesians within our dataset, we performed a separate “supervised” STRUCTURE analysis [28,36], where the individual Mãori, Samoan, and Micronesian genotypes were distributed across eight representative populations (Taiwan Aborigines, East Asians, Europeans, and the Near Oceanic New Guinea, Ata, Baining, Kuot, and Aita). The results, shown in Figure S3A, underline the primary affinity of the Mãori, Samoans, and Micronesians to Taiwan Aborigines and secondarily to East Asians, with lesser suggestions of links to Europeans and New Ireland/New Britain (there is no suggestion of any Bougainville or Baining tie). In a second “supervised” STRUCTURE analysis where a ninth population was specified but not associated with a particular group a priori, the Polynesians/Micronesians constituted the largest proportion of this cluster (Figure S3B). Of the three populations in question, the Mãori had the smallest signal of external relationship, consistent with their extensive genetic drift, and the Micronesian group has the largest signal (to Taiwan, East Asia, New Guinea, and New Ireland/New Britain).

Figure 8 shows the distribution within Northern Island Melanesian populations of the STRUCTURE clustering probabilities for K = 10 in pie-chart form (some populations from the same language groups with very similar probability profiles were merged). Neighboring groups tended to share similar profiles. New Britain, the largest and most rugged island, had the greatest internal differentiation, with five different assigned clusters at >50% probabilities in different populations. Bougainville groups had two common cluster assignments, while there was only one common cluster in New Ireland.

Figure 9 shows the unrooted neighbor-joining tree for the East Asia–Pacific populations from a pairwise FST coancestry distance matrix for 687 microsatellites (the pairwise FST values are in Table S7). Bootstrap values for the branches, generated with the PHYLIP program from population allele frequencies for 100 different trees, are indicated by branch thicknesses. As shown, most of the trunk elements had high bootstrap values, as did a number of branches within Northern Island Melanesian groups. By contrast, the mainland East Asian group relationships were considerably more ambiguous, their branches were shorter, and only the Taiwan Aborigines had a strong internal branch. The tree branching again closely reflected the clustering in STRUCTURE, indicated by the corresponding colors from K = 10. The populations with the longest branches were those with the largest ancestral proportions assigned to single STRUCTURE clusters, and had the lowest heterozygosities. These populations tend to be Papuan-speaking groups in island interiors. The STRUCTURE analysis specifies the role and nature of admixture in a way that a population-based tree cannot.

Figure 9. Pacific Population Tree

Neighbor-joining FST-based tree for 687 microsatellites from the Pacific, East Asia, and French populations, with the range of bootstrap values indicated by branch thicknesses. Colors are the same as in the STRUCTURE analysis at K = 10. New Britain populations are circled. Papuan-speaking groups are in bold italics; inland groups in Melanesia have asterisks. Abbreviated names are spelled out in Table S1.

doi:10.1371/journal.pgen.0040019.g009

The AMOVA, STRUCTURE, and population tree analyses were all driven by large distinctions in allele frequencies, rather than by the presence of private alleles in one population or another, since these generally occur at very low frequencies. In the first publication on the global HGDP-CEPH set of 377 microsatellites, Rosenberg et al. quantified continental relationships independently of the STRUCTURE analysis by counting the alleles that were present in only one continent, shared by two, by three, and so on [24]. The pattern of allele sharing was taken to indicate greater African heterogeneity, and to show that allele sharing was least for the Americas and for the two “Oceanic” groups.

With our enlarged dataset and microsatellite coverage, we also compared patterns of private alleles and allele sharing between regions (Table 3). We recovered 271 Melanesian-specific alleles, which in raw numbers actually exceeded those for Africa. Correcting for sample sizes, the rate of Melanesian-specific alleles was at the high end of the range for the major regions except for Africa. The number of alleles missing from only one continent, also given in Table 3, shows the dramatic effect of genetic drift on the American populations. The number of shared alleles between pairs of regions is shown in Table 4, with the correction for sample sizes in Table 5. All non-African regions, including Melanesia, shared the most alleles with Africa, indicating they were primarily subsets of African diversity. Melanesia shared more alleles with East Asia than with any other non-African region, but Melanesian populations cannot simply be viewed as an extension or subset of East Asian diversity. When Papuan- and Oceanic-speaking groups in Melanesia were analyzed separately, the Papuan-speaking groups showed greater isolation, as they shared fewer alleles with all other regions than did Oceanic-speaking groups (unpublished data).
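The three kinds of counts discussed above (region-specific alleles, alleles shared by a pair of regions, and alleles missing from a single region) can be sketched as follows. This is a minimal illustration on hypothetical toy data. It assumes that "shared between pairs of regions" means alleles found in exactly those two regions, which may not match the definition used for Table 4, and it makes no attempt at the sample-size correction used in Table 5.

```python
from collections import defaultdict


def allele_sharing(alleles_by_region):
    """Count private, pair-shared, and single-region-missing alleles.

    alleles_by_region: dict mapping a region name to the set of
    (locus, allele) tuples observed there (toy structure, assumed).
    """
    # For every allele, record the set of regions where it was seen.
    regions_of = defaultdict(set)
    for region, alleles in alleles_by_region.items():
        for allele in alleles:
            regions_of[allele].add(region)

    all_regions = set(alleles_by_region)
    n = len(all_regions)
    private = defaultdict(int)
    pair_shared = defaultdict(int)
    missing = defaultdict(int)
    for regs in regions_of.values():
        if len(regs) == 1:
            # Allele seen in one region only.
            private[next(iter(regs))] += 1
        if len(regs) == 2:
            # Allele seen in exactly two regions.
            pair_shared[tuple(sorted(regs))] += 1
        if len(regs) == n - 1:
            # Allele seen everywhere except one region.
            (absent,) = all_regions - regs
            missing[absent] += 1
    return dict(private), dict(pair_shared), dict(missing)


# Hypothetical toy data: two loci, a handful of allele sizes.
toy = {
    "Africa": {("D1", 10), ("D1", 12), ("D1", 14), ("D2", 5)},
    "EastAsia": {("D1", 10), ("D1", 12), ("D2", 5)},
    "Melanesia": {("D1", 10), ("D1", 14), ("D1", 16), ("D2", 5)},
    "America": {("D1", 10)},
}
private, pair_shared, missing = allele_sharing(toy)
print(private, pair_shared, missing)
```

Because raw counts of this kind grow with the number of individuals sampled, any comparison between regions would still need a sample-size adjustment of the sort reported in the tables.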


Table 3.

Private and Missing Alleles by Continent

doi:10.1371/journal.pgen.0040019.t003

Table 4.

Bi-Continental Allele Sharing

doi:10.1371/journal.pgen.0040019.t004

Table 5.

Bi-Continental Allele Sharing, Corrected by Combined Sample Sizes

doi:10.1371/journal.pgen.0040019.t005

Discussion

Language and Genetic Correspondences

Our study suggests that in the Pacific, and specifically in Near Oceania, there is only a modest association between language and genetic affiliation. Oceanic languages were introduced and dispersed around the islands within the last 3,300 years, but there was apparently only a small infusion of accompanying “Austronesian” ancestry that has survived. Approximately one-half of the Oceanic-speaking groups in Melanesia had an identifiable “Austronesian” genetic signature (see Figure 7 and Table S8). In each case where there was such an “Austronesian” signature, at least two other cluster assignments had probabilities higher than the “Austronesian” one (see, in Figure 6, the Saposa and Teop of Bougainville; the Mussau and Tigak in New Ireland Province; and the Kove, Mangseng, Melamela, and Nakanai Bileki of New Britain). On the other hand, the Oceanic-speaking groups without the “Austronesian” signature were often genetically indistinguishable from their immediate Papuan-speaking neighbors (in New Britain, the Mamusi have no Austronesian signature, but they and the Nakanai Loso cluster closely with their Papuan-speaking Ata neighbors; the Nalik, Notsi, and Madak of New Ireland are genetically indistinguishable from their Papuan-speaking Kuot neighbors; the Tolai and Lavongai profiles suggest significant intermixture, but only between different Papuan-speaking groups). The result suggests that Oceanic languages were adopted by many formerly Papuan-speaking groups, while at the same time there was little genetic influence or marital exchange. At least in Near Oceania, rates of language borrowing and language adoption have been faster and more pervasive than rates of genetic admixture.

Melanesians in the Global Context

However it is measured, genetic variation is reduced within Melanesian populations (Figure 2), while the genetic divergences among them are very large (refer to Figures 6, 8, and 9 and to Tables 1–5). The size of the differences among the populations would appear to equal or surpass those among populations across East Asia, Europe, or even Africa. However, the large Melanesian population distinctions are a direct consequence of their very low levels of internal variation or heterozygosity. These low levels will directly inflate both the proportion of among-group variation in AMOVA and pairwise FST genetic distances (for a full discussion of this point, see especially [27] and also [26,37]). As population heterozygosities decrease, pairwise FSTs should increase because of this intrinsic mathematical relationship. This is illustrated by our global and Near Oceania datasets (Figure 10A and 10B). Those pairwise FSTs involving the Bantu South population (which has the highest heterozygosity) are plotted against the heterozygosities of each population, and the resulting correlations are strongly negative, approaching −1.0.
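The dependence of FST on within-population heterozygosity can be made concrete with a small numerical sketch. It uses the simple unweighted form FST = (HT − HS) / HT for two populations at one locus, which is an assumption made for illustration and not the coancestry estimator used in our analyses. The allele frequencies are hypothetical. In both comparisons one allele's frequency is shifted by 0.1 and another's by the opposite amount, so HT − HS is identical, yet FST is several times larger when heterozygosity is low.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pairwise_fst(p1, p2):
    """Unweighted FST = (HT - HS) / HT for two populations at one locus."""
    hs = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    pooled = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    ht = heterozygosity(pooled)
    return (ht - hs) / ht


# High-diversity pair: ten alleles, heterozygosity near 0.9.
high_1 = [0.1] * 10
high_2 = [0.2, 0.0] + [0.1] * 8

# Low-diversity pair: two alleles, heterozygosity near 0.25.
low_1 = [0.9, 0.1]
low_2 = [0.8, 0.2]

fst_high = pairwise_fst(high_1, high_2)  # about 0.0056
fst_low = pairwise_fst(low_1, low_2)     # about 0.0196
print(round(fst_high, 4), round(fst_low, 4))
```

For two equally weighted populations, HT − HS equals one quarter of the summed squared frequency differences, so the numerator is fixed by the size of the frequency shift while the denominator shrinks as diversity falls.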


Figure 10. The Correlation between Genetic Distances and Heterozygosity

The genetic distances used were the set of pairwise FSTs involving Bantu South (the population with the highest heterozygosity), highlighted in Table S4.

(A) The combined global dataset.

(B) Details for Melanesia.

doi:10.1371/journal.pgen.0040019.g010

Our STRUCTURE and tree analyses of the combined microsatellite datasets indicate that Melanesians are quite far removed from Africans, in spite of their superficial similarities in hair form and skin pigmentation [38]. In the initial analysis of the HGDP-CEPH dataset, the placement of the two Melanesian (“Oceanic”) groups was different. There, they split from Eurasia before Asians and Native Americans [39]. This also differed from the result of a genome-wide SNP study [40] on a very small worldwide dataset. The extreme positioning of Melanesians in our tree was not due to our oversampling. Rather, our extensive coverage of Melanesian variation has enabled a clearer resolution of their relationships with populations outside the region.

The Causes of Melanesian (Near Oceanic) Diversity

The pattern of Near Oceanic diversity has been made clear. The AMOVA analysis of the microsatellites showed that the larger and more rugged the island, the greater the differentiation among population