Nelis, M et al. 2009, 'Genetic structure of Europeans: a view from the North-East', PLoS One, vol. 4, no. 5, p. e5472.


Using principal component (PC) analysis, we studied the genetic constitution of 3,112 individuals from Europe as portrayed by more than 270,000 single nucleotide polymorphisms (SNPs) genotyped with the Illumina Infinium platform. In cohorts with more than 100 samples, 100 randomly chosen samples were used for analysis to minimize the effect of unequal cohort sizes, leaving a total of 1,564 samples. This analysis revealed that the genetic structure of the European population correlates closely with geography.
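The two steps described above (capping each cohort at 100 randomly chosen samples, then running PCA on the SNP genotypes) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as 0/1/2 allele counts in a NumPy array with no missing calls, and it uses a common scaling that divides each SNP by its binomial standard deviation. The function names `subsample_cohorts` and `genotype_pca` and the toy cohort sizes are hypothetical.

```python
import numpy as np


def subsample_cohorts(labels, max_per_cohort=100, seed=0):
    """Keep at most `max_per_cohort` randomly chosen individuals per cohort."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for cohort in np.unique(labels):
        idx = np.flatnonzero(labels == cohort)
        if idx.size > max_per_cohort:  # only oversized cohorts are downsampled
            idx = rng.choice(idx, size=max_per_cohort, replace=False)
        keep.extend(idx.tolist())
    return np.sort(np.array(keep, dtype=int))


def genotype_pca(genotypes, n_components=2):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 allele counts, no missing data."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # frequency of the counted allele per SNP
    poly = (p > 0) & (p < 1)            # drop monomorphic SNPs (zero variance)
    g, p = g[:, poly], p[poly]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # PC coordinates per individual
    explained = s**2 / np.sum(s**2)     # fraction of variance per component
    return scores, explained[:n_components]


# Toy usage with synthetic data (cohort sizes are invented for illustration)
rng = np.random.default_rng(1)
labels = ["EST"] * 250 + ["LAT"] * 80 + ["POL"] * 120
idx = subsample_cohorts(labels)
print(len(idx))  # 280 (100 + 80 + 100)
geno = rng.integers(0, 3, size=(len(labels), 500))
scores, frac = genotype_pca(geno[idx])
print(scores.shape)  # (280, 2)
```

Cohorts with 100 or fewer samples are kept whole, so only the oversized cohorts shrink; this is what equalizes their influence on the leading components.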


The 1,090 Estonian samples were selected from the 10,317 samples in the Estonian biobank as of 2005. Eighty samples (40 males and 40 females) were chosen at random, by place of birth, from each of 13 Estonian counties (Harju, Ida-Viru, Jõgeva, Järva, Lääne-Viru, Põlva, Pärnu, Rapla, Saaremaa, Tartu, Valga, Viljandi, Võru), and 50 samples (25 males and 25 females) from the combined Hiiumaa and Läänemaa counties, giving 13 × 80 + 50 = 1,090 samples. Approval from the Ethics Committee of the Estonian biobank was obtained prior to collection.
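The Estonian design is a stratified random sample with fixed quotas per county of birth and sex. A minimal sketch, assuming each donor record carries a county-of-birth label and a sex code; the record layout, the merged-stratum label `"Hiiumaa+Läänemaa"`, the function `stratified_sample` and the synthetic donor pool are all hypothetical.

```python
import random

COUNTIES = ["Harju", "Ida-Viru", "Jõgeva", "Järva", "Lääne-Viru", "Põlva", "Pärnu",
            "Rapla", "Saaremaa", "Tartu", "Valga", "Viljandi", "Võru"]
MERGED = "Hiiumaa+Läänemaa"  # hypothetical label for the combined stratum


def stratified_sample(donors, quotas, seed=0):
    """Draw quotas[(region, sex)] donors without replacement from each stratum."""
    rng = random.Random(seed)
    chosen = []
    for (region, sex), n in quotas.items():
        pool = [d["id"] for d in donors if d["region"] == region and d["sex"] == sex]
        chosen.extend(rng.sample(pool, n))  # ValueError if the stratum is too small
    return chosen


# 40 males + 40 females per county, 25 + 25 from the merged stratum
quotas = {(c, s): 40 for c in COUNTIES for s in ("M", "F")}
quotas.update({(MERGED, s): 25 for s in ("M", "F")})
print(sum(quotas.values()))  # 1090

# Synthetic donor pool: 60 donors per region and sex (invented numbers)
donors = [{"id": f"{r}-{s}-{i}", "region": r, "sex": s}
          for r in COUNTIES + [MERGED] for s in ("M", "F") for i in range(60)]
chosen = stratified_sample(donors, quotas)
print(len(chosen))  # 1090
```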

The Latvian samples were selected from a population-based collection that is part of the Genome Database of Latvian Population, the Latvian national biobank. Participants were randomly selected from individuals who reported both their own nationality and that of both parents as Latvian. They were recruited through general practitioners and had to be older than 18 years and free of chronic disorders; anthropometric measurements (including weight and height), ethnic, social and environmental information, and family health status were collected with a self-reported questionnaire. The study protocol was approved by the Central Medical Ethics Committee of Latvia.
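The Latvian inclusion criteria amount to a simple eligibility filter applied before random selection. A minimal sketch, assuming questionnaire answers are stored in records like the hypothetical `Participant` class below; the field names are invented, and "older than 18" is read literally as an age strictly above 18.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical questionnaire record; field names are illustrative only."""
    pid: str
    age: int
    nationality: str
    mother_nationality: str
    father_nationality: str
    has_chronic_disorder: bool


def is_eligible(p: Participant) -> bool:
    """Latvian self and both parents, older than 18, no chronic disorder."""
    latvian_lineage = all(n == "Latvian" for n in
                          (p.nationality, p.mother_nationality, p.father_nationality))
    return latvian_lineage and p.age > 18 and not p.has_chronic_disorder


people = [
    Participant("a", 45, "Latvian", "Latvian", "Latvian", False),
    Participant("b", 17, "Latvian", "Latvian", "Latvian", False),  # too young
    Participant("c", 30, "Latvian", "Russian", "Latvian", False),  # mother not Latvian
    Participant("d", 52, "Latvian", "Latvian", "Latvian", True),   # chronic disorder
]
eligible = [p.pid for p in people if is_eligible(p)]
print(eligible)  # ['a']
```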

Peripheral blood samples were collected from unrelated individuals from six ethno-linguistic groups of Lithuania. Informed consent and information about birthplace, parents and grandparents were obtained from all donors. Approval was obtained from the Bioethics Committee of Lithuania prior to collection.

The Polish sample was randomly selected from a population-based collection of adults (sex ratio 1:1) representing the Polish ethnic group from the West Pomeranian region of Poland. The participants were randomly selected from the patient rolls of participating family physicians.

The Russian samples were obtained from healthy, mutually unrelated donors from the Andreapol district of the Tver region who represented the native ethnic group of the region (i.e., they belonged to at least the third generation living in the region).

See Supplementary Table S1 for a description of the other populations.

Principal component (PC) and multidimensional scaling (MDS) analyses, pairwise Fst

Multidimensional scaling plot of the studied European individuals.
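The caption does not say how inter-individual distances were computed. As a minimal sketch, assuming a 1 - IBS (identity-by-state) distance on 0/1/2 genotypes with no missing calls, an MDS plot of this kind can be produced with classical (Torgerson) MDS. Both the distance and the MDS variant are assumptions, not necessarily what the authors used; `ibs_distance` and `classical_mds` are hypothetical names, and the all-pairs broadcast is only meant for small examples.

```python
import numpy as np


def ibs_distance(genotypes):
    """1 - IBS proportion between all pairs of individuals (0/1/2 genotypes, no missing data)."""
    g = np.asarray(genotypes, dtype=float)
    diff = np.abs(g[:, None, :] - g[None, :, :])  # alleles not shared, per SNP (0..2)
    return diff.mean(axis=2) / 2.0


def classical_mds(d, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


# Sanity check: exact 2-D Euclidean distances are reproduced up to rotation/reflection
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
coords = classical_mds(d)
d_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.allclose(d, d_hat))  # True

# Toy genotypes (random, so a plot of these coordinates would show no structure)
geno = rng.integers(0, 3, size=(10, 200))
mds_coords = classical_mds(ibs_distance(geno))
```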

Pair-wise Fst between European samples.
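The caption does not name the Fst estimator. A minimal sketch of a pairwise Fst matrix using Hudson's estimator, combined across SNPs as a ratio of averages; this may differ from the estimator the authors used (for example Weir and Cockerham's). It assumes 0/1/2 genotype matrices with no missing data; `hudson_fst`, `pairwise_fst` and the simulated populations are hypothetical.

```python
import numpy as np


def hudson_fst(g1, g2):
    """Hudson's Fst between two populations, ratio of averages across SNPs.

    g1, g2: (individuals x SNPs) arrays of 0/1/2 allele counts, no missing data.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]  # sampled chromosomes (diploids)
    p1 = g1.sum(axis=0) / n1
    p2 = g2.sum(axis=0) / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float(num.sum() / den.sum())


def pairwise_fst(pops):
    """Symmetric Fst matrix for a dict {population name: genotype matrix}."""
    names = list(pops)
    m = np.zeros((len(names), len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            m[i, j] = m[j, i] = hudson_fst(pops[names[i]], pops[names[j]])
    return names, m


# Toy data: B drifts away from A's allele frequencies, C shares A's frequencies
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=1000)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, size=1000), 0.01, 0.99)
pops = {
    "A": rng.binomial(2, freq_a, size=(50, 1000)),
    "B": rng.binomial(2, freq_b, size=(50, 1000)),
    "C": rng.binomial(2, freq_a, size=(50, 1000)),
}
names, fst = pairwise_fst(pops)
print(names)  # ['A', 'B', 'C']

# Fixed differences at every SNP give Fst = 1
print(hudson_fst(np.zeros((5, 10)), np.full((5, 10), 2)))  # 1.0
```

Summing numerators and denominators before dividing keeps SNPs with very low diversity from dominating the estimate, which is why the ratio of averages is often preferred over averaging per-SNP ratios.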

