#416173

Anonymous
Quote:
That map is a slightly tweaked version of the one from 23andme (shown below) and is nonsense. No Serbs cluster with Greeks and Albanians as the map shows.

The reference samples were taken from the Population Reference Sample (POPRES), the Human Genome Diversity Project (HGDP) and the Leipzig sample (LPZ); see Veeramah et al. (p. 996).
The POPRES sample was described in another article, 'Genes mirror Geography within Europe' (2008) by Novembre et al. I am attaching both papers: 'Genes mirror Geography within Europe' by Novembre et al., including its supplementary material, and 'Genetic variation in the Sorbs of eastern Germany' by Veeramah et al.

From the discussion in the supplementary material for 'Genes mirror Geography within Europe' (Novembre et al.):

-Individual 44556 has 4 Italian grandparents, but was born in France, speaks French, and clusters with French individuals.
-Individual 7147 has 4 Russian grandparents, but was born in Romania, speaks Romanian, and is placed between Switzerland and Romania in the PC1-PC2 plot.
-Individual 43874 has 4 Swiss grandparents, but was born in Italy, speaks Italian, and clusters with Italian individuals.
-Individual 14215 has 4 Swiss grandparents but has parents who are Italian, speaks Italian, and clusters Italian. Israel is the individual’s country of birth.
-Individual 34088 has 4 German GPs but was born in Hungary, speaks Hungarian, and interestingly clusters with Italian individuals.
-The cluster of 5 outlier Italian individuals located well “southwest” of Italy, includes 3 individuals with unobserved grandparental origins and the other 2 have all four grandparents from Italy. Three of the five speak Italian, the other two have unobserved language data. Notably one of the individuals is from the LOLIPOP sub-study and the others are from Lausanne – so both studies identified these outlier Italian individuals, making it unlikely to be the result of some artifact that occurred within one of the two sub-studies from which we draw our data.
-Individual 13011 was born in Slovakia but has no observed grandparental or language information.

In addition, there were several small samples in the POPRES project:

Many of the empirical outliers are pairwise comparisons involving either Slovakia (SK) or Russia (RU). In addition, even after excluding Russia and Slovakia (Supplementary Fig. 5), many of the pairwise comparisons with large residuals involve comparisons with countries that have small sample sizes, [e.g., Kosovo (KS), Slovenia (SI), Scotland (Sct), Finland (FI), Cyprus (CY), Yugoslavia (YG), Croatia (HR)]. This suggests that outlier points are simply due to sampling variation, and not strong departures from a general model where PC1-PC2 position is principally determined by geography.

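The authors then used bootstrapping, a technique often employed by statisticians to address the problem of small sample sizes: the data are resampled with replacement many times, and the spread of the results across resamples shows how much an estimate could vary through sampling alone.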
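To make the idea of bootstrapping concrete, here is a minimal sketch in Python. It is a generic illustration only, not the procedure used in either paper: the function names and the five "PC1 coordinates" are made up for the example. It resamples a small sample with replacement many times and looks at how much the mean moves around.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, lower=2.5, upper=97.5):
    """Return a simple percentile interval (nearest-rank style) of `values`."""
    ordered = sorted(values)

    def pick(p):
        return ordered[round(p / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)

# Hypothetical PC1 coordinates for five individuals from one small country sample.
small_sample = [0.021, 0.035, 0.018, 0.040, 0.027]

means = bootstrap_means(small_sample)
low, high = percentile_interval(means)
print(f"sample mean: {statistics.mean(small_sample):.4f}")
print(f"95% bootstrap interval for the mean: {low:.4f} to {high:.4f}")
```

With only five individuals the interval is wide relative to the values themselves, which is the point being made about small country samples: their position on a plot is less certain than that of well-sampled countries.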

The conclusion of the 'Genes mirror Geography within Europe' study by Novembre et al.:

In conclusion, the plots of country PC1-PC2 position vs. geographic position have no obvious outliers that cannot be explained plausibly by small sample size and/or the pitfalls of assuming a single proxy location for a large country (e.g. Russia). While there may be more subtle signals of unique population history in the data, the absence of empirical outliers from well-sampled countries suggests that the dominant signal in the data is that the genetics of European populations mirrors their geography.

Veeramah et al., in 'Genetic variation in the Sorbs of eastern Germany', produced 4 PCA plots in total (p. 997):

-Two PCA plots, with and without the POPRES/LPZ merge
-Two PCA plots using a bootstrapping technique with and without POPRES/HGDP merge

The two PCA plots I posted above are the ones based on the bootstrapping technique, because in the other two it is difficult to tell the ethnicities apart. The four plots are similar anyway, as mentioned by Novembre et al.

The obvious relevant outliers are the Slovakian individual located in the bottom right corner and the Russian individual. I would assume the plot you obtained from the 23andme site also comes from the 'Genes mirror Geography within Europe' study by Novembre et al., as it has been used by other researchers.
In fact, the article 'Genes mirror Geography within Europe' by Novembre et al., from which the PCA plots were taken, has been cited around 285 times in other published scientific studies. So it has proved to be a popular study, and we may find the same European reference populations being used in other studies. However, the abstract and the plots of the study I posted above are specific to Lusatian Sorbs, as they were sampled separately. Any conclusion regarding genetic proximity between other ethnicities on the PCA plots should be treated with caution.