The Grand Locus / Life for statistical sciences


the Blog

ENCODE data, Principal Components and racism

“Thinking is classifying” wrote Georges Clémenceau*. This says, in simple words, everything about the obsession of the human mind with keeping things tidy. No surprise that we ask computers for a little help here and there. Is this email spam? Is this online user human? Was this text written by that author? Training machines to put things into the boxes created by our human minds is called supervised learning, and it can be very lucrative. But what about the more philosophical cases where machines make their own boxes? Can we reverse the process and put things in boxes created by computers? Unsupervised learning, as it is called, creates a lot of interesting problems where we humans are left wondering whether the boxes make any sense.

The mother of all classification techniques is undisputedly Principal Component Analysis (PCA). But let me reassure those who hate PCA and those who have never heard of it: I will only scratch the surface, and very briefly. PCA automatically arranges similar items close to each other on a plane. The rest is up to you. Similarity, in particular, depends on a bunch of arbitrary features: size, height, number of legs... In a classical introductory...
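To make "arranging similar items close to each other on a plane" concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The animals and their measurements are made up for illustration, and the helper name `pca_2d` is hypothetical; nothing here comes from the original analysis.

```python
import numpy as np

def pca_2d(features):
    """Project the rows of `features` onto their first two principal components."""
    X = np.asarray(features, dtype=float)
    # Center each feature so the components describe variance, not the mean.
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each item on the plane spanned by the first two directions.
    return centered @ vt[:2].T

# Hypothetical toy data: columns are size (m), height (m), number of legs.
animals = ["cat", "dog", "sparrow", "pigeon", "ant", "beetle"]
features = [
    [0.5, 0.25, 4],
    [0.9, 0.60, 4],
    [0.15, 0.10, 2],
    [0.30, 0.20, 2],
    [0.005, 0.002, 6],
    [0.02, 0.01, 6],
]

coords = pca_2d(features)
for name, (x, y) in zip(animals, coords):
    print(f"{name:8s} {x:+.2f} {y:+.2f}")
```

Because the features are left unscaled, the number of legs dominates the layout, which is one way to see how much the result depends on the arbitrary choice of features.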

Genetics and racism (2)

In the first post of this series on genetics and racism, I explained how Richard Lewontin concluded from his work on human diversity that human races are of no value for taxonomy (the classification of living beings). This view was later criticized and even termed Lewontin's fallacy by A. W. F. Edwards. Yet, nobody ever doubted that Lewontin was honest in his approach. But more recently came another case that gives the shivers. The great Stephen Jay Gould, the author of the acclaimed Mismeasure of Man, was accused of data manipulation.

The mismeasure of Gould

Stephen Jay Gould was the kind of scientist who pops up everywhere. I discovered him in a comment about the opinion of the Vatican on Evolution; others knew him for his statistical analyses of baseball records. He was actually a paleontologist, author of the theory of punctuated equilibrium. But his most famous work is undoubtedly The Mismeasure of Man.

Like the author, the book is a strange chimera, somewhere in between scientific research and history, with a touch of lyricism. The Mismeasure of Man is a journey through the differences between people, or more precisely through the scientific discourse over this...
