The Grand Locus / Life for statistical sciences


the Blog

ENCODE data, Principal Components and racism

“Thinking is classifying” wrote Georges Clémenceau*. This says, in simple words, everything about the human mind's obsession with keeping things tidy. No surprise that we ask computers for a little help here and there. Is this email spam? Is this online user human? Is this text written by that author? Training machines to put things into the boxes created by our human mind is called supervised learning and it can be very lucrative. But what about the more philosophical cases where machines make their own boxes? Can we reverse the process and put things in boxes created by computers? Unsupervised learning, as it is called, creates a lot of interesting problems where we, humans, are left wondering whether the boxes make any sense.

The mother of all classification techniques is undisputedly Principal Component Analysis (PCA). But let me reassure those who hate PCA and those who have never heard of it: I will just scratch the surface, and then very briefly. PCA automatically arranges similar items close to each other on a plane. The rest is up to you. Similarity, in particular, depends on a bunch of arbitrary features: size, height, number of legs... In a classical introductory...

Genetics and racism (3)

In the previous posts of this series on genetics and racism, I talked about two recent academic disputes over human races. With this post I hope to give a wider overview of what biology has to say about species, breeds and races.

Darwin’s pigeons

Modern genetics was born in 1900 with the re-discovery of Mendel's laws. Until then, and ever since the Neolithic Revolution, genetics had been an empirical art. Our ancestors isolated most of the breeds of animals and plants that we know today, i.e. groups that carry a trait of interest to the next generation when crossed together (for instance Chihuahuas are small dogs and Great Danes are large dogs).

But over the generations, pedigrees got lost in the mists of time and the overwhelming differences between some breeds of the same species raised the question of whether they share the same natural origin. Before Darwin, it was difficult to imagine that Chihuahuas and Great Danes could have a common ancestor, and the theory went that breeds actually came from different species. This is actually one of the first questions tackled by Darwin in The Origin of Species. In the following passage, he exposes his...

Genetics and racism (2)

In the first post of this series on genetics and racism, I explained how Richard Lewontin concluded from his work on human diversity that human races are of no value for taxonomy (the classification of living beings). This view was later criticized and even termed Lewontin's fallacy by A. W. F. Edwards. Yet, nobody ever doubted that Lewontin was honest in his approach. But more recently came another case that gives the shivers. The great Stephen Jay Gould, the author of the acclaimed Mismeasure of Man, was accused of data manipulation.

The mismeasure of Gould

Stephen Jay Gould was the kind of scientist who pops up everywhere. I discovered him in a comment about the opinion of the Vatican on Evolution, others knew him for his statistical analyses of baseball records, while he was actually a paleontologist, author of the theory of punctuated equilibrium. But his most famous work is undoubtedly The Mismeasure of Man.

Like the author, the book is a strange chimera, somewhere in between scientific research and history, with a touch of lyricism. The Mismeasure of Man is a journey through the differences between people, or more precisely through the scientific discourse over this...

Genetics and racism (1)

Important note: Please read the Erratum at the end of the post.

Tolstoy’s remorse

It is 1879. Leo Tolstoy, then rich and famous for War and Peace and Anna Karenina, works on another kind of text. In A Confession he explains at length that he regrets writing those novels. The focus of his remorse and his anger towards himself is the heart of his talent, this innate sense of human nature. Tolstoy's pen had no equal when it came to painting the Russian society of the time, its characters and its culture. However, he explains that this attitude towards writing is wrong, because he has been telling without preaching, he has been describing without judging. He will even abandon the royalties of War and Peace and Anna Karenina, refusing to earn money from such immoral writings.

We were all then convinced that it was necessary for us to speak, write, and print as quickly as possible and as much as possible, and that it was all wanted for the good of humanity. And thousands of us, contradicting and abusing one another, all printed and wrote — teaching others. And without noticing that we knew nothing, and that...
