The Grand Locus / Life for statistical sciences


## Who understands the histone code?

The most annoying thing about us biologists is that we keep using words that we don’t understand. “Epigenetics” is one of those that has drawn my attention for several years, as I explained in my last post. I suggested that the invasion started in 2001, the year that the histone code hypothesis was proposed by Thomas Jenuwein and David Allis in a seminal paper entitled Translating the Histone Code.

The histone code hypothesis was arguably the most influential concept of the last decade in molecular biology. Yet most biologists would be hard pressed to say what the hypothesis is. It should not be that difficult: all you have to do is look up what Thomas Jenuwein and David Allis actually wrote. But believe it or not, this blog is one of the few places on the Internet where you will find the histone code hypothesis spelled out clearly. Most sources, including the Wikipedia article, diverge substantially from the original statement.

Distinct qualities of higher order chromatin, such as euchromatic or heterochromatic domains, are largely dependent on the local concentration and combination of differentially modified nucleosomes.

DNA in the nucleus comes in a structure called the nucleosome. The...

## The rise of epigenetics

I started to study biology at the time epigenetics became a buzzword. I first heard the term at university in 2001 and, like many young enthusiastic people of the time, I did my PhD on epigenetics because it was cool. But buzzes come and go: I finished my PhD and got bored with epigenetics. Meanwhile, I thought that my interest had been mirroring that of the community, and that the trend was towards a loss of interest in epigenetics. I was about to write a blog post entitled “The death of epigenetics” when I did a quick PubMed search and realized that the peak of popularity was... 2013. Epigenetics is not dead, it is on the rise!

Above is the number* of PubMed hits for “epigenetics” per month since 1996, with “chromatin” shown as a reference for comparison. PubMed now displays a histogram of the occurrence of your search term over the years (check here for epigenetics). The growth is not due to articles published in late-adopting journals, since the trend-setters Cell, Nature and Science published more than half of their papers labelled “epigenetics” in the last three years.
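For readers who want to try a similar count, here is a minimal sketch of how monthly PubMed hit counts could be requested through the NCBI E-utilities `esearch` endpoint. It is not necessarily how the figure above was produced: the helper names `count_url` and `parse_count` are made up for this illustration, and the sketch only builds the query URL and parses a response, leaving the actual download (and any normalization of the counts) to you.

```python
import calendar
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term: str, year: int, month: int) -> str:
    """Build an esearch URL asking for the number of PubMed hits for
    `term` among records published in the given month (hypothetical helper)."""
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",      # return only the hit count
        "datetype": "pdat",      # filter on publication date
        "mindate": f"{year}/{month:02d}/01",
        "maxdate": f"{year}/{month:02d}/{last_day:02d}",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(xml_text: str) -> int:
    """Extract the <Count> value from an esearch XML response."""
    return int(ET.fromstring(xml_text).findtext("Count"))


if __name__ == "__main__":
    # Print the URLs for one year; fetching them is left out on purpose.
    for month in range(1, 13):
        print(count_url("epigenetics", 2013, month))
```

Each URL can then be fetched with any HTTP client and the response passed to `parse_count`. If you query many months, pause between requests to respect NCBI's usage limits.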

One of...

## Meet planktonrules

Some of you may remember planktonrules from my series on IMDB reviews. For those of you who missed it, planktonrules is an outlier. In my attempt to understand what IMDB reviewers call a good movie, I realized that one reviewer in particular had written a lot of reviews. When I say a lot, I mean 14,800 in the last 8 years. With such a boon, I could not resist the temptation to use his reviews to analyze the variation of style between users, and to build a classifier that recognizes his reviews.

I finally got in contact with Martin Hafer (planktonrules’ real name) this year, and since he had planned to visit Barcelona, we set up a meeting in June. I have to admit that I expected him to be a sort of weirdo, or a cloistered sociopath. The reality turned out to be much more pleasant; we had an entertaining chat, speaking very little about movie reviews. He also pointed out to me that doing statistics on what people write on the Internet is a bit weird... True that.

Anyway, as an introduction, here is a mini interview of planktonrules. You can find out more...

## How to stop sucking and be awesome instead

I recently gave a motivational speech at the CRG/Institut Curie international PhD retreat. There was only one slide and the content was fairly general, so I thought I could reproduce it here. My goal was to motivate people, but also to surprise them a little, especially at the end. Finally, I wish such a nice title were mine, but I have to acknowledge Jeff Atwood. I stole it from his post on Coding Horror (which I also invite you to read).

### How to stop sucking and be awesome instead

Think about what we can do today. We can send people to the moon. We can talk to each other any time anywhere on the planet. We can go anywhere in about a day. We can transplant a heart. We can cure diseases that were fatal only 30 years ago. And yet, there is still one thing that we cannot do. We don’t know how to motivate people.

That’s right, we do not know how to make our colleagues enthusiastic about their work. If you watch a couple of TED videos or if you read a couple of books on management, you will see that we all...

## 100% non functional

### Panglossian genomics

Like most French students of my generation, I had to study Candide, a short philosophical novella written by Voltaire. Back then, I was convinced that Voltaire was an arrogant prick, and I never imagined that his dumb criticism of Leibniz's theory of pre-established harmony, which he barely understood, would ever echo in my work as a biologist.

But here we are: years have passed, I have made peace with Voltaire, and the ENCODE consortium has issued its major and controversial statement that they find “biochemical functions for 80% of the genome”. As the arguments and the comments flow on the blogs and in the academic press, I cannot help thinking about the words of Dr. Pangloss, the incarnation of narrow optimism.

Observe, for instance, the nose is formed for spectacles, therefore we wear spectacles. The legs are visibly designed for stockings, accordingly we wear stockings.

What I will call the Panglossian reading of the “80% functional” statement above is the idea that 80% of the genome is meant to be the way it is. The architecture of a given locus is somehow designed to produce what happens there (transcription, transcription enhancing, transcription factor binding etc). Notice...

Everybody in academia has a story about reviewer 3. If the words above sound familiar, you will definitely know what I mean, but for the others I should give some context. No decent scientific editor will agree to publish an article without taking advice from experts. This process, called peer review, is usually anonymous and opaque. According to an urban legend, reviewer 1 is very positive, reviewer 2 couldn't care less, and reviewer 3 is a pain in the ass. Believe it or not, the quote above is real, and it is all the review consists of. Needless to say, it was from reviewer 3.

For a long time, I wondered whether there is a way to trace the identity of an author through the text of a review. What methods do stylometry experts use to identify passages from the Q source in the Bible, or to know whether William Shakespeare had a ghostwriter?

### The 4-gram method

Surprisingly, the best stylistic fingerprints have little to do with literary style. For instance, lexical richness and complexity of the language are very difficult to exploit efficiently. The unconscious foibles...
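To give a concrete idea of what an n-gram fingerprint can look like, here is a minimal sketch. It assumes that “4-gram” means overlapping runs of four characters (it could also mean four consecutive words), and it compares two texts with cosine similarity, which is one common choice but not necessarily the method described in this post. The function names and the sample sentences are made up for the illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Count overlapping character n-grams after lowercasing the text
    and collapsing runs of whitespace into single spaces."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles.
    Returns 0.0 if either profile is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented sentences, only to show the calls.
    known = char_ngrams("The authors should rewrite the introduction entirely.")
    query = char_ngrams("The authors should redo the statistics entirely.")
    print(round(cosine_similarity(known, query), 3))
```

The score ranges from 0 (no 4-gram in common) to 1 (identical profiles). On texts as short as a one-line review it is very noisy, so a real attribution would need much longer samples from each candidate author.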

## Genetics and racism (3)

In the previous posts of this series on genetics and racism, I talked about two recent academic disputes over human races. With this post I hope to give a wider overview of what biology has to say about species, breeds and races.

### Darwin’s pigeons

Modern genetics was born in 1900 with the re-discovery of Mendel's laws. Until then, ever since the Neolithic Revolution, genetics had been an empirical art. Our ancestors isolated most of the breeds of animals and plants that we know today, i.e. groups that carry a trait of interest to the next generation when crossed together (for instance Chihuahuas are small dogs and Great Danes are large dogs).

But over the generations, pedigrees got lost in the mists of time, and the overwhelming differences between some breeds of the same species raised the question of whether they share the same natural origin. Before Darwin, it was difficult to imagine that Chihuahuas and Great Danes would have a common ancestor, and the theory went that breeds actually came from different species. This is one of the first questions tackled by Darwin in The Origin of Species. In the following passage, he sets out his...

## ... and academic reprints for all

Like many other academic journals, Molecular and Cellular Biology takes copyright very seriously. And to trace the criminals who share scientific publications funded by public institutions, they add the date and the identity of the license owner to the margin of the pdf reprints downloaded from their website.

I recently heard that some people downloaded and installed the pdf toolkit pdftk and, at the Linux terminal, issued a command like the one below, where they replaced article.pdf with the name of the pdf they had downloaded.

pdftk article.pdf output uncompressed-article.pdf uncompress

Using their text editor, they opened the uncompressed pdf file and looked for lines like the ones below and commented them out with a % sign.

10 0 0 10 0 0 cm BT/R19 11 Tf0 -1 1 0 579.5 456.847 Tm[( on some day by Institution of the Evil Person)556]TJ-94.148 0 Td[(http://mcb.asm.org/)278]TJ-89.2543 0 Td[(Downloaded from )278]TJET

They then ran pdftk again to fix the pdf document, and the download information was gone.

pdftk uncompressed-article.pdf output stripped-article.pdf

Needless to say, doing...

## Genetics and racism (2)

In the first post of this series on genetics and racism, I explained how Richard Lewontin concluded from his work on human diversity that human races are of no value for taxonomy (the classification of living beings). This view was later criticized and even termed Lewontin's fallacy by A. W. F. Edwards. Yet, nobody ever doubted that Lewontin was honest in his approach. But more recently came another case that gives the shivers. The great Stephen Jay Gould, the author of the acclaimed Mismeasure of Man, was accused of data manipulation.

### The mismeasure of Gould

Stephen Jay Gould was the kind of scientist who pops up everywhere. I discovered him in a comment about the opinion of the Vatican on Evolution, others knew him for his statistical analyses of baseball records, while he was actually a paleontologist, author of the theory of punctuated equilibrium. But his most famous work is undoubtedly The Mismeasure of Man.

Like the author, the book is a strange chimera, somewhere in between scientific research and history, with a touch of lyricism. The Mismeasure of Man is a journey through the differences between people, or more precisely through the scientific discourse over this...

## Genetics and racism (1)

Important note: Please read the Erratum at the end of the post.

### Tolstoy’s remorse

It is 1879. Leo Tolstoy, then rich and famous for War and Peace and Anna Karenina, works on another kind of text. In A Confession he explains at length that he regrets writing those novels. The focus of his remorse and his anger towards himself is the heart of his talent, this innate sense of human nature. Tolstoy's pen had no equal when it came to painting the Russian society of the time, its characters and its culture. However, he explains that this attitude towards writing is wrong, because he has been telling without preaching, he has been describing without judging. He will even give up the royalties of War and Peace and Anna Karenina, refusing to earn money from such immoral writings.

We were all then convinced that it was necessary for us to speak, write, and print as quickly as possible and as much as possible, and that it was all wanted for the good of humanity. And thousands of us, contradicting and abusing one another, all printed and wrote — teaching others. And without noticing that we knew nothing, and that...