The Grand Locus / Life for statistical sciences


the Blog

What is bioinformatics about?

A brief note published a few weeks ago initiated a discussion on the blogosphere about who is a bioinformatician and who is not. According to Wikipedia

bioinformatics combines computer science, statistics, mathematics, and engineering to study and process biological data.

To see how the community defines itself, I downloaded the abstracts of Bioinformatics from 2014 (around 1000 articles in total), extracted the most relevant keywords and put the top 100 terms in a word cloud where the size of a word shows its frequency*.
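The counting step behind such a word cloud can be sketched in a few lines of Python. This is only an illustration: it assumes that "most relevant" means "most frequent after removing common stop words", and the stop-word list and sample abstracts are made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real analysis needs a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for",
              "is", "are", "we", "on", "with", "that", "this"}

def top_terms(abstracts, k=100):
    """Return the k most frequent non-stop-word terms across all abstracts."""
    counts = Counter()
    for text in abstracts:
        # Lowercase the text and keep alphabetic runs only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(k)

# Made-up abstracts standing in for the real corpus.
sample = [
    "We present a novel method for sequence alignment of protein data.",
    "A new tool for the analysis of gene expression data.",
    "Protein structure prediction with sequence data.",
]
print(top_terms(sample, k=5))
```

The resulting (term, count) pairs are what a word cloud tool would use to scale each word.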

Obviously, bioinformatics is about data, mostly gene/protein sequences and expression. It is also good to know that bioinformatics likes genomes and networks, and that it has more affinities with structural biology than with evolution.

The favourite organism of bioinformaticians is Homo sapiens. Actually, it is the only one mentioned in the word cloud. And when bioinformaticians work on a disease, it is cancer.

When bioinformaticians describe their work, it is “novel” and “new”, and what they talk about is “biological”, “different”, “multiple” and “single” (the last two are usually followed by “sequence alignment” and “nucleotide polymorphism”). It is also “functional” and “available”, but somewhat less. I expected it to be “fast” and “accurate”, but...

A flurry of copycats on PubMed

It started with a search for trends on PubMed. I am not sure what I expected to find, but it was nothing like the “CISCOM meta-analyses”. Here is the story of how my colleague Lucas Carey (from Universitat Pompeu Fabra) and I discovered a collection of disturbingly similar scientific papers, and how we got to the bottom of it.

Pattern breaker

CISCOM is the medical publication database of the Research Council for Complementary Medicine. Available since 1995, it used to be mentioned in 2 to 3 papers per year, until February 2014 when the number of hits started to skyrocket. Since then, “CISCOM” surfs a tsunami of one new hit per week.

But this is not what drew my attention: such waves are not unheard of on PubMed. For instance, the progression of CRISPR/Cas9 is more impressive. It was the titles of the hits that convinced me that something fishy was going on: all of them follow the model “something and something else: a meta-analysis”.

The strange pattern caught my attention, but I somehow missed its significance and put it at the back of my mind. It was only later that Lucas convinced me...

Detecting trends in culture

On June 28, 1914, Archduke Franz Ferdinand was assassinated in Sarajevo. One month later, Austria-Hungary declared war on Serbia, to which Russia responded by declaring war on Austria-Hungary, forcing its allies France and Great Britain into the war. In the aftermath, Germany honoured its defensive pact with Austria-Hungary and declared war on France, plunging Europe into a chaos that nobody had predicted.

Cliodynamics, the mathematical approach to History, still has a long way to go to reach the accuracy of Isaac Asimov’s fictional psychohistory. Its closest non-science-fiction relative, culturomics, relies on the idea that historical trends are accessible through the digital literature. As Jean-Baptiste Michel explains on TED, the course of History leaves a strong mark on the things we write about, and on the way we write about them.

But historical events are not the only thing we write about. The digital records are mostly about anything we find interesting. Knowing what is talked about is not science fiction; it is actually fairly easy. More challenging is to know whether a topic is currently on the rise or merely fluctuating, which is a changepoint detection problem. Research on changepoint problems...
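To make the idea of a changepoint concrete, here is a minimal sketch of one of the simplest formulations: find the single split of a series of counts that minimizes the total squared error around the two segment means. It is an illustration under that assumption, not the method used in the post, and the series of hits is made up. Note that it always returns a split, even for a flat series, so deciding whether the change is real still requires comparing against a no-change model.

```python
def best_changepoint(counts):
    """Return the index k (1 <= k < n) that best splits counts into two
    segments with different means, by minimizing the total squared error."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment):
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    # Try every possible split and keep the one with the lowest total error.
    return min(range(1, n), key=lambda k: sse(counts[:k]) + sse(counts[k:]))

# Made-up yearly hit counts: a few hits per period, then a jump.
hits = [2, 3, 2, 3, 2, 3, 9, 11, 10, 12]
print(best_changepoint(hits))
```

The returned index is the position of the first observation after the change.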

The rise of epigenetics

I started to study biology at the time epigenetics became a buzzword. I first heard the term at university in 2001, and like many young enthusiastic people of the time, I did my PhD on epigenetics because it was cool. But buzzes come and go: I finished my PhD and I got bored with epigenetics. Meanwhile, I thought that my interest had been mirroring that of the community, and that the trend was towards a loss of interest in epigenetics. I was about to write a blog post entitled “The death of epigenetics” when I did a quick PubMed search and realized that the peak of popularity was... 2013. Epigenetics is not dead, it is on the rise!

Above is the number* of PubMed hits for “epigenetics” per month since 1996, with “chromatin” shown as a reference for comparison. PubMed now displays a histogram of the occurrence of your search term over the years (check here for epigenetics). The growth is not due to articles published in late-adopting journals, since the trend-setters Cell, Nature and Science published more than half of their papers labelled “epigenetics” in the last three years.

What is epigenetics anyway?

One of...

What’s in a title?

Trying to come up with a name for the blog, I wondered what a good title should be. If you have ever written a scientific article, you probably found yourself in the same situation. You try to surf the trend, mix in carefully selected buzzwords and present the work under its sexiest side. Sexy, that is, to the veterans. Admittedly, not everyone will be eager to read “Epithelial cell adhesion molecule (EpCAM) complex proteins promote transcription factor-mediated pluripotency reprogramming” (no offense intended, I just took the first title that showed up in PubMed).

Meta-analysis of scientific literature tells us a lot about how science and scientific discourse change over time. A simple title word analysis of the articles published in Nature over an 8-year interval shows how some topics fell from grace, whereas others rose to the top.

The struggle-for-hype allows us to tell what scientists and editors find exciting at a given time. To play with this idea, I collected all the titles of the Nature articles published in 2002 and in 2010, and ran Wordle on them. The size of a word in the cloud is proportional to its occurrence in the corpus...
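The comparison between two years can also be done numerically rather than by eye. The sketch below is a rough illustration, not what Wordle does: it computes each word's share of all title words in two sets of titles and ranks words by how much their share increased. The function names and the titles are made up for the example.

```python
import re
from collections import Counter

def word_shares(titles):
    """Map each word to its share of all words in the given titles."""
    words = [w for t in titles for w in re.findall(r"[a-z]+", t.lower())]
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def biggest_risers(old_titles, new_titles, k=3):
    """Return the k words whose share grew the most from old to new titles."""
    old, new = word_shares(old_titles), word_shares(new_titles)
    # A word absent from the old titles counts as a share of zero.
    gains = {w: new[w] - old.get(w, 0.0) for w in new}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Made-up titles standing in for the two years of articles.
old = ["genome of the mouse", "genome sequence"]
new = ["epigenetic control", "epigenetic memory of cells"]
print(biggest_risers(old, new, k=1))
```

Using shares rather than raw counts keeps the comparison fair when the two years do not contain the same number of articles.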
