The Grand Locus / Life for statistical sciences



What is bioinformatics about?

A brief note published a few weeks ago initiated a discussion on the blogosphere about who is a bioinformatician and who is not. According to Wikipedia

bioinformatics combines computer science, statistics, mathematics, and engineering to study and process biological data.

To see how the community defines itself, I downloaded the abstracts of Bioinformatics from 2014 (for a total around 1000 articles, extracted the most relevant keywords and put the top 100 terms in a word cloud where a word size shows its frequency*.

Obviously, bioinformatics is about data, mostly gene/protein sequences and expression. It is also good to know that bioinformatics likes genomes and networks, and that it has more affinities with structural biology than with evolution.

The favourite organism of bioinformaticians is Homo sapiens, actually it is the only one mentioned in the word cloud, and when bioinformaticians work on a disease, it is cancer.

When bioinformaticians describe their work, it is “novel” and “new”, and what they talk about is “biological”, “different”, “multiple” and “single” (the last two are usually followed by “sequence alignment” and “nucleotide polymorphism”). It is also “functional” and “available”, but somewhat less. I expected it to be “fast” and “accurate”, but it seems that novelty is more important.

According to this word cloud, bioinformaticians are more concerned with proteins than with DNA (or RNA), with predictions and methods than with quality, with datasets than with research and with information than with algorithms.

This gives an idea of who bioinformaticians are, or at least what they claim to do.

* I used the Alchemy API, which does a very good job at extracting relevant keywords from a text. There were 1467 articles published in Bioinformatics in 2014, but Alchemy grants only 1000 free API calls per day. I kept only the keywords with higher than 80% relevance, which usually consist of several tokens, such as "piRNA-transposon interaction information" or "protein structures". The word cloud contains the 100 most abundant tokens of this list.

« | »

blog comments powered by Disqus

the Blog
Best of

the Lab
The team
Research lines

Blog roll
Simply Stats
Ivory Idyll
Bits of DNA