The Grand Locus / Life for statistical sciences



Bioinformatics without Excel

About a year after setting up my laboratory, an observation suddenly hit me. All the job applicants were biologists who wanted to do bioinformatics. I was myself trained as an experimental biologist and started bioinformatics during my post-doc. They saw in my laboratory the opportunity to do the same. Indeed, “how did you become a bioinformatician?” is a question that I hear very often.

For lack of a better plan, most people grab a book about Linux or sign up for a Coursera class, try to do a bit every day and... well, just learn bioinformatics. I have seen extremely few people succeed this way. The content inevitably becomes too difficult, motivation decreases and other commitments take over. I will not lie, self-learning bioinformatics is hard and it is frustrating... but it can be fun if you know how to do it. And most importantly, if you understand your worst enemy: yourself.

Here is a small digest of how it happened for me. I do not mean that this is the only way. I simply hope that this will be useful to those who seriously want to dive into bioinformatics.

Step 1. Get out of your comfort zone

I am happy to celebrate my fifth year without using Excel (actually I don’t even have it). What is wrong with Excel? Not much really. Only that it keeps you thinking like a non-bioinformatician. Bioinformatics is not a tool, it is another way of thinking, and if you want to acquire it you have to let go of the other ways.

Most people give up when they get lost. Ironically, this is the moment when they were finally learning something new. If you feel totally incompetent, you are doing it right. The risk is to seek reassurance by doing things the way you are used to (say, using Excel). There is nothing wrong with being a beginner, even if you are an expert at something else. If you want to dive, you first have to sink.

Getting out of my comfort zone happened naturally at the beginning of my post-doc. I found a Linux computer on my desk, and I was too proud to ask for a Mac instead. All I managed to do in the first week was to change my username from user to guillaume. Things did not look very promising...

Step 2. Become addicted

Seriously? Yes! If you are about to learn something as demanding as bioinformatics, you need to spend the time it deserves, and the only way it can happen is if it becomes an addiction.

By addiction, I mean anything that takes self-control to stop. Who is addicted to bioinformatics exercises? Nobody, so this is not the right way. You do not need to like something in order to become addicted to it, you just need to want it. The best way is to find a problem that really fascinates you and to try to solve it using bioinformatics. This could be anything, as long as you are really interested in finding the solution.

The first problem that caught my attention was to identify ‘sticky proteins’ in two-hybrid datasets. It was nothing special, nor was it difficult, I just got into it. It took me an incredible amount of sweat to write a few lines of lousy Perl code, but in the end it worked. This was the first of a long series of addictions at the computer. I enjoyed each and every one of them, and I still do.
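
To give a flavor of what such a first problem can look like, here is a minimal Python sketch (it is not the original Perl code, which is not shown in this post). It assumes that a ‘sticky’ protein is simply one that turns up with unusually many distinct partners. The function name, the input format (a list of bait/prey pairs) and the threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def find_sticky_proteins(interactions, min_partners=3):
    """Return (protein, partner_count) pairs for proteins that have at
    least `min_partners` distinct partners, most connected first.

    `interactions` is an iterable of (bait, prey) name pairs. Both the
    format and the threshold are assumptions made for this sketch.
    """
    partners = defaultdict(set)
    for bait, prey in interactions:
        if bait == prey:
            continue  # Ignore self-interactions.
        # Treat the interaction as symmetric; sets drop duplicate pairs.
        partners[bait].add(prey)
        partners[prey].add(bait)
    counts = {name: len(found) for name, found in partners.items()}
    sticky = [(name, n) for name, n in counts.items() if n >= min_partners]
    # Sort by decreasing partner count, then by name for a stable order.
    return sorted(sticky, key=lambda item: (-item[1], item[0]))


# Toy data, invented for the example.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("A", "B")]
print(find_sticky_proteins(pairs, min_partners=3))
```

On a real dataset a fixed threshold is probably too crude, and comparing each count with the distribution over all proteins would be a more sensible next step. The point is only that a problem you care about can start with a dozen lines.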

Step 3. Join the community

You made it this far, well done! There is only one last roadblock on the way, but it is a bulky one. So far you have been self-learning, it is now time to non-self-learn. Actually, it is time to learn bioinformatics because, let’s be honest, you haven’t learned anything about it yet.

Learning by doing and by solving problems that you care about is the right attitude, but you cannot reinvent 50 years of work on your own. Solving your problems will give you an impression of competence, which will give you the motivation you need to carry on... but most likely you are doing it all wrong. Getting out of the Dunning-Kruger hole is always hard and there is only one way to do it: expose yourself.

Or more accurately, expose your work. This will hurt in the beginning, but you need to share your work with experts, or at least people who know better, and soak up the negative feedback. You also need to read their code and take inspiration from it. Reading someone else’s code is extremely hard, but it is the only way to learn real bioinformatics.

It took me four long years to realize that I was no longer making any progress on my own. Things changed with this blog. In the back end, it runs as a Python app adapted from Nick Johnson. It took me a considerable amount of time to make my way through his code, but this is how I learned most of what I know about web apps. Later I became involved in Cross Validated, which was also the occasion to realize that I am not as good at statistics as I thought and that there is so much I can learn from others.

Final words

You probably noticed that this post is not a tutorial: I am not saying anything about which book to read, which Coursera class to take, or which website to visit. I do not have particular recommendations. There is plenty of material online, and you need to find what works for you. If you happen to know good sources, do not hesitate to suggest them in the comments below.
