It is summer. Edgar and Sofia are sitting comfortably on the terrace, watching the beautiful light at the end of the day. Edgar starts:
“Let’s play a game to see who is the better statistician! Immanuel, my cat, will give each of us a secret number strictly greater than zero. The other person will have to guess it.”
“How are we going to guess?”
“Let’s say that the secret numbers are the means of some Poisson variable. We generate samples at random. The one who gets the closest estimate by dinner time wins.”
“That sounds easy! Will Immanuel give us the same number?”
“What is the fun in that? Let’s ask him to give two different numbers. You know what to do. Just give me your first sample whenever you are ready and I will try to guess your secret number.”
Immanuel whispers something in Sofia’s ear and then does the same with Edgar. Sofia opens her laptop and after a few keystrokes she says, “The first number I have for you is 1.”
“OK, I give up. You win.”
Sofia is puzzled at first, but then she notices how Immanuel is rolling...
This post is the third part of a tutorial on t-SNE. The first part introduces dimensionality reduction and presents the main ideas of t-SNE. The second part introduces the notion of perplexity. The present post covers the details of the nonlinear embedding.
On the origins of t-SNE
If you are following the field of artificial intelligence, the name Geoffrey Hinton should sound familiar. As it turns out, the “Godfather of Deep Learning” is the author of both t-SNE and its ancestor SNE. This explains why t-SNE has a strong flavor of neural networks. If you already know gradient descent and variational learning, then you should feel at home. Otherwise, no worries: we will keep it relatively simple and we will take the time to explain what happens under the hood.
We have seen previously that t-SNE aims to preserve a relationship between the points, and that this relationship can be thought of as the probability of hopping from one point to the other in a random walk. The focus of this post is to explain what t-SNE does to preserve this relationship in a space of lower dimension.
I recently left Barcelona after spending nearly nine years in the company of wonderful people who supported me and helped me carry forward my teaching and my research. At a goodbye dinner, I was surprised that a friend insisted that I should explain how I learned math, that it would be useful and inspirational.
I had never thought about it. I guess what he meant is that I should explain why formal computational approaches are so important for me, given that I do not have any diploma in mathematics or statistics.
Doing the math
I have been doing one hour of math every day for 23 years. I will get to the how, but for now I want to discuss the why. Sticking to this schedule rigorously means that you do 365 hours of math every year. At the time I studied (in biology), you had to take ~500 hours of class per year for three years in order to graduate, so you were considered to reach the graduate level after ~1500 hours of study. With one hour per day, it will take you a little more than four years.
At the graduate level, we had about...
When I established my lab and started to recruit people, I thought that it would be interesting to gather some information about what makes a good or a bad scientist. To this end, I designed a short questionnaire with eight questions. There was no right or wrong answer, nor even a preferred one. Those were just questions to help me know the candidates better.
The first question was “What is the most important quality of a scientist?” I had no particular expectation. Actually, I did not even know my own answer to this question. As it turned out, most candidates answered that it was either creativity or persistence.
If you have been in science for even a short while, you know why this makes sense. We have complicated problems to solve, so creativity and persistence are important. Yet, I was not convinced that a good scientist is someone who is either very creative or very persistent. The reason is that neither of these qualities defines a scientist. Artists, politicians, business people, social workers and pretty much everyone else greatly benefit from being creative or persistent.
Having spent more time with scientists, I came to find the answer to my...
Literature discussions were usually very quiet in the laboratory, but somehow, this article had sparked a debate. Linda thought it was very bad. Albert liked it very much. Kate, the PI, was undecided. At some point the discussion stalled, so Kate made a move to wrap up.
“So, Linda, why do you think the article is bad?”
“Because they are missing a thousand controls.”
“OK. Albert, why do you like this article?”
“I find their model in figure 6 really cool. Actually, if it is true, it…”
“Precisely my point!” interrupted Linda. “It’s pure speculation!”
“Albert, you describe figure 6 as a model. What makes it a model?”
Albert spoke after a pause.
“It’s an idealized summary of their findings.”
“Fantasized, you mean!” replied Linda.
Kate ignored the point and turned to Linda.
“Linda, do you think that figure 6 is a model?”
“Of course not! It’s just speculation.”
“Now I have a question for you, Albert: what is the difference between a model and a summary?”
While Albert was thinking, Kate continued.
“And I also have a question for you, Linda: what is the difference between speculation and assumption?”
Now they were...
In this post I explain what perplexity is and how it is used to parametrize t-SNE. This post is the second part of a tutorial on t-SNE. The first part introduces dimensionality reduction and presents the main ideas of t-SNE. This is where you should start if you are not already familiar with t-SNE.
What is perplexity?
Before you read on, pick a number at random between 1 and 10 and ask yourself whether I can guess it. It looks like my chances are 1 in 10, so you may think “no, there is no way”. In fact, there is a 28% chance that you chose the number 7, so my chances of guessing are higher than you may have thought initially. In this situation, the random variable is somewhat predictable but not completely. How could we quantify that?
To answer this question, let us count the possible samples from this distribution. We ask $N$ people to choose a number at random between 1 and 10 and we record their answers $(x_1, x_2, \ldots, x_N)$. The number 1 shows up with probability $p_1 = 0.034$ so the total in the sample is approximately $(n_1...
The story of the Kullback-Leibler divergence starts in a top-secret research facility. In 1951, a few years after World War II, Solomon Kullback and Richard Leibler were working as cryptanalysts for what would soon become the National Security Agency. Three years earlier, Claude Shannon had shaken the academic world by formulating the modern theory of information. Kullback and Leibler immediately saw how this could be useful in statistics and they came up with the concept of information for discrimination, now known as relative entropy or Kullback-Leibler divergence.
The concept was introduced in an original article, and later expanded by Kullback in the book Information Theory and Statistics. It has now found applications in most aspects of information technology, and most prominently in artificial neural networks. In this post, I want to give an advanced introduction to this concept, hoping to make it intuitive.
The original motivation given by Kullback and Leibler is still the best way to expose the main idea, so let us follow their rationale. Suppose that we hesitate between two competing hypotheses $H_1$ and $H_2$. To make things more concrete, say that we have an encrypted message $x$ that may come from two possible...
Among the things that make science unique is the fact that scientists agree on what they say. There can be disagreement, but it is always understood as a temporary state, because either someone will be proven wrong, or new information will eventually reconcile everyone. Agreement is enforced in many ways, but pre-publication peer review is currently the dominant process, and it has been for over a century.
It is surprising that so little information is available about the efficiency of the peer review process. For instance, there is barely any justification as to why it is by default anonymous. Even more surprising is that people who express their opinion in this regard do not back it up with empirical evidence, because there is essentially no data. Let me clarify something: I do not have any data to show. But I have been signing my reviews for over seven years and I am happy to share this experience with those who wonder what happens when you do this.
How did it start?
I was first contacted by editors to review manuscripts at the time Stack Overflow eclipsed nearly all the forums on the Internet. The forums were supposed...
In this tutorial, I would like to explain the basic ideas behind t-distributed Stochastic Neighbor Embedding, better known as t-SNE. There is plenty of excellent material out there explaining how t-SNE works. Here, I would like to focus on why it works and what makes t-SNE special among data visualization techniques.
If you are not comfortable with formulas, you should still be able to understand this post, which is intended to be a gentle introduction to t-SNE. The next post will peek under the hood and delve into the mathematics and the technical detail.
One thing we all agree on is that we each have a unique personality. And yet it seems that five character traits are sufficient to sketch the psychological portrait of almost everyone. Surely, such portraits are incomplete, but they capture the most important features to describe someone.
The so-called five factor model is a prime example of dimensionality reduction. It represents diverse and complex data with a handful of numbers. The reduced personality model can be used to compare different individuals, give a quick description of someone, find compatible personalities, predict possible behaviors etc. In many...
The key to success is to choose the right people for your team. That’s what everybody will tell you. But if you ask “how?”, things will get a little more complicated. Practitioners admit that this is a tough problem, and you cannot expect to win all the time. Google attempted to answer the question with the so-called Project Aristotle. After sifting a huge amount of data through the best algorithms, they concluded that you can ignore gender, age, culture, education and pretty much everything else... What makes a good team is not good people, it is good interactions.
This is great news, but it does not really answer the question of how to choose the right people for your team. How does one know that a candidate will interact well with the team? Those were the questions I had in mind when I had to assemble my scientific team. Like almost everyone, I did not have any kind of training in this regard and I was not really prepared. So I sought advice from everybody who would care to give me their opinion, be they colleagues, friends or family members. In the process, I did...