Bayes' Theorem and Bayesian Inference

Interactive Bayesian Grid

  • Prevalence of the disease
  • Sensitivity of the test

Questions:

  1. What is Bayes' formula?
  2. In the grid below, each dot represents a client. Shaded dots indicate clients who tested positive; red dots (row \(H\)) indicate infected clients.
  3. From the grid, identify and calculate the following:
    • Prior probability
    • Likelihood
    • False positive rate
    • Marginal Likelihood
    • Posterior probability
  4. Verify your result with Bayes' formula.
\(H\)🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴
\(\neg H\)🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
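The quantities asked for above can all be obtained by counting dots. The grid fixes the prior (10 red dots out of 100), but which dots are shaded depends on the interactive sliders, so the shaded counts in this sketch are hypothetical placeholders:

```python
# Sketch: computing the grid quantities by counting dots.
# The 10x10 grid fixes the prior (10 red / 100 dots); the shaded
# (positive-test) counts below are hypothetical placeholders.

total = 100
infected = 10                   # red dots (H)
positive_infected = 9           # hypothetical: shaded red dots
positive_healthy = 5            # hypothetical: shaded green dots

prior = infected / total                                   # P(H)
likelihood = positive_infected / infected                  # P(E|H)
false_positive_rate = positive_healthy / (total - infected)  # P(E|~H)
marginal = (positive_infected + positive_healthy) / total  # P(E)

# Posterior by direct counting: among all shaded dots, how many are red?
posterior = positive_infected / (positive_infected + positive_healthy)

# Verify against Bayes' formula: P(H|E) = P(H) * P(E|H) / P(E)
assert abs(posterior - prior * likelihood / marginal) < 1e-12
```

With different slider settings, only the two shaded counts change; the counting logic stays the same.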

Explanations

The best way to memorize Bayes' Theorem is by rearranging terms from the definition of conditional probability.

Always remember

\(P(A|B) = \frac{P(A \cap B)}{P(B)}\)

This definition allows us to "split" \(P(A \cap B)\) by conditioning on either \(A\) or \(B\):

\(P(A \cap B) = P(A|B)\,P(B)\)

\(P(A \cap B) = P(B|A)\,P(A)\)

Using the above properties on \(H\) and \(E\):

\( \begin{aligned} P(H|E) &= \frac{P(H \cap E)}{P(E)} \\ &= \frac{P(E|H) \cdot P(H)}{P(E)} \\ \end{aligned} \)

Or:

\(P(H|E) = P(H) \cdot \frac{P(E|H)}{P(E)}\)
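Because the theorem is just a rearrangement of the conditional-probability definition, it can be checked numerically from any joint distribution over \((H, E)\). A minimal sketch (the joint probabilities below are arbitrary, chosen only to sum to 1):

```python
# Verify P(H|E) = P(H) * P(E|H) / P(E) from an arbitrary joint distribution.
# joint[(h, e)] = P(H=h, E=e); the values are arbitrary but sum to 1.
joint = {
    (True, True): 0.0098,
    (True, False): 0.0002,
    (False, True): 0.0495,
    (False, False): 0.9405,
}

p_H = joint[(True, True)] + joint[(True, False)]  # P(H), marginal over E
p_E = joint[(True, True)] + joint[(False, True)]  # P(E), marginal over H

# Both conditionals come straight from the definition P(A|B) = P(A,B)/P(B).
p_H_given_E = joint[(True, True)] / p_E
p_E_given_H = joint[(True, True)] / p_H

# Bayes' theorem: the two sides agree.
assert abs(p_H_given_E - p_H * p_E_given_H / p_E) < 1e-12
```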

This theorem allows us to:

  • flip \(P(H|E)\) to \(P(E|H)\), which is usually easier to find.
  • "update" the probability of a hypothesis from new evidence, if you think of \(H\) as the Hypothesis we want to test and \(E\) as the Evidence we obtained.

Example: Patient Testing Positive for a Disease

A patient is about to be tested for a certain disease.

  • \(H\)ypothesis: The patient has a disease.
  • \(E\)vidence: The patient tests positive.

Let's assume:

  • \(P(H) = 1\%\) (1% prevalence). The probability of the hypothesis before seeing the evidence, called the **prior probability**, or just the **prior**.
  • \(P(E|H) = 98\%\) (98% sensitivity). The probability of the evidence given that the hypothesis is true, called the **likelihood**.
  • \(P(E|\neg H) = 5\%\) (5% false positive rate). The probability of the evidence given that the hypothesis is false. It is the **false positive rate**.

The **posterior (probability)**, \(P(H|E)\), the probability of the hypothesis after seeing the evidence, is given by:

\(P(H|E) = P(H) \cdot \frac{P(E|H)}{P(E)}\)

Note that to calculate \(P(E)\), the **marginal likelihood**, we need to split into two cases (or more, depending on the question), \(H\) true or \(H\) false, and add up the probabilities. This is the law of total probability.

\( \begin{aligned} P(H|E) &= P(H) \cdot \frac{P(E|H)}{P(E|H) \cdot P(H) + P(E|\neg H) \cdot P(\neg H)} \\ &= 1\% \cdot \frac{98\%}{98\% \cdot 1\% + 5\% \cdot 99\%} \\ &\approx 1\% \cdot 16.53 \\ &\approx 16.53\% \end{aligned} \)

So, given a positive test result, the probability of having the disease is approximately 16.53%.
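The derivation above can be reproduced in a few lines, which also makes it easy to see how the posterior responds when the prevalence or test characteristics change:

```python
# The worked example above, computed directly.
p_H = 0.01              # prior: 1% prevalence
p_E_given_H = 0.98      # likelihood: 98% sensitivity
p_E_given_not_H = 0.05  # false positive rate: 5%

# Marginal likelihood P(E) via the law of total probability.
p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)

posterior = p_H * p_E_given_H / p_E
print(f"P(H|E) = {posterior:.4f}")  # 0.1653, i.e. about 16.53%
```

Try raising `p_H` to, say, 0.10: the same test then yields a much higher posterior, which is exactly the base-rate effect discussed next.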

This is very counter-intuitive. You are tested positive, but the chance of having the disease is only 16.53%! Why is that?

The reason is that the prior probability \(P(H) = 1\%\) of the hypothesis, the **base rate**, is very low here. The evidence multiplied it by a factor of about \(16.5\), but the result is still low.

Failing to consider the base rate is called the **base rate fallacy**. It is a very common cognitive bias: the exact probabilities are hard to calculate, while "how representative is this event of the hypothesis?" is much easier to answer, and we tend to switch to the easier question unconsciously. This is related to a concept called the **representativeness heuristic**.