Author: Jason Brownlee
Probability for a single random variable is straightforward, although it can become complicated when considering two or more variables.
With just two variables, we may be interested in the probability of two simultaneous events, called the joint probability; the probability of one event given the occurrence of another event, called the conditional probability; or just the probability of an event regardless of other variables, called the marginal probability.
These types of probability are easy to define but the intuition behind their meaning can take some time to sink in, requiring some worked examples that can be tinkered with.
In this tutorial, you will discover the intuitions behind calculating the joint, marginal, and conditional probability.
After completing this tutorial, you will know:
- How to calculate joint, marginal, and conditional probability for independent random variables.
- How to collect observations from joint random variables and construct a joint probability table.
- How to calculate joint, marginal, and conditional probability from a joint probability table.
Discover Bayes optimization, naive Bayes, maximum likelihood, distributions, cross entropy, and much more in my new book, with 28 step-by-step tutorials and full Python source code.
Let’s get started.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Joint, Marginal, and Conditional Probabilities
- Probabilities of Rolling Two Dice
- Probabilities of Weather in Two Cities
Joint, Marginal, and Conditional Probabilities
Calculating probability is relatively straightforward when working with a single random variable.
It gets more interesting when considering two or more random variables, as we often do in many real world circumstances.
There are three main types of probabilities that we may be interested in calculating when working with two (or more) random variables.
Briefly, they are:
- Joint Probability. The probability of simultaneous events.
- Marginal Probability. The probability of an event irrespective of the other variables.
- Conditional Probability. The probability of events given the presence of other events.
The meaning and calculation of these different types of probabilities vary depending on whether the two random variables are independent (simpler) or dependent (more complicated).
We will explore how to calculate and interpret these three types of probability with worked examples.
In the next section, we will look at the independent rolls of two dice, and in the following section, we will look at the occurrence of weather events in two geographically close cities.
Probabilities of Rolling Two Dice
A good starting point for exploring joint and marginal probabilities is to consider independent random variables as the calculations are very simple.
The roll of a fair die gives a one in six (1/6), or 0.166 (16.666%), probability of each number 1 to 6 coming up.
- P(dice1=1) = 1/6
- P(dice1=2) = 1/6
- P(dice1=3) = 1/6
- P(dice1=4) = 1/6
- P(dice1=5) = 1/6
- P(dice1=6) = 1/6
If we roll a second die, we get the same probability for each value on that die. Each event for a die has an equal probability, and the rolls of dice1 and dice2 do not affect each other.
- P(dice1={1,2,3,4,5,6}) = 1.0
- P(dice2={1,2,3,4,5,6}) = 1.0
First, we can calculate the probability of rolling an even number for dice1 as the sum of the probabilities of rolling a 2, 4, or 6, for example:
- P(dice1={2, 4, 6}) = P(dice1=2) + P(dice1=4) + P(dice1=6)
- P(dice1={2, 4, 6}) = 1/6 + 1/6 + 1/6
This is 0.5 or 50% as we might intuitively expect.
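As a quick check, we can reproduce this sum in a few lines of Python (a minimal sketch of the arithmetic, not a required implementation):

```python
# probability of each face on a fair die
p_face = 1 / 6

# P(dice1 is even) = P(dice1=2) + P(dice1=4) + P(dice1=6)
p_even = p_face + p_face + p_face
print(p_even)  # 0.5
```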
Now, we might consider the joint probability of rolling an even number with both dice simultaneously. The joint probability for independent random variables is calculated as follows:
- P(A and B) = P(A) * P(B)
This is calculated as the probability of rolling an even number for dice1 multiplied by the probability of rolling an even number for dice2. Because the rolls are independent, the probability of the first event does not constrain the probability of the second event.
- P(dice1={2, 4, 6} and dice2={2, 4, 6}) = P(dice1={2, 4, 6}) * P(dice2={2, 4, 6})
We know that the probability of rolling an even number on each die is 3/6 or 0.5. Plugging that in, we get 0.5 * 0.5, or 0.25, a 25% probability.
Another way to look at this is to consider that rolling one die gives 6 outcomes. Rolling two dice together gives 6 outcomes for dice2 for each of the 6 outcomes of dice1, or (6 x 6) 36 combinations. A total of 3 of the 6 outcomes of dice1 will be even, and 3 of the 6 outcomes of dice2 will be even. That gives (3 x 3) 9 out of the 36 combinations with an even number on each die, or (9/36 = 0.25) 25%.
Tip: If you are ever in doubt of your probability calculations when working with independent variables with discrete events, think in terms of combinations and things will make sense again.
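In that spirit, the multiplication above can be verified by brute force. The sketch below (one way to do it in Python) computes the product and then enumerates all 36 combinations directly:

```python
from itertools import product

# probability of an even number on a single fair die
p_even = 3 / 6

# joint probability for independent events: P(A and B) = P(A) * P(B)
print(p_even * p_even)  # 0.25

# sanity check: enumerate all 36 combinations of two dice
pairs = list(product(range(1, 7), repeat=2))
both_even = [pair for pair in pairs if pair[0] % 2 == 0 and pair[1] % 2 == 0]
print(len(both_even), len(pairs))   # 9 36
print(len(both_even) / len(pairs))  # 0.25
```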
We can construct a table of the joint probabilities based on our knowledge of the domain. The complete table is listed below with dice1 across the top (x-axis) and dice2 along the side (y-axis). The joint probability of the events for a given cell is calculated using the joint probability formula, e.g. 0.166 * 0.166 = 0.027, or about 2.777%.
```
       1      2      3      4      5      6
1    0.027  0.027  0.027  0.027  0.027  0.027
2    0.027  0.027  0.027  0.027  0.027  0.027
3    0.027  0.027  0.027  0.027  0.027  0.027
4    0.027  0.027  0.027  0.027  0.027  0.027
5    0.027  0.027  0.027  0.027  0.027  0.027
6    0.027  0.027  0.027  0.027  0.027  0.027
```
This table captures the joint probability distribution of the events of the two random variables, dice1 and dice2. It is pretty boring, but we can use it to sharpen our understanding of joint and marginal probability of independent variables.
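One way to build such a table in Python is a simple list of lists; the sketch below assumes rows for dice2 and columns for dice1, matching the layout above:

```python
# joint probability table for two independent fair dice:
# each cell is P(dice1=i) * P(dice2=j) = 1/6 * 1/6 = 1/36
table = [[(1 / 6) * (1 / 6) for _ in range(6)] for _ in range(6)]
print(table[0][0])  # about 0.027, i.e. 2.777%
```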
For example, the joint probability of rolling a 2 with dice1 and a 2 with dice2 can be read from the table directly as 2.777%. We can explore more elaborate cases, such as rolling a 2 with dice1 and rolling an odd number with dice2.
This can be read from the table by summing the values in the second column (rolling a 2 with dice1) across the first, third, and fifth rows (rolling an odd number with dice2).
- P(dice1=2, dice2={1,3,5}) = 0.027 + 0.027 + 0.027
This comes out to be about 0.083, or about 8.333%.
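Continuing the table sketch from above, the same cells can be summed in code (column index 1 is dice1=2; row indices 0, 2, and 4 are the odd values of dice2):

```python
# P(dice1=2 and dice2 in {1, 3, 5})
p = sum(table[row][1] for row in (0, 2, 4))
print(p)  # about 0.083, i.e. 8.333%
```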
We can also use this table to calculate the marginal probability. This is calculated as the sum of an entire column of probabilities for dice1 or a row of probabilities for dice2.
For example, we can calculate the marginal probability of rolling a 6 with dice2 as the sum of probabilities across the final row of the table. This comes out to be about 0.166 or 16.666% as we may intuitively expect.
Importantly, if we sum the probabilities for all cells in the table, it must equal 1.0. Additionally, if we sum the probabilities for each row, then the sum of these sums must equal 1.0. Similarly, if we sum the probabilities in each column, the sum of those sums too must equal 1.0. This is a requirement for a table of joint probabilities.
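All of these properties, along with the marginal probability from above, can be confirmed against the sketch table:

```python
# marginal P(dice2=6): sum across the final row of the table
print(sum(table[5]))  # about 0.166

# all cells must sum to 1.0 (up to floating point error)
print(sum(sum(row) for row in table))

# the row sums and the column sums must each total 1.0 as well
row_sums = [sum(row) for row in table]
col_sums = [sum(row[j] for row in table) for j in range(6)]
print(sum(row_sums), sum(col_sums))  # 1.0 1.0
```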
Because the events are independent, there is nothing special needed to calculate conditional probabilities.
- P(A given B) = P(A)
For example, the probability of rolling a 2 with dice1 is the same regardless of what was rolled with dice2.
- P(dice1=2 given dice2=6) = P(dice1=2)
In this way, conditional probability does not have a useful meaning for independent random variables.
Developing a table of joint probabilities is a helpful tool for better understanding how to calculate and explore the joint and marginal probabilities.
In the next section, let’s look at a more complicated example with dependent random variables.
Probabilities of Weather in Two Cities
We can develop an intuition for joint and marginal probabilities using a table of joint probabilities of events for two dependent random variables.
Consider the situation where there are two cities, city1 and city2. The cities are close enough that they are generally affected by the same weather, yet they are far enough apart that they do not get identical weather.
We can consider discrete weather classifications for these cities on a given day, such as sunny, cloudy, and rainy. When it is sunny in city1, it is usually sunny in city2, but not always. As such, there is a dependency between the weather in the two cities.
Now, let’s explore the different types of probability.
Data Collection
First, we can record the observed weather in each city over twenty days.
For example, on day 1, what was the weather in each, day 2, and so on.
```
Day   City1   City2
1     Sunny   Sunny
2     Sunny   Cloudy
3     ...
```
The complete table of results is omitted for brevity; we will make up totals later.
We can then calculate the sum of the total number of paired events that were observed.
For example, the total number of times it was sunny in city1 and sunny in city2, the total number of times it was sunny in city1 and cloudy in city2, and so on.
```
City1   City2    Total
sunny   sunny    6/20
sunny   cloudy   1/20
sunny   rainy    0/20
...
```
Again, the complete table is omitted for brevity; we will make up totals later.
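As an illustration, the sketch below tallies a hypothetical 20-day record with Python's collections.Counter; the individual observations are invented here so that the totals match the joint probability table used in the rest of the tutorial:

```python
from collections import Counter

# hypothetical (city1, city2) weather pairs for 20 days, invented to
# match the totals in the joint probability table below
observations = (
    [('sunny', 'sunny')] * 6 + [('sunny', 'cloudy')] * 1 +
    [('cloudy', 'sunny')] * 2 + [('cloudy', 'cloudy')] * 5 +
    [('cloudy', 'rainy')] * 1 + [('rainy', 'cloudy')] * 2 +
    [('rainy', 'rainy')] * 3
)
counts = Counter(observations)
print(counts[('sunny', 'sunny')], len(observations))  # 6 20
```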
This data provides the basis for exploring the probability of weather events in the two cities.
Joint Probabilities
First, we might be interested in the probability of weather events in each city.
We can create a table that contains the probabilities of the paired or joint weather events.
The table below summarizes the probability of each discrete weather for the two cities, with city1 defined across the top (x-axis) and city2 defined along the side (y-axis).
```
         Sunny   Cloudy   Rainy
Sunny    6/20    2/20     0/20
Cloudy   1/20    5/20     2/20
Rainy    0/20    1/20     3/20
```
A cell in the table describes the joint probability of an event in each city, and together, the probabilities in the table summarize the joint probability distribution of weather events for the two cities.
The sum of the joint probabilities for all cells in the table must equal 1.0. Additionally, the sum of the sums across each row must equal 1.0, and the sum of the sums across each column must equal 1.0.
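The table translates directly into code. A minimal sketch, assuming a list of lists with rows for city2 and columns for city1 (both ordered sunny, cloudy, rainy):

```python
# joint probability table: columns are city1, rows are city2
joint = [
    [6/20, 2/20, 0/20],  # city2 = sunny
    [1/20, 5/20, 2/20],  # city2 = cloudy
    [0/20, 1/20, 3/20],  # city2 = rainy
]
# all cells must sum to 1.0
print(sum(sum(row) for row in joint))  # 1.0
```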
We can calculate the joint probability for the weather in two cities. For example, we would expect the joint probability of it being sunny in both cities at the same time as being high. This can be stated formally as:
- P(city1=sunny and city2=sunny)
Or more compactly:
- P(sunny, sunny)
We can read this off the table directly as 6/20 or 0.3 or 30%. A relatively high probability.
We can take this a step further and consider the probability of it not being rainy in the first city but having rain in the second city. We could state this as:
- P(city1=sunny or cloudy and city2=rainy)
Again, we can calculate this directly from the table. Firstly, P(sunny,rainy) is 0/20 and P(cloudy,rainy) is 1/20. We can then add these probabilities together to give 1/20 or 0.05 or 5%. It can happen, but it is not likely.
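Using the joint table sketched above, this is just a two-cell sum:

```python
# P(city1 in {sunny, cloudy} and city2=rainy)
#   = P(sunny, rainy) + P(cloudy, rainy)
rainy_row = joint[2]  # city2 = rainy
print(rainy_row[0] + rainy_row[1])  # 0.05
```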
The table also gives an idea of the marginal distribution of events. For example, we might be interested in the probability of a sunny day in city1, regardless of what happens in city2. This can be read from the table by summing the probabilities for sunny in city1, e.g. the first column of probabilities:
- P(city1=sunny) = P(city1=sunny, city2=sunny) + P(city1=sunny, city2=cloudy) + P(city1=sunny, city2=rainy)
Or
- P(city1=sunny) = 6/20 + 1/20 + 0/20
- P(city1=sunny) = 7/20
Therefore, the marginal probability of a sunny day in city1 is 0.35 or 35%.
We can do the same thing for city2 by calculating the marginal probability of an event across some or all probabilities in a row. For example, the probability of a rainy day in city2 would be calculated as the sum of probabilities along the bottom row of the table:
- P(city2=rainy) = 0/20 + 1/20 + 3/20
- P(city2=rainy) = 4/20
Therefore, the marginal probability of a rainy day in city2 is 0.2 or 20%.
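Both marginals fall out of the same sketch table, one as a column sum and one as a row sum:

```python
# marginal P(city1=sunny): sum the first column
print(sum(row[0] for row in joint))  # 0.35

# marginal P(city2=rainy): sum the bottom row
print(sum(joint[2]))  # 0.2
```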
The marginal probabilities are often interesting and useful, and it is a good idea to update the table of joint probabilities to include them; for example:
```
           Sunny   Cloudy   Rainy   Marginal
Sunny      6/20    2/20     0/20    8/20
Cloudy     1/20    5/20     2/20    8/20
Rainy      0/20    1/20     3/20    4/20
Marginal   7/20    8/20     5/20    20/20
```
Conditional Probabilities
We might be interested in the probability of a weather event given the occurrence of a weather event in another city.
This is called the conditional probability and can be calculated using the joint and marginal probabilities.
- P(A given B) = P(A and B) / P(B)
For example, we might be interested in the probability of it being sunny in city1, given that it is sunny in city2.
This can be stated formally as:
- P(city1=sunny given city2=sunny) = P(city1=sunny and city2=sunny) / P(city2=sunny)
We can fill in the joint and marginal probabilities from the table in the previous section; for example:
- P(city1=sunny given city2=sunny) = (6/20) / (8/20)
- P(city1=sunny given city2=sunny) = 0.3 / 0.4
This comes out to be 0.75 or 75%, which is intuitive. We would expect that if it is sunny in city2 that city1 should also be sunny most of the time.
This is different from the joint probability of it being sunny in both cities on a given day which has a lower probability of 30%.
It makes more sense if we consider it from the perspective of the number of combinations. We have more information in this conditional case, therefore we don’t have to calculate the probability across all 20 days. Specifically, we are assuming it is sunny in city2, which dramatically reduces the number of days from 20 to 8. A total of 6 of those days that were sunny in city2 were also sunny in city1, giving the fraction 6/8 or (0.75) 75%.
All of this can be read from the table of joint probabilities.
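The same calculation, continuing the table sketch in code:

```python
# P(city1=sunny given city2=sunny)
#   = P(city1=sunny and city2=sunny) / P(city2=sunny)
p_joint = joint[0][0]           # 6/20
p_city2_sunny = sum(joint[0])   # marginal of the top row: 8/20
print(p_joint / p_city2_sunny)  # 0.75
```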
An important aspect of conditional probability that is often misunderstood is that it is not reversible.
- P(A given B) != P(B given A)
That is the probability of it being sunny in city1 given that it is sunny in city2 is not the same as the probability of it being sunny in city2 given that it is sunny in city1.
- P(city1=sunny given city2=sunny) != P(city2=sunny given city1=sunny)
In this case, the probability of it being sunny in city2 given that it is sunny in city1 is calculated as follows:
- P(city2=sunny given city1=sunny) = P(city2=sunny and city1=sunny) / P(city1=sunny)
- P(city2=sunny given city1=sunny) = (6/20) / (7/20)
- P(city2=sunny given city1=sunny) = 0.3 / 0.35
- P(city2=sunny given city1=sunny) = 0.857
In this case, it is higher, at about 85.714%.
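Computing the two conditionals side by side makes the asymmetry concrete:

```python
# conditional probability is not reversible
p_sunny_sunny = joint[0][0]                                   # 6/20
p_c1_given_c2 = p_sunny_sunny / sum(joint[0])                 # / (8/20)
p_c2_given_c1 = p_sunny_sunny / sum(row[0] for row in joint)  # / (7/20)
print(p_c1_given_c2, p_c2_given_c1)  # 0.75 and about 0.857
```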
We can also use the conditional probability to calculate the joint probability.
- P(A and B) = P(A given B) * P(B)
For example, if all we know is the conditional probability of sunny in city2 given that it is sunny in city1 and the marginal probability of sunny in city1, we can calculate the joint probability as:
- P(city1=sunny and city2=sunny) = P(city2=sunny given city1=sunny) * P(city1=sunny)
- P(city1=sunny and city2=sunny) = 0.857 * 0.35
- P(city1=sunny and city2=sunny) = 0.3
This gives 0.3 or 30% as we expected.
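And in code, continuing the sketch, multiplying the conditional by the marginal recovers the joint probability:

```python
# P(A and B) = P(A given B) * P(B)
p_city1_sunny = sum(row[0] for row in joint)  # 7/20 = 0.35
p_c2_given_c1 = joint[0][0] / p_city1_sunny   # about 0.857
print(p_c2_given_c1 * p_city1_sunny)          # about 0.3, i.e. 30%
```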
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Probability: For the Enthusiastic Beginner, 2016.
- Pattern Recognition and Machine Learning, 2006.
- Machine Learning: A Probabilistic Perspective, 2012.
Articles
- Probability, Wikipedia.
- Notation in probability and statistics, Wikipedia.
- Independence (probability theory), Wikipedia.
- Independent and identically distributed random variables, Wikipedia.
- Mutual exclusivity, Wikipedia.
- Marginal distribution, Wikipedia.
- Joint probability distribution, Wikipedia.
- Conditional probability, Wikipedia.
Summary
In this tutorial, you discovered the intuitions behind calculating the joint, marginal, and conditional probability.
Specifically, you learned:
- How to calculate joint, marginal, and conditional probability for independent random variables.
- How to collect observations from joint random variables and construct a joint probability table.
- How to calculate joint, marginal, and conditional probability from a joint probability table.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.