Explore counter-intuitive probability problems including the Birthday Paradox, Monty Hall Problem, and dice simulations using NumPy.
Probability can be surprisingly counter-intuitive. This guide explores famous problems that challenge our intuition and shows how to solve them using both simulation and analytical methods.
n_rolls = 20_000# First rollfirst_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])# Second rollsecond_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])# Sum both rolls (vectorized operation)sum_of_rolls = first_rolls + second_rollsprint(f"Mean of sum: {np.mean(sum_of_rolls):.2f}")print(f"Variance of sum: {np.var(sum_of_rolls):.2f}")print(f"Covariance:\n{np.cov(first_rolls, second_rolls)}")# Plot histogramsns.histplot(sum_of_rolls, stat="probability", discrete=True)plt.title("Sum of Two Dice Rolls")plt.show()
Why is Covariance Near Zero?
The two rolls are independent - the result of the first roll doesn’t affect the second. Therefore, covariance is approximately zero.
Simulate unfair dice where one side has higher probability:
def load_dice(n_sides, loaded_number): # Equal initial probabilities probs = np.array([1/(n_sides+1) for _ in range(n_sides)]) # Double probability for loaded side probs[loaded_number-1] = 1 - sum(probs[:-1]) # Verify probabilities sum to 1 assert np.isclose(sum(probs), 1), "Probabilities must sum to 1" return probs# Get probabilities for dice loaded toward side 2probs_loaded = load_dice(6, loaded_number=2)# Visualize probabilitiessns.barplot(x=dice, y=probs_loaded)plt.title("Loaded Dice Probabilities")plt.ylim(0, 0.5)plt.show()
Fair Dice
Loaded Dice
# All sides have equal probabilityprobs_fair = np.array([1/6] * 6)
# Use loaded probabilitiesrolls = np.array([ np.random.choice(dice, p=probs_loaded) for _ in range(20_000)])
Simulate scenarios where the second roll depends on the first:
n_rolls = 20_000first_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])second_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])# Only count second roll if first roll >= 4second_rolls = np.where(first_rolls >= 4, second_rolls, 0)sum_of_rolls = first_rolls + second_rollsprint(f"Covariance:\n{np.cov(first_rolls, second_rolls)}")sns.histplot(sum_of_rolls, stat="probability", discrete=True)plt.title("Dependent Dice Rolls")plt.show()
Now covariance is not zero! The second roll depends on the first, creating correlation.
Let P(n) be the probability at least one student matches predefined day D.For one student: P(Sic)=1−3651 (doesn’t match)For n independent students:P(n)=1−(1−3651)nFor P(n)≥0.5:n≥ln(365/364)ln(2)≈253
You need 253 people for a 50% chance of matching a specific date!
Let Q be the probability NO two people match:Q=1⋅365364⋅365363⋯365365−(n−1)Desired probability: P=1−QUsing approximation 1−x≈e−x:Q≈e−730n(n−1)For P≥0.5:n(n−1)≥730ln(2)≈505Solving: n≥23
Only 23 people needed for a 50% chance that two share a birthday! This surprises most people.
def monty_hall(switch): # All doors have goats initially doors = np.array([0, 0, 0]) # 0 = goat, 1 = car # Randomly place car winner_index = np.random.randint(0, 3) doors[winner_index] = 1 # Player chooses door choice = np.random.randint(0, 3) # Host can open any door except choice and winner openable_doors = [i for i in range(3) if i not in (winner_index, choice)] door_to_open = np.random.choice(openable_doors) # If switching, pick the other unopened door if switch: choice = [i for i in range(3) if i not in (choice, door_to_open)][0] # Return 1 if won, 0 if lost return doors[choice]
Run many simulations:
# Simulate staying strategystay_wins = sum([monty_hall(switch=False) for _ in range(10000)])stay_rate = stay_wins / 10000# Simulate switching strategyswitch_wins = sum([monty_hall(switch=True) for _ in range(10000)])switch_rate = switch_wins / 10000print(f"Stay win rate: {stay_rate:.3f}")print(f"Switch win rate: {switch_rate:.3f}")
def generalized_monty_hall(switch, n=3, k=1): assert 0 <= k <= n-2, "k must be between 0 and n-2" doors = np.array([0] * n) winner = np.random.randint(0, n) doors[winner] = 1 choice = np.random.randint(0, n) openable_doors = [i for i in range(n) if i not in (winner, choice)] doors_to_open = np.random.choice(openable_doors, size=k, replace=False) if switch: choices = [i for i in range(n) if i != choice and i not in doors_to_open] choice = np.random.choice(choices) return doors[choice]
Analytical result:P(win∣switch)=nn−1⋅n−k−11Switching is always better unless k=0 (host opens no doors).