Probability Fundamentals
Probability is the foundation of statistical inference. In this module, we’ll explore the core concepts using data from a simulated study of healthy habits among university students.
Learning Objectives
By the end of this lesson, you will be able to:
- Define and calculate basic probabilities from sample data
- Understand random events and their relationships
- Apply probability rules (union, intersection, complement)
- Interpret probability in the context of real research questions
Real-World Context: Student Health Study
Throughout these examples, we’ll use data from a simulated study of 150 university students examining the relationship between:
- Sleep hours and quality
- Physical activity levels
- Nutrition scores
- Stress levels and academic performance
What is Probability?
Probability measures the likelihood that an event will occur. It ranges from 0 (impossible) to 1 (certain). In statistical studies, we often estimate probabilities using relative frequencies: P(event) ≈ (number of observations in which the event occurs) / (total number of observations).
Example: Sleep Duration
In our student health dataset, we defined the event:
- Event A: Student sleeps ≥ 7 hours per night
- 56 of the 150 students sleep ≥ 7 hours
- P(A) = 56/150 ≈ 0.373
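The relative-frequency estimate above can be reproduced in a few lines of Python. This is a minimal sketch; the function name `relative_frequency` is an illustrative choice and is not part of the study materials.

```python
def relative_frequency(event_count: int, total: int) -> float:
    """Estimate P(event) as the share of observations in which the event occurs."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= event_count <= total:
        raise ValueError("event_count must be between 0 and total")
    return event_count / total


# Counts from the student health study: 56 of 150 students sleep >= 7 hours.
p_a = relative_frequency(56, 150)
print(f"P(A) = {p_a:.3f}")  # prints: P(A) = 0.373
```

Because the estimate comes from a sample of 150 students, a different sample would likely give a slightly different value.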
Defining Random Events
A random event is an outcome (or set of outcomes) from a random phenomenon. In our study, we defined several events:
| Event | Definition | Probability |
|---|---|---|
| A | Student sleeps ≥ 7 hours | P(A) = 0.373 |
| B | High physical activity level | P(B) = 0.173 |
| C | Healthy nutrition (score ≥ 7) | P(C) = 0.393 |
| D | Academic average ≥ 8.0 | P(D) = 0.207 |
Probability Rules
1. The Complement Rule
The complement of event A (written as A’) is the event “A does not occur”. Because A and A’ together cover every possible outcome, P(A’) = 1 - P(A).
- P(sleeps < 7 hours) = 1 - 0.373 = 0.627
- About 63% of students sleep less than the recommended amount
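The complement calculation can be checked with a short Python sketch. The helper name `complement` is hypothetical and chosen only for illustration.

```python
def complement(p: float) -> float:
    """Return P(A') = 1 - P(A) for a probability p = P(A)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p


p_a = 56 / 150               # P(A): student sleeps >= 7 hours
p_not_a = complement(p_a)    # P(A'): student sleeps < 7 hours
print(f"P(A') = {p_not_a:.3f}")  # prints: P(A') = 0.627
```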
2. The Intersection Rule
The intersection of events A and B (A ∩ B) means both events occur together.
Example: Students who BOTH:
- Sleep ≥ 7 hours (A) AND
- Have high physical activity (B)
3. The Union Rule
The union of events A and B (A ∪ B) means at least one event occurs: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Why do we subtract P(A ∩ B)?
When we add P(A) + P(B), we count the students who have BOTH characteristics twice. Subtracting P(A ∩ B) corrects for this double-counting.
Think of it like a Venn diagram: the overlapping region shouldn’t be counted twice.
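The double-counting argument can be seen directly with Python sets. The sketch below uses a hypothetical toy group of 10 student IDs, not values from the study, and compares the union counted directly with the union obtained by adding and then subtracting the overlap.

```python
from fractions import Fraction

# Hypothetical toy data: IDs of 10 students, NOT taken from the study.
all_students = set(range(1, 11))
sleeps_7h = {1, 2, 3, 4}       # event A
high_activity = {3, 4, 5}      # event B

n = len(all_students)
p_a = Fraction(len(sleeps_7h), n)
p_b = Fraction(len(high_activity), n)
p_a_and_b = Fraction(len(sleeps_7h & high_activity), n)  # intersection: both events
p_a_or_b = Fraction(len(sleeps_7h | high_activity), n)   # union: at least one event

# Adding P(A) and P(B) counts students 3 and 4 twice,
# so subtracting P(A and B) recovers the directly counted union.
via_rule = p_a + p_b - p_a_and_b
assert via_rule == p_a_or_b
print(p_a_or_b)  # prints: 1/2
```

Exact fractions are used here so the comparison is not affected by floating-point rounding.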
Probability Trees
Probability trees help visualize sequential events and calculate complex probabilities.
Example: Sleep and Activity Combined
We can organize our events in stages:
First branch: Sleep duration
- Sleeps ≥ 7h: P = 0.373
- Sleeps < 7h: P = 0.627
Second branch: Activity level, given sleep duration
- P(High activity | Sleeps ≥ 7h)
- P(Low activity | Sleeps < 7h)
Probability trees are especially useful when events occur in sequence or when we want to apply conditional probability rules.
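As a rough illustration of how a tree is evaluated, the sketch below multiplies probabilities along each branch. The conditional probabilities for sleep and activity are not stated in this lesson, so the example instead uses the counts from the nutrition and sleep-quality scenario in the practice section (59 students with healthy nutrition, 27 with good sleep quality, 15 with both, out of 150). The variable names are illustrative only.

```python
from fractions import Fraction

total = 150
n_e = 59          # healthy nutrition (E)
n_f = 27          # good sleep quality (F)
n_e_and_f = 15    # both E and F

# First branch: nutrition.
p_e = Fraction(n_e, total)
p_not_e = 1 - p_e

# Second branch: sleep quality, conditional on the first branch.
p_f_given_e = Fraction(n_e_and_f, n_e)                     # 15/59
p_f_given_not_e = Fraction(n_f - n_e_and_f, total - n_e)   # 12/91

# Each path probability is the product of the probabilities along its branches.
paths = {
    ("E", "F"): p_e * p_f_given_e,
    ("E", "not F"): p_e * (1 - p_f_given_e),
    ("not E", "F"): p_not_e * p_f_given_not_e,
    ("not E", "not F"): p_not_e * (1 - p_f_given_not_e),
}

for path, p in paths.items():
    print(path, f"{float(p):.3f}")

assert sum(paths.values()) == 1  # the four paths cover every student
```

The path ("E", "F") reproduces P(E ∩ F) = 15/150, which is the multiplication rule P(E ∩ F) = P(E) × P(F | E) read off the tree.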
Random Variables
A random variable assigns a numerical value to each outcome of a random phenomenon.
Types of Random Variables
Discrete Random Variables
Take on specific, countable values. Examples from our study:
- Age (18, 19, 20, … years)
- Nutrition score (0, 1, 2, …, 10)
- Quality of sleep (coded as mala = 1, regular = 2, buena = 3; that is, poor, fair, good)
Continuous Random Variables
Can take any value within a range. Examples from our study:
- Sleep hours per night (can be 6.5, 7.2, 8.15, etc.)
- Stress score (scale 0-40, with decimal values)
- Academic average (0-10 scale with decimals)
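One practical consequence of the distinction is how probabilities are estimated from data. The sketch below uses hypothetical toy values, not the study data: for a discrete variable we can estimate P(X = x) for each value, while for a continuous variable we usually estimate the probability of an interval.

```python
from collections import Counter

# Hypothetical toy values, NOT taken from the study data.
nutrition_scores = [7, 5, 8, 7, 6, 7, 9, 5]               # discrete: whole numbers 0-10
sleep_hours = [6.5, 7.2, 8.15, 5.9, 7.0, 6.8, 7.5, 6.1]   # continuous: any value in a range

# Discrete: estimate P(X = x) for each distinct value.
n = len(nutrition_scores)
pmf = {score: count / n for score, count in sorted(Counter(nutrition_scores).items())}
print(pmf)

# Continuous: exact values rarely repeat, so estimate P(a <= X < b) for an interval.
p_7_to_8 = sum(7.0 <= h < 8.0 for h in sleep_hours) / len(sleep_hours)
print(p_7_to_8)
```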
Practice: Calculating Probabilities
Let’s work through a comprehensive example using our student health data.
Scenario
You want to understand the relationship between nutrition and sleep quality.
Events:
- E: Healthy nutrition (score ≥ 7)
- F: Good sleep quality
- 59 students have healthy nutrition: P(E) = 59/150 = 0.393
- 27 students have good sleep quality: P(F) = 27/150 = 0.180
- 15 students have both: P(E ∩ F) = 15/150 = 0.100
- What’s the probability a student has good sleep OR healthy nutrition?
- What’s the probability a student has neither?
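One way to check your answers is to apply the union and complement rules to the counts given in the scenario. The following is a minimal sketch with illustrative variable names; exact fractions are used so that rounding happens only when printing.

```python
from fractions import Fraction

total = 150
p_e = Fraction(59, total)         # healthy nutrition
p_f = Fraction(27, total)         # good sleep quality
p_e_and_f = Fraction(15, total)   # both

# Union rule: good sleep OR healthy nutrition (at least one of the two).
p_e_or_f = p_e + p_f - p_e_and_f

# Complement rule applied to the union: neither characteristic.
p_neither = 1 - p_e_or_f

print(f"P(E or F)  = {float(p_e_or_f):.3f}")   # prints: P(E or F)  = 0.473
print(f"P(neither) = {float(p_neither):.3f}")  # prints: P(neither) = 0.527
```

In context: roughly 47% of the students in the sample have at least one of the two healthy habits, and roughly 53% have neither.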
Key Takeaways
- Probability quantifies uncertainty using values from 0 to 1
- Sample proportions estimate population probabilities
- Probability rules (complement, intersection, union) help us calculate complex event probabilities
- Random variables assign numbers to outcomes, enabling statistical analysis
- Real-world context makes probability meaningful - always interpret results in context
Next Steps
Now that you understand probability fundamentals, you’re ready to explore probability distributions - mathematical models that describe how probabilities are distributed across possible values of a random variable.
In the next module, we’ll see how variables like sleep hours and stress scores follow specific probability distributions (like the normal distribution), which form the foundation for hypothesis testing.
Additional Resources
- Review the study design and variable dictionary to understand the context
- Practice calculating probabilities with different event combinations
- Sketch Venn diagrams to visualize event relationships
- Consider how sampling variability affects probability estimates