IB Math EE Guidance
4.11 Hypothesis testing
Content-specific conceptual understandings
Formulation of null and alternative hypotheses, H0 and H1.
Significance levels.
p-values.
Students should express H0 and H1 as an equation or inequality, or in words as appropriate.
Expected and observed frequencies.
The χ² test for independence: contingency tables, degrees of freedom, critical value.
The χ² goodness of fit test.
In examinations:
• the maximum number of rows or columns in a contingency table will be 4
• the degrees of freedom will always be greater than one. At SL the degrees of freedom for the goodness of fit test will always be n − 1
• the χ² critical value will be given if appropriate
• students will be expected to use technology to find a p-value and the χ² statistic
• only questions on upper tail tests with commonly used significance levels (1%, 5%, 10%) will be set
• students will be expected to either compare a p-value to the given significance level or compare the χ² statistic to a given critical value
• expected frequencies will be greater than 5.
Hand calculations of the expected values or the χ² statistic may enhance understanding.
If using χ² tests in the IA, students should be aware of the limitations of the test for expected frequencies of 5 or less.
The t-test.
Use of the p-value to compare the means of two populations.
Using one-tailed and two-tailed tests.
In examinations calculations will be made using technology.
At SL, samples will be unpaired, and population variance will always be unknown.
Students will be asked to interpret the results of a test.
Students should know that the underlying distribution of the variables must be normal for the t-test to be applied. In examinations, students should assume that the variance of the two groups is equal and therefore the pooled two-sample t-test should be used.
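As a rough illustration of the pooled two-sample t-test named above, the sketch below computes the pooled t statistic and its degrees of freedom for two unpaired samples. The sample values and the function name `pooled_t_statistic` are invented for illustration and do not come from the guide. In examinations this calculation is done with technology, which also reports the p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, df) for the equal-variance (pooled) two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df

# Hypothetical unpaired samples (not taken from the guide).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = pooled_t_statistic(group_a, group_b)
print(round(t_stat, 3), dof)
```

The sign of the statistic only reflects which group is listed first. A one-tailed or two-tailed p-value would then be read from the t distribution with `dof` degrees of freedom.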
What is the Chi-square test of independence?
The Chi-square test of independence is a statistical hypothesis test used to determine whether two categorical or nominal variables are likely to be related or not.
When can I use the test?
You can use the test when you have counts of values for two categorical variables.
Can I use the test if I have frequency counts in a table?
Yes. If you have only a table of values that shows frequency counts, you can use the test.
Using the Chi-square test of independence
The Chi-square test of independence checks whether two variables are likely to be related or not. We have counts for two categorical or nominal variables. We also have an idea that the two variables are not related. The test gives us a way to decide if our idea is plausible or not.
The sections below discuss what we need for the test, how to do the test, understanding results, statistical details and understanding p-values.
What do we need?
For the Chi-square test of independence, we need two variables. Our idea is that the variables are not related. Here are a couple of examples:
 We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theater. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theater wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
 A veterinary clinic has a list of dog breeds they see as patients. The second variable is whether owners feed dry food, canned food or a mixture. Our idea is that the dog breed and types of food are unrelated. If this is true, then the clinic can order food based only on the total number of dogs, without consideration for the breeds.
For a valid test, we need:
 Data values that are a simple random sample from the population of interest.
 Two categorical or nominal variables. Don’t use the independence test with continuous variables that define the category combinations. However, the counts for the combinations of the two categorical variables are numeric.
 For each combination of the levels of the two variables, we need an expected count of at least five. When the expected count is below five for any one combination, the test results are not reliable.
Chi-square test of independence example
Let’s take a closer look at the movie snacks example. Suppose we collect data for 600 people at our theater. For each person, we know the type of movie they saw and whether or not they bought snacks.
Let’s start by answering: Is the Chi-square test of independence an appropriate method to evaluate the relationship between movie type and snack purchases?
 We have a simple random sample of 600 people who saw a movie at our theater. We meet this requirement.
 Our variables are the movie type and whether or not snacks were purchased. Both variables are categorical. We meet this requirement.
 The last requirement is an expected count of at least five for each combination of the two variables. To confirm this, we need to know the total counts for each type of movie and the total counts for whether snacks were bought or not. For now, we assume we meet this requirement and will check it later.
It appears we have indeed selected a valid method. (We still need to check that the expected count is at least five for each combination.)
Here is our data summarized in a contingency table:
Table 1: Contingency table for movie snacks data
Type of Movie  Snacks  No Snacks 
Action  50  75 
Comedy  125  175 
Family  90  30 
Horror  45  10 
Before we go any further, let’s check the assumption of five expected values in each category. The data has more than five counts in each combination of Movie Type and Snacks. But what are the expected counts if movie type and snack purchases are independent?
Finding expected counts
To find expected counts for each Movie-Snack combination, we first need the row and column totals, which are shown below:
Table 2: Contingency table for movie snacks data with row and column totals
Type of Movie  Snacks  No Snacks  Row totals 
Action  50  75  125 
Comedy  125  175  300 
Family  90  30  120 
Horror  45  10  55 
Column totals  310  290  GRAND TOTAL = 600 
The expected counts for each Movie-Snack combination are based on the row and column totals. We multiply the row total by the column total and then divide by the grand total. This gives us the expected count for each cell in the table. For example, for the Action-Snacks cell, we have:
$\frac{125 \times 310}{600} = \frac{38{,}750}{600} \approx 65$
We rounded the answer to the nearest whole number (the unrounded value is 64.58). If there is no relationship between movie type and snack purchasing, we would expect about 65 people to have watched an action film with snacks.
Here are the actual and expected counts for each Movie-Snack combination. In each cell of Table 3 below, the expected count appears in parentheses after the actual count. The expected counts are rounded to the nearest whole number.
Table 3: Contingency table for movie snacks data showing actual count vs. expected count
Type of Movie  Snacks  No Snacks  Row totals
Action  50 (65)  75 (60)  125
Comedy  125 (155)  175 (145)  300
Family  90 (62)  30 (58)  120
Horror  45 (28)  10 (27)  55
Column totals  310  290  GRAND TOTAL = 600
When using software, these calculated values will be labeled as “expected values,” “expected cell counts” or some similar term.
All of the expected counts for our data are larger than five, so we meet the requirement for applying the independence test.
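The expected-count rule described above (row total times column total, divided by the grand total) can be sketched in a few lines of Python. This is only an illustrative sketch: the counts are those of Table 1, while the names `observed` and `expected_counts` are chosen here and are not part of the original material.

```python
# Observed counts from Table 1: rows are movie types, columns are (Snacks, No Snacks).
observed = {
    "Action": [50, 75],
    "Comedy": [125, 175],
    "Family": [90, 30],
    "Horror": [45, 10],
}

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = {row: sum(counts) for row, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        row: [row_totals[row] * col_total / grand_total for col_total in col_totals]
        for row in table
    }

expected = expected_counts(observed)
for movie, counts in expected.items():
    print(movie, [round(c, 2) for c in counts])

# Validity check from the text: every expected count should be at least five.
all_large_enough = all(c >= 5 for counts in expected.values() for c in counts)
print("All expected counts >= 5:", all_large_enough)
```

The sketch keeps the unrounded values (for example 64.58 for Action with snacks), which is what software typically reports.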
Before calculating the test statistic, let’s look at the contingency table again. The expected counts use the row and column totals. If we look at each of the cells, we can see that some expected counts are close to the actual counts but most are not. If there is no relationship between the movie type and snack purchases, the actual and expected counts will be similar. If there is a relationship, the actual and expected counts will be different.
A common mistake with expected counts is to simply divide the grand total by the number of cells. For our movie data, this is 600 / 8 = 75. This is not correct. We know the row totals and column totals. These are fixed and cannot change for our data. The expected values are based on the row and column totals, not just on the grand total.
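One way to see why the even split is wrong is to check the row totals it implies. The sketch below, using the Table 1 counts and variable names chosen here for illustration, compares the row sums of the correct expected counts with those of the mistaken 75-per-cell version.

```python
observed = [[50, 75], [125, 175], [90, 30], [45, 10]]  # Table 1 counts
row_totals = [sum(row) for row in observed]             # 125, 300, 120, 55
col_totals = [sum(col) for col in zip(*observed)]       # 310, 290
grand_total = sum(row_totals)                           # 600

# Correct expected counts use the row and column totals.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
# The mistaken version spreads the grand total evenly over the 8 cells.
naive = [[grand_total / 8] * 2 for _ in observed]

expected_row_sums = [sum(row) for row in expected]
naive_row_sums = [sum(row) for row in naive]
print(expected_row_sums)
print(naive_row_sums)
```

The correct expected counts reproduce the observed row totals (up to floating-point rounding), whereas the even split would require 150 people for every movie type, which contradicts the data.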
Performing the test
The basic idea in calculating the test statistic is to compare actual and expected values, given the row and column totals that we have in the data. First, we calculate the difference between actual and expected for each Movie-Snacks combination. Next, we square that difference. Squaring gives the same importance to combinations with fewer actual values than expected and combinations with more actual values than expected. Next, we divide by the expected value for the combination. We add up these values for each Movie-Snacks combination. This gives us our test statistic.
This is much easier to follow using the data from our example. Table 4 below shows the calculations for each Movie-Snacks combination carried out to two decimal places.
Table 4: Preparing to calculate our test statistic
Type of Movie  Snack purchase  Actual  Expected  Difference  Squared difference  Divide by expected
Action  Snacks  50  64.58  50 − 64.58 = −14.58  212.67  212.67/64.58 = 3.29
Action  No Snacks  75  60.42  75 − 60.42 = 14.58  212.67  212.67/60.42 = 3.52
Comedy  Snacks  125  155  125 − 155 = −30  900  900/155 = 5.81
Comedy  No Snacks  175  145  175 − 145 = 30  900  900/145 = 6.21
Family  Snacks  90  62  90 − 62 = 28  784  784/62 = 12.65
Family  No Snacks  30  58  30 − 58 = −28  784  784/58 = 13.52
Horror  Snacks  45  28.42  45 − 28.42 = 16.58  275.01  275.01/28.42 = 9.68
Horror  No Snacks  10  26.58  10 − 26.58 = −16.58  275.01  275.01/26.58 = 10.35
Lastly, to get our test statistic, we add the values in the last column:
$3.29 + 3.52 + 5.81 + 6.21 + 12.65 + 13.52 + 9.68 + 10.35 = 65.03$
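The same calculation can be sketched in Python, using the Table 1 counts and a function name (`chi_square_statistic`) chosen here for illustration. Because the sketch keeps unrounded expected counts, it gives roughly 65.01 rather than the 65.03 obtained by adding the rounded table entries; the gap is rounding only.

```python
observed = [[50, 75], [125, 175], [90, 30], [45, 10]]  # Table 1 counts

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

chi2 = chi_square_statistic(observed)
print(round(chi2, 2))
```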
To make our decision, we compare the test statistic to a value from the Chi-square distribution. This activity involves five steps:
 We decide on the risk we are willing to take of concluding that the two variables are not independent when in fact they are. For the movie data, we had decided prior to our data collection that we are willing to take a 5% risk of saying that the two variables (Movie Type and Snack Purchase) are not independent when they really are independent. In statistics-speak, we set the significance level, α, to 0.05.
 We calculate a test statistic. As shown above, our test statistic is 65.03.
 We find the critical value from the Chi-square distribution based on our degrees of freedom and our significance level. If the two variables are independent, the test statistic exceeds this value only with probability α.
 The degrees of freedom depend on how many rows and how many columns we have. The degrees of freedom (df) are calculated as:
$df = (r - 1) \times (c - 1)$
In the formula, r is the number of rows, and c is the number of columns in our contingency table. From our example, with Movie Type as the rows and Snack Purchase as the columns, we have:
$df = (4 - 1) \times (2 - 1) = 3 \times 1 = 3$
The Chi-square value with α = 0.05 and three degrees of freedom is 7.815.
 We compare the value of our test statistic (65.03) to the Chi-square value. Since 65.03 > 7.815, we reject the idea that movie type and snack purchases are independent.
We conclude that there is some relationship between movie type and snack purchases. The owner of the movie theater cannot estimate how many snacks to buy regardless of the type of movies being shown. Instead, the owner must think about the type of movies being shown when estimating snack purchases.
It’s important to note that we cannot conclude that the type of movie causes a snack purchase. The independence test tells us only whether there is a relationship or not; it does not tell us that one variable causes the other.
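The degrees-of-freedom formula and the decision rule from the steps above can be sketched as follows. The function names are chosen here for illustration, and the critical value 7.815 is taken from the text rather than computed.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def reject_independence(test_statistic, critical_value):
    """Upper-tail decision rule: reject H0 when the statistic exceeds the critical value."""
    return test_statistic > critical_value

df = degrees_of_freedom(4, 2)   # four movie types, two snack outcomes
critical_value = 7.815          # chi-square critical value for alpha = 0.05, df = 3 (from the text)
decision = reject_independence(65.03, critical_value)
print(df, decision)  # 3 True
```

A `True` result means only that independence is rejected at the chosen significance level; it says nothing about causation.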
Understanding results
Let’s use graphs to understand the test and the results.
The side-by-side chart below shows the actual counts in blue, and the expected counts in orange. The counts appear at the top of the bars. The yellow box shows the movie type and snack purchase totals. These totals are needed to find the expected counts.
Figure 1: Bar chart showing the expected and actual counts for the different movie types
Compare the expected and actual counts for the Horror movies. You can see that more people than expected bought snacks and fewer people than expected chose not to buy snacks.
If you look across all four of the movie types and whether or not people bought snacks, you can see that there is a fairly large difference between actual and expected counts for most combinations. The independence test checks to see if the actual data is “close enough” to the expected counts that would occur if the two variables are independent. Even without a statistical test, most people would say that the two variables are not independent. The statistical test provides a common way to make the decision, so that everyone makes the same decision on the data.
The chart below shows another possible set of data. This set has the exact same row and column totals for movie type and snack purchase, but the yes/no splits in the snack purchase data are different.
Figure 2: Bar chart showing the expected and actual counts using different sample data
The purple bars show the actual counts in this data. The orange bars show the expected counts, which are the same as in our original data set. The expected counts are the same because the row totals and column totals are the same. Looking at the graph above, most people would think that the type of movie and snack purchases are independent. If you perform the Chi-square test of independence using this new data, the test statistic is 0.903. The Chi-square value is still 7.815 because the degrees of freedom are still three. You would fail to reject the idea of independence because 0.903 < 7.815. The owner of the movie theater can estimate how many snacks to buy regardless of the type of movies being shown.
Statistical details
Let’s look at the movie-snack data and the Chi-square test of independence using statistical terms.
Our null hypothesis is that the type of movie and snack purchases are independent. The null hypothesis is written as:
H0: Movie Type and Snack purchases are independent
The alternative hypothesis is the opposite:
H1: Movie Type and Snack purchases are not independent
Before we calculate the test statistic, we find the expected counts. This is written as:
$E_{ij} = \frac{R_i \times C_j}{N}$
The formula applies to a contingency table whose rows are indexed by i and whose columns are indexed by j, so $E_{ij}$ is the expected count for the cell in row i and column j. For example, $E_{11}$ is the expected count for the cell in the first row and first column. The formula shows $R_i$ as the row total for the ith row, and $C_j$ as the column total for the jth column. The overall sample size is N.
We calculate the test statistic using the formula below:
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
In the formula above, the sum runs over all n combinations of rows and columns. The Σ symbol means to add up the calculations for each combination. (We performed these same steps in the Movie-Snack example, beginning in Table 4.) The formula shows $O_{ij}$ as the Observed count for the ijth combination and $E_{ij}$ as the Expected count for the combination. For the Movie-Snack example, we had four rows and two columns, so we had eight combinations.
We then compare the test statistic to the critical Chi-square value corresponding to our chosen alpha value and the degrees of freedom for our data. Using the Movie-Snack data as an example, we had set α = 0.05 and had three degrees of freedom. For the Movie-Snack data, the Chi-square value is written as:
$\chi^2_{0.05,\,3}$
There are two possible results from our comparison:
 The test statistic is lower than the Chi-square value. You fail to reject the hypothesis of independence. In the movie-snack example, the theater owner can go ahead with the assumption that the type of movie a person sees has no relationship with whether or not they buy snacks.
 The test statistic is higher than the Chi-square value. You reject the hypothesis of independence. In the movie-snack example, the theater owner cannot assume that there is no relationship between the type of movie a person sees and whether or not they buy snacks.
Understanding p-values
Let’s use a graph of the Chi-square distribution to better understand the p-values. You are checking to see if your test statistic is a more extreme value in the distribution than the critical value. The graph below shows a Chi-square distribution with three degrees of freedom. It shows how the value of 7.815 “cuts off” 95% of the data. Only 5% of the data from a Chi-square distribution with three degrees of freedom is greater than 7.815.
Figure 3: Graph of Chi-square distribution for three degrees of freedom
The next distribution graph shows our results. You can see how far out “in the tail” our test statistic is. In fact, with this scale, it looks like the distribution curve is at zero at the point at which it intersects with our test statistic. It isn’t, but it is very, very close to zero. We conclude that it is very unlikely for this situation to happen by chance. The results that we collected from our moviegoers would be extremely unlikely if there were truly no relationship between types of movies and snack purchases.
Figure 4: Graph of Chi-square distribution for three degrees of freedom with test statistic plotted
Statistical software shows the p-value for a test. This is the probability of another sample of the same size resulting in a test statistic more extreme than the test statistic from our current sample, assuming that the null hypothesis is true. It’s difficult to calculate this by hand. For the distributions shown above, if the test statistic is exactly 7.815, then the p-value will be p = 0.05. With the test statistic of 65.03, the p-value is very, very small. In this example, most statistical software will report the p-value as “p < 0.0001.” This means that the probability of finding a more extreme value for the test statistic using another random sample (and assuming that the null hypothesis is correct) is less than one chance in 10,000.
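For readers curious about what the software computes, the sketch below evaluates the upper-tail probability of a Chi-square distribution using only the Python standard library. The function name `chi_square_p_value` is chosen here for illustration. The approach, a closed-form recurrence that works for whole-number degrees of freedom, is one possible method and is not necessarily what any particular statistics package uses.

```python
from math import erfc, exp, gamma, sqrt

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X > statistic) for a chi-square variable with df degrees of freedom."""
    if statistic <= 0:
        return 1.0
    y = statistic / 2
    if df % 2 == 0:
        a, q = 1.0, exp(-y)          # df = 2 starting point
    else:
        a, q = 0.5, erfc(sqrt(y))    # df = 1 starting point
    # Step the shape parameter a up to df / 2 using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1).
    while a < df / 2:
        q += y ** a * exp(-y) / gamma(a + 1)
        a += 1
    return q

p_at_critical = chi_square_p_value(7.815, 3)   # should be close to 0.05
p_movie_data = chi_square_p_value(65.03, 3)    # far below 0.0001
print(round(p_at_critical, 3), p_movie_data < 0.0001)
```

Comparing the p-value with α = 0.05 leads to the same decision as comparing the test statistic with the critical value 7.815.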