Professor: Howard B. Lee
Lecture Notes
Week 14: Chapter 9

Comparing Frequencies Using Chi Square
With chi square, we are looking at frequency counts, not scores as we did
in previous chapters.
Ex. The following require frequency data:
Demographic data: ethnic groups (%) in LA County, 1990 and 2000.
Toss coin 100 times:
The frequencies observed as a result of the 100 tosses:
H/57, T/43
These are empirical data gathered from an experiment (the kind of data you get in science when you perform an experiment and then analyze the results to draw a conclusion about what you observed).
In contrast to observed frequencies, expected or theoretical frequencies are what you expect to obtain when you conduct the experiment: the counts we would observe on average if the null hypothesis is true.
ho: The coin is fair.
Ph = .5
Ph = Pt
The proportion of heads = the proportion of tails
h1: The coin is not fair.
Ph not = .5
Ph not = Pt
The theoretical or expected frequencies if the coin is in fact fair:
H/50, T/50
These expected frequencies are based on 100 tosses.
If the observed frequencies are not far away from the expected frequencies we can say that we have insufficient evidence of an unfair coin, and may need more information.
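The test statistic for this problem, where O is an observed frequency and E is the matching expected frequency:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(57 - 50)^2}{50} + \frac{(43 - 50)^2}{50} = 0.98 + 0.98 = 1.96$$
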
Decision Rule:
alpha = .05
df = # of categories - 1 = K - 1, where K is the number of categories.
Use table D on page 406.
Similar to the F test, chi square is treated as a one-tailed, right-tailed test.
The chi square value is NEVER negative.
For df = 1 and alpha = .05, the critical value is 3.84. So the decision rule is to reject ho if the chi square test statistic is greater than 3.84, otherwise do not reject ho.
Decision: Since 1.96 is less than 3.84, do not reject ho.
Conclusion: There is insufficient evidence of an unfair coin.
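The coin example can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the course materials: the function name `chi_square_gof` is made up for illustration, and the critical value 3.84 is copied from the notes (table D) rather than computed.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [57, 43]     # heads, tails from the 100 tosses
expected = [50, 50]     # counts we expect if ho (fair coin) is true
CRITICAL_VALUE = 3.84   # df = 1, alpha = .05, copied from the notes

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1              # K - 1
reject = chi2 > CRITICAL_VALUE      # right-tailed decision rule

print(f"chi square = {chi2:.2f}, df = {df}, reject ho: {reject}")
```

With the counts above the statistic comes out to 1.96, so ho is not rejected, in agreement with the decision in the notes.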
This use of chi square is called "Goodness of Fit" because we want to see whether the observed frequencies fit the theoretical frequencies.
Ex. Given 200 consumers:
| Theoretical frequencies (percentages) | Observed frequencies |
|---|---|
| A = 38% | A = 80 |
| B = 27% | B = 50 |
| C = 35% | C = 70 |
| Total = 100% | Total = 200 |
Question: Does the observed data fit the theoretical data?
Percentages must be converted to frequencies by multiplying the percentage by the total number of consumers.
Expected (theoretical) frequencies:
A = 38% of 200 = 76
B = 27% of 200 = 54
C = 35% of 200 = 70
ho: Proportion of A = .38, proportion of B = .27, and proportion of C = .35.
h1: At least one of these proportions differs from its stated value.
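Chi square value:
$$\chi^2 = \frac{(80 - 76)^2}{76} + \frac{(50 - 54)^2}{54} + \frac{(70 - 70)^2}{70} = 0.2105 + 0.2963 + 0 = 0.5068$$
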
Decision rule:
alpha = .05
df = K - 1 = 3 - 1 = 2 (3 categories - 1 = 2)
Using table D, the critical value = 5.99
Reject ho if the chi square test statistic > 5.99, otherwise do not reject ho.
Decision: Since 0.5068 < 5.99, do not reject ho.
Conclusion: There is insufficient evidence of a lack of fit; there is not enough evidence to refute the researcher's claimed proportions.
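The consumer example follows the same steps: convert the claimed percentages to expected counts, sum the (O - E)^2 / E terms, and compare with the critical value. The sketch below is illustrative only; the variable names are assumptions, and 5.99 is copied from the notes rather than looked up programmatically.

```python
claimed = {"A": 0.38, "B": 0.27, "C": 0.35}   # theoretical proportions
observed = {"A": 80, "B": 50, "C": 70}        # observed frequencies

n = sum(observed.values())                    # total number of consumers
# Expected frequency = claimed proportion x total count
expected = {cat: p * n for cat, p in claimed.items()}

chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1                        # K - 1
CRITICAL_VALUE = 5.99                         # df = 2, alpha = .05, from the notes

for cat in observed:
    print(f"{cat}: observed = {observed[cat]}, expected = {expected[cat]:.0f}")
print(f"chi square = {chi2:.4f}, df = {df}")
print("reject ho" if chi2 > CRITICAL_VALUE else "do not reject ho")
```

The category with observed equal to expected (C) contributes nothing to the statistic, which is why the total stays small.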
Contingency Tables with Chi Square
With the "Pearson Product Moment Correlation," x and y are scores, not frequencies or categories.
With categorical variables we need to use chi square to determine whether they are related.
A t-test is used to determine whether r is statistically significant, and it applies to scored data.
Ex. Is gender related to political affiliation?
| Gender | Political |
|---|---|
| Male | D |
| Female | R |
These are categorical data.
You must use chi square for a Contingency Table.
Data Table:
| People | Gender | Political |
|---|---|---|
| 1 | M | D |
| 2 | F | D |
| 3 | F | D |
| 4 | M | R |
| 5 | M | R |
| 6 | M | D |
| 7 | F | D |
| 8 | F | R |
| 9 | F | D |
| 10 | F | R |
Contingency Table:
| Gender | Demo | Repub | Total |
|---|---|---|---|
| Male | 2 | 2 | 4 |
| Female | 4 | 2 | 6 |
| Total | 6 | 4 | 10 |
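Going from the person-by-person data table to the contingency table is just a matter of counting each (gender, party) pair. A small sketch using Python's standard `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# (gender, party) for persons 1 through 10, copied from the data table
records = [
    ("M", "D"), ("F", "D"), ("F", "D"), ("M", "R"), ("M", "R"),
    ("M", "D"), ("F", "D"), ("F", "R"), ("F", "D"), ("F", "R"),
]

genders = ["M", "F"]   # row order
parties = ["D", "R"]   # column order

cell_counts = Counter(records)  # tallies each (gender, party) pair
table = [[cell_counts[(g, p)] for p in parties] for g in genders]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

print(table)        # [[2, 2], [4, 2]]
print(row_totals)   # [4, 6]
print(col_totals)   # [6, 4]
```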
Expected frequencies in a contingency table are computed from the observed frequency data, using the row totals (R), the column totals (C), and the grand total (N).
Observed frequencies:
| Gender / Political | Demo | Repub | Total |
|---|---|---|---|
| Male | 2 | 2 | R1 = 4 |
| Female | 4 | 2 | R2 = 6 |
| Total | C1 = 6 | C2 = 4 | N = 10 |

Expected frequencies:
| Gender / Political | Demo | Repub | Total |
|---|---|---|---|
| Male | C1xR1/N = 2.4 | C2xR1/N = 1.6 | R1 = 4 |
| Female | C1xR2/N = 3.6 | C2xR2/N = 2.4 | R2 = 6 |
| Total | C1 = 6 | C2 = 4 | N = 10 |
In the expected frequency table the cells represent the counts one would expect if the two categorical variables are totally unrelated.
If the observed frequencies fit the expected frequencies, the chi square test gives no evidence that the variables are related; the data are consistent with the variables being independent of one another.
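The rule "column total x row total / N" applies to every cell, so it is easy to express as a short function. This is a sketch under the assumption that the table is stored as a list of rows; `expected_frequencies` is a name invented for this example.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total x column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Demo, Repub.
observed = [[2, 2],
            [4, 2]]

expected = expected_frequencies(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 1) for e in row])
```

For the gender table this gives 2.4 and 1.6 for Male, and 3.6 and 2.4 for Female. Each expected row and column still adds up to the same totals as the observed table.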

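Chi square value:
$$\chi^2 = \frac{(2 - 2.4)^2}{2.4} + \frac{(2 - 1.6)^2}{1.6} + \frac{(4 - 3.6)^2}{3.6} + \frac{(2 - 2.4)^2}{2.4} = 0.0667 + 0.1000 + 0.0444 + 0.0667 = 0.2778$$

Decision rule at alpha = .05.
df = (# rows - 1)(# columns - 1) = (2 - 1)(2 - 1) = 1
Use table D for chi square critical value = 3.84
Reject ho if the chi square test statistic > 3.84, otherwise do not reject ho.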
Decision: Since 0.2778 < 3.84, do not reject ho.
Conclusion: There is insufficient evidence that gender and political affiliation are related.
Ex. Are therapy and improvement related?
ho: Therapy and improvement are not related (independent).
h1: Therapy and improvement are related (dependent).
Observed data:
| Type / Improvement | YES | NO | Total |
|---|---|---|---|
| Therapy | 75 | 25 | R1 = 100 |
| Placebo | 58 | 42 | R2 = 100 |
| Total | C1 = 133 | C2 = 67 | N = 200 |
If you are interested in a relationship, use a contingency table.
Expected data:
| Type / Improvement | YES | NO | Total |
|---|---|---|---|
| Therapy | C1xR1/N = 66.5 | C2xR1/N = 33.5 | R1 = 100 |
| Placebo | C1xR2/N = 66.5 | C2xR2/N = 33.5 | R2 = 100 |
| Total | C1 = 133 | C2 = 67 | N = 200 |

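Chi square value:
$$\chi^2 = \frac{(75 - 66.5)^2}{66.5} + \frac{(25 - 33.5)^2}{33.5} + \frac{(58 - 66.5)^2}{66.5} + \frac{(42 - 33.5)^2}{33.5} = 1.0865 + 2.1567 + 1.0865 + 2.1567 \approx 6.49$$

Decision rule:
alpha = .05
df = (# rows - 1)(# columns - 1) = (2 - 1)(2 - 1) = 1
Table D, critical value = 3.84
Reject ho if the chi square statistic is greater than 3.84, otherwise do not reject ho.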
Decision: Reject ho since 6.49 > 3.84
Conclusion: There is evidence that therapy is related to improvement.
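Both contingency-table examples in these notes go through the same procedure, so one function can handle them. The sketch below is an illustration, not the textbook's code: `chi_square_independence` is a hypothetical name, tables are assumed to be lists of rows, and the critical value 3.84 is copied from the notes.

```python
def chi_square_independence(observed):
    """Return (chi square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under ho
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


CRITICAL_VALUE = 3.84   # df = 1, alpha = .05, copied from the notes

tables = {
    "gender vs. political": [[2, 2], [4, 2]],
    "therapy vs. improvement": [[75, 25], [58, 42]],
}

results = {}
for name, observed in tables.items():
    chi2, df = chi_square_independence(observed)
    results[name] = (chi2, df)
    decision = "reject ho" if chi2 > CRITICAL_VALUE else "do not reject ho"
    print(f"{name}: chi square = {chi2:.4f}, df = {df}, {decision}")
```

The gender table gives about 0.2778 (do not reject ho) and the therapy table about 6.49 (reject ho), matching the decisions worked out by hand.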
What is the degree of this relationship?
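Use Cramer's V:
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
where N is the total count and k is the smaller of the number of rows and the number of columns.
0 <= V <= 1
What is the magnitude of the relationship between therapy and improvement?
$$V = \sqrt{\frac{6.49}{200(2 - 1)}} = .18$$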
This value is significantly different from zero because we rejected ho in the hypothesis test.
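Cramer's V can be computed directly from the observed table. The following is a minimal sketch assuming the usual definition, V = sqrt(chi square / (N x (k - 1))) with k the smaller of the number of rows and columns; the function name `cramers_v` is made up for this example.

```python
import math


def cramers_v(observed):
    """Cramer's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            chi2 += (o - e) ** 2 / e
    k = min(len(row_totals), len(col_totals))       # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))


# Rows: Therapy, Placebo. Columns: improved YES, NO.
therapy_table = [[75, 25],
                 [58, 42]]

v = cramers_v(therapy_table)
print(f"V = {v:.2f}")
```

For the therapy table this prints V = 0.18, the value given in the notes.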
