
# Chi-square test


## Description

The chi-square (χ²) test measures how well two sets of frequency counts agree. The data must be categorical counts, not percentages or ratio measures (for those, use a different test, such as a correlation test).

Note that the expected frequency in each category should be at least 5 (an occasional lower figure may be acceptable, as long as it is not part of a pattern of low figures).

### Goodness of fit

A common use is to assess whether a set of observed frequencies follows an expected pattern.

The expected frequencies may be determined from prior knowledge (such as a previous year's exam results) or by calculating an average from the given data.

The null hypothesis, H0, is that the observed and expected frequencies are not significantly different.

### Independence

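The chi-square test can also be used to assess whether two categorical variables are independent of one another. The counts are laid out in a table, with one variable across the rows and the other across the columns, and the observed counts are compared with those expected if the variables were unrelated.

The null hypothesis here is that the two variables are independent, so that any differences between the groups are due to chance.

In both goodness-of-fit and independence assessments, the statistic is compared against the 0.05, 0.01 or 0.001 figures in the Chi Square table: a value above the critical figure leads to rejecting the null hypothesis. The figures at the other end of the table (0.95 or 0.99) mark unusually small values of chi-square, where observed and expected counts agree more closely than chance alone would normally produce.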

## Calculation

Chi-square is calculated as:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum \frac{(f_o - f_e)^2}{f_e}$$

...where $f_o$ is the observed frequency and $f_e$ is the expected frequency.

Note that the expected values may need to be scaled to be comparable to the observed values. A simple test is that the total frequency/count should be the same for observed and expected values.
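The calculation and the scaling step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `chi_square_statistic` and `scale_expected` are hypothetical, and the counts in the usage lines are made up purely to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return SUM((fo - fe)^2 / fe) over paired frequency counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(fe <= 0 for fe in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def scale_expected(reference, target_total):
    """Rescale reference counts so that they sum to target_total."""
    factor = target_total / sum(reference)
    return [count * factor for count in reference]


# Made-up counts, for illustration only.
observed = [18, 22, 20]
expected = scale_expected([10, 10, 10], sum(observed))  # totals now match
statistic = chi_square_statistic(observed, expected)
print(statistic)
```

Scaling first means the observed and expected totals agree, which is the simple check described above.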

In a table, the expected frequency, if not known, may be estimated as:

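$$f_e = \frac{(\text{row total}) \times (\text{column total})}{n}$$

...where n is the grand total of all counts in the table (the sum of the row totals, which equals the sum of the column totals).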
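As a rough sketch, the expected frequencies for a whole table can be computed from its row and column totals as follows. The function name `expected_frequencies` is hypothetical and the counts are invented for illustration.

```python
def expected_frequencies(table):
    """Expected count per cell: (row total) * (column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # grand total of all counts
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up 2 x 2 table of counts, for illustration only.
table = [[10, 20],
         [30, 40]]
expected = expected_frequencies(table)
print(expected)
```

Each row of the result sums to the original row total, and each column to the original column total.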

The result is used with a Chi Square table to determine whether the comparison shows significance.

In a table, the degrees of freedom are:

$$df = (R - 1)(C - 1)$$

...where R is the number of rows and C is the number of columns.
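Instead of looking the result up in a printed table, the upper-tail probability can be computed directly. The sketch below uses only the Python standard library and a closed-form expression that holds for whole-number degrees of freedom; the function names are hypothetical, and a statistics package would normally be used for this in practice.

```python
import math


def degrees_of_freedom(rows, cols):
    """df for a table with the given number of rows and columns."""
    return (rows - 1) * (cols - 1)


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("need df >= 1 and statistic >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of gamma-function terms
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


print(degrees_of_freedom(2, 2))
print(chi_square_p_value(3.841, 1))  # close to 0.05
```

A small probability (for example below 0.05) corresponds to a statistic that exceeds the matching critical figure in the table.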

## Example

### Goodness of fit

English test grade distributions have changed from last year, with noticeably more grade Bs and fewer grade Ds. Is this significant?

The table below shows the calculation. First, the expected values are created by scaling last year's results to be equivalent to this year. Then the test statistic is calculated as SUM((O - E)^2/E).

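| English test results | Grade A | Grade B | Grade C | Grade D | Grade E | Sum |
|---|---|---|---|---|---|---|
| This year, O | 23 | 32 | 20 | 15 | 10 | 100 |
| Last year | 25 | 20 | 15 | 25 | 10 | 95 |
| Scaled last year, E | 26 | 21 | 16 | 26 | 11 | 100 |
| (O - E) | -3.3 | 10.9 | 4.2 | -11.3 | -0.5 | |
| (O - E)^2 | 11.0 | 119.8 | 17.7 | 128.0 | 0.3 | |
| (O - E)^2/E | 0.4 | 5.7 | 1.1 | 4.9 | 0.0 | 12.1 |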

Chi-square is found to be 12.1 and the degrees of freedom are (5 - 1) = 4 (there are five possible grades). Looking this up in the Chi Square table shows that 12.1 lies between the critical values for 5% (9.49) and 1% (13.28), so H0 can be rejected at the 5% level (though not at the 1% level) and a significant change can be claimed.
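The arithmetic in this example can be reproduced with a short Python sketch. The counts and the two critical values are taken from the example above; the variable names are illustrative only. Note that the expected values are kept unrounded here, which is why the differences in the table do not exactly match the rounded E row.

```python
observed = [23, 32, 20, 15, 10]   # this year, grades A to E
last_year = [25, 20, 15, 25, 10]  # last year, grades A to E

# Scale last year's counts so that their total matches this year's.
scale = sum(observed) / sum(last_year)
expected = [count * scale for count in last_year]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical_5_percent = 9.49   # table figure quoted in the example
critical_1_percent = 13.28  # table figure quoted in the example

print(round(chi_square, 1), df)
print(critical_5_percent < chi_square < critical_1_percent)
```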

### Independence

A year group in school chooses between drama and history as below. Is there any difference between boys' and girls' choices?

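| Observed | Chose drama | Chose history | Total |
|---|---|---|---|
| Boys | 43 | 55 | 98 |
| Girls | 52 | 54 | 106 |
| Total | 95 | 109 | 204 |

| Expected = (row tot * col tot)/overall tot | Chose drama | Chose history | Total |
|---|---|---|---|
| Boys | 45.6 | 52.4 | 98 |
| Girls | 49.4 | 56.6 | 106 |
| Total | 95 | 109 | 204 |

| (observed - expected)^2/expected | Chose drama | Chose history | Total |
|---|---|---|---|
| Boys | 0.2 | 0.1 | |
| Girls | 0.1 | 0.1 | |
| Total | | | 0.55 |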

Chi-square is 0.55. There are (2-1)*(2-1) = 1 degree of freedom. The Chi Square table gives a critical value of 3.84 at the 5% level, and 0.55 is well below this, so the null hypothesis of independence cannot be rejected: the data show no significant difference between boys' and girls' choices.
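The same figures can be checked with a brief Python sketch. The counts and the 3.84 table figure come from the example above; the variable names are illustrative only.

```python
observed = [[43, 55],   # boys: drama, history
            [52, 54]]   # girls: drama, history

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell = (row total) * (column total) / overall total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), df)
print(chi_square < 3.84)  # 3.84 is the table figure quoted in the example
```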

## Reporting

Chi-square is reported in the following form:

χ²(3, N = 125) = 10.2, p = .012

Where:
3 - the degrees of freedom
125 - subjects in the sample
10.2 - the χ² test statistic
.012 - the probability of obtaining a test statistic at least this large if the null hypothesis were true
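A small helper can assemble a result line in this form. This is only a sketch: the function name `format_chi_square`, the `decimals` parameter, and the choice to print very small probabilities as `p < .001` are assumptions for illustration, not part of the reporting convention described above.

```python
def format_chi_square(df, n, statistic, p, decimals=1):
    """Build a report string such as 'chi-square(3, N = 125) = 10.2, p = .012'."""
    if p < 0.001:
        p_text = "p < .001"  # assumed convention for very small values
    else:
        # Drop the leading zero, e.g. 0.012 becomes .012
        p_text = "p = " + f"{p:.3f}".lstrip("0")
    # \u03c7\u00b2 is the Greek letter chi followed by a superscript two
    return f"\u03c7\u00b2({df}, N = {n}) = {statistic:.{decimals}f}, {p_text}"


print(format_chi_square(3, 125, 10.2, 0.012))
```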

## Discussion

This test compares observed data with what we would expect to get if the null hypothesis of no difference were true. It is based on the principle that if two variables are not related (for example, gender is not related to deafness), then measures applied to each variable will give similar results (for example, about the same proportion of men and women being found to use a hearing aid), with any variation between the results being caused purely by chance. If the observed measures are significantly different, then some relationship can be claimed.

One reason percentages do not work is that they discard the sample size: the same percentages produce a very different chi-square depending on how many cases they represent, and low numbers will not work. If you only have percentages, you can often convert them back into counts using the sample size and apply the test to those.

The measurement is unusual in that it has a square on the numerator and a non-square on the denominator. Squaring removes negatives and exaggerates outliers, so large gaps between observed and expected counts dominate the statistic.

Note that the test only reports whether two sets of figures are similar. It says nothing about the nature of the similarity.

A chi-gram is a bar-chart plot of a set of chi-square calculations and can visually show how chi-square varies across a set of related measurements.

Where variables are dichotomous (i.e. can take only one of two values) and the observations are paired, McNemar's test is a similar test that is customized for this circumstance.

Note that this test is called the 'Chi-square' test, not 'Chi-squared'.

The Chi-square test is non-parametric.
