If a test is unreliable, then although its results may happen to be valid for one use, they may be invalid for another. Reliability is thus a measure of how much you can trust the results of a test.
Tests often have high reliability – but at the expense of validity. In other words, you can get the same result, time after time, but it does not tell you what you really want to know.
Stability is a measure of the repeatability of a test over time: it gives the same results whenever it is used (within defined constraints, of course).
Test-retest reliability is the repeatability of a test over time: the same person takes the same test on separate occasions, and the results are compared to assure the stability of the test. Stability, in this case, is measured by the variation in the scores obtained. A problem with this approach is the assumption that what is being measured does not change: variation should be due to the test, not to any other factor. Sadly, this is not always true.
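In practice, test-retest reliability is usually reported as the correlation between the two sets of scores. A minimal sketch, using the Pearson correlation coefficient and invented example scores (not data from the text):

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test. Scores are invented examples.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 11, 18, 14, 16]  # first administration
time2 = [13, 14, 12, 17, 15, 16]  # same people, retested later

print(round(pearson_r(time1, time2), 3))  # high value = stable test
```

A value near 1 suggests a stable test; note that, as the paragraph above warns, a low value may reflect real change in the people tested rather than a fault in the test.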
Consistency is a measure of reliability through similarity within the test, with individual questions giving predictable answers every time.
Consistency can be measured with split-half testing and the Kuder-Richardson test.
Split-half testing measures consistency by splitting the test into two halves (for example, odd- and even-numbered questions) and correlating the scores on the two halves.
A problem with this is that the resultant tests are shorter and can hence lose reliability. Split-half is thus better with tests that are rather long in the first place.
Use the Spearman-Brown formula to correct for this shortening, estimating the correlation as if each half were a full-length test:
r = (2rhh)/(1 + rhh)
(Where rhh is correlation between two halves)
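The split-half procedure and the Spearman-Brown correction can be sketched as follows, assuming an odd/even split of items and invented right/wrong scores:

```python
# Split-half reliability with the Spearman-Brown correction.
# Assumes an odd/even item split; all scores are invented examples.

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_hh):
    """Full-length reliability estimated from the half-test correlation rhh."""
    return (2 * r_hh) / (1 + r_hh)

# Each row is one person's item scores (1 = right, 0 = wrong).
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
]
odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5
even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6

r_hh = pearson_r(odd, even)
print(round(spearman_brown(r_hh), 3))
```

Note that the corrected value is always higher than the raw half-test correlation, reflecting the extra reliability the full-length test would have.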
Kuder-Richardson reliability or coefficient alpha
The Kuder-Richardson reliability or coefficient alpha is relatively simple to compute, being based on a single administration of the test. It assesses the inter-item consistency of the test by comparing two variance measures: the variance of individual items and the variance of the total test scores.
It assumes reliable tests contain more variance and are thus more discriminating. Higher heterogeneity leads to lower inter-item consistency.
For items that are not simply scored right/wrong (non-dichotomous items), use coefficient alpha:

Rkk = (k / (k – 1)) × (1 – Σσ2i / σ2t)

(Where Rkk is the alpha coefficient of the test, k is the number of items, σ2i is the variance of each item, and σ2t is the variance of the total test scores)
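The alpha formula can be sketched directly from its definition, again with invented item scores (here using population variances, i.e. dividing by N):

```python
# Coefficient alpha from item variances and total-score variance.
# Scores are invented examples; variances divide by N (population form).

def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Rows = people, columns = items (scores need not be 0/1).
scores = [
    [2, 3, 3, 4],
    [4, 4, 5, 5],
    [1, 2, 2, 3],
    [3, 3, 4, 4],
]
k = len(scores[0])                     # number of items
items = list(zip(*scores))             # one column of scores per item
totals = [sum(row) for row in scores]  # each person's total score

sum_item_var = sum(variance(item) for item in items)
alpha = (k / (k - 1)) * (1 - sum_item_var / variance(totals))
print(round(alpha, 3))
```

When items vary together (consistent test), the total-score variance σ2t is much larger than the sum of item variances Σσ2i, so alpha approaches 1; heterogeneous items pull it down, as the paragraph above notes.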
Equivalence of results (parallel form)
This seeks reliability through equivalence between two versions of the same test, comparing the results from each version (as in split-half testing). It is better than test-retest in that both versions can be taken on the same day (reducing variation over time).
There is a danger of tests with high internal validity having limited coverage (and hence lower final validity).
Bloated specifics are where very similar questions lead to apparent consistency. This can be bad when unintended, but such questions can also be used to create deliberate variations between versions.
Parallel versions are useful in situations such as with graduates, who may take the same test more than once.
There are also a number of procedural aspects that affect test reliability, from the way the test is administered to the way it is scored.