When designing and using tests and other methods of assessing people, it is important that both the test and its use are valid.
Validity has been described as 'the agreement between a test score or measure and the quality it is believed to measure' (Kaplan and Saccuzzo, 2001). In other words, validity concerns the gap between what a test actually measures and what it is intended to measure: the smaller the gap, the more valid the test.
This gap can arise in two circumstances:
(a) the design of the test is insufficient for the intended purpose, and
(b) the test is used in a context or manner for which it was not designed.
A test has face validity if it appears, on the surface, to measure what it is meant to measure. This is judged using common-sense rules, for example that a mathematics test should include some numerical items.
A test can lack face validity yet still be perfectly valid, for example where items that seem unrelated to the target quality have been found to correlate with it. Successful WW2 pilots, for instance, were very often found to have had an active childhood interest in flying model planes.
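One way such a link can be checked empirically is to cross-tabulate a yes/no background item against a pass/fail outcome and compute the phi coefficient, which is the Pearson correlation for two binary variables. The sketch below assumes the data are already summarised as four counts; the function name and the counts in the usage example are hypothetical and do not come from any real pilot study.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi correlation for a 2x2 table (hypothetical helper).

    Rows: background item answered yes / no.
    Columns: criterion met / not met.
        a = yes and met,  b = yes and not met
        c = no and met,   d = no and not met
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        # An empty row or column means there is no variation to correlate
        raise ValueError("every row and column needs at least one case")
    return (a * d - b * c) / denom


# Hypothetical counts: model-plane hobby (yes/no) vs. passed training (yes/no)
print(round(phi_coefficient(40, 10, 20, 30), 2))
```

With these made-up counts the function returns about 0.41, a moderate positive association; a value near 0 would suggest the item carries no information about the outcome.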
A test that lacks face validity may be rejected by test-takers (if they have that option) and by people choosing which test to use from a set of options.
A test has content validity if it adequately covers the area it is intended to cover. This is particularly important in ability or attainment tests that assess skills or knowledge in a particular domain.
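Content under-representation occurs when important areas are missed. Construct-irrelevant variance occurs when irrelevant factors contaminate the test, for example when a mathematics test built on long written problems also ends up measuring reading ability.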
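A simple check for content under-representation is to compare the share of test items in each content area against the weights in a test blueprint. The sketch below is a minimal illustration under assumed conventions: the blueprint format, the `tolerance` rule and the area names are all hypothetical choices, not a standard procedure.

```python
from collections import Counter


def coverage_report(blueprint: dict[str, float], item_areas: list[str],
                    tolerance: float = 0.5) -> dict[str, str]:
    """Compare the share of items per content area with intended weights.

    blueprint:  intended weight per content area (weights sum to 1.0)
    item_areas: the content area each test item was written to cover
    tolerance:  hypothetical rule; flag an area whose actual share is
                below tolerance * its intended weight
    """
    counts = Counter(item_areas)
    total = len(item_areas)
    report = {}
    for area, weight in blueprint.items():
        share = counts[area] / total if total else 0.0
        report[area] = "under-represented" if share < tolerance * weight else "ok"
    # Items tagged with areas not in the blueprint may signal irrelevant content
    for area in counts.keys() - blueprint.keys():
        report[area] = "outside blueprint"
    return report


blueprint = {"arithmetic": 0.4, "algebra": 0.4, "geometry": 0.2}
items = ["arithmetic"] * 5 + ["algebra"] * 4 + ["spelling"]
print(coverage_report(blueprint, items))
```

Here geometry is flagged as under-represented (no items at all) and the spelling item is flagged as lying outside the blueprint, while arithmetic and algebra pass.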
Many tests are based on an underlying construct or theory of what is being assessed, and construct validity is the degree to which a test truly measures that construct. For example, intelligence has been described in terms of a number of constructs (spatial ability, verbal reasoning, etc.), which a test may assess individually.
Constructs can describe causes, effects, or the relationship between cause and effect.
If the construct itself is not valid, then any test based on it cannot be valid. For example, some historical theories held that intelligence depended on the size and shape of the skull.
Criterion-related validity is similar to construct validity, but relates the test to some external criterion, such as particular aspects of the job.
There is a danger that the external criterion is chosen for its convenience rather than because it fully represents the job. For example, an air traffic control test may use only a limited set of scenarios.
Concurrent validity is assessed by comparing two measures taken at the same time, for example a written test and a hands-on exercise that both seek to assess the same criterion. This can help to limit criterion errors.
Predictive validity, in contrast, compares performance on the test with actual success later in the job. The test can then be adjusted over time to improve its validity.
The validity coefficient is the correlation between the two measures being compared, most typically test scores and later success in the job.
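The coefficient is usually computed as a Pearson correlation. The sketch below computes it from paired scores for the same candidates; the criterion can be a measure taken at the same time (concurrent validity) or later job ratings (predictive validity). The function name and the example scores are hypothetical.

```python
import math


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between paired test and criterion scores."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    # Sum of cross-products of each score's deviation from its mean
    sp = sum((t - mean_t) * (c - mean_c)
             for t, c in zip(test_scores, criterion_scores))
    # Sums of squared deviations for each measure
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    if ss_t == 0 or ss_c == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sp / math.sqrt(ss_t * ss_c)


# Hypothetical data: selection-test scores and later supervisor ratings
test = [55, 62, 70, 48, 81, 66]
ratings = [3.1, 3.4, 3.9, 2.8, 4.2, 3.0]
print(round(validity_coefficient(test, ratings), 2))
```

The result always lies between -1 and +1; a value near +1 means candidates who score well on the test also tend to score well on the criterion.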
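A validity coefficient of 0.6 or above is considered high. That 0.6 counts as high implies that typical coefficients are lower, and even at 0.6 the test accounts for only 36% of the variance in job performance (0.6² = 0.36). Very few tests, therefore, give strong indications of job performance.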
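Squaring a validity coefficient gives the proportion of variance in the criterion that the test accounts for, which shows why even a 'high' coefficient leaves much of job performance unexplained. The helper below applies the 0.6 cut-off from the text; its name and output wording are illustrative only.

```python
def interpret_validity(r: float, high_threshold: float = 0.6) -> str:
    """Describe a validity coefficient using the 0.6 'high' cut-off."""
    # r squared: share of criterion variance the test accounts for
    variance_explained = r ** 2
    label = "high" if r >= high_threshold else "below the 'high' level"
    return f"r = {r:.2f}: {label}; explains {variance_explained:.0%} of criterion variance"


print(interpret_validity(0.6))
print(interpret_validity(0.3))
```

A coefficient of 0.6 explains 36% of the variance, while one of 0.3 explains only 9%, so halving the coefficient cuts the variance explained to a quarter.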
Kaplan, R.M. and Saccuzzo, D.P. (2001). Psychological Testing: Principles, Applications and Issues (5th ed.). Belmont, CA: Wadsworth.