
The Paradoxical Relationship between Reliability and Validity

The "Attenuation Paradox"

Source: Engelhard G Jr. (1993). Reactions to the attenuation paradox. Rasch Measurement Transactions, 7:2, p. 294. http://www.rasch.org/rmt/rmt72k.htm (retrieved 28.6.13)


Reactions to the Attenuation Paradox

G. Engelhard Jr.

"Perhaps the most paradoxical aspect of the attenuation paradox is that Gulliksen, who appears to deserve credit for discovering it, failed to include any reference to it in his comprehensive summary of mental test theory" (Loevinger 1954 p. 501).

In 1945, Gulliksen discovered that, under certain conditions, increasing the reliability of test scores decreases their validity. Professional reaction to the "attenuation paradox" of classical true-score theory (CTT) illustrates five typical reactions to challenges of familiar theories:

1) Gulliksen (1950) ignores the paradox, having decided that true-score theory provides useful results concerning test reliability anyway. All current psychometric texts follow in Gulliksen's footsteps.

2) Lord (1952 p. 501) implies that the paradox is due to lack of skill on the part of psychometricians, rather than a deficient theory. He suggests a curvilinear index to produce a different summary of the relationship between reliability and validity.

3) Tucker (1946 p. 11) accepts the paradox in theory, but resists its implications for practice. "A result which seemed amazing was the low values of the item reliabilities which yielded best measurement... It is safer for the reliabilities to be too high."

4) Davis (1952 p.105) introduces an untestable hypothesis, that of "common sense" (Loevinger 1954 p.105) to save the theory. "It is not proper to deduce from Tucker's data that to obtain high test validity one should make items of low reliability. There is no inconsistency between high item reliability and efficient measurement."

5) Humphreys (1956) accepts the paradox and rejects true-score theory and its assumption that test scores are interval level data. He proposes an alternative theory based on the normal distribution in which test scores are ordinal. His theory explains and overcomes the attenuation paradox, but has its own anomalies.

Though hardly a reaction to the attenuation paradox, Rasch theory does provide a useful perspective for understanding it, as will be shown in my next column.

George Engelhard, Jr.

References

Davis, F.B. (1952). Item analysis in relation to educational and psychological testing. Psychological Bulletin, 49(2), 97-121.
Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10(2), 79-91.
Gulliksen, H. (1950). Theory of mental tests. Wiley.
Humphreys, L.G. (1956). The normal curve and the attenuation paradox in test theory. Psychological Bulletin, 53(6), 472-473.
Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493-504.
Lord, F. (1952). A theory of test scores. Psychometrika Monograph No. 7.
Tucker, L.R. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11(1), 1-13.

Source: Engelhard G Jr. (1993). What is The Attenuation Paradox? Rasch Measurement Transactions, 6:4, p. 257. http://www.rasch.org/rmt/rmt64h.htm (retrieved 28.6.13)

What is The Attenuation Paradox?

G Engelhard Jr

"A paradoxical property of a test is a property such that the validity of the test is not a monotonic function of that property... Is it not intuitively valid, however, to demand that the most basic concept of psychometrics shall be a non-paradoxical property of tests? Reliability is paradoxical." J. Loevinger, 1954, pp. 500-501

The Attenuation Paradox was named by Loevinger (1954). It was recognized earlier by Gulliksen: "The criterion of maximizing test variance [reliability] cannot be pushed to extremes. Test variance is a maximum, if half of the population makes zero scores, and the other half makes perfect scores. Such a score distribution is not desirable for obvious reasons, yet current [true-score] test theory provides no rationale for rejecting such a score distribution. Obviously the best test score distribution is one which accurately reflects the true ability distribution in the group, but there is perhaps little hope of obtaining such a distribution by the current procedure of assigning a score based upon the sheer number of correct answers. At present the only solution to such difficulties seems to lie in some type of absolute scaling theory (Thurstone, 1925), to replace the number correct score" (1945 pp. 90-91). Gulliksen, however, ignores Thurstone and perpetuates the paradoxical true-score tradition: "In order to maximize the reliability and variance of a test, the items should have high inter-correlations, all items should be of the same difficulty level, and the level should be as near to 50% as possible" (1945 p. 79).
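
Gulliksen's variance claim is easy to check numerically. The following sketch is my own illustration, not part of the original article: it compares a 10-item test on which half the group scores zero and half scores perfect with one whose number-correct scores are spread evenly.

```python
# Illustration (not from the article): test-score variance on a 10-item test.
# Gulliksen's observation: variance is maximal when half the group scores 0
# and the other half scores perfect -- a distribution useless for measurement.
from statistics import pvariance

n_items = 10
bimodal = [0] * 50 + [n_items] * 50                          # half zero, half perfect
uniform = [k for k in range(n_items + 1) for _ in range(9)]  # scores spread evenly

print(pvariance(bimodal))  # 25.0 -- the largest variance scores in [0, 10] allow
print(pvariance(uniform))  # 10.0 -- far smaller, but the scores discriminate
```

For scores bounded by 0 and 10 the variance can never exceed (10/2)² = 25, so the all-or-nothing split really is the maximizer; it is precisely the distribution Gulliksen calls "not desirable for obvious reasons."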

Tucker (1946) provides an excellent analysis of the "inconsistencies between higher reliability and better measurement" (p.1). He observes that "if the reliability of the items were increased to unity, all correlations between the items would also become unity and a person passing one item would pass all items and another failing one item would fail all items. Thus the only possible scores are a perfect score or one of zero... Is a dichotomy of scores the best that can be expected from a test with items of equal difficulty?" (p. 2). Using scaling theory (in current terminology, a two-parameter item response theory model based on the normal ogive and random normal probabilities), Tucker shows how increasing test reliability must lead to decreasing test validity.
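
Tucker's dichotomy argument invites a Monte Carlo check. The sketch below is my own construction, not Tucker's: a normal-ogive-style model in which `rho` plays the role of the item-ability correlation (a proxy for item reliability) and every item sits at the 50% difficulty Gulliksen recommends. Driving `rho` toward 1 pushes number-correct scores toward an all-or-nothing split, and the validity coefficient (the correlation of scores with the latent ability) goes down rather than up.

```python
# Monte Carlo sketch (my construction): validity vs. item "reliability"
# under a normal-ogive-style model with all items at 50% difficulty.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 200_000, 20
theta = rng.standard_normal(n_persons)          # latent ability

def validity(rho):
    """Correlation of number-correct scores with latent ability."""
    noise = rng.standard_normal((n_persons, n_items))
    # An item is passed when a noisy copy of ability exceeds 0.
    passed = (rho * theta[:, None] + np.sqrt(1 - rho**2) * noise) > 0
    return np.corrcoef(passed.sum(axis=1), theta)[0, 1]

v_moderate = validity(0.50)   # moderately reliable items
v_extreme = validity(0.99)    # near-perfectly reliable items
print(v_moderate, v_extreme)  # the moderate items win: roughly 0.89 vs. 0.84
```

With `rho = 0.99` the inter-item correlations approach unity and nearly everyone scores close to 0 or to 20 -- the dichotomy Tucker describes -- so the extra "reliability" buys a worse estimate of ability.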

Laudan, Laudan and Donovan (1988) describe seven empirically testable hypotheses regarding how scientists react to data-dominated empirical anomalies and theory-dominated conceptual paradoxes. The reactions of psychometricians to the attenuation paradox of true-score theory provide instructive case studies of how scientists function. In the next columns, I will examine how measurement theorists of the 1940s, 1950s and today react to the Attenuation Paradox.

Gulliksen, H. (1945). The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 10(2), 79-91.

Laudan, R., Laudan, L., & Donovan, A. (1988). Testing theories of scientific change. In A. Donovan, L. Laudan, & R. Laudan (Eds.), Scrutinizing science: Empirical studies of scientific change (pp. 3- 44). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Loevinger, J. (1954). The attenuation paradox in test theory. Psychological Bulletin, 51, 493-504.

Thurstone, L.L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16(7), 433-451.

Tucker, L.R. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11(1), 1-13.

© Copyright by Georg Lind. Last modified: 28.6.2013