Last revision: Jan. 8, 2019 (c) Georg Lind
Why must the MCT not be used for high-stakes testing but for research and program evaluation purposes only?
The MCT has been designed to answer research questions such as "What fosters moral judgment competence?" and "How relevant is moral judgment competence for other kinds of behavior, such as cheating, helping, learning, or decision-making?" It has also been designed for evaluating programs of moral and character education (see Lind, 2016).
The MCT has not been designed for, and must not be used for, selecting or sanctioning people or groups of people ("high-stakes testing"). Such use would be a clear instance of misuse.
The main reason for not allowing the MCT to be used for selection and sanctioning purposes is that the test would rapidly be devalued as a research and evaluation instrument. Using the MCT for individual diagnosis and selection would quickly trigger attempts to cheat on it and thus invalidate it as an instrument for research and program evaluation (see Bracey, 2006; Nichols & Berliner, 2006).
Because we have carefully protected the MCT against abuse, it has been possible to maintain its integrity for over 30 years.
The second reason is that we do not believe it is possible to measure the moral judgment competence of an individual person reliably, because its manifestation depends very much on situational factors such as fatigue, involvement, and prior experience. In research and evaluation studies with groups of people, these sources of measurement error tend to cancel each other out. According to the central limit theorem, the error variance of a sub-sample mean decreases as the size of the sub-sample increases (cf. Hays, 1963). I feel that the average C-score of a sub-sample can be reliably interpreted as the "true" level of moral judgment competence if the sub-sample has at least N = 10 members. (Note 1)
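The claim that situational errors tend to cancel in group averages can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the MCT scoring procedure: the function name, the "true" score of 20, the error standard deviation of 10, and the assumption of independent, normally distributed errors are all hypothetical choices made for illustration.

```python
import random
import statistics

# Hypothetical values chosen for illustration only.
TRUE_SCORE = 20.0   # assumed common "true" score of every sub-sample member
ERROR_SD = 10.0     # assumed standard deviation of the situational error


def spread_of_subsample_means(n, replications=2000, seed=1):
    """Standard deviation of the mean score across many simulated sub-samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        # Observed score = true score + independent situational error (assumption).
        scores = [TRUE_SCORE + rng.gauss(0.0, ERROR_SD) for _ in range(n)]
        means.append(statistics.fmean(scores))
    return statistics.stdev(means)


for n in (1, 10, 40):
    observed = spread_of_subsample_means(n)
    theory = ERROR_SD / n ** 0.5  # standard error of a mean of n independent scores
    print(f"N = {n:2d}: spread of means = {observed:5.2f} (theory: {theory:5.2f})")
```

Under these assumptions the spread of the sub-sample mean shrinks in proportion to 1 divided by the square root of N, so a single score (N = 1) is far noisier than the mean of ten. The sketch ignores systematic errors, which do not cancel, and the fact that real C-scores are bounded.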
the measurement error within each sub-sample should be as small as possible,
the design of the study should make sure that the variance of the C-score in the total sample is as large as possible to detect correlations if they exist,
statistical reasoning should be used with great caution. As Hays (1963) notes, "it is a sad fact that if one knows nothing about the probability of occurrences for particular samples of units for observation, we can use very little of the machinery of statistical inference. This is why the assumption of random sampling is not to be taken lightly. ... Inferential methods [of statistics] apply to random samples, and there is no guarantee of their validity in other circumstances" (p. 217).
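The point above about keeping the variance of the C-score in the total sample large can be illustrated by simulating what happens when the range of scores is restricted. The sketch below rests on assumptions: the standardized scores, the population correlation of 0.5, and the cut-off for the narrow sample are hypothetical, and `pearson_r` is a helper written for this example.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
# Hypothetical standardized scores: x stands in for the C-score,
# y for some other behavior; the population correlation is set to 0.5.
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [0.5 * xi + rng.gauss(0.0, 0.75 ** 0.5) for xi in x]

# A homogeneous sample: keep only cases from a narrow band of x.
narrow = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]

r_full = pearson_r(x, y)
r_narrow = pearson_r([p[0] for p in narrow], [p[1] for p in narrow])
print(f"full sample:   r = {r_full:.2f} (n = {len(x)})")
print(f"narrow sample: r = {r_narrow:.2f} (n = {len(narrow)})")
```

Although the relation between x and y is the same in both cases, the correlation in the narrow sample comes out clearly lower, which is why a study design with little variance in the C-score may fail to detect correlations that exist.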
This precaution is especially important when talking about the "significance" of a finding. Usually, this word signifies statistical significance only, which has no direct relation to the psychological or educational significance in which one is actually interested (see Carver, 1993; Thompson, 1996).
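The gap between statistical significance and practical significance can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a two-sample z-test with known, equal variances; the function name, the effect sizes, and the group sizes are hypothetical and not taken from any MCT study.

```python
import math


def two_sided_p(effect_size_d, n_per_group):
    """Two-sided p-value of a two-sample z-test for an observed
    standardized mean difference d (known, equal variances assumed)."""
    z = effect_size_d * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(z) / math.sqrt(2.0))


# A negligible difference in a very large study ...
tiny_effect_large_n = two_sided_p(0.05, 10000)
# ... versus a large difference in a very small study.
large_effect_small_n = two_sided_p(0.80, 10)

print(f"d = 0.05, n = 10000 per group: p = {tiny_effect_large_n:.4f}")
print(f"d = 0.80, n = 10 per group:    p = {large_effect_small_n:.4f}")
```

Under these assumptions the negligible difference is "significant" at the conventional 5% level while the large difference is not, so the p-value by itself says little about the size or the educational importance of an effect.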
Bracey, G.W. (2006). Reading Educational Research: How to Avoid Getting Statistically Snookered. Heinemann.
Carver, R.P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61(4), 287-292.
Hays, W. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.
Nichols, S.L. & Berliner, D. (2006). Collateral damage: How high-stakes testing corrupts schools. Cambridge, MA: Harvard Education Press.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.