The count of right marks on a test is the raw material fed into statistical calculations. All right marks do not have the same value or meaning, though traditional multiple-choice (TMC) ignores this fact (see prior posts). The following model, operating first in a perfect world and then with real data, will do the same.
Able and inspired students and teachers see mastery as their goal. In this perfect world example, all of these students receive the same test score (85%). There is no variation in the scores.
Unable and uninspired students and teachers see passing as their goal. In this perfect world, all of these students receive the same test score (65%). There is no variation in the scores.
In a perfect world with 10 students passing the test and 10 mastering the lesson, a new statistic appears: the mean, or average, of the 20 scores, which is 75%. Each test score is 10 points away from (above or below) the class average of 75%.
Even though no student earned a score of 75%, this value represents the on-average score for the entire test. This model distribution of 20 scores in no way looks like the normal curve, the distribution expected when results include random error.
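The perfect-world arithmetic above can be sketched in a few lines of Python; the scores are the hypothetical ones from this example, not real data:

```python
# Sketch of the perfect-world example: 10 scores of 65% and 10 of 85%.
# The mean is 75% even though no student earned that score.
from statistics import mean, pstdev

scores = [65] * 10 + [85] * 10          # hypothetical perfect-world scores
class_mean = mean(scores)               # 75.0
spread = pstdev(scores)                 # every score sits 10 points from the mean
print(class_mean, spread)               # 75.0 10.0
```

Because every score is exactly 10 points from the mean, the standard deviation is exactly 10.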
Random error injects variation into test results. Let’s say one lucky student scored 90% right (an increase of 5%) instead of 85%. To keep the example balanced would require one unlucky student to score 60% right (a decrease of 5%) instead of 65%. This would stretch out the distribution (Chart 5).
But stretching increases the variation in the distribution. The increase in variation can be balanced by two students scoring 70% (an increase of 5%) instead of 65% and another two students scoring 80% (a decrease of 5%) instead of 85%.
[It takes moving two scores closer to the mean to balance one score moved further from the mean, because the variation is expressed in squared values. Score distances from the mean change linearly (1, 2, 3, 4, 5), but their deviations contribute squared values (1, 4, 9, 16, 25).
Squaring was adopted so that all values are positive, but it distorts distances: the distance from 2 to 4 is a difference of 2, while their squared deviations, 4 and 16, differ by 12.]
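The squared bookkeeping can be checked directly. This is a sketch of the 5-point moves described above (the specific distances are the ones from this example):

```python
# Linear distances from the mean vs. their squared deviations.
distances = [1, 2, 3, 4, 5]
squares = [d * d for d in distances]     # 1, 4, 9, 16, 25

# Moving one score from 10 to 15 points out adds 15^2 - 10^2 = 125 to the
# summed squared deviation; moving one score from 10 to 5 points in removes
# only 10^2 - 5^2 = 75.  So it takes roughly two inward moves to offset
# one outward move.
added = 15**2 - 10**2    # 125
removed = 10**2 - 5**2   # 75
print(squares, added, removed)
```

This is why one stretched score in Chart 5 is balanced by moving a pair of scores toward the mean.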
Doubling the amount of error (Chart 6) brings the score distribution closer to the normal distribution of error (the normal curve). Again the standard deviation remains 10. The distribution now looks more like traditional multiple-choice classroom test results. A bi-modal distribution was very common in my remedial biology class. The score distribution can be made to look even more like the normal curve by tweaking additional clusters of scores.
The normal curve does not describe the actual observed score distribution. The normal curve always views a distribution through the lens of three points: the mean, plus 1 SD and minus 1 SD.
A small SD means the distribution is narrow and compact. A large SD means the distribution is more spread out.
The SD is never concerned with the location of your individual test score. Plus and minus 1 SD on the score scale is the region where about 2/3 of the test scores are expected to fall. There is no way to specifically predict where your score will actually fall, only the region in which it will fall. To find your test score, you must take the test.
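The "about 2/3 within one SD" expectation can be illustrated with a quick simulation; the mean of 75 and SD of 10 are borrowed from the perfect-world example, and the random draws are illustrative, not actual test data:

```python
# Simulate many normally distributed scores and count how many fall
# between (mean - SD) and (mean + SD).
import random
from statistics import NormalDist

random.seed(1)
mu, sd = 75, 10
scores = [random.gauss(mu, sd) for _ in range(100_000)]
within = sum(mu - sd <= s <= mu + sd for s in scores) / len(scores)
print(round(within, 3))   # close to 0.683, about 2/3

# The exact expectation from the normal curve itself:
expected = NormalDist(mu, sd).cdf(mu + sd) - NormalDist(mu, sd).cdf(mu - sd)
```

Note that the simulation says nothing about where any one score lands, only about the region; to find your score, you must take the test.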
The Nursing124 test data (Table 2) will now be used to apply the above concepts. In Chart 7, the normal curve includes 15 of the 22 scores within one SD of the mean. That is about 2/3, or 68%, which matches the expected value of 68%.
[I learned from Chart 8 that a calculated normal curve for discriminating items ignores the extreme values of 20% and 40%, as well as zero percent and 100%; yet these extreme values are the main contributors when calculating the SD in the next post. The actual distribution has been reduced to a numerical abstraction. I used the Excel function NORM.DIST, which refers only to the mean and the SD.]
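Excel's NORM.DIST(x, mean, sd, FALSE) returns the height of the normal curve at x from only the mean and SD, which is exactly the abstraction described above. Python's standard library offers the same calculation; the mean and SD below are placeholder values, not the Nursing124 statistics:

```python
# statistics.NormalDist mirrors Excel's NORM.DIST: the curve is fully
# determined by the mean and SD, ignoring the actual observed scores.
from statistics import NormalDist

mean_score, sd = 75, 10            # assumed values for illustration
curve = NormalDist(mean_score, sd)

height_at_mean = curve.pdf(75)     # like NORM.DIST(75, 75, 10, FALSE)
height_at_85 = curve.pdf(85)       # one SD above the mean; a lower point
print(round(height_at_mean, 4), round(height_at_85, 4))
```

Whatever the real distribution looks like, this calculated curve is the same for any data set sharing that mean and SD.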
The uniqueness of each mark, student score, and item difficulty has now been reviewed. Unless some strongly biasing factor is involved, most factors are ignored by traditional multiple-choice (TMC). Provision is made in Break Out (Sheet 2) and PUP 5.22 (Table 2) to edit and rescore the test when an item is found to be just too bad to use, or when a spirited class discussion earns a point for everyone on the item. Otherwise, the only thing that counts in TMC is right and wrong: 1 and 0.
[PCM values marks as 0, 1, and 2 for wrong, judgment, and right. KJS values marks as 0, 0.5, and 1 for wrong, judgment, and right. Both scoring methods maintain the same value ratio for wrong, judgment, and right, and both promote student development of high-quality judgment.
TMC uses 0, 0.25, and 1 for four-option items, but this fact is hidden by forcing students to mark all items or accept a 0 for a blank. This promotes guessing. Knowledge Factor uses 0, 0.75, and 1, which inverts the TMC value for judgment. This demands high-quality judgment in high-risk occupations and in serious preparation for standardized tests.]
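The four point schemes above can be compared side by side. This is only a sketch: the `score` helper and the sample mark list are hypothetical, not part of PUP or Knowledge Factor:

```python
# Point values per mark for each scoring method described in the post.
WEIGHTS = {
    "TMC": {"wrong": 0.0, "judgment": 0.25, "right": 1.0},
    "PCM": {"wrong": 0.0, "judgment": 1.0, "right": 2.0},
    "KJS": {"wrong": 0.0, "judgment": 0.5, "right": 1.0},
    "Knowledge Factor": {"wrong": 0.0, "judgment": 0.75, "right": 1.0},
}

def score(marks, method):
    """Total the point values for a list of 'wrong'/'judgment'/'right' marks."""
    w = WEIGHTS[method]
    return sum(w[m] for m in marks)

# A made-up 10-item test: 6 right, 3 judgment (omitted), 1 wrong.
marks = ["right"] * 6 + ["judgment"] * 3 + ["wrong"]
for method in WEIGHTS:
    print(method, score(marks, method))
```

Note that the PCM total is always exactly twice the KJS total, which is the shared value ratio the post points out.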
The normal curve can only be accurately drawn from large score distributions. It can be calculated for tests of any size, based on the test mean and SD.
- - - - - - - - - - - - - - - - - - - - -
Free software to help you and your students experience and understand the change from TMC to KJS (tricycle to bicycle):