The last post stated, “Lower individual PBR values result from mixing right and wrong marks in an item pattern. Wider score distributions make possible longer item mark patterns.” I was curious about just how this happens.
I marked Item 30 in Table 19 with five locations. The top location contained four right marks (1s). This location was then changed to wrong marks (0s) and the four right marks were moved one count below. A visual education statistics engine (VESE) table was developed. This process was then repeated in each of the three lower locations.
The above process took an item with an unmixed mark pattern (14 right and 26 wrong) and mixed wrong marks into four lower locations, each one right count lower in score. I moved four marks because it took that many to get a measurable result in all six statistics with the standard deviation (SD) set at 4, or 10%, on a test with 40 students and 40 items (Chart 40).
I did the same thing with the SD set at 2, or 5% (Chart 41), where the effect on lowering the item PBR is greater. But an SD of 5% is not a realistic value. The effect of mixing right and wrong marks would be even less than at 10% with the SD set at 8, or 20%, with 40 students and 40 items. My assumption, at this point, is that the mixing of right and wrong marks will be of little concern in large tests such as standardized traditional multiple-choice (TMC) tests.
Chart 42 shows an interesting observation. Mixing just one count makes no change in the individual PBR for item 30. The reason for this can be seen in Table 19. When a right mark with a related student raw score of 30 is mixed with the next lower location of 29, the math is 30 - 1 = 29 and 29 + 1 = 30. The set of student scores does not change. Only the students getting those scores change.
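The one-count case can be checked with a few lines of Python. This is only a minimal sketch: the six students, their raw scores, and the helper names `point_biserial` and `move_right_mark` are made up for illustration and are not taken from Table 19 or from VESE.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(marks, scores):
    """Point-biserial r between 0/1 item marks and student raw scores."""
    right = [s for m, s in zip(marks, scores) if m == 1]
    wrong = [s for m, s in zip(marks, scores) if m == 0]
    p = len(right) / len(marks)  # item difficulty (proportion right)
    return (mean(right) - mean(wrong)) / pstdev(scores) * sqrt(p * (1 - p))

def move_right_mark(marks, scores, src, dst):
    """Move one right mark from student src to student dst.

    The raw score of src drops by one and the raw score of dst rises by one.
    """
    marks, scores = list(marks), list(scores)
    assert marks[src] == 1 and marks[dst] == 0
    marks[src], marks[dst] = 0, 1
    scores[src] -= 1
    scores[dst] += 1
    return marks, scores

# Hypothetical data (an assumption, not Table 19): six students listed by
# descending raw score, with an unmixed mark pattern on one item.
scores = [32, 31, 30, 29, 27, 25]
marks = [1, 1, 1, 0, 0, 0]

# Move the right mark held at a raw score of 30 down one count to 29.
mixed_marks, mixed_scores = move_right_mark(marks, scores, src=2, dst=3)

before = point_biserial(marks, scores)
after = point_biserial(mixed_marks, mixed_scores)

print(sorted(scores) == sorted(mixed_scores))  # same set of scores
print(abs(before - after) < 1e-12)             # same PBR
```

The two students simply trade places: the one who gains the right mark also gains the score of 30, so the pairing of marks with scores is the same as before.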
The deeper the mixing (the further the right marks are moved down the student score scale), the lower the individual PBR. But the individual PBR increases the further an unmixed mark pattern descends or lengthens, up to a point.
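The effect of mixing depth can be sketched in Python by moving a single right mark further and further down a made-up score scale. The ten students, their scores, and the function names here are assumptions for illustration only, and the PBR is computed as an ordinary Pearson correlation between the 0/1 marks and the raw scores rather than with VESE.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation; with 0/1 marks as x this is the point-biserial r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def pbr_after_move(depth):
    """PBR after moving the lowest right mark `depth` counts down the scale."""
    # Hypothetical data (an assumption): ten students with raw scores 34..25
    # and an unmixed pattern of four right marks at the top.
    scores = list(range(34, 24, -1))
    marks = [1] * 4 + [0] * 6
    if depth > 0:
        src, dst = 3, 3 + depth
        marks[src], marks[dst] = 0, 1
        scores[src] -= 1  # the student losing the mark drops one count
        scores[dst] += 1  # the student gaining the mark rises one count
    return pearson_r(marks, scores)

pbrs = [pbr_after_move(depth) for depth in range(5)]
for depth, r in enumerate(pbrs):
    print(f"depth {depth}: PBR = {r:.3f}")
```

In this small example a move of one count leaves the PBR where it was, and each deeper move lowers it further. The size of the drop depends on the score distribution, so the numbers should not be read as matching Charts 40 to 42.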
Items 26 to 31 in Table 19 show how this happens. An S-shaped or sigmoid curve is etched into Table 19 with bold 1’s. Each item is less difficult as you go from item 31 to 26 (0.25 to 0.75). Each mark pattern lengthens linearly.
[The number of mark patterns was 10 at 5% student score SD and 20 at 10% student score SD.]
The PBR and individual variance increase to a point and then decrease (Chart 43). That point is the 70% average student score set for the test. The test score sets the limit for individual item PBRs. In this table, based on optimum conditions, that limit is a PBR of 0.73, which provides plenty of room for classroom tests, which generally run from 0.10 to 0.50.
Item 29 shows a difficulty of 0.45 and a variance of 0.25. Item 28 shows a difficulty of 0.55 and a variance also of 0.25. They fall equidistant from the item difficulty mean of 20 or 0.50. The junction of mean student score and mean item difficulty sets the PBR limit.
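The matching variances for items 29 and 28 can be checked with a short Python sketch. It assumes the variance reported for a right/wrong item is p × (1 − p), where p is the item difficulty, and it converts the difficulties of 0.45 and 0.55 into 18 and 22 right marks out of 40 students. The function name `item_variance` is hypothetical.

```python
from fractions import Fraction

def item_variance(right_marks, students):
    """Variance of a 0/1 item, assumed here to be p * (1 - p)."""
    p = Fraction(right_marks, students)
    return p * (1 - p)

students = 40
variances = {}
for item, right_marks in [(29, 18), (28, 22)]:
    difficulty = Fraction(right_marks, students)
    variances[item] = item_variance(right_marks, students)
    print(item, float(difficulty), float(variances[item]))
```

Both items come to 0.2475, which rounds to the 0.25 shown for each. Under this assumption, any two items the same distance above and below a difficulty of 0.50 share the same variance, because swapping p and 1 − p leaves the product unchanged.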
This has practical implications. The further away the average student score is from 50%, the lower the limit on item discrimination (PBR).
In Table 19 an unmixed marking pattern can only be 12 counts long before the PBR decreases. If the test score had been 50%, the marking pattern could have been 20 counts long and the PBR 100% (as shown in previous posts).
This all comes back to the need for discriminating items to produce efficient tests: tests that use the fewest items to rank students using TMC. The problem is, we do not create discriminating items. We can create items, but it is student performance that develops their PBR. This provides useful descriptive information from classroom tests. The development of PBR values is often distorted with standardized tests under conditions that range from pure gambling to being severely stressful.
It does not have to be that way. By offering Knowledge and Judgment Scoring (KJS), or its equivalent, students can report what they actually know and can do; what they trust as the foundation for further learning and instruction. The test then reveals student quantity and quality, misconceptions, the classroom level of thinking, and teacher effectiveness; not just a ranking.
Most students can function with high quality even though the quantity can vary greatly. The quality goal of the CCSS movement can be assessed using current efficient technology once students are permitted to make an individualized, honest and fair report of their knowledge and skills using multiple-choice, just as they do on most other forms of assessment.
- - - - - - - - - - - - - - - - - - - - -
Free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):