Multiple-Choice Reborn: Visual Education Statistics

The visual education statistics engine (VESE) is now capable of producing a statistical signature for a course using traditional multiple-choice (TMC) and Knowledge and Judgment Scoring (KJS).

I selected two scenarios, each following three consecutive tests. All items are set for maximum discrimination (right and wrong marks are not mixed). All student score distributions are normal. Both courses start with an average score of 50% and end with an average score of 70%. A standard deviation of 10% is considered normal and convenient for setting grades.

The first scenario is a class that starts with students of relatively equal abilities (Chart 36). As the course progresses the score distribution widens. This is the natural consequence of the better students doing better and the poorer students lagging behind, a typical result when using TMC, which primarily ranks students. [A good example of how evolution actually works: the self-empowered survive.]

The second scenario is a class that starts with students spread out widely (Chart 37). As the course progresses the score distribution narrows. This is the natural consequence of good student development, one of the results of switching from TMC to KJS, where students are empowered to report what they actually know and trust as the basis for further instruction and learning.

The statistical signatures I found are shown in Charts 38 and 39. In a traditional class the test reliability (KR20), the average item discrimination (PBR), the standard deviation (SD) and the standard error of measurement (SEM) all increased in value. The controlling factor was the spread of student scores.
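For readers who want to see how three of these statistics relate, here is a minimal Python sketch that computes KR20, SD and SEM from a table of right (1) and wrong (0) marks. This is not the VESE itself. The function name `score_statistics`, the tiny 5 x 4 mark table, and the choice of population variance are my assumptions for illustration only. It uses the standard textbook formulas: KR20 = (k / (k - 1)) x (1 - sum of p x q / score variance) and SEM = SD x sqrt(1 - KR20).

```python
import math
from statistics import pvariance


def score_statistics(marks):
    """Return KR20, SD and SEM for a table of 0/1 marks.

    marks: one row per student, one column per item (hypothetical layout).
    Assumes population variance and at least some spread in the scores;
    a test where every student gets the same score would divide by zero.
    """
    n_students = len(marks)
    n_items = len(marks[0])

    # Each student's score is the count of right marks.
    scores = [sum(row) for row in marks]
    score_variance = pvariance(scores)

    # Sum p * q over items, where p is the fraction of right marks on the item.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in marks) / n_students
        pq_sum += p * (1 - p)

    kr20 = (n_items / (n_items - 1)) * (1 - pq_sum / score_variance)
    sd = math.sqrt(score_variance)
    sem = sd * math.sqrt(1 - kr20)
    return {"kr20": kr20, "sd": sd, "sem": sem}


# A small table with unmixed item mark patterns (illustrative data only).
example_marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
result = score_statistics(example_marks)
print(result)
```

Because the SEM is the SD multiplied by sqrt(1 - KR20), a widening score distribution can raise the reliability and the SEM at the same time, which is the pattern reported for the traditional class.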

The SD captures the spread of student scores. In these two scenarios the SD was set to increase or decrease with the average student score, as required by the score distributions in Charts 36 and 37. [The two signatures are not perfect continuations due to rounding errors and my inability to fit the 40 x 40 = 1600 marks under smooth normal curves.]

Individual item discrimination (PBR) is not the controlling factor as it has been set to the maximum for each item. [A visualization of individual item PBR and average item PBR is needed here. Lower individual PBR values result from mixing right and wrong marks in an item mark pattern. Wider score distributions (larger SDs) make possible longer item mark patterns. An item mark pattern is visualized in the next post.]
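The effect of mixing right and wrong marks can be sketched with a few lines of Python. This is a minimal illustration, not the VESE calculation: the function name `point_biserial`, the six student scores, and the two item mark patterns are all hypothetical, and the student scores are held fixed between the two patterns to keep the comparison simple. It uses the usual point-biserial formula, (mean score of students marking right - mean score of students marking wrong) / SD x sqrt(p x q), with the population SD.

```python
import math
from statistics import mean, pstdev


def point_biserial(item_marks, scores):
    """Point-biserial correlation between one item's 0/1 marks and student scores."""
    p = mean(item_marks)  # fraction of students marking the item right
    if p in (0, 1):
        raise ValueError("item needs both right and wrong marks")
    right = [s for m, s in zip(item_marks, scores) if m == 1]
    wrong = [s for m, s in zip(item_marks, scores) if m == 0]
    return (mean(right) - mean(wrong)) / pstdev(scores) * math.sqrt(p * (1 - p))


# Six students ranked by score, lowest to highest (illustrative data only).
scores = [1, 2, 3, 4, 5, 6]

# Unmixed pattern: every wrong mark falls below every right mark.
unmixed = [0, 0, 0, 1, 1, 1]
# Mixed pattern: same difficulty, but right and wrong marks alternate.
mixed = [0, 1, 0, 1, 0, 1]

pbr_unmixed = point_biserial(unmixed, scores)
pbr_mixed = point_biserial(mixed, scores)
print(round(pbr_unmixed, 3), round(pbr_mixed, 3))
```

Both items have the same difficulty (half the students mark right), yet the mixed pattern produces a much lower PBR than the unmixed one.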

These statistical results are interesting. A traditional class ends with a test showing increasing test reliability and a decreasing ability to separate student performance with the SEM. A class that ends with most students empowered (to question, to find answers, and to verify) shows lower test reliability and an increasing ability to separate student performance with the SEM. This makes sense.

These two scenarios also shed light on teacher effectiveness. Both classes reached the traditional goal of mastery for schools designed for failure. The first, I would imagine, did so under traditional instruction aimed at the center of the class. The second would require either special attention to lower performing students or empowering most students to become self-correcting, high-achieving learners, the goal of the Common Core State Standards (CCSS) movement.

- - - - - - - - - - - - - - - - - - - - -

Free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):