Wednesday, February 5, 2014

Test Scoring Mathematical Model

The seven statistics reviewed in previous posts need to be related to the underlying mathematics. Traditional multiple-choice (TMC) data analysis has so far been expressed entirely with charts and the Excel spreadsheet VESEngine. I need a TMC math model to compare TMC with Rasch model item response theory (IRT), the dominant method of data analysis for standardized tests.

A mathematical model captures the relationships and variables listed in the charts and tables. This post applies the advice on learning discussed in the previous post: start with the observed variables, then summarize the relationships in the seven statistics.



The model contains two levels (Table 25). The first floor holds the observed mark patterns. The second floor holds the squared deviations from the score and item means, that is, the variation in the mark patterns. The squared values are then averaged to produce the variance. [Variance = mean sum of squares = MSS]

1. Count

The right marks are counted for each student and each item (question). TMC (0-wrong, 1-right) captures quantity only. Knowledge and Judgment Scoring (KJS) and the partial credit Rasch model (PCRM) capture quantity and quality: 0-wrong, 1-have yet to learn this, 2-right.
Hall JR Count = SUM(right marks) = 20   
Item 12 Count = SUM(right marks) = 21  
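
As a minimal illustration, here is the count step in Python, using a hypothetical 4-student x 5-item mark matrix (the real class data, N = 22 students and n = 21 items, lives in the VESEngine):

```python
# Hypothetical 0/1 mark matrix: rows are students, columns are items.
# 1 = right, 0 = wrong. Illustration only; not the actual class data.
marks = [
    [1, 1, 1, 1, 1],  # student A, score 5
    [1, 1, 1, 1, 0],  # student B, score 4
    [1, 1, 0, 0, 0],  # student C, score 2
    [0, 0, 0, 1, 0],  # student D, score 1
]

student_counts = [sum(row) for row in marks]     # right marks per student (row)
item_counts = [sum(col) for col in zip(*marks)]  # right marks per item (column)

print(student_counts)  # [5, 4, 2, 1]
print(item_counts)     # [3, 3, 2, 3, 1]
```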

2. Mean (Average)

The sum is divided by the number of counts (N = 22 students; n = 21 items).
The SUM of scores / N = 16.77; 16.77/n = 0.80 = 80%
The SUM of items / n = 17.57; 17.57/N = 0.80 = 80%
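
Continuing the sketch above (hypothetical data, so the numbers differ from the class figures of 16.77 and 80%), the score side and the item side yield the same mean percentage:

```python
N = len(marks)     # number of students (4 here; 22 in the class data)
n = len(marks[0])  # number of items (5 here; 21 in the class data)

mean_score = sum(student_counts) / N  # mean score per student
mean_item = sum(item_counts) / n      # mean count per item

print(mean_score / n)  # 0.6 = 60% from the score side
print(mean_item / N)   # 0.6 = 60% from the item side (identical)
```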

3. Variance

The variation within any row or column is harvested as the deviation of each mark in a student (row) or item (column) mark pattern, or of each student score, from the respective mean. The squared deviations are summed and averaged as the variance on the top level of the mathematical model (Table 25).
Variance = SUM(Deviations^2)/(N or n) = SUM of Squares/(N or n) = Mean SS = MSS
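
A small helper, continuing the sketch, computes this variance (MSS) for any list of marks or scores; the 4.08 in the text is the variance of the real class scores:

```python
def variance(values):
    """Population variance: the mean of the squared deviations (MSS)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

score_mss = variance(student_counts)  # variance of the hypothetical scores
print(score_mss)                      # 2.5 for scores [5, 4, 2, 1]
```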

4. Standard Deviation

The variation within a score, item, or probability distribution expressed on a normal scale: the mean +/- 1 standard deviation (1 SD) includes about 2/3 of a normal, bell-shaped distribution.

SD = Square Root of Variance or MSS = SQRT(MSS) = SQRT(4.08) = 2.02

For small classroom tests, use N-1: SD = SQRT(4.28) = 2.07 marks
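
Both forms of the standard deviation, continuing the sketch (the 2.02 and 2.07 values in the text come from the class variance of 4.08 with N = 22):

```python
import math

sd = math.sqrt(score_mss)                   # population SD: sqrt(2.5) = 1.58
sd_n1 = math.sqrt(score_mss * N / (N - 1))  # small-sample (N-1) SD: sqrt(3.33) = 1.83
```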

The variation in student scores and the distribution of student scores are now expressed on the same normal scale.

5. Test Reliability

The ratio of the true variance to the score variance estimates the test reliability: the Kuder-Richardson 20 (KR20). The true variance is the score (marginal column) variance minus the error variance (summed from within the item columns).

KR20 = ((score variance - error variance)/score variance) x (n/(n-1))
KR20 = ((4.08 - 2.96)/4.08) x 21/20 = 0.29
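
In code, continuing the sketch: the error variance is the sum of the variances within the item columns, and n/(n-1) corrects for the number of items (21/20 for the class test):

```python
# Error variance: sum of the variances within each item column.
error_mss = sum(variance(col) for col in zip(*marks))

kr20 = ((score_mss - error_mss) / score_mss) * (n / (n - 1))
print(kr20)  # 0.75 for the hypothetical matrix (0.29 for the class data)
```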

This ratio is returned to the first floor of the model. An acceptable classroom test has a KR20 > 0.7. An acceptable standardized test has a KR20 >0.9.

6. Traditional Standard Error of Measurement

The standard error of measurement (SEM) is the range of error within which your retest score may fall 2/3 of the time. The traditional SEM is based on the average performance of your class: 16.77 +/- 1 SD (+/- 2.07 marks).

SEM = SQRT(1-KR20) * SD = SQRT(1- 0.29) * 2.07 = +/-1.75 marks
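
The same calculation for the sketch (the text pairs KR20 = 0.29 with the N-1 SD of 2.07 to get +/-1.75 marks):

```python
sem = math.sqrt(1 - kr20) * sd_n1
print(sem)  # sqrt(1 - 0.75) * 1.83 = about 0.91 marks for the hypothetical data
```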

On a test that is totally reliable (KR20 = 1), the SEM is zero. You can expect to get the same score on a retest.

7. Conditional Standard Error of Measurement

The conditional standard error of measurement (CSEM) is the range of error within which your retest score may fall 2/3 of the time, based on the rank of your test score alone (conditional on one score rank). The estimate is based (conditional) on your test score rather than on the average class test score.

CSEM = SQRT((variance within your score) * n questions) = SQRT(MSS * n) = SQRT(SS)
CSEM = SQRT(0.15 * 21) = SQRT(3.15) = 1.77 marks
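
For each student in the sketch, the variance within that student's row of marks, times n, is the row's sum of squares (SS); its square root is that student's CSEM:

```python
def csem(row):
    """CSEM for one student: SQRT(MSS within the row * n) = SQRT(SS)."""
    return math.sqrt(variance(row) * len(row))

csems = [csem(row) for row in marks]
print([round(c, 2) for c in csems])  # [0.0, 0.89, 1.1, 0.89]
```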

The average of the CSEM values for your whole class (1.75, light green) also yields the test SEM. This confirms the calculation above in 6. Traditional Standard Error of Measurement.

This mathematical model (Table 25) separates the flat display in the VESEngine into two distinct levels. The lower floor is on a normal scale. The upper floor isolates the variation within the marking patterns on the lower floor. The resulting variance provides insight into the extent to which the marking patterns could have occurred by luck on test day, and into the performance of teachers, students, questions, and test makers. Limited predictions can also be made.

Predictions are limited with traditional multiple-choice (TMC) because students have only two options: 0-wrong and 1-right. Quantity and quality are linked into a single ranking. Knowledge and Judgment Scoring (KJS) and the partial credit Rasch model (PCRM) separate quantity and quality: 0-wrong, 1-have yet to learn, and 2-right. Students are free to report what they know and can do accurately, honestly, and fairly.

- - - - - - - - - - - - - - - - - - - - - 

Free software to help you and your students experience and understand how to break out of traditional multiple-choice (TMC) and into Knowledge and Judgment Scoring (KJS) (tricycle to bicycle):


