The precision of the average test score can be obtained from the math model in two ways: directly, from the mean sum of squares (MSS, the variance), and traditionally, by way of the test reliability (KR20).
I obtained the precision of each individual student test score from the math model by taking the square root of the sum of squared deviations (SS) within each score mark pattern (green, Table 25). The value is called the conditional standard error of measurement (CSEM) as it sums deviations for one student score (one condition), not for the total test.
I multiplied the mean sum of squares (MSS) by the number of items averaged (21) to yield the SS (0.154 x 21 = 3.24 for a score of 17 right marks); I could equally have just added up the squared deviations. The SQRT(3.24) = 1.80 right marks for the CSEM. About 2/3 of the time, a re-tested score of 17 right marks can be expected to fall between 15.20 and 18.80 right marks (15 and 19 when rounded to whole marks) (Chart 70).
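The CSEM arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the VESEngine spreadsheet itself: the function name csem_from_marks and the mark pattern (17 marks of 1 and 4 marks of 0, in no particular item order) are assumptions made for the example. Working from the unrounded deviations, the SS comes to about 3.24, whose square root is the 1.80 right marks quoted above.

```python
import math

def csem_from_marks(marks):
    """Return (SS, CSEM) for one student's right/wrong mark pattern.

    SS is the sum of squared deviations of each mark from the student's
    own mean mark; the CSEM is taken here as the square root of that SS.
    """
    n = len(marks)
    mean = sum(marks) / n
    ss = sum((m - mean) ** 2 for m in marks)
    return ss, math.sqrt(ss)

# Hypothetical mark pattern: 17 right (1) and 4 wrong (0) on 21 items.
marks = [1] * 17 + [0] * 4
ss, csem = csem_from_marks(marks)

score = sum(marks)
low, high = score - csem, score + csem

print(f"SS = {ss:.2f}, CSEM = {csem:.2f} right marks")
print(f"About 2/3 of re-tests fall between {low:.2f} and {high:.2f} right marks")
```

Summing the squared deviations and multiplying the MSS by 21 are the same operation, so either route gives the same SS.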
The test Standard Error of Measurement (SEM) is then the average of the 22 individual CSEM values (1.75 right marks or 8.31%).
The traditional derivation of the test SEM (the error in the average test score) combines the test reliability (KR20) and the SD (spread) of the average test score.
The SD (2.07) is from the SQRT(MSS, 4.08) between student scores. The test reliability (0.29) is the ratio of the true variance (MSS, 1.12) to the total variance (MSS, 4.08) between student scores (see previous post).
The expectation is that the greater the reliability of a test, the smaller the error in estimating the average test score. An equation is now needed to transform variance values on the top level of the math model to apply to the lower linear level.
SEM = SQRT(1 – KR20) * SD = SQRT(1 – 0.29) * 2.07 = SQRT(0.71) * 2.07 = 0.84 * 2.07 = 1.75 right marks.
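The same formula can be written as a small Python function, as a sketch only; the name sem_from_reliability is an assumption made for this example. With the rounded inputs quoted here (KR20 = 0.29, SD = 2.07) it returns about 1.74 right marks, which agrees with the 1.75 above to within rounding.

```python
import math

def sem_from_reliability(kr20, sd):
    """Traditional test SEM: SQRT(1 - KR20) * SD."""
    return math.sqrt(1.0 - kr20) * sd

# Rounded values taken from the text.
sem = sem_from_reliability(kr20=0.29, sd=2.07)
print(f"SEM = {sem:.2f} right marks")

# Hypothetical reliabilities, same SD: the SEM shrinks as reliability rises.
for kr20 in (0.29, 0.50, 0.90):
    print(f"KR20 = {kr20:.2f} -> SEM = {sem_from_reliability(kr20, 2.07):.2f}")
```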
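The operation “1 – KR20” yields 0.71, the share of the total variance that is error rather than true variance. Taking its square root (0.84) moves that share from the variance level to the linear level, where it extracts the portion of the SD that represents the SEM. If the test reliability goes up, the error in estimating the average test score (SEM) goes down.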
Chart 70 shows the variance (MSS), the SS, and the CSEM based on 21 items, for each student score. It also shows the distribution of the CSEM values that I averaged for the test SEM.
The individual CSEM is highest (largest error, poorest precision) when the student score is 50% (Charts 65 and 70), where the mark pattern is the most mixed between right and wrong. Higher student scores yield lower CSEM values (better precision). This makes sense.
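The shape of this curve can be reproduced with a short Python sketch. It assumes each item is marked 1 (right) or 0 (wrong), in which case the SS within a mark pattern reduces algebraically to right x wrong / items; the function name csem_for_score is hypothetical. With 21 items an exact 50% score is not possible, so the peak falls on the two nearest scores, and the curve is symmetric, so scores below 50% also show smaller CSEM values.

```python
import math

N_ITEMS = 21

def csem_for_score(right, n_items=N_ITEMS):
    """CSEM for a score of `right` marks, assuming 1/0 item marks.

    For such a pattern the sum of squared deviations (SS) equals
    right * wrong / n_items, and the CSEM is its square root.
    """
    wrong = n_items - right
    return math.sqrt(right * wrong / n_items)

csem_by_score = {x: csem_for_score(x) for x in range(N_ITEMS + 1)}

for x, c in csem_by_score.items():
    print(f"{x:2d} right ({x / N_ITEMS:4.0%}): CSEM = {c:.2f}")

# Scores where the CSEM is largest (closest to 50% right).
peak = max(csem_by_score.values())
peak_scores = [x for x, c in csem_by_score.items() if math.isclose(c, peak)]
print("Largest CSEM at scores:", peak_scores)
```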
The test SEM (the average of the CSEM values) is related to the distribution of student test scores (purple dash, Chart 70). Adding easy items (easy in the sense that the students were well prepared) decreases error, improves precision, and so reduces the SEM.