The 2012 NeSA Technical Report contains the information needed to complete (and make corrections on) the Grade 3 Reading Performance chart. The reported portion passing was 76% for Grade 3 (Nebraska Accountability/NeSA Reading/Grade 3).
The observed average score reported was 70%. The estimated expected score was about 66% [No calibration values were given for 10 fairly easy items that were at the beginning of the 2012 test].
The students did better in 2012 on a test that may have been more difficult than in 2011. [The lack of the calibration data on 10 easier items is critical to verifying what happened.]
[All students were presented with all 45 questions. Although the test was taken online, it was not a computer adapted test (CAT). The test design item difficulty was 65%, which is 15% above CAT design (50%).
“Experience suggests that multiple choice items are effective when the student is more likely to succeed than fail and it is important to include a range of difficulties matching the distribution of student abilities (Wright & Stone, 1979).” (2012 NeSA Technical Report, page 31)
The act of measuring should not alter the measurement. The Nebraska test seems to be a good compromise between what psychometricians want to optimize their calculations and what students are accustomed to in the classroom. CAT at 50% difficulty is not a good fit.]
Fifteen common (core) items were used in all three years: 2010, 2011, and 2012. They are remarkably stable. It testifies to the skill of the test creators to write, calibrate, and select items that present a uniform challenge over the three years.
It also shows that little has changed in the entire educational system (teach, learn, assess) with respect to these items. [Individual classroom successes are hidden in a massive collection of several thousand test results.]
My challenge to Nebraska to include student judgment on standardized tests resulted in about the same number of hits on this blog as the letters mailed. No other contact occurred.
This means that standardized testing will continue counting right marks that may have very different meanings. At the lowest levels of thinking, good luck on test day will be an important contributing factor for passive pupils to pass a test where passing requires a score of 58% on a scale with a mean at 70%.
Students able to function at higher levels of thinking but with limited opportunity to prepare for the test will not be able to demonstrate the quality of what they do know or can do. Both groups will be ranked by right marks that have very different meanings.
The improvement in reading seen in the lower Nebraska grades (Nebraska Accountability/NeSA Reading) failed to carry over into the higher grades. Effective teachers can deliver better prepared students functioning at lower levels of thinking at the lower grades. [Student quality becomes essential at higher levels of thinking in the higher grades.]
Typically the rate of increase in test scores decreases with each year (average Nebraska Grade 3 scores of 65%, 68% and 69% on the 15 common items) where classrooms and assessments function at lower levels of thinking. Students and teachers need to break out of this short-term-success trap.
[And state education officials need to avoid the temptation many took in the past decade of NCLB testing to produce results that looked right. It is this troubled past that makes the missing expected item difficulty values for 10 of the easier 2012 test items so critical.]
The Common Core State Standards movement is planning to avoid the short-term-success trap. Students are to be taught to be self-correcting: question, answers, and verify. Students are to be rewarded for what they know and can do and for their judgment in using that knowledge and skill.
Over the long term students are to develop the habits needed to be self-empowering and self-assessing. These habits function over the long term, in school and in the workplace. They provide the quality that is ignored with traditional right count multiple-choice tests. In school, if you do not reward it, it does not count.
The partial credit Rasch model and Knowledge and Judgment Scoring allow students to elect to report what they trust they know and can do as the basis for further instruction and learning. Quantity and quality are both assessed and rewarded.
Nebraska can still create a five star standardized test.
Seasons Greetings and a Happy New Year!