The Alaska Reading Standards Based Assessments contain three
features worthy of a star. The 2011 results include a matched comparison analysis
that provides insight into the dynamic nature of student assessment. The
2001 technical report also shows traditionally set cut scores and questions that are
easy enough to provide actual measurement of what students know and can do.

ONE STAR: In a matched comparison analysis, Alaska recorded the students whose
reading test scores increased, decreased, or stayed the same (stable) from one
year to the next for 2008-2009, 2009-2010, and 2010-2011. The charts present
static and dynamic views.
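A matched comparison only uses students who have a score in both years. The sketch below is a minimal illustration, not Alaska's procedure: the cut scores in `CUTS` and all student scores are hypothetical, and it assumes "increased", "decreased", and "stable" mean moving up, moving down, or staying within a proficiency level.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 100-600 scale (NOT Alaska's published cuts).
# Level index: 0 = Far Below Proficient, 1 = Below Proficient,
#              2 = Proficient, 3 = Advanced.
CUTS = [250, 300, 350]


def level_index(score: int) -> int:
    """Return the number of cut scores at or below this score (0..3)."""
    return bisect_right(CUTS, score)


def matched_comparison(last_year: dict, this_year: dict) -> dict:
    """Group students tested in both years by how their level changed."""
    groups = {"increased": [], "decreased": [], "stable": []}
    # Only students with a score in both years are matched.
    for student in sorted(last_year.keys() & this_year.keys()):
        before = level_index(last_year[student])
        after = level_index(this_year[student])
        if after > before:
            groups["increased"].append(student)
        elif after < before:
            groups["decreased"].append(student)
        else:
            groups["stable"].append(student)
    return groups


last = {"a": 280, "b": 310, "c": 320, "d": 360, "e": 240}
this = {"a": 305, "b": 290, "c": 330, "d": 340, "e": 245, "f": 400}
print(matched_comparison(last, this))
```

Student "f" has no score last year, so it is left out of the comparison. Counting the size of each group gives the static view.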

The portion of students in the Far Below Proficient and
Below Proficient Stable group remained the same for all three comparisons. The
portion of students in the Proficient and Advanced Stable group showed a very
small decline from year to year. The portion of students whose performance
decreased matched the portion whose performance increased. This is the
static view.

The dynamic view shows that much more is going on in this
assessment system. The two Stable groups above stayed stable because about as
many students moved up from Below Proficient last year to Proficient this year
(improved in proficiency) as moved down from Proficient last year to Below
Proficient this year (decreased in proficiency).

This balanced exchange also took place between Proficient
and Advanced levels of performance. In total, about 26% of all students changed
proficiency levels each year (about 6% of the students crossed each of the two
cut scores in both directions).
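The churning described above comes down to counting, for each cut score, how many matched students crossed it in each direction. A minimal sketch, using hypothetical cut scores and student scores rather than Alaska data:

```python
def cut_crossings(last_year: dict, this_year: dict, cuts: list):
    """For each cut score, return the share of matched students who crossed
    it upward and downward, plus the share who crossed at least one cut."""
    matched = last_year.keys() & this_year.keys()
    n = len(matched)
    table = {}
    for cut in cuts:
        up = sum(1 for s in matched if last_year[s] < cut <= this_year[s])
        down = sum(1 for s in matched if this_year[s] < cut <= last_year[s])
        table[cut] = (up / n, down / n)
    # A student changed levels if the two scores land on opposite sides of any cut.
    changed = sum(
        1 for s in matched
        if any((last_year[s] >= c) != (this_year[s] >= c) for c in cuts)
    )
    return table, changed / n


last = {"a": 280, "b": 310, "c": 320, "d": 360, "e": 240}
this = {"a": 305, "b": 290, "c": 330, "d": 340, "e": 245}
table, share_changed = cut_crossings(last, this, [250, 300, 350])
print(table, share_changed)
```

When the upward and downward shares at a cut are about equal, the static counts stay flat even though many individual students changed levels.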

There are several reasons for this churning. The most
obvious is variation in student preparation from year to year (any one set of
questions will match one portion of the students better than the rest of the
examinees). Another is how lucky each student was on test day. This brings up
test design.

TWO STARS: The Alaska test compares student performance
(norm-referenced). This is the most common and least expensive way to create a
standardized test. It also forces students to mark answers even when they
cannot read or understand the questions. This is called right count scoring,
the traditional way of scoring classroom tests. It produces a score that can be
used to validly rank student performance.
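Right count scoring is simple to express in code, and so is the luck it rewards. A minimal sketch, assuming four-option items for the guessing example (the item format and answer sheet are hypothetical): a student who knows 30 of 42 items and blindly guesses the other 12 can expect about 3 extra points from luck alone.

```python
def right_count_score(marks, key):
    """Right count scoring: one point for each mark that matches the key.
    A blank (None) and a wrong mark both earn zero, so guessing costs nothing."""
    return sum(1 for mark, answer in zip(marks, key) if mark is not None and mark == answer)


def expected_lucky_points(n_unknown: int, n_options: int) -> float:
    """Expected right marks from blind guessing on n_unknown items,
    each with n_options equally likely choices."""
    return n_unknown / n_options


key = ["A", "C", "B", "D", "A"]
marks = ["A", "C", "D", "D", None]
print(right_count_score(marks, key))   # 3
print(expected_lucky_points(12, 4))    # 3.0
```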

THREE STARS: The 2001
Alaska Technical Report, page 18, shows the average test scores for Reading
ranged from 67% to 72% for grades 3, 6, and 8. Scores above 60% can indicate
what students actually know and can do rather than their luck on test day. (The
publication of average raw test scores is now considered essential to permit
validation of the test results and comparison with other states using the same Common
Core State Standards test.) [The Spring 2006
Alaska Standards Based Assessments, Chapter 8, did not list the average raw
test scores: **no star**.]

SCORE VARIATION: The 2001
report, page 25, also shows the standard error of measurement (SEM), an
estimate of where each student’s score would land, relative to the cut scores
that divide the score distribution, if the student could repeat the test. The
example for Reading grade 3 shows that 2/3rds of the time the repeated test
scores of student “A” would fall between 388 and 442 scale score units (the
original score of 415 ± 27 SEM). That is 27/351, or 7.7%, of the test mean, or
27/600, or 4.5%, of the full-scale score. (The SEM is derived from the test
reliability and the standard deviation in scale score units. A smaller, more
desirable SEM results from a higher test reliability and a lower standard
deviation.)
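The usual classical test theory relationship is SEM = SD × √(1 − reliability). The report's own calculation is not reproduced in this post, so the sketch below is an illustration of that standard formula plus the arithmetic of the grade 3 example; no reliability value from the report is assumed.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def sem_band(score: float, sem_value: float) -> tuple:
    """Range covering about 2/3rds (68%) of repeated scores: score +/- 1 SEM."""
    return score - sem_value, score + sem_value


# Numbers from the 2001 report's grade 3 Reading example.
low, high = sem_band(415, 27)
print(low, high)                              # 388 442
print(f"{27 / 351:.1%} of the test mean")     # 7.7% of the test mean
print(f"{27 / 600:.1%} of the full scale")    # 4.5% of the full scale
```

The formula shows why a higher reliability or a lower standard deviation shrinks the SEM: both reduce the product.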

The standard deviation of the raw scores and the scale
scores, page 18, provides a more direct view of the variation in the student
test scores. To obtain it, subtract the test mean from each student score,
square each deviation, add up the squares, and divide by the number of scores
(this is the variance); then return to the original units by taking the
square root. (Squaring makes all the deviations positive; otherwise they
would add up to zero.)
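The steps above translate directly into code. This minimal sketch divides by the number of scores (the population standard deviation), as the paragraph describes; a sample standard deviation would divide by one less than the number of scores.

```python
import math


def population_sd(scores):
    """Standard deviation: square each deviation from the mean, average the
    squares (the variance), then take the square root."""
    n = len(scores)
    mean = sum(scores) / n
    # Unsquared deviations would sum to zero, which is why they are squared.
    variance = sum((x - mean) ** 2 for x in scores) / n
    return math.sqrt(variance)


print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```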

The average standard deviation of the raw scores on the nine grade 3, 6, and
8 tests was 8.8 points, or 29% of the average test mean of 30.1 (8.8/30.1);
that is, about 2/3rds of the students would be expected to have raw scores
between 30.1 ± 8.8, or 21.3 to 38.9, on a test with 42 points total.
Converting all of this into the log ratio (logit) units used by
psychometricians produces slightly different results.

The average standard deviation of the scale scores on the nine grade 3, 6, and
8 tests was 83 points, or 24% of the average test mean of 349 (83/349); that
is, about 2/3rds of the students would be expected to have scale scores
between 349 ± 83, or 266 to 432, on a scale score range of 500 points (100 to
600).
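The "2/3rds" figure comes from the normal curve: about 68% of a normal distribution lies within one standard deviation of the mean. A minimal sketch of the arithmetic in the two paragraphs above, assuming the scores are roughly normal:

```python
from statistics import NormalDist


def one_sd_band(mean: float, sd: float) -> tuple:
    """Range one standard deviation either side of the mean."""
    return round(mean - sd, 1), round(mean + sd, 1)


raw = one_sd_band(30.1, 8.8)     # (21.3, 38.9) on a 42-point test
scale = one_sd_band(349, 83)     # (266, 432) on the 100-600 scale
share = NormalDist().cdf(1) - NormalDist().cdf(-1)   # about 0.683
print(raw, scale, f"{8.8 / 30.1:.0%}", f"{83 / 349:.0%}", f"{share:.1%}")
```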

Both SEM and standard deviations show a large amount of
uncertainty in test scores. The documentation of this churning is worth a
**third star**. This inherent variation in any attempt to capture student
performance in a single number accounts for much of the churning observed from
year to year. Scoring these tests for quantity and quality, instead of just
counting right marks, would yield much more useful information in line with
the philosophy of the Common Core State Standards.

THREE OTHER STARS: Alaska places emphasis on cut scores on a
single score distribution (norm-referenced). Nebraska (see previous post) places
emphasis on two other score distributions (two stars): it groups scores both by
asking the questions needed to assess specific knowledge and skills
(criterion-referenced) and by teacher judgment of which group each student the
teacher knows well fits into. Cut scores fall where a student score has an equal
probability of falling into either group.
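An equal-probability cut score can be found numerically. Nebraska's exact procedure is not described here, so this is a hypothetical contrasting-groups-style sketch that assumes the scores in each teacher-judged group are roughly normal and that the two groups are about the same size; the group scores are made up.

```python
from statistics import NormalDist, mean, stdev


def contrasting_groups_cut(below, proficient, tol=1e-6):
    """Return the score between the two group means where the normal density
    fitted to the 'below' group equals the one fitted to the 'proficient'
    group; with equal group sizes, a score there is equally likely to come
    from either group."""
    below_dist = NormalDist(mean(below), stdev(below))
    prof_dist = NormalDist(mean(proficient), stdev(proficient))
    lo, hi = below_dist.mean, prof_dist.mean
    if lo >= hi:
        raise ValueError("the 'below' group should have the lower mean")

    def gap(x):
        # Positive while a score looks more like the 'below' group.
        return below_dist.pdf(x) - prof_dist.pdf(x)

    if gap(lo) <= 0 or gap(hi) >= 0:
        raise ValueError("no density crossing found between the group means")
    # Bisection keeps the crossing bracketed between lo and hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(contrasting_groups_cut([10, 12, 14], [20, 22, 24]))
```

With equal spreads the cut lands halfway between the group means (about 17 here); unequal spreads or group sizes would move it.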

Both Alaska and Nebraska have yet to include student
judgment in their assessments (one star). When that is done, Alaska will have
an accurate, honest, and fair test that better matches the requirements of the
Common Core State Standards.

Most right marks will also represent right answers instead
of luck on test day, and there will be less churning of student performance
rankings. The level of thinking used by students on the test and in the
classroom can also be obtained. All that is needed is to give students the
**option** to continue **guessing** or to **report** what they trust they know:

* **Guessing**: Mark every question even if you must guess. Your judgment of
what you know and can do (what is meaningful, useful, and empowering) has no
value.
* **Reporting**: Only mark to report what you trust you know or can do. Your
judgment and what you know have equal value (an accurate, honest, and fair
assessment).

Including student judgment will add student development (the
ability to use all levels of thinking) to the Alaska test. The Common Core State
Standards need students who not only know and can do but also have experienced
judgment in applying knowledge and skills.

Routine use of quantity and quality scoring in the classroom
promotes student development. It promotes the sense of responsibility and reward
needed to learn at all levels of thinking, a requirement of the Common Core
State Standards.

Software to do quantity and quality scoring has been
available for over two decades. Alaska is already using Winsteps. Winsteps
contains the partial credit
Rasch model routine that scores quantity and quality.
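For readers curious what a partial credit Rasch model computes, here is a minimal sketch of the standard partial credit formulation for one item. It is not Winsteps' code, and the mapping of wrong, omit, and right to categories 0, 1, and 2 is an assumption made for illustration, as are the ability and threshold values.

```python
import math


def pcm_probabilities(theta: float, thresholds: list) -> list:
    """Partial credit Rasch model: probability of scoring 0, 1, ..., m on one item.

    theta      -- person ability in logits
    thresholds -- step difficulties delta_1..delta_m in logits
    Category k has the log-weight sum_{j<=k} (theta - delta_j);
    category 0 has log-weight 0.
    """
    log_weights = [0.0]
    for delta in thresholds:
        log_weights.append(log_weights[-1] + (theta - delta))
    top = max(log_weights)  # subtract the max before exp() for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item scored 0 = wrong, 1 = omit, 2 = right (an assumed mapping).
print([round(p, 3) for p in pcm_probabilities(theta=0.5, thresholds=[-0.5, 1.0])])
```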

Power Up Plus (PUP) scores multiple-choice tests by both
methods: traditional right count scoring and Knowledge
and Judgment Scoring. Students can elect which method they are most
comfortable with in the classroom and in preparation for Alaska and Common Core
State Standards standardized tests.
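The difference between the two methods shows up when the same right count comes from different behavior. The sketch below is hypothetical: the 2/1/0 point values (right, omit, wrong) are an assumption for illustration, not PUP's documented rule, and "quality" is taken here to mean the percent right of the items a student chose to mark.

```python
def right_count(marks, key):
    """Traditional right count scoring: one point per right mark."""
    return sum(1 for m, k in zip(marks, key) if m == k)


def knowledge_judgment(marks, key):
    """Hypothetical KJS-style scoring: right = 2, omit (None) = 1, wrong = 0.
    Returns (quantity as percent of maximum points,
             quality as percent right of the marks actually made)."""
    points = marked = right = 0
    for m, k in zip(marks, key):
        if m is None:
            points += 1          # credit for judging "I don't know this yet"
        else:
            marked += 1
            if m == k:
                right += 1
                points += 2
    quantity = 100 * points / (2 * len(key))
    quality = 100 * right / marked if marked else 0.0
    return quantity, quality


key = list("ABCDABCDAB")
guesser = list("ABCDABCABC")            # marks everything, 7 right
reporter = list("ABCDABC") + [None] * 3  # marks only what is trusted, 7 right
for name, sheet in [("guesser", guesser), ("reporter", reporter)]:
    print(name, right_count(sheet, key), knowledge_judgment(sheet, key))
```

Both students earn 7 under right count scoring; only the second scoring method separates the student who guessed from the one who reported what they trusted they knew.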

Since 2005, Knowledge Factor has had a patented learning
system that guarantees student
development. High quality students generally pass standardized tests. All
three programs promote the sense of responsibility and reward needed to learn
at all levels of thinking, a stated requirement of the Common Core State
Standards movement.

Please encourage
Nebraska to allow students to report
what they trust they know and what they trust they have yet to learn. Blog.
Petition. We need
to foster innovation wherever it may take hold.
