Wednesday, November 30, 2011

Smart Wallpaper Testing

The idea for wallpaper came from a simple fact. Students need protection from predatory testing. Whether they know the answer or not, they must mark an answer to each question. Birds fly in flocks and fish swim in schools. They do the same thing at the same time to avoid predators. Wallpaper lets students mark the same option when they cannot use the test to report what they know.

Two wallpaper patterns can be used to extract higher levels of thinking (Smart Testing) information. Dumb wallpaper is based on one of the answer options. Smart wallpaper can be based on the most frequent wrong mark for each question, for example.  Dumb wallpaper pays no attention to student performance. Smart wallpaper is based on expected student performance.
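The two patterns can be sketched in a few lines of Python. This is only an illustration: the function names, the layout of the response data, and the sample marks are assumptions made for the example, not part of Power Up Plus. The Smart pattern here is computed from marks already collected, which is how a simulation can build it.

```python
from collections import Counter


def dumb_wallpaper(num_questions, option="C"):
    """Dumb wallpaper: the same answer option on every question."""
    return [option] * num_questions


def smart_wallpaper(responses, answer_key):
    """Smart wallpaper: the most frequent wrong mark on each question.

    responses  -- one sequence of marks per student (hypothetical layout)
    answer_key -- the right mark for each question
    Returns None for a question on which no student marked a wrong option.
    """
    pattern = []
    for q, right in enumerate(answer_key):
        wrong_marks = Counter(sheet[q] for sheet in responses if sheet[q] != right)
        pattern.append(wrong_marks.most_common(1)[0][0] if wrong_marks else None)
    return pattern


answer_key = "ABCD"
responses = ["ABCD", "BBCA", "BCDA", "ACCA"]  # made-up marks for four students

print(dumb_wallpaper(len(answer_key)))          # pays no attention to the marks
print(smart_wallpaper(responses, answer_key))   # follows the wrong marks
```

The Dumb pattern needs nothing but the number of questions. The Smart pattern needs student marks (or a forecast of them).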

Wallpaper extracts higher levels of thinking (Smart Testing) information using Knowledge and Judgment Scoring (KJS). The assumption is that students omit or use the wallpaper pattern when not using the question to report what is known and trusted. This can be seen in the progression from KJS without wallpaper, Table 3bST, 

KJS with Dumb wallpaper, Table 3bSD,

and KJS with Smart wallpaper, Table 3bSS.

The student counseling mark matrix analysis (the test taker view of the test) changes from nonsense, to a better performance with Dumb wallpaper, to a typical Knowledge and Judgment Scoring (KJS) printout with Smart wallpaper.

Test scores increase as the simulated quality increases. The distributions (Standard Deviations) of scores and item difficulty decrease. Test reliability declines!  Oops!  “Houston, we have a problem!” Test companies optimize (brag about) their test reliability based on poor quality data. KJS optimizes student judgment to produce accurate, honest, and fair data.

This table clearly captures this conflict in numbers. High test reliability is needed to obtain similar consecutive average test scores. It follows that the lower the quality of student scores and the lower the average test score, the more chance determines the average test score. It is also known that the normal curve is highly reproducible by chance alone. High test reliability can become an artifact of test design rather than student performance.

To the fact that the starting score on a multiple-choice test is 1/(number of options) rather than zero, we can now add a second form of self-deception (psychometricians refer to these as simplifications). They made some sense when everything was done with paper and pencil. Today there is no need to still lock quality and quantity together on a multiple-choice test, especially now that one (KJS) can measure what students actually know and trust rather than just rank students (RMS).

The misconceptions in Table 3bST are artifacts created by forcing students to mark when they have no answer of their own. They were not given the option to omit (to mark an accurate, honest and fair answer sheet). Table 3bSS, using Smart wallpaper, shows all four groups of questions (expected, discriminating, guessing, and misconception – EDGM). Higher quality students earn higher test scores that are more accurate, honest and fair.

The scores in Table 3bSS are only obtainable if students omitted instead of marking the most frequent wrong mark for each question. This simulation fails to capture what students would actually do if given the opportunity to mark only when marking reports something they know and trust (can confirm). Given that opportunity, some quality scores would be higher and some lower. Also, there is no way to know in advance which wrong mark will be the most frequently marked for each question. Wallpaper must be created BEFORE the test, not after the test.

This simulation again demonstrates there is no way of equating RMS and KJS results from one set of data. To know what students actually know, they must be given the opportunity to report what they know that is meaningful and useful as the basis for further learning, instruction, and use on the job. Traditional RMS only does this when test scores are near 90%. Knowledge and Judgment Scoring (Smart Testing) yields a valid quality score (%RT) for every test score, and a valid test score for every high quality (%RT) student performance.

Saturday, November 26, 2011

Wallpaper Modified Testing

The minimum requirement for traditional multiple-choice tests is to mark one option on each question, right mark scoring (RMS). The student is not given the option to omit. The test score indicates luck, guessing, and what the student may know. The score only ranks the student. After experiencing Knowledge and Judgment Scoring (KJS), my students called traditional testing Dumb Testing. Dumb Testing is easy and fast. Reading all the test questions is optional.

Smart Testing (KJS) requires that each question stem is read and visualized (a web of relevant relationships) before looking at the answer options. If the student’s answer matches one of the answer options, that option is probably the right answer for the question. The student has brought something to the test that can be reported using this question.

Knowledge and judgment can be given equal value. The test score is a combination of the knowledge and judgment scores (the quantity and quality scores). Forced guessing is not required. The result is an accurate, honest and fair test score.
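Here is a rough sketch of how the quantity, quality, and combined scores could be computed. The weighting is an assumption made for this example (right = 2 points, omit = 1 point, wrong = 0 points, so knowledge and judgment each carry half the value); the post does not spell out the formula, and the function name is hypothetical.

```python
def kjs_scores(marks, answer_key, omit=None):
    """Return quantity, quality (%RT), and combined scores as percentages.

    marks -- one mark per question, with `omit` where nothing was marked.
    ASSUMED weighting: right = 2, omit = 1, wrong = 0 points per question.
    """
    total = len(answer_key)
    marked = sum(1 for mark in marks if mark != omit)
    right = sum(1 for mark, key in zip(marks, answer_key) if mark == key)
    omitted = total - marked

    quantity = 100.0 * right / total                       # right marks out of all questions
    quality = 100.0 * right / marked if marked else None   # %RT: right marks out of marks made
    test_score = 100.0 * (2 * right + omitted) / (2 * total)
    return {"quantity": quantity, "quality": quality, "test_score": test_score}


# Made-up answer sheet: two right, one omit, one wrong.
print(kjs_scores(["A", None, "C", "B"], "ABCD"))
```

Under this assumed weighting, a student who omits every question keeps the judgment half of the score (50%), and every wrong mark gives up a point that an omit would have kept.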

Changing from Dumb Testing to Smart Testing requires some experience. This is much like changing from a tricycle to a bicycle. It is scary the first few times. After that it is fun. Over 90% of students voluntarily switch from Dumb Testing to Smart Testing after two experiences.

Until Smart Testing is offered on NCLB standardized tests, there is a way to modify Dumb Testing to obtain Smart Testing information. It comes from wallpapering the answer sheet. It requires a third key (WP KEY) for the wallpaper.

The trick is to assign one answer option on each question as the “omit” option BEFORE seeing the test. Students mark only if they can trust the answer to be correct. Instead of “mark a best answer on each question”, the instruction now is to only “mark answers you can use to report what you trust you know or can do”. Near the end of the test, they fill in the remaining marks following the wallpaper design.

The simplest design is the age-old advice: “If you do not know an answer, just mark C”. Any letter option can be selected for the class PRIOR to seeing the test. 

The next most frequent design students have used is the “Christmas tree”: ABCDABCD . . . and AABABCABCDAAB . . . Random designs can be used if the pattern is posted for all the students to use at the end of the test.
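A small sketch of the mechanics: build a repeating pattern, fill the unmarked questions with it at the end of the test, and later use the wallpaper key to turn those marks back into omits. The function names and sample marks are made up for illustration.

```python
from itertools import cycle, islice


def repeating_pattern(unit, num_questions):
    """Repeat a short unit such as "ABCD" (or just "C") across the whole test."""
    return list(islice(cycle(unit), num_questions))


def fill_wallpaper(marks, wp_key):
    """At the end of the test, put the wallpaper mark on every unmarked question."""
    return [wp if mark is None else mark for mark, wp in zip(marks, wp_key)]


def strip_wallpaper(sheet, wp_key):
    """Read a mark that matches the wallpaper key as an omit (None)."""
    return [None if mark == wp else mark for mark, wp in zip(sheet, wp_key)]


wp_key = repeating_pattern("ABCD", 6)          # A B C D A B
trusted = ["B", None, "A", None, None, "C"]    # marks the student trusts
sheet = fill_wallpaper(trusted, wp_key)        # what goes on the answer sheet

print(sheet)
print(strip_wallpaper(sheet, wp_key) == trusted)
```

One consequence of this design: a trusted mark that happens to match the wallpaper option for that question cannot be told apart from an omit once the sheet is decoded.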

Wallpapering does not change Dumb Testing (RMS) test scores. Changing a wrong mark to a wallpapered omit is still “wrong” with traditional right mark scoring (RMS).  
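A quick check of this with made-up data (the function name and the marks are assumptions for the example). It holds whenever the wallpaper option on a question is not the keyed right answer for that question.

```python
def rms_score(sheet, answer_key):
    """Right mark scoring: percent of questions marked right."""
    right = sum(1 for mark, key in zip(sheet, answer_key) if mark == key)
    return 100.0 * right / len(answer_key)


answer_key = list("BADBA")
forced = list("BDDAA")        # questions 2 and 4 are wrong marks (forced guesses)
wallpapered = list("BCDCA")   # the same two questions now carry the wallpaper "C"

print(rms_score(forced, answer_key))       # 60.0
print(rms_score(wallpapered, answer_key))  # 60.0
```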

Right Mark Scoring clicker data with no wallpaper.   

Right Mark Scoring clicker data with Dumb wallpaper (based on any single answer option).

Right Mark Scoring clicker data with Smart wallpaper (based on student judgment).

Commercial testing companies can still score the tests to produce traditional Dumb Testing student and school rankings.

Wallpapering does change Smart Testing (KJS) test scores. Power Up Plus (PUP) then extracts quantity and quality Smart Testing values (including test maker and test taker views). (See next post.)