A Review of Two Books about Standardized Testing

NOTE: The article below is mirrored from the JALT Testing & Evaluation SIG website.

SHIKEN: JALT Testing & Evaluation SIG Newsletter Vol. 11 No. 1. Mar. 2007. (p. 26 - 29) [ISSN 1881-5537]

Standardized Minds:
The High Price of America's Testing Culture and What We Can Do To Change It
by Peter Sacks (1999)
New York: Da Capo Press (Perseus Books)

The Truth about Testing: An Educator's Call to Action
by W. James Popham (2001)
Alexandria, VA: Association for Supervision and Curriculum Development

Both of these books share a common stance on standardized testing. Though they focus on assessment practices in the United States, they raise issues relevant to testing worldwide. After systematically pointing out how prevalent testing policies undermine the ideals of egalitarian education and deep learning, both authors suggest ways in which high-stakes educational assessment should take place.

Sacks, a journalist-turned-professor best known for Generation X Goes to College, starts off by raising a rhetorical question: when it comes to testing, what factors account for the "profound disconnection between knowledge and practice" (p. xi)? Despite ample evidence that most standardized tests undermine overall educational quality, the multi-billion dollar educational assessment enterprise remains firmly entrenched in the USA. Sacks and Popham urge citizens to examine closely what's taking place under the banner of "accountability through assessment". The avowed goals of many assessment programs, they contend, run curiously counter to their actual outcomes. Popham does not believe in any conspiracy to "dumb down" the general population so that they become easier to manipulate. He simply feels that most educators and lawmakers lack assessment literacy and at one point contends:

. . . if educators were to be completely candid, most of us would probably admit that our understanding of educational measurement doesn't extend much beyond the care and feeding of teacher-made classroom tests. (p. 27)

Sacks is less forgiving. He suggests testing is entrenched because it's an efficient way to promote control and surveillance in the educational system, as well as a way for schools to display their social status. He regards standardized tests as status markers and thinly disguised forms of social Darwinism. Sacks quotes a university dean as saying:

Low-income kids are largely being hammered by testing. It's an old song that SAT scores correlate with the number of cylinders in the family car or the number of books on the family bookshelf. (p. 262)

Test Misuse

Sacks and Popham both devote ample space to pointing out the ills associated with inappropriately using high-stakes tests. These include a narrowing of curricular focus, the promotion of unsound skill-and-drill instructional activities, prejudiced judgments about students, teachers, and schools with low scores, and outright cheating by teachers and school administrators in an effort to raise their baseline scores. One subtle difference is that Popham doesn't feel that state-mandated high-stakes tests per se are bad, but simply that the off-the-shelf variety of standardized tests is poorly designed. "If all high-stakes tests were properly constructed, we'd find that a high-stakes testing program would typically have a positive effect on educational quality" (p. 1), he contends. Popham should be in a position to know: he once designed a number of state-mandated literacy tests.

Popham in particular cautions that standardized tests should not be used to measure teacher effectiveness or school quality. One reason is the frequent mismatch between local curricular content and standardized test content. "If you look carefully at what the items in a standardized test are actually measuring," Popham adds, "you'll often find that half or more of what's tested wasn't even supposed to be taught in a particular district or state" (p. 43). He likens the use of standardized tests to one-size-fits-all clothing: such merchandise ill-fits large portions of the population.

A second reason standardized tests often lack validity has to do with washback. Popham conjectures that a curious interplay takes place between teachers and test developers:

. . . the more important the content, the more likely teachers are to stress it. The more that teachers stress important content, the better students will do on an item measuring that content. But the better students do on an item, the more likely it is that an item will disappear from the test. (p. 48)

The reasoning behind this phenomenon has to do with item facility (IF), the proportion of examinees who answer an item correctly. Items that nearly all examinees answer correctly have very high IF values and do little to spread scores out, so they tend to be dropped. Market forces impel test developers to create tests with high reliability indices, for which well-spread score distributions are requisite.
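
The item-facility argument can be made concrete with a minimal sketch in Python. It is not taken from either book: it assumes the conventional definition of IF as the proportion of correct responses, uses the standard variance of a right/wrong item, p(1 - p), and relies on hypothetical response data and function names chosen only for illustration.

```python
def item_facility(responses):
    """Proportion of examinees answering correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def item_variance(responses):
    """Variance of a dichotomous item, p * (1 - p), where p is its facility."""
    p = item_facility(responses)
    return p * (1 - p)


# Hypothetical responses from ten examinees (illustrative data only).
heavily_taught_item = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # almost everyone correct
mid_difficulty_item = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]  # half correct

for label, item in [("heavily taught", heavily_taught_item),
                    ("mid difficulty", mid_difficulty_item)]:
    print(label, "IF:", item_facility(item),
          "variance:", round(item_variance(item), 2))
```

Under these assumptions the heavily taught item has a much higher IF but a much smaller variance than the mid-difficulty item, so it does less to spread scores out. That is the property which, on Popham's account, makes such items candidates for removal.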

Both authors point out standardized tests are poor measures of teacher effectiveness or educational quality because factors other than instruction impact test performance. Socio-economic status and student IQ in particular appear to be salient factors. Popham shows how many so-called "objective" test items actually contain subtle biases favoring high-income social groups and/or those with strong verbal and mathematical skills. Sacks hints at the mind-numbing effects of such tests by suggesting:

Speeded, multiple-choice tests well serve the entrenched system of passive learning. Indeed, when learning becomes passive it is easily standardized. The ecology of the American merit system places most value on people with particular thinking styles that shine on fast-paced, logical, and reflexive tasks. The merit system devalues individuals who strive to deeply understand and prefer to create something new rather than repeat something already told to them. (p. 219)

Sacks even suggests that most standardized tests are akin to thinly veiled IQ tests. However, since the nature of intelligence itself is not yet well understood, it's unclear what standardized tests are in fact measuring. In both authors' view, one point stands amply clear: standardized tests are generally ineffective predictors of academic success.

Appropriate High-Stakes Testing

How should high-stakes testing be conducted? Sacks echoes Hart (1994) and Ryan (1995) in calling for more authentic performance-based assessment which, depending on the context, would include portfolios or evidence of what students can do. He criticizes most tests for focusing too much on what students cannot do and trivializing performance. As a result, Sacks suggests:

In recent years . . . educators ha[ve] noticed many, many students were bored and disengaged from academics, believing school work was completely irrelevant to their particular circumstances and aspirations. (p. 240)

This process of disengagement from school is, in Sacks' view, partly the result of shoddy testing-driven instruction in which students lack any sense of ownership in what they're learning. Sacks argues in favor of more project-based learning, which requires students to master certain skills in order to complete projects. Instead of focusing on bottom-up instruction, he concurs with Reich (1990) in advocating more problem-based, top-down learning that involves multiple skills to accomplish lifelike tasks. Mentioning how schools across the USA have incorporated such learning models into their curricula, Sacks concedes that teachers themselves are often averse to adopting such approaches since their teaching practices and philosophies are so deeply rooted in practice-drill modes of instruction. "Rooted in the very act of . . . doing authentic assessment," Sacks suggests, "[is] a simple belief about children's great capacities to learn" (p. 247), and he emphasizes that for project-based learning to work well, teacher education is essential.

Popham suggests four guidelines for creating "instructionally illuminating" large-scale achievement tests.

First, he believes tests should focus on a few high-priority items which can be successfully taught and accurately measured in the time frames available. Popham criticizes many education agencies for advocating too many unattainable standards which teachers are apt to ignore. It would be wiser, this former UCLA professor contends, to focus on a few core standards reflecting things ordinary teachers have time to teach and that short exams can in fact measure. In short, Popham wants test content to focus more on core skills rather than peripheral material.

Second, and closely related, Popham feels each item should be subjected to a task analysis so that the skills typically required to answer any given question actually reflect skills which are supposedly taught in class. Many poorly constructed test items, in his view, draw on skills which learners are not presumed to have. Test designers, Popham asserts, should be required to "think instructionally" about what skills the test items actually draw on. Unfortunately, that is much easier said than done. Our knowledge of what sort of thinking protocols students go through to answer test items is fragmentary at best.

Third, Popham feels it is important for test developers to communicate more clearly with teachers about their tests. Teachers are often inadequately informed about test content specifications. "The better teachers understand the nature of the assessed content standard, the more effective they'll be at promoting that content standard" (p. 89), Popham maintains.

Finally, Popham calls for regular test review at a level of rigor appropriate for the test. For large-scale, high-stakes tests Popham suggests a panel of 15-25 reviewers, with a few instructional specialists and curriculum experts, but mainly front-line teachers.

Without offering any evidence to support his claim, Popham then posits that "instructionally illuminating" tests will "help teachers teach better" (p. 102). In light of washback research by Qi (2004, pp. 171-190), however, that statement seems both naive and optimistic. Many teaching behaviors are fossilized and relatively immune to change. That is why Datnow, Hubbard, and Mehan (2002) contend that for lasting educational reform to work, all stakeholders in the system need to make fundamental shifts in their attitudes towards testing.

Conclusion

Are these books worth reading for foreign language teachers outside of the USA? There's little reason to go through either book cover-to-cover, but sections of both books have merit. In particular, Chapters 11 and 12 of Sacks' book are insightful and the Preface and Chapter 6 of Popham's book are stimulating. On the other hand, the final chapters of both works are of limited relevance to those outside America.

Although both books are arguably dated, the central themes they raise are as relevant today as ever. Indeed, with this year's decision to renew standardized testing in Japan after a long hiatus (MEXT Curriculum Council, 1998), it's worth reflecting deeply on the politics and perils of large-scale mandated assessment.

- Reviewed by Tim Newfields
Toyo University

References

Datnow, A., Hubbard, L., & Mehan, H. (2002). Extending educational reform (Educational Change and Development Series). London: RoutledgeFalmer.

Hart, D. (1994). Authentic assessment. New York: Addison-Wesley.

Lemann, N. (1999). The big test. New York: Farrar, Straus, and Giroux.

Nakamura, A. (2007, April 21). National achievement test due next week. The Japan Times Online. Accessed April 27, 2007 at http://search.japantimes.co.jp/cgi-bin/nn20070421f1.html [Expired Link]

Qi, L. (2004). Has a high-stakes test produced the intended changes? In L. Cheng & Y. Watanabe (Eds.), Washback in language testing: Research contexts and methods (pp. 171-190). Mahwah, NJ: Lawrence Erlbaum Associates.

Reich, R. (1990). Redefining good education: Preparing students for tomorrow. In S. B. Bacharach (Ed.), Education reform: Making sense of it all. Boston: Allyn and Bacon.

Ryan, C. R. (1995). Authentic assessment. Westminster, CA: Teacher Created Materials.

Sacks, P. (1996). Generation X goes to college. Chicago: Open Court.

www.tnewfields.info/Articles/reviewST.htm
