NOTE: The article below is mirrored from the JALT Testing & Evaluation SIG website.
SHIKEN: JALT Testing & Evaluation SIG Newsletter Vol. 11 No. 1. Mar. 2007. (p. 26 - 29) [ISSN 1881-5537]
A Review of Two Books about Standardized Testing
Standardized Minds: The High Price of America's Testing Culture and What We Can Do To Change It
by Peter Sacks (1999)
New York: Da Capo Press (Perseus Books)
The Truth about Testing: An Educator's Call to Action
by W. James Popham (2001)
Alexandria, VA: Association for Supervision and Curriculum Development
Both of these books share a common stance with respect to standardized testing. Though focusing on assessment
practices in the United States, they raise issues relevant to testing worldwide. After systematically pointing out how prevalent
testing policies undermine the ideals of egalitarian education and deep learning, both authors suggest how high-stakes
educational assessment should take place.
Sacks, a journalist-turned-professor best known for Generation X Goes to College, starts off by raising a rhetorical question:
when it comes to testing, what factors account for the "profound disconnection between knowledge and practice" (p. xi)? Despite
ample evidence that most standardized tests undermine overall educational quality, the multi-billion dollar educational assessment
enterprise remains firmly entrenched in the USA. Sacks and Popham urge citizens to examine closely what's taking place
under the banner of "accountability through assessment". The avowed goals of many assessment programs, they contend, run
curiously counter to their actual outcomes. Popham does not believe in any conspiracy to "dumb down" the general population
so that they become easier to manipulate. He simply feels that most educators and lawmakers lack assessment literacy and at
one point contends:
. . . if educators were to be completely candid, most of us would probably admit
that our understanding of educational measurement doesn't extend much beyond
the care and feeding of teacher-made classroom tests. (p. 27)
Sacks is less forgiving. He suggests testing is entrenched because it's an efficient
way to promote control and surveillance in the educational system, and a way for schools
to display their social status. He regards standardized tests as status markers and thinly
disguised forms of social Darwinism. Sacks quotes a university dean as saying:
Low-income kids are largely being hammered by testing. It's an old song that SAT
scores correlate with the number of cylinders in the family car or the number of
books on the family bookshelf. (p. 262)
Sacks and Popham both devote ample space to pointing out the ills associated with inappropriately
using high-stakes tests. These include a narrowing of curricular focus, promoting unsound skill-and-drill
instructional activities, prejudiced judgments about students/teachers/schools with low scores, as well
as outright cheating by teachers/school administrators in an effort to raise their baseline scores.
One subtle difference is that Popham doesn't feel that state-mandated high-stakes tests per se are
bad, but simply that off-the-shelf varieties of standardized tests are poorly designed. "If all high-stakes
tests were properly constructed, we'd find that a high-stakes testing program would typically have a
positive effect on educational quality" (p. 1), he contends. Popham should be in a position to know:
he once designed a number of state-mandated literacy tests.
Popham in particular cautions that standardized tests should not be used to measure teacher effectiveness or school quality.
One reason is the frequent mismatch between local curricular content and standardized test content. "If you look carefully
at what the items in a standardized test are actually measuring," Popham adds, "you'll often find that half or more of
what's tested wasn't even supposed to be taught in a particular district or state" (p. 43). He
likens the use of standardized tests to one-size-fits-all clothing: such merchandise ill-fits large portions of the population.
A second reason standardized tests often lack validity has to do with washback. Popham conjectures that a curious interplay takes
place between teachers and test developers:
. . . the more important the content, the more likely teachers are to stress it.
The more that teachers stress important content, the better students will do
on an item measuring that content. But the better students do on an item, the
more likely it is that an item will disappear from the test. (p. 48)
The reasoning behind this phenomenon has to do with item facility (IF), the proportion of test takers who
answer an item correctly. Items that nearly everyone answers correctly tend to be dropped since they do little
to spread out scores. Market forces impel test developers to create tests with high reliability indices,
for which well-spread score distributions are requisite.
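To make the idea concrete, here is a minimal Python sketch of how item facility might be computed and how very easy items could be flagged for removal. It assumes IF is simply the proportion of test takers answering an item correctly. The response data, the function names, and the 0.9 ceiling are all invented for illustration and do not come from either book.

```python
from typing import List


def item_facility(responses: List[List[int]]) -> List[float]:
    """Return the proportion of test takers who answered each item correctly.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def flag_easy_items(facilities: List[float], ceiling: float = 0.9) -> List[int]:
    """Return indices of items whose IF is at or above a hypothetical ceiling."""
    return [i for i, f in enumerate(facilities) if f >= ceiling]


# Invented data: 5 students (rows) by 4 items (columns).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

facilities = item_facility(responses)
easy_items = flag_easy_items(facilities)
print(facilities)
print(easy_items)
```

In this toy data set every student answers the first item correctly, so it contributes nothing to the spread of scores and is the one flagged. That is the kind of item Popham suggests is likely to disappear from a test, even when it measures content that teachers rightly stress.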
Both authors point out standardized tests are poor measures of teacher effectiveness or educational
quality because factors other than instruction impact test performance. Socio-economic status and student
IQ in particular appear to be salient factors. Popham shows how many so-called "objective" test items
actually contain subtle biases favoring high-income social groups and/or those with strong verbal and
mathematical skills. Sacks hints at the mind-numbing effects of such tests by suggesting:
Speeded, multiple-choice tests well serve the entrenched system of passive learning.
Indeed, when learning becomes passive it is easily standardized. The ecology of the
American merit system places most value on people with particular thinking styles that
shine on fast-paced, logical, and reflexive tasks. The merit system devalues individuals
who strive to deeply understand and prefer to create something new rather than repeat
something already told to them. (p. 219)
Sacks even suggests that most standardized tests are akin to thinly veiled IQ tests. However, since the nature of intelligence
itself is not yet well understood, it's unclear what standardized tests are in fact measuring. In both authors' view one point
stands amply clear: standardized tests are generally ineffective predictors of academic success.
Appropriate High-Stakes Testing
How should high-stakes testing be conducted? Sacks echoes Hart (1994) and Ryan (1995) in calling for more
authentic performance-based assessment which, depending on the context, would include portfolios or evidence of
what students can do. He criticizes most tests for focusing too much on what students cannot do and trivializing
performance. As a result, Sacks suggests:
In recent years . . . educators ha[ve] noticed many, many students were bored and disengaged from academics,
believing school work was completely irrelevant to their particular circumstances and aspirations. (p. 240)
This process of disengagement from school is, in Sacks' view, partly because of shoddy testing-driven instruction
in which students lack any sense of ownership in what they're learning. Sacks argues in favor of more project-based learning
which requires students to master certain skills in order to complete projects. Instead of focusing on bottom-up instruction,
he concurs with Reich (1990) in advocating more problem-based, top-down learning that involves multiple skills to accomplish
lifelike tasks. Mentioning how schools across the USA have incorporated such learning models into their curricula, Sacks concedes
that teachers themselves are often averse to adopting such learning approaches since their teaching approaches and philosophies
are so deeply rooted in practice-drill modes of instruction. "Rooted in the very act of . . . doing authentic assessment,"
Sacks suggests, "[is] a simple belief about children's great capacities to learn" (p. 247), and he emphasizes that for project-based
learning to work well, teacher education is essential.
Popham suggests four guidelines for creating "instructionally illuminating" large-scale achievement tests.
First, he believes tests should focus on a few high-priority items which can be successfully taught and accurately measured
in the time frames available. Popham criticizes many education agencies for advocating too many unattainable standards which teachers
are apt to ignore. It would be wiser, this former UCLA professor contends, to focus on a few core standards reflecting things ordinary
teachers have time to teach and that short exams can in fact measure. In short, Popham wants test content to focus more on core
skills rather than peripheral material.
A second, closely related point is that Popham feels each item should be subjected to a task analysis so that the skills typically
required to answer any given question actually reflect skills which are supposedly taught in class. Many poorly constructed test
items, in his view, draw on skills which learners are not presumed to have. Test designers, Popham asserts, should be required to
"think instructionally" about what skills the test items actually draw on. Unfortunately, that is much easier said than done.
Our knowledge of what sort of thinking protocols students go through to answer test items is fragmentary at best.
Third, Popham feels it is important for test developers to communicate more clearly with teachers about their tests. Teachers
are often inadequately informed about test content specifications. "The better teachers understand the nature of the assessed
content standard, the more effective they'll be at promoting that content standard" (p. 89), Popham maintains.
Finally, Popham calls for regular test review at a level of rigor appropriate for the test. For large-scale, high-stakes tests
Popham suggests a panel of 15-25 reviewers, with a few instructional specialists and curriculum experts, but mainly front-line teachers.
Without offering any evidence to support his claim, Popham then posits that "instructionally illuminating" tests will "help teachers
teach better" (p. 102). In light of washback research by Qi (2004, pp. 171-190), however, that statement seems both naive and
overly optimistic. Many teaching behaviors are fossilized and relatively immune to change. That is why Datnow, Hubbard, and Mehan (2002)
contend that for lasting educational reform to work, all stakeholders in the system need to fundamentally shift their attitudes
towards testing.
Are these books worth reading for foreign language teachers outside of the USA? There's little reason to go through either book
cover-to-cover, but sections of both books have merit. In particular, Chapters 11 and 12 of Sacks' book are insightful and the Preface
and Chapter 6 of Popham's book are stimulating. On the other hand, the final chapters of both works are of limited relevance to those
outside the USA.
Although both books are arguably dated, the central themes they raise are as relevant today as ever. Indeed, with this year's decision
to renew standardized testing in Japan after a long hiatus (Nakamura, 2007), it's worth reflecting deeply on the politics and perils of
large-scale mandated assessment.
- Reviewed by Tim Newfields
References
Datnow, A., Hubbard, L., & Mehan, H. (2002). Extending educational reform (Educational Change and Development Series). London: RoutledgeFalmer.
Hart, D. (1994). Authentic assessment. New York: Addison-Wesley.
Lemann, N. (1999). The big test. New York: Farrar, Straus, and Giroux.
Nakamura, A. (2007, April 21). National achievement test due next week. The Japan Times Online.
Accessed April 27, 2007 at http://search.japantimes.co.jp/cgi-bin/nn20070421f1.html
Qi, L. (2004). Has a high-stakes test produced the intended changes? In L. Cheng & Y. Watanabe (Eds.),
Washback in language testing: Research contexts and methods (pp. 171-190). Mahwah, NJ: Lawrence Erlbaum Associates.
Reich, R. (1990). Redefining good education: Preparing students for tomorrow. In S. B. Bacharach (Ed.),
Education reform: Making sense of it all. Boston: Allyn and Bacon.
Ryan, C. R. (1995). Authentic assessment. Westminster, CA: Teacher Created Materials.
Sacks, P. (1996). Generation X goes to college. Chicago: Open Court.