The New Jersey Department of Education (NJDOE) released some important information about the AchieveNJ teacher evaluation system last month. We now know how a teacher’s student growth percentile score will be converted into a number used to calculate that teacher’s evaluation rating. The NJDOE also provided the summative rating performance ranges. For a teacher to be rated “highly effective,” for example, he or she must earn a summative score of 3.5 or above. Turn to Page 26 for more information about teacher evaluation scoring.
The NJDOE memo capped a busy October for the subject of teacher evaluation. Rutgers professor Bruce Baker made a compelling case against the use of student growth percentiles in teacher evaluations during his keynote address at the NJEA Jim George Collective Bargaining Summit. In addition, a new report from two Boston College professors warned of the dangers of the insidious data-driven improvement and accountability movement in education. And all of this came on the heels of an article in Educational Researcher that questioned the ability of teacher evaluation systems to generate significant school improvement.
There are, in fact, two essential questions at the heart of the teacher evaluation debate. Can student test scores be used to calculate accurate and reliable measures of an educator’s performance? And further, is targeting teacher performance the best way to raise student achievement?
A system based on “egregious misrepresentations”
The basic assumption made by proponents of value-added measures (VAM) of teacher evaluation is that an educator’s effectiveness equals student test scores after having that teacher minus student test scores before. Depending on the specific evaluation model, those scores are “controlled” for a variety of conditions, such as a student’s socioeconomic status or special needs.
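To make that “after minus before” arithmetic concrete, here is a minimal Python sketch. It is a deliberately crude illustration, not New Jersey’s model or any state’s: the record layout, the single covariate label, and the function name `value_added` are hypothetical. Here “controlling” simply means comparing each student’s gain with the average gain of students in the same group.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Crude value-added sketch (hypothetical, not any state's model).

    records: list of (teacher, pre_score, post_score, group) tuples, where
    group is a hypothetical covariate label such as "low_income".
    Returns a dict mapping each teacher to an estimated "effect".
    """
    # Raw gain for each student: score after the teacher minus score before.
    gains = [(teacher, post - pre, group) for teacher, pre, post, group in records]

    # Average gain within each covariate group: a stand-in for "controls".
    by_group = defaultdict(list)
    for _, gain, group in gains:
        by_group[group].append(gain)
    group_mean = {group: mean(values) for group, values in by_group.items()}

    # Residual gain: how far each student beat or missed their group's average.
    by_teacher = defaultdict(list)
    for teacher, gain, group in gains:
        by_teacher[teacher].append(gain - group_mean[group])

    # The teacher's "effect" is the average residual of his or her students.
    return {teacher: mean(values) for teacher, values in by_teacher.items()}


records = [
    ("A", 50, 62, "low_income"),
    ("A", 70, 80, "other"),
    ("B", 55, 61, "low_income"),
    ("B", 65, 73, "other"),
]
print(value_added(records))
```

With these four made-up students, teacher A comes out at +2 and teacher B at -2. Everything the adjustment leaves out, from summer activities to home life, ends up in that number and is credited to (or blamed on) the teacher.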
But as Dr. Baker pointed out, common sense tells us that a lot can and does happen to a child outside a teacher’s classroom. And no researcher has yet created a model that fully addresses questions like “How do a child’s summer activities affect learning?” or “What role does home life play in a child’s ability to grow academically?”
These factors represent just a few of the reasons why value-added measures of teacher performance have shown little stability to date. Anecdotes abound of teachers who were ranked at the top of the performance scale one year and at the bottom the next. Baker contends that of the thousands of New York City teachers for whom ratings exist for each year, only 14 math teachers and five English language arts teachers stayed in the top 20 percent of teacher rankings every year.
“I sure hope they don’t leave,” quipped Baker during his conference presentation.
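Why might rankings bounce around so much? A short simulation, built on assumptions that are ours rather than Baker’s, shows how measurement noise alone can empty out the top tier. Each hypothetical teacher gets a fixed “true” effect, each year’s rating adds random noise, and the function `always_top` (a name made up for this sketch) counts how many teachers land in the top 20 percent every single year. The parameters (1,000 teachers, five years, noise as large as the spread of true effects) are illustrative, not estimates from the New York City data.

```python
import random


def always_top(n_teachers=1000, years=5, noise_sd=1.0, top_share=0.2, seed=0):
    """Count teachers ranked in the top `top_share` every year (hypothetical model)."""
    rng = random.Random(seed)

    # Each teacher's "true" effect stays fixed across years (an assumption).
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    top_n = int(n_teachers * top_share)

    # Start with everyone; keep only those who make the top tier each year.
    survivors = set(range(n_teachers))
    for _ in range(years):
        # Observed rating = true effect + this year's measurement noise.
        observed = [effect + rng.gauss(0, noise_sd) for effect in true_effect]
        ranked = sorted(range(n_teachers), key=lambda i: observed[i], reverse=True)
        survivors &= set(ranked[:top_n])
    return len(survivors)


print(always_top(noise_sd=0.0))  # no noise: the same 200 teachers every year
print(always_top(noise_sd=1.0))  # with noise: far fewer survive all five years
```

With the noise switched off, all 200 top-ranked teachers stay on top every year. With noise switched on, the count drops far below that, even though no teacher’s “true” effectiveness changed at all.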
The Rutgers professor of educational theory, policy and administration went even further when he challenged the NJDOE’s use of student growth percentiles (SGPs) in teacher evaluation.
New Jersey Commissioner of Education Chris Cerf has repeatedly cited the Gates Foundation MET (Measures of Effective Teaching) Project as proof that student test data can be a reliable component in determining a teacher’s effectiveness. But as Baker and other researchers have noted, the MET Project did not use student growth percentiles; it studied a VAM model of teacher evaluation. So, Baker maintains, not only is it misleading for the NJDOE to hold up the MET Project as proof that it is headed in the right direction, but “SGPs are even worse” for inferring teacher influence on student outcomes, since SGPs “do not (even try to) control for various factors outside of the teacher’s control.”
“Student growth percentiles are not backed by research on estimating teacher effectiveness,” Baker continued. “By contrast, research on SGPs has shown them to be poor at isolating teacher influence.” He calls SGP measures at the school level “significantly statistically biased.”
In short, SGP scores fail to accurately attribute student gains and losses to their teachers. How, then, can teachers be held accountable for them?
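For readers curious about what an SGP actually measures, the sketch below is a simplified stand-in. Real SGP models estimate percentiles with quantile regression over one or more prior scores. This version, an assumption made purely for illustration, just ranks a student’s current score among “academic peers” who had exactly the same prior score. The function name `simple_sgp` and the 1-to-99 clamping are hypothetical choices. Note what is absent: nothing in the calculation accounts for poverty, special needs, or anything else outside the classroom, which is the gap Baker highlights.

```python
def simple_sgp(students, student_id):
    """Simplified growth percentile (hypothetical; real SGPs use quantile regression).

    students: dict mapping student id -> (prior_score, current_score).
    """
    prior, current = students[student_id]

    # "Academic peers": everyone with the same prior score, the student included.
    peers = [cur for (p, cur) in students.values() if p == prior]

    # Percentile = share of peers who scored lower this year.
    below = sum(1 for score in peers if score < current)
    pct = round(100 * below / len(peers))

    # Report on a 1-99 scale, as percentiles usually are.
    return min(max(pct, 1), 99)


students = {
    "s1": (200, 210),
    "s2": (200, 220),
    "s3": (200, 230),
    "s4": (200, 240),
    "s5": (180, 250),
}
print(simple_sgp(students, "s3"))  # 50: scored above 2 of 4 peers
```

Student s5 is left out of s3’s comparison group entirely because of a different prior score; no information about the students’ lives outside school enters the calculation at any point.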
Baker’s bottom line is this: policymakers seem to display a baffling ignorance of basic statistical principles. As he puts it, “One simply cannot draw precise conclusions, and thus make definitive decisions, based on imprecise information.”
Accountability and improvement are not the same thing
Let’s assume for a moment that all of the educator effectiveness data being generated around the country is accurate. Can it actually help us improve our students’ ability to learn?
Boston College education professors Andy Hargreaves and Harry Braun aren’t so sure. In their newly released report, “Data-driven Improvement and Accountability (DDIA),” Hargreaves and Braun contend that high-stakes measures tend to put adverse and perverse pressure on administrators, teachers and students. As a result, DDIA programs set back real and sustainable improvement opportunities.
When it is used thoughtfully, DDIA provides educators with valuable feedback on their students’ progress by pinpointing where the most useful interventions can be made. Thoughtful uses of DDIA also give parents and the public accurate and meaningful information about student learning and school performance. In other words, data does have a place in education reform.
But the professors contend that measures of learning are usually limited in number and scope, and the consequences for schools and teachers of apparently poor performance are often punitive. In those cases, accountability actually impedes improvement. Under pressure to avoid poor scores and unpleasant consequences, many educators concentrate their efforts on narrow tasks such as test preparation and coaching targeted at those students whose improved results will contribute most to their school’s test-based indicators. Because the metrics that ultimately count are test scores and measurable data, schools and teachers now teach to the test at the expense of real learning, resulting in neglect of high-needs students and even instances of cheating.
“When accountability is prioritized over improvement, data neither helps educators make better pedagogical judgments nor enhances educators’ knowledge of, and relationships with, their students,” the authors caution. “Instead of being informed by the evidence, educators become driven to distraction by narrowly defined data that compel them to analyze dashboards, grids and spreadsheets in order to bring about short-term improvements in results.”
So, rather than planning more engaging lessons, teachers find their attention diverted from students to scores. It’s hard to imagine this will advance learning.
The case of the missing clothes
Hargreaves and Braun aren’t the only academics who are skeptical about data-driven accountability and improvement. Although there is an abundance of evidence that teachers are an important in-school factor affecting student achievement, many question whether the focus on (or some would say obsession with) teacher evaluation is the best way to improve our schools.
In “Leading via Teacher Evaluation: The Case of the Missing Clothes,” professors Joseph Murphy (Vanderbilt University), Philip Hallinger (Hong Kong Institute of Education) and Ronald H. Heck (University of Hawaii-Manoa) conclude that teacher evaluation may not yield the improvement in teaching practice that many assume it does. There are several reasons for this, including that some administrators are not qualified to help teachers improve (especially if they lack experience in the grade level or subject area of the teachers they are evaluating). And, for a variety of reasons, even well-qualified and trained administrators simply don’t have the time to devote to teacher evaluation.
That is why the authors conclude that “if school improvement is the goal, school leaders would be advised to spend their time and energy in areas other than teacher evaluation.” These other areas include establishing a powerful sense of vision; enhancing students’ opportunity to learn; creating personalized learning environments in which all youngsters are cared for, participate in, and have ownership of the school; and developing a school culture conducive to learning.
Murphy, Hallinger and Heck maintain that there is far more evidence linking a principal’s efforts as a school leader to learning outcomes than there is evidence tying teacher evaluation to student growth.
What can be done?
If some student test data is misleading at best, and if the belief that teacher evaluation can significantly improve student learning is a red herring, why are we heading down this path?
Richard Rothstein, a research associate at the Economic Policy Institute, was among the first to identify “an overemphasis on teachers” in the education reform debate. Not only does this approach “demoralize good teachers by exaggerating their responsibility for student outcomes,” Rothstein believes it also ignores those strategies that can change the lives of young people, such as early childhood education, small class size, and safe and healthy school environments.
Experience has taught us that saying “no” to an education reform strategy, even when credible and significant research says it won’t achieve the desired outcome, isn’t good enough. So, how can data boost student learning? Hargreaves and Braun believe there are better ways to use data to improve our schools. Some of these ideas are simple common sense; all of them are in use in high-performing countries and educational systems.
Among their recommendations are:
- If you’re going to collect data, collect all of it. Everyone knows that student test scores represent just one piece of the puzzle. Hargreaves and Braun call for a “balanced scorecard,” which should include the time allocated to each subject by grade, suspension rates, staff turnover rates, teacher absenteeism, diagnostic assessments, survey results of student engagement, teacher certification, student mobility, and so on.
- Insist on high quality data. Much progress has been made, but questions about the accuracy, validity and reliability of data persist.
- When it comes to testing, less is more. The Boston College researchers call for a reduction in the scope and frequency of testing. “This can be achieved by testing at just a few grade levels (as in England, Canada and Singapore), rather than at almost every grade level. Another option is to test a statistically representative sample of students for monitoring purposes (as in Finland), rather than a census of all students. Yet another route is to test different subjects in a rotating cycle (e.g. math is centrally tested and scored once every three or four years), with moderated teacher scoring of assessments occurring during the intervening years (as in Israel).”
- Narrow the opportunity gap. Out-of-school factors, such as socioeconomic status, are far more accurate predictors of student achievement than in-school factors. Other nations have realized that “in education, quality cannot be achieved without equity.” If we truly want to do something about low-performing students and school systems, we must do something about poverty.
- Let educators use their professional judgment. “Numbers must be the servant of professional knowledge, not its master,” argue Hargreaves and Braun.
Policymakers and educators must readjust their collective view of the capabilities and usefulness of the mountains of data being generated by school districts and state education agencies. Our love for data is nothing new, but now it is driving adults to distraction and harming our efforts to educate children.