5 Reasons Teachers Can’t Be Judged By Test Scores

How Test Scores and Climate Affect Our Judgements of Teachers

Why are my test scores so low? My scores were higher last year. I don’t know what I did differently last year to get the scores up. I feel like I’m doing the same, if not better, but the scores aren’t showing it!

I’m pulling my hair out about these scores! I feel like a failure!

Have you said these things to yourself?

If you’ve been in a classroom in the current milieu, you have.

If you’ve led teachers in the past 5 years, they have.

Using a test score to judge a teacher does nothing to describe the impact she has on learning. On human lives. @mafost

These are the thoughts of teachers, coaches, and principals when scores don’t match the efforts, practices, and strategies behind them. We always expect great results, but sometimes the scores paint an ugly picture.

I want to offer 5 research-based, statistical, and common-sense reasons why teachers can’t be judged by scores. Then I’ll provide one simple way to measure your impact on learning instead of judging by scores.

1. Behind Every Score is a Human

Think about the stories – backstories and futurestories. They are countless.

I did a quick search of the highest and lowest rated schools in my area*. This is what I found.

Highest Rated | Lowest Rated

*School ratings are based on state test scores; demographic data from greatschools.org | Bush Elem; Spring Branch Elem

Highest Rated School: 13% of homes are low-income; 21% of homes have non-native English speakers.

Lowest Rated School: 94% of homes are low-income; 71% of homes have non-native English speakers.

It’s just two quick samples and we see two vastly different backstories.

One story is of privilege, financial well-being, and a system built for success like theirs. Another story is of language barriers, income challenges, and a system that indicates a lack of success.

It’s the old rule of thumb: zip codes = test scores.

But is it really a lack of success?

We could pull dozens of other samples, each with its own unique backstory, and if you drill deeper into actual classroom makeups, you see a similar scenario.

And that’s just the backstories.

If your classrooms have any type of demographic differences, these factors alone will make some classroom scores look higher or lower. Yet none of these factors are within the teacher’s control.

If you don’t believe me, control for the teacher variable: look at different classes with the same teacher. The scores tell different stories.

Before you stop reading and claim I have “low expectations” – let me say I’m aware of and thrilled by the Effective Schools Research (read about it in Episode 7 and in High Expectations and Excuses).

Yes, any school, any class, and any student can learn at high levels. But I’m talking about scores – not learning.

Scores are not the same as learning. Scores are subjective. They are estimates. They can be easily misread.

Can individual teachers overcome outside factors? Absolutely. Outlier teachers do it all the time (but not all of the time).

2. Teacher Potential vs. Teacher State

Remember the Visible Learning research from John Hattie? The updated research shows “teacher collective efficacy” as one of the strongest influences on student achievement.

The key factor with teacher collective efficacy is time. Allow me to explain.

Teacher potential is immense, but it’s not the same as teacher state.

Pedagogical knowledge and content knowledge are only two of the factors involved in teacher efficacy. A great teacher last year might look like a terrible teacher this year if you only judge using test scores.

Why might this be?

  • New to a course
  • New student population
  • Unique challenges specific to a small set of students
  • Lawsuits that distract
  • School climate
  • Team climate
  • Curricular changes
  • Mandated pedagogy changes

…and many more reasons can impact teacher state. If you judge a teacher based on test scores, you actually might be measuring the factors in this list instead of the teacher’s actual performance.

3. Appeasement

Did you notice climate and mandated pedagogy changes in the list above? These are huge.

Teachers often appease decision-makers who do not know the needs of the students. Well-meaning school leaders can actually disempower teachers and force bad decisions for students.

Most often, teachers appease for job security. For most people, the natural response when a superior asks them to do something is fear. Compliance = job security.

It’s so ingrained in school cultures because that’s what teachers do all day with students. Rules. Directions.

So even if a mandated pedagogical change is not having a positive impact on learning, many teachers will continue to comply (and some will close their door and teach – for better or for worse). Some may even be threatened if they don’t comply.

Regardless of the impact on learning.

Instead of judging teachers with test scores, these scenarios actually point to the decision-making or micro-management of school leaders.

Read more on why the poor decisions are made in this post on Multifinality.

4. Starting Lines

Imagine you and I coach different track teams. Your team runs the 100-meter dash, and my team runs the same race. Let’s look at our scores:

  • Your team = average score of 9.5 seconds to the finish line.
  • My team = average score of 12.5 seconds to the finish line.
  • Your team beat my team by over 30%!

Clearly, your team won (test ratings); however, that doesn’t mean you’re the better coach. Your team started at the 20-meter marker, but my team started on the 0-meter marker.

You only had to run 80 meters, but my team had to run 100 meters!

That’s clearly not a fair race – not a fair comparison. It’s not a good judgment. But that’s what happens with test scores.

Different starting lines negate the judgments we can make about teachers using test scores unless those judgments are based on growth rates.
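To see how much a starting line can distort a raw score, here is a rough sketch of the arithmetic using the numbers from the track example. The "pace" measure (meters per second over the distance actually run) is my own illustrative assumption; it ignores things like acceleration and is only meant to show how the gap changes once the head start is accounted for.

```python
# Numbers from the track example above. "pace" is an illustrative measure,
# assumed here for the sketch; it is not a formal growth model.
your_time, your_distance = 9.5, 80    # seconds, meters actually run
my_time, my_distance = 12.5, 100      # seconds, meters actually run

def pace(distance_m, time_s):
    """Meters covered per second over the distance actually run."""
    return distance_m / time_s

# Gap in finish times, ignoring where each team started.
raw_gap = (my_time - your_time) / your_time

# Gap in pace, once the different starting lines are accounted for.
your_pace = pace(your_distance, your_time)
my_pace = pace(my_distance, my_time)
pace_gap = (your_pace - my_pace) / my_pace

print(f"Raw time gap: {raw_gap:.1%}")   # Raw time gap: 31.6%
print(f"Pace gap:     {pace_gap:.1%}")  # Pace gap:     5.3%
```

In this toy version, a gap of more than 30% on the scoreboard shrinks to about 5% once you account for the starting line. The raw score mostly measured where each team began.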

What’s the solution then?

I suggest A/B testing when using test scores.

5. Instructional Climate

Take a flea. It’s a tiny creature only millimeters in length. If you place that flea on your coffee table, it can leap many times its own body length!

Put the same flea in a 4-inch container with a lid.

That same flea will leap over and over, hitting its body against the lid.

It won’t be long before the flea (whether conscious or not) will stop leaping with such force.

Remove the lid, and the flea still won’t leap out of the container.

That’s the amazing power of climate.

Instructional climate can limit potential. It can unleash potential.

The same is true for you, for me, for everyone – including teachers. Take a star teacher from an amazing instructional climate in one district and place her under the “lid” of a below-average climate, and she will not perform the same.

Using a test score to judge and evaluate her does nothing to adequately describe who she is as a professional, who she is as a teacher, or the impact she has on learning. Her impact on human lives.

One Solution to Make Better Judgements About Teaching

Split testing is probably the best use of test scores. It’s even better when using sensitive data as opposed to lethargic data.

What is split testing for schools?

It’s simply taking two groups with the same teacher and administering pre- and post-assessments to quantify growth rates. The key is to make a substantial change with one group (experimental) and not the other group (control).

The group with the higher growth rate is likely to have had a better learning experience, which informs us what to do next.
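If it helps to see the comparison spelled out, here is a minimal sketch of a split test in Python. Everything in it is hypothetical: the scores are made up, and `growth_rate` is just one simple way to define growth (average per-student gain as a fraction of the pre-assessment score, assuming every pre-assessment score is above zero). Your own assessments may call for a different measure.

```python
from statistics import mean

def growth_rate(pre_scores, post_scores):
    """Average per-student gain as a fraction of the pre-assessment score.

    Assumes the two lists are in the same student order and that
    every pre-assessment score is greater than zero.
    """
    return mean((post - pre) / pre for pre, post in zip(pre_scores, post_scores))

# Hypothetical scores for two classes taught by the same teacher.
control_pre, control_post = [50, 60, 70], [55, 66, 77]
experimental_pre, experimental_post = [40, 50, 80], [50, 60, 92]

control_growth = growth_rate(control_pre, control_post)
experimental_growth = growth_rate(experimental_pre, experimental_post)

print(f"Control growth:      {control_growth:.0%}")       # Control growth:      10%
print(f"Experimental growth: {experimental_growth:.0%}")  # Experimental growth: 20%
```

Notice that the experimental group in this made-up example starts lower on two of three pre-assessments yet shows twice the growth. With groups this small, a difference could easily be noise, so treat the result as a prompt for what to try next rather than a verdict.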

For more details, including free resources, I recommend visiting the Mafost daily blog using the links below.
