Everyone has heard the complaints about testing students too much. Since the early 2000s, standardized testing has become the norm of the American education system. Each state wants a score, a rating, and a list that says who’s the best and who’s the worst. Testing is the current means for determining that, but not everyone wants too much testing. So how do you know if you’re testing too much?
Who Wants More Tests?
Parents typically don’t want their kids tested too much. Teachers are aware of the loss of instructional time because of an overload of testing. Anyone who’s on campus is acutely aware of testing’s negative impacts on the morale of staff, the enthusiasm of young learners, and the overall tension in the workplace.
Are You Testing Too Much?
But are you testing too much? How much is too much? How much is not enough? Where’s that sweet spot in testing, assessment data, and accountability?
In this post, I will share with you one way to know if you are testing too much, and what to do about it.
Tests that Are Too Big
I’m using the term “big tests” to mean criterion-referenced tests that mix multiple standards into one test and, as a result, do not give you accurate data. (A criterion-referenced test measures students against a set of learning standards rather than against each other.) On a big test, you link 1-3 test questions to each of the learning standards. You see these all the time in classrooms, schools, and districts.
This is a normal practice. These tests show up under the names common assessment, curriculum-based assessment, benchmark, final exams…
But there’s a major problem with big tests. These types of tests don’t really tell you what students can do.
I’ll spare you the technical discussion of test-retest reliability, content validity, or construct validity (here’s a good article from the College Board), but it’s worth making just a few comments about test reliability.
Big Tests, Big Problems in Reliability
Imagine watching a movie in fast forward…say, at 10x speed. And then writing a movie review about it. You have to rate the movie and give your critique. However, you really can’t make a good judgment about the movie, can you?
That’s the reliability problem.
When your test is too “big” with multiple standards, you end up watering down the accuracy of the information you get from the test. You can’t pack enough test items for each standard. You’re trying to test too much in one test.
With only 1-4 questions per standard or skill, you barely get a sample of student ability. Like the fast forward movie, you can’t see each scene. You can’t really give a rating and critique. You certainly don’t get enough information for a high-stakes judgment about each student’s abilities.
You Can’t Make Reliable Judgments about Learning
If the test items are all complex, then you miss out on determining any scale of student mastery. Too many difficult questions mean you’re going to have binary evaluations. On or off, with no continuum. Even the best-designed tests on the market have this problem.
Here’s how to know. Take a student and administer the odd-numbered test items from a test. Give that same student the even-numbered test items from the same test on the next day. In a highly reliable test, the student’s score will be nearly identical on each day.
Of course the student’s scores should be nearly identical: no learning occurred overnight in the student’s sleep, so there should be no change in test score. This consistency is called reliability.
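If you want to try this with real numbers, here is a minimal Python sketch of that odd/even check, which is known as split-half reliability. It assumes a few things the procedure above doesn’t spell out: that you score a whole class rather than one student, that each answer is marked 1 (correct) or 0 (incorrect), and that the two half scores are compared with a Pearson correlation and then adjusted with the Spearman-Brown correction. The function names and the response data are made up for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half_reliability(responses):
    """responses: one list per student, 1 = correct, 0 = incorrect, in item order."""
    odd_scores = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_scores = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    half_r = pearson(odd_scores, even_scores)
    # Spearman-Brown correction: each half is only half as long as the
    # full test, so step the correlation up to full-test length.
    return 2 * half_r / (1 + half_r)

# Made-up responses for five students on an eight-item test (illustrative only).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]

print(round(split_half_reliability(responses), 2))
```

The closer the odd-item and even-item scores track each other across the class, the closer the estimate gets to 1.0, which is what you would expect from a highly reliable test.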
But this is not the case with “big” tests.
Even the best tests created by the largest publishers max out at a reliability of about 0.6 to 0.75. A reliability in that range means a student’s score could fluctuate on any given test by 10-40%.
This is the problem of “big tests”. Even if given infrequently, they really test too much. They only scratch the surface.
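One common way to translate a reliability coefficient into a score swing is the standard error of measurement from classical test theory: the spread of scores multiplied by the square root of one minus the reliability. Here is a rough sketch. The 15-point spread of scores is an assumption for illustration only, not a figure from any published test, and the function name is my own.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)

# Assumed spread (standard deviation) of class scores, in percentage points.
score_sd = 15.0

for reliability in (0.6, 0.75, 0.9):
    sem = standard_error_of_measurement(score_sd, reliability)
    print(f"reliability {reliability:.2f}: typical swing of about {sem:.1f} points")
```

The lower the reliability, the wider the band around any one student’s score, and the less a single result tells you.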
There are two ways to correct this:
- Make the test even longer…like 10-20 questions per standard.
- Use unidimensional tests.
I don’t think anyone would agree with the first option. Most tests measure multiple concepts and skills. Multiply those by 10-20 questions and you’ll have an unearthly sized test!
…Unless there’s only one learning standard or skill to measure on the test. And that is the heart of unidimensional tests.
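To put rough numbers on the first option, here is a small sketch using the Spearman-Brown prediction formula, a standard classical-test-theory estimate of how reliability changes when a test is lengthened with similar questions. The starting reliability of 0.4 for two questions per standard and the count of 10 standards are assumptions chosen for illustration, not measurements.

```python
def predicted_reliability(current_reliability, length_factor):
    """Spearman-Brown prediction: reliability after multiplying
    the number of questions by length_factor."""
    r, k = current_reliability, length_factor
    return k * r / (1 + (k - 1) * r)

# Assumed starting point (illustrative only).
standards = 10                  # standards covered by one "big" test
questions_per_standard = 2      # current questions per standard
per_standard_reliability = 0.4  # assumed reliability of each 2-question score

for new_count in (2, 10, 15, 20):
    k = new_count / questions_per_standard
    r = predicted_reliability(per_standard_reliability, k)
    total = standards * new_count
    print(f"{new_count} questions per standard: "
          f"reliability ~{r:.2f}, {total} questions in all")
```

Under these assumptions, reliability per standard climbs as questions are added, but so does the total length of the test. A test that covers a single standard can afford that many questions without growing out of hand.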
Unidimensional Tests Are the Answer to the Big Test Problem
I’ll spare you the theory and arguments behind the concept and jump straight into what unidimensional tests are and how to use them.
A unidimensional test is one that assesses one topic or skill. It doesn’t group multiple standards into the test. It doesn’t try to “test it all”.
It focuses on one group of concepts and skills that point to your course’s overarching ideas.
I hear what you’re thinking, “Wait, that’s what we already do!”
Hold on. In the next post, I will show you:
- How to know if you’re using unidimensional tests.
- Examples of how to create unidimensional tests for different subjects.
- Reasons why you should start using unidimensional tests now!
- The second way you could be testing too much.
Before we get to the next post, take a moment to view the benefits of unidimensional tests.
What Are the Benefits of Unidimensional Tests?
The benefit of this type of test is that it can be given more frequently than big tests because it is shorter. That helps teachers, school leaders, and students in several ways:
- You can track growth rather than improvements in passing rates (an important topic for another article).
- The scores have higher quality and reliability.
- The data is sensitive to student growth. (Here’s a great article on the difference between Lethargic Data and Sensitive Data.)
Increased quality of data is another benefit.
The data has higher reliability, both in terms of statistics and in classroom application. In the next post, you will see how this is the case.
Thanks for reading…see you in Part 2.