The Assessment Trap

1. Legitimate Uses of Assessment

Assessment, of course, plays an important part in instruction. I will start by discussing how it can be used to improve student learning.

1. Fine-tuning the course

This is by far the most important use, because it affects the most students: not only the ones in the current class, but also the ones who will take future iterations of the class. If a quiz reveals a widespread misunderstanding, it is a sign that something was wrong with the way I taught that topic. Is there another representation of the concept that would help? A different tool that can get the idea across? A different sequencing of topics that would provide a stronger foundation?

2. Diagnosing individual students' understanding and skills
3. Helping students realize what they know and can do

Assessment makes it possible for the teacher and student to celebrate what has been learned, and zero in on the areas that require the most attention. Ideally, some combination of student agency and teacher / school support can help address those challenges.

4. Providing learning opportunities

Many teachers separate assessment from learning. That is neither necessary nor desirable. Assessment can be an integral part of the curriculum: not an endpoint, but a point along the way. This is especially true of at-home assignments and test corrections. (More on those below.)

A quiz or test can contribute to instruction with the inclusion of one or two questions that extend a given topic, or require students to apply what they know in an unexpected context. A challenging question on a quiz can be the launching pad for deeper learning, or even provide an introduction to a new concept. This use of assessments is, to say the least, controversial.

When a student or parent or administrator objects that it is "unfair" to include challenging questions on assessments, they are betraying a certain point of view on education. Unfortunately, it is a widespread point of view: assessments are an opportunity for students to regurgitate ideas that were deposited in their brains by the teacher. If instead you prioritize student understanding over student recall, as I do, you are obligated to provide opportunities for students to engage intellectually with the material, even on an assessment, perhaps especially on an assessment.

Concerns about fairness are of course legitimate. They can be addressed, for example, by appropriate weighting of different parts of the assessment, by labeling questions as "bonus", or in some other way. In my view, removing challenging questions altogether is an unacceptable lowering of expectations. I will return to this.

Note that all four uses of assessment I have mentioned above can be thought of as formative, in that they focus on student learning, not student ranking.

2. Problematic Uses of Assessment

Here are five uses of assessment which I find problematic.

1. Assigning grades

We need to give grades to "let students know where they stand", to enter them into transcripts, and of course, to rank students. Grades are the basic currency of education in most schools, and assessment is how we figure out grades.

I am uncomfortable with comparing students to each other, as this sort of pressure can interfere with learning, and given different backgrounds and experiences it is intrinsically unfair. Some make the claim that grades are about standards, not comparison, but I don't buy it, and neither do students, parents, or college admissions officers. Everyone knows grades are about ranking students, no matter what efforts are made to disguise this obvious fact. (I will return to this below.)

2. Justifying the grades

Because grades are so important to students' status and opportunities, they are contentious. If a student, parent, or administrator wants to challenge us, or if we ourselves are unsure, we need solid evidence that the grade was assigned fairly. Thus we need some sort of objective-seeming way to justify the grade. From a certain point of view, this is the main purpose of assessment. It provides cover for the teacher and the school if and when grades are questioned, and it attempts to address our concerns about fairness.

3. Preparing students for future assessments

I am not joking. "We have to do multiple choice tests to prepare you for such-and-such a standardized test." "You have to learn to work under time pressure, because that's what you'll have to do in college." (Or high school, or middle school.) And so on.

There's a bit of truth to this, of course, but only a bit. Giving assessment as the reason for assessment fails to answer any fundamental questions, and reflects how deeply corrupt the system is.

4. Manipulating student motivation

Note that the above three uses of assessment have nothing to do with student learning. Many educators try to compensate for that by emphasizing assessments as tools for manipulating student motivation. Since not all students are enthusiastic about carrying out teacher directives or pursuing education for its own sake, it is widely believed that grades (along with the points or rubrics that lead to the grades) are the key tool in motivating students to do schoolwork. Unfortunately, this does not work as well as is widely believed. Grades, points, and rubrics shift students' attention away from the subject matter and towards "how they are doing", which in fact undermines their intellectual and emotional engagement with the work. Assessment anxiety can sour a student's entire relationship to the subject matter.

5. Rewarding obedience

For some of us, assessment policies reward docility more than understanding. I kid you not: some teachers take points off for a staple in the wrong location. Many will penalize students irrelevantly by having their attendance or punctuality affect their grade. Yes, there's a place for that, but there are better ways to achieve those results. Our job is to teach math, not unquestioning obedience to authority figures. Moreover, there are all sorts of biases built into this, because students from different backgrounds (and in fact, different genders) often have different relationships to authority, and that has little to do with their ability to do math.

In the world we live in, there is no easy way to escape these problematic uses of assessment, but they should not dominate our thinking. Far from supporting learning, an emphasis on grades, points, and rubrics in fact undermines both motivation and achievement.

3. The Meaning of Grades

For many students, parents, teachers, and administrators the key purpose of assessment is to assign grades. Before going any further, we need to think about the meaning of grades.

Grades have no intrinsic, absolute meaning. An A at an elite private school does not mean the same thing as an A at a public school that serves a poor neighborhood. An A at my own school today does not mean the same thing as an A meant 20 years ago. An A in Science does not mean the same thing as an A in History. And on it goes. The one feature of grades that is quite reliable is that an A in a given department at a given school is better than a B, which in turn is better than a C. In other words, the meaning of grades is relative. They are how we compare students to each other.

One thing that reveals the subjectivity of grades is the fact that the points that are its ingredients get added up even though they represent incommensurable things. x points for class participation, y points for homework, z points for quizzes, etc. It's like adding a student's height, weight, and temperature, in the hope of getting a meaningful sum.

Not surprisingly, almost all teachers will fix how they compute their grades if the outcome does not sort the students correctly. If a student deserves an A, and your calculation yields a B, you will find a way to tweak the percents, or the scores, or the participation points, or the extra credit, or something, to make sure the student does not get cheated by a pseudo-objective algorithm. (Admittedly, if the calculations yield an A, rather than the B we expected for a given student, most of us would let it be.) This makes sense, because teaching is as much an art as a science. Given a small enough class and enough of the right sort of contact with the students, a competent teacher knows better how to sort the students than any formula. (Yes, better assessments yield more accurate grades — that’s what I meant by “the right sort of contact”.)

In the rare case where a teacher delivers much worse or much better grades than their school expects, that teacher will be taken aside by an administrator and told that their practice is out of line. This does not require looking at the students' work, which is more evidence that grades are strictly a relative measure.

In short, grades compare students to each other. They have no other meaning. This is why college admission officers are interested in grades. If grades were not about sorting students, they would be useless. Just to be clear: grades do not compare students only to others in the same section of the same class, but also to the somewhat broader group of students in the same cohort at the same or similar schools. Moreover, it is of course true that the rankings are only meaningful if you accept the teacher's and the school's assumptions and values.

One might argue that grades are a measurement of how well a student meets the standards of a given class, as in this example:

"If I select a finite set of conceptual understandings, math skills, and problem-solving dispositions, then I can strive to create assessments that measure student progress in these areas. For example, a student can score 95/100 points on my Logarithms test regardless of what any other student scores. In fact, my student can score 95/100 even if no other student takes the test. Furthermore, ALL of my students could conceivably score 95/100 pts. When I publicly post to parents and colleges that all of my students received an "A", then I am not using grades to rank students, at least not in a way that helps the sorting process. Rather, I am comparing each of them to my absolute standard of mastery in the area of logarithms."

Unfortunately, the belief that there is an objective 95% on a logarithm test is naive. The same test could have questions weighted differently, or have partial credit assigned differently, or have more of this type of question and less of that kind, and so on. Even if that were not the case, one could argue that the test is too easy or too hard -- that is strictly a matter of context and selection of goals. One teacher may think that students correctly switching to another base via a memorized formula is important in Algebra 2; another may think it's fine to use any base as long as it makes sense to the student and they can apply it to real world problems; a third may not have an opinion about this, and just want their students to be able to answer SAT questions about logs. The same test would yield different results in each of these teachers' classes. Just because the teacher assigns a percent score doesn't make it objective.
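To make the weighting point concrete, here is a minimal Python sketch. The question labels, the fractions of credit earned, and both weighting schemes are invented for illustration; they do not come from any actual test.

```python
# Hypothetical results for one student on a four-question logarithms test.
# Each value is the fraction of credit earned on that question (assumed data).
earned = {
    "definition": 1.0,
    "change_of_base": 0.5,
    "equation": 1.0,
    "application": 0.75,
}

# Two defensible ways to weight the same questions (assumed weights, each summing to 100).
weights_skills = {"definition": 30, "change_of_base": 40, "equation": 20, "application": 10}
weights_applied = {"definition": 10, "change_of_base": 10, "equation": 30, "application": 50}


def percent_score(earned, weights):
    """Return the weighted percent score for one student's paper."""
    total = sum(weights.values())
    points = sum(earned[q] * weights[q] for q in weights)
    return 100 * points / total


score_skills = percent_score(earned, weights_skills)
score_applied = percent_score(earned, weights_applied)
print(score_skills, score_applied)
```

With these made-up numbers, the very same paper earns 77.5% under the first scheme and 82.5% under the second. Nothing about the student's work changed; only the teacher's priorities did.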

Standards only exist in relation to the specific students currently enrolled at the school. If almost every student met a given set of standards, no matter how valid those are, it could not and would not be used as a way to assess achievement in the class and determine the grade. In fact, such a set of standards would make for a course that is too easy for the given population. Conversely, a set of standards that is met by almost no one makes for too difficult a course. The only standards worth aiming for are precisely the ones that sort students into A, B, C bins.

One could read this as an argument against a system with no grades at all. Without grades, it would be easier to set your expectations too high or too low, or to have a bimodal distribution, with some students doing very well, others clueless, and little in between. Giving grades can help us calibrate challenge and access in the classes we teach. In other words, giving grades is not per se wrong. In moderation, it can be useful.

But that does not negate this fact: grades are about comparing students to each other. They have no other meaning. Students know it, parents know it, teachers in practice know it. Educators who believe otherwise are deceiving themselves.

(See Grades: What Does the Research Show?)

Many of us like to complain about the grade grubbing culture at our school. We like to imagine a world where all students are strictly motivated by their interest in what we are teaching. I sympathize, but I don't blame the students: they reflect the broader culture, and especially the culture and structures of our own school, and our own complicity in those. If you want to reduce grade grubbing, you'll have to find ways to de-emphasize grades.

4. De-emphasizing Grades

When students learn their grade for a given course, what they are learning is how they compare with their peers, which is one indicator of "how they are doing". Grade or no grade, many students know exactly where they fit in the classroom hierarchy, though some may not admit it to their parents or even to themselves. It is true that some (often boys) overestimate themselves, and others (often girls) underestimate themselves. For those students, knowing the grade may be a helpful corrective. But is it a good idea, educationally, to dwell on comparisons between students?

Like many teachers, I am reluctant to make such comparisons. They are unfair and unproductive. Unfair, because students come from many different family and educational backgrounds. Comparisons between students end up being largely about that. Unproductive, because it is not realistic, in most cases, to expect major changes in the short run. A hard-working C student may need years, not weeks, to become a hard-working B student. We can point them in the right direction, offer them intellectual tools, help them to improve their work habits, and over the course of their high school career we can see spectacular changes. And we often do -- this is one of the most satisfying parts of working in a strong department.

But paradoxically, the way to get there is not to dwell on the grades. (It's a bit like searching for happiness -- you're more likely to find it if you don't dwell on that as a goal.) At most schools, the conversation is about "what do I need to do to get an A?" (or a B), and of course, that is the subtext of many conversations at any school. The teacher's responsibility is to deflect that conversation towards the specifics of this particular student's needs at this stage. Perhaps the A is already guaranteed, but the student needs to focus on their ability to communicate their ideas better. Perhaps the A is just not going to happen this term, but the student needs to work on developing their symbol manipulation skills, or their ability to write a logical argument. There is always work to do, and a time to stop working, irrespective of where the student stands in the grades distribution at this particular time.

A grades-focused conversation means that in these very common situations (the A is guaranteed, or the A is unattainable at this point) there is little to discuss. It can also lead to grade inflation in a variety of ways: in order to motivate students with the grade, we might make it easier to attain. Or in order to not be hassled, we might make A's more plentiful. Grade inflation is not the end of the world, but if we want to inflate grades, we ought to do it deliberately and not as an unexpected consequence of uncomfortable conversations.

If a student's place on the academic ladder is constantly harped on by the school culture, students can internalize the label and stop striving. This is what is now known as a fixed mindset. Skillful teaching is in part about bringing students' different strengths to the fore, and building on them, whether or not those lead to a better grade in the short run. For example, a strongly visual student can contribute a lot to a discussion, even if he or she is not yet ready to translate that talent into points-earning write-ups. Over time, such engagement does lead to better grades.

Bottom line: intrinsic motivators (such as interest in the subject matter) are more powerful, longer-lasting, and more meaningful than extrinsic motivators (such as grades). Our task, as teachers, is to move students from the latter to the former. It is a challenging enterprise, but we must try to keep the focus on the discipline we teach and our own passion for it, rather than on the lines separating our students into A, B, and C. Teaching students to be self-motivated learners, and modeling that relationship to the subject, is a vastly more useful contribution to them as lifelong learners than the Pythagorean Theorem or the quadratic formula.

5. The Perils of Backward Design

For many students, over-emphasis on grades has a corrosive impact on learning. Alas, it also corrupts curriculum and pedagogy. In other words, it not only affects students individually, it also affects all students as a group.

Backward design, an approach pioneered by Grant Wiggins and others, is based on the idea that curriculum should be designed by first discussing what you want the student to be able to do at the end of the course or unit, and then building a curriculum that will provide a path to that destination. In principle, that makes a lot of sense, and I am sure it can be done well. However, there are many ways this approach can backfire.

First of all, some of the most important destinations are hard to specify. Say that your goal is deep understanding of a given topic. What is that? How do you know whether a student has achieved it? Likewise, if you aim for an appreciation of the beauty of math, or student self-confidence, or increased curiosity, or social responsibility, or an ethical stance. In spite of the pseudo-scientific managerial theories adored by some school administrators, none of those things lend themselves to straightforward measurement. The most important goals of education are hard to pin down, and this makes it difficult to design backward from there.

Even within a narrow definition of the discipline, and even if one agrees to set basic goals within such a framework, backward design can lead to a demand for easy-to-measure outcomes. Tell us what you want the students to be able to do, and we can see whether you were successful. We need data! And thus starts the descent into lists of micro-skills, items that can readily be checked off, or not checked off. Such lists are powerful, because they are easy to communicate to students, to parents, and to administrators. Pretty soon, the essential goals fade away, and the pressure is on to produce results in the form of check marks on a rubric.

This in turn affects how the subject is taught. Take equation-solving. Instead of empowering the student with the essential concepts they might use in solving an equation, we ask them to memorize many cases (one-step equations, two-step equations, etc.). The advantage of atomizing the subject like that is that those micro-skills are easy to assess, and the resulting assessments yield "data". At the same time, it relieves the students of having to think, which is something they appreciate if that is how they have been taught math in the past. The disadvantage of this approach is, well, that it does not work. It is not effective in helping students develop understanding, self-confidence, or an appreciation for mathematics.

This atomization of a subject can result from many different assessment policies, ranging from the standardized test mania which has done so much damage, to well-intentioned standards-based rubrics and grading schemes. Overemphasis on assessment inexorably pushes curriculum towards "how to do" things. There's a place for that, of course, but too much of it and you're treating the student as a programmable device, and preventing them from engaging with the subject matter intellectually. The message you are communicating is "Since I've already given up on your ability to think, I will have you memorize these easy-to-remember steps..." and your de facto low expectations will be self-fulfilling, as they reinforce students' fixed mindset about their abilities.

Read a blog post on the dangers of "How To" and concept atomization.

Teaching for understanding is hard to assess, it is hard to capture in a checklist, and it is hard to define in a few words. In spite of all that, it is the most important part of our job. How can assessment help us rather than hinder us as we strive to do it? How can we resist the temptation to demean our students with low expectations, and our discipline by reducing it to simple recipes?

6. Assessment Strategies

An overemphasis on assessment undermines curriculum, pedagogy, and student learning. Of course, it is politically impossible to avoid assessment, as it is a major preoccupation of students, parents, and administrators. Moreover, it is impossible to teach without assessing student understanding. As I mentioned at the beginning of this article, there are legitimate, even essential uses of assessment: fine-tuning the course; helping both teacher and student know what the student understands and can do; and offering learning opportunities. All these are best served by decreasing the stakes. Lower stress translates into more accurate assessment. Ungraded formative assessments can serve the most important goals of assessment, and should play a bigger role than they do in many classes.

Tests and Quizzes

Still, much can be said in defense of traditional tests and quizzes: they provide a lot of information, they are easy to grade, and they are expected by all constituencies. Given that they are here to stay, how can we make them more effective? Here are some suggestions:

- Within reason, give students as much time as they want. If a student is not fast, or does not do well under time pressure, so what? It does not mean they don't understand the material. Racing belongs in PE, not in the classroom.

- Do not over-penalize students for small computational errors that could be eliminated by the use of technology such as calculators and computer algebra systems. Prioritize evidence of understanding, not nit-picking accuracy. (Yes, sometimes computational errors reveal lack of understanding, and of course that is not what I'm talking about here.)

- Getting the right answer matters, which is why you might give less than full credit in the case of small errors. But if accuracy really matters to you, allow any and all technology during most tests. (Yes, there is a place for no-technology tests, but they should not be the default.)

- Offer "points" for test corrections. This lowers the stakes in a good way. Everyone knows that doing well the first time is better, but if learning is the goal, what difference does it make if the learning occurs a week later? In my version of this, I told students they could get half-way to a perfect score by turning in high-quality corrections. I allowed my students to get help from anyone, but all writing had to be their own. This assumes a standard of explanation that is higher than on the test itself. (See this blog post on Retakes vs. Test Corrections vs. Neither.)

- Lag the quizzes: give new topics a chance to settle into your students' consciousness before testing them. (See this blog post on extending exposure.)

- Periodically, administer cumulative tests, which include topics from earlier in the course. This way, you communicate the message that students are learning concepts for the long haul. Especially in combination with test corrections, this helps to reduce the stakes: students get more than one chance to show their understanding of a given topic, and midterms and finals are not so exceptional and intimidating.

- Include "bonus" or "extra credit" questions, which are important to challenge your strongest students, and which can be used to deepen or extend understanding. I usually required those of all students in the test corrections. This gives the message that getting 100% is not easily achievable, and keeps everyone from getting complacent. It also helps to communicate that a test is a learning opportunity. There will be pushback on this ("This is not fair!") but in fact, what would not be fair would be to limit tests to questions everyone can answer, as it would lower course expectations. Working hard on those items as part of the test corrections makes everything else more accessible. Of course, such problems should not carry much weight, points-wise.

- Use participation quizzes, during which you watch the class work and make notes on students' desirable behaviors. This is an amazingly effective technique to clarify what you consider the most productive ways to function in a math class. Students work on a reasonably accessible assignment, while you sit or stand at the front of the class, writing notes where all students can see them. For example: "Julia took out her materials right away and started working. Chuy is helping Jared with a challenging problem. Lucy is moving closer to the group so she can hear his explanations." And so on. Students are being assessed on work habits, not math understanding, but one leads to the other.
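The test-corrections policy in the list above (high-quality corrections get a student half-way to a perfect score) is simple arithmetic. Here is a minimal Python sketch of it. The function name, the 100-point default, and the `quality` parameter are illustrative assumptions, not part of the policy as stated.

```python
def score_after_corrections(original, max_score=100.0, quality=1.0):
    """Return the score after test corrections.

    Full-quality corrections (quality=1.0) recover half of the points
    missed on the original test. The `quality` fraction, between 0 and 1,
    is an assumed way to scale that recovery for weaker corrections.
    """
    if not 0 <= original <= max_score:
        raise ValueError("original score must be between 0 and max_score")
    if not 0 <= quality <= 1:
        raise ValueError("quality must be between 0 and 1")
    missed = max_score - original
    return original + 0.5 * quality * missed


print(score_after_corrections(70))  # half of the 30 missed points are recovered
print(score_after_corrections(90))
```

Under this rule a 70 becomes an 85 and a 90 becomes a 95. Doing well the first time still pays, but a weak first attempt is no longer final.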

Other Assessments

In addition to better tests and quizzes, it is important to have significant at-home assignments, including especially the test corrections mentioned above. (In schools where homework is not a realistic expectation, such assignments can be completed in study halls or in special class sessions. The main thing is to relieve the time pressure. Read more on this on my blog.)

Here are some possibilities:

- Reports. Ask students to summarize a unit in their own words and with illustrations. Keep those to a reasonable length: one or two pages, or a poster. I have found this works well with 9th and 10th graders.

- Projects. For example, write a (very) short science-fiction story involving exponential growth, with an appendix explaining the underlying calculations. Or use GeoGebra or Cabri 3D to construct an Archimedean solid. Projects are harder to think of, but I have used them successfully, especially with 11th and 12th graders.

- I have also used take-home tests. Those can and should be more difficult, and require more time, than in-class tests. My policy was the same as the one for test corrections: students can get help but they must write everything in their own words. But, you say, in that case, how can you use them as part of a student's grade? If your view is that sorting students is more important than teaching them, you have a point. But I have found that on balance, test corrections and take-home tests do help student learning. A somewhat less differentiated set of grades is a price I'm willing to pay.

My experience is that at-home assignments reveal that some of the students who do exceedingly well in a classroom test do poorly when the assignment requires a more thoughtful approach. This is important information for the teacher, and moreover, as long as we're trying to be fair, it levels the playing field some. In practice, test corrections are quick to grade, but reports and projects are not. One of those per grading period suffices.

Those were the approaches I used. However, there are other options. I will not say a lot about those, as I am not an expert, but here are a few ideas:

- Group tests, with the score determined by a random drawing among each group's papers.
- Observing and evaluating students' class work.
- Notebook checks, which give you a different window on student understanding.
- Holistic scoring of student written work (a lot faster than rubrics).
- Quick Yes/No rubrics (see Algebra: Themes, Tools, Concepts, Teachers' Edition p. 560 for an example).
- Portfolios: a student-compiled folder containing the student's best work.
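
The first idea in this list, the group test scored by a random drawing, can be sketched in a few lines of Python. The student names, the paper scores, and the seeded random generator are all hypothetical; how a teacher actually draws the paper is up to them.

```python
import random


def score_group_test(paper_scores, rng=None):
    """Draw one paper at random and give its score to every group member.

    `paper_scores` maps each student's name to the score on their own paper.
    Returns the name of the student whose paper was drawn and the final scores.
    """
    rng = rng or random.Random()
    drawn_student = rng.choice(sorted(paper_scores))
    drawn_score = paper_scores[drawn_student]
    return drawn_student, {name: drawn_score for name in paper_scores}


group = {"Ana": 88, "Ben": 74, "Cho": 91}  # hypothetical scores
drawn, final_scores = score_group_test(group, random.Random(0))
print(drawn, final_scores)
```

Because any paper may be the one that counts, every member of the group has a reason to make sure the others understand the work.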

Student self-evaluations in journals or other formats can help round out the picture, and provide the basis of a productive conversation.

Anita Wah and I elaborated on some of these ideas in Algebra: Themes, Tools, Concepts, on pp. 552-555 of the Teachers' Edition.

Depending on your school and department culture and values, not all of these ideas may work for you. Perhaps totally different strategies are in order at your school. In any case, you should do what you can to reduce the stakes, vary the assessments, and make sure assessment does not dominate your thinking.

7. Forward Design

It's time to wrap up this article. Here is a summary of the main points:

- From the point of view of teaching and learning, the most useful assessments are formative, not summative. They allow both teacher and student to make the best use of upcoming instruction.
- No matter how much people deny it, grades have a single purpose: comparing students to each other and ranking them.
- Trying to use grades to manipulate student motivation and behavior is counterproductive, because grades can reinforce a fixed mindset, because measurable progress for a learner may take longer than any one grading period, and because one cannot improve beyond an A+.
- Overemphasis on grades not only sabotages individual student learning, but also undermines curriculum and pedagogy by pressuring teachers to quantify everything in search of an illusory objectivity, and making us lose sight of important but hard-to-measure goals.

Therefore, it is incumbent upon us to

- Prioritize formative assessments.
- Lower the stakes on summative assessments by any means available: awarding points for corrections, varying the types of assessments, de-emphasizing grades as much as possible, etc.

Finally, in planning a course, a unit, or a lesson, we should practice forward design. (I am indebted to Carlos Cabana for this concept. See this blog post.) Start your planning by asking these questions, preferably in conversation with colleagues:

- What are the big ideas? If you can only come up with specific microskills, think some more: what underlying concepts connect these skills? Can different representations throw light on this? (For example, looking at an algebraic topic geometrically, or graphically, or in a "real world" context.)
- What tools are available to provide a way for students to engage in thinking and exploring? This can include manipulatives, technology, and/or pencil-paper tools. The right tool can make it possible to formulate a question that all students can engage in; it can support reflection and discussion; and it can add variety to your course. (See For a Tool-Rich Pedagogy.)
- What contexts (themes) are there, whether "real world" or not, that can provide useful problems?
- What curricular resources can complement or replace the textbook? Look on your shelves, search the Web, ask your colleagues. This step is crucial, as you most likely do not have time to create everything from scratch, and moreover, freshly-minted activities usually require some classroom testing and tweaking.
- How will the students be working at different stages: individually, in pairs, in groups, or in whole-class discussions? Different modes are appropriate to different activities, and doing everything in a single one of those is a costly mistake if you aim to avoid lethal boredom and want to reach the full range of students.

Elsewhere, I have referred to the first three ingredients of forward design as "themes, tools, concepts". To see how they interact, check out this blog post on proportional relationships (centered on a concept), or this piece on area (centered on a theme — scroll down to page T23.) For a diagram of how they are connected to traditional teaching methods, see this curricular development model.

After you've done this preliminary work, you can resort to some backward design strategies such as designing your assessments in advance, making lists of specific learning goals, and so on. Starting with forward design will help you keep those practices under control and prioritize what is most important. Forward design will save you from losing perspective.

Admittedly, this requires a lot of work. Collaboration is key: who can help you? Colleagues at your school are in the best position to work with you, but alas many teachers have told me that such collaboration is not possible at their school, as their colleagues are not interested, or (in small schools) they have no colleagues. In that situation, you'll need to develop offsite collaborations, perhaps through the MTBoS. But remember: you do not have to do all this at once. Get started now, do what you can, and do a little more each year. One step at a time. As soon as you start this forward motion, you'll start to see the signs of improvement in your classroom.

Societal pressures often push in the opposite direction, relentlessly. By resisting the bean-counting culture and by trusting that your students can enjoy learning, you can help them gradually move from extrinsic to intrinsic motivation. Few things are more depressing to me than educators who have given up on that, and choose instead to treat their students as if they were programmable devices, or customers haggling over grades at a flea market.

The beauty of the subject matter, the power of the ideas, the thrill of problem solving: the same things that motivate you to learn can work for your students, but only if you avoid falling into the assessment trap.
