“Not everything that counts can be counted, and not everything that can be counted counts.” (attributed to Albert Einstein)
A first-generation college student stays after class to talk with his professor about something he found particularly interesting, even though this topic will not be on the test. An introverted student slowly gains enough confidence to start raising her hand and speaking her mind. A group of formerly disengaged and apathetic students grows outraged over the injustice of a particular policy and becomes personally and professionally committed to righting this wrong.
Many of us teachers live for such moments, when, through some usually inscrutable combination of factors, we help inspire our students to open their eyes a bit wider, take ownership of their education, and connect their academic learning to the larger world outside their hermetic bubble of tests and grades and program requirements. Yet we have created an educational system that increasingly evaluates itself, and even prides itself, on the basis of that which can ostensibly be “objectively and rigorously” standardized and quantified. Consequently, we often ignore or discount some of our greatest achievements, and design our courses and programs around a series of assessment goals whose main attribute is that they can be measured. Thus, after the relevant academic committees and outside accrediting agencies have spoken, our initial passion to, say, increase our students’ general interest in and knowledge and enjoyment of music and the fine arts has been transformed and codified into a series of trite bullets such as “students will attend three approved and verifiable concerts and two art exhibits by the end of the semester.”
The road to quantifying and assessing educational effectiveness is often paved with good intentions. Because there are poor teachers, pointless classes, and unsuccessful programs out there, we need some way of evaluating what we are doing, rewarding and expanding the best approaches, and weeding out or improving the worst. Moreover, because we are already too busy and too stressed trying to get our “real” work done, the last thing most of us want is to have to devote even more precious time and energy to onerous and potentially divisive new assessment procedures. Hence we accept standardized tests and guidelines that are quick, easy, and statistically comparable across grade levels, disciplines, and institutions.
This mentality increasingly permeates our assessment procedures even when we are evaluating “alternative” assignments such as capstone projects and applied internships. We dutifully develop universal grading rubrics capable of magically transforming any product or activity into a series of numbers that can be efficiently analyzed and compared. Thus, in place of a more personalized and nuanced “subjective” assessment of, say, the actual quality of a thesis project, we subject it to a checklist of objectively quantifiable criteria: “The student produced a paper (double-spaced, 12-point font, one-inch margins at the top, bottom, and sides) between 25 and 30 pages long”; “The student followed the approved bibliographic format and cited at least 15 peer-reviewed papers”; “The student dressed appropriately, maintained eye contact with the audience, spoke clearly, and finished within the allotted time period.”
Inevitably, we encounter the student whose work is dreadful, yet who somehow manages to fulfill 94% of the items on the checklist and thus has technically earned an “A.” Or conversely, if we are lucky, we get the oddball student who produces a brilliantly original, creative, and insightful project that the spreadsheet says is a “C-.” So we groan and revise the rubric one more time. After too many hours of mind-numbing meetings, we decide to add some additional criteria such as “The student’s work was original, creative, and insightful.” Then we appoint a task force to articulate the official definition of each of these qualities to facilitate their subsequent objective quantification.
The net result of this process is that we wind up with either
1) complex and time-consuming rubrics that generate results at least as variable as the more holistic, subjective assessments they replaced, or
2) rubrics composed of the kinds of concrete yet meaningless criteria that can be consistently assessed and quantified by any half-functional idiot or semi-intelligent machine.
Whether it bubbles up from within academe or is shoved down our throats from without, our increasingly fervent worship of the god of standardized assessment is leading us astray. It is not making the good in education better, or weeding out or improving the bad. On the contrary, it is cheapening our work, suppressing our students’ and our own individuality and passion, and causing us to design our programs around the checklists and teach to the rubrics, at least implicitly. Consequently, despite our best intentions, we wind up devoting too little time and energy to cultivating the kinds of skills and attributes we like to claim education is all about, such as critical thinking, integrity, curiosity, tolerance, creativity, and service.
The pursuit of objectivity in educational assessment was itself a subjective decision, one that biased our subsequent thinking and activities towards that which could be standardized and quantified. The time has come to deliberately begin replacing the present high-stakes, big-bucks standardized assessment landscape with a more organic cottage industry of wonderfully diverse, qualitative, and subjective approaches tailored to the specific institutions and situations they will serve. For example, some might choose to assess the effectiveness of their courses and programs by conducting qualitative interviews with their students, alumni, and relevant local community and business leaders. Others might invite outside assessment teams to sit in on their classes, hold candid discussions with their faculty, staff, and students, and simply wander around getting the feel of the campus and its educational culture. The “deliverables” from such activities would undoubtedly provide poor fodder for rigorous quantification, standardization, and competitive ranking systems. However, they just might turn out to be highly informative, useful, and even inspiring.