Professors Call Q Guide “Worthless” Tool for Assessing Courses

By Radhika Jain and Kevin J. Wu, Crimson Staff Writers

Christopher A. Hopper ’13 rates his classes in the Q Guide based on how they “make him feel.” If a class is painful or makes him sad, he is more likely to give it a low score.

Madeline S. Peskoe ’14 uses the Q Guide as a procrastination tool before exams. She is more likely to give a course a good score if the professor was enjoyable—even if she did not learn as much.

And while Christopher Z. F. Husch ’13 spends twenty minutes filling out the Q Guide for each of his courses, he said he has no doubt that most students are not nearly as diligent.

Harvard administrators hope the Q Guide can serve as a fair measurement of student course satisfaction and a credible metric for evaluating teaching quality. But faculty and administrators interviewed for this article almost unanimously admit that the reliability of Q scores is questionable at best. Some even argue that what the Q Guide measures—a student’s satisfaction with a particular teacher or course—is completely unrelated to how much he or she actually learns in the course.

“[The scores] are totally worthless,” said former Dean of the College Harry R. Lewis ’68. “Everybody knows they’re worthless.”

As the Faculty of Arts and Sciences embarks on a campaign to prioritize undergraduate teaching and place itself at the cutting edge of higher education, it continues to lack a definitive system of assessing the quality of the teaching done by its faculty members.


In the fall of 2006, former FAS Dean Jeremy R. Knowles charged nine of the school’s faculty members with examining how FAS could better “support and reward a commitment to the steady improvement of teaching.”

The resulting committee—known as the Task Force for Teaching and Career Development—made almost a dozen recommendations in its eventual report. It called for the development of alternative methods of teacher evaluation and asked that teaching performance be given greater consideration in decisions ranging from the appointment and promotion of junior faculty members to salary raises for tenured professors.

In the publicly available document, the authors of the report admitted that student evaluations were “at best incomplete and imperfect ways to assess the quality and impact of faculty teaching.”

Over five years after the Task Force filed its report, professors say that teaching has indeed taken on a greater significance in the lives of faculty as an important component in performance reviews, promotion decisions, and salary adjustments. But the Q Guide continues to remain the primary means of assessing teaching quality.

“[The Q Guide] has become something on which the administration relies heavily for promotions, and it’s troublesome,” said Ali Asani, chair of the department of Near Eastern Languages and Civilizations. “So much emphasis is put on Q scores as a measure of good teaching that I think it ultimately has a negative effect on teaching.”

Ladder faculty, however, are not the only ones who have reason to be concerned about their Q scores.

The Derek Bok Center for Teaching and Learning, for example, rewards teaching fellows, lecturers, preceptors, and course assistants who receive high Q scores. Nearly 40 percent of the workforce receives some sort of award based on positive student evaluations, according to Bok Center director Terry Aladjem. Teaching fellows who get very low Q scores receive a letter from Dean of Undergraduate Education Jay M. Harris and undergo additional training at the Bok Center.


Despite FAS’ reliance on the Q Guide in assessing the quality of teaching in Harvard’s classrooms, many administrators and professors in FAS said that the goals of student evaluations are not in line with the school’s recent and increased commitment to improving the quality of undergraduate pedagogy.

“It’s fairly clear that asking people right at the end of a class, ‘How do you feel about this class,’ doesn’t necessarily answer the questions we are asking: have you learned this well? Has this changed your life?” chemistry department Chair Eric N. Jacobsen said.

In fact, professors said, the current system that students use to rate their instructors and courses falls far short of an objective analysis of teaching quality.

“What if a student is lazy, and the professor comes down hard on them? The student might take out their resentment on the Q Guide,” Asani said.

More significantly, the importance of the Q Guide could skew the way professors approach their teaching.

“For graduate students and junior faculty, for whom these numbers may matter a great deal, it has to unconsciously have an impact on how they grade,” Harris said. “You’re more likely to give someone a 4 or 5, if they’ve given you an A.”

For this reason, some faculty are encouraging a rigorous reevaluation of the system Harvard employs to assess teaching quality—and are questioning the current method, which they said purely evaluates student satisfaction.

“Faculty are just worried about the Q Guide score,” Asani said. “If that becomes the driving force behind teaching, then good teaching has been left behind because then people are teaching to get a better score.”


Efforts to correlate student evaluations with student learning have yielded an abundant scientific literature. But the jury is still out on whether a student’s satisfaction with a course indicates the teacher’s overall effectiveness.

Richard J. Light, a professor at the Graduate School of Education, cited a host of research studies that showed significant correlations between a class’s average test score at the end of the course and the class’s average ratings of the teacher and course overall, suggesting that the Q Guide produces some positive feedback on teaching effectiveness from its polling of students.

“The Q Guide is not perfect—it can be improved—but these correlations are nowhere near 0, so it’s worth something,” Light said.

In a separate 2010 paper, researchers found that of students randomly assigned to professors teaching identical calculus courses, those who performed better in the class gave their professors the highest ratings at the end of the semester. But the study also found that higher student ratings were negatively correlated with “deep learning”—the ability to apply material taught in one class to more advanced classes. In other words, students who reported higher satisfaction tended to learn less deeply, and sometimes performed worse in subsequent classes.

“It’s unknown what precisely [the Q Guide] is measuring,” Dean of Undergraduate Education Jay M. Harris said.

Further, the scientific community is split on the value of these studies, given the difficulty of designing controlled experiments in the complex college environment.

“At the college level, there are tons of courses. It is unclear what learning goals are across different types of courses, and there is a whole bunch of self-selection going on,” said sociology professor Christopher Winship.

Indeed, students are rating courses on factors that are not necessarily related to what they have learned. A 2006 memo presented to the Task Force on Teaching and Career Development cited studies that identified over a dozen possible sources of bias in student evaluations—including the course’s grading policy, whether it was an elective, its workload, class size, and a category entitled “instructor expressiveness and showmanship.”

In one well-known example, Nalini Ambady and Robert Rosenthal, two psychology researchers, found that the Q Guide scores of Harvard students who had taken a course for a full semester strongly correlated with scores given by a separate group of Harvard students after watching the course instructor lecture for 30 seconds—with the sound off.

“Students may think they’re really answering the question about whether their homeworks were always returned on time, but they’re really just giving their gut feeling on whether they liked the person or not,” Lewis said.


Faculty and administrators said that innovation in teaching and learning must also be accompanied by innovation in methods of assessing teaching.

“It will be important for FAS to stay on the cutting-edge of measures of teaching if, indeed, teaching is going to be a basis for decisions about promotions, tenure, and salary,” sociology professor Mary C. Brinton said.

“The mandatory use of the Q scores as evidence that we’re serious about teaching is not credible,” Lewis said.

Many faculty members suggested peer evaluation between professors, a method used at the Harvard Business School and recently adopted by the life sciences division in FAS, as an alternative system.

“You set out the criteria on which they are going to be evaluated. You tell them this. You observe them in the classroom. You give suggestions. You do it again in a year,” Lewis said.

But Jacobsen, who also mentioned peer evaluation as a potential assessment technique, said that faculty members may not be comfortable with collegial feedback on individual pedagogy. “Faculty are very used to peer review in our research, but we’re much less accustomed to peer review in our teaching,” he said.

Faculty members also questioned whether students should evaluate a course immediately upon its completion.

“What is more interesting, perhaps, is to know how students feel several years after the course,” Jacobsen said.

Currently, there is no system in place for retrospective student evaluation. Although annual senior surveys ask members of the graduating class to name their most positive and negative academic experiences, there are no questions about specific classes or professors.

“We have talked in the General Education Committee about trying to bring together students in four-month or eight-month increments—to bring students from a course back, debrief, and get a feel for [the ongoing quality of learning],” Harris said.

In the meantime, the College strongly encourages course instructors to give mid-term evaluations; the Bok Center offers a template to all instructors. Although Harris denied that the Q Guide itself would be administered mid-semester, he said that personalized evaluations given by instructors partway through a term would provide an opportunity for professors to ask more specific questions.


But according to some faculty members, these small fixes fall short of the larger goal of devising a new and better school-wide system of teaching assessment.

“In many ways, you would need a committee of people who would be willing to invest several years to find out what constitutes best practice,” Winship said. “This is not a situation where we need a little marginal improvement.”

For all its flaws, the Q Guide will probably never be overhauled completely, according to Harris. And some faculty agree that it would be more expensive to make changes than to leave the system as it is.

“It’s easier just to assume the Q is doing a valid job of evaluating teaching effectiveness,” Winship said.

—Staff writer Radhika Jain can be reached at

—Staff writer Kevin J. Wu can be reached at

