Monday, December 15, 2014

Evaluating students' evaluations

Student evaluations are an important part of the undergraduate experience at most universities. But how effective are they?

You would think that universities might have studied this question, applying critical thinking and evidence-based reasoning to it. You might think that student evaluations are popular at universities largely because they have proven to be reliable indicators of teaching effectiveness.

Think again. The pedagogical literature contains dozens of studies indicating that student evaluations are not reliable indicators of teaching effectiveness. Here are some recent papers that show you what the experts are thinking and what kind of evidence is being published.

Gormally, C., Evans, M. and Brickman, P. (2014) Feedback about Teaching in Higher Ed: Neglected Opportunities to Promote Change. CBE-Life Sciences Education 13, 187-199. [doi: 10.1187/cbe.13-12-023]
Despite ongoing dissemination of evidence-based teaching strategies, science teaching at the university level is less than reformed. Most college biology instructors could benefit from more sustained support in implementing these strategies. One-time workshops raise awareness of evidence-based practices, but faculty members are more likely to make significant changes in their teaching practices when supported by coaching and feedback. Currently, most instructional feedback occurs via student evaluations, which typically lack specific feedback for improvement and focus on teacher-centered practices, or via drop-in classroom observations and peer evaluation by other instructors, which raise issues for promotion, tenure, and evaluation. The goals of this essay are to summarize the best practices for providing instructional feedback, recommend specific strategies for providing feedback, and suggest areas for further research. Missed opportunities for feedback in teaching are highlighted, and the sharing of instructional expertise is encouraged.
Osborne, D. and Janssen, H. (2014) Flipping the medical school classroom: student acceptance and student evaluation issues (719.4). The FASEB Journal 28, 719.4.
Flipping the classroom has generated improvement on summative exam scores in our first-year medical students; however, faculty members adopting this teaching methodology often receive lower satisfaction ratings on student evaluations. In previous years, these same professors received outstanding evaluation ratings when the same topics were presented using standard didactic lectures. We feel that this decreased student satisfaction may be the result of two distinct causes. First, students who have been accustomed to didactic lectures often come to class unprepared and therefore are incapable of the critical thinking and problem solving skills needed in the flipped classroom. Second, the evaluation tool which was appropriate for didactic lectures is inappropriate for analyzing the flipped classroom methodology. The student evaluations have improved in the last several years; however, the transition was not accomplished without difficulty. Anyone planning to pursue this teaching approach should be prepared to weather the storm of sub-standard student evaluations and, if possible, prepare their administrators for this potential outcome. Our experience suggests that faculty persistence targeted at changing student culture and expectations can help in this process. Accurately determining students’ acceptance of this relatively new teaching methodology is important. Improvements in the teaching methodology can be made only when the evaluation tool is valid. It is felt a new evaluation tool can be developed based on results obtained in student focus-groups coupled with cognitive assessment outcomes.
Wilson, J.H., Beyer, D. and Monteiro, H. (2014) Professor age affects student ratings: halo effect for younger teachers. College Teaching 62, 20-24. [doi: 10.1080/87567555.2013.825574]
Student evaluations of teaching provide valued information about teaching effectiveness, and studies support the reliability and validity of such measures. However, research also illustrates potential moderation of student perceptions based on teacher gender, attractiveness, and even age, although the latter receives little research attention. In the present study, we examined the potential effects of professor age and gender on student perceptions of the teacher as well as their anticipated rapport in the classroom. We also asked students to rate each instructor's attractiveness based on societal beliefs about age and beauty. We expected students to rate a picture of a middle-aged female professor more negatively (and less attractive) than the younger version of the same woman. For the young versus old man offered in a photograph, we expected no age effects. Although age served as a detriment for both genders, evaluations suffered more based on aging for female than male professors.
Blair, E. and Valdez Noel, K. (2014) Improving higher education practice through student evaluation systems: is the student voice being heard? Assessment & Evaluation in Higher Education, 1-16. [doi: 10.1080/02602938.2013.875984]
Many higher education institutions use student evaluation systems as a way of highlighting course and lecturer strengths and areas for improvement. Globally, the student voice has been increasing in volume, and capitalising on student feedback has been proposed as a means to benefit teacher professional development. This paper examines the student evaluations at a university in Trinidad and Tobago in an effort to determine whether the student voice is being heard. The research focused on students’ responses to the question, ‘How do you think this course could be improved?’ Student evaluations were gathered from five purposefully selected courses taught at the university during 2011–2012 and then again one year later, in 2012–2013. This allowed for an analysis of the selected courses. Whilst the literature suggested that student evaluation systems are a valuable aid to lecturer improvement, this research found little evidence that these evaluations actually led to any real significant changes in lecturers’ practice.
Braga, M., Paccagnella, M., and Pellizzari, M. (2014) Evaluating students’ evaluations of professors. Economics of Education Review 41, 71–88. [doi: 10.1016/j.econedurev.2014.04.002]
This paper contrasts measures of teacher effectiveness with the students’ evaluations for the same teachers using administrative data from Bocconi University. The effectiveness measures are estimated by comparing the performance in follow-on coursework of students who are randomly assigned to teachers. We find that teacher quality matters substantially and that our measure of effectiveness is negatively correlated with the students’ evaluations of professors. A simple theory rationalizes this result under the assumption that students evaluate professors based on their realized utility, an assumption that is supported by additional evidence that the evaluations respond to meteorological conditions.
Stark, P.B. and Freishtat, R. (2014) An Evaluation of Course Evaluations. [PDF]
Student ratings of teaching have been used, studied, and debated for almost a century. This article examines student ratings of teaching from a statistical perspective. The common practice of relying on averages of student teaching evaluation scores as the primary measure of teaching effectiveness for promotion and tenure decisions should be abandoned for substantive and statistical reasons: There is strong evidence that student responses to questions of “effectiveness” do not measure teaching effectiveness. Response rates and response variability matter. And comparing averages of categorical responses, even if the categories are represented by numbers, makes little sense. Student ratings of teaching are valuable when they ask the right questions, report response rates and score distributions, and are balanced by a variety of other sources and methods to evaluate teaching....

  1. Drop omnibus items about “overall teaching effectiveness” and “value of the course” from teaching evaluations (SET): They are misleading.
  2. Do not average or compare averages of SET scores: Such averages do not make sense statistically. Instead, report the distribution of scores, the number of responders, and the response rate.
  3. When response rates are low, extrapolating from responders to the whole class is unreliable.
  4. Pay attention to student comments—but understand their limitations. Students typically are not well situated to evaluate pedagogy.
  5. Avoid comparing teaching in courses of different types, levels, sizes, functions, or disciplines.
  6. Use teaching portfolios as part of the review process.
  7. Use classroom observation as part of milestone reviews.
  8. To improve teaching and evaluate teaching fairly and honestly, spend more time observing the teaching and looking at teaching materials.
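
Recommendations 2 and 3 can be made concrete with a few lines of code. What follows is only a minimal sketch in Python, not anything taken from Stark and Freishtat: the function name `summarize_ratings`, the 1 to 5 scale, the enrolment figures, and the two sets of ratings are all made up for illustration. It reports the distribution of scores, the number of responders, and the response rate, and it deliberately computes no average.

```python
from collections import Counter


def summarize_ratings(ratings, enrolled, scale=(1, 2, 3, 4, 5)):
    """Summarize one evaluation item without averaging the scores.

    ratings  -- list of categorical scores submitted by responders
    enrolled -- number of students enrolled in the course
    scale    -- allowed categories (a 1-5 scale is assumed here)
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be a positive number")
    counts = Counter(ratings)
    unexpected = set(counts) - set(scale)
    if unexpected:
        raise ValueError(f"ratings outside the scale: {sorted(unexpected)}")
    responders = len(ratings)
    return {
        # How many responders chose each category, including zero counts.
        "distribution": {score: counts.get(score, 0) for score in scale},
        "responders": responders,
        "enrolled": enrolled,
        "response_rate": responders / enrolled,
    }


# Hypothetical data: both sections would have the same average score,
# but the responses look nothing alike and the response rates differ.
section_a = [3, 3, 3, 3, 3, 3]
section_b = [1, 1, 1, 5, 5, 5]

for name, ratings, enrolled in [("A", section_a, 10), ("B", section_b, 40)]:
    summary = summarize_ratings(ratings, enrolled)
    print(f"Section {name}: {summary['responders']} of {summary['enrolled']} "
          f"responded ({summary['response_rate']:.0%})")
    for score, count in summary["distribution"].items():
        print(f"  score {score}: {count}")
```

In this invented example an average would make the two sections look identical, even though one class is uniformly lukewarm and the other is split down the middle, with only a small fraction of the second class responding at all.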


  1. "We find that teacher quality matters substantially and that our measure of effectiveness is negatively correlated with the students’ evaluations of professors. A simple theory rationalizes this result under the assumption that students evaluate professors based on their realized utility, an assumption that is supported by additional evidence that the evaluations respond to meteorological conditions."

    Forgive me, I'm a little slow. Did the Braga et al. article just say these things?
    1) the more popular a teacher, the less effective they were.
    2) the easier the grade ("realized utility"), the more popular the professor.
    3) and the weather also affected evaluations.

    I looked at the calendar, it's not April 1.

  2. I only realized that I could actually read the evaluations for the classes I have TA-ed online after I graduated. Reading them was eye-opening to say the least....