Friday, May 18, 2012

On the Significance of Student Evaluations

I've been interested in the efficacy of student evaluations for a long time [Student Evaluations] [Student Evaluations] [Student Evaluations Don't Mean Much].

There's an extensive pedagogical literature on the issue [Student Evaluations: A Critical Review] [New Research Casts Doubt on Value of Student Evaluations of Professors] [Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors] [Evaluating Methods for Evaluating Instruction: The Case of Higher Education] [Of What Value are Student Evaluations?] [Why Student Evaluations of Teachers are Bullshit, Some Sources].

Much of this literature points out what is obvious to any university instructor: student evaluations are not very good at measuring teaching effectiveness.

Most of you won't have time to read several thousand papers in the pedagogical literature but that doesn't mean you should ignore the conclusions of the experts.1 After all, we are scientists so shouldn't we base our policies on evidence? If there's no evidence to support the claim that student evaluations measure teaching effectiveness then why in the world would university departments use them in tenure decisions, promotions, and salary increases? And why would universities use student evaluations almost exclusively to give out teaching awards?

In order to get some feel for one of the problems with student evaluations, I suggest you read an article by Stanley Fish in the New York Times (2010): Deep in the Heart of Texas. And here's an opinion piece by "Dr. Myron L. Fox," famous for the Dr. Fox Effect—the pedagogical equivalent of the Sokal Hoax.

No longer is it possible for an intelligent, well-informed person to believe that summative student evaluations accurately measure student learning or teaching effectiveness. But the widespread, almost universal use of summative student course evaluations of teachers, together with the nonchalant manner in which professors ritualistically distribute them at the end of every semester, can give the impression to both students and professors that the majority of professors support the idea that course evaluations are good indicators of how well the teachers are teaching and of how well their students are learning. The fact is that the majority of professors, across all disciplines, believe that summative student evaluations aren't good or fair measures of such things (see Birnbaum; Crumbley). This is one of academia's dirty little secrets. Dirty, because this belief, though well-founded, manifests a cynicism about contemporary higher education that tragically lies right at the core of the educators' relationship and interaction with students. It is secret, because - outside of a specialized academic literature which virtually preaches to the choir - the matter is almost never discussed candidly, and because many professors - believers and unbelievers alike - so often actually pay lip service to the view that student evaluations are valuable in these ways. The unbelievers who talk the orthodox talk do so mainly to keep their jobs or simply to avoid rocking the boat. But many have spoken out. They come from every department and every type of institution large and small, research-oriented and teaching-oriented. Below are some examples. 
Most of them warn that if summative course evaluations continue to be used in an attempt to measure teaching-effectiveness, then academic standards and ideals will be, perhaps irreparably, subverted: grades will continue to inflate and college courses will continue to be dumbed down and excellent professors will continue to be denied tenure and promotions, or jobs altogether, simply because they failed to pander to the increasing desire of students to be entertained or at least to be relieved of the hard work that genuine higher education requires; for there are many who are willing - if not happy or eager - to cave to these pressures and the job market in academia remains tight and the competition high.
The CAUT (Canadian Association of University Teachers) is one group of academics who recognize the problem. They have formulated a Policy on the Use of Anonymous Student Questionnaires in the Evaluation of Teaching:
  1. Any procedure initiated by the administration or the senior academic body to evaluate teaching performance, including any proposal to employ anonymous student questionnaires, should have the agreement of, or have been negotiated with the academic staff association, and should be incorporated in the collective agreement or faculty handbook. Academic staff associations should be aware, when negotiating the use of student questionnaires, that anonymous student evaluations of teachers may serve as vehicles for transmitting popular misconceptions, expectations and prejudices, to the disadvantage of, for example, women and visible minorities. Such procedures should be fair and include an appropriate procedure for an academic staff member to comment on any set of ratings and to contest any assessment or decision made on the basis of those ratings. Academic staff associations should provide expert advice and counsel to academic staff members in reviewing their own results, and should also support academic staff members in whose cases student ratings are being used inappropriately.
  2. Procedures for the evaluation of teaching should take into account all relevant sources of information about teaching. Anonymous student ratings should never be the primary measure of teaching performance. Rather, the systematic use of a teaching dossier should be encouraged. Unless negotiated as discussed under Article 1, results of anonymous student ratings should be placed in that dossier only with the consent of the academic staff member.
  3. Surveys of student opinion about teaching should not be characterized or described as if they measure teaching effectiveness. While students are uniquely placed to comment on their own reactions to what happens in the classroom, they are not in a position to assess all of the components of teaching effectiveness.
  4. In post-secondary institutions where the results of student surveys are considered to be part of the individual's confidential personnel file, the results of such surveys should be accorded the same degree of protection as students' academic records. When student comments and/or survey results are published, they should not be included in the personnel file.
  5. Where/when student organizations conduct anonymous student surveys and publish the results in order to assist students in the selection of their courses, academic staff participation should be optional, and no penalties direct or indirect should follow a refusal to participate. Such student-organized evaluations should not be used by post-secondary institution administrations as a means of assessing teaching performance.
Wouldn't it be nice if universities in Canada (and elsewhere) adopted this policy? Wouldn't it be nice if they went even further? Wouldn't it be nice if my own department (Biochemistry) acted like scientists?


1. I hang out with some of these experts when I go to conferences on university education. It's quite rare to find someone who defends student evaluations. Many of my colleagues at these meetings are teaching professionals who have suffered from college administrators who use student evaluations to make hiring and firing decisions.

29 comments:

  1. Several years ago, I looked up the student evaluation report on Lehigh's eminent IDiot, Prof. Michael Behe. As I recall, he was awarded a 5, the highest score attainable.

    Replies
    1. For all we know, he may be an excellent teacher of introductory biochemistry.

  2. The real problem is that we don't have ANY measures of 'teaching effectiveness' that most teachers and learners would agree on. Given the absence of objective measures, I don't think we should assume that the teachers are necessarily better judges than the students.

    Replies
    1. Here in Ecuador we see teachers fighting any kind of evaluation procedure, because so many teachers are poorly trained and would not survive evaluation. Some of the proposals in the CAUT policy sound very much like the proposals of the Ecuadorian teachers, and do not seem very reasonable. For example, "Unless negotiated as discussed under Article 1, results of anonymous student ratings should be placed in that dossier only with the consent of the academic staff member." How is that reasonable? Even if ratings are biased or systematically unfair, it should be possible for a good administrator to learn how to interpret them properly, or at least extract some useful information from them. If a teacher mumbles through the class and frequently makes serious mistakes and is often late and fails to grade homework on time, his or her evaluations will probably say so. Should that teacher have the right to keep these evaluations out of his or her dossier?

    2. A lot of mud has been thrown at standardized tests as a tool for doing so, and in the form they are used in the USA, justifiably so. But one should not be quick to dismiss standardized tests in principle as a bad way of assessing learning - the problem right now is not with using tests, but with the tests being horrible at measuring learning. Nothing prevents us from designing tests that assess actual knowledge and thinking ability and either cannot be "taught to" or are such that teaching to them would actually be a good thing. I have taken many horrible disgusting stupid tests, but I have also had the fortune to take a few that were a real pleasure to take, that stimulated my thinking, that I could in no way have taken successfully if I didn't know the material, and that at the same time I would have failed if the material discussed in class was all I knew. So they do exist, but my bet is most people haven't seen them in their experience, they are usually very hard to design and if such tests were given, the fail rate would probably be somewhere in the 90% neighborhood, which is not in anyone's interest at present.

    3. Rosie said,

      I don't think we should assume that the teachers are necessarily better judges than the students.

      I agree that we should not assume this but I think you go too far. There really are teachers who are experts in education and their opinion is likely to be meaningful. There really are scientists who know the fundamental principles and concepts that should be taught in an introductory course so their judgment can't be summarily dismissed as worthless.

      I may be entirely off base but I think I'm more capable than the average student of evaluating the content of a biochemistry course.

    4. Instructors are certainly better at evaluating the content of their courses, but good teaching is a lot more than content. I think we're not very good at imagining our courses from the students' perspective, especially since we were almost certainly very atypical students.

    5. I think we're not very good at imagining our courses from the students' perspective, especially since we were almost certainly very atypical students.

      That may be true for beginners but after a few decades you can kind of get the hang of it. If you care.

      The real problem is that students are not very good at imagining the course from the instructor's perspective, especially in the first two years.

      We've probably all had the same experience. Students in 4th year courses are a heck of a lot better at providing good feedback. And they're a whole lot better at appreciating some kinds of professors—like the ones who don't get good student evaluations in large introductory courses.

  3. If there's no evidence to support the claim that student evaluations measure teaching effectiveness then why in the world would university departments use them in tenure decisions, promotions, and salary increases?
    Easy - otherwise they'd actually have to do some work - i.e. send people to lectures to make sure they're of quality, follow up with previous students, maybe correlate success in your course to success in more senior courses, etc.

    But why do that, when you can ask the most biased group of individuals (except, perhaps, for the professor her/himself) what they think.

    I agree with Rosie - there is no objective measure.

  4. Well, it should be obvious from first principles that student evaluations would be at best useless, and possibly even anti-correlated with good teaching. It just follows from the basic motivations of the participants in the process.

    Students in our society go to school so that they get into college, and then go to college so that they can get a well-paying job (which may additionally require going to med/law/business school, whatever), and that's the reason they want to get good grades, because it makes the chances of achieving that career advancement better. That's a very different motivation from actually wanting to learn something, and if that's your motivation, your objective will be to maximize grades earned while minimizing the effort expended on earning them, and actually learning something is not something you care much about. But there is no such thing as the teacher-magician who can instill real knowledge into students without the students making any effort, so it directly follows that in that kind of motivational environment there will be no positive correlation between what the students think about the course, and what the course really is. And that's without even going into the discussion on whether students could actually accurately assess professor performance if they cared about learning; it should be obvious that if you don't know the stuff then you can't judge how well it is being taught to you, and it is precisely because you don't know it that you are in class to learn it. But that's a pointless discussion because we don't live in a world in which students care about learning.

  5. I always laugh at the question on most of these evaluations that goes something like "Rate this professor's knowledge of the subject matter." How the hell are students qualified to determine this?

  6. Of course student evaluations are a terrible measure, but it's too extreme to argue that they're worse than nothing. The flavour of this discussion seems to be in the direction of just removing student evaluations rather than discussing what could replace them - I find this worrying.

    Fact: some lecturers really are terrible, to the point that the best way for the majority (perhaps even all) of their students to learn the material they teach is to avoid all of their classes and spend the time doing self-study. This is understandable because most lecturers have no formal training in lecturing, were appointed primarily based on criteria such as their publication and fundraising record (which has nothing to do with lecturing) and are evaluated for promotion based on these same criteria.

    Fact: re-allocating research time towards lecture preparation tends to have a _huge_ impact on the quality (and usefulness to students) of lectures. But student evaluations are pretty much the _only_ measure of the performance of a typical academic on which this has a positive impact. On the whole, the system encourages academics to minimize time spent on lecture preparation. If teaching is important, you shouldn't criticize the only incentive to improve it (even if deeply flawed) without proposing a better (and practically feasible) incentive with which to replace it.

    Replies
    1. konrad says,

      Fact: some lecturers really are terrible, to the point that the best way for the majority (perhaps even all) of their students to learn the material they teach is to avoid all of their classes and spend the time doing self-study.

      I'm sure this is true. What does it have to do with student evaluations? Are you implying that the only way to detect a bad teacher like this is to have student evaluations?

    2. I'm saying it may be the only way that's actually in use in a given environment. In which case removing it without a sensible replacement would be bad.

  7. One way of removing the correlation between student evaluations and course difficulty would be to use one person to set the curriculum and tests/exams and another to do all of the student interaction (and use student evaluations only for the latter). That way the adversarial relationship between lecturer and students is removed and helping students learn more rather than less should get positive evaluations. Of course there are many potential problems in such a setup (e.g. excessive "exam coaching" would still be rewarded), but it should be worth trying - all it needs is for a pair of lecturers with similar teaching philosophies and knowledge of each other's course content to agree to exchange exam-setting duties.

  8. Re the cartoon with the premed student: this points to a much deeper problem, which academics (in science, anyway) don't seem to want to admit. The system is funded by the economy, for the purpose (ultimately) of benefitting the economy. That primarily means career training for professionals, and specifically for professionals we need many of (medical practitioners, engineers, computer programmers; NOT scientists). We don't have to like it, but that's the main reason why funders (both government and private) subsidize universities, and why students pay fees.

    A large part of the argument against student evaluations is that students are evaluating whether _their_ goals are met ("does this course prepare me for my planned professional career?"), rather than those of the lecturer ("is the student learning science / gaining a general education?"). But isn't the problem the mismatch between the service universities see themselves as providing vs the service the funders and clients (students) would like them to provide?

    Replies
    1. No, the cartoon with the premed student points to a much deeper problem: premeds.

    2. konrad says,

      The system is funded by the economy, for the purpose (ultimately) of benefitting the economy. That primarily means career training for professionals, and specifically for professionals we need many of (medical practitioners, engineers, computer programmers; NOT scientists). We don't have to like it, but that's the main reason why funders (both government and private) subsidize universities, and why students pay fees.

      The primary reason for funding research universities has very little to do with undergraduate education.

      The vast majority of undergraduates are NOT training to be physicians, engineers, or computer technicians and that's just the way it should be. The purpose of an undergraduate education is to teach students critical thinking, healthy skepticism, logic etc.

      I agree that many students don't understand why they are at university and they complain that the system isn't meeting THEIR objectives, but that's not a reason to accommodate them. Instead, we should be working hard to teach them the real value of knowledge and education.

  9. Student evaluations may not be a good measure of teaching effectiveness, but surely they're measuring something---there's too much consistency over time for them to be completely arbitrary. Rather than throw them out, I'd like to know what they actually mean, especially when I see instructors in my department getting really high ratings in the hardest undergrad courses while I'm getting below-average ratings in upper-level courses.

    Replies
    1. Student evaluations may not be a good measure of teaching effectiveness, but surely they're measuring something---there's too much consistency over time for them to be completely arbitrary.

      Did you read any of the references I gave you?

  10. One example of how student evaluations can fail: Some years ago when I was chair of a biology department, one faculty member constantly received very low evaluation scores. Indeed, they were the lowest of anyone in the department. One year we conducted a survey of departmental graduates for the previous five years. One question asked former students to list the instructors from whom they learned the most. Yes, the same professor that received the lowest scores on evaluations at the end of the course was ranked at the top! How could this be? That instructor taught entirely by the Socratic method. A question in lab or lecture was usually met with a question, etc. For most students, the method drove them up the wall, but later they realized that they had been pushed to use critical thinking, etc., much to their long-term benefit. That instructor for many years also trained Peace Corps volunteers in one of the most successful programs the Corps has had. His teaching methods worked very well for the Corps as well.

  11. I'd guess that if one did a hypothetical PCA or multiple regression analysis, the student's actual grade in the course vs. the grade they expected to get would be the first component explaining student evaluations. And the second one would be, more or less, how much they 'liked' the professor. Unfortunately most of the forces (K-16 and other environmental sources) influencing students mislead them into thinking they have something useful to say on the quality of their teacher. It also tends to puff up their egos and makes it very easy to rationalize their mistakes (whether laziness or simple misunderstanding) as those of the professor/teacher (it's not my learning style, he/she is boring...). Compare that to the attitude and behaviors expected of athletes in college (not in the classroom): hours of essentially professional-level training and cutthroat competition, and a general expectation that if you aren't where you want to be athletically you work harder, not blame the coach.

  12. Student evaluations do serve a purpose and I use them to determine what the students found most effective/least effective and make revisions accordingly. I am not talking about the content, but how the course is run. For example, I used to give in-class essay examinations but based on student feedback I have now switched to take-home essay exams. This gives the students more time, so I can ask more extensive questions, the quality of the responses is much improved as the students are not cramming to get their thoughts on paper as rapidly as possible, and the grades are still well distributed. It's actually much easier to evaluate student comprehension, although it takes substantially more time on the students' part and mine.

    Regardless, there are ways to assess teaching effectiveness. For example, in the advanced classes the instructor knows whether the students are actually prepared in the introductory courses. Those students who received As and Bs should actually be prepared. If they are not, the introductory course did not effectively teach. We could track the % of students who are successful after graduation (this could be compared to student GPA, other colleges, majors, universities, etc.). Instead we focus on time to graduation (increasing the pressure to push students through), student GPAs (increasing the pressure to give good grades), and student evaluations (increasing the pressure to cater to the students' expectations and give good grades).

  13. What if evaluations weren't anonymous? Names would be hidden from instructors, of course, but they'd be matched to grades in the class, overall GPA, and maybe GPA in the major and courses required for the major. Students who do well in a course may well be able to distinguish good from bad teaching. They'll surely recognize and comment on the really unprepared teachers. Students who do badly in a course often fail to distinguish the quality of their studying from the quality of the teaching.

    Replies
    1. I think that anonymous evaluations are inconsistent with what a university is all about. By the time you get to university you're supposed to stand up for what you believe in and defend it.

  14. Evaluations can be easy to influence in ways other than dumbing down the course (or inflating the grades). After I got evaluations suggesting I didn't care about helping students and wasn't available, a colleague recommended that at every class I tell the class I was available, how to contact me, or that I wanted to help. It worked. I'd been interested and available before and that didn't change (and neither did the rate of students asking my help) but my scores improved greatly. :)

  15. I see student evaluations as something akin to democracy - the students/voters don't have enough information/knowledge at their disposal to judge what is best but they're reasonably good at spotting what's really bad. As long as you're not in the "really bad" category, you're probably alright. That's certainly how they're used at my University: we just need to make sure our scores are not too low.

    That said, I still pay attention to my own scores and try to work out why scores are good or bad. There are many factors, not least the subject itself, but we always include a space for free-text comments to get actual useful feedback (if we're lucky).

    I also agree that paying too much attention to them is not just inaccurate but can be really counter-productive. What students want is not always the same as what's good for them and rating lecturers according to student evaluations makes them worry too much about the former and not enough about the latter. (I'd be lying if I said that I had never made a decision that was partially motivated with student evaluations in mind.)

  16. I am new to this topic but have recently begun to learn more about perspectives on student evaluations. Your post and referenced links are interesting. Thanks for sharing.

    Since paying more attention to this topic, I also saw a blog post referencing research on the validity of student evaluations from The IDEA Center: http://theideacenter.org/ideablog/2012/05/what-does-research-tell-us-about-student-ratings. I'm curious if you have read this and what your thoughts are?

  17. Back when my wife and I were professors, we ran across a couple of studies (which we've since misplaced, due to several moves) which indicated that, on the canonical 1-5 scale, teaching a required service course to non-majors was worth about a point, teaching a required course to majors was a half point, being other than a white male was also a half point.

    As a teaching assistant, I was, based on the written evaluations, also downgraded on my beard, the fact that the room was unairconditioned, my hoarse voice (after three hours of lecturing), and how boring the professor was.
