Friday, February 6, 2009

PGR Minutiae

I've been thinking about the PGR, since we've got a new edition on the way in a week or two. I have a few worries about it. Most of these worries concern the manner in which the information is presented, though one is methodological. Also, for full disclosure, I am broadly in favor of the existence of the report, even if I am dissatisfied with some of the things it is sometimes used for. I found it invaluable when I was applying for grad school.

1. I worry about whether the rating scale the evaluators use contains the structure necessary to permit calculation of the mean scores that the ranking is based on. The issue is fundamentally one about how the prompts are related. You can use numbers representing features of objects to calculate a mean if and only if the numbers come from an interval scale, which means that the points represented by the numbers are equally spaced. A classic example of this type of scale is the five-point Likert scale (strongly disagree; somewhat disagree; neither agree nor disagree; somewhat agree; strongly agree. Likert scales often come with a visual aid that represents the points as being equally spaced). PGR evaluators use a six-point scale with the following prompts: 5: distinguished; 4: strong; 3: good; 2: adequate; 1: marginal; 0: inadequate. I wonder whether the points on this scale are evenly spaced. For example, intuitively, it seems to me that the distance between "distinguished" and "strong" might be greater than the distance between "marginal" and "inadequate." This is important because unevenness in the spacing of the points will distort the mean scores based on them and make those scores unreliable.
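To make the worry concrete, here is a minimal sketch in Python. The mapping from scale labels to "true" quality positions is entirely hypothetical, something I made up for illustration; the point is only that if the top gap really is wider, means computed on the raw codes can misorder departments.

    # Minimal sketch of the interval-scale worry; all numbers are made up.
    # Hypothetical assumption: the jump from "strong" (4) to "distinguished" (5)
    # is wider in true quality than the other one-unit jumps.
    true_position = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 6.0}

    def reported_mean(ratings):
        # Mean of the raw 0-5 codes, which is what gets reported.
        return sum(ratings) / len(ratings)

    def true_mean(ratings):
        # Mean after mapping each code to its hypothetical true position.
        return sum(true_position[r] for r in ratings) / len(ratings)

    dept_a = [5, 5, 3, 3]   # a department with some "distinguished" votes
    dept_b = [4, 4, 4, 4]   # a uniformly "strong" department

    print(reported_mean(dept_a), reported_mean(dept_b))   # 4.0 vs 4.0: a tie
    print(true_mean(dept_a), true_mean(dept_b))           # 4.5 vs 4.0: A ahead

On the raw codes the two departments tie, but once the (hypothetically) wider top gap is taken into account, the first department comes out ahead. That is the sort of distortion I am worried about.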

2. I worry about the emphasis placed on the ordinal ranking of departments. Leiter, of course, specifically says to attend to the mean scores and not just to the ordinal ranking, but whenever anyone talks about the report, they refer only to the ordinal ranking and never to the mean scores. When, for example, we talk about a range of schools ranked on the report, we always talk about the "top 10" or the "top 25", and never "the threes" or "the fours." Attending to the mean scores paints a rather different picture of the report, I think. NYU and Rutgers are the only schools in the high 4s, with 4.8 and 4.7, respectively. The rest of the top 9 is in the low 4s (below 4.5), with Harvard, MIT, and UCLA all tied at 4.0. This means that only two schools in the US round up to "distinguished" and that MIT is merely "strong." The rest of the top 30 are all in the 3s. The mean scores suggest that the difference in quality between NYU and MIT is equal to the difference in quality between MIT and UMass; looking at the ordinal ranking, I would have expected MIT to be closer to NYU than to UMass—nothing against UMass, but MIT is ranked at #7 while UMass hangs out down at #24, which seems like a huge difference compared to the seven schools between MIT and NYU. That this difference is illusory is entirely the point. The "top-10" category is fairly arbitrary—there is no cohesive grouping of approximately ten of the strongest departments. The reality is that there are about two departments that are close to "distinguished" and then a relatively large number of departments—more than half the departments ranked by the report—in the neighborhood between "strong" and "good." I worry that the ordinal scale we all internalize exaggerates the differences between departments. If I were on the advisory board, I would encourage a move away from it.
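A quick way to see the mismatch numerically (the 4.8 for NYU and 4.0 for MIT are the scores mentioned above; the 3.2 for UMass is my own assumption, chosen only to match the equal-gaps claim, and I am assuming NYU sits at #1):

    # Equal gaps in mean scores, very different gaps in ordinal rank.
    # 4.8 (NYU) and 4.0 (MIT) are from above; 3.2 (UMass) is assumed.
    nyu_score, mit_score, umass_score = 4.8, 4.0, 3.2
    nyu_rank, mit_rank, umass_rank = 1, 7, 24   # ordinal positions

    print(round(nyu_score - mit_score, 1), round(mit_score - umass_score, 1))  # 0.8 and 0.8
    print(mit_rank - nyu_rank, umass_rank - mit_rank)                          # 6 and 17

The rank gaps make MIT look far closer to NYU than to UMass; the score gaps say the distances are the same.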

3. I worry a little bit about the sensitivity of the mean-score scale. The mean scores are rounded off to a tenth of a point, but the raters are only allowed to make half-point distinctions. This makes me wonder whether a difference of one tenth of a point can be regarded as statistically significant. Maybe it would be better to round off to the half point, as the raters are asked to do. It might not be good to present information more precisely than it was collected.
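Here is a toy example of the kind of thing I have in mind; both sets of ratings are invented:

    # Two hypothetical departments rated on the half-point scale.
    ratings_a = [4.0, 4.0, 3.5, 4.5, 4.0]
    ratings_b = [4.0, 3.5, 4.0, 3.5, 4.0]

    mean_a = sum(ratings_a) / len(ratings_a)   # 4.0
    mean_b = sum(ratings_b) / len(ratings_b)   # 3.8

    print(round(mean_a, 1), round(mean_b, 1))              # 4.0 vs 3.8 at a tenth
    print(round(mean_a * 2) / 2, round(mean_b * 2) / 2)    # 4.0 vs 4.0 at a half point

Reported to a tenth, the two departments look different; rounded to the half-point resolution the raters actually used, they are indistinguishable.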

4. Related to (3), I also worry about the fact that a bunch of relevant statistical data is missing. For example, there is no information on the margins of error or standard deviations. This is important because, at heart, the PGR is a poll that is trying to measure aggregate faculty reputations, and all measurement procedures involve some level of imprecision. Information about margins of error and standard deviations would help us get a handle on that imprecision. In particular, it would help us understand which differences are trivial and which are significant, and what the levels of uncertainty in the scores are. (Leiter himself suggests that differences of less than .4 are insignificant, and he may be right. But it would be nice to know how he arrives at that figure.)
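For what it's worth, here is a sketch of the kind of summary I would like to see alongside each mean: the mean plus a margin of error computed from the spread of the individual ratings. The ratings are invented and the 1.96 normal approximation is just the textbook choice; I am not claiming this is how the PGR does, or should, compute anything.

    # Hypothetical ratings for one department; a rough 95% margin of error.
    import math
    import statistics

    def mean_with_moe(ratings, z=1.96):
        m = statistics.mean(ratings)
        se = statistics.stdev(ratings) / math.sqrt(len(ratings))
        return m, z * se

    ratings = [4.0, 4.5, 3.5, 4.0, 5.0, 3.5, 4.0, 4.5, 4.0, 3.5]  # made up
    m, moe = mean_with_moe(ratings)
    print(f"mean = {m:.2f} +/- {moe:.2f}")
    # Two departments whose intervals overlap heavily probably should not be
    # treated as meaningfully different.

Even a crude interval like this would tell readers which tenth-of-a-point differences are worth taking seriously.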

--Mr Zero


cst said...

I really enjoyed this post. I think you make a number of good points. Thanks for sharing.

Dave said...

I think these are good points, but the worries should be pretty small.

1. If the 'true' distance between 5 and 4 is larger than the other 'units', this is going to penalize departments about which there is a wide range of views. (If one department gets a 5 and a 1 and another gets two 3s, the first should really be ranked higher, if your suspicion is correct.)

2. Agreed -- it would be smarter for prospective grad students to think in terms of the scores and not the ranks.

3 and 4. I think when Leiter says differences of <.4 are 'insignificant' he means to be combining MoE and importance. If you knew for sure that the 'true score' of Dept. A was exactly .1 higher than that of Dept. B, there would be no margin of error but it still would be a difference that should be swamped by other considerations.
Since, as you say, we don't know the SD, we don't have any idea of what's statistically significant. But the N is quite large, remember, so this doesn't seem too worrisome to me.
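To put a rough number on that, here is a back-of-the-envelope sketch. The 0.75 spread and the rater counts are assumptions I am making up purely for illustration, since the real figures aren't published:

    # How the uncertainty in a mean score shrinks as the number of raters grows.
    import math

    def standard_error(sd, n):
        return sd / math.sqrt(n)

    sd = 0.75   # assumed spread of individual ratings
    for n in (30, 100, 300):
        se = standard_error(sd, n)
        verdict = "within" if 0.1 < 2 * se else "outside"
        # Rough check against two standard errors of a single mean; the
        # standard error of a difference between two means would be larger.
        print(f"n={n}: se={se:.3f}; a 0.1 gap is {verdict} that rough noise band")

With a few hundred raters behind each mean, a 0.1 gap starts to look distinguishable from noise; with a few dozen it doesn't. Without the SDs we can't tell which situation we're in.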

Anonymous said...

I would really like to see Leiter more fully address the issue of per capita quality among faculty members. It seems to me that departments with under 15 faculty members get the shaft when it comes to the rankings.

Specifically, I worry about the following hypothetical scenario: Department A has 14 faculty members representing a wide range of philosophical views, and the average quality of the faculty is, say, 1.5. Department B has 25 faculty members representing the same range of philosophical views, with average quality 1.2.

On Leiter's methodology, it seems to me that Department B would be ranked higher than Department A. But that doesn't seem prima facie correct. I'd say a graduate student would probably be better served, ceteris paribus, by A.
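To make the arithmetic explicit (the numbers come from my hypothetical above; the reading of the methodology is just my own guess):

    # Per-capita quality vs. sheer size, using the hypothetical numbers above.
    a_size, a_avg = 14, 1.5   # Department A
    b_size, b_avg = 25, 1.2   # Department B

    print("total strength:", a_size * a_avg, "vs", b_size * b_avg)   # 21.0 vs 30.0
    print("per-capita quality:", a_avg, "vs", b_avg)                 # 1.5 vs 1.2
    # If evaluators reward overall breadth and total strength, B comes out ahead;
    # per capita, A comes out ahead.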

Leiter did briefly address this when he discussed Johns Hopkins last week, but I'd like to see more accounting for this.

Anonymous said...

Perhaps the best post I've seen on any philosophy blog!

Anonymous said...

Totally agree with 12:50 -- actual substantive criticisms of the PGR are rare in the blogosphere!

Anonymous said...

Agreed. I'm applying to PhD programs this year, and when you move past the thrill of saying on blog posts 'I'm in a top 25 department' (which means you're in a 21-25 department, otherwise you'd say top 20) and look at the mean scores - which as everyone knows don't even measure everything that matters, like placement, mentoring quality, teaching quality, etc. - the rankings begin to seem more arbitrary than they're treated.

One thing I'd like to see addressed - given the growing importance of M.A. programs as viable options for undergrads - is the M.A. rankings. People now like to say 'I'm applying from a top MA,' by which they mean a Leiter-sanctified M.A. Leiter may be right, but it seems that this is an area of the PGR that many students take seriously, and yet the 'rankings' seem to be based on the vote of one, or of unnamed advisory board members.

Anonymous said...

Best. Post. Ever.

Anonymous said...

These comments from Zac Ernst are also helpful:

Judy Garland said...

Anon 8:22 -

I doubt if most people who claim they are in a "Top --" program are actually in one at all. I am sure that some are, but the internet is fairly antipodal to Wonder Woman's truth lasso. ;)

Anonymous said...

Judy Garland,

True enough. But on the internets, it seems that 'top M.A.' just means Leiter-mentioned, and the effect of this is that the Leiter programs might get the lion's share of top applicants. And thus the influence of this small part of the PGR is bigger than usually noted. Since the purported justification of the PGR is giving applicants knowledge (I'm starting to think this is about 1/5th of the justification), why not be thorough and put this section to a survey?
But maybe I'm overestimating the importance of M.A. programs. If so, I'd request that Leiter not speculate about 'top' M.A.s, and leave it to the programs to post their PhD placement stats.
And, if one buys Zac Ernst's criticisms, then a PGR consideration of M.A.s would be all for the worse.


cst said...

If there really are potential flaws in the PGR's methodology, perhaps someone needs to hire a specialist/consultant in surveys and survey methodology. Such a person would be able to determine whether there are any flaws and, if so, how serious they are. Then he/she could make constructive suggestions concerning how to improve the report's methodology. It seems that the APA should have an interest in doing this. And since I think the PGR plays a valuable function, I'd be willing to contribute to such a purpose.

Anonymous said...

These are minutiae compared with the bigger problem, which is pointed out by Zac Ernst in the link above: Leiter's sampling methodology is to ask his friends! And they invite their friends, and so on. "Snowballing" indeed.

Anonymous said...

YES!! I think there should be an APA committee on this. How would we make that happen?

Anonymous said...

Zac Ernst misdescribes the methodology if he describes it as asking 'friends' who ask 'friends.' Leiter nominated most, not all, of the philosophers on the Advisory Board; most are not his friends (I've talked to several, and I will bet most have not even met Leiter). Members of the Advisory Board nominate evaluators in their area of expertise who they believe are knowledgeable about the field. The 300-plus evaluators represent a diverse array of fields, and it's just crazy to describe them as Leiter's 'friends.' Ernst doesn't seem to realize the difference between a poll whose aim is to predict who will win a presidential election and a survey whose aim is to identify the best philosophy faculties.

Soon-to-be Jaded Dissertator said...

Now, now, now, Anon. 2:09, I think Anon. 9:21 was being a little loose with his choice of words there and with the description of Zac Ernst's work.

I doubt anyone thinks Leiter is asking 300 of his closest friends, so focusing just on that word choice is a bit of a weak rejoinder to the point about snowball sampling. However, I think you might be onto something in critiquing Zac Ernst's choice of analogies, but it'd need to be spelled out a little more to see what sort of force the critique has.

Besides, Mr Zero's post stands on its own two feet and relies on nothing in Zac Ernst's paper. So what about it?

Anonymous said...

Ernst responds to the criticism that the presidential poll is a bad analogy thusly:

I take Professor Leiter’s point that there are obvious disanalogies between conducting a Presidential poll and taking a survey like that done to prepare the PGR. But I think that there is a dilemma here. Either the PGR is a survey, or it is merely the work of a panel of experts, selected for their expertise. If it’s the former, it fails for obvious methodological reasons. If it’s the latter, then there needs to be stringent control and rigorous standard for who counts as an ‘expert’. The PGR fails either way.

Dave said...

I don't get Ernst's criticism, really.

Leiter lists the survey participants. That group is not a 'representative sample' of all professional philosophers in the English-speaking world... but so what? Why would someone be especially interested in the opinions of a representative sample?
There aren't any such things as "stringent control and rigorous standard for who counts as an ‘expert’" in philosophy, at least not insofar as I understand what that means. Some days I wish there were; other days I'm glad there aren't. But I've looked over that list, and I am a whole lot more interested in the opinions of its members than I would be in the opinions of a perfectly representative sample of English-speaking philosophers.

Anonymous said...

I think this post raises some pertinent problems. The general problem is of course the operationalisation of the variables (to speak social sciencese): Leiter wants to measure, AIUI, the quality of the graduate education someone will get at a certain programme, and thus asks people to rate the quality of faculty members. But while that is important, it doesn't include such factors as how well the department is functioning internally: is it all one great big family where there is a lively and friendly intellectual climate, or are you isolated in little groups centred around your supervisors, mired in factional warfare? My department feels like the former type, which is one reason why I'm thoroughly enjoying my graduate education, whereas I've been warned that several departments, which on purely academic merits are stronger, are much more like the latter, and that this makes a difference for how I'm likely to develop as a philosopher.

Anonymous said...

Anon 5:13,

Leiter wants to measure, AIUI, the quality of the graduate education someone will get at a certain programme, and thus asks people to rate the quality of faculty members. But while that is important, it doesn't include such factors as how well the department is functioning internally: is it all one great big family where there is a lively and friendly intellectual climate, or are you isolated in little groups centred around your supervisors, mired in factional warfare?

That's a very odd criticism, given that Leiter writes on his PG pages:

"This Report only measures the philosophical distinction of the faculty, not the quality of their teaching or their commitment to educating young philosophers."

Anonymous said...

"That's a very odd criticism, given that Leiter writes on his PG pages:

'This Report only measures the philosophical distinction of the faculty, not the quality of their teaching or their commitment to educating young philosophers.'"

The point, I take it, is that "the philosophical distinction of the faculty" - even if the PGR actually were capable of measuring it, which it is not, as Professor Ernst has shown - isn't the only, or even the most important, factor which prospective graduate students take into account when deciding where to apply. Alternately, if the PGR is directly or indirectly encouraging prospective graduate students to apply solely on the basis of such a myopic criterion, it is doing prospective graduate students and their future colleagues a huge disservice.

One of the best points Ernst makes is that we could arguably do much, much better than the PGR by the lights of its stated purpose. If the goal really is to gauge the overall quality of graduate programs rather than to have elite departments survey one another, why not craft an alternative survey that focuses on things that would actually matter to prospective graduate students? And if it turns out that such things just can't be gauged unproblematically through surveys, well, maybe we should just cut our losses instead of holding on to happy illusions like the PGR.

Anonymous said...

The point, I take it, is that "the philosophical distinction of the faculty" [...] isn't the only, or even the most important, factor which prospective graduate students take into account when deciding where to apply.

Obviously. Leiter has made that point many times, so that cannot possibly be a criticism of his contribution.

Alternately, if the PGR is directly or indirectly encouraging prospective graduate students to apply solely on the basis of such a myopic criterion, it is doing prospective graduate students and their future colleagues a huge disservice.

But the PGR is obviously not doing that. Leiter very explicitly cautions against doing so. If there is some 'indirect' way in which his report is encouraging people to do exactly what he cautions them not to do (which I doubt), I don't think it's his fault.

Anonymous said...

I mostly agree with Dave, both about Mr. Zero's original posting and about the Ernst article. Snowball sampling is a legitimate survey method, and its results are especially interesting in this context, as Dave points out. And Ernst hasn't shown that there is any strategic voting, just that he would not vote honestly if nominated. Ernst's arguments don't come close to establishing his strong conclusions.

Anonymous said...

Here's the thing: I am pretty inclined to trust the specialty rankings. If you ask the pre-eminent philosophers of physics to rank departments in their field, you can be pretty sure that they know who works on it in each department, whether those people are still active in the field, etc.

But I would be curious to hear from a philosopher who fills out the survey *what their thought process is* when assigning regular rankings. Do they think, "they've got one guy I've heard of in each area, so I'll give them a high ranking" or "there are three really impressive people there, I'll give them a high ranking" or what? It's not a rhetorical question; I would like to know what questions people take themselves to be answering when they assign numbers (and don't say "what is the philosophical distinction of this faculty?" -- I want to know what you think that comprises).

And speaking for myself (though I am merely an assistant professor), I haven't actually read the work of most of the people in the departments mentioned -- I've probably read something by at least two members of each faculty, and I've probably *heard the names* of many of the others, but the former thing isn't a large enough sample and the latter thing doesn't tell me much. Do you feel like you're in the position of knowing who most everyone on the list is, and what their work is like?

So have at it, senior philosophers, because I haven't actually formed an opinion on whether the PGR is in fact a good measure of faculty distinction, and I'd like to know how you fill out that survey.

Anonymous said...

Say what you will about the negative effects of the PGR. One positive effect, in my opinion, is reducing the influence of undergraduate institution pedigree on perceived graduate program distinction. Without the PGR, I think places like Rutgers and UNC would have a lot harder time attracting graduate students than Ivy League schools would.

Anonymous said...


There once was such a survey. Leiter even featured it on his blog:
Also check this out:

The site is long gone, but I remember reading some of the results, e.g. reports written by current grad students that addressed things like faculty approachability, grad-prof relations, special departmental activities, etc. No idea what happened to the project, seems like even google doesn't know.

colin said...

Ok, I just posted (now I have given myself a name!), though the comment hasn't been approved yet. Now, having read the Ernst piece, I will respond to it.

1. Ernst strikes me as basically correct about a number of issues. Most obviously the PGR should be run by the APA. Why the most important ranking system in our discipline is being run privately completely escapes me. While Leiter may or may not abuse this privilege it is clearly a privilege that is open for abuse. The very fact that the advisory board (whether or not you approve of the members) is chosen solely by him strikes me as problematic and rife with the possibility of social engineering. Obviously the fact that Leiter has such an ax to grind only makes this more evident. Whether or not you agree with his opinions, it's not clear to me that one person's opinions should be so powerful in a diverse professional organization. Bottom line: it would be nice to either be able to vote Leiter out or have his tenure end.

2. If the PGR is meant to broadly reflect the state of the discipline, then it seems to me there needs to be a much bigger sample polled. Though again, I agree with Ernst that we should probably have someone who is specially trained in data collection figure out how to run the thing. My preference, as stated before, is for quantifiables like publication rates over questions such as "how would you rate this department", but I could be persuaded otherwise.

Fred said...

What Colin is saying seems in a certain way reasonable, but in an important way crazy. The PGR “should be run by the APA”? What, there should be a law that Brian Leiter can’t conduct a survey? The APA should sue Leiter and Blackwell for control of the web site? Nobody is stopping the APA, or Colin, from putting together a new survey, using whatever methodology they like, and posting it on the web. Leiter is assuming no ‘privilege’ whatsoever. He didn’t get a commission from the APA, he didn’t get a license. He does a lot of work on this stuff, and I’m glad somebody is obsessive enough to want to do it.

Maybe it’s because I’m older than some of the critics, but I look at the PGR very differently. Back in the day, if you were an undergraduate and you weren’t seriously tapped in (maybe you went to a SLAC, maybe you majored in something other than philosophy, maybe you just didn’t pay attention to the academic world when you were in college), and you wanted to go to graduate school in philosophy, you pretty much had to rely on the advice of one, maybe two of your professors. You probably knew they had their biases, and what you didn’t suspect but was probably true was that their answers to your questions were based on impressions, limited and parochial and likely dated. But it was still very helpful. You found out that Pitt was awesome in philosophy, that Yale had some serious problems, stuff like that.

Then Brian Leiter started offering students his own personal advice, too. You could combine it with your advisor’s thoughts. Leiter paid a LOT of attention to the Scene, so you’d take his view seriously. And then he added a big survey, and his advice became even more useful.
But it’s not supposed to be ‘objective’, whatever that means; it isn’t supposed to be a scientifically valid measure of what the average philosopher thinks, whatever that would mean.
So, uh, what’s my conclusion here. I guess it’s this: don’t take this stuff so seriously. And if your deans are taking it too seriously, tell them to stop and explain why.

Anonymous said...

"If your deans are taking it too seriously, tell them to stop and explain why."

Problem is: deans will not listen, and neither will provosts or uninformed faculty on search committees (more common than you might think).

The way many of these people see it is as follows: some data is better than no data. If I can recommend Candidate A (whom I like) to the provost and use an online sampling of 'professionals' to give A's institution some imprimatur of respectability, then the provost (who is busy anyway) will buy it. Meanwhile Candidate B (whom the hiring department, or some members of the hiring department, really like) comes from another school that is much less Leiter-ized, but one that perhaps has a single star on the faculty, someone very smart and helpful who happened to be B's advisor. Mr. A gets the pick over Dr. B because the dean and some members of the department form a coalition, using the Leiter rankings as the primary evidence in selling this to the provost.

Nobody is accusing Leiter of having bad intentions. But it takes only a minimal consequentialism to see that his system may have this, and many other, deleterious effects on hiring, the profession, graduate education, etc.

Tailor the scenario as needed, but do not deny that this is very standard stuff in other industries. It follows that it is not unlikely that it is standard in our industry too. Just ask Bourdieu! Only philosophers who have zero empirical bones in their body (most of us! do you even know who Pierre Bourdieu is? Randall Collins?) will refuse to believe that it happens. Just go visit the marketing department at any big corporation or, indeed, the marketing department at your university.

Anonymous said...

Seriously, nothing like this has ever happened at my university. There has never been an occasion on which the dean of the faculty said, "Yeah, most of the philosophy department says this person is better, but I am going to side with the smaller faction and tell the provost that we have to hire this other person instead." Never, for any reason, let alone a very stupid one (and I agree that the reason the dean in your scenario uses is absolutely terrible). I am sure no dean at my university looks at the Leiter Report.

I can't think of a relevant way to 'tailor' your story to get something that's actually happened anyplace I've heard of.

I do know of one university that used its (top 50 but I think just barely) Leiter ranking to convince a dean that they should get more positions when (oh, remember those days?) some universities actually were thinking of expanding some programs. (Sigh.) This strikes me as a good use, a way of constructively employing the "some information is better than none" mentality that you described.

Oh, and, fyi, at least half of the bones in my body are empiricist bones.