I've been thinking about the PGR (the Philosophical Gourmet Report), since we've got a new edition on the way in a week or two. I have a few worries about it. Most of these worries concern the manner in which the information is presented, though one is methodological. Also, for full disclosure, I am broadly in favor of the existence of the report, even if I am dissatisfied with some of the uses to which it is put. I found it invaluable when I was applying for grad school.
1. I worry about whether the rating scale the evaluators use contains the structure needed to support the mean scores that the ranking is based on. The issue is fundamentally one about how the prompts are related to one another. You can meaningfully average numbers representing features of objects only if the numbers come from at least an interval scale, which means that the points represented by the numbers are equally spaced. A classic example of a scale meant to be read this way is the five-point Likert scale (strongly disagree; somewhat disagree; neither agree nor disagree; somewhat agree; strongly agree), which often comes with a visual aid representing the points as equally spaced. PGR evaluators use a six-point scale with the following prompts: 5: distinguished; 4: strong; 3: good; 2: adequate; 1: marginal; 0: inadequate. I wonder whether the points on this scale are evenly spaced. For example, intuitively, it seems to me that the distance between "distinguished" and "strong" might be greater than the distance between "marginal" and "inadequate." This matters because unevenness in the spacing of the points will distort the mean scores based on them and make those scores unreliable.
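To make the worry concrete, here is a minimal sketch in Python. The ratings and the alternative "uneven" coding are both invented for illustration; nothing here is actual PGR data. The point is only that if the verbal anchors are not evenly spaced, averaging the official 0–5 codes can reverse the ordering of two departments.

```python
official = {"inadequate": 0, "marginal": 1, "adequate": 2,
            "good": 3, "strong": 4, "distinguished": 5}

# Purely illustrative assumption: "distinguished" sits further above
# "strong" than the official coding allows.
uneven = dict(official, distinguished=7)

dept_a = ["strong"] * 5                                   # uniformly strong
dept_b = ["distinguished", "distinguished", "good", "good", "good"]

def mean_score(ratings, coding):
    return sum(coding[r] for r in ratings) / len(ratings)

for name, coding in [("official", official), ("uneven", uneven)]:
    print(f"{name}: dept A = {mean_score(dept_a, coding):.2f}, "
          f"dept B = {mean_score(dept_b, coding):.2f}")
# official: dept A = 4.00, dept B = 3.80  -> A ranks above B
# uneven:   dept A = 4.00, dept B = 4.60  -> B ranks above A
```

If the anchors really are unevenly spaced, then which department "wins" depends on an arbitrary choice of coding, which is exactly the problem.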
2. I worry about the emphasis placed on the ordinal ranking of departments. Leiter, of course, specifically says to attend to the mean scores and not just to the ordinal ranking, but in practice discussion of the report refers almost exclusively to the ordinal ranking and hardly ever to the mean scores. When, for example, we talk about a range of schools in the report, we always talk about the "top 10" or the "top 25," never "the threes" or "the fours." Attending to the mean scores paints a rather different picture of the report, I think. NYU and Rutgers are the only schools in the high 4s, at 4.8 and 4.7, respectively. The rest of the top 9 is in the low 4s (below 4.5), with Harvard, MIT, and UCLA all tied at 4.0. This means that only two schools in the US round up to "distinguished" and that MIT is merely "strong." The rest of the top 30 are all in the 3s. The mean scores suggest that the difference in quality between NYU and MIT is equal to the difference in quality between MIT and UMass: NYU's 4.8 sits 0.8 above MIT's 4.0, and UMass's mean sits roughly 0.8 below MIT's. Looking at the ordinal ranking, I would have expected MIT to be much closer to NYU than to UMass. Nothing against UMass, but MIT is ranked at #7 while UMass hangs out down at #24, which looks like a huge difference compared to the handful of schools separating MIT from NYU. That this difference is illusory is entirely the point. The "top-10" category is fairly arbitrary: there is no cohesive grouping of approximately ten of the strongest departments. The reality is that there are about two departments close to "distinguished" and then a relatively large number of departments, more than half of those ranked by the report, in the neighborhood between "strong" and "good." I worry that the ordinal ranking we all internalize exaggerates the differences between departments. If I were on the advisory board, I would encourage a move away from it.
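Just to spell out the arithmetic behind the comparison, here is a tiny sketch. The only figures taken from the report are the ones mentioned above (4.8 and 4.0, and ranks #7 and #24); I am taking NYU to be at #1, and the 3.2 for UMass is inferred from the claim that the two gaps are equal, not read off the report.

```python
schools = {
    "NYU":   {"rank": 1,  "mean": 4.8},   # taking NYU to be ranked #1
    "MIT":   {"rank": 7,  "mean": 4.0},
    "UMass": {"rank": 24, "mean": 3.2},   # mean inferred, not from the report
}

def gap(a, b, key):
    return round(abs(schools[a][key] - schools[b][key]), 1)

print("rank gap, NYU to MIT:   ", gap("NYU", "MIT", "rank"))     # 6 places
print("rank gap, MIT to UMass: ", gap("MIT", "UMass", "rank"))   # 17 places
print("mean gap, NYU to MIT:   ", gap("NYU", "MIT", "mean"))     # 0.8
print("mean gap, MIT to UMass: ", gap("MIT", "UMass", "mean"))   # 0.8
```

The ordinal gaps (6 places versus 17) look wildly different; the cardinal gaps are the same.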
3. I worry a little bit about the precision of the mean-score scale. The mean scores are reported to a tenth of a point, but the raters are only allowed to make half-point distinctions. This makes me wonder whether a difference of one tenth of a point can be regarded as statistically significant. Maybe it would be better to round to the half-point, as the raters are asked to do. It seems unwise to present results with more precision than the data were collected at.
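Here is a rough simulation of the worry, under made-up assumptions: suppose each department is rated by 100 evaluators (an assumed number, not the report's actual panel size), and suppose the raters regard two departments as identical, choosing uniformly among a few half-point ratings. How often do the two mean scores still end up a tenth of a point or more apart?

```python
import random

random.seed(0)
RATINGS = [3.0, 3.5, 4.0, 4.5]   # half-point ratings the raters might give
N_RATERS = 100                   # assumed number of evaluators per department
TRIALS = 10_000

def sample_mean():
    return sum(random.choice(RATINGS) for _ in range(N_RATERS)) / N_RATERS

# Two departments the raters regard as identical: how often are their
# mean scores nonetheless 0.1 or more apart?
gaps = [abs(sample_mean() - sample_mean()) for _ in range(TRIALS)]
share = sum(g >= 0.1 for g in gaps) / TRIALS
print(f"share of trials with a gap of at least 0.1: {share:.0%}")
```

Under these particular made-up assumptions the answer comes out at roughly one trial in five, which at least suggests that a tenth of a point can easily be noise.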
4. Related to (3), I also worry that a good deal of relevant statistical information is missing. For example, there is no information on margins of error or standard deviations. This matters because, at heart, the PGR is a poll that tries to measure aggregate faculty reputation, and all measurement procedures involve some imprecision. Information about margins of error and standard deviations would help us understand how much imprecision there is: in particular, which differences are trivial and which are significant, and how much uncertainty attaches to each score. (Leiter himself suggests that differences of less than .4 are insignificant, and he may be right. But it would be nice to know how he arrives at that figure.)
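For what it's worth, the computation involved is not exotic. Here is a minimal sketch of the kind of summary the report could publish alongside each mean; the ratings below are invented for illustration, and the 1.96 multiplier is just the usual normal approximation (a t critical value would be a bit larger for small panels).

```python
from statistics import mean, stdev

ratings = [4.0, 4.5, 3.5, 4.0, 5.0, 4.5, 4.0, 3.5, 4.5, 4.0]  # invented ratings

m = mean(ratings)
sd = stdev(ratings)               # sample standard deviation
se = sd / len(ratings) ** 0.5     # standard error of the mean
moe = 1.96 * se                   # rough 95% margin of error (normal approx.)

print(f"mean = {m:.2f}, sd = {sd:.2f}, "
      f"95% CI = {m - moe:.2f} to {m + moe:.2f}")
```

Publishing something like the standard deviation and margin of error next to each mean would let readers see for themselves which gaps between departments are within the noise.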