Wednesday, February 25, 2009

Bride of PGR Minutiae

Obviously, there's a new PGR out. I'd like to renew my complaint about the way the data is presented: in terms of ordinal rankings rather than mean scores. For example, the ordinal rankings from previous years are reported going back to 2002, but the mean scores are omitted. This is bad because, even assuming that the procedure for assigning numbers to departments measures anything, the ordinal rankings are a derived quantity that carries far less information than the mean scores on which they are based. For example, we might notice that Texas at Austin has fallen from 13th in '06 - '08 to 20th in the current edition. But while this drop appears to be steep, it could have been caused by any number of things: a decline in quality at UT (obviously); an improvement by a number of neighboring departments with no decline at UT; or some combination of neighborly improvement and Texan decline. It turns out that although the seven-place drop was accompanied by a decline in UT's mean score, the decline itself - 3.6 in '06, 3.4 now - is within a range we have been led to believe is statistically insignificant. So, in effect, the report measured no change in the quality of UT's department.
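To make that concrete, here's a minimal sketch with invented scores - none of these numbers are real PGR data - showing how a 0.2 slip in one department's mean, with every other department unchanged, can produce a five-place ordinal drop:

    # Hypothetical mean scores for a tightly clustered group of departments.
    # All numbers are invented for illustration; they are not PGR data.
    scores_then = {"UT": 3.6, "A": 3.5, "B": 3.5, "C": 3.5,
                   "D": 3.45, "E": 3.45, "F": 3.4, "G": 3.4}
    scores_now = dict(scores_then, UT=3.4)  # UT slips 0.2; nobody else moves

    def rank(scores, dept):
        # Ordinal rank = 1 + the number of departments with a strictly higher mean.
        return 1 + sum(s > scores[dept] for s in scores.values())

    print(rank(scores_then, "UT"))  # 1
    print(rank(scores_now, "UT"))   # 6 -- a five-place fall from a 0.2 change

The size of the ordinal swing is an artifact of how tightly the cluster is packed; the mean scores tell you that directly, and the ranks don't.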

In the current issue, NYU is ranked #1 (with a mean score of 4.9) and Rutgers #2 (with a mean of 4.6); although this gap is slightly bigger than in the previous edition, we still have every reason to suspect that the .3 difference in their mean scores is insignificant. It falls within the range Leiter cites as insignificant, and both schools have a median score of 5, which means that at least half of the respondents gave each of them a 5 - which suggests that we should regard them as tied. The numbers we have suggest that any difference between the two is too small to be measured by the report's methods. So I'd much rather see the mean scores than the ordinal ranks.

--Mr Zero

23 comments:

Juan said...

Where does Leiter say that a .5 difference is insignificant? I saw something to that effect with respect to the specialty rankings, but not with respect to the general one (I didn't look hard, though).

Anonymous said...

I agree with your sentiments regarding ordinal rankings, but I'm a bit confused about how the term "statistically significant" is being used.

Ordinarily, statistical significance is a function of sample size, target population, and standard deviation.

The problem is that we have no idea what the target population is supposed to be.

Just what percentage of "Leiterworthy" reviewers participated? We can tell what percentage who were asked to participate replied. But Leiter has given no indication of what percentage of the Leiterworthy population was asked to participate. Without this info, talk of a margin of error is meaningless.

Anonymous said...

Kieran Healy did, in the past, a useful visualization with means and confidence intervals. Perhaps he'll do so with this year's data.

Xenophon said...

Maybe we should attribute this issue to marketing concerns? If a department goes to its dean and says "hey, we've dropped eight places in the latest rankings, we need to hire another epistemologist, stat," it might have more success than if it says "we've fallen by 0.2 on a 5-point scale, which is probably insignificant, but we want another faculty slot anyway." Of course, the danger is that undergrads will be misled as much as deans, but come on, which do you think is smarter (and better able to read the data), an undergrad philosophy major or a dean who in my experience probably has a Ph.D. in English? I'm rooting for the undergrad. So the benefits outweigh the possible harm.

Anonymous said...

FWIW, I think that presenting the US rankings first and then the worldwide rankings is misleading, in that people tend to focus too much on the US rankings. Oxford is really 2nd, not Rutgers. Surely top prospective grad students should be aiming for an international perspective! What's more, I suspect that undergrads who realise this could benefit from slightly reduced competition at non-US top grad schools... Of course, not everyone wants to move outside the country, but not everyone wants to move to New Jersey either!

Dr. Killjoy said...

The U.S./non-U.S. separation is useful for a variety of reasons, but for grad students the important difference will be monetary. So unless a potential grad student is independently wealthy or has secured outside funding, Oxford might not be a smart choice over Rutgers, because Oxford, unlike Rutgers, won't be slathering its grad students with cash. In the main, grad programs outside the U.S. are far less likely to provide their grad students with a hefty fellowship. Also, in the main, non-U.S. PhDs are less marketable in the U.S. (you know, the country with the vast majority of academic positions in philosophy). The sly move is to get into Rutgers and then do a semester or two at Oxford on Rutgers' fat dime. Note that the Oxford/Rutgers gap is also known as the Hawthorne gap.

statboy said...

7:43,
Huh?
If you give me a list of data points, I'll tell you their standard deviation. It's just arithmetic. I don't need to know anything about a 'target population'.

So the SD could be given with the report -- Leiter could include a 'technical page' or something. This would definitely address a part of Mr. Zero's concern... if indeed his concern is over statistical significance (that's what you said, Z, but is it what you meant?).
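To make statboy's point concrete: given a list of ratings, the SD really is just arithmetic. A quick sketch with invented numbers - not PGR data:

    from statistics import mean, stdev

    ratings = [4, 5, 4, 3, 5, 4, 4, 5]  # invented evaluator scores for one program
    print(mean(ratings))   # 4.25
    print(stdev(ratings))  # ~0.71 -- the sample SD; no 'target population' needed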

Mr. Zero said...

Juan,

Leiter suggests that differences of .4 or less are insignificant on "What the Rankings Mean" about 3/4 of the way down, point #2. I am suspicious that he might have just made it up, though.

7:43,

I was using the expression "statistically significant" in a way I knew to be loose and perhaps inappropriate, taking Leiter's off-hand remark about significance as a guide. If you look at the first minutiae post, you'll see I make substantially the same point.

Xenophon,

I was inclined to suspect that the ordinal fixation was inspired by the classic "US News" model of rankings. I dunno.

11:19,

Agreed. And I'd much rather live in Oxford than New Brunswick.

Anonymous said...

Agreed, 11:19am! I've never understood why, in this day and age, Leiter would NOT list the world rankings first ... And by the way, the average caliber of each individual faculty member at NYU, compared with that of faculty members at Oxford, must be extraordinary in order to make up for the sheer difference in numbers (roughly 30 vs. 100+)!

Anonymous said...

Here's what Leiter says regarding the .4:

"For programs whose mean scores are fairly close (roughly, .4 or less apart), choose a program exclusively on the basis of how well it meets your needs and interests and needs, i.e., because it better meets your intellectual goals, or offers you a better financial aid package, or provides a more supportive intellectual community, and so on."

Nothing about *statistical* significance here, though maybe that's what was meant. It could also just mean: don't get too hung up on differences in the results when choosing schools if these other factors favor the lower-ranked school, or something like that.

geek said...

7:43,

I don't follow you. Why do you have to know what the 'target population' is to say something about what is statistically significant?

For instance, if Blue U is ranked .8 points above Red State, and there were 300 ratings, and the SD for their ratings is .1, then the difference is very highly statistically significant. And this is so irrespective of "what percentage of the Leiterworthy population was asked to participate."
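Worked out concretely (using the hypothetical numbers above, and treating the two sets of ratings as independent samples - a simplification, since in the PGR the same evaluators rate both programs):

    from math import sqrt

    diff, n, sd = 0.8, 300, 0.1  # hypothetical gap, number of ratings, SD

    # Standard error of the difference between two independent sample means.
    se = sqrt(sd**2 / n + sd**2 / n)
    print(diff / se)  # ~98: the gap is about 98 standard errors wide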


But I think Mr. Zero is right that what we are really interested in is not the formal notion of statistical significance; rather, we want to know whether these differences are *significant*!

Anonymous said...

statboy and geek:

Of course the size of the target population relative to the sample matters for statistical significance.

Being able to calculate a standard deviation from the sample doesn't mean diddly if you're talking about statistics like the margin of error, which is key to statistical significance.

Reductio: Suppose that everyone Leiterworthy participated. There'd be a standard deviation, but zero sampling error. Thus, the statistical component of "statistical significance" would vanish, and we'd be left just talking about significance.

It's true that you can usually get what you need statistically by looking at the SD alone, but that holds only on the assumption that the sample size is dwarfed by the target population.
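In textbook terms this is the finite population correction. A minimal sketch, with made-up numbers:

    from math import sqrt

    def se_mean(sd, n, N):
        # Standard error of a sample mean with the finite population
        # correction; N is the size of the target population.
        return (sd / sqrt(n)) * sqrt((N - n) / (N - 1))

    sd, n = 0.5, 300                # made-up SD and sample size
    print(se_mean(sd, n, N=10**6))  # vast population: ~0.029 (correction ~ 1)
    print(se_mean(sd, n, N=300))    # sample = whole population: 0.0 sampling error

When the sample is a tiny fraction of the population, the correction is essentially 1 and can be ignored; when the sample *is* the population, the sampling error goes to zero, as the reductio says.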

colin said...

If a difference of .4 (or less) is insignificant, that in essence says that there is no real statistical difference between a school ranked 48th and one ranked 36th, or one ranked 16th, or one ranked 6th; however, there is a significant difference between being ranked 1st and 3rd.

To me the obvious point (again) is that the differences between the various schools are not that great. But this raises another question: why is the difference so much more pronounced at the top?

Anonymous said...

This would be more appropriate for an earlier thread, but let me throw out a rare ray of sunshine: my department will be hiring for a one-year position next year, as a leave replacement, but we haven't even begun to think about placing an ad. (Why? I have no idea.) So, despite the most recent JFP's being so thin, there will be more jobs coming. My guess is that there will be a lot of late hires this year, even into summer. Come August, classes will still need to be taught.

Anonymous said...

Target population, sample size, and method of selection are relevant only to assessing the sample's representativeness (in technical jargon, its external validity). If the evaluators self-select, or are selected by someone who knows best (such as Leiter), this will usually reduce the external validity. So it is better to select evaluators randomly, or to use a stratified random sample (stratified into several relevant categories).

geek said...

Okay, anon, if the sample is very close to the entire population then that can be important to statistical significance, you're right.

But you said that without knowing the population size, "talk of a margin of error is meaningless." And that's not true; in most work with margins of error the sample is so small relative to the target population that the population's size just doesn't matter.

Xenophon said...

Zero: you're right of course. I was trying to be witty. I'll stop.

Statboy, geek, anons, et al.: this all assumes we can delimit fairly clearly the population of the Leiterworthy, or identify factors to measure in creating a stratified random sample. I think with something like the PGR you've got to accept that the external validity either isn't very good (Leiter might even agree with that) or is in the eye of the beholder. It is, after all, a popularity contest that tries to formalize its procedures. Take it as it's intended: as a useful tool that should help prospective grad students identify appropriate programs that deserve further research. And, like college sports rankings, as something to occupy us in debates held during our lunch hours.

Mr. Zero said...

Xenophon,

I took you to be raising an interesting question: why is the information presented in the way it is, when (assuming there's any information there at all) it would obviously be better presented another way? Sorry.

All y'all,

As several of you have pointed out, the point of worrying about statistical significance, standard deviations, margins of error, etc., is, of course, to determine which differences in mean scores indicate a real difference and which do not. And in order to know which bits of information would be relevant, you have to settle deep issues concerning what the survey is trying to accomplish, whether the sample is representative, whom it's supposed to represent, and whether the thing really is just a popularity contest or is really measuring something. I was deliberately avoiding some of these issues.

However, it seems clear to me that, whatever else is going on, the PGR is not just a popularity contest (which I intend to be consistent with its being somewhat of one).

Anonymous said...

I'm curious to know what others think about how Leiter handles his own listing in relation to what counts as an affiliation with a department. Here is what I mean:

Leiter was listed as a faculty member of the philosophy department at the University of Texas at Austin, and so he can (justifiably) count his departure as a loss for Texas. However, I do not see Leiter listed on the faculty page of the philosophy department at the University of Chicago, and it looks to me as though his only official post is at the Law School. Indeed, I think Leiter basically says as much when he says of himself: "Brian Leiter (philosophy of law, ethics, Continental philosophy) to the University of Chicago Law School." And yet, when characterizing the philosophy department at Chicago, he includes himself as relevant to Chicago's department.

To be fair, Leiter does list a few other people on the summary-of-changes page as joining a Law School at various institutions, but I am wondering what the criterion is. Is it having a PhD in philosophy? If so, do PhDs in philosophy count if they are in, say, a political science department? Or a theology department?

Or is the rationale simply that Leiter has published in philosophy journals and can therefore count himself as contributing to the reputation of the Department of Philosophy at the University of Chicago, even though he does not (apparently) have an appointment in that department?

Anonymous said...

Somewhere in the new PGR there is an explanation of the listings, but it was easier to search his blog for this post, which explains it:

http://leiterreports.typepad.com/blog/2008/05/how-to-count-af.html

I couldn't find any of the 'cognate' faculty listed for Chicago on the Chicago homepage. At least some of them do, I know for a fact, work with PhD students in philosophy, and Leiter & Nussbaum are currently teaching a cross-listed workshop. The same is true for almost all the Harvard 'cognates': no sign of most of them on the department homepage either.

Mr. Zero said...

anon 1:36,

I don't see what the big deal is. He lists himself in the "Cognate Faculty and Philosophers in Other Units" section of the Chicago entry. He doesn't claim to be in the philosophy department there, just that he's a philosopher in another unit. That's an accurate description of his relationship to the department, and it has some small relevance to philosophy graduate students (e.g., he can sit on committees or whatever). This is also how Dworkin is listed at NYU.

Anonymous said...

This is anon 1:36 re: how Leiter lists himself. I realize now that my post was not exactly clear -- that is my fault. I was not opposing the fact that Leiter listed himself as cognate faculty, but rather I wanted to know what the criterion was for such a listing. Anon 4:45 provided a link with a helpful explanation (thanks!). However, it seems to me that Leiter basically admits that there is a problem about how to determine relevant cognate faculty.

Mr. Zero said...

1:36/5:24,

I apologize if I misunderstood your gist. I guess I took you to be asking an easier-to-answer question than you were. Sorry.