Friday, December 18, 2015

On the Leiter/Bruya Donnybrook

I've been trying to keep up with this back-and-forth between Brians Leiter and Bruya about Bruya's criticism of the PGR. (Roundup: Bruya's paper in Metaphilosophy; initial discussion at Daily Nous; Leiter's first response; Bruya's reply at Daily Nous; David Wallace's "comment on Bruya"; Leiter's most recent follow-up. Bruya has written a longer reply to Wallace here; I haven't had a chance to read it yet.) I haven't been doing a good job. There's a lot to digest, and I'm not sure what to make of it all. I am not sure whether Bruya's criticisms, on the whole, hold up.

But I guess I am inclined to think that the criticism of the PGR's sampling technique is sound. This criticism has been around for a long time; Zachary Ernst has been making it since at least 2009. Leiter says it is "the correct method to use when what is wanted is expert, 'insider' information," but as far as I can tell, this isn't really true. It's certainly much too strong to say it's the correct method; it is one way among several of identifying experts and insiders, any of which might work as well or better. So at best it is a method.

And it is a method with serious drawbacks that I have never seen Leiter acknowledge, let alone address adequately: the first participants have a disproportionate impact on the composition of the rest of the sample; the resulting sample is not random; and there is no way to know whether the sample is representative of the larger group it is attempting to emulate. To me, that seems bad, and like it might not be such a great method to use for this. It seems like it might be better to use a different method, one less likely to introduce bias into the sample and more likely to generate a sample that can be known to be representative. To me, Leiter's claim that it is "the correct method" is somewhere in the range between misleading and totally wrong, depending on whether it is merely one of several correct methods or not a correct method at all. And his response to this criticism is somewhere in the range between inadequate and nonexistent.
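For what it's worth, here's a toy illustration of the worry. This has nothing to do with the PGR's actual data; the population, the friendship network, and every number below are made up. It just shows how a nomination-driven ("snowball") sample, seeded in one cluster of a clustered population, ends up dominated by that cluster:

```python
import random
from collections import Counter

random.seed(0)

# Toy population: 1000 people spread evenly across 10 "schools".
N, K = 1000, 10
school = {i: i % K for i in range(N)}

# Friendships form mostly within a school: 90% of the time a new
# contact comes from your own school, 10% of the time from anywhere.
friends = {i: set() for i in range(N)}
for i in range(N):
    while len(friends[i]) < 8:
        if random.random() < 0.9:
            j = random.choice([p for p in range(N) if school[p] == school[i]])
        else:
            j = random.randrange(N)
        if j != i:
            friends[i].add(j)
            friends[j].add(i)

# Snowball sample: 5 seeds, all from school 0; in each wave, every
# new participant nominates up to 3 friends into the pool.
sample = set(random.sample([p for p in range(N) if school[p] == 0], 5))
frontier = set(sample)
for wave in range(4):
    nominees = set()
    for p in frontier:
        nominees |= set(random.sample(sorted(friends[p]), min(3, len(friends[p]))))
    frontier = nominees - sample
    sample |= frontier

# Every school is 10% of the population; check the sample's shares.
shares = Counter(school[p] for p in sample)
print("sample size:", len(sample))
for s in range(K):
    print(f"school {s}: {shares[s] / len(sample):.0%} of sample")
```

On a typical run, school 0 accounts for well over half the sample, and which other schools show up at all depends heavily on where those first five seeds happened to have friends. That is the sense in which the earliest participants shape everything downstream, and why you can't tell, from inside the sample, whether it's representative.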

Of course, I could be wrong, and if I'm wrong I hope one of y'all Smokers will set me straight.

I also wanted to ask about one of David Wallace's criticisms of Bruya's analysis. In comments at Daily Nous, Wallace writes:
Just to repeat from the other thread [the relevant material is here]: the correlation is not in fact between institutional PGR rank and number of evaluators at the institution. (That correlation is extremely weak.) It's between institutional PGR rank and number of evaluators with a PhD from the institution. That's only to be expected if placement record tracks faculty quality and faculty quality isn't too wildly varying in time.* 
I suppose I think that's what you'd expect if placement record tracks faculty quality, provided, of course, that the PGR measures faculty quality. But I don't think you'd want to affirm the consequent and infer that this correlation confirms that the PGR reliably measures faculty quality.** Because that's also what you'd expect if the pool of evaluators were disproportionately packed with people who studied at a particular set of institutions, who then vote one another's Ph.D.-granting institutions up. And I think that's what you'd expect if you started with a small group of elite evaluators and then accumulated additional evaluators by getting current evaluators to nominate their friends. I mean, sure, maybe this correlation just confirms that the best researchers studied in the best departments. But it's not as though the PGR uses an independent procedure for identifying objectively qualified potential evaluators and assessing their competence, or has independently confirmed that the sample is representative.

So maybe this correlation just confirms that evaluators tend to have gone to grad school at high-ranking departments, that people who went to grad school at high-ranking departments have a lot of friends who also did, which is how they got nominated to become evaluators, and that they then give that group of schools high ratings. If there were some objective standard for being a qualified evaluator, and the pool of actual evaluators had been selected in a way likely to yield a sample representative of the larger pool of research-active philosophers, and you then ended up with a correlation like this, I'd be inclined to agree that it was evidence that things with the PGR were working more or less well. But that's not the situation.
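Here's the same kind of toy sketch for Wallace's correlation (again: invented numbers, no relation to the actual survey). Suppose evaluators' PhDs cluster in a small in-group of departments, and in-group evaluators rate in-group departments a bit higher, with no underlying "quality" variable anywhere in the model. The correlation between a department's mean score and the number of evaluators holding a PhD from it comes out strongly positive anyway:

```python
import random
from statistics import correlation  # Python 3.10+

random.seed(1)

K = 50                    # departments
club = list(range(10))    # an arbitrary "in-group" of 10 departments

# Evaluator pool: 500 evaluators whose PhDs are drawn
# disproportionately (70%) from the in-group departments.
phd = [random.choice(club) if random.random() < 0.7
       else random.randrange(K) for _ in range(500)]

# Each evaluator scores each department. There is no "quality"
# variable here: scores are noise, plus a bump that in-group
# evaluators give to in-group departments.
def score(evaluator_phd, dept):
    s = random.gauss(3.0, 0.5)
    if evaluator_phd in club and dept in club:
        s += 1.0
    return s

mean_score = [sum(score(p, d) for p in phd) / len(phd) for d in range(K)]
n_phds = [phd.count(d) for d in range(K)]

# Correlation between a department's mean score and the number of
# evaluators holding a PhD from it: strongly positive, by construction.
print(round(correlation(mean_score, n_phds), 2))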

Seriously. Wallace's result here is consistent with the hypothesis that the PGR's pool of evaluators is largely populated with people who studied at the highest-ranked schools, who then determine that those are the schools that are ranked the highest. Right? That's bad, right? This is a suspicious result, right? It's suspicious if a survey of an unrepresentative sample consistently shows that the departments that are overrepresented in the sample are the best, right? I realize it would have been better for Bruya to have been explicit about what 'from' means, but it's not any less problematic if it means "where you got your Ph.D." instead of "where you currently work," right? Am I wrong? Have I misunderstood? Help, please.

--Mr Zero

*Wallace's blog comment is a more succinct restatement of material he presents on page 6 of his critique.

**I do not take Wallace to be doing this. He is clear on p. 6 of the longer piece that his point is that arguing from this correlation to the conclusion that the PGR is unreliable would be question-begging, not that the correlation confirms that the PGR is reliable.