Revisiting the Statistics of Tiebreakers


Post by cvdwightw » Mon Apr 25, 2011 8:17 pm

While on a perhaps ill-advised quest to see just exactly what caused the mess in the North Georgia MUT mirror thread, I came across this early salvo:
Matt Weiner wrote:[the examination that Mike, Brian and I did of statistical tiebreakers] was flawed in many ways, from both a statistics and quizbowl theory perspective (notably, it found that "who won a game" did not predict who had won that game with 100% accuracy) and, much like the various other attempts people use to make up for woefully insufficient data with algebraic wizardry, should not be cited as evidence for's made-up nonsense from people who seem hellbent on reinventing the wheel for its own sake rather that doing what is sensible and/or precedented.
I would like to say several things on this.

1. Matt's claim that the test "was flawed...[from] a statistical theory perspective" is correct. I should not have been using 2-proportion Z-tests, because every tiebreaker was evaluated on the same set of games, so the samples are paired rather than independent. I should have been using McNemar's test instead. Luckily, Brian's 53 data points were still available and I was able to run McNemar's test on the data. The results did not change. PPG, PPG differential, and BC were still significantly better predictors of who would win a tiebreaker game than head-to-head at the 5% significance level, and none of the three was significantly better than any of the others.
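
For anyone who wants to see what that comparison looks like mechanically, here is a minimal sketch of an exact (binomial) McNemar's test in Python. This is not the code or the data from the study: the function names are made up for illustration, the counts in the usage example are hypothetical, and the original analysis may well have used the chi-square version of the test rather than the exact one. The idea is that only the games where the two tiebreakers disagree about who was right carry any information.

```python
from math import comb


def discordant_counts(a_correct, b_correct):
    """Count the games where exactly one of two predictors was right.

    a_correct and b_correct are equal-length sequences of booleans,
    one entry per game, saying whether each predictor picked the
    actual winner of that game.
    """
    only_a = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    only_b = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis that both predictors are equally accurate,
    each discordant game is a fair coin flip, so only_a follows a
    Binomial(only_a + only_b, 0.5) distribution.
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # the predictors never disagreed; no evidence either way
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, NOT the study's data: predictor A was right and
# predictor B wrong in 15 games, and the reverse happened in 4 games.
p_value = mcnemar_exact(15, 4)
print(f"p = {p_value:.4f}")
```

Games where both tiebreakers were right, or both were wrong, drop out of the calculation entirely, which is the main difference from a 2-proportion Z-test that treats the two accuracy rates as independent samples.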

2. I am confused as to what Matt means by "flawed from...[a] quizbowl theory perspective." Our quizbowl theory was basically, "we should always play a tiebreaker game; if we can't play a tiebreaker game, we should break the tie using the paper tiebreaker that best predicts the winner of a hypothetical tiebreaker game. So let's look at a bunch of tiebreakers and see which one best predicts that hypothetical winner." As far as I know, that's not flawed quizbowl theory.

3. The claim that "who won a game did not predict who had won that game with 100% accuracy" is entirely untrue, as we did not look at that as an input variable. Nevertheless, to satisfy Matt, I created a fifth variable ("winner of the rematch") that would predict the winner of the rematch with 100% accuracy. All four paper tiebreakers performed significantly worse than "winner of the rematch" at like the 0.05% significance level. Donald Taylor's assertion that PPG and BC were "pretty accurate" is not at all true either.

4. I am also a bit confused about what Matt means by "woefully insufficient data." We identified fifty-three examples of games that were (a) played in 2007 or 2008, (b) between teams that had the same record against common opponents, and (c) rematches of an earlier game between those teams. I don't know if he means "woefully insufficient" in the sense of "there is not enough data here to draw a conclusion" (which is not at all valid, since like half the point of statistics is to generalize from a small subset of data) or in the sense of "the way the data was procured and/or the criteria for including/rejecting data points is invalid," which is a valid complaint but not one that could have been easily clarified in the context of that post without diminishing the rhetoric of the rest of it.

5. The claim that we used "algebraic wizardry" to support our claims is patently false. Perhaps Matt is referring to Brian's program, which loosened the requirements about which rounds the "equivalent schedule" was played in and removed the results of the other games that weren't against common opponents or each other. No actual algebra was performed, "wizardry" or otherwise, that was not inherent in the calculation of commonly used statistics like "points per game."

6. We were not "hellbent on reinventing the wheel for its own sake." There was legitimate debate within the quizbowl community over what the best tiebreaker should be. Conventional wisdom in Andrew Hart's world says that if no tiebreaker packet is available to break the tie, the tiebreaker should be PPG for a round-robin and PPB if teams have played unequal schedules. Conventional wisdom in Georgia says that the tiebreaker should be head-to-head. Obviously one of these instances of "conventional wisdom" is wrong. Instead of throwing out Georgia's conventional wisdom because Georgia is an intellectual backwater of quizbowl theory and therefore can't possibly be right about anything, we looked at what the data had to say. The data say that head-to-head is a significantly worse predictor of who will win a tiebreaker game than PPB or PPG.

7. This "study" was empirically grounded in the data, and did not actually produce a "Superstat" to defend, only a denunciation of the use of head-to-head tiebreakers. Anyway, after correcting for my error in statistical procedure, I still believe that the results of the study are valid, and I'm half-convinced that like 90% of Matt's issues with it stem from the idea that we dared to question what Matt found "sensible and precedented." That said, I'm not some closed-minded idiot who's not willing to listen to reasonable arguments. I challenge Matt and anyone else who had a problem with either the methodology or the results of the study to elucidate those concerns in this thread, and let's see if we can't re-do the study to their specifications.
Dwight Wynne
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003

"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." --Matt Birk on rowing, SI On Campus, 10/21/03

"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." --Jerry