Stats: Greatest Upsets in Quizbowl History

Post by **jonah** » Tue Sep 20, 2016 11:58 pm

Inspired by the conversation about statistical interpretations of upsets, I looked at the 155,090 games in NAQT's database that were played on NAQT rules, have full individual statistics, and featured both teams hearing at least 3 bonuses*. Of those…

the winning team had the better PPB in that game 71.4% of the time
the winning team had the better PPB for the tournament as a whole 77.5% of the time
the winning team had the better PPTUH for the tournament as a whole 80.2% of the time

*To filter out situations like "the losing team only got a crack at one bonus, but happened to 20 or 30 it."

Post by **theMoMA** » Wed Sep 21, 2016 4:31 am

I did some stats wizardry of my own. (I'll publish the Excel sheet on Google Docs once I see if I can put in some head-to-head data.)

Since 2012, when ACF Nationals went to a 12-team top bracket, a team's tossup conversion percentage relative to the rest of the top bracket (by standard deviations) has an r^2 with team winning percentage of .847 (high). If I remember the definition of r^2 correctly, this means that 84.7% of the variation in team winning percentage can be accounted for by variation in tossup conversion percentage. A team's bonus conversion relative to the rest of the top bracket (by standard deviations) has a more moderate r^2 with team winning percentage of .648. Note that I took the mean and standard deviation of the team PPBs, not the overall PPB of the top bracket (which is always higher because the teams that get the most bonuses also have higher PPBs); I think this makes sense, because what we're interested in is how good a team is on a given bonus relative to the competition, not relative to the absolute conversion percentages.
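A sketch of the two ingredients, assuming one year's team stats are already in plain lists. Each stat is standardized against the mean and standard deviation of the team values in that field, as described above; the r^2 of a one-variable least-squares fit is just the squared Pearson correlation. Using the population rather than the sample standard deviation is my choice here; it only rescales the z-scores, so it does not change the r^2.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize each team's stat against the mean and (population)
    standard deviation of the team values in the same field."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def r_squared(xs, ys):
    """r^2 of a one-variable least-squares line: squared Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

To pool several years, one would compute `z_scores` separately for each year's field and then concatenate the results before calling `r_squared` against the matching winning percentages.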

I blended the two stats at different proportions, and found that the highest r^2 (.849) is achieved when you add 13 parts of tossup conversion to one part of bonus conversion. I haven't read much about this technique since encountering it in a baseball stats book about a decade ago, but I think this would indicate that tossup conversion is about thirteen times more important to winning than bonus conversion (at least within this five-year sample at a particular tournament), but both do help.
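The blending step can be sketched as a grid search over integer tossup-to-bonus ratios, assuming each team's tossup and bonus z-scores are already computed. The integer grid and the `max_ratio` cap are my assumptions, not necessarily what the spreadsheet did. Because r^2 is unchanged when the predictor is rescaled, only the ratio between the two weights matters.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation (the r^2 of a one-variable linear fit)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def best_blend(tu_z, bonus_z, win_pct, max_ratio=30):
    """Try `ratio` parts tossup z-score to one part bonus z-score for
    ratio = 0..max_ratio; return the (ratio, r^2) pair with the best fit."""
    best = None
    for ratio in range(max_ratio + 1):
        blend = [ratio * t + b for t, b in zip(tu_z, bonus_z)]
        r2 = r_squared(blend, win_pct)
        if best is None or r2 > best[1]:
            best = (ratio, r2)
    return best
```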

A few interesting notes: the strongest teams relative to their fields (this is obviously not an absolute measure of strength, as it depends on the packet set and the strength of the competition) were 2014 Virginia A (by a wide margin), 2016 Michigan A, and 2012 Yale A, all of which were champions. After that, there's a fairly steep dropoff (to 2016 Chicago A, also the strongest non-champion relative to its field). The weakest champion relative to its field was 2013 Illinois A (11th overall, and second among 2013 teams, behind Yale A, which was the 6th-strongest team relative to its field of the past five years). The weakest team was, perhaps unsurprisingly, me playing solo (almost exactly as bad as 2014 Virginia A was good).

I'm also going to see if I can find a smart way to encode head-to-head data, so we can get some empirical numbers on just how often upsets of certain magnitudes occur. I'll post that, and the spreadsheet, when I finish.

Post by **theMoMA** » Wed Sep 21, 2016 9:02 am

Based on the "thirteen parts tossup conversion ability to one part bonus conversion ability" blend (both relative to that year's top bracket), I've gone through the head-to-head matchups to look for the biggest upsets at Nationals over the past five years.

In terms of how unlikely an upset was given the relative strengths of the teams, four stand above the rest (most unlikely first):

Louisville 320, Maryland A 265 in 2016
Chicago A 280, Virginia A 135 in 2013
Ohio State 130, Chicago A 120 in 2012
Illinois 270, Berkeley A 165 in 2016

The rest of the top ten most unlikely upsets:

Minnesota 215, Penn 180 in 2012
Minnesota 265, Berkeley A 230 in 2016
Minnesota 195, Alberta 190 in 2013
Harvard 220, Yale A 175 in 2012
MIT A 325, Michigan A 265 in 2015
Stanford A 240, Virginia 180 in 2015

In terms of the most unexpected margin of victory in an upset, three stand apart (two of which are on the list above):

Chicago A 280, Virginia A 135 in 2013 (based on their tossup and bonus conversion rates, Virginia was expected to win by 190, but instead lost by 145).
Michigan 300, Maryland A 25 in 2014 (Maryland was expected to win by 35, but lost by 275)
Illinois 270, Berkeley A 165 in 2016 (Berkeley was expected to win by 180, but instead lost by 105)
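One way to read this ranking is as the gap between the favorite's expected margin and its actual margin; that framing is mine, not necessarily the exact calculation in the spreadsheet. Plugging in the numbers above:

```python
def margin_surprise(expected_fav_margin, fav_score, underdog_score):
    """How far the favorite fell short of its expected margin.
    Positive: the favorite underperformed (an upset once this exceeds the
    expected margin). Negative: the favorite beat its expected margin."""
    actual_fav_margin = fav_score - underdog_score
    return expected_fav_margin - actual_fav_margin


games = [
    ("Chicago A over Virginia A, 2013", 190, 135, 280),
    ("Michigan over Maryland A, 2014", 35, 25, 300),
    ("Illinois over Berkeley A, 2016", 180, 165, 270),
]
for label, expected, fav, dog in games:
    print(label, margin_surprise(expected, fav, dog))
```

This gives 335, 310, and 285 points respectively, the same order as the list. Virginia's 660 to -5 win over Stanford comes out at -335, i.e. the largest overachievement.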

Virginia also holds the most overachieving win; in 2014, UVA was expected to beat Stanford by a healthy 330-point margin, but bested that by winning 660 to -5.

Post by **naan/steak-holding toll** » Wed Sep 21, 2016 9:24 am

These stats are pretty interesting, Andrew - I'd be interested to see the methodology, particularly for generating the expected game stats? I figure if I did this it'd be using something along the lines of "log5-generated chance of getting tossup x (10 + PPB)" but I'm not precisely sure about the specifics.

One thing - I think that Ohio State-Chicago game is from 2012, not 2014, i.e. before the John Lawrence era.

Post by **theMoMA** » Wed Sep 21, 2016 9:29 am

One more post.

http://imgur.com/F5MHhk3

The above-linked table has some data on the likelihood of upsets based on how closely the teams are matched. I used the team performance stat (thirteen parts tossups, one part bonus) to find a "MoS" ("margin of strength," i.e. the gap between the two teams' performance stats) for each matchup. The average margin of strength in a given game is about 20 (i.e. the average game is not a particularly well-matched one).

I then placed the matchups into half-stdev bins based on how many standard deviations from the mean they were. The closest matchups are about 1.0 standard deviation below the mean; really mismatched games are about 2.5 standard deviations above it. (I mislabeled the last bin, which should be 2.0 to 2.5.)

I also included the "Avg MoV" ("average margin of victory") for the favorites within each stdev range, and listed the number of non-upsets, the number of upsets, the win percentage of the favorites, and an example upset involving a contending team to give you an idea of the order of magnitude we're talking about. (The last three examples are hypothetical, because within the sample, no one lost with such a big expected advantage.)
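A sketch of the binning, assuming each matchup's margin of strength is the absolute difference between the two teams' ratings and that bins are labelled by their lower edge in standard deviations from the mean gap. Both the labelling scheme and the use of the population standard deviation are my assumptions.

```python
import math
from statistics import mean, pstdev


def half_stdev_bins(mos_gaps):
    """Label each matchup's MoS gap with the lower edge of its
    half-standard-deviation bin, measured from the mean gap."""
    mu = mean(mos_gaps)
    sigma = pstdev(mos_gaps)
    labels = []
    for gap in mos_gaps:
        z = (gap - mu) / sigma
        labels.append(math.floor(z / 0.5) * 0.5)  # e.g. -1.0 means [-1.0, -0.5)
    return labels
```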

Overall, favorites win about 78.5% of the time, but that number goes up to 85.7% when you remove the most closely matched bin, and rises to 93.6% when you remove the two most closely matched bins.
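The "remove the closest bins" arithmetic is just pooling the remaining bins' records. A small sketch, with toy counts in the test rather than the table's data:

```python
def favorite_win_pct(bin_counts, drop_closest=0):
    """bin_counts: (favorite_wins, upsets) pairs ordered from the most
    closely matched bin to the most lopsided. Returns the favorites'
    pooled winning percentage after dropping the `drop_closest` closest bins."""
    kept = bin_counts[drop_closest:]
    wins = sum(w for w, _ in kept)
    upsets = sum(u for _, u in kept)
    return 100 * wins / (wins + upsets)
```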

It's interesting how cleanly these numbers break. The only real surprise is that upsets occurred more than half the time in the most closely matched games.

Post by **theMoMA** » Wed Sep 21, 2016 10:12 am

Periplus of the Erythraean Sea wrote:These stats are pretty interesting, Andrew - I'd be interested to see the methodology, particularly for generating the expected game stats? I figure if I did this it'd be using something along the lines of "log5-generated chance of getting tossup x (10 + PPB)" but I'm not precisely sure about the specifics.

One thing - I think that Ohio State-Chicago game is from 2012, not 2014, i.e. before the John Lawrence era.

Thanks for catching the transcription error on the Chicago/OSU game; I just fixed it.

As for methodology, here's basically what I did:

1. I calculated a z-score for each team's tossup conversion rate (tossups answered / tossups heard) relative to that year's Nationals field. I did the same with points per bonus, again relative to that year's field.

2. I ran some regressions in Excel until I found the mix of tossup/bonus z-scores that had the highest r^2 with winning percentage (13-to-1 tossup-to-bonus). I'll call that the "measure of strength" or MoS.

3. I copied all of the individual game scores from 2012-2016 Nationals into Excel and did some data entry to get them matched up with each team's tossup/bonus z-scores and MoS.

4. At this point, I wanted to calculate the expected points, but Excel was struggling to find a good line of best fit. So I put the matchups into the eight bins shown in the above-linked table and made a scatterplot of average MoS difference vs. average margin of victory for each of the eight bins. The result was almost exactly linear (r^2 of 0.98 or so) and very nearly passed through the origin, so I knew I was on the right track. I used Excel's option of forcing the regression line through the origin so that it wouldn't give the favored team a negative expected margin of victory in the very closest matchups, which lowered the r^2 to about 0.95. This is kind of a quick-and-dirty way to do it, but I think it's good enough for these purposes.

5. Based on the above, you can find a team's estimated margin of victory in a given matchup by taking the absolute value of the difference in the two teams' MoS and multiplying the result by 16.95.
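Steps 4 and 5 can be sketched as follows, assuming the fit is done on the per-bin averages (x = average MoS difference, y = average margin of victory). For a line forced through the origin, the least-squares slope is sum(x*y) / sum(x*x); the 16.95 default is the slope reported above.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the line y = b * x forced through (0, 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def expected_margin(mos_a, mos_b, slope=16.95):
    """Expected margin of victory for the stronger team: |MoS gap| * slope."""
    return slope * abs(mos_a - mos_b)
```

Forcing the intercept to zero guarantees `expected_margin` is never negative, which is the reason given in step 4 for choosing that option.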

Post by **theMoMA** » Wed Sep 21, 2016 10:55 am

Will's mention of log5 gave me another short stats idea. I wondered if there have been more or fewer upsets than expected at Nationals.

I used the equation from the regression line of MoS vs. team winning percentage to find the "true winning percentage" of each team (this doesn't mean "true" in an empirical sense). Then I used the log5 formula to calculate the chance of an upset in any given matchup. (Because the line of fit breaks down at both ends--it thinks that some teams have win percentages over 100% or under 0%--this isn't a good way to see what the least likely upsets were, but if I truncate it at 1 and 0 it should give us an OK idea of how many upsets we should expect in the aggregate, across each of the bins.)
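A sketch of this step. The regression's intercept and slope aren't given in the thread, so they appear as parameters here. Log5 (Bill James's formula) estimates the chance that A beats B as pA(1 - pB) / (pA(1 - pB) + pB(1 - pA)); returning 0.5 when both teams are clamped to the same extreme is my own choice for an otherwise undefined case.

```python
def clamp01(p):
    return max(0.0, min(1.0, p))


def true_win_pct(mos, intercept, slope):
    """Map a team's MoS to a 'true' winning percentage via the fitted line,
    truncated to [0, 1]."""
    return clamp01(intercept + slope * mos)


def log5(p_a, p_b):
    """Log5 estimate of the chance that team A beats team B."""
    num = p_a * (1 - p_b)
    den = num + p_b * (1 - p_a)
    if den == 0:
        return 0.5  # both clamped to 0 or both to 1: no information
    return num / den


def upset_probability(mos_a, mos_b, intercept, slope):
    """Chance that the lower-MoS team beats the higher-MoS team."""
    p_a = true_win_pct(mos_a, intercept, slope)
    p_b = true_win_pct(mos_b, intercept, slope)
    p_fav, p_dog = (p_a, p_b) if mos_a >= mos_b else (p_b, p_a)
    return log5(p_dog, p_fav)
```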

Here's a small comparison of the number of upsets that log5 expects vs. how many there were:

Bin 1: 26.4 expected (41%), 33 occurred (52%)
Bin 2: 21.3 expected (27%), 26 occurred (33%)
Bin 3: 11.5 expected (16%), 8 occurred (11%)
Bin 4: 5.0 expected (11%), 4 occurred (7%)
Bin 5: 2.8 expected (7%), 0 occurred
Bin 6: 1.2 expected (3%), 0 occurred
Bin 7: 0 expected, 0 occurred
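The "expected" column is a sum of per-game upset probabilities (by linearity of expectation, the expected number of upsets in a bin is the sum of each game's upset chance). A sketch, assuming each matchup has already been tagged with its bin, its log5 upset probability, and whether the upset happened:

```python
from collections import defaultdict


def expected_vs_observed(matchups):
    """matchups: iterable of (bin_label, upset_probability, was_upset).
    Returns {bin_label: (expected upsets, observed upsets, games)}."""
    expected = defaultdict(float)
    observed = defaultdict(int)
    games = defaultdict(int)
    for label, p_upset, was_upset in matchups:
        expected[label] += p_upset
        observed[label] += int(was_upset)
        games[label] += 1
    return {k: (expected[k], observed[k], games[k]) for k in sorted(games)}
```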

All in all, it appears that "moderate-to-big upsets" have happened at recent Nationals slightly less often than would be expected, but that "little upsets" between fairly closely matched teams have occurred more often than expected.

I wondered if these "little upsets" were more common among teams toward the lower end of the field (on the theory that games between teams conceding many points to the packet were likely to be more random than the underlying stats might predict). This ended up not really being the case; upsets occurred about 3% more often for teams with MoS in the negative, but I doubt that's significant.

Post by **theMoMA** » Wed Sep 21, 2016 12:42 pm

Naturally, I made a mistake on my spreadsheet, but I fixed it and updated the above posts. (I didn't notice that I had the x and y axes switched around when I was doing the regressions, but the results aren't too terribly different.) The main difference is that the ideal blend of tossup and bonus performance turns out to be 13:1 as opposed to 6:1, further lending credence to the idea that tossup performance is the main skill in quizbowl.

Post by **jonah** » Wed Sep 21, 2016 12:45 pm

In 177,391 games on NAQT rules with full individual stats (the same data set but without the bonuses-heard floor), the winning team answered more tossups correctly 91% of the time. If you relax to "at least as many," 96.5%.

Post by **The King's Flight to the Scots** » Wed Sep 21, 2016 12:49 pm

jonah wrote:In 177,391 games on NAQT rules with full individual stats (the same data set but without the bonuses-heard floor), the winning team answered more tossups correctly 91% of the time. If you relax to "at least as many," 96.5%.

Can you restrict that to reasonably competitive games by some metric?

Post by **jonah** » Wed Sep 21, 2016 12:57 pm

The King's Flight to the Scots wrote:
Can you restrict that to reasonably competitive games by some metric?

Depends on what the metric is — any suggestions? I can certainly do something like "final margin is within 100 points" or similar; I'm not sure that really captures what a "close game" is, but things along those lines are probably the best we can do given the stats that are currently kept.

By the way, perhaps the stats posts in this thread should be split.

Post by **ryanrosenberg** » Wed Sep 21, 2016 1:04 pm

jonah wrote:
Depends on what the metric is — any suggestions? I can certainly do something like "final margin is within 100 points" or similar; I'm not sure that really captures what a "close game" is, but things along those lines are probably the best we can do given the stats that are currently kept.

Maybe "within a tossup" (i.e. the margin is less than 2*[15 + average game bonus conversion])? I don't know how easy that is to calculate though.

Post by **jonah** » Wed Sep 21, 2016 1:18 pm

Granny Soberer wrote:
Maybe "within a tossup" (i.e. the margin is less than 2*[15 + average game bonus conversion])? I don't know how easy that is to calculate though.

Using this metric for closeness, there are 37,950 games to consider:

in 63.1% the winning team had more tossups correct, and in 84.8% the winning team had at least as many tossups correct
in 53.4% the winning team had the better in-game PPB
in 57.7% the winning team had the better tournament-overall PPB
in 58.0% the winning team had the better tournament PPTUH
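A sketch of the "within a tossup" filter and the tossup comparisons, assuming a hypothetical per-game record. Reading "average game bonus conversion" as the pooled PPB of both teams in that game is my interpretation of the suggested metric; a fixed cutoff such as 100 points would just replace `is_close`.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """One team's line in a game (hypothetical record layout)."""
    points: int
    tossups_correct: int
    bonuses_heard: int
    bonus_points: int


def is_close(a, b, power_value=15):
    """Margin below 2 * (power value + pooled in-game PPB of both teams)."""
    heard = a.bonuses_heard + b.bonuses_heard
    avg_ppb = (a.bonus_points + b.bonus_points) / heard if heard else 0.0
    return abs(a.points - b.points) < 2 * (power_value + avg_ppb)


def tossup_edge_rates(games):
    """Among close, decided games: share where the winner had more tossups
    correct, and share where it had at least as many."""
    more = at_least = n = 0
    for a, b in games:
        if a.points == b.points or not is_close(a, b):
            continue
        winner, loser = (a, b) if a.points > b.points else (b, a)
        n += 1
        more += int(winner.tossups_correct > loser.tossups_correct)
        at_least += int(winner.tossups_correct >= loser.tossups_correct)
    if n == 0:
        return float("nan"), float("nan")
    return more / n, at_least / n
```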

Post by **Mewto55555** » Tue Sep 27, 2016 11:50 pm

I'm a little suspicious that something is off here, because the r^2 in the without bonuses (0.847) and with bonuses case (0.849) are really darn close. I'd be happy to poke at the numbers a little when I have time and see what I come up with -- did you end up posting the spreadsheet?

Post by **i never see pigeons in wheeling** » Wed Sep 28, 2016 1:20 am

Mewto55555 wrote:
I'm a little suspicious that something is off here, because the r^2 in the without bonuses (0.847) and with bonuses case (0.849) are really darn close. I'd be happy to poke at the numbers a little when I have time and see what I come up with -- did you end up posting the spreadsheet?

I suspect that what you're seeing here is a multicollinearity problem in the latter model due to the extremely high correlation between tossup conversion and bonus conversion (I haven't checked this, but it's almost certainly true). Therefore, adding bonus conversion as a parameter adds only a tiny bit of explanatory power.
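One standard way to check this would be the correlation between the two z-scores and the resulting variance inflation factor, which for a model with exactly two predictors is 1 / (1 - r^2). The thread doesn't report this correlation, so the sketch below only shows the check, not its result.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def two_predictor_vif(tu_z, bonus_z):
    """VIF for a two-predictor model: 1 / (1 - r^2). Values far above 1
    mean the predictors carry largely overlapping information."""
    r = pearson(tu_z, bonus_z)
    if r * r >= 1.0:
        return float("inf")  # perfectly collinear
    return 1.0 / (1.0 - r * r)
```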

Post by **theMoMA** » Wed Sep 28, 2016 6:02 pm

i never see pigeons in wheeling wrote:
Mewto55555 wrote:I'm a little suspicious that something is off here, because the r^2 in the without bonuses (0.847) and with bonuses case (0.849) are really darn close. I'd be happy to poke at the numbers a little when I have time and see what I come up with -- did you end up posting the spreadsheet?
I suspect that what you're seeing here is a multicollinearity problem in the latter model due to the extremely high correlation between tossup conversion and bonus conversion (I'm assuming this is the case because it's almost certain this is true). Therefore, adding bonus conversion as a parameter adds only a tiny bit of explanatory power.

Ankit's explanation makes sense. Tossup conversion and bonus conversion both flow mainly from the overarching factor of "knowledge base," which is likely unquantifiable through game results. I'd just add that tossup conversion has direct explanatory power on win percentage--the only way to win games is to convert tossups, and conversely, if you know a team has converted a lot of tossups, you can infer that they won a lot of games. PPB doesn't directly explain win percentage, although it has a strong correlation with "knowledge base," which has a strong correlation with tossup conversion, which is how you win. I'd guess that PPB's relatively small contribution to r^2 has to do with the fact that it adds relatively little information about the external "knowledge base" factor that the tossup conversion percentage doesn't already "know."

Thanks for reminding me to post the spreadsheets, by the way:

https://drive.google.com/file/d/0B6tRw2 ... sp=sharing
https://drive.google.com/file/d/0B6tRw2 ... sp=sharing

The top link has most of the interesting info. The bottom one has the regression graphs. Some of the work on these isn't shown; if you have questions, or need help deducing my gibberish column headers, let me know.

The Quizbowl Resource Center
