Statistical Tiebreakers
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Statistical Tiebreakers
While I was making this post, I got to thinking: we should be able to measure which is the best statistical tiebreaker. All we need are situations in which teams (preferably with the same record, but not necessarily) play each other more than once: for some fraction of these, the results for head-to-head, PPG, PPB, and whatever other tiebreakers will differ. We can then easily see which is the strongest correlate to winning a given match (so, like, if PPG differential predicts the winner 87% of the time but head-to-head only 62%, we can quantifiably say that PPG differential is a better tiebreaker).
As such matchups happen not infrequently at tournaments, we should be able to assemble some data fairly quickly if some of you are willing to look over old stats. What do people think of this idea? If anyone wants to, I welcome them to find a tournament with such a matchup (a team playing another more than once) and see how often each tiebreaker correctly predicts the result of the match actually played.
MaS
PS: The first thing I came across was the finals of this year's IO, but this provides no useful data, since the head-to-head, PPG, and PPB tiebreakers all predict the same result (unless there's a broken tiebreaker I don't know about).
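A minimal sketch of the measurement proposed above, assuming each repeat matchup is stored as both teams' stats going into the rematch plus who won it. The `Matchup` type and the stat keys (`h2h_wins`, `ppg`, `ppb`) are hypothetical; a tiebreaker on which the two teams are exactly equal is counted as making no prediction.

```python
from dataclasses import dataclass

# Hypothetical stat keys; each maps to a team's value going into the rematch.
TIEBREAKERS = ("h2h_wins", "ppg", "ppb")


@dataclass
class Matchup:
    team_a: dict  # team A's stats before the rematch
    team_b: dict  # team B's stats before the rematch
    winner: str   # "a" or "b": who won the rematch


def predictions(m: Matchup) -> dict:
    """Per tiebreaker: True if it favored the rematch winner, False if it
    favored the loser, None if the teams were equal in that stat."""
    out = {}
    for stat in TIEBREAKERS:
        a, b = m.team_a[stat], m.team_b[stat]
        if a == b:
            out[stat] = None
        else:
            favored = "a" if a > b else "b"
            out[stat] = favored == m.winner
    return out


def hit_rates(matchups) -> dict:
    """Fraction of matchups (where the stat differed) in which each tiebreaker picked the winner."""
    hits = {s: 0 for s in TIEBREAKERS}
    decided = {s: 0 for s in TIEBREAKERS}
    for m in matchups:
        for stat, result in predictions(m).items():
            if result is not None:
                decided[stat] += 1
                hits[stat] += result  # True counts as 1
    return {s: hits[s] / decided[s] if decided[s] else None for s in TIEBREAKERS}
```

Treating exact equality as "no prediction" keeps a tiebreaker from being credited or penalized in matchups it could not have decided.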
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
I was actually thinking of doing this a couple of months ago, but I never got around to it. I think with my database of tournaments I was using to calculate my individual computer rankings, I should be able to write a script to do a bunch of these comparisons. I might try working on this over the weekend.
Brian
Stanford University
 Deviant Insider
 Auron
 Posts: 4486
 Joined: Sun Jun 13, 2004 6:08 am
 Location: Chicagoland
Re: Statistical Tiebreakers
Philosophically, I wonder if this is the best way to go.
Let's say you had two teams play common opponents. One of them narrowly wins all of its matches, going 5-0 with an average margin of victory of 50 PPG. The other one blows out all of its opponents but goes on a neg-fest against that other team, going 4-1 with an average margin of victory of 250 PPG. If I had to predict which team would win a rematch, I would pick the 4-1 team. However, if I had to select one team to go into the Championship Bracket, I would pick the 5-0 team. In other words, the best team and the team most deserving of advancement are not necessarily the same team.
My example is a bit extreme, but there have been plenty of cases similar to it. At IHSA Sectionals, four teams play a round robin with one team advancing. If the team generally considered the best loses to the team generally considered the second best, that usually decides who advances, even if the defeat is very narrow, as long as both teams win their other matches by whatever scores they rack up.
The fact that you are talking about a tiebreaker somewhat alleviates this, but there is still the question of whether the team with a higher PPG has earned the right to advance, because answering more questions is an accomplishment, as opposed to finding a more complex metric that may better predict success at the next level.
David Reinstein
Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
Re: Statistical Tiebreakers
One problem I can foresee is that there is a lot of interaction between the different tiebreakers. For instance, teams with high bonus conversions generally have high points per game.
I do not have the base of tournaments necessary to run this, but it strikes me that a more prudent approach might be to record each game as a six-dimensional vector where:
1 means the winner of the game was higher in the stat
0 means the winner of the game was equal or lower in the stat
stats being W-L record, head-to-head, PPG, PPB, h2h differential, and overall point differential
We then put this into a 2x2x2x2x2x2 matrix, where each entry is the number of games with that particular vector.
For each cell, moving down/right is equivalent to changing a 1 to a 0. We can then get a ranking of what's most important by comparing that cell with all cells "above" it. So if there were 45 games in cell 110110 but only 24 in cell 100111 and only 12 in cell 101110, then for cell 100110, "flipping statistic 6" is more likely to explain the winner than "flipping statistic 2", which is more likely to explain the winner than "flipping statistic 3". For each cell, then, we would have a "ranking" of which 0s flipping to 1s are most likely to explain the winner, given that the 1s stay the same.
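One way to hold the 2x2x2x2x2x2 table is a counter keyed by six-bit tuples. This is a sketch assuming each game has already been reduced to six 0/1 values in the order listed above; `build_table` and `above` are illustrative names. `above` only lists the cells reached by flipping a single 0 to a 1, with their counts, and leaves the rule for turning those counts into a ranking to the analyst.

```python
from collections import Counter
from itertools import product

# Bit order (statistic 1..6): W-L record, head-to-head, PPG, PPB,
# h2h differential, overall point differential.


def build_table(games):
    """games: iterable of 6-tuples of 0/1 (1 = winner was higher in that stat).
    Returns a Counter over all 64 cells, with empty cells present at 0."""
    table = Counter({cell: 0 for cell in product((0, 1), repeat=6)})
    table.update(tuple(g) for g in games)
    return table


def above(cell, table):
    """For each 0 in `cell`, the cell obtained by flipping it to 1 and that
    cell's game count, keyed by 1-based statistic number."""
    result = {}
    for i, bit in enumerate(cell):
        if bit == 0:
            flipped = cell[:i] + (1,) + cell[i + 1:]
            result[i + 1] = (flipped, table[flipped])
    return result
```

With the example counts (45 games in 110110, 24 in 100111, 12 in 101110), `above((1, 0, 0, 1, 1, 0), table)` returns those three neighbors keyed by statistics 2, 6, and 3.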
Among the 64 cells, we have:
1 ranking of 0 stats
6 rankings of 1 stat only
15 different rankings of 2 stats
20 different rankings of 3 stats
15 different rankings of 4 stats
6 different rankings of 5 stats
1 ranking of all 6 stats
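The counts in the list above are binomial coefficients: a cell with k zeros yields a ranking of k statistics, and there are C(6, k) such cells. A quick check, which also confirms the 16 cells in which any given pair of statistics are both ranked (both 0, with the other four bits free); `shared_cells` is an illustrative helper name:

```python
from itertools import product
from math import comb

# All 64 cells of the 2x2x2x2x2x2 table, as 6-tuples of 0/1.
cells = list(product((0, 1), repeat=6))

# A cell with k zeros yields a ranking of k statistics; count cells for each k.
by_size = [sum(1 for c in cells if c.count(0) == k) for k in range(7)]
assert by_size == [comb(6, k) for k in range(7)]  # 1, 6, 15, 20, 15, 6, 1


def shared_cells(a, b):
    """Cells in which statistics a and b (0-based) are both 0, i.e. both ranked."""
    return [c for c in cells if c[a] == 0 and c[b] == 0]
```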
We can then use any method we like to interpret these rankings ("play 30 games" pitting statistic A against statistic B in the 16 cells where both are ranked, with the one ranked ahead in more cells winning the game and the best W-L-T record winning overall, seems to me to be the best strategy; one could also use a 6-5-4-3-2-1 points system with 438 total points to determine order, look for interesting trends in the data, etc.).
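The pairwise "games" between statistics could be scored roughly as follows. This sketch assumes the per-cell rankings already exist (as a dict from cell to a list of 0-based statistic indices, most explanatory first) and plays one game per pair of statistics; `round_robin` and that representation are illustrative, not a fixed procedure.

```python
from itertools import combinations


def round_robin(rankings, n_stats=6):
    """Each pair of statistics plays one game: whoever is ranked ahead in more
    of the cells where both appear wins; equal counts are a tie.
    Returns {stat: [wins, losses, ties]}."""
    record = {s: [0, 0, 0] for s in range(n_stats)}
    for a, b in combinations(range(n_stats), 2):
        a_ahead = b_ahead = 0
        for order in rankings.values():
            if a in order and b in order:  # only cells ranking both statistics
                if order.index(a) < order.index(b):
                    a_ahead += 1
                else:
                    b_ahead += 1
        if a_ahead > b_ahead:
            record[a][0] += 1
            record[b][1] += 1
        elif b_ahead > a_ahead:
            record[b][0] += 1
            record[a][1] += 1
        else:
            record[a][2] += 1
            record[b][2] += 1
    return record
```

Sorting the statistics by their resulting W-L-T records then gives the overall order.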
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." Jerry
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
Shcool wrote:
Philosophically, I wonder if this is the best way to go.
...The fact that you are talking about a tiebreaker somewhat alleviates this...
Yeah, I think you're misunderstanding me. We're talking about situations in which we need a tiebreaker. I'm just proposing a measurement to determine which popular tiebreaker is actually the most valid (best correlate to winning). Obviously the team with the best record should win regardless of whatever tiebreakers another team may hold against them.
Dwight: I think you're misunderstanding the nature of what I'm proposing to do here. We don't want to compare W-L record, because that isn't a tiebreaker; only W-L against the same team is. We can easily determine how predictive, for example, PPG differential is of the outcome of any game, but that isn't very useful, because we can't make the same comparison for head-to-head except in the case of a repeat matchup, which means we can't isolate the other factors (so no direct comparison can be made). Only in the case of a repeat matchup can we isolate all the factors. Also, the fact that the tiebreakers are correlated isn't important; the proposed measurement measures only the differences between them.
MaS
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
 Matt Weiner
 Sin
 Posts: 8411
 Joined: Fri Apr 11, 2003 8:34 pm
 Location: Richmond, VA
Re: Statistical Tiebreakers
It's an accepted principle in sports analysis (and yes, I will break my virulent opposition to sports analogies here, because the sabermetricians are much more advanced with their data and mathematical thought than we are) that a loss or victory by a small margin means nothing, but long-term trends in scoring mean everything. Who wins a 280-245 quizbowl game comes down to the luck of the draw in terms of whether that third arts tossup was better for one team's opera specialist or the other team's architecture player; who consistently scores 35 PPG more over the course of the tournament reliably indicates more knowledge, or at least more ability to play quizbowl.
I don't like any appeals to "you're making the headtohead result meaningless" because:
1) the head-to-head result IS meaningless, essentially, when we're talking about a tie situation: the teams must be very close in ability if they are tied, especially if the one who won the head-to-head game then went and lost to someone whom the opponent beat, which mathematically must happen in the "two-way tie at the top of the standings" scenario. If the head-to-head result was a 300-point blowout and then the winning team went and lost to someone whom the losing team also beat by 300, then something is wrong with the questions. In the more usual scenario, if the head-to-head result is a very close game, then it has very little value in determining who the better team would be over a longer series of games.
2) the head-to-head result is already taken into account to create the tie; without it, someone would be 1 game ahead. That game has all the value in the world when we're talking about the difference between "you are 1 game ahead and you have won the tournament/earned the advantage in the final" and "well, I guess we have a tie now, let's find some way to break it." That's value enough for any one game without artificially adding any more.
Matt Weiner
Founder of hsquizbowl.org
Re: Statistical Tiebreakers
The two long-term data trends that emerge from quizbowl games are PPG and PPB. I am in favor of using PPG (because it incorporates the entirety of quizbowl activity) when the teams have played common opponents. When teams haven't played common opponents, I think the only fair thing to do is to use bonus conversion, since that is much less affected by the opponents one plays than PPG is. Ideally PPB is context-neutral, but depending on how the variation in packets and opponents lines up, it might not be.
Andrew Hart
Minnesota alum
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
Well, look: if you guys believe these sports analogies, they should be reflected in the measurement I'm proposing to make, so you have nothing to lose and everything to gain. More importantly, if you believe in reason, you can't advocate uncritically using one tiebreaker based on those arguments; rather, you are compelled to acknowledge that relying on a priori arguments when a posteriori evidence is available is the very pinnacle of unreasonable, unscientific thinking.
The case remains this: all else equal, long-term trends like PPG/PPB have lower fluctuations due to (massively) larger sample size, but are less predictive per datum, while the outcomes of previous games between tied teams are more predictive per datum, but potentially contain very large fluctuations. Therefore, until we can quantify things (which is exactly what I'm proposing to do), doubt must remain regarding which is the better tiebreaker.
In short, both of you are compelled to advocate this comparison as the justification of your beliefs or abandon reason (and, concomitantly, your arguments), in which case you must either form a new argument or not argue against this measurement. So far, we have one datum indicating unit correlation to winning (and to one another) for the head-to-head, PPG, and PPB tiebreakers. I know we can do better than that.
MaS
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
Re: Statistical Tiebreakers
Matt:
Are you talking about 35 PPG over the course of an entire tournament that is a true round robin, or one that has several divisions of teams? If it is the former kind of tournament, then state so explicitly; if it is the latter, then your argument holds little weight, since common opponents must be factored into any metric attempting to break a tie between two teams with identical records. A 35 PPG differential means little, if anything, if the only common opponent between Team A and Team B is the other one. What if Team A played some of their 6 divisional games against middle school teams while Team B played games against Dorman B, Charter C, RM D, among others? 35 PPG more for Team A means very little if they finish with the same record as Team B but lost to them head-to-head. If I am missing some piece of your argument, please clarify your post.
Elliott Rountree
Lambert Coach, 2017-present
Buford Coach, 2014-2015
Chattahoochee Coach, 2004-2014
GATA, 2018-present; 2007-2014
ACE, 2000-2013
 Mechanical Beasts
 Banned Cheater
 Posts: 5673
 Joined: Thu Jun 08, 2006 10:50 pm
Re: Statistical Tiebreakers
elrountree wrote:
Matt:
Are you talking about 35 PPG over the course of an entire tournament that is a true round robin, or one that has several divisions of teams? If it is the former kind of tournament, then state so explicitly; if it is the latter, then your argument holds little weight, since common opponents must be factored into any metric attempting to break a tie between two teams with identical records. A 35 PPG differential means little, if anything, if the only common opponent between Team A and Team B is the other one. What if Team A played some of their 6 divisional games against middle school teams while Team B played games against Dorman B, Charter C, RM D, among others? 35 PPG more for Team A means very little if they finish with the same record as Team B but lost to them head-to-head. If I am missing some piece of your argument, please clarify your post.
I think in general people don't support PPG comparisons unless they're made between teams with all common opponents; if you have to compare across brackets, you always prefer PPB to PPG. The only rare circumstance in which this fails is if Team A wins all its games 600-0, getting twenty tossups per game and 20 PPB, and Team B wins all its games 80-0, getting two tossups per game and 30 PPB (or some more realistic corner case, I suppose). But this relies on absolutely atrocious bracket balance. Getting at least decent bracket balance means that the teams that only get two tossups per game (the teams for which PPB means little due to a relatively small sample) will also lose a whole lot.
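The corner case's scores can be checked under a simple model, assuming 10-point tossups, one bonus heard for each tossup converted, and no powers or negs (assumptions for illustration, not any particular format's rules); `game_score` is a hypothetical helper:

```python
def game_score(tossups: int, ppb: float, tossup_value: int = 10) -> float:
    """Points in one game: tossup points plus bonus points on each converted tossup."""
    return tossups * tossup_value + tossups * ppb


team_a = game_score(tossups=20, ppb=20)  # twenty tossups per game at 20 PPB
team_b = game_score(tossups=2, ppb=30)   # two tossups per game at 30 PPB
```

Team A outscores Team B by 520 points per game, yet Team B leads in PPB by 10, which is the failure mode described; Team B's PPB rests on only two bonuses per game.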
Andrew Watkins
 Matt Weiner
 Sin
 Posts: 8411
 Joined: Fri Apr 11, 2003 8:34 pm
 Location: Richmond, VA
Re: Statistical Tiebreakers
I mean PPG within the round robin that produced the tie, of course.
Matt Weiner
Founder of hsquizbowl.org
 Deviant Insider
 Auron
 Posts: 4486
 Joined: Sun Jun 13, 2004 6:08 am
 Location: Chicagoland
Re: Statistical Tiebreakers
You're not correct, Mike.
To continue with sports analogies, there is overwhelming evidence that the Patriots were the best team in the NFL last year. However, that does not mean that they should be considered the NFL Champions. Titles and playoff berths go to teams that earn them through criteria decided ahead of time, not to teams that prove themselves the greatest statistically.
If somebody knowledgeable about statistics goes through a large amount of data, they could produce a complex formula to determine which teams are better than which other teams. They will not find that PPG is always the best predictor; they will find that PPG correlates to a certain extent with being better, PPB correlates to a certain extent, team record correlates, etc. There very well could be correlations with the number of negs and, in NAQT tournaments, with the number of powers. If somebody wants to determine, as well as possible, which team is better, then they will need a formula that takes all available correlating statistics into account. Is your goal to use such a formula to break ties?
David Reinstein
Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
 AlphaQuizBowler
 Tidus
 Posts: 695
 Joined: Mon Dec 03, 2007 6:31 pm
 Location: Alpharetta, GA
Re: Statistical Tiebreakers
Shcool wrote:
To continue with sports analogies, there is overwhelming evidence that the Patriots were the best team in the NFL last year. However, that does not mean that they should be considered the NFL Champions. Titles and playoff berths go to teams that earn them through criteria decided ahead of time, not to teams that prove themselves the greatest statistically.
This part of your post makes no sense to me. The whole point of this thread is to decide "ahead of time" the criteria to use to break ties in tournaments. The goal is not to retroactively change the outcome of tournaments, as calling the Patriots the NFL Champions would be, but to find a fair way to do it in the future.
William
Alpharetta High School '11
Harvard '15
Re: Statistical Tiebreakers
Captain Scipio wrote:
Dwight: I think you're misunderstanding the nature of what I'm proposing to do here. We don't want to compare W-L record, because that isn't a tiebreaker; only W-L against the same team is. We can easily determine how predictive, for example, PPG differential is of the outcome of any game, but that isn't very useful, because we can't make the same comparison for head-to-head except in the case of a repeat matchup, which means we can't isolate the other factors (so no direct comparison can be made). Only in the case of a repeat matchup can we isolate all the factors. Also, the fact that the tiebreakers are correlated isn't important; the proposed measurement measures only the differences between them.
I think that it is actually useful to determine how predictive any given stat is of the outcome of any given game (in order to better quantify "upsets," for instance). However, seeing what you are actually trying to do now, this seems to be a project reserved for a later time.
Are you looking for repeat matchups, or just repeat matchups between teams with the same record? If the latter, here are some data points. Statistics are calculated as of the moment the match began, not over the entire tournament. If you want to know the entire-tournament data, you can calculate that yourself, but it's mostly similar.
2007 TWAIN, Round 11: UCLA B vs. UCI A. Both teams 8-2. UCLA B held head-to-head advantage; UCI A held all other tiebreakers. UCI A def UCLA B 260-110.
At least one other example probably exists from 2007 TWAIN, but due to UCLA's policy of not counting rounds I don't know which one(s) it is.
2007 Aztlan Cup, Round 11: UCI 1 vs. UCLA 1. Both teams 9-1*. UCI 1 held all tiebreakers except h2h, which was split, and won finals match 440-115.
2006 ACF Fall, Round 10: Caltech vs Stanford B. Both teams 6-3*. Caltech held ppb tiebreaker, point differential tiebreaker; Stanford B held h2h tiebreaker and h2h differential tiebreaker; ppg tiebreaker was negligible (Caltech 303 to Stanford B 300). Stanford B def Caltech 340-275.
2006 ACF Fall, Round 9: UCLA vs Stanford B. Both teams 6-2. UCLA held all tiebreakers and won rematch 450-205.
2006 ACF Fall, Round 9: Caltech vs Stanford A. Both teams 5-3. Caltech held head-to-head and h2h differential tiebreaker; Stanford A held ppg, point differential, ppb tiebreaker. Caltech def Stanford A 385-210.
2006 ACF Fall, Round 9: UCI vs Berkeley. Both teams 1-7. UCI held h2h and h2h differential tiebreaker, Berkeley held ppg, point differential, ppb tiebreaker. Berkeley def UCI 265-155.
2006 Aztlan Cup, Round 10?: USC vs UCSD. Both teams 7-1*. USC held PPG tiebreaker, h2h, h2h differential tiebreaker, point differential tiebreaker; UCSD held PPB tiebreaker. USC def UCSD by unknown score.
2006 ACF Regionals, Round 5: UCLA vs Berkeley. Both teams 3-1. UCLA held h2h, h2h differential, ppb tiebreaker. Berkeley held ppg, point differential tiebreaker. UCLA def Berkeley 290-220.
2006 SCT West D1, Round 9: UCLA vs Stanford. Both teams 6-2*. UCLA held ppth advantage, point differential advantage, head-to-head differential advantage; Stanford held bonus conversion advantage; head-to-head was split 1-1. UCLA def Stanford 450-320.
2006 SCT West D1, Round 15: UCLA vs Stanford. Both teams 11-3*. UCLA held ppth advantage, point differential advantage, head-to-head differential advantage; head to head was split 1-1 and bonus conversion was negligible (18.71 for UCLA to 18.67 for Stanford). UCLA def Stanford 470-185.
2005 ACF Regionals, Round 14: Berkeley A vs Berkeley B. Both teams 10-1*. Berkeley B held h2h and h2h differential tiebreaker, ppg tiebreaker. Berkeley A narrowly held ppb tiebreaker (difference of about .2 ppb). Berkeley A def Berkeley B 250-210.
2004 ACF Fall, Round 11: Berkeley A vs Berkeley C. Both teams 7-3*. Berkeley A held h2h differential tiebreaker. Berkeley C held ppg, ppb, point differential tiebreaker. h2h was split 1-1. Berkeley C def Berkeley A 475-225.
2004 SCT West D2, Round 13: Berkeley Well vs Stanford Incoln. Both teams 6-6. Berkeley Well held h2h differential, ppg, ppb, narrowly held point differential tiebreaker (by about 2 ppg), head to head was split 1-1. Berkeley Well def Stanford Incoln 215-160.
2004 SCT West D2, Round 12: Caltech vs Stanford Incoln. Both teams 6-5. Caltech held h2h differential, ppg, point differential, narrowly held ppb (about .2 difference), h2h was split 1-1. Caltech def Stanford Incoln 280-230.
2004 Cardinal Classic, Round Finals: Berkeley Jeff vs Berkeley David. Both teams 11-1. Berkeley Jeff held ppg, ppb, point differential tiebreakers. Berkeley David held h2h, h2h differential tiebreaker. Berkeley Jeff def Berkeley David 350-225.
2003 Cardinal Junior Bird, Round 5: Berkeley STP vs Stanford A. Both teams 1-3. Berkeley STP held all tiebreakers. Berkeley STP def Stanford A 395-220.
2003 ACF Fall, Round 14: Berkeley Untitled vs UCLA. Both teams 5-8*. Berkeley Untitled held ppg, point differential, h2h, h2h differential tiebreakers. UCLA held ppb tiebreaker. UCLA def Berkeley Untitled 330-150.
2003 ACF Fall, Round 11: Stanford Old vs Berkeley Kids. Both teams 8-2. Stanford Old held h2h, h2h differential, ppb tiebreakers. Berkeley Kids held ppg and point differential tiebreakers. Berkeley Kids def Stanford Old 350-280.
2003 ACF Fall, Round 11: Berkeley Untitled vs Berkeley Discovery. Both teams 3-7. Berkeley Untitled held h2h and h2h differential. Berkeley Discovery held ppg, point differential, ppb tiebreakers. Berkeley Untitled def Berkeley Discovery 250-210.
2003 ACF Fall, Round 9: Berkeley Nominalists vs UCLA. Both teams 3-5. Berkeley Nominalists held ppg, ppb, point differential tiebreakers. UCLA held h2h and h2h differential tiebreakers. Berkeley Nominalists def UCLA 310-290.
2003 ACF Fall, Round 8: Berkeley Discovery vs UCLA. Both teams 2-5*. Berkeley Discovery held all tiebreakers. UCLA def Berkeley Discovery 280-230.
2003 Buzzerfest Mirror at Stanford, Round 10: Stanford vs Berkeley C. Both teams 7-2*. Stanford held h2h tiebreaker. Berkeley C held ppg, ppb, point differential, narrowly held h2h tiebreaker (+20 over 3 games). Stanford def Berkeley C 300-245.
*teams had played exactly the same opponents except for each other.
More coming.
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." Jerry
 Captain Sinico
Re: Statistical Tiebreakers
Not only am I not wrong, but you've said almost nothing even germane to what I'm saying. I'm begging you and everyone to stop arguing with sports analogies (or really any analogies whatsoever): you're only confusing yourselves. Please look at the actual situation at hand.
Nobody is talking about supplanting winning and losing games to determine tournament winners. That has nothing to do with anything. This thread is about measuring which is the best tiebreaker*. Of course, it would be easy to regress any number of formulae onto winning percentage as you say, but that's not of interest here.
So, again, what we want to do here is to practically compare commonly used (or usable) tiebreakers. I've devised a method that seems to isolate other factors and allows us to draw an immediate conclusion regarding which is the best (most predictive) among the three common tiebreakers (PPG differential over common opponents, PPB differential, head-to-head). If you or anyone else has an easily computable tiebreaker formula that you'd like to see in use, I invite you to publish it here: any such should be comparable by the method I've outlined. Of course, given enough data, we could use regression to determine a statistically best tiebreaker, but let's worry about that later.
MaS
*Maybe people are confused on this point. A tiebreaker is used to choose the best among several teams with equal records to determine, for example, seeding or sometimes other things. The impetus for this thread was a dispute in a previous thread regarding a tiebreaker to award a tournament championship, so it is a positive fact that things like that are happening.
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
 Captain Sinico
Re: Statistical Tiebreakers
Dwight, yeah, that's massively awesome. Is there an easy way to get those data?
MaS
Mike Sorice
Re: Statistical Tiebreakers
2007 WIT, Round 10: Chicago A vs Stanford A. Both teams 8-1*. Chicago A held all tiebreakers. Chicago A def Stanford A 330-260.
2007 WIT, Round 10: Berkeley B vs Stanford B. Both teams 4-5*. Berkeley B held h2h, h2h differential, point differential. Stanford B held ppb and narrowly held ppg (by about 3 ppg). Stanford B def Berkeley B 290-205.
2007 SCT West, Round 14: Berkeley 1 vs Stanford A1. Both teams 11-2*. Stanford A1 held h2h differential. Berkeley 1 held ppg, ppb, point differential. h2h was split. Berkeley 1 def Stanford A1 340-265.
2007 SCT West, Round 13: Stanford B1 vs USC 2. Both teams 3-9*. Stanford B1 held h2h, h2h differential, narrowly held ppg (by 0.06 ppth). USC 2 held ppb, point differential. USC 2 def Stanford B1 160-85.
2007 SCT West, Round 12: USC 1 vs Caltech. Both teams 6-5. USC held h2h, h2h differential. Caltech held ppg, ppb, point differential. Caltech def USC 1 335-250.
2006 WIT, Round 10: Berkeley B vs Chicago B. Both teams 3-5*. Berkeley B held ppg, ppb. Chicago B held h2h, h2h differential, narrowly held point differential (by about 8 ppg). Chicago B def Berkeley B 220-150.
2005 WIT, Round 8: UCLA vs Stanford B. Both teams 2-5*. Stanford B held all tiebreakers. Stanford B def UCLA 235-225.
2005 TRASH Regionals, Round 9: Mich Alums vs UCLA. Both teams 4-4. UCLA held h2h and h2h differential; Mich Alums held ppg, ppb, point differential. UCLA def Mich Alums 205-165.
2005 TRASH Regionals, Round 8: Mich Alums vs. Berkeley. Both teams 4-3. Berkeley held all tiebreakers and won 270-135.
2005 ACF Fall, Round 9: UCLA A vs Stanford C. Both teams 7-1. UCLA A held ppb tiebreaker; Stanford C held point differential, h2h, h2h differential, narrowly held ppg (by about 5 ppg). Stanford C def UCLA A 340-305.
2005 ACF Fall, Round 9: UCLA B vs Stanford B. Both teams 2-6. UCLA B held h2h, h2h differential, ppb, narrowly held ppg (by about 1 ppg). Stanford B held point differential. Stanford B def UCLA B 195-125.
2005 ACF Fall, Round 8: Berkeley A vs Stanford C. Both teams 6-1*. Berkeley A held h2h, h2h differential, ppb, narrowly held ppg (by about 5 ppg). Stanford C narrowly held point differential (by about 2 ppg). Stanford C def Berkeley A 425-190.
2005 BLaST, Round 15: Berkeley D vs Stanford. Both teams 8-6*. Berkeley D held point differential, narrowly held ppg (by about 7 ppg). Stanford held h2h, h2h differential, ppb. Berkeley D def Stanford 215-135.
2005 BLaST, Round 12: Chicago A vs Berkeley B. Both teams 9-2. Chicago A held ppg, point differential. Berkeley B held h2h, h2h differential, ppb. Berkeley B def Chicago A 360-270.
2004 ACF Regionals, Round 10: Berkeley Jerry vs Stanford. Both teams 3-6*. Berkeley Jerry held h2h, h2h differential. Stanford held ppg, ppb, point differential. Berkeley Jerry def Stanford 210-160.
*teams played exact same opponents except for each other.
Dwight Wynne
 Captain Sinico
Re: Statistical Tiebreakers
Amusingly, the first batch of isolated data (the *'d data from Dwight's first post) indicate that all tiebreakers have a correlation of 0.5. Further reports as processing proceeds.
MaS
Mike Sorice
Re: Statistical Tiebreakers
Mike, I take it that you're only looking at the *'d ones, so that's what I'll keep looking for. Unfortunately, because a lot of tournaments don't have round numbers attached, and because you have to manually add and subtract things (SQBS won't necessarily give you a tournament snapshot after, e.g., Round 9 of a 12 round tournament), I'm not sure there's an easier way to do this kind of thing. That said, if people want to put in the time and scour stats pages, this is what you should look for:
A tournament small enough to run a full round robin (usually <15 teams). Anything more and you get bracketed round robins, which skews the data. In these tournaments, the first or last game in a playoff bracket, or a finals game, is guaranteed to be between teams that have faced exactly the same opponents (except for each other). It's then just a matter of manually sorting through that subset of games to find ones between teams with the same record.
If you prefer the end-of-tournament overall data to the instantaneous point-in-time data, then it's easier to just read numbers off the page; I think that the instantaneous point-in-time data is more correct to use, but it's also more time-consuming to get.
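Dwight's point-in-time approach can be sketched in Python. This is a hypothetical helper, not a feature of SQBS: it assumes results are available as (round, team, score, team, score) rows, rebuilds each team's record, points, and opponents through a chosen round, and checks whether two teams enter a game with the same record and exactly the same opponents apart from each other. The four-team round robin is invented toy data.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    rnd: int
    team1: str
    score1: int
    team2: str
    score2: int


def snapshot(games, through_round):
    """Point-in-time standings: wins, losses, points for/against and the
    list of opponents for each team, using only rounds 1..through_round.
    A tied game, if any, counts as neither a win nor a loss."""
    stats = defaultdict(lambda: {"w": 0, "l": 0, "pf": 0, "pa": 0, "opps": []})
    for g in games:
        if g.rnd > through_round:
            continue
        sides = ((g.team1, g.score1, g.team2, g.score2),
                 (g.team2, g.score2, g.team1, g.score1))
        for team, pts, opp, opp_pts in sides:
            s = stats[team]
            s["pf"] += pts
            s["pa"] += opp_pts
            s["opps"].append(opp)
            if pts > opp_pts:
                s["w"] += 1
            elif pts < opp_pts:
                s["l"] += 1
    return stats


def comparable(stats, a, b):
    """True if teams a and b have the same record and have faced exactly
    the same opponents (counting repeats), ignoring games against each other."""
    sa, sb = stats[a], stats[b]
    same_record = (sa["w"], sa["l"]) == (sb["w"], sb["l"])
    opps_a = Counter(o for o in sa["opps"] if o != b)
    opps_b = Counter(o for o in sb["opps"] if o != a)
    return same_record and opps_a == opps_b


# Invented 4-team round robin; A and B meet in the last round.
GAMES = [
    Game(1, "A", 300, "C", 200), Game(1, "B", 250, "D", 150),
    Game(2, "A", 200, "D", 280), Game(2, "B", 180, "C", 300),
    Game(3, "A", 330, "B", 260), Game(3, "C", 240, "D", 210),
]

before_final = snapshot(GAMES, through_round=2)
print(comparable(before_final, "A", "B"))
print(comparable(before_final, "A", "C"))
```

Called on the standings through round 2, it reports A and B (who meet in round 3) as a clean comparison, but not A and C, even though all four teams are 1-1 at that point.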
Dwight Wynne
 Captain Sinico
Re: Statistical Tiebreakers
Hi Dwight,
Well, thanks for your effort, then! The point-in-time data are indeed what we want here, though even the whole-tournament data have some validity. The unstarred data will be included later, but I consider them to be less predictive (since there are more non-isolated factors; the starred data isolate everything possible).
MaS
Mike Sorice
 Captain Sinico
Re: Statistical Tiebreakers
With 19 games (all the *'d data): total points difference, 0.6 > PPB difference, 0.55 > HH, 0.53 > HH point difference, 0.5 = PPG difference, 0.5. I'll now include the non-starred data. If anyone else can get me more, I've found a method to enter them pretty quickly. I may just post the spreadsheet on Google Docs to let people enter them by themselves.
MaS
Mike Sorice
 Captain Sinico
Re: Statistical Tiebreakers
Okay, with all the data entered (came to 38 games) we've got:
Common-opponent data: Point difference, 60%; Bonus points per bonus heard difference, 55%; Head-to-head result, 53.33%; Points per game/points per tossup heard difference, 50%; Head-to-head points difference, 50%.
All data: Point difference, 65.79%; Bonus points per bonus heard difference, 60.53%; Points per game/points per tossup heard difference, 57.89%; Head-to-head points difference, 57.89%; Head-to-head result, 54.84%.
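The percentages above are simple hit rates: for each game, note which team held each tiebreaker going in and check whether that team won. A minimal Python sketch of that bookkeeping follows. The six rows are transcribed from Dwight's lists in this thread; the field names are hypothetical, and skipping games where a tiebreaker was split is an assumption about how such games are handled. Six games prove nothing about which tiebreaker is best, and this is not a reproduction of Mike's spreadsheet; it only shows the calculation.

```python
from typing import Optional

TIEBREAKERS = ("ppg", "ppb", "point_diff", "h2h", "h2h_diff")

# Which team held each tiebreaker going into the game (None = split),
# and which team actually won. Rows transcribed from Dwight's lists.
GAMES = [
    # 2004 Cardinal Classic, final
    dict(ppg="Berkeley Jeff", ppb="Berkeley Jeff", point_diff="Berkeley Jeff",
         h2h="Berkeley David", h2h_diff="Berkeley David", winner="Berkeley Jeff"),
    # 2003 ACF Fall, Round 14
    dict(ppg="Berkeley Untitled", ppb="UCLA", point_diff="Berkeley Untitled",
         h2h="Berkeley Untitled", h2h_diff="Berkeley Untitled", winner="UCLA"),
    # 2003 ACF Fall, Round 9
    dict(ppg="Berkeley Nominalists", ppb="Berkeley Nominalists",
         point_diff="Berkeley Nominalists", h2h="UCLA", h2h_diff="UCLA",
         winner="Berkeley Nominalists"),
    # 2007 SCT West, Round 12
    dict(ppg="Caltech", ppb="Caltech", point_diff="Caltech",
         h2h="USC 1", h2h_diff="USC 1", winner="Caltech"),
    # 2004 ACF Regionals, Round 10
    dict(ppg="Stanford", ppb="Stanford", point_diff="Stanford",
         h2h="Berkeley Jerry", h2h_diff="Berkeley Jerry", winner="Berkeley Jerry"),
    # 2004 ACF Fall, Round 11 (head-to-head split 1-1)
    dict(ppg="Berkeley C", ppb="Berkeley C", point_diff="Berkeley C",
         h2h=None, h2h_diff="Berkeley A", winner="Berkeley C"),
]


def hit_rate(games, tiebreaker: str) -> Optional[float]:
    """Fraction of games in which the holder of `tiebreaker` won.

    Games where that tiebreaker was split are skipped; returns None
    if no game is usable."""
    usable = [g for g in games if g[tiebreaker] is not None]
    if not usable:
        return None
    hits = sum(1 for g in usable if g[tiebreaker] == g["winner"])
    return hits / len(usable)


for tb in TIEBREAKERS:
    rate = hit_rate(GAMES, tb)
    print(f"{tb:>10}: " + ("n/a" if rate is None else f"{rate:.0%}"))
```

Adding a new candidate tiebreaker only means adding one more field per game and one more name to `TIEBREAKERS`.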
So, at this point, I'll conclude three things:
1. Point difference is the best tiebreaker in these data by a fair margin.
2. No normal tiebreaker significantly outperforms any other; they're all in the 50-70% range at predicting the right winner of an actual game.
3. Relatedly, no standard tiebreaker is very good, so meaningful ties should absolutely be played off if a tournament wants to find a fair winner.
Since I have little faith that point 3 will carry the weight it ought to, I'd further suggest that we take up Coach Reinstein's suggestion and consider a better, composite tiebreaker. I'm open to suggestions in this area and will gladly test any. If we can find enough data, I will try a regression study.
I'll add the caveat that I'm currently confused about one thing in these data: how can a team hold points per game but not point difference if they've played the same number of games? Perhaps I've misunderstood what Dwight meant by point differential; I took that to mean difference in total points scored. Dwight, please let me know what's up; I can update this easily to reflect whatever changes.
MaS
PS: Perhaps point differential means, like, the difference between the teams' mean point difference per match. That might explain the discrepancy.
Mike Sorice
 Haaaaaaaarry Whiiiiiiiiiite
 Auron
 Posts: 1110
 Joined: Tue Dec 04, 2007 8:46 pm
 Location: Fairfax, VA
 Contact:
Re: Statistical Tiebreakers
Captain Scipio wrote: "I'll add the caveat that I'm currently confused about one thing in these data: how can a team hold points per game but not point difference if they've played the same number of games? Perhaps I've misunderstood what Dwight meant by point differential; I took that to mean difference in total points scored. Dwight, please let me know what's up; I can update this easily to reflect whatever changes."
If Team A has 350 PPG and 275 PPGA while Team B has 300 PPG and 200 PPGA, then Team A has higher PPG while Team B has higher point differential. Usually that means that Team B is better at answering tossups (hence less chance for the opponent to score) but worse at bonuses (hence lower PPG).
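Harry's example can be checked directly. The sketch below assumes "point differential" means the average margin per game (PPG minus points allowed per game), which is how Harry reads it; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TeamStats:
    name: str
    ppg: float   # points scored per game
    ppga: float  # points allowed per game

    @property
    def margin(self) -> float:
        """Average point differential per game (PPG minus PPGA)."""
        return self.ppg - self.ppga


def holder(a: TeamStats, b: TeamStats,
           stat: Callable[[TeamStats], float]) -> Optional[str]:
    """Name of the team with the higher value of `stat`, or None if equal."""
    va, vb = stat(a), stat(b)
    if va == vb:
        return None
    return a.name if va > vb else b.name


team_a = TeamStats("Team A", ppg=350, ppga=275)
team_b = TeamStats("Team B", ppg=300, ppga=200)
print(holder(team_a, team_b, lambda t: t.ppg))     # Team A (350 > 300)
print(holder(team_a, team_b, lambda t: t.margin))  # Team B (+100 > +75)
```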
Also, while the data crunching is neat and all, I think the margin of error is way too great for what we have right now. But then again, I'm just eyeballing these numbers.
Re: Statistical Tiebreakers
hwhite wrote: "If Team A has 350 PPG and 275 PPGA while Team B has 300 PPG and 200 PPGA, then Team A has higher PPG while Team B has higher point differential. Usually that means that Team B is better at answering tossups (hence less chance for the opponent to score) but worse at bonuses (hence lower PPG)."
This is exactly what I meant, and exactly what I think that statistic means (which is why it would be useful as a tiebreaker).
Mike, since I've given the exact scores for something like 37 of those games, would it be possible to run a regression involving not just who wins, but by how much? (E.g., if team A holds the PPG tiebreaker but team B holds head-to-head, and team A beats team B 230-180, then it would be +50 for the PPG tiebreaker and -50 for the h2h tiebreaker.)
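Dwight's margin-weighted idea needs only a small change to the bookkeeping: instead of scoring each tiebreaker 1 or 0 per game, record the margin of victory with a sign. A sketch with hypothetical field names; it does not run the regression itself, it only produces the per-game numbers one would feed into one.

```python
from typing import Dict, Optional


def signed_margins(game: dict, tiebreakers) -> Dict[str, Optional[int]]:
    """For each tiebreaker, the game's margin of victory, signed + if the
    team holding that tiebreaker won and - if it lost (None if split)."""
    margin = abs(game["score1"] - game["score2"])
    winner = game["team1"] if game["score1"] > game["score2"] else game["team2"]
    return {tb: None if game[tb] is None
            else (margin if game[tb] == winner else -margin)
            for tb in tiebreakers}


# Dwight's hypothetical: A holds PPG, B holds head-to-head, A wins 230-180.
example = {"team1": "A", "score1": 230, "team2": "B", "score2": 180,
           "ppg": "A", "h2h": "B"}
print(signed_margins(example, ["ppg", "h2h"]))  # {'ppg': 50, 'h2h': -50}

# A real row from the list: 2004 Cardinal Classic final, Jeff def David 350-225.
final = {"team1": "Berkeley Jeff", "score1": 350,
         "team2": "Berkeley David", "score2": 225,
         "ppg": "Berkeley Jeff", "h2h": "Berkeley David"}
print(signed_margins(final, ["ppg", "h2h"]))    # {'ppg': 125, 'h2h': -125}
```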
Harry, can you elaborate about the margin of error? I think Mike is saying exactly that when he claims that no statistic significantly outperforms any other, though he hasn't quantified that significance/error.
I'll see if I can scrounge up some more data for all-else-equal matches.
Dwight Wynne
 Haaaaaaaarry Whiiiiiiiiiite
Re: Statistical Tiebreakers
cvdwightw wrote: "Harry, can you elaborate about the margin of error? I think Mike is saying exactly that when he claims that no statistic significantly outperforms any other, though he hasn't quantified that significance/error."
(N.b. I don't claim to be a statistician, nor have I taken a statistics course, so I could be wrong.)
If you remember the presidential election polls, it works in the same way. Long story short, if you want 80% confidence (which is rather low, but then again, tiebreaking is not perfect to begin with), then with the current sample size of 38, you have a 10% margin of error, which means that no tiebreaker is statistically significantly better than another (HH could be 10% higher than reported, and PPG difference could be 10% lower than reported). If you increase your sample size to 100 games, you'll be down to a 6% margin of error, which may start to allow you to confidently (statistically speaking) rule out options.
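Harry's figures appear to come from the usual normal-approximation margin of error for a proportion, taking the worst case p = 0.5 as polls do (an assumption about his method, which he doesn't spell out). A short check with the Python standard library's `NormalDist` gives about 10.4% for 38 games and 6.4% for 100 games at 80% confidence, in line with his 10% and 6%.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.80, p: float = 0.5) -> float:
    """Half-width of the normal-approximation confidence interval for a
    proportion estimated from n games. p = 0.5 is the worst case, the
    usual choice when quoting a poll's margin of error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.28 for 80%
    return z * sqrt(p * (1 - p) / n)


for n in (38, 100):
    print(n, f"{margin_of_error(n):.1%}")
```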
 Captain Sinico
Re: Statistical Tiebreakers
Dwight, yeah, I don't see why not. Perhaps for future work.
MaS
Mike Sorice

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
This looks pretty cool.
Regarding what the margin of error on these numbers is, I'm pretty sure you can just use a binomial distribution. In that case, the standard deviation on the number of successes is just sqrt(n*p*(1-p)). So, for example, Mike said point difference had a success rate of 65.79 percent, out of 38 games. In other words, there were 25 successes in 38 games. The error on that is sqrt(38*0.6579*(1-0.6579)) = 2.92. So, we have (25 +/- 2.92)/38 = 0.6579 +/- 0.0768. The errors on the other numbers will be similar. So, I agree with Harry that the errors on these numbers are too big to say definitively which tiebreaker is the best. We probably need to lower the error from the current 7.7 percent to about 3 percent or less to say with much confidence which tiebreaker is the best. Since error scales like 1/sqrt(n), this means we might need 6 times more data than we currently have. Whether that's feasible or not I don't know.
Brian
Stanford University
Re: Statistical Tiebreakers
Schweizerkas wrote: "Since error scales like 1/sqrt(n), this means we might need 6 times more data than we currently have. Whether that's feasible or not I don't know."
Considering that this is just from one small, isolated circuit that doesn't run a lot of tournaments (as compared to, say, the Midwest), we should be able to find (hopefully) a near-equivalent amount of data from the Midwest, Northeast, Mid-Atlantic, and Southeast circuits. Plus, there's an entire high school circuit, if we can find small enough tournaments that run double RR or single RR + playoff brackets. I'd say it's feasible to get a sample size of ~200-250 games if we work at it and include anything between teams of the same record (not ideal, but hey, it's the best we can do if we're looking at 250 games).
Dwight Wynne
Re: Statistical Tiebreakers
I don't think we can work with the assumption that data trends in past quizbowl matches necessarily predict game results of future matches. I'm unconvinced that the body of quizbowl match results as a whole to this point represents the expected outcomes of matches to come, and I certainly reject outright the idea that a data set that mixes non-common-opponent and common-opponent schedules, is heavily skewed towards sketchily edited West Coast sets, TRASH regionals, and IS set tournaments, and has a whole bevy of other problems has any useful extrapolative value whatsoever. These data stem from activities whose commonality barely extends past the use of questions and buzzers. Who says that stirring up all of these (or any other concoction) yields something that will be predictive for future quizbowl as a whole, or more importantly, any individual tournament?
I hold that there is a hefty burden that resides with those who advocate using past data to remake the tiebreaker system, and that that burden is to show that there is a predictive relationship between what has happened in the past and what will happen in the future. Unless someone can show that one tiebreaker stands above the rest regardless of the type of questions, level of competition, a team's slate of opponents, and a plethora of other variables, I don't think we can safely use this kind of data at all.
I still believe that it's best to set a reasonable, intuitive goalpost as the tiebreaker and stick to that. As we see above, points per game and points per bonus correlate similarly to other methods; even if you claim that the above data are valid, you are still forced to admit that the traditional tiebreakers of PPG and PPB appear to be about as useful as any other proposed method.
Moreover, they have the benefit of being both intuitive and positive. It makes a lot of sense that the better team will score more points against common opponents, or score more points per bonus on a differing schedule. Furthermore, it's a positive tiebreaker; you start from zero and go up, there is a goalpost out there, and once you pass it and another team doesn't, you win the tiebreaker. Which is more appealing, that a team should strive to score as many total points and as many points per bonus as possible, or that a team should hope that their margin of victory in one game (or some amalgamation of all of the proposed tiebreakers that historically boosts correlation by X%) was good enough that results from 1994 Wahoo Wars combined with data from Tartan Tussle XX will indicate that they have a 2.5% better chance of winning a follow-up game?
In sum, I hold that Mike's argument that we must reject theory (which amounts to intuition and reason coupled with practice) because there are data out there is ludicrous. There is no reason at all to take at face value these data as useful.
Andrew Hart
Minnesota alum
Re: Statistical Tiebreakers
theMoMA wrote: Moreover, they have the benefit of being both intuitive and positive. It makes a lot of sense that the better team will score more points against common opponents, or score more points per bonus on a differing schedule. Furthermore, it's a positive tiebreaker; you start from zero and go up, there is a goalpost out there, and once you pass it and another team doesn't, you win the tiebreaker. Which is more appealing, that a team should strive to score as many total points and as many points per bonus as possible, or that a team should hope that their margin of victory in one game (or some amalgamation of all of the proposed tiebreakers that historically boosts correlation by X%) was good enough that results from 1994 Wahoo Wars combined with data from Tartan Tussle XX will indicate that they have a 2.5% better chance of winning a follow-up game?
What does this even mean? All the proposed tiebreakers and combinations of tiebreakers hold the following: it is better to win a game than not, it is better to answer tossups than not, it is better to answer bonus parts than not. We're using West Coast data because I know where those stats are and no one else has volunteered data.
Data is useful because it confirms intuition. Since there are good arguments to be made for various tiebreakers, it follows that we must go to whatever data is available, or collect new data, in order to verify one or more of these arguments. After all, in Georgia, they consider head-to-head to be "intuitive," a view with which you appear to disagree; therefore there is not a consensus on what is "intuitive." If you have a better set of data on immaculate questions with perfectly opponent-controlled matches, I'd love to see it, because it would be the best data set out there. But I don't think using the data that we do have is somehow invalid.
We're going back to the instantaneous point in a tournament at which the rematch occurs, and predicting which team will win given results of that tournament up to that point. We already know the result, so we're testing how often our predictor is right. 50% means it's a bad predictor, <50% means it's predicting that the team with the better stat will lose the game more often than it will win it. Can we extrapolate this to the future? I don't see why not. We're already pretty certain it can't replace tiebreaker matches, and as more tournaments happen we can feed more data into the machine and come up with the best "approximation" of a tiebreaker match for tournaments that don't have the luxury of that extra packet. I argue that doing this is independent of question quality and independent of strength of schedule; heck, I'm back with my "let's use W/L and predict outcomes of every match" suggestion.
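For anyone who wants to run this on their own stats, here is a minimal sketch of the scoring step, assuming each rematch has already been reduced to the two teams' values of one candidate stat (computed only from games before the rematch) plus the actual winner. The `Rematch` record and its field names are hypothetical, and the toy numbers are invented for illustration, not taken from any tournament. A stat that is exactly tied makes no prediction, so those rematches are skipped.

```python
from dataclasses import dataclass


@dataclass
class Rematch:
    # Hypothetical record: each team's value of a candidate tiebreaker stat,
    # computed only from games played before the rematch, plus the result.
    stat_a: float
    stat_b: float
    a_won: bool


def predictor_accuracy(rematches):
    """Count how often the team with the better stat won the rematch.

    Rematches where the stat is tied make no prediction and are skipped.
    Returns (correct, decided) so error bars can be computed afterwards.
    """
    correct = 0
    decided = 0
    for m in rematches:
        if m.stat_a == m.stat_b:
            continue  # tied stat: no prediction possible
        decided += 1
        predicted_a = m.stat_a > m.stat_b
        if predicted_a == m.a_won:
            correct += 1
    return correct, decided


# Invented toy data for illustration only.
toy = [
    Rematch(stat_a=310.0, stat_b=250.0, a_won=True),
    Rematch(stat_a=18.5, stat_b=21.0, a_won=True),
    Rematch(stat_a=200.0, stat_b=200.0, a_won=False),
    Rematch(stat_a=275.0, stat_b=240.0, a_won=True),
]
correct, decided = predictor_accuracy(toy)
print(f"{correct}/{decided} = {correct / decided:.1%}")
```

A result near 50% means the stat does no better than a coin flip; below 50% means the team with the better stat lost more often than it won.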
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." Jerry
 Deviant Insider
 Auron
 Posts: 4486
 Joined: Sun Jun 13, 2004 6:08 am
 Location: Chicagoland
Re: Statistical Tiebreakers
The people pointing out that the data above is inconclusive are correct. If anything, they are understating how inconclusive it is. If two people each toss a coin 38 times, the expected value for the difference in the number of heads each one gets is about 3.5 heads, or about 9%.
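The 3.5-head figure can be checked directly. This sketch assumes two independent fair coins, each tossed 38 times, computes the exact expected absolute difference in heads by summing over both binomial distributions, and compares it with the normal approximation sqrt(2·n·p·q)·sqrt(2/π).

```python
import math


def binom_pmf(k, n, p=0.5):
    # Probability of exactly k heads in n tosses.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_abs_difference(n, p=0.5):
    """Exact E|X - Y| for independent X, Y ~ Binomial(n, p)."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    return sum(pmf[x] * pmf[y] * abs(x - y)
               for x in range(n + 1) for y in range(n + 1))


n = 38
exact = expected_abs_difference(n)
# Normal approximation: X - Y has variance 2*n*p*(1-p), and E|D| = sd*sqrt(2/pi).
approx = math.sqrt(2 * n * 0.25) * math.sqrt(2 / math.pi)
print(f"exact {exact:.2f} heads ({exact / n:.1%}), normal approx {approx:.2f}")
```

Both come out at roughly 3.5 heads, i.e., about 9% of 38, so a gap of that size between two tiebreakers' hit rates is what pure chance would produce.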
David Reinstein
Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
theMoMA wrote: I don't think we can work with the assumption that data trends in past quizbowl matches necessarily predict game results of future matches. I'm unconvinced that the body of quizbowl match results as a whole to this point represents the expected outcomes of matches to come, and I certainly reject outright the idea that a data set that mixes non-common- and common-opponent schedules, is heavily skewed towards sketchily edited West Coast sets, TRASH regionals, and IS set tournaments, and has a whole bevy of other problems has any useful extrapolative value whatsoever. These data stem from activities whose commonality barely extends past the use of questions and buzzers. Who says that stirring up all of these (or any other concoction) yields something that will be predictive for future quizbowl as a whole, or more importantly, any individual tournament?
First of all, if you don't like these data, get me some more to your liking. I've addressed your concerns by publishing means (now with error bounds; thanks, Brian! I was about to get on that myself...) for both isolated tiebreaking and non-isolated tiebreaking. If we get more data, I can address them further by publishing data for different kinds of situations: competition level, type of questions, etc. There really shouldn't be anything systematic that we can't deconvolve with enough data. However, the fact is, even introducing more random errors* by considering the non-starred data (or considering "skewed data," though your criticism of how they're skewed is well wide of the mark: please reconsider what sets these data are from), we should (must) converge to the correct mean with enough data; that's just statistics. This also addresses your claim that these data are useless: no data are useless, we just have to carefully consider the nature of the error we introduce and consider propagating fluctuations.
Secondly, your arguments are massively unscientific. You're just arguing from untested dogmas and saying things that, again, are not counter to what we're examining here. Again, if long-term trends are the best tiebreakers, that will (must) be borne out by the data; if it's not, then it's your dogmas that are wrong. This is what is known as science.
theMoMA wrote: I hold that there is a hefty burden that resides with those who advocate using past data to remake the tiebreaker system, and that that burden is to show that there is a predictive relationship between what has happened in the past and what will happen in the future. Unless someone can show that one tiebreaker stands above the rest regardless of the type of questions, level of competition, a team's slate of opponents, and a plethora of other variables, I don't think we can safely use this kind of data at all.
Okay. I turn that burden back on you: justify uncritically retaining the traditional tiebreaker system without an appeal to tradition itself or to unverified dogmas like "long-term trends are always best." The simple fact is you can't: all untested dogmas are of the same standing and, as you are apparently opposed to looking at actual data and/or don't have any (or are holding out on me...), that's all you can possibly bring me.
Consider, for example, that the whole impetus for this is another person's appeal to "reason" and tradition in favor of the straight head-to-head tiebreaker. Consider, further, that that same tiebreaker, for those same reasons, was widely considered "the correct ones" very recently in the college game and, further, that there's no reason it can't become so again. Evidently, you vehemently disagree with that person and with the practitioners of the college game of years past, but their arguments are just as sound as yours in the absence of data and analysis: you've all brought only your dicks to a sword fight.
MaS
*This is somewhat begging the question: Andrew evidently means to assert that competition level/question type may introduce systematic, rather than random, drifts. I don't know if I buy that, but, at the same time, it's not something I can safely dismiss out of hand. The answer is (you guessed it!) more data.
PS: Also, your argument is contradictory at least in this: You're arguing that different situations (types of questions, level of competition) may have different results for the most predictive tiebreaker. You're then arguing that everyone is therefore compelled to use the same tiebreaker in the name of reason. That does not follow.
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
Report with Random Errors*:
Starred data (sample size 20): Margin difference, 60% (10.95%); Bonus conversion difference, 55% (11.12%); Head-to-head result, 53.33% (11.15%); Head-to-head margin, 50% (11.18%); Point conversion difference, 50% (11.18%).
All data (sample size 38): Margin difference, 65.79% (7.70%); Bonus conversion difference, 60.53% (7.93%); Head-to-head result, 54.84% (8.07%); Head-to-head margin, 57.89% (8.01%); Point conversion difference, 57.89% (8.07%).
*Binomial random errors in parentheses. These should be considered lower error bounds: there are other drifts unaccounted for.
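The parenthesized figures are consistent with the standard binomial error sqrt(p(1 - p)/n). Here is a minimal sketch assuming that formula. The success counts 25/38 and 12/20 are inferred from the reported "Margin difference" percentages rather than stated in the post.

```python
import math


def binomial_standard_error(successes, trials):
    """Return (p_hat, sqrt(p_hat * (1 - p_hat) / trials))."""
    p_hat = successes / trials
    return p_hat, math.sqrt(p_hat * (1 - p_hat) / trials)


# 25 of 38 and 12 of 20 reproduce the "Margin difference" figures above.
for successes, trials in [(25, 38), (12, 20)]:
    p_hat, se = binomial_standard_error(successes, trials)
    print(f"{successes}/{trials}: {p_hat:.2%} ({se:.2%})")
```

As the footnote says, this captures only random (counting) error; systematic drifts between kinds of tournaments are not included.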
MaS
Mike Sorice
Re: Statistical Tiebreakers
You have not addressed my concerns, and your statement about error bounds reflects a fundamental misunderstanding of what I'm saying. Your error bounds are useless outside of the data themselves. You've yet to show that these data have any value outside of themselves (i.e., some kind of extraordinary power to predict future action), and until you do so, I will continue to reject what you're doing. I do hold that your data are useless, just like golf ball trajectory data are useless in determining who should win quizbowl tiebreakers. Until you show that the data are applicable to the situation at hand, I hold that we have no reason to assume that the data are valuable. When Dwight says "I argue that [feeding a bunch of data from past tournaments into a machine and coming up with a statistical tiebreaker] is independent of question quality and independent of strength of schedule," why on earth should we take him at face value? This is the major contention in using past data; you can't simply argue it away by putting "I argue" in front of an opinion.
Moreover, why would the burden be on me to get you data "to my liking"? I am the one making objections here; either find a way to counter them, find new data, or abandon your argument. Don't tell me that I have to counter my own argument for you. And stop mischaracterizing my argument. I am not opposed to looking at data, I am opposed to assuming that the data are useful in describing the situation at hand, which I find a hefty precondition to looking at the data.
I merely offer PPG and PPB as reasonable, intuitive, and positive. I am by no means saying that these are the only reasonable, intuitive, and positive tiebreakers that exist. The fact that some people see head-to-head as a legitimate tiebreaker doesn't do anything to my argument; those people can show up and convincingly justify their beliefs as such, which would only show that there can be more than one legitimate tiebreaker. Or they can be wrong. Neither of these possibilities undermines what I'm saying. I see no reason to accept the "other people believe differently and appeal to some of the same things you do, abandon your argument" argument.
It may very well be that the current mode of tiebreaking is an untested dogma, but you've got a responsibility to show that your test is actually the correct one. You haven't done anything to shift the burden back to me. Show that your data are meaningful, or be forced to submit to bottom-up instead of top-down tiebreakers.
Andrew Hart
Minnesota alum
Re: Statistical Tiebreakers
theMoMA wrote: I am not opposed to looking at data, I am opposed to assuming that the data are useful in describing the situation at hand, which I find a hefty precondition to looking at the data.
Andrew, unless I'm horribly mischaracterizing your argument, you appear to be stating that we cannot use the data that we have because it is not at all useful. Do you agree with the following method:
Hypothesis: A is a better predictor of B than C is.
Testing Hypothesis: We make two Bernoulli random variables corresponding to A → B and C → B (that is, B occurring given A, and B occurring given C). We find a bunch of situations in which A occurs, and a bunch of situations in which C occurs. In each situation, either B will occur (a 1) or B will not occur (a 0). From this, we are able to estimate the mean of these Bernoulli variables, i.e., the true probability that A → B and that C → B.
Data Analysis: We can run a one-sided z-test with H0: The true probability that B occurs given A and the true probability that B occurs given C are the same, and HA: The true probability that B occurs given A is greater than the true probability that B occurs given C.
Conclusion: If we get a p-value of less than our significance level, say 5%, then we reject H0 and claim that the true probability that B occurs given A is greater than the true probability B occurs given C. This necessarily implies that A is a better predictor of B than C is. If we get a p-value greater than our significance level, then we cannot reject H0 and we're back to "intuition" in deciding whether A or C is better.
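The test step can be sketched as a pooled two-proportion z-test, which is the usual form of the one-sided test described here. The counts below are an illustrative assumption: 25/38 and 22/38 are inferred from the "All data" margin-difference and head-to-head-margin percentages posted earlier in the thread.

```python
import math


def one_sided_two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: p1 == p2 against HA: p1 > p2.

    Returns (z, p_value), where p_value is the upper-tail normal probability.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal Z
    return z, p_value


# Counts inferred from the posted percentages (illustrative only).
z, p = one_sided_two_proportion_z(25, 38, 22, 38)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {p < alpha}")
```

With these counts the p-value is roughly 0.24, so H0 is not rejected at 5%. One design caveat: when every tiebreaker is scored on the same set of rematches, the two proportions are not independent, and a paired test such as McNemar's (which looks only at rematches where the two tiebreakers disagree) would arguably fit that data better than this independent-samples version.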
If you do not agree, tell me where there is a problem with this setup. If you do agree, tell me where I can find data that might be more "useful," or prove to me that no such data exists. Unless I'm terribly mischaracterizing your argument (and I think I am), you seem to be implying that the only useful data is future data, i.e., data that we don't have (and once we do have it it'll be invalid because it's now past data).
As Mike said, there may be some systematic drift between different types of questions, or between different records, and he's entirely right when he says that we can check this if there's enough data (using the method outlined above).
The only argument that I think you can really be making is that the data has not been randomly selected. I will agree with you there, because we don't have data from other circuits. From our small sample of data, we are making a generalization about the population of rematches between teams of the same record on the same packet set. If there is a systemic reason why we should not include "old" or "poorly edited" tournaments in our sample, other than that it might skew the data one way or another (which, as Mike said, we can deconvolve with enough data), then you need to explain to me what it is, because you haven't done that yet. Performance on 1994 Wahoo Wars is probably not well predictive of performance on 2007 ACF Regionals, but we are comparing data from within tournaments (and their fields), not between tournaments (and their fields). That is, we are not taking data from 1994 Wahoo Wars and extrapolating to 2007 ACF Regionals. We are taking data from Wahoo Wars and comparing it to other data from Wahoo Wars, and doing the same thing with ACF Regionals. As long as the match passes our exclusionary criteria (e.g., we need rematches so the teams are theoretically at the same level at which they played the last time, although this assumption does not always hold; furthermore, we need matches between teams of the same record because W-L record is probably the best predictor of who will win a given match), it should be included in the data set. You appear to be arguing that we need additional exclusionary criteria: please elucidate what exactly these criteria should be.
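The exclusionary criteria above can be applied mechanically by walking through a tournament in round order, so that each prediction uses only the stats available at the instant the rematch is played. This is a minimal sketch, assuming a hypothetical game log of `(team_a, score_a, team_b, score_b)` tuples and using PPG as the candidate stat; the function name and the toy games are invented for illustration, and any other stat could be swapped in.

```python
from collections import defaultdict


def rematch_predictions(games):
    """Walk one tournament's games in order and score PPG as a predictor.

    `games` is a hypothetical list of (team_a, score_a, team_b, score_b)
    tuples in the order played. A game is kept only if
    (1) the two teams have already met earlier in the tournament, and
    (2) they enter the game with the same W-L record.
    Returns a list of booleans: did the team with higher prior PPG win?
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    points = defaultdict(int)
    played = defaultdict(int)
    met = set()
    results = []
    for team_a, score_a, team_b, score_b in games:
        pair = frozenset((team_a, team_b))
        same_record = (wins[team_a], losses[team_a]) == (wins[team_b], losses[team_b])
        if pair in met and same_record and score_a != score_b:
            # Both teams have played at least once, so PPG is defined.
            ppg_a = points[team_a] / played[team_a]
            ppg_b = points[team_b] / played[team_b]
            if ppg_a != ppg_b:  # a tied stat makes no prediction
                results.append((ppg_a > ppg_b) == (score_a > score_b))
        # Update running totals only after the prediction has been made.
        met.add(pair)
        points[team_a] += score_a
        points[team_b] += score_b
        played[team_a] += 1
        played[team_b] += 1
        if score_a > score_b:
            wins[team_a] += 1
            losses[team_b] += 1
        elif score_b > score_a:
            wins[team_b] += 1
            losses[team_a] += 1
    return results


# Invented toy tournament: only A vs. B is a same-record rematch;
# the C vs. D rematch is excluded because C is 2-0 and D is 0-2.
games = [
    ("A", 300, "B", 200),
    ("C", 250, "D", 150),
    ("A", 180, "C", 220),
    ("B", 260, "D", 240),
    ("A", 210, "B", 230),
    ("C", 270, "D", 200),
]
print(rematch_predictions(games))
```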
If there was some procedural change (for instance, if the halftime Whack-a-Mole game was played until 2002, then discontinued), then tiebreakers affected by that change would no longer be valid (we can't use Whack-a-Mole to predict tiebreakers because the probability that a team will win a tiebreaker given a Whack-a-Mole win is 0, since there is no chance a team will actually win the Whack-a-Mole game). The only "changes" that have really occurred in the past decade are that questions are almost uniformly longer and relatively easier. Neither of these is a systemic change that prevents us from computing a meaningful bonus conversion statistic, for instance.
Dwight Wynne
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
Okay, Andrew. I say that I have, in fact, understood and addressed your concerns. I will now try to do so a second time. If you see any objection of yours that isn't addressed, I invite you to point out what isn't addressed and how.
I hold as an axiom that past results are the best (and only reasonable) predictor of future results available. This is the farthest thing possible from "extraordinary power." I seek only to use the very basic predictive power of statistics. If you can see a better predictor than past performance, you're welcome to disclose what it is, but the claim I'm making here is hardly odd or extraordinary.
However, the fact that you continue to denigrate even the principle that future results can be predicted by past data leads me to believe that it is you who lack understanding in this case. Therefore, let me take your argument to its logical conclusion: if past results have negligible predictive power, one cannot justly have resort to any tiebreaker whatsoever because, as they're all based on past results of some kind, they're all inherently unfair and baseless fiat judgments that a TD foists on their field. Now, I don't believe that, and your argument about the reasonableness of traditional tiebreakers leads me to conclude that you don't believe that, either. This is, in fact, a major contradiction in what you're saying and gives you every reason to abandon your argument.
Now, then, I've said already and say again that you make a valid criticism by saying that differing types of questions or levels of competition may introduce systematic drifts in the data that we have no good way of compensating for. For a second time, I accept that that may be, though I have my doubts. That means that I can only publish what you will see as lower bounds on the error, for now.
However, I addressed that and address that again by saying that, with further data, we can observe what these drifts are by deconvolving whatever trends you like. Therefore, your criticism (to the extent that it is valid) is one of the data, not of the method per se. Given sufficient data, this method will observe which is the best tiebreaker in any situation that occurs frequently enough. But nobody, me least of all, has ever said that this method will work well with a paucity of data or with only certain kinds of data: in fact, I am saying and have always said the exact opposite of that.
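In its simplest form, separating out systematic drifts means computing the same hit rate separately for each kind of situation and checking whether the groups disagree by more than their error bars. Here is a minimal sketch of that stratification, assuming each backtested rematch has been tagged with a hypothetical situation label. This plain grouping is a stand-in for illustration, not a full deconvolution, and the toy labels and outcomes are invented.

```python
import math
from collections import defaultdict


def accuracy_by_situation(results):
    """Split tiebreaker outcomes by situation and summarize each group.

    `results` is a hypothetical list of (situation, correct) pairs, where
    `situation` is any label (question type, level of competition, ...)
    and `correct` is True if the tiebreaker picked the rematch winner.
    Returns {situation: (accuracy, binomial standard error, sample size)}.
    """
    groups = defaultdict(list)
    for situation, correct in results:
        groups[situation].append(correct)
    summary = {}
    for situation, outcomes in groups.items():
        n = len(outcomes)
        p_hat = sum(outcomes) / n
        summary[situation] = (p_hat, math.sqrt(p_hat * (1 - p_hat) / n), n)
    return summary


# Invented toy outcomes for illustration only.
toy = [("ACF", True), ("ACF", True), ("ACF", False), ("TRASH", False), ("TRASH", True)]
for situation, (acc, se, n) in sorted(accuracy_by_situation(toy).items()):
    print(f"{situation}: {acc:.1%} +/- {se:.1%} (n = {n})")
```

Groups with only a handful of rematches will have error bars too wide to show anything, which is why more data is needed before any per-situation conclusion can be drawn.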
However, if you cleave to this criticism and want to convince me of it, it is incumbent on you to demonstrate it. Find data for a situation of import (well-edited sets or top-flight teams or whatever) and show me that my results are badly different from your results for those. If you can't or won't do this, your criticism is in the realm of conjecture and my (or anyone else's) counter-conjecture is equally valid.
Now, I'll note for a second time that your argument that different tiebreakers may be more predictive in different situations directly contradicts your contention that we should just use PPG or PPB in all cases. Your argument, in fact, dictates that, if we would be fair, we must use the correct tiebreaker for the situation. That is a second major contradiction in what you're saying and, again, gives you every reason to abandon this argument.
Now, if you understand what I've said, you understand that I'm not assuming that every datum is equally valid in every situation. In fact, I'm saying quite the opposite of that: I'm saying that we are compelled to examine different situations to determine if the most predictive tiebreaker may be different in different cases. So, if you're not opposed to examining the data and drawing conclusions, you have no further issue with what we're doing here. However, you claim to understand what I'm saying and yet continue to oppose it. That is a third major contradiction in what you're saying and, again, gives you every reason to abandon this argument.
You say that other tiebreakers may be just as good as the ones you propose, even by your own standards. Then, I ask you: on what basis do you propose the ones you do and not others? The fact that the exact same argument you're making can be used to justify different conclusions (by your own admission!) formally indicates that your conclusion does not follow from your argument. That is a fourth major contradiction in what you're saying and, again, gives you every reason to abandon this argument.
In closing, I'll note that you're right that the responsibility is on me to show that my test is valid. Fair enough: I take as an axiom that, if we're fair, we are compelled to select the tiebreakers that would best predict the outcome of an actual match, since we would presumably play the match to break the tie if we could. However, I say that what is above shows precisely that, given enough data, this test will indicate which stats are most predictive of winning in any situations that you like.
However, nothing substantive above is new; it is rather what I've been saying all along. Therefore, I claim that, if you have not heretofore understood that my proposed test is valid (or, indeed, if you don't understand that now), it is not because of my failure to demonstrate that it's so, but rather your failure to understand the principles of my arguments. I invite you to demonstrate that this is not so if you can.
MaS
I hold as an axiom that past results are the best (and only reasonable) predictor of future results available. This is the farthest thing possible from "extraordinary power." I seek only to use the very basic predictive power of statistics. If you can see a better predictor than past performance, you're welcome to disclose what it is, but the claim I'm making here is hardly odd or extraordinary.
However, the fact that you continue to denigrate even the principle that future results can be predicted by past data leads me to believe that it is you who lack understanding in this case. Therefore, let me take your argument to its logical conclusion: if past results have negligible predictive power, one cannot justly have resort to any tiebreaker whatsoever because, as they're all based on past results of some kind, they're all inherently unfair and baseless fiat judgments that a TD foists on their field. Now, I don't believe that, and your argument about the reasonableness of traditional tiebreakers leads me to conclude that you don't believe that, either. This is, in fact, a major contradiction in what you're saying and gives you every reason to abandon your argument.
Now, then, I've said already and say again that you make a valid criticism by saying that differing types of questions or levels of competition may introduce systematic drifts in the data that we have no good way of compensating for. For a second time, I accept that that may be, though I have my doubts. That means that I can only publish what you will see as lower bounds on the error, for now.
However, I addressed that and address that again by saying that, with further data, we can observe what these drifts are by deconvolving whatever trends you like. Therefore, your criticism (to the extent that it is valid) is one of the data, not of the method per se. Given sufficient data, this method will observe which is the best tiebreaker in any situation that occurs frequently enough. But nobody, me least of all, has ever said that this method will work well with a paucity of data or with only certain kinds of data: in fact, I am saying and have always said the exact opposite of that.
However, if you cleave to this criticism and want to convince me of it, it is incumbent on you to demonstrate it. Find data for a situation of import (welledited sets or topflight teams or whatever) and show me that my results are badly different from your results for those. If you can't or won't do this, your criticism is in the realm of conjecture and my (or anyone else's) counterconjecture is equally valid.
Now, I'll note for a second time that your argument that different tiebreakers may be more predictive in different situations directly contradicts your contention that we should just use PPG or PPB in all cases. Your argument, in fact, dictates that, if we would be fair, we must use the correct tiebreaker for the situation. That is a second major contradiction in what your saying and, again, gives you every reason to abandon this argument.
Now, if you understand what I've said, you understand that I'm not assuming that every datum is equally valid in every situation. In fact, I'm saying quite the opposite of that: I'm saying that we are compelled to examine different situations to determine if the most predictive tiebreaker may be different in different cases. So, if you're not opposed to examining the data and drawing conclusions, you have no further issue with what we're doing here. However, you claim to understand what I'm saying and yet continue to oppose it. That is a third major contradiction in what your saying and, again, gives you every reason to abandon this argument.
You say that other tiebreakers may be just as good as the ones you propose, even by your own standards. Then, I ask you: on what basis do you propose the ones you do and not others? The fact that the exact same argument you're making can be used to justify different conclusions (by your own admonition!) formally indicates that your conclusion does not follow from your argument. That is a fourth major contradiction in what your saying and, again, gives you every reason to abandon this argument.
In closing, I'll note that you're right that the responsibility is on me to show that my test is valid. Fair enough: I take as an axiom that, if we're fair, we are compelled to select the tiebreakers that would best predict the outcome of an actual match, since we would presumably play the match to break the tie if we could. However, I say that what is above shows precisely that, given enough data, this test will indicate which stats are most predictive of winning in any situation you like.
However, nothing substantive above is new; it is rather what I've been saying all along. Therefore, I claim that, if you have not heretofore understood that my proposed test is valid (or, indeed, if you don't understand that now), it is not because of my failure to demonstrate that it's so, but rather your failure to understand the principles of my arguments. I invite you to demonstrate that this is not so if you can.
MaS
theMoMA wrote: You have not addressed my concerns, and your statement about error bounds reflects a fundamental misunderstanding of what I'm saying. Your error bounds are useless outside of the data themselves. You've yet to show that these data have any value outside of themselves (i.e., some kind of extraordinary power to predict future action), and until you do so, I will continue to reject what you're doing. I do hold that your data are useless, just like golf ball trajectory data are useless in determining who should win quizbowl tiebreakers. Until you show that the data are applicable to the situation at hand, I hold that we have no reason to assume that the data are valuable. When Dwight says "I argue that [feeding a bunch of data from past tournaments into a machine and coming up with a statistical tiebreaker] is independent of question quality and independent of strength of schedule," why on earth should we take him at face value? This is the major contention in using past data; you can't simply argue it away by putting "I argue" in front of an opinion.
Moreover, why would the burden be on me to get you data "to my liking"? I am the one making objections here; either find a way to counter them, find new data, or abandon your argument. Don't tell me that I have to counter my own argument for you. And stop mischaracterizing my argument. I am not opposed to looking at data, I am opposed to assuming that the data are useful in describing the situation at hand, which I find a hefty precondition to looking at the data.
I merely offer PPG and PPB as reasonable, intuitive, and positive. I am by no means saying that these are the only reasonable, intuitive, and positive tiebreakers that exist. The fact that some people see head-to-head as a legitimate tiebreaker doesn't do anything to my argument; those people can show up and convincingly justify their beliefs as such, which would only show that there can be more than one legitimate tiebreaker. Or they can be wrong. Neither of these possibilities undermines what I'm saying. I see no reason to accept the "other people believe differently and appeal to some of the same things you do, abandon your argument" argument.
It may very well be that the current mode of tiebreaking is an untested dogma, but you've got a responsibility to show that your test is actually the correct one. You haven't done anything to shift the burden back to me. Show that your data are meaningful, or be forced to submit to bottom-up instead of top-down tiebreakers.
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
Re: Statistical Tiebreakers
While I'm at it, I'll continue to disregard Andrew's remarks and post some more useless data:
2008 ACF Fall North, Round 14: Minnesota C vs Eden Prairie. Both teams 4-8*. Minnesota C held h2h, h2h differential, ppb. EPHS held ppg, point differential. EPHS def Minnesota C 325-140.
2008 ACF Fall North, Round 15: Minnesota A vs Chicago C. Both teams 12-1*. Minnesota A held all tiebreakers except h2h, which was split. Minnesota A def Chicago C 350-220.
2008 VCU Novice, Round 12: Broccoli Forest vs Jonathan Hoag. Both teams 5-5*. Jonathan Hoag held all tiebreakers (ppb by only about 0.15 ppb) and won 405-205.
2008 VCU Novice, Round 12: Lampoon vs Streetcar. Both teams 7-3*. Lampoon held all tiebreakers. Streetcar won 310-220.
2008 FEUERBACH South, Round 11: VCU vs Clemson. Both teams 7-3*. Clemson held ppb and h2h differential, narrowly held point differential (by 4.5 ppg). VCU narrowly held ppg (by 8 ppg). h2h was split. Clemson won 195-125.
2008 FEUERBACH South, Round 9: VCU vs Clemson. Both teams 6-2*. Clemson held ppb and h2h differential, narrowly held point differential (by about 2 ppg). VCU held ppg (this time by about 12 ppg). h2h was split. VCU won 190-170.
EDIT: Three more
2008 MUT, Round 13: Drake vs Illinois A. Both teams 10-1*. Drake held h2h differential, ppb. Illinois A held point differential, narrowly held ppg (by about 5 ppg). Drake def Illinois A 430-60.
2008 MUT, Round 9: Minnesota A vs Armageddon. Both teams 4-3. Minnesota A held h2h, h2h differential. Armageddon held ppg, ppb, point differential. Armageddon won 395-130.
2008 MCMNT, Round 10: Lawrence vs Chicago Police Cops. Both teams 6-2*. Lawrence held all tiebreakers and won 310-155.
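For anyone who wants to follow the original suggestion in this thread with these nine games, here is a minimal Python sketch of the tally. Each game records whether the eventual winner held each tiebreaker (True), the loser held it (False), or it was split or not reported (None). The encoding is an editorial reading of the post, not Dwight's own: "held all tiebreakers" is taken to mean all five listed tiebreakers, h2h is treated as not reported in the Drake game, and the unstarred Minnesota A vs Armageddon game is included. The dictionary keys and the function name are illustrative.

```python
# Tiebreakers mentioned in the post.
TIEBREAKERS = ("h2h", "h2h_diff", "ppg", "ppb", "point_diff")

# True  = the eventual winner held the tiebreaker going into the game,
# False = the eventual loser held it,
# None  = split, or not reported in the post.
# This encoding is an editorial reading of the descriptions above.
GAMES = {
    "ACF Fall North R14: Minnesota C vs Eden Prairie":
        dict(h2h=False, h2h_diff=False, ppg=True, ppb=False, point_diff=True),
    "ACF Fall North R15: Minnesota A vs Chicago C":
        dict(h2h=None, h2h_diff=True, ppg=True, ppb=True, point_diff=True),
    "VCU Novice R12: Broccoli Forest vs Jonathan Hoag":
        dict(h2h=True, h2h_diff=True, ppg=True, ppb=True, point_diff=True),
    "VCU Novice R12: Lampoon vs Streetcar":
        dict(h2h=False, h2h_diff=False, ppg=False, ppb=False, point_diff=False),
    "FEUERBACH South R11: VCU vs Clemson":
        dict(h2h=None, h2h_diff=True, ppg=False, ppb=True, point_diff=True),
    "FEUERBACH South R9: VCU vs Clemson":
        dict(h2h=None, h2h_diff=False, ppg=True, ppb=False, point_diff=False),
    "MUT R13: Drake vs Illinois A":
        dict(h2h=None, h2h_diff=True, ppg=False, ppb=True, point_diff=False),
    "MUT R9: Minnesota A vs Armageddon":
        dict(h2h=False, h2h_diff=False, ppg=True, ppb=True, point_diff=True),
    "MCMNT R10: Lawrence vs Chicago Police Cops":
        dict(h2h=True, h2h_diff=True, ppg=True, ppb=True, point_diff=True),
}


def tally(games):
    """Return {tiebreaker: (correct, decided)}, skipping games where it is None."""
    counts = {}
    for tb in TIEBREAKERS:
        calls = [game[tb] for game in games.values() if game[tb] is not None]
        counts[tb] = (sum(calls), len(calls))  # True counts as 1
    return counts


if __name__ == "__main__":
    for tb, (correct, decided) in tally(GAMES).items():
        print(f"{tb}: {correct}/{decided}")
```

Under that reading, the script prints h2h 2/5, h2h_diff 5/9, and 6/9 for each of ppg, ppb, and point_diff.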
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." - Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." - Jerry

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
I've made a bit of progress on this.
I've written a script that scans the "*_games.html" SQBS files and finds cases where 2 teams face each other multiple times. It then finds all the opponents that those 2 teams have in common, and keeps track of both teams' stats for the games they play against common opponents (as well as for their head-to-head matchups). So, for each of the two teams, I keep stats for N games (N-2 games versus common opponents, and 2 head-to-head matchups). I require that the two teams have identical records in their first N-1 games, and I require N > 5, so that the teams have at least a reasonable number of common opponents. If all these requirements are met, I calculate the teams' stats for those N-1 games (ppg, ppb, point differential, head-to-head), and see how well those stats predict which team wins their second head-to-head match.
This is essentially equivalent to Dwight's *'d data points, except I'm loosening the requirement on what rounds the teams play their opponents. For example, let's say Teams A and B play 8 rounds, with the following opponents:
Team A plays [B,C,D,E,F,G,B,H]
Team B plays [A,E,C,H,D,F,A,K]
In this case, instead of using the first 6 rounds for comparison (where the teams don't play all common opponents), I can look at rounds [1,2,3,4,5,8] for Team A, and rounds [1,2,3,4,5,6] for Team B. In those rounds, both A and B play each other once, as well as play C,D,E,F, and H. Assuming A and B both have identical records in those 6 games, we can look at their stats in those games, and see how they predict who wins their second head-to-head matchup (in round 7).
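The round selection in this example can be sketched in Python as below. This is a guess at the logic described above, not Brian's actual script: the function names are hypothetical, rounds are 1-indexed, and it assumes each common opponent is played once by each team. The comparison games are the first head-to-head meeting plus every game against a common opponent; the second meeting is the game to be predicted. The eligibility check applies the two filters from the description (identical records in the comparison games, and N > 5, where N counts the comparison games plus the second meeting).

```python
def select_rounds(schedule_a, schedule_b, team_a, team_b):
    """Return (rounds_a, rounds_b, target_round), all 1-indexed.

    Comparison rounds: the first meeting between the two teams plus every
    game against an opponent that both teams played. The second meeting
    is the game whose winner the stats should predict.
    """
    meetings = [r for r, opp in enumerate(schedule_a, start=1) if opp == team_b]
    if len(meetings) < 2:
        raise ValueError("the two teams must meet at least twice")
    first_meeting, target_round = meetings[0], meetings[1]
    common = (set(schedule_a) & set(schedule_b)) - {team_a, team_b}

    def comparison_rounds(schedule):
        return [r for r, opp in enumerate(schedule, start=1)
                if r == first_meeting or opp in common]

    return comparison_rounds(schedule_a), comparison_rounds(schedule_b), target_round


def eligible(rounds_a, rounds_b, won_a, won_b):
    """Apply the two filters: identical records in the comparison games, and N > 5.

    won_a / won_b map a 1-indexed round to True if that team won it.
    N counts the comparison games plus the second meeting.
    """
    if len(rounds_a) != len(rounds_b):
        return False
    n = len(rounds_a) + 1
    wins_a = sum(won_a[r] for r in rounds_a)
    wins_b = sum(won_b[r] for r in rounds_b)
    return n > 5 and wins_a == wins_b


if __name__ == "__main__":
    # The example schedules from the post (8 rounds each).
    print(select_rounds(list("BCDEFGBH"), list("AECHDFAK"), "A", "B"))
```

Run on the example schedules, `select_rounds` returns `([1, 2, 3, 4, 5, 8], [1, 2, 3, 4, 5, 6], 7)`, matching the rounds chosen above.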
I've applied this script to about a year's worth of tournaments, looking for all the results I could find for the last year. This resulted in 54 data points. Here are the results:
PPG: 0.7222 +/- 0.0610
PPG Differential: 0.6852 +/- 0.0632
Bonus Conversion: 0.7593 +/- 0.0582
Head to Head: 0.5000 +/- 0.0680
Here we're starting to see fairly significant differences between head-to-head and the other stats. We'll need a lot more data to distinguish between PPG, PPG differential, and bonus conversion, but in any case, it looks unlikely that there's a large difference between those three statistics.
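The +/- values above are consistent with the ordinary binomial standard error, sqrt(p(1 - p)/n); that is an inference from the numbers rather than something stated in the post. For instance, 39 correct picks out of 54 gives p = 0.7222 and sqrt(0.7222 * 0.2778 / 54) ≈ 0.0610. A short Python check, with the counts back-calculated from the posted proportions:

```python
import math


def proportion_with_se(correct, n):
    """Return the fraction of correct predictions and its binomial standard error."""
    p = correct / n
    return p, math.sqrt(p * (1 - p) / n)


# Correct picks out of 54, back-calculated from the posted proportions
# (for example, 0.7222 * 54 = 39). Not taken directly from the post.
COUNTS_54 = {"PPG": 39, "PPG Differential": 37, "Bonus Conversion": 41, "Head to Head": 27}

if __name__ == "__main__":
    for name, correct in COUNTS_54.items():
        p, se = proportion_with_se(correct, 54)
        print(f"{name}: {p:.4f} +/- {se:.4f}")
```

The same function applied to 53 or 106 data points gives the later tables in this thread in the same way.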
Brian
Stanford University
 Captain Sinico
 Auron
 Posts: 2838
 Joined: Sun Sep 21, 2003 1:46 pm
 Location: Champaign, Illinois
Re: Statistical Tiebreakers
Awesome! I'm glad there are people so much better at mining these data than I am. So, it seems what we need now are more SQBS files, then? How hard would it be to make splits for, like, record or tournament type using your script?
MaS
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
ACF
IHSSBCA
PACE
Re: Statistical Tiebreakers
Schweizerkas wrote: I've made a bit of progress on this.
Wow, excellent. Well done.
Greg Peterson
Northwestern University '18
Lawrence University '11
Maine South HS '07
"a decent player" - Mike Cheyne
Re: Statistical Tiebreakers
Brian,
That looks awesome. If you need more data, you can try NAQT's database. You'd probably have to rework your script and make sure you don't get repeats, but I'm guessing that NAQT has some statistics that aren't available elsewhere. Of course, you could also try going back to 2006-07 or 2005-06.
I've taken your data and really quickly run it through the 2-PropZTest function on my trusty TI-83+. I get the following p-values (I've defined the following: H0: p1 = p2; Ha: p1 > p2):
p1 = BC, p2 = PPG: 0.330
p1 = BC, p2 = PPGDiff: 0.195
p1 = BC, p2 = H2H: 0.002**
p1 = PPG, p2 = PPGDiff: 0.337
p1 = PPG, p2 = H2H: 0.009**
p1 = PPGDiff, p2 = H2H: 0.025*
*Significant at the 5% significance level
**Significant at the 1% significance level
Given those data, I think we can safely conclude that head-to-head is the weakest of the four considered tiebreakers.
For those of you who haven't taken a statistics class, or don't remember anything from it, I define a null hypothesis that the true percentage of games accurately predicted by one tiebreaker is the same as the true percentage of games accurately predicted by a different tiebreaker, and an alternative hypothesis that the true percentage of games accurately predicted by one tiebreaker is greater than the true percentage of games accurately predicted by the other. I plug the data into a fancy mathematical formula to get a z-score, which I can turn into a p-value. If my p-value is less than my significance level, I reject my null hypothesis (and am forced to accept my alternative hypothesis, assuming I've defined my hypotheses correctly); otherwise I cannot reject the null hypothesis (and thus I must continue to assume that one tiebreaker is not better than the other).
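For a two-proportion z-test, the "fancy mathematical formula" is usually the pooled form: p̂ = (x1 + x2)/(n1 + n2), z = (p1 - p2) / sqrt(p̂(1 - p̂)(1/n1 + 1/n2)), and the one-sided p-value is P(Z > z). Below is a minimal Python sketch, assuming this pooled form is what the TI-83+ 2-PropZTest computes, with counts back-calculated from Brian's proportions (41, 39, 37, and 27 correct out of 54 for BC, PPG, PPGDiff, and H2H). Like the calculator test, it treats the two proportions as independent samples even though both come from the same matchups. Its p-values agree with the posted ones to within about 0.001.

```python
import math


def two_prop_z_test(x1, n1, x2, n2):
    """One-sided two-proportion z-test of H0: p1 = p2 against Ha: p1 > p2.

    Uses the pooled proportion in the standard error and returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability P(Z > z) for a standard normal Z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Correct picks out of 54, back-calculated from Brian's proportions.
CORRECT = {"BC": 41, "PPG": 39, "PPGDiff": 37, "H2H": 27}

if __name__ == "__main__":
    pairs = [("BC", "PPG"), ("BC", "PPGDiff"), ("BC", "H2H"),
             ("PPG", "PPGDiff"), ("PPG", "H2H"), ("PPGDiff", "H2H")]
    for first, second in pairs:
        z, p = two_prop_z_test(CORRECT[first], 54, CORRECT[second], 54)
        print(f"p1 = {first}, p2 = {second}: z = {z:.3f}, p = {p:.3f}")
```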
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." - Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." - Jerry

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
Dwight, that looks really nice. I didn't realize TI calculators had that kind of capability built in. The p-value looks like exactly the thing we want to be looking at.
I'll see if I can make a Google Docs spreadsheet available with all the numbers for the 54 datapoints, so you can play around with the numbers yourself.
Captain Scipio wrote: How hard would it be to make splits for, like, record or tournament type using your script?
Splitting up by tournament type is just a matter of sorting the SQBS files by hand into different categories (most of the tournaments I have are college ACF-style, but there are some NAQT and trash tournaments as well), so that shouldn't be very hard. What exactly do you mean by "splitting by record"? I think it should be relatively easy to make splits based on any category you can imagine. One idea I had was to look at how predictive the stats are as a function of the stat difference between the two teams. So, for example, instead of just asking, "how often does the team with the higher PPB win the second H2H matchup?", we can ask, "how often does a team with a 1 (or 2, or 3, etc.) PPB advantage win the second H2H matchup?" This way, we can find out whether a 1 PPB advantage is more or less significant than, e.g., a 20 PPG advantage.
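A sketch of the margin-binned idea in Python: each matchup is reduced to the size of the stat advantage and whether the team holding it won the second head-to-head game, and matchups are grouped into bins of fixed width. The function name and input format are hypothetical, and the inputs in the usage example are made up for illustration, not taken from Brian's spreadsheet.

```python
import math
from collections import defaultdict


def win_rate_by_margin(points, bin_width):
    """Group matchups by the size of a stat advantage.

    points: iterable of (margin, leader_won) pairs, where margin is how far
            ahead the leading team was in the stat (e.g. PPB) and leader_won
            says whether that team won the second head-to-head game.
    Returns {bin_start: (leader_wins, games)}, sorted by bin_start.
    """
    bins = defaultdict(lambda: [0, 0])
    for margin, leader_won in points:
        start = math.floor(abs(margin) / bin_width) * bin_width
        bins[start][0] += int(leader_won)
        bins[start][1] += 1
    return {start: tuple(counts) for start, counts in sorted(bins.items())}
```

For example, with made-up inputs, `win_rate_by_margin([(0.15, True), (0.8, False), (1.3, True), (1.9, True), (2.7, True)], bin_width=1)` returns `{0: (1, 2), 1: (2, 2), 2: (1, 1)}`: in the 0-1 PPB bin the leader won one of two games, and so on. Comparing a PPB bin's win rate with a PPG bin's win rate is one way to ask whether a 1 PPB edge is worth more or less than a 20 PPG edge.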
Brian
Stanford University

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
Okay, I have a spreadsheet with all the datapoints available here.
Also, I noticed that one of my datapoints accidentally appeared twice, because I had two copies of the same tournament in my directory. I removed the extra datapoint, and here are the new numbers (based on 53 points):
PPG: 0.7170 +/- 0.0619
PPG Differential: 0.6792 +/- 0.0641
Bonus Conversion: 0.7547 +/- 0.0591
Head to Head: 0.5094 +/- 0.0687
Brian
Stanford University
Re: Statistical Tiebreakers
Dwight, not to be a stats nitpicker, but you probably should be using a t-test here.
Christian Carter
Minneapolis South High School '09 - Emerson College '13
PACE Member (retired)
 Deviant Insider
 Auron
 Posts: 4486
 Joined: Sun Jun 13, 2004 6:08 am
 Location: Chicagoland
Re: Statistical Tiebreakers
If two people each flip a coin 53 times, the expected size of the difference in their number of heads is a little over 4, which is approximately the difference in the number of successful picks between PPG, PPG Differential, and Bonus Conversion. P-values around 0.3 should not be used to draw any conclusions other than that more research is necessary. (I'm not contradicting anybody; I'm just making the statistical uncertainties more explicit in case anybody reading this thread thinks it's a good idea to draw conclusions at this point.)
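The "a little over 4" figure can be checked exactly. If X and Y are independent Binomial(53, 1/2) head counts, then X + (53 - Y) is Binomial(106, 1/2), so |X - Y| has the same distribution as |W - 53| for W ~ Binomial(106, 1/2). For comparison, the 53-point figures correspond to 40, 38, and 36 correct picks for Bonus Conversion, PPG, and PPG Differential (back-calculated from the posted proportions). A short Python sketch of the exact expectation (the function name is illustrative):

```python
from fractions import Fraction
from math import comb


def expected_abs_difference(flips):
    """Exact E|X - Y| for two independent fair-coin head counts over `flips` tosses each.

    Uses X - Y = W - flips, where W = X + (flips - Y) ~ Binomial(2 * flips, 1/2).
    """
    total = 2 * flips
    numerator = sum(comb(total, w) * abs(w - flips) for w in range(total + 1))
    return Fraction(numerator, 2 ** total)


if __name__ == "__main__":
    print(float(expected_abs_difference(53)))  # roughly 4.1
```

The exact value is about 4.10; the normal approximation sqrt(2/π) * sqrt(26.5) ≈ 4.11 is close.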
David Reinstein
Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
Re: Statistical Tiebreakers
cdcarter wrote: Dwight, not to be a stats nitpicker, but you probably should be using a t-test here.
The t-test is used for sample means. The z-test is used for sample proportions. We're comparing proportions, not means. Really, the only criticisms that you can make are:
1. The samples were not selected randomly or independently
2. There is a hidden variable that is causing the difference in data, and so the null and alternate hypotheses are invalid
BTW, for the "new" data set:
p1 = BC, p2 = PPG: .330
p1 = BC, p2 = PPGdiff: .194
p1 = BC, p2 = H2H: .004**
p1 = PPG, p2 = PPGdiff: .336
p1 = PPG, p2 = H2H: .014*
p1 = PPGdiff, p2 = H2H: .037*
*significant at the 5% significance level
**significant at the 1% significance level
Really, not a huge difference, except that the PPG vs H2H is no longer significant at the 1% level.
Dwight Wynne
socalquizbowl.org
UC Irvine 2008-2013; UCLA 2004-2007; Capistrano Valley High School 2000-2003
"It's a competition, but it's not a sport. On a scale, if football is a 10, then rowing would be a two. One would be Quiz Bowl." - Matt Birk on rowing, SI On Campus, 10/21/03
"If you were my teammate, I would have tossed your ass out the door so fast you'd be emitting Cerenkov radiation, but I'm not classy like Dwight." - Jerry
Re: Statistical Tiebreakers
cvdwightw wrote:
cdcarter wrote: Dwight, not to be a stats nitpicker, but you probably should be using a t-test here.
The t-test is used for sample means. The z-test is used for sample proportions. We're comparing proportions, not means.
Oh, these totally are proportions... I was thinking you were doing means with a z-test, which can be done but is, like... bad. I should read.
Christian Carter
Minneapolis South High School '09 - Emerson College '13
PACE Member (retired)

 Lulu
 Posts: 83
 Joined: Sat May 12, 2007 1:01 am
 Location: Stanford, CA
Re: Statistical Tiebreakers
Added some more tournament data. I now have all college tournaments from 2008 and the second half of 2007. The new numbers, with 106 data points:
Bonus Conversion: 0.6981 +/- 0.0446
PPG: 0.6792 +/- 0.0453
PPG Differential: 0.6415 +/- 0.0466
Head to Head: 0.4906 +/- 0.0486
Brian
Stanford University