Introducing BPA, a new evaluation metric using detailed stats
- ryanrosenberg
- Auron
- Posts: 1891
- Joined: Thu May 05, 2011 5:48 pm
- Location: Palo Alto, California
Introducing BPA, a new evaluation metric using detailed stats
What is BPA?
BPA stands for Buzz Point AUC (area under the curve). It is the total area under the curve of [% of tossups gotten successfully] against [% of question elapsed].
The theoretical maximum is 100 (i.e., if all tossups were gotten near-instantly); however, top players will generally get somewhere between 10 and 15 at regular difficulty, which corresponds to preventing roughly 10 to 15% of the total words in the tournament's tossups from being read by getting the question. Top teams will generally get around 20-25 at regular difficulty. As an illustration, below is Chris Ray's buzz point graph from 2018 ACF Regionals.
BPA can be calculated for any tournament that records buzz points.
How do I calculate BPA?
BPA is actually pretty easy to calculate, especially for an individual player. The screenshot below shows an example of calculating conversion percent at each buzz point (max gets is the number of possible tossups heard, so games played times 20), and BPA is simply the sum of column F (over all buzz points 0.01 to 1).
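For readers who prefer code to a spreadsheet, here is a minimal sketch of the same calculation in base R. The function name `bpa_from_buzzes` and the example buzzes are made up for illustration; the sketch assumes buzz locations are given as the fraction of the question elapsed, that only correct buzzes are passed in, and that each game has 20 tossups.

```r
# Hypothetical helper: BPA for one player from the locations of correct buzzes.
# buzz_locations: fraction of each tossup elapsed at the buzz (0.01 to 1).
# games_played:   number of games; max gets is assumed to be 20 tossups per game.
bpa_from_buzzes <- function(buzz_locations, games_played, tossups_per_game = 20) {
  max_gets <- games_played * tossups_per_game
  # Work in whole percentage points (1..100) to avoid floating-point mismatches.
  buzz_pct <- round(buzz_locations * 100)
  grid <- 1:100
  # Cumulative gets at each buzz point: correct buzzes at or before that point.
  cum_gets <- sapply(grid, function(p) sum(buzz_pct <= p))
  conv_pct <- cum_gets / max_gets
  # BPA is the sum of the cumulative conversion column over the whole grid.
  sum(conv_pct)
}

# Made-up example: one game (20 tossups heard), four correct buzzes.
example_buzzes <- c(0.25, 0.50, 0.75, 1.00)
example_bpa <- bpa_from_buzzes(example_buzzes, games_played = 1)
print(example_bpa)  # approximately 7.7
```

A get at buzz point b adds 1/max_gets to every grid point from b through 1, so earlier buzzes are counted at more grid points. That is why a player who gets every tossup at the first buzz point reaches the maximum of 100.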
What are the advantages of BPA over other quizbowl stats?
BPA is the first metric to take advantage of buzz point tracking and provide a more detailed view into how early people are getting questions. This reveals player skill that may be masked by traditional stats.
For example, let's look at the top two scorers from the Minnesota site of CMST: Shan Kothari and Auroni Gupta. Shan outscored Auroni by about a tossup per game, and recorded seven powers to Auroni's four. However, Auroni had a 6.9 BPA, while Shan came in at 6.47, since Auroni was buzzing earlier on a higher percentage of tossups, particularly in the late-middle clues, before Shan overtook him during giveaways. BPA ranking Auroni over Shan is in line with subjective appraisals of the two players (the player poll had Auroni as a top-5 player in grad school, and Shan in the 10-15 range), but neither of the traditional stats (PPG and powers) captures this difference in skill.
What are BPA's shortcomings?
BPA is still, like PPG, a heavily context-dependent stat, and is not exactly comparable across fields of different strength (or even across different schedules in the same field). Teammate effects are also fairly strong; BPA does not incorporate the PATH adjustment for shadow effect since I believe that introduces more false positives than the false negatives it corrects.
Who does BPA say is good at quizbowl?
The top 10 players at CMST were Jordan Brownstein (18.04), Jacob Reed (11.36), Stephen Liu (10.51), Neil Gurram (10.21), Eric Mukherjee (9.05), John Lawrence (8.37), Rafael Krichevsky (7.99), Matt Bollinger (7.95), Will Alston (7.36), and Auroni Gupta (6.9).
The top 5 teams were Brownstein et al. (23.86), Yale (23.13), BHSU A (20.45), Bloq Mayus (18.95), and Chicago A (18.13).
The top 10 players at 2018 Regionals were Eric Mukherjee (17.15), Jakob Myers (15.68), Aseem Keyal (14.33), Evan Lynch (12.89), Rafael Krichevsky (12.84), Eric Wolfsberg (12.72), Adam Silverman (12.56), Chris Ray (12.18), John Lawrence (11.82), and Derek So (11.64).
The top 5 teams were Penn A (28.32), Berkeley A (27.6), Chicago A (25.35), Columbia A (25.05), and Maryland A (25.03).
There's also category-specific BPA! Here are overall and category-specific rankings for 2018 Regionals and CMST.
Ryan Rosenberg
North Carolina '16
NYU '26 (ideally)
ACF
Re: Introducing BPA, a new evaluation metric using detailed stats
I think this might be the most precise (and intuitively useful) non-PATH-like stat we've ever had—thanks to Ryan for the computations and visualizations!
Jacob R., ex-Chicago
- naan/steak-holding toll
- Auron
- Posts: 2517
- Joined: Mon Feb 28, 2011 11:53 pm
- Location: New York, NY
Re: Introducing BPA, a new evaluation metric using detailed stats
nice
"Auroni Gupta (6.9)"
Will Alston
Dartmouth College '16
Columbia Business School '21
Re: Introducing BPA, a new evaluation metric using detailed stats
This is awesome! Thanks for putting the work into coming up with this.
This is also an interesting statistic to look at on a game-by-game basis, though you have to take the results with a grain of salt. Here are the top 10 games from 2018 ACF Regionals by total BPA:
Code:
Winner          Loser           Score    Winner BPA  Loser BPA  Total BPA
Berkeley A      UC San Diego B  500-80   37.915      6.7        44.615
Cambridge B     Oxford B        315-290  24.515      19.83      44.345
Penn A          Villanova       490-50   37.735      6.595      44.33
Penn A          Johns Hopkins A 375-240  31.585      12.31      43.895
Northwestern A  MSU A           320-285  15.855      26.835     42.69
Columbia A      Amherst         355-200  26.67       15.785     42.455
McGill A        McGill B        315-175  26.845      15.42      42.265
Penn A          Delaware        490-115  31.745      10.04      41.785
Northwestern A  Ohio State A    385-215  22.425      19.155     41.58
Columbia A      Harvard A       375-170  26.98       14.565     41.545
Note that in the fifth game, Northwestern A beat MSU A despite having a significantly lower BPA. This is partly due to the fact that Northwestern waited until the end on all three of MSU's negs, while not negging at all themselves. However, even on the 7 live tossups they converted, Northwestern had an average buzz location of 0.547, substantially later than MSU's average of 0.463.
Here are the five games with the closest margin of BPA, selected from among games with a total BPA of at least 30:
Code:
Winner          Loser           Score    Winner BPA  Loser BPA  Total BPA
Ohio State A    Chicago B       325-240  15.1        15.26      30.36
Harvard A       Yale A          310-245  15.9        14.855     30.755
McGill A        Toronto A       270-240  14.95       16.84      31.79
MSU A           Chicago A       310-260  17.33       19.585     36.915
Berkeley B      Stanford        305-230  15.215      17.56      32.775
In all but one of these games, the winner had the lower BPA. However, only some of them can be chalked up to a negstorm by the losing team. For example, in the McGill-Toronto game, McGill went 9/4 to Toronto's 10/2 and won on the strength of their bonus conversion.
Interesting questions for future BPA analysis: what fraction of games are won by the team with the lower BPA? In these situations, can we discriminate between occurrences of (a) one team waiting to the end on a bunch of negs, (b) one team out-bonusing the other, (c) one team having a large advantage in certain categories and being able to sit on those questions, (d) something else? Perhaps it's fruitful to only consider tossups that were not negged, in order to restrict the analysis to situations in which both teams are playing each tossup live. This requires a bit more careful work to determine the number of tossups heard, but it's certainly possible with the data we have.
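As a starting point for the first question, here is a small base R sketch that flags games won by the lower-BPA team. It uses only the five close games listed above, transcribed by hand, so the resulting share says nothing about the full tournament; the column names are my own.

```r
# The five closest-BPA games listed above, transcribed by hand.
close_games <- data.frame(
  winner     = c("Ohio State A", "Harvard A", "McGill A", "MSU A", "Berkeley B"),
  loser      = c("Chicago B", "Yale A", "Toronto A", "Chicago A", "Stanford"),
  winner_bpa = c(15.1, 15.9, 14.95, 17.33, 15.215),
  loser_bpa  = c(15.26, 14.855, 16.84, 19.585, 17.56),
  stringsAsFactors = FALSE
)

# BPA margin from the winner's side; negative means the winner had the lower BPA.
close_games$bpa_margin <- close_games$winner_bpa - close_games$loser_bpa
close_games$lower_bpa_won <- close_games$bpa_margin < 0

# Count and share of games won by the team with the lower BPA.
n_lower_bpa_wins <- sum(close_games$lower_bpa_won)
share_lower_bpa_wins <- mean(close_games$lower_bpa_won)

print(close_games[, c("winner", "loser", "bpa_margin", "lower_bpa_won")])
cat("Lower-BPA team won", n_lower_bpa_wins, "of", nrow(close_games), "games\n")
```

Run over every game in a tournament rather than this hand-picked subset, the same two added columns would give the overall fraction, and the flagged rows would be the ones to inspect for negs, bonus conversion, or category effects.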
Stephen Eltinge
Then: TJ, MIT, Yale, PACE, NAQT
Now: ACF
Re: Introducing BPA, a new evaluation metric using detailed stats
This is super cool! Is there any chance we can see similar metrics for EFT?
Jon Suh
Wheaton Warrenville South High School '16
Harvard '20
- ryanrosenberg
- Auron
- Posts: 1891
- Joined: Thu May 05, 2011 5:48 pm
- Location: Palo Alto, California
Re: Introducing BPA, a new evaluation metric using detailed stats
Yes, I'll post EFT BPA later today.
Ryan Rosenberg
North Carolina '16
NYU '26 (ideally)
ACF
- ryanrosenberg
- Auron
- Posts: 1891
- Joined: Thu May 05, 2011 5:48 pm
- Location: Palo Alto, California
Re: Introducing BPA, a new evaluation metric using detailed stats
Here's a public link to code used to generate overall BPA for last year's Regionals.
Ryan Rosenberg
North Carolina '16
NYU '26 (ideally)
ACF
- ProfessorIanDuncan
- Wakka
- Posts: 195
- Joined: Tue Dec 20, 2011 10:37 pm
Re: Introducing BPA, a new evaluation metric using detailed stats
Does this metric factor in negs? Would that be a useful feature? It seems that adding a negative value, namely the difference between the neg point and the minimum of the question length and the correct answer's buzz point, could give some insight into how negs affect how much of the tournament is heard. I suppose that this would fail to take into account teams waiting until the end of the question to convert, so maybe it's not that useful an addition.
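To make the proposed adjustment concrete, here is a rough base R sketch of the per-neg penalty described above. The function name `neg_penalty` is hypothetical, locations are assumed to be fractions of the question elapsed, and a tossup nobody converts is assumed to be read to the end. How the penalty would be scaled into BPA is left open.

```r
# Hypothetical sketch of the per-neg penalty described above.
# neg_location:     fraction of the question elapsed when the neg happened.
# correct_location: fraction elapsed at the correct buzz, or NA if unconverted.
neg_penalty <- function(neg_location, correct_location = NA) {
  # If nobody converts, assume the question was read to the end (location 1).
  end_point <- ifelse(is.na(correct_location), 1, pmin(1, correct_location))
  # Extra fraction of the question heard after the neg, as a negative value.
  -(end_point - neg_location)
}

print(neg_penalty(0.40, 0.90))  # -0.5
print(neg_penalty(0.40))        # -0.6
```

The caveat from the post still applies: if the other team sits until the end of the question, the penalty reflects their waiting rather than the cost of the neg itself.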
Alec Vulfson
Irvington High School '13
Re: Introducing BPA, a new evaluation metric using detailed stats
I calculated BPA for BLAST Online this afternoon. Although I might be the only person this post applies to, I thought I'd list a few of the traps I fell into and how to avoid them.
I calculated it in R, using Ryan's script posted above. I'd never used R before this afternoon, but I was able to download R and RStudio easily enough. Ryan's code uses the tidyverse library, which I had to import before the code would work, but that was easy enough to find online.
When my spreadsheet finally worked, it used the following columns (everything in single quotes is a column name): 'round', 'packet', 'tossup', 'answer', 'category', 'subcategory', 'team', 'player', 'buzz_value', and 'buzz_location_pct'. Of these, 'packet' is super important: the script relies on each packet having a name, even if it's just the name of the round the packet was played in. From glancing over the code, I think 'team', 'player', 'category', 'buzz_value', and 'buzz_location_pct' are all necessary. And because of the way Regionals worked, you can't use S as a packet name without changing the code.
I ended up getting all of this from the file used to generate the ACF Regionals BPA, which Ryan posted here: https://github.com/quizbowl/open-data/b ... ossups.tsv.
Ryan's code also takes in a tsv file. This is easy enough to change by either changing your file type to a .tsv file or changing "read_tsv" in the second line of code to "read_csv". Using an .xlsx as the input file doesn't really work, and it's easy enough to change into a csv.
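If it helps anyone else, here is a small sketch of a loader that catches the file-type and column-name traps before the main script runs. It uses base R's `read.delim` and `read.csv` instead of the tidyverse readers the script itself uses, the helper name `read_buzz_file` is made up, and the list of required columns is only my reading of which ones the script needs.

```r
# Columns believed to be required by the BPA script (an assumption, see above).
required_cols <- c("packet", "team", "player", "category",
                   "buzz_value", "buzz_location_pct")

# Hypothetical helper: read a .tsv or .csv buzz file and check its columns.
read_buzz_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  buzzes <- switch(ext,
    tsv = read.delim(path, stringsAsFactors = FALSE),
    csv = read.csv(path, stringsAsFactors = FALSE),
    stop("Unsupported file type: ", ext, " (save the sheet as .csv or .tsv first)")
  )
  # Fail early with a readable message if a needed column is absent.
  missing_cols <- setdiff(required_cols, names(buzzes))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  buzzes
}

# Usage with a tiny made-up file written to a temporary location.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(packet = "Round 1", team = "Team A", player = "Player 1",
                     category = "Science", buzz_value = 10,
                     buzz_location_pct = 0.42),
          demo_path, row.names = FALSE)
demo_buzzes <- read_buzz_file(demo_path)
print(names(demo_buzzes))
```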
Ryan was kind enough to share his code for category stats, which can be found below. I'm pretty sure it only works after you've run the overall code. Another mistake I made was failing to recognize the difference between "Arts" and "Fine Arts".
Code:
# Assumes the overall BPA script has already been run, so that regs_tossups,
# regs_games_played, and regs_category_counts already exist.
library(tidyverse)

category_bpa <- regs_tossups %>%
  filter(!is.na(buzz_location_pct)) %>%
  # Joins use whatever column names the tables share
  left_join(regs_games_played) %>%
  left_join(regs_category_counts) %>%
  # tu_count and n presumably come from the two joined tables
  mutate(max_gets = tu_count * n) %>%
  mutate(conv_flag = ifelse(buzz_value == "10", 1, 0),
         # Uncomment below line if set has powers
         # conv_flag = ifelse(buzz_value %in% c("15", "10"), 1, 0),
         buzz_location_pct = ifelse(is.na(buzz_location_pct), 1, round(buzz_location_pct, 2)),
         buzz_location_pct = factor(buzz_location_pct, levels = seq(0, 1, .01))) %>%
  group_by(player, category, team, max_gets, buzz_location_pct) %>%
  summarize(gets = sum(conv_flag)) %>%
  # Fill in buzz points with no gets so every player/category covers 0 to 1
  complete(nesting(player, category, team, max_gets), buzz_location_pct, fill = list(gets = 0)) %>%
  group_by(player, category, team) %>%
  # Cumulative conversion rate at each buzz point
  mutate(cum_gets = cumsum(gets),
         conv_pct = cum_gets / max_gets,
         buzz_location_pct = buzz_location_pct %>% as.character() %>% as.numeric()) %>%
  ungroup() %>%
  mutate(player = paste0(player, " (", team, ")")) %>%
  group_by(player, category, team) %>%
  # BPA is the sum of the cumulative conversion rate over all buzz points
  summarize(BPA = sum(conv_pct)) %>%
  arrange(-BPA)
Matthew Siff
Georgetown Day School '20
Yale University '25
Re: Introducing BPA, a new evaluation metric using detailed stats
If you have Excel 2016 or later, you can also use Power Query to calculate BPA by adding the Excel table as a source (disclaimer: I work on Power Query). Here's a sample for ACF Regionals 2018.
The approach to calculate the area is different, but should give similar results (instead of joining a list of percentages, calculate the number of buzzes at each percentage a player has buzzed at, and multiply it by how far that percentage is from 1).
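To show why the two approaches land close together, here is a small base R comparison on made-up data. This is not the Power Query sample itself, just a sketch of the idea; the buzz locations, the 20-tossup denominator, and the scaling of the weighted version by 100 (to put it in the same units as BPA) are all assumptions.

```r
# Made-up data: correct buzz locations for one player over one 20-tossup game.
buzz_locations <- c(0.25, 0.25, 0.50, 0.75, 1.00)
max_gets <- 20

# Work in whole percentage points to avoid floating-point mismatches.
buzz_pct <- round(buzz_locations * 100)

# Cumulative approach: gets at or before each buzz point 1..100, then sum.
bpa_cumulative <- sum(sapply(1:100, function(p) sum(buzz_pct <= p))) / max_gets

# Weighted approach: count buzzes at each location, weight by distance from 100%.
buzz_counts <- table(buzz_pct)
locations <- as.numeric(names(buzz_counts))
bpa_weighted <- sum(as.numeric(buzz_counts) * (100 - locations)) / max_gets

print(c(cumulative = bpa_cumulative, weighted = bpa_weighted))
# The gap equals the player's conversion rate (gets / max_gets).
print(bpa_cumulative - bpa_weighted)
```

In this sketch the cumulative version also counts the grid point where the buzz happens, so each get contributes one extra 1/max_gets. The two numbers therefore differ by at most 1, which should not change rankings much.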
Alejandro
Naperville Central '07
Harvey Mudd '11
University of Washington '17
- Smuttynose Island
- Forums Staff: Moderator
- Posts: 614
- Joined: Wed Oct 21, 2009 9:07 pm
Re: Introducing BPA, a new evaluation metric using detailed stats
I'm late to the party, but you can find a Jupyter Notebook template similar to the one I use to compute BPA on my GitHub.
Daniel Hothem
TJHSST '11 | UVA '15 | Oregon '??
"You are the stuff of legends" - Chris Manners
https://sites.google.com/site/academicc ... ubuva/home