MSNCT/HSNCT bonus difficulty discussion

Nick · Post by **Nick** » Mon Apr 23, 2012 5:52 pm

In my opinion, this was the most appropriate (difficulty-wise) set I've seen at any of the NAQT national tournaments (MS, HS, ICT) that I've ever played or read. I think the bonus conversions back this up.

Also, I read Kealing's 925 game. It was crazy.

Down and out in Quintana Roo · Mon Apr 23, 2012 6:50 pm

Nick wrote:In my opinion, this was the most appropriate (difficulty-wise) set I've seen at any of the NAQT national tournaments (MS, HS, ICT) that I've ever played or read. I think the bonus conversions back this up.

Also, I read Kealing's 925 game. It was crazy.

Nick, does that mean you're advocating for a HS team to get 27ppb at HSNCT? And many many schools in the 20s?

Sen. Estes Kefauver (D-TN) · Mon Apr 23, 2012 7:20 pm

I wouldn't mind a situation where the top teams can average around 25 ppb at the HSNCT, and I don't think NAQT would object either.

Down and out in Quintana Roo · Mon Apr 23, 2012 7:27 pm

Absolutely, I think it would be great as well. I doubt it will happen, but I would love for that to be the case in Atlanta next month. I was just trying to ask for clarification on what Nick was saying.

Stained Diviner · Post by **Stained Diviner** » Mon Apr 23, 2012 10:06 pm

The only reason I can think of to avoid very high PPBs is that a bunch of teams getting very high PPBs does not distinguish between them. That didn't happen here. In fact, there was a decent spread between the best teams, which is a good thing.

20-25% of teams had over 20 PPB, which again isn't a bad thing as long as there is some spread, with some teams just clearing 20 PPB and other teams clearing it with room to spare, and that's what happened.

Kyle · Post by **Kyle** » Tue Apr 24, 2012 4:36 am

Leucippe and Clitophon wrote:The only reason I can think of to avoid very high PPBs is that a bunch of teams getting very high PPBs does not distinguish between them. That didn't happen here. In fact, there was a decent spread between the best teams, which is a good thing.

20-25% of teams had over 20 PPB, which again isn't a bad thing as long as there is some spread, with some teams just clearing 20 PPB and other teams clearing it with room to spare, and that's what happened.

Well, it sort of did, but only among the absolute best teams. I wasn't there, obviously, but I have my doubts that a 460-455 game was the most effective method of distinguishing between Kealing and Longfellow.

Stained Diviner · Post by **Stained Diviner** » Tue Apr 24, 2012 7:13 am

I was just looking at bonus conversion, and I was trying to hold the set to a reasonable standard. I don't think "The better team should always win by a clear margin" or even "The better team should always win" is a reasonable standard, because that rarely happens in quizbowl. Additionally, a small character limit on tossups, the use of non-academic categories, and an occasional clunker make such a standard even less likely to be met.

Basically, I was just pointing out that in theory one team breaking 27 PPB and 22% of teams breaking 20 PPB is not necessarily a problem, even if the set is used for a tournament that is viewed more as a competition than as a feel good introduction to quizbowl.

btressler · Post by **btressler** » Thu Apr 26, 2012 10:42 am

To quantify this (prelim rounds only):

Max = 27.23
Q3 = 19.32
Q2 = 16.69
Q1 = 14.19
Min = 9.11

This is close to what I was aiming for as an editor. Last year the median was even higher, so I was aiming to increase the difficulty, mainly on medium and hard parts. I wouldn't mind if these numbers were slightly lower, like a median of 15.0-16.0.

Another thing to keep in mind is that as the tournament grows, the difficulty will almost assuredly inch up. This year there were 20 packets. If next year we produce 22 packets, then I need 144 more bonus answers. To find 144 more answers, I'm going to have to include some harder material.

Down and out in Quintana Roo · Thu Apr 26, 2012 10:59 am

Of course, if we have teams like Kealing playing on ACF Fall, and converting almost 18ppb (and 12.5 on ACF Regionals!!), then teams may inherently "demand" that questions get more difficult.

I pray that this does not happen.

the return of AHAN · Post by **the return of AHAN** » Thu Apr 26, 2012 1:57 pm

IMO, Kealing is the extreme outlier and really shouldn't be driving the MS canon. Even if they fail to win because, say, Longfellow is beating them on the same early clues, that's still not a reason to change. Ratcheting up the difficulty on the entire MS canon would only drive out the schools at the lower half of the continuum.
Put another way, my team was sent to the rail in the IESA State Series yesterday because we simply weren't fast enough on certain canonical answer lines against Barrington Prairie. Very difficult answer lines were few and far between, and we largely killed them, but having enough difficult answer lines to separate us from Prairie would've driven the scores down even further almost everywhere else*.

* - A neighboring regional produced a champion that didn't score more than 165 points in their 3 matches.

AKKOLADE · Post by **AKKOLADE** » Thu Apr 26, 2012 2:45 pm

The current middle school group of competitors is like a shrunk-down sample of the current high school competitors: a few teams at the extreme far right of the bell curve, which should definitely not be the target audience of your average set.

It does get a bit trickier with nationals, but you shouldn't write for the elite 3 or 4 teams at the expense of the 50 after them.

marnold · Post by **marnold** » Thu Apr 26, 2012 3:12 pm

Fred wrote:It does get a bit trickier with nationals, but you shouldn't write for the elite 3 or 4 teams at the expense of the 50 after them.

Why? Isn't the point of nationals to distinguish teams at the top?

Important Bird Area · Post by **Important Bird Area** » Thu Apr 26, 2012 3:35 pm

marnold wrote:
Fred wrote:It does get a bit trickier with nationals, but you shouldn't write for the elite 3 or 4 teams at the expense of the 50 after them.
Why? Isn't the point of nationals to distinguish teams at the top?

Because no one wants to have a four-team national tournament. As long as the tossup leadins and the hard parts of the bonuses fairly distinguish those elite teams, the later clues in the tossups/easy and middle parts of bonuses absolutely should be written for the entire field.

marnold · Post by **marnold** » Thu Apr 26, 2012 3:46 pm

I guess I really don't know anything about the gap between contending and other teams, so if the skew is as big as that suggests I'll shut up.

Nick · Post by **Nick** » Tue May 08, 2012 12:43 am

Down and out in Quintana Roo wrote:
Nick wrote:In my opinion, this was the most appropriate (difficulty-wise) set I've seen at any of the NAQT national tournaments (MS, HS, ICT) that I've ever played or read. I think the bonus conversions back this up.

Also, I read Kealing's 925 game. It was crazy.
Nick, does that mean you're advocating for a HS team to get 27ppb at HSNCT? And many many schools in the 20s?

To answer your question: yes, I would prefer the percentage of Nationals teams with 20+ ppb to be closer to ~18% (like MSNCT) than ~5% (like 2011 HSNCT).

Also, I'm not a math person, so maybe someone else can explain this to me: what is the difference, with regards to distinguishing-top-teams, between the top two teams having ppb of 26 and 24 and the top two teams having a ppb of 22 and 20? If those are the choices, I'm choosing the former.

Lightly Seared on the Reality Grill · Tue May 08, 2012 1:30 am

Nick wrote:Also, I'm not a math person, so maybe someone else can explain this to me: what is the difference, with regards to distinguishing-top-teams, between the top two teams having ppb of 26 and 24 and the top two teams having a ppb of 22 and 20? If those are the choices, I'm choosing the former.

Here's an explanation courtesy of your friendly neighborhood math person:
A team that has 22 PPB scores 10% more points than a team that has 20 PPB.
A team that has 26 PPB scores 8.33% more points than a team that has 24 PPB.
So the two point difference suggests a stronger jump in skill at lower PPBs than higher PPBs. Therefore, a team with 22 PPB is more distinguished from a team with 20 PPB than a team with 26 PPB is distinguished from a team with 24 PPB.
It could be more complex than this (especially since there is a hard limit to PPB), but that's all I'm capable of at 1:30 in the morning.

Whiter Hydra · Post by **Whiter Hydra** » Tue May 08, 2012 2:12 am

Let's assume a normal distribution of bonus conversion. (In this example, I'm assuming mean=15, standard deviation=4, though it should work with whatever numbers you choose.) Of the teams that have over 20 PPB, 38% of them also had PPBs over 22. However, only 24% of teams that got >24 PPB also got >26 PPB. Therefore, it is a harder climb to get to each successive bonus conversion milestone if you're an above-average team.

If your PPB is below 15, however, it works in reverse. It's easier to go from 14 to 15 PPB than it is to go from 0 to 1 PPB.

(Note that this could just be utter BS, but anything to get out of studying for finals.)
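The 38% and 24% figures above can be reproduced with a short calculation. This is only a sketch under the post's stated assumption (normal distribution, mean 15, standard deviation 4); the function names `normal_tail` and `share_also_above` are made up for illustration.

```python
import math


def normal_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def share_also_above(lower, upper, mean=15.0, sd=4.0):
    """Among teams above `lower` PPB, the fraction that are also above `upper`."""
    return normal_tail(upper, mean, sd) / normal_tail(lower, mean, sd)


if __name__ == "__main__":
    # Teams over 20 PPB that are also over 22 PPB: about 0.38
    print(round(share_also_above(20, 22), 2))
    # Teams over 24 PPB that are also over 26 PPB: about 0.24
    print(round(share_also_above(24, 26), 2))
```

Changing `mean` and `sd` lets you try other assumed distributions, as the post suggests.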

Black-throated Antshrike · Tue May 08, 2012 2:49 am

William Crotch wrote:
Nick wrote:It could be more complex than this

Now you're just

imagining things

Stained Diviner · Post by **Stained Diviner** » Tue May 08, 2012 7:29 am

Nick wrote:Also, I'm not a math person, so maybe someone else can explain this to me: what is the difference, with regards to distinguishing-top-teams, between the top two teams having ppb of 26 and 24 and the top two teams having a ppb of 22 and 20? If those are the choices, I'm choosing the former.

There is no difference in distinguishing between those two teams. If the average for the tournament is in the vicinity of 15, then top teams getting 24-26 points are more easily distinguished from the average teams than top teams getting 20-22 points. It also means the top teams are more distinguished from the above average but not top teams.

The only way to have top teams getting 20-22 points and have bonuses that distinguish between teams is to have the average teams getting something like 12 points or fewer.

Scaled Flowerpiercer · Tue May 08, 2012 11:56 pm

Other people were trying to make statistical arguments, so...

For MSNCT, Standard Deviation of PPB was about 3.6856, with a mean of 16.77
For HSNCT, Standard Deviation of PPB was about 3.9555, with a mean of 12.546

So, the z-score (For anyone stat illiterate, this is the number of standard deviations away from the mean a value is, or "how outstanding" it is) of the top 4 MSNCT PPBs are

Kealing A: 2.596
Longfellow A: 2.2382
Barrington Station A: 1.842
Westminster A: 1.779

Whereas the z-scores for HSNCT (2011) were

State College A: 2.5796
Bellarmine: 2.415
Maggie Walker: 2.2257
Adlai E Stevenson: 2.1195

Based on this, there is actually greater differentiation among the top teams of MSNCT relative to the whole field than at HSNCT. However, after producing these numbers, I realized that way more teams come to HSNCT, so the data is "diluted" by worse teams that are not at MSNCT. So, assuming that MSNCT has the 72 best middle school teams in attendance, I decided to calculate HSNCT stats based on the 72 highest ranked teams, then

SD = 2.69, mean = 16.712, and the z-scores would be

State College: 2.244
Bellarmine: 2.002
Maggie Walker: 1.723
Adlai E Stevenson: 1.567

Change from 1st to 2nd z-score: MSNCT: .358, HSNCT: .242
2nd->3rd: M: .540, H: .279
3rd->4th: M: .163 H: .156

And now the differences between the z-scores are very similar to those at MSNCT, though notably with a smaller difference between the top 2 teams still.

So, based on this, either way you slice it, MSNCT differentiated better between the top two teams.

EDIT: I did a lot of fancy math and then messed up subtraction in my first attempt.
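For anyone who wants to redo the arithmetic in the post above, here is a minimal sketch of the z-score and gap calculations. The helper names `z_score` and `gaps` are hypothetical, and the 20 PPB team in the usage example is an invented illustration, not a real result. Note that recomputing from the MSNCT z-scores exactly as posted gives 0.396 and 0.063 for the 2nd->3rd and 3rd->4th gaps rather than the posted .540 and .163, so those posted gaps may rest on values not shown in the thread; the HSNCT top-72 gaps match.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above the mean."""
    return (value - mean) / sd


def gaps(z_scores):
    """Differences between consecutive entries of a descending list of z-scores."""
    return [round(a - b, 3) for a, b in zip(z_scores, z_scores[1:])]


# z-scores copied from the post above
msnct_top4 = [2.596, 2.2382, 1.842, 1.779]
hsnct_top4_of_72 = [2.244, 2.002, 1.723, 1.567]

if __name__ == "__main__":
    # Hypothetical team at 20 PPB, using the posted MSNCT mean and SD
    print(round(z_score(20.0, 16.77, 3.6856), 3))
    print(gaps(msnct_top4))
    print(gaps(hsnct_top4_of_72))
```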

Dominator · Post by **Dominator** » Wed May 09, 2012 10:16 am

The problem is, though, that bonus conversion is not what needs to separate the best team in the country from the second-best. Those two teams will be close enough in PPB that the match between them will be determined by tossups. If SC had 26 PPB and Bellarmine had 24, but Bellarmine converted twelve tossups to SC's eleven, Bellarmine wins that game. Even if you work in powers and negs, the point remains that tossups decide that match.

I think the biggest concern with bonus conversion being out-of-whack is that it fails to differentiate not between first best and second best, but rather between first tier and second tier (or second tier and third tier, et cetera). Intratier games will be tossup battles like the Bellarmine-SC example above, but if bonus conversion between tiers of teams is too close, then all intertier games, from a statistical standpoint, become tossup games as well. If bonus conversion is to be important to who wins a quizbowl match, it must be the case that a team can statistically lose in tossups but win with bonuses.

cvdwightw · Post by **cvdwightw** » Wed May 09, 2012 5:57 pm

Discounting negs and powers, and things like "five points per answer" bonus parts, every tossup earned is worth 10+Y points, where Y is a random variable that can take values of 0, 10, 20, or 30 points. (More precisely, it's 10+Y1+Y2+Y3, where Y1, Y2, and Y3 are random variables that take the value 10 with some probability p and 0 otherwise, where p changes depending on whether it's an easy, hard, or middle part. However, without bonus-part-level data, it's near-impossible to get good estimates for p.) We can generally assume that the tournament-wide team PPB is roughly the expected value of Y.

So we expect Team 1 to get 10+PPB1 points on each tossup answered correctly, and Team 2 to get 10+PPB2 points.
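As a rough illustration of the model just described, the sketch below computes the expected value of Y from per-part conversion probabilities and the resulting expected points per tossup. The function names and the probabilities 0.9, 0.6, and 0.2 are invented for the example; as noted above, real estimates of p would need bonus-part-level data.

```python
def expected_bonus_points(p_easy, p_middle, p_hard):
    """E[Y] = 10 * (p_easy + p_middle + p_hard), since each part is worth 10 points."""
    return 10 * (p_easy + p_middle + p_hard)


def expected_points_per_tossup(ppb):
    """Expected points from one converted tossup: 10 for the tossup plus the PPB."""
    return 10 + ppb


if __name__ == "__main__":
    # Hypothetical part-level conversion rates, not taken from any real data
    ppb = expected_bonus_points(0.9, 0.6, 0.2)
    print(ppb)
    print(expected_points_per_tossup(ppb))
```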

The question is, what happens if PPB1 and PPB2 are very big, and what happens if PPB1 and PPB2 are very small?

Suppose that Team 1 gets x tossup questions, and Team 2 gets x-n tossups. Then Team 1's expected total points is x*(10+PPB1) and Team 2's expected total points is (x-n)*(10+PPB2). If the expected result is a tie, Team 1 needs to score x = n*(10+PPB2)/(PPB2-PPB1) tossups. If x is less than that, then we expect Team 1 to have more points than Team 2; if x is more, then we expect Team 2 to have more points than Team 1.

So with Team 1 having 20 ppb and Team 2 having 25 ppb, Team 1 needs to score x = 7n tossups for their expected scores to be equal. Practically, this means that if Team 1 scores 7 tossups and Team 2 scores 6 tossups, or if Team 1 scores 14 tossups and Team 2 scores 12 tossups (no longer possible under NAQT rules), then the two expected scores are equal. However, if Team 1 scores 12 tossups and Team 2 scores 11 tossups, Team 1's expected score is less than Team 2's; and if Team 1 scores 12 tossups and Team 2 scores 10 tossups, then Team 1's expected score is more than Team 2's.

Consider instead Team 1 having 15 ppb and Team 2 having 20 ppb. Now Team 1 and Team 2's expected scores are the same if Team 1 scores 6n tossups. In other words, Team 1 with 12 tossups and Team 2 with 10 tossups would have each team's expected total points the same.

This can get especially pronounced at low PPB; for instance, teams with 6 PPB and 10 PPB, the scores are the same for 5n tossups scored by Team 1. In other words, Team 1 scoring 10 tossups and Team 2 scoring 8 would produce an expected tie.
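The break-even formula x = n*(10+PPB2)/(PPB2-PPB1) and the three worked cases above can be checked with a few lines. This is a sketch only; `break_even_tossups` is a made-up name, and it ignores powers and negs just as the derivation does.

```python
def break_even_tossups(ppb1, ppb2, n=1):
    """Tossups x that Team 1 needs, with Team 2 at x - n tossups,
    for the two expected scores to be equal. Requires ppb2 > ppb1."""
    if ppb2 <= ppb1:
        raise ValueError("Team 2 must have the higher bonus conversion")
    return n * (10 + ppb2) / (ppb2 - ppb1)


if __name__ == "__main__":
    print(break_even_tossups(20, 25))  # 7.0, i.e. x = 7n
    print(break_even_tossups(15, 20))  # 6.0, i.e. x = 6n
    print(break_even_tossups(6, 10))   # 5.0, i.e. x = 5n
```

Below the break-even value Team 1 expects to win despite the lower bonus conversion; above it, Team 2 expects to win.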

Here's how to read the attached figure.

The x-axis is your opponent's bonus conversion. The y-axis is the number of tossups you need to get in order to expect to tie your opponent if you only get 1 more tossup than your opponent. Each solid line represents your bonus conversion. So the leftmost line represents if you had a bonus conversion of 4. Anything on the line, you expect to tie (note a step size of 1 in opponent's BC and 2 in your BC). Anything above the line, your expected score is less than your opponent's (you expect to lose). Anything below the line, your expected score is more than your opponent's (you expect to win). To give an extreme example, suppose that you had a bonus conversion of 4 and were playing an opponent with a bonus conversion of 18. If you scored 2 tossups to their 1, you would expect the same score. However, if you scored 3 tossups (and they scored 2), you would expect to lose.

As another example, suppose you expect to score 20 ppb, and your opponent scores 24. If you score 8 tossups (to their 7), you expect to win; if you score 9 tossups (to their 8), you expect to lose.

Two observations:
1) In games where lots of tossups are answered, you can win more with a bonus conversion disadvantage if your bonus conversion is high. For instance, if you answer 10 tossups to your opponent's 9, you can win a game in which your BC is 22 and your opponent's is 25, but you can't win if both BCs are shifted down by 10 (to 12 and 15, respectively).
2) Conversely, in games where few tossups are answered, bonus conversion differences are magnified (e.g., if your BC is 4 and you score 5 tossups to your opponent's 4, you can beat a team with a BC of 7; if your BC is 14 and you score 5 tossups to your opponent's 4, you can beat a team with a BC of 19).
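The specific matchups quoted in the figure walkthrough and the two observations can be verified by comparing expected scores directly. A minimal sketch, using the same simplification as above (every tossup is worth 10 plus the team's bonus conversion); the function names are hypothetical.

```python
def expected_score(tossups, bc):
    """Expected points: each tossup is worth 10 plus the team's bonus conversion."""
    return tossups * (10 + bc)


def expected_margin(my_tossups, my_bc, opp_tossups, opp_bc):
    """Positive: you expect to win. Zero: expected tie. Negative: you expect to lose."""
    return expected_score(my_tossups, my_bc) - expected_score(opp_tossups, opp_bc)


if __name__ == "__main__":
    print(expected_margin(2, 4, 1, 18))    # BC 4 vs 18, 2 tossups to 1: tie
    print(expected_margin(8, 20, 7, 24))   # BC 20 vs 24, 8 to 7: win
    print(expected_margin(9, 20, 8, 24))   # BC 20 vs 24, 9 to 8: loss
    print(expected_margin(10, 22, 9, 25))  # observation 1, high BCs: win
    print(expected_margin(10, 12, 9, 15))  # observation 1, shifted down by 10: loss
    print(expected_margin(5, 4, 4, 7))     # observation 2: win
    print(expected_margin(5, 14, 4, 19))   # observation 2: win
```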

What does this all mean? It means that a 1-tossup advantage is more important between teams with very high bonus conversions. Shrinking the parameter space of "reasonable bonus conversions" from 4-26 to 4-20 while keeping tossups still at a high accessibility means that you will overvalue bonus conversion compared to tossup conversion.

EDIT: tl;dr version: a 1-tossup advantage is much more important when fewer tossups are converted (see inset). Teams with high bonus conversions are better equipped to overcome a 1-tossup disadvantage against a team with lower BC than teams with low bonus conversions. Teams that are approximately equal to their opponent but get dealt a bad first few bonuses have the optimum strategy of trying to slow the game down as a faster game with more tossups gives them less opportunity to overcome their bonus disadvantage with a tossup advantage (see inset).

Dominator · Post by **Dominator** » Wed May 09, 2012 9:38 pm

cvdwightw wrote:What does this all mean? It means that a 1-tossup advantage is more important between teams with very high bonus conversions. Shrinking the parameter space of "reasonable bonus conversions" from 4-26 to 4-20 while keeping tossups still at a high accessibility means that you will overvalue bonus conversion compared to tossup conversion.

EDIT: tl;dr version: a 1-tossup advantage is much more important when fewer tossups are converted (see inset). Teams with high bonus conversions are better equipped to overcome a 1-tossup disadvantage against a team with lower BC than teams with low bonus conversions. Teams that are approximately equal to their opponent but get dealt a bad first few bonuses have the optimum strategy of trying to slow the game down as a faster game with more tossups gives them less opportunity to overcome their bonus disadvantage with a tossup advantage (see inset).

Yes, this was very thorough, but I can't tell if you are merely providing facts to be used in an argument or if you are presenting an argument. Your use of "overvalue" leads the reader to understand that you do not want bonus conversion to drive wins, which according to your data means that bonus conversions should be relatively high. Amirite?

cvdwightw · Post by **cvdwightw** » Thu May 10, 2012 2:47 pm

Dominator wrote:Your use of "overvalue" leads the reader to understand that you do not want bonus conversion to drive wins, which according to your data means that bonus conversions should be relatively high. Amirite?

You were right. Somewhat paradoxically, a further analysis has me justifying that conclusion for an entirely different reason, one you originally proposed.

One of the traits of NAQT play is that there are so many categories (especially "non-canonical" categories) that a good team is bound to run into something it has little to no knowledge of all the time. In some instances, they get lucky and it's their opponent's bonus; in other instances, they end up with the bonus on CS or hockey or sociology or whatever minor category they don't know much about.

A "top bonus conversion" of 20 means that for every one of those unfortunate bonuses, there are about 1.5 bonuses the best team should 30. A "top bonus conversion" of 25 means that for every one of those unfortunate bonuses, there are about 4 bonuses the best team should 30.

{1,1,19,3} is an extreme space of 24 bonuses converted {0,10,20,30} points for a 20 ppb average. {1,1,7,15} is an extreme space for a 25 ppb average. Suppose that a team is extremely unlucky and grabs the 0 point and 10 point bonuses in their first 2 tries, and grabs 10 total tossups to the other team's 9. The other team has the exact same space but avoids the "bad luck" bonuses - averaging 21.36 ppb in the 20 ppb case and 26.82 in the 25 ppb case. We'll round those down to realistic values of 21.11 and 26.67.
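The averages for the two bonus spaces can be recomputed as follows. This is a sketch with made-up helper names; it covers only the averages, not the win probabilities discussed next. The lucky team's averages come out to about 21.36 and 26.82 (590/22 = 26.818...), and the "realistic" 21.11 and 26.67 appear to correspond to 190 and 240 points over 9 bonuses.

```python
def average_ppb(counts):
    """counts maps a bonus score (0, 10, 20, 30) to how many bonuses earn that score."""
    total_points = sum(score * n for score, n in counts.items())
    return total_points / sum(counts.values())


def without_bad_luck(counts):
    """Same space with one 0-point and one 10-point bonus removed."""
    rest = dict(counts)
    rest[0] -= 1
    rest[10] -= 1
    return rest


# The two "extreme spaces" of 24 bonuses from the post
space_20 = {0: 1, 10: 1, 20: 19, 30: 3}
space_25 = {0: 1, 10: 1, 20: 7, 30: 15}

if __name__ == "__main__":
    print(average_ppb(space_20), average_ppb(space_25))
    print(round(average_ppb(without_bad_luck(space_20)), 2))
    print(round(average_ppb(without_bad_luck(space_25)), 2))
```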

In the first case, the team with 10 tossups has a 29.1% chance of winning and a 76.4% chance of at least forcing overtime against a team with 9 tossups and a perfectly average "avoiding bad bonuses" conversion. In the second case, the team with 10 tossups has a 16.1% chance of winning and a 49.0% chance of at least forcing overtime. I haven't done the same analysis for 11 tossups-10 tossups or 12-11, but I'd expect similar results. Those are the only cases that really matter (with everything else, no matter what the bonus conversions are, the team with more tossups wins; tossup conversion equal, team with better bonus conversion always wins). So, in the case of compressed bonus conversion due to ridiculous hard parts, the team that grabs the "bad luck" bonus has a better chance of winning.

If 20% of the buzzes in the game are buzzer races (that's about 5 in this scenario) and the other tossups are split equally, then the "bad luck bonus" team has basically a 35.2% chance of winning (assuming overtime is 50% win) in the 20 ppb scenario and a 28.9% chance of winning in the 25 ppb scenario. The basic idea (that the "bad luck bonus" team has a <50% chance of winning if tossups are split equally except 50-50 buzzer races) remains no matter how many buzzer races there are.

So essentially, one team getting 2 "bad luck" bonuses and the other evenly-matched team getting 0 changes the chances of winning more when the two teams have higher bonus conversions. However, if one team is at 20 ppb and one team is at 25 ppb due to skill differences, then the "bad luck" bonuses have basically no effect on the 25 ppb team winning (the "bad luck" team wins ~99.8% of the time), whereas if both are at 20 ppb due to ridiculous hard parts, then the "bad luck" bonuses have quite a bit of an effect. Thus, making hard parts easier lessens the effect of the "bad luck" bonuses on matches between teams that aren't equally matched.

Stained Diviner · Post by **Stained Diviner** » Thu May 10, 2012 5:21 pm

I don't agree with the assumption going into Dwight's analysis, which is that differences in bonus conversion are due to luck rather than skill. If our goal is to avoid having matches decided by bonus conversion because it's all a matter of luck, then our solution should be to eliminate bonus questions.

I am of the opposite opinion. I believe that bonus conversion should matter. Furthermore, as shown by Dwight's number crunching, it is a challenge to get bonuses to matter because they only make a difference when the tossups result in a tie or near tie, or when the difference in bonus conversion is very large. I think that tournaments should aim for a bell shaped distribution with a large standard deviation so that bonuses do matter.

It comes down to competing theories as to why bonus conversion differs between teams. If it is due to luck, then you want bonuses not to matter, which is achieved by a small spread and to a lesser extent by high conversion. If it is due to knowledge, then you want bonuses to matter, which is achieved by a large spread and to a lesser extent by low conversion.

cvdwightw · Post by **cvdwightw** » Thu May 10, 2012 11:08 pm

Systematic differences in tournament-wide bonus conversion are due to skill. Game-to-game fluctuations in bonus conversion around that tournament-wide bonus conversion, which is what I was looking at in my last argument, are due to random luck. Similarly, to some extent, buzzer races are due to some combination of non-knowledge factors (priming, buzzer speed, and random luck).

My argument is twofold, and I think I buried the second part of the argument:
1) In matches between two evenly-matched teams, the "luck of the bonus draw" plays more of a factor in matches between two high-skill teams than matches between two lower-skill teams, because a team that gets one of the few bonuses on which it knows very little has difficulty making up the difference in bonus conversion necessary to win (whereas teams that have many such bonuses have a greater emphasis on converting tossups and are more likely to have their opponent stuck with the same kind of "bad luck" bonus).
2) Compressing the range of reasonable bonus conversions, by introducing prohibitively difficult hard parts, leads to games between two good teams being more likely to be decided based on the "luck of the bonus draw" and buzzer races than on any skill differences between the teams. In other words, you are overvaluing the random noise in the bonus conversion data because the skill difference has been compressed.

Basically, it's the exact converse of the "muddy battlefield" hypothesis, which argues that making bonuses too easy tells us nothing about the relative differences between two teams.

The Quizbowl Resource Center

Re: 2012 NAQT MSNCT: April 21-22, Chicago
