personal stats data analysis

Old college threads.
User avatar
AuguryMarch
Lulu
Posts: 80
Joined: Mon Nov 10, 2003 2:47 pm
Location: Pittsburgh, PA

personal stats data analysis

Post by AuguryMarch »

Hey all,

I was having a conversation with Andrew about personal stats, which led to the following idea.

PPG, as has been discussed ad nauseam, is a poor way to measure contribution to a team, or ability. (People racking up stats against shitty teams, etc.) The PATH stat is interesting in that it tries to take the shadow effect into account, but I had an idea for a statistic that might measure player ability in another informative way.

With the amount of round-by-round scoring data that now exists, we can compare the points you score against the win percentage of the team you play against. If you were to make a scatter plot of opposing teams' win percentages versus the player's points against them, you could then get an idea of whether there was any trend. A simple linear regression would yield a slope coefficient that would be very informative. If you had a positive slope, that means that as the team you play against gets better, your points increase. (This is possible, but unlikely for most players.) For most players, I would imagine that the slope is negative; that is, as the team you play gets better, your points go down. My contention is that, aside from wanting to compose a team of people who score the most points, you want to optimize for most points balanced by the highest slope coefficients. I'm going to think about how to combine PPG and this slope coefficient to give a more balanced perspective on player skill. Any thoughts would be appreciated.
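As a rough sketch of what I have in mind (assuming the per-round data has already been tabulated as opponent-win-percentage/points pairs; the sample rounds here are made up):

```python
# Minimal sketch of the proposed regression for one player, assuming
# round-by-round data as (opponent's win percentage, points scored) pairs.
import numpy as np

def slope_and_intercept(rounds):
    x = np.array([r[0] for r in rounds], dtype=float)  # opponent win pct
    y = np.array([r[1] for r in rounds], dtype=float)  # points that round
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line y = slope*x + intercept
    return slope, intercept

# Made-up rounds: scoring falls off as opponents get better.
rounds = [(0.1, 55), (0.3, 50), (0.5, 40), (0.7, 35), (0.9, 25)]
m, b = slope_and_intercept(rounds)
print(f"y = {m:.2f}x + {b:.2f}")  # negative slope, as expected for most players
```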

Also, if I get any free time, I'd love to run the regressions for all players above, say, 10 PPG at a recent tournament (Illinois Open? the upcoming ACF Fall?). If anyone is interested in doing some of the grunt work, please let me know. Otherwise it might be a while before I get around to it.

Paul
User avatar
Skepticism and Animal Feed
Auron
Posts: 3238
Joined: Sat Oct 30, 2004 11:47 pm
Location: Arlington, VA

Post by Skepticism and Animal Feed »

For your team at IO:

Andrew = 13.69
Jerry = -34.39
Paul = -12.73

Special thanks to my TI-83+. Unfortunately, I won't be able to calculate this any further; the above is all I'm going to have time to do.
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
User avatar
grapesmoker
Sin
Posts: 6345
Joined: Sat Oct 25, 2003 5:23 pm
Location: NYC
Contact:

Post by grapesmoker »

This is an interesting idea. Some observations:

1. A significant upward slope (for example, in Andrew's case) would seem to indicate a larger knowledge base relative to the rest of the field, especially if the questions get harder as the tournament goes on. Of course, we know Andrew's knowledge base is larger; I'm just talking about what the data might say independently of other knowledge.

2. The intercept would seem to be a rough measure of how many points you would score against your own teammates per round, since the point where win percentage is zero indicates the weakest team in the field.

3. A roughly steady or not too steeply declining slope indicates a basic core of knowledge on which the player is hard to beat. That is, someone might know one or two areas so well that he or she is unbeatable on those questions and can expect to land them almost every round (unless up against someone just as good).

Obviously, this does not account for the vagaries of the packets or the strength of one's teammates. I don't know how one would take the shadow effect into account for this. Maybe if I have some time, I'll look at the numbers from 2005 ACF Nationals and post the results.

Meanwhile, some more from Illinois Open (y = points scored, x = win percentage of opponent):

Matt Lafer: y = -19.256x + 62.436
Ryan Westbrook: y = -29.314x + 46.864
Seth Kendall: y = -69.066x + 99.342
Seth Teitler: y = -5.2029x + 52.445
Dave Rappaport: y = -3.3937x + 14.061
Adam Kemezis: y = -38.462x + 62.692
Will Turner: y = -21.38x + 43.835
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

Some further observations:

I used data from two UCLA teams at Technophobia and one UCLA team at WIT; neither tournament was praised for high-quality packets or consistent difficulty.

What I found was that the slope is a relatively bad predictor of player strength at low PPG levels (<10 PPG, perhaps <20). I had interesting intercepts such as -2.80 and 1.79 for two players with <10 PPG at Technophobia, although the players to whom those intercepts correspond both had significant positive slopes (8.10 and 9.06).

Another interesting phenomenon was that of splitting players up onto different teams. Charles and I played on the same team at WIT and different teams at Technophobia. Both of our negative slopes increased significantly from WIT to Technophobia. Some of this may have been due to a shadow effect. I am interested in seeing if this phenomenon holds when players switch from the primary point-scorer to a supporting role and vice versa, or when solid number one players get better supporting casts. If the change in slope when these scenarios occur can be quantified, then we may have a decent quantification of the shadow effect.

Lastly, plugging a theoretical opposition winning percentage of .500 into the regression equation gave values very close to actual PPG (despite opponents' winning percentages not averaging .500). I have a feeling that this should hold for other tournaments of more consistent packet quality. If it does, it would be possible to use this statistic (the output of the regression equation at x = 0.5) as an accurate estimate of PPG, thus eliminating any need to combine PPG with the regression equation, since PPG can be fairly well approximated from the equation itself.

Therefore the best players would be those with the optimal regression equation as a whole, not just the slope, and PPG could be totally discarded as a measure of player strength. Data for different combinations of players at different tournaments would be needed to normalize these slopes (both field strength and the shadow effect would come into play; however, the equation seems roughly independent of the quality of the packets).
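(As a quick illustration of the x = 0.5 estimate, using one of the IO fits Jerry posted above; I haven't checked it against the actual stat report:)

```python
# PPG estimate: the regression line evaluated at a theoretical .500
# opponent, using Will Turner's IO fit from earlier in the thread.
slope, intercept = -21.38, 43.835
print(slope * 0.5 + intercept)  # 33.145, to compare against his actual PPG
```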
User avatar
grapesmoker
Sin
Posts: 6345
Joined: Sat Oct 25, 2003 5:23 pm
Location: NYC
Contact:

Post by grapesmoker »

The problem with quantifying shadow effect is that it is dependent on your area of knowledge and the area of knowledge of those you play with. I'm interested in hearing if people have any ideas on how to take that into account.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

Jerry, there's no way to do that easily b/c it would require looking post hoc at who got what.

What are we looking for here? Given just what people got (TU) in particular games, we're looking for a statistic that measures how well they perform indep of competition. One way is just to multiply the points she gets by the difference between the number of wins of the team she is playing against and the number of wins her team has overall. This is a positive number if the opposing team is better and if she got more than a TU for every neg. It's negative in other situations. So you sum up pts*(opp wins - your wins) for all your games, and you should get a measure of performance indep of comp. Anybody want to try this?
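A sketch of the stat, with made-up game data:

```python
# Ray's proposed statistic: sum over games of
# points * (opponent's total wins - your team's total wins).
def comp_independent_score(games):
    return sum(pts * (opp_wins - own_wins) for pts, opp_wins, own_wins in games)

# Made-up games: scoring more against the stronger team counts for more.
games = [(40, 9, 6), (25, 3, 6), (35, 7, 6)]
print(comp_independent_score(games))  # 40*3 + 25*(-3) + 35*1 = 80
```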

As an aside, I believe that getting a neg entails more than just -5 for your team. For the most part, other teams (at competent levels) usually pick up 10 (not counting bonuses) after your neg, so the usual PPG scores are inflated if someone just buzzes in a lot. That is, it underestimates the effect of negs on the team performance, hence overestimating performance of individuals who neg a lot and get a lot of questions. I like to assume that for the most part, people from opposing teams will pick up the negs you take (for a tourney like ACF fall, e.g.), so that your "net PPG" is really (10*TU - 15*NEG) / GAMES. (I call it the Kohan index for no apparent reason.)
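And the index itself, with made-up numbers for comparison against the usual -5 accounting:

```python
# "Kohan index": count each neg as -15 (the -5 penalty plus the 10 the
# other team presumably picks up on the rebound).
def kohan_index(tossups, negs, games):
    return (10 * tossups - 15 * negs) / games

print(kohan_index(tossups=40, negs=10, games=10))  # 25.0, vs 35.0 under -5 scoring
```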

Ray, the UCLA one.
User avatar
AuguryMarch
Lulu
Posts: 80
Joined: Mon Nov 10, 2003 2:47 pm
Location: Pittsburgh, PA

Post by AuguryMarch »

Great dialogue. I really think we're onto something good here.

Something else just occurred to me. For players who in general score a lot of points, it somehow doesn't make sense to compare them to someone who scores very few. For example, Lafer drops 19 PPG from 62; is that a meaningful comparison to me, who drops 12 PPG from my meager 14 (yeah, yeah, I suck)? It seems that what you really want is to compare people's percent dropoff to really get a measure of their ability to get questions off of anyone. Note, this is a different statistic than the raw numbers, as it abstracts away players' actual PPG, but it is way better if you want to compare people's slope coefficients. So in this case, what you'd really want is to divide the slope coefficient by the intercept, which would yield the percent of your PPG that drops off in going from playing the worst team to the best.
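To make that concrete, here are the percent dropoffs implied by a few of the IO fits Jerry posted above:

```python
# Percent dropoff: slope divided by intercept, i.e. the fraction of the
# x=0 baseline lost going from the worst team to the best (x=1).
# Coefficients are the IO fits posted earlier in this thread.
lines = {
    "Matt Lafer":   (-19.256, 62.436),
    "Seth Kendall": (-69.066, 99.342),
    "Seth Teitler": (-5.2029, 52.445),
}
for name, (slope, intercept) in lines.items():
    print(f"{name}: {slope / intercept:.1%}")  # e.g. Lafer: -30.8%
```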

I agree that this statistic is still incomplete because of the shadow effect. I'll try and think about that puzzle next.

Thoughts?
User avatar
Stained Diviner
Auron
Posts: 5085
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland
Contact:

Post by Stained Diviner »

It seems to me that the unfair part of doing PPG during a tournament is due much more to the level of your teammates than it is to the level of competition. Over the course of a tournament, you go against a lot of different teams, and their abilities pretty much average out. You are always with your teammates, however, and that can make a huge difference.

I think it would be interesting to calculate the percentage of tossups a person answers when his teammates don't buzz in. That is, if your teammates buzz in (counting powers, regular +10s, and negs; not counting late wrong buzzes) on half the questions and you answer one-tenth of the questions, then you answered 20% of the questions that your teammates did not buzz in on. If your teammates buzz in on a quarter of the questions and you answer one-tenth of the questions, then you answered 13% of the questions that your teammates did not buzz in on. The 20% player is better than the 13% player. If this was thought out better, somebody could figure out a way to account for the fact that you are more locked out when somebody powers than when somebody gets a regular +10.
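(A quick sketch of that calculation, reproducing the two examples above per 100 tossups heard:)

```python
# Share of tossups a player answers out of those his teammates left open.
def open_tossup_pct(my_answers, teammate_buzzes, total_tossups):
    return my_answers / (total_tossups - teammate_buzzes)

print(open_tossup_pct(10, 50, 100))  # 0.20 -> the "20% player"
print(open_tossup_pct(10, 25, 100))  # ~0.13 -> the "13% player"
```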

The slopes you are calculating above are a measure of how well somebody comes through in the clutch, in a way. It would be interesting to take two tournaments with a lot of players in common and measure the correlation coefficient of the slopes. (This is different than the correlation coefficients for each line, which could also help determine whether or not you are on to something. I am talking about making the x-coordinates each person's slope or y-intercept from Tournament A and making the y-coordinates the same people's slopes or y-intercepts from Tournament B.)
David Reinstein
Head Writer and Editor for Scobol Solo, Masonics, and IESA; TD for Scobol Solo and Reinstein Varsity; IHSSBCA Board Member; IHSSBCA Chair (2004-2014); PACE President (2016-2018)
MikeWormdog
Lulu
Posts: 61
Joined: Tue Apr 13, 2004 4:09 pm
Location: Yale University

Post by MikeWormdog »

As an aside, I believe that getting a neg entails more than just -5 for your team. For the most part, other teams (at competent levels) usually pick up 10 (not counting bonuses) after your neg, so the usual PPG scores are inflated if someone just buzzes in a lot. That is, it underestimates the effect of negs on the team performance, hence overestimating performance of individuals who neg a lot and get a lot of questions. I like to assume that for the most part, people from opposing teams will pick up the negs you take (for a tourney like ACF fall, e.g.), so that your "net PPG" is really (10*TU - 15*NEG) / GAMES. (I call it the Kohan index for no apparent reason.)
I would disagree with Ray's assertion, as negs are technically only 5 points more harmful than not buzzing in or buzzing in after the question is over and missing it. These aren't accounted for in stats, either.

Ray's Kohan index would equate a player who goes 5-2 (50-30=20) in one round with one who goes 2-0 (20-0=20). However, getting five questions is much better than getting two, even with 10 points lost with the negs plus whatever number of bonus points the other team got. Not ringing in equals not getting points for your team. Trying to imply that teams benefit from negs but not questions the other team didn't buzz in on (due to uncertainty or lack of knowledge, speed, whatever) isn't a fair system.

Back to the main topic, I am interested in what you all come up with, though I'm not sure if one stat can really show how "good" a player is. I think everyone who plays knows that points per game can be somewhat misleading as is PATH or whatever other stat you want to put out there.

I think simple playoff ppg or PATH (or failing playoffs, stats against teams that finished in the top tier at a tournament) or whatever are as good a statistical measure as any, since they show how someone fares against relatively level/good competition.

An alternative to that could be calculating a player's stats against teams that scored a certain number of points at a tournament -- say, 200 PPG (this could vary from tournament to tournament depending on difficulty). Simpler than regressions, I know, but I think these sorts of stats would be easier to work with. You could input any PPG value of opposing teams and figure out how well each player does against any type of competition (good, bad, average, whatever).

Mike
Last edited by MikeWormdog on Tue Nov 08, 2005 3:12 am, edited 1 time in total.
User avatar
Matt Weiner
Sin
Posts: 8145
Joined: Fri Apr 11, 2003 8:34 pm
Location: Richmond, VA

Post by Matt Weiner »

ReinsteinD wrote:I think it would be interesting to calculate the percentage of tossups a person answers when his teammates don't buzz in. That is, if your teammates buzz in (counting powers, regular +10s, and negs; not counting late wrong buzzes) on half the questions and you answer one-tenth of the questions, then you answered 20% of the questions that your teammates did not buzz in on. If your teammates buzz in on a quarter of the questions and you answer one-tenth of the questions, then you answered 13% of the questions that your teammates did not buzz in on. The 20% player is better than the 13% player. If this was thought out better, somebody could figure out a way to account for the fact that you are more locked out when somebody powers than when somebody gets a regular +10.
This is essentially how PATH worked.
Matt Weiner
Advisor to Quizbowl at Virginia Commonwealth University / Founder of hsquizbowl.org
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

MikeWormdog wrote:Ray's Kohan index would equate a player who goes 5-2 (50-30=20) in one round with one who goes 2-0 (20-0=20). However, getting five questions is much better than getting two, even with 10 points lost with the negs plus whatever number of bonus points the other team got. Not ringing in equals not getting points for your team. Trying to imply that teams benefit from negs but not questions the other team didn't buzz in on (due to uncertainty or lack of knowledge, speed, whatever) isn't a fair system.
Yes, I agree, and I'd like to explore this a bit. Note that this all depends on the effectiveness of your teammates. Going 2-0 in 2 real opportunities can be just as good as going 5-2 in 7 real opportunities, if you assume that, taking yourself out of the equation, your team is as good as the other team on all remaining questions. Of course, that's not a safe assumption, but if your teammates are stronger than you, you'd have a more legitimate claim that your 2-0 is just as good. The Kohan stat takes only the net pt changes into account, and does not take into account the (obvious) observation that good players should buzz in more, b/c they're likely to get questions right.

(While we're on the topic, just want to pt out that when one negs early on a question, she's also preventing the opposing team from negging, whereas if she gets a question right later on a question, the saving for the opposing team is smaller. What I mean is, depending on when the question is gotten, the opportunity for the other team to neg may be reduced. If a priori you believe your team is better, then you should try not to neg early b/c the other team can very well neg later on--another obvious conclusion.)

In general, it just bugs me that some aspect of negging and turning the question over is not taken care of by the stats. When you get a TU right, no one hears anything else. When you get it wrong, the other team hears the whole TU (for the most part), and there's got to be some advantage gained through that. Now that I think about it, is it really that bad that you go 2-0 vs. 5-2 if a priori your team and the opponent's bonus conversion can not be differentiated? In one case, you get an opportunity to score 60 on bonuses, on the other you get a net gain of 30 opportunity pts over your opponent (90 for you, 60 for the other). I guess if you believed your team has more knowledge, then you'd prefer the 5-2. O/w you'd take your 2-0 and hope your teammates beat the other team. Of course, if you play solo or with less than the full team, then you'd definitely prefer going 5-2.

Ok, this reply has gone on long enough. I don't have any great solutions to this value determination problem either; maybe a mathematician can help us out, b/c I don't know much about game theory. As a computer scientist, I'd code this up as a reinforcement learning problem and apply value iteration followed by value determination using the scoring as state transitions, but this would take a nontrivial amount of work to formulate. Maybe the AI people out there can help. But I think my earlier idea of calculating sum(ppg*(opp wins - team wins)) might still do some good.

BTW is there a way to get SQBS to calculate PATH?
User avatar
ValenciaQBowl
Auron
Posts: 2558
Joined: Thu Feb 05, 2004 2:25 pm
Location: Orlando, Florida

Post by ValenciaQBowl »

I want to start by saying I don't understand almost any of the above discussion's technical takes on calculations of stats but understand the desire to find concrete ways to measure players' worth/quality/skill/etc. However, at one level, I wonder if any such measure is possible. On my best Valencia teams, my best player, Jim, was very aggressive and tended to pick up nearly all of the dead ducks. Amy and Elissa, strong players in their own right, didn't particularly care if he did so after negs, which put them even further into his "shadow." And after Jim left, Amy often would let dead ducks get picked up by her teammates if she figured they could, which probably brought them more out of her "shadow."

Then there are the times I've been fortunate to team with people like Kelly McKenzie, Seth Kendall, and/or Raj and have purposely sat briefly on things I thought they would know (particularly if we weren't playing the strongest teams) to ensure that I wouldn't neg on something that's in their strength. Such behaviors are pretty common on the circuit, I'm guessing.

In other words, no statistical measure of the strength of players who team with elite scorers will be able to take such factors as deference (or lack thereof) or less concern about who gets the toss-up into account. However, teammates usually know who's important and who's not, and many things go into such a determination.

Nonetheless, I look forward to seeing what y'all figure out.
--cborg
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

one thought on negs... there are certain situations where it makes sense to buzz extremely aggressively, and I'm not sure how we'll account for them.

for example, I have played virtually the entirety of my career without a teammate with legitimate science knowledge (that includes when I had teammates :))...in that situation, when playing teams with legitimate science players, I tended to often buzz relatively early on science questions with a "best guess" under the rationale that a -5 was well worth taking 75% of the time when compared to a potential 80 point swing, since otherwise we would never get the tossup anyway.

I don't think that ppg (let alone a metric increasing the effect of negs) accurately reflected the utility of this strategy.
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

Getting back to Paul's slope coefficient idea. Do you think it overemphasizes complacency? I mean, to get a positive slope, you could also just do poorly against the worst teams. I know I tend to do this, just b/c I don't feel like I have to perform very well for us to win those games, and sometimes just take TUs off, esp on stuff not in my area of knowledge, and this would definitely make the slope more positive. Also, don't these slopes result from regression fits with different correlation coefficients? Does it make sense to compare people based on this statistic with different levels of confidence (which essentially result from different r^2)? (I don't want to poke holes in this--best QB discussion in months in my humble opinion, just hoping to pt out some things you could elaborate or develop.)

Ray.
User avatar
grapesmoker
Sin
Posts: 6345
Joined: Sat Oct 25, 2003 5:23 pm
Location: NYC
Contact:

Post by grapesmoker »

Here's an idea for taking into account shadow effect that I came up with during lecture this morning.

I start with a possibly controversial, though I believe roughly true, assumption which does not depend on any specific knowledge of any given player's ability. That assumption is as follows: if a player contributes P percent of a team's total points, then if any other player is removed from the team, the first player will, on average, pick up P percent of the questions that would have been answered by the player no longer on the team. In other words, you have as much of a chance to pick up tossups from a player who was but is no longer shadowing you as you do answering a tossup for your team in general.

Now, obviously this isn't a statement that actually holds true for every single situation. There are variabilities within question sets and within teams. Nonetheless, I am going to make what I will call the ergodic quizbowl hypothesis and propose that those variabilities average themselves out over the course of a tournament. The postulate I begin with may not be true for any single given situation, but I would like to contend that it is true on the average.

How does this look mathematically? If the player indexed by i scores P_i points, then the shadow effect of player j on player i is given by:

(1) (P_i * P_j)/(sum from k = 1 to N of P_k) where N is the total number of players on the team. I prefer to work with PPG, but we could of course use total points instead; it wouldn't change anything.

Now, a few arguments in favor of equation (1). First off, note that the equation is symmetric under interchange of P_i and P_j. This is good: since shadow effect results from knowledge overlap, we would expect it to correspond to the intersection of the knowledge sets of player i and player j, and that intersection is likewise symmetric under interchange of indices. Secondly, one glance at the dimensions tells you that the shadow factor has dimensions of points, and that a really good player and a really bad player shadow each other (by virtue of the symmetry) for the same (small) total number of points. This makes sense too, because you would not expect a strong player to somehow have a ton of shadow from a weak player, nor vice versa, since the knowledge set of a strong player is much larger than that of the weak player, so the relative size of the intersection (relative to one's total knowledge) is correspondingly smaller. Another consequence of this formula is that in general, one can never remove a player from one's team and expect the team to profit from it unless the player puts up negative points. The point of all this is to show that this equation is consistent with our common-sense understanding of shadow and that it does not produce any pathological results.
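A sketch of equation (1) in code. The numbers anticipate the 2004 ACF Nationals example I work through below; the last two teammates' PPG are made up so the team tossup total comes out to 119.5:

```python
# Shadow of player j on player i per equation (1): P_i*P_j / sum_k P_k.
def shadow(ppg, i, j):
    return ppg[i] * ppg[j] / sum(ppg)

ppg = [34.0, 25.5, 35.0, 25.0]  # Seth, me, plus two made-up teammates; total 119.5
print(shadow(ppg, 1, 0))  # ~7.3 PPG: what I'd expect to pick up without Seth
```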

How can we systematically apply this idea? It seems that a controlled experiment would involve the same team playing one tournament with four players and another tournament with three and then comparing the result. Another comparison can be made between teams that switched one of their players between two tournaments. I'll do an example here:

At the 2004 ACF Nationals, Seth Teitler had 34 PPG and I had 25.5 (for the stats that were kept). The total sum of points per game scored by my team on tossups was 119.5 PPG. Therefore, I scored about 21.3% of my team's tossup points. If Seth were removed from the team, at that event I could expect to gain 0.213*34, or about 7.3 PPG, based on the naive assumption that my chance of getting Seth's questions is proportional to my total contribution to the team.

Of course, this showcases a problem with the method. Seth and I are both physics types, so I would be much more likely to pick up any questions left by his absence than, say, Jeff Hoppes would be. But still, it's not too far-fetched. At the 2005 ACF Nationals I had 44.58 PPG, a gain of 19.08. So at first glance, it looks like I profited more than the above calculation would indicate from the replacement of Seth with Ray Luo. But of course, cross-year comparisons are hard to make because knowledge is gained in the intervening period. Nevertheless, as an order-of-magnitude calculation, I think this method is sound.

One way to possibly increase the reliability of this shadow metric is to do the linear fit talked about before and then use the intercepts instead of strict PPG. The benefit here is that if I'm right in my interpretation of the intercept, then it indicates how well you do against your own teammates rather than against any other team. Unfortunately, the data for 2004 ACF Nationals is not available in SQBS format, so I don't know who scored how many points in each individual game. I still think it would be an interesting analysis to do if someone could get hold of the appropriate data sets.

Jerry
Last edited by grapesmoker on Tue Nov 08, 2005 5:42 pm, edited 1 time in total.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

I guess the cool part about Jerry's idea is that you could calculate these stats for every pair of players on your club, then decide how to field a team: you'd just minimize the total amount of shadowing over all pairs in your group of four (a brute-force sketch of this follows the standings below). Anyhow, here are the IO standings for the top 8, based on the Kohan index I naively proposed earlier. I think this suggests that shadowing will be a bigger factor than this simple -10 stat I proposed, since ranking movements were minimal, and Kohan will reflect PPG in most situations (but if you take a data set like WIT...).

1. Seth-Kentucky, 58.5.
2. Andrew-Somewhere, 54.5.
3. Andrew-Philosophers, 47.0.
4. Seth-Chicago Zeus, 46.0.
5. Matt-Christ, 43.0.
6. Adam-Mich, 33.0.
7. Will-Mich, 28.0.
8. Ryan-Christ, 21.5.
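(Here's the brute-force selection sketch promised above. The club PPG values are made up, and note that minimizing shadow alone would favor low scorers; per Paul's original point, you'd really want to trade total scoring off against total shadow.)

```python
# Total pairwise shadow (Jerry's equation (1)) for every four-player
# subset of a hypothetical club roster of PPG values.
from itertools import combinations

def total_shadow(ppgs):
    s = sum(ppgs)
    return sum(a * b / s for a, b in combinations(ppgs, 2))

club = {"A": 45.0, "B": 30.0, "C": 25.0, "D": 20.0, "E": 18.0}
for team in combinations(club, 4):
    print(team, round(total_shadow([club[p] for p in team]), 1))
```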

(Finally, for fellow ex-Berkeleyites, does using the Kohan index mean that Selene Koo was the greatest QB player ever?)

Ray.
csrjjsmp
Lulu
Posts: 47
Joined: Wed Nov 10, 2004 6:46 am

Post by csrjjsmp »

ReinsteinD wrote: I think it would be interesting to calculate the percentage of tossups a person answers when his teammates don't buzz in. That is, if your teammates buzz in (counting powers, regular +10s, and negs; not counting late wrong buzzes) on half the questions and you answer one-tenth of the questions, then you answered 20% of the questions that your teammates did not buzz in on. If your teammates buzz in on a quarter of the questions and you answer one-tenth of the questions, then you answered 13% of the questions that your teammates did not buzz in on. The 20% player is better than the 13% player. If this was thought out better, somebody could figure out a way to account for the fact that you are more locked out when somebody powers than when somebody gets a regular +10.
That seems to fail if you have the same player getting the same questions right, but with different teammates.
User avatar
setht
Auron
Posts: 1205
Joined: Mon Oct 18, 2004 2:41 pm
Location: Columbus, Ohio

Post by setht »

recfreq wrote:(Finally, for fellow ex-Berkeleyites, does using the Kohan index mean that Selene Koo was the greatest QB player ever?)

Ray.
"Was"?
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

setht wrote:
recfreq wrote:(Finally, for fellow ex-Berkeleyites, does using the Kohan index mean that Selene Koo was the greatest QB player ever?)

Ray.
"Was"?
Was for a while, then you came along. Was that what I was implying? I mean, is that?

BTW, just an anecdote: the Kohan terminology comes from a certain Berkeley player who, by playing with reckless abandon during tournaments, would have set records for the lowest scores on his namesake index (in all the history of qb, you will not find a lower Kohan scorer). Interested parties are directed to the UCLA lexicon (http://quizbowl.bol.ucla.edu/).

Ray.
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

recfreq wrote:Also, don't these slopes result from regression fits with different correlation coefficients? Does it make sense to compare people based on this statistic with different levels of confidence (which essentially result from different r^2)?
Almost every regression line I have fitted has an r^2 less than 0.3 (my regression line for WIT has an exciting r^2 value of 0.0002), which means that we would pretty much have to throw everything out if we wanted to use confidence levels in this analysis.
grapesmoker wrote:It seems that a controlled experiment would involve the same team playing one tournament with four players and another tournament with three and then comparing the result.
We would also have to control for questions of approximately equal difficulty. A 25 ppg player on ACF Fall questions may drop to a 15 ppg player on ACF Regionals questions (a significant drop, in my opinion) just because of a smaller knowledge base, even if all other things (field, teammates, etc.) are held roughly constant. Accordingly, a controlled experiment that could reliably and accurately measure this is almost impossible.

I started to work with the results from last year's ICT, which are in SQBS format. I realized NAQT has two major differences from mACF: power 15s and variable numbers of tossups per round. Both of these would definitely affect the formula. Powers can be taken out of the equation by treating them as 10-point questions, but how do you correct for different numbers of tossups per round? Go with PPTH in each round, or possibly normalize to PP20H? I'm not confident either of these methods will work quite as well as for untimed tournaments. Let's leave NAQT out of this until we can come up with a good measure for mACF tournaments, then adapt it for NAQT.

It seems that the variability we are going to get in any measure of player strength is going to be massive (at least partially due to the variation in packet difficulty from tournament to tournament, or in tournaments not well edited, from round to round). We're just going to have to ignore it until we can formulate a good model and say what is indicative of what, and then work on ways to reduce it.
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

Another justification for using the Kohan index: if you're scoring one tossup for every neg, then you're not really doing that well, but your PPG will always be positive, so you're making a positive contribution to the team? With the Kohan index, we make sure to tell you that no, you're probably not, and it's reflected in your negative Kohan score.

Bye.
Ray.
User avatar
Stained Diviner
Auron
Posts: 5085
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland
Contact:

Post by Stained Diviner »

That seems to fail if you have the same player getting the same questions right, but with different teammates.
A. I don't think that that would happen very often except in the case of specialists who don't go outside their own field. In that case, these overall measures are of limited value anyway.
B. If it did happen, it would still be true that one person may have had to buzz in earlier on the tossups they got, and the other person may have heard more of the tossups they missed. The person on the weaker team should have gotten more tossups.
David Reinstein
Head Writer and Editor for Scobol Solo, Masonics, and IESA; TD for Scobol Solo and Reinstein Varsity; IHSSBCA Board Member; IHSSBCA Chair (2004-2014); PACE President (2016-2018)
User avatar
grapesmoker
Sin
Posts: 6345
Joined: Sat Oct 25, 2003 5:23 pm
Location: NYC
Contact:

Post by grapesmoker »

cvdwightw wrote:
grapesmoker wrote:It seems that a controlled experiment would involve the same team playing one tournament with four players and another tournament with three and then comparing the result.
We would also have to control for questions of approximately equal difficulty. A 25 ppg player on ACF Fall questions may drop to a 15 ppg player on ACF Regionals questions (a significant drop, in my opinion) just because of a smaller knowledge base, even if all other things (field, teammates, etc.) are held roughly constant. Accordingly, a controlled experiment that could reliably and accurately measure this is almost impossible.
I don't see why we would need to control for difficulty in order to find relative shadow. After all, if the questions are different in difficulty from set to set, they are different for all the players involved. If other variables are held roughly constant, such as teammates and field, the relative shadow should remain unchanged.

I also disagree that we can't do a controlled experiment. There are many tournaments throughout the year that are roughly comparable: PUBfest, WIT, Technophobia, Bulldogs over Broadway, and BLaST are traditionally about the same in terms of difficulty. Illinois Open is closer to ACF Regionals. ACF Nationals can only be compared with itself or similar tournaments like last year's Manu or Chicago Open.
I started to work with the results from last year's ICT, which are in SQBS format. I realized NAQT has two major differences from mACF: power 15s and variable numbers of tossups per round. Both of these would definitely affect the formula. Powers can be taken out of the equation by treating them as 10-point questions, but how do you correct for different numbers of tossups per round? Go with PPTH in each round, or possibly normalize to PP20H? I'm not confident either of these methods will work quite as well as for untimed tournaments. Let's leave NAQT out of this until we can come up with a good measure for mACF tournaments, then adapt it for NAQT.
Again, doesn't seem like it would matter for the purposes of shadow by the same argument as before.
It seems that the variability we are going to get in any measure of player strength is going to be massive (at least partially due to the variation in packet difficulty from tournament to tournament, or in tournaments not well edited, from round to round). We're just going to have to ignore it until we can formulate a good model and say what is indicative of what, and then work on ways to reduce it.
I don't think that's the case. I think we are placing too much emphasis on the variability of tournament sets. It seems to me that if we take shadow effects into account the way I described, we should find that roughly the same players come out on top because the variabilities should just average themselves out over time. I'll admit that I don't have a lot of confidence in the ability of the method to distinguish between middle-of-the-pack players. The resolution is limited there. But if we're talking about the top 10 or 15 at any tournament, we should be able to say something significant about that if we combine it with the regression fit. I haven't thought of a good way to do it yet, but I'm working on it.
Jerry Vinokurov
ex-LJHS, ex-Berkeley, ex-Brown, sorta-ex-CMU
presently: John Jay College Economics
code ape, loud voice, general nuissance
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

grapesmoker wrote:I don't see why we would need to control for difficulty in order to find relative shadow.
Ah. I misunderstood the purpose of the experiment. I thought we were testing the validity of the regression line rather than the corrections due to shadow effect.
It seems to me that if we take shadow effects into account the way I described, we should find that roughly the same players come out on top because the variabilities should just average themselves out over time.
This was more in response to Ray's post about needing a certain confidence interval, which just isn't going to happen.
Nathan wrote:when playing teams with legitimate science players, I tended to often buzz relatively early on science questions with a "best guess" under the rationale that a -5 was well worth taking 75% of the time when compared to a potential 80 point swing, since otherwise we would never get the tossup anyway
I think this is a strategy which cannot be taken into account by any non-subjective measure. The higher your team's bonus conversion, the more sense it makes to risk this neg, since by waiting you're virtually guaranteeing the other team points. The expectation value of this buzz varies with the confidence you have in your guess.

Suppose we were actually able to figure out that a certain player's guesses were right some P percent of the time (this would encompass all buzzes not founded on pure knowledge but on some combination of knowledge and guesswork). Then if P > 500/(BC+15), where BC is your team's bonus conversion in expected bonus points, it makes sense to buzz in.

For instance, if you think your team can get 10 points on the bonus, then if you're more than 20% positive your guess is right, you should go ahead and buzz in, because in the long run this strategy will gain you points. This is perhaps a surprisingly low number, but it follows from the assumption that the other team will get the question right regardless of if you buzz in or not.
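In code, the threshold and where it comes from (the derivation assumes, as above, that the other team converts the tossup whenever you stay silent or neg, so their points drop out of the comparison):

```python
# Guess if your percent chance P of being right exceeds 500/(BC + 15),
# which follows from requiring (P/100)*(10 + BC) - (1 - P/100)*5 > 0.
def buzz_threshold(bonus_conversion):
    return 500 / (bonus_conversion + 15)

print(buzz_threshold(10))  # 20.0: the 20% example above
print(buzz_threshold(20))  # ~14.3: better bonus conversion -> gamble sooner
```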
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

"For instance, if you think your team can get 10 points on the bonus, then if you're more than 20% positive your guess is right, you should go ahead and buzz in, because in the long run this strategy will gain you points. This is perhaps a surprisingly low number, but it follows from the assumption that the other team will get the question right regardless of if you buzz in or not."

Exactly. And I think people have always instinctively understood that negs are more acceptable against very good teams than against very bad teams. Indeed, I would suggest that true "sharpshooters" (those who put up 49-0 stats in tourneys) are by definition not playing anywhere near up to their full potential. If you never have a neg, you are not buzzing fast enough and in my (admittedly anecdotal) experience...true sharpshooters tend to put up poor numbers against good teams.

btw, aside to Ray -- I think you're wrong, check the stats of either myself or T. Andy Wang to see lower Kohan metrics.
User avatar
Stained Diviner
Auron
Posts: 5085
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland
Contact:

Post by Stained Diviner »

For instance, if you think your team can get 10 points on the bonus, then if you're more than 20% positive your guess is right, you should go ahead and buzz in, because in the long run this strategy will gain you points. This is perhaps a surprisingly low number, but it follows from the assumption that the other team will get the question right regardless of if you buzz in or not.
Exactly wrong. This assumes that there is no chance that one of your teammates will answer the question correctly or that the other team will make a mistake because they buzz in too early.

I agree that people who never neg are playing below their potentials, but somebody who gets half of their live buzzes correct is not helping their team significantly because they are giving the other team almost as much as they are taking for their own team. (I'm defining live buzz as one that takes place before the other team and more than a second before time is called. After the other team buzzes, we all know to wait, and a second before time is called anything goes.)
David Reinstein
Head Writer and Editor for Scobol Solo, Masonics, and IESA; TD for Scobol Solo and Reinstein Varsity; IHSSBCA Board Member; IHSSBCA Chair (2004-2014); PACE President (2016-2018)
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

"Exactly wrong. This assumes that there is no chance that one of your teammates will answer the question correctly or that the other team will make a mistake because they buzz in too early."

I have played on numerous occasions where that exact assumption is warranted (besides, it doesn't have to be "no chance" -- I'm sure that one could compute the effect of a 5% chance of either and demonstrate that you should instead buzz if you are 25% likely to get the answer right).
User avatar
Captain Sinico
Auron
Posts: 2675
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Post by Captain Sinico »

The simple model proposed fails because it neglects the utility lost by failing to hear the remainder of the question. For example, if I know physics pretty well (say I can be 20% certain off any leadin, 50% certain after midrange clues, and 100% certain from the giveaway) but my opponents don't know it so well (they're 10% until the giveaway, after which they're 40%), the model dictates there is positive value for buzzing after the leadin in some circumstances, but this is clearly not the case.

MaS
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

The model assumes that in all cases the chance someone on the other team will get the tossup is significantly greater than the chance someone on your team will.

Good, pyramidal questions should and usually will differentiate between teams with more and less knowledge on the subject, as measured in the "certainty" that the answer you are thinking of is correct, but this difference will take on a range of values throughout the question. The difference between a team that is good in an area and a team that is not as good in an area should not be prominent during the opening clues, since these are supposed to be the "hardest" clues. Towards the middle of the question the difference should become very large, as one team is more and more certain of the correct answer while the other's certainty lags behind. By the end of the question, if it is something both teams are now certain of, then it becomes little more than a buzzer race.

Now, if I (or a teammate) am a good player on this subject, I should wait until I am more certain, because buzzing in with a higher certainty earns positive points more of the time. However, if my entire team is bad at a subject, and your team is good at this subject, then your certainty level is much greater than mine and if we both wait until someone on our respective teams is, say, 90% certain, then your team will always beat mine.

This implies that the only ways for me to beat your team to questions on this subject are either to have you neg or to buzz when I am less certain. If you buzz in when you are 90% certain, then 10% of the time you will neg, and assuming I am 100% off the giveaway, my team will get the question right 10% of the time. However, if my teammates are essentially zero percent certain off the giveaway (my knowledge on this is significantly greater than each of theirs), and I buzz in when I am 40% certain, then my team will lose 5 points 60% of the time but gain 10+bonus points 40% of the time. If I employ the first strategy, I expect to gain .1(10+B) points, where B is some arbitrary number of bonus points roughly based on bonus conversion. If I employ the second strategy, I expect to gain .4(10+B)-.6(5) points. These two strategies are equal when I expect no bonus points, and in all cases where I expect to get any points on the bonus, the second strategy is a better bet. Furthermore, if I am less than 100% certain off the giveaway, then the strategies are equal when I expect negative bonus points, so the second strategy always benefits my team more than the first.

In addition, you stand to gain more points by waiting until you are 90% certain than by letting my team buzz in at 40% certainty. Therefore, by buzzing in early, I both increase my team's expected points and decrease that of your team, and thus this is a sound strategy for me. However, this strategy would make no sense for you.
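In code (B is the expected bonus points; the 90%/40% numbers are the ones from above):

```python
# Expected points per tossup in my weak category, per the argument above.
def wait_for_their_neg(B):  # they buzz at 90% certainty; I clean up the 10%
    return 0.1 * (10 + B)

def buzz_at_40_percent(B):  # I gamble at 40% certainty, negging 60% of the time
    return 0.4 * (10 + B) - 0.6 * 5

for B in (0, 10, 20):
    print(B, wait_for_their_neg(B), buzz_at_40_percent(B))
# B=0: 1.0 vs 1.0 (equal); any positive B favors the early buzz.
```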
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

grapesmoker wrote:How does this look mathematically? If the player indexed by i scores P_i points, then the shadow effect of player j on player i is given by:

(1) (P_i * P_j)/(sum from k = 1 to N of P_k) where N is the total number of players on the team. I prefer to work with PPG, but we could of course use total points instead; it wouldn't change anything.

Of course, this showcases a problem with the method. Seth and I are both physics types, so I would be much more likely to pick up any questions that were left as a result of his absence than, say, Jeff Hoppes would be. But still, it's not too far fetched.
One problem with this is that it does not account for aggression. For instance, let's assume that Player A and Player B have roughly the same knowledge overlap with Player C. Furthermore, Player A and Player B have roughly similar PPG, but A is a much more aggressive player than B. I would argue that because A is buzzing in on more questions than B is (despite having the same PPG), C will expect to score more points if A is removed from the team than if B is.

I propose an alternative method:
Let T_k be the number of tossups answered correctly for each person on the team.
Let G_k be the number of negs for each person on the team.

Then a new measure of "expected gain" would be
(2) [10*(T_i)/(sum from k=1 to N of T_k)*(G_j + T_j) - 5*(G_i)*(G_j)/(sum from k=1 to N of G_k)]/(games)

This method is fairly close (but I don't think equivalent) to adding a correction factor of [10*(T_i)*(G_j)/(sum from k=1 to N of T_k)]/(games) to the original proposed formula, to compensate for the idea that a player's negs take away both tossups and negs.

Actually, the more I think about this, it makes sense that a player will absorb some percentage of tossups as tossups and some percentage of negs as negs. However, a player will also absorb some percentage of tossups as negs and some percentage of negs as tossups. This would require two "correction factors" to Jerry's formula, and I'm not sure I have the right correction factor for the one I propose. I'll think about this more, but the formula definitely should have two correction factors built into it.
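For what it's worth, here is formula (2) transcribed directly (reading the final sum as over the G_k, and with the caveat above that the correction factors may not yet be right; the counts below are hypothetical):

```python
# Formula (2) as stated: player i's expected per-game gain if player j
# leaves the team. T and G are per-player tossup and neg counts.
def expected_gain(T, G, i, j, games):
    return (10 * T[i] / sum(T) * (G[j] + T[j])
            - 5 * G[i] * G[j] / sum(G)) / games

T = [40, 25, 20, 15]  # tossups by players 0..3 over a 10-game tournament
G = [10, 8, 5, 4]     # negs by players 0..3
print(expected_gain(T, G, i=1, j=0, games=10))  # player 1's gain sans player 0
```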
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

Dwight, I don't get the same answer as you. Assuming a 1-on-1 game, and the fact that I don't care about the future, i.e. even though I have more clues coming, I just want to buzz now (which is essentially what you're assuming), I'd buzz when the expected gain > 0:

P_t(10 + BC1) - (1-P_t)(5 + 10 + BC2) > 0

I write P_t b/c it really depends on when in the question I buzz in. This simplifies to:

(25 + BC1 + BC2) P_t > (15 + BC2)

So if we suck (BCs are zero), I should buzz when P_t > 0.6. If we're both great (BCs are 30), I should buzz when P_t > 0.53. If my BC1 > your BC2, then I should buzz with a lower P_t, b/c I could just make up my mistakes by scoring the BO--i.e. I just keep buzzing til I get a bonus, then I score big. If my BC1 < your BC2, then I should stay conservative, b/c o/w I would turn it over and you'd wait til the end to get the TU and then kill me on the BO. This is not what was predicted earlier.

Note how weird these results are! This is b/c I assumed that I don't care about P_{t+1}, P_{t+2}, ..., P_T (end of question). Even if I were knowledgeable, I should wait and get the TU with higher P_t and still grab that tasty BO. But our model ignores it, and just says, well, just buzz now and hope to get it right. I think to do this right, you'd write an equation involving P_{t+1} = P_t / a, for 0<a<1, then add the P_{t+j} terms into the equation above and find when this is > 0, solving for P_t. I don't have the time or energy to do this right now, since I've got to pack for a trip to Wash DC. Anybody want to work this out?
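Here's the myopic break-even in code, if anyone wants to play with it:

```python
# Ray's myopic rule: buzz now iff P_t*(10+BC1) - (1-P_t)*(15+BC2) > 0,
# i.e. P_t > (15 + BC2) / (25 + BC1 + BC2).
def myopic_threshold(bc1, bc2):
    return (15 + bc2) / (25 + bc1 + bc2)

print(myopic_threshold(0, 0))    # 0.6: both teams convert nothing
print(myopic_threshold(30, 30))  # ~0.529: both teams convert everything
print(myopic_threshold(30, 0))   # ~0.273: my BC1 >> your BC2, so gamble
```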
Ray Luo, UCLA.
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

Come to think of it, my last suggestion is not so good. All I can come up with is that using that very simple formula, you'd actually buzz in earlier in a 1-on-1 game if you can score well on bonuses, b/c the bonus is worth taking the risk for. One should take into account one's other teammates, the difficulty of the question, and how fast the clues would increase your P_t. I suspect that the reason we'd wait is to get a clue where P_t changes from 0.1 to 1.0, b/c P_t certainly doesn't rise linearly (or smoothly). Given that there's no way to predict those things in a question, I resolve to claim that QB is all about finding when to buzz with P_t = 1, b/c come to think of it, P_t = 0.4 really doesn't make sense: it's a categorical decision problem, either Andrew Jackson or not Andrew Jackson, and if I'm sitting on Andrew Jackson with P_t = 1, then shame on me.
Ray Luo, UCLA.
Rothlover
Yuna
Posts: 815
Joined: Wed Feb 25, 2004 8:41 pm

Post by Rothlover »

Anyone trying to put any of this to work on a larger field, like ACF nats or ICT (given any of the proposed values)?
Dan Passner Brandeis '06 JTS/Columbia '11-'12 Ben Gurion University of the Negev/Columbia '12?
User avatar
Captain Sinico
Auron
Posts: 2675
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Post by Captain Sinico »

Right. Saying "you have to guess if you want to beat someone who knows a lot more than you do" is hardly very novel. What I'm saying is, any model that just looks at points and certainty at a single point is inadequate, because it will always fail to account for the additional utility of hearing more of the question. I'll make an attempt at a more complete one. If you buzz at a point where the probability of your being right is p, your bonus conversion is b, your opponent's probability of knowing the question by the end is p', your opponent's bonus conversion is b', and the probability that, at a later point in the question, you would have gotten the question is l, then the utility of your buzz, in points, is (10 + b)*p - (1 - p)*[5 + p'*(10 + b')] - l*(10 + b) + (1 - l)*[5 + p'*(10 + b')] (the 5 in the (1 - l) term assumes that you neg if you're wrong later). You should buzz at the point of maximum utility; that is the absolutely optimal strategy, the one that maximizes the number of points you can expect to score.
Therefore, for example, if you're not very certain now (p is low) but you don't think the other team is (p' is low) and you think you'll have a better chance later (l is high), you should buzz later. If l ever drops below 1/2, you should be buzzing. Obviously, maximizing this utility is a lot more complex than simply buzzing when p > 500/(b+15), but this utility is a much better representation of the actual optimal point to buzz, because it accounts for the utility (significant) lost by forgoing the ability to buzz on the rest of the question.
However, this doesn't really matter in practice, usually, since, as has been noted, what I've called p tends to go from negligible ("this is an English poet") to damn near unity ("this is the author of 'Go, lovely Rose!'").
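For anyone who wants to plug in numbers, here's the utility as defined (the parameter values below are made up):

```python
# Mike's buzz-now utility as defined above; p, p_opp, l are probabilities,
# b, b_opp bonus conversions. Positive values favor buzzing immediately.
def buzz_utility(p, b, p_opp, b_opp, l):
    turnover = 5 + p_opp * (10 + b_opp)  # cost terms when the question goes over
    return (10 + b) * p - (1 - p) * turnover - l * (10 + b) + (1 - l) * turnover

print(buzz_utility(p=0.2, b=15, p_opp=0.4, b_opp=15, l=0.9))  # -28.0: wait
print(buzz_utility(p=0.2, b=15, p_opp=0.9, b_opp=15, l=0.1))  # 5.25: outgunned, so gamble
```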
User avatar
cvdwightw
Auron
Posts: 3291
Joined: Tue May 13, 2003 12:46 am
Location: Southern CA
Contact:

Post by cvdwightw »

I think the whole idea of my first post on this topic of "when to buzz" was that if p'>p for all points in the question (thus l'>l), then there are circumstances where taking the calculated risk of negging makes a lot of sense, and therefore we cannot treat all negs as being equally bad.
ImmaculateDeception wrote:Basically, p should only be taking intermediate values if...the question's vague.
Then I guess I should assume that this model works fairly well for CBI.
User avatar
Captain Sinico
Auron
Posts: 2675
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Post by Captain Sinico »

cvdwightw wrote:I think the whole idea of my first post on this topic of "when to buzz" was that if p'>p for all points in the question (thus l'>l), then there are circumstances where taking the calculated risk of negging makes a lot of sense, and therefore we cannot treat all negs as being equally bad.
See, any time l'>l, you should be buzzing.
cvdwightw wrote:Then I guess I should assume that this model works fairly well for CBI.
No. CBI's use of hose questions, questions that don't ask for anything, and questions without uniquely identifying clues (that's pretty much all of their questions) makes it one of the few formats in existence where p can take intermediate values.

MaS
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

If you don't know much about a subject but have "qb memory" then there are plenty of occasions where p has an intermediate value.

i.e. the question is clearly going for an equation and something about blood flow comes up... I'm going to buzz with "Henderson-Hasselbalch," and in qb terms I would wager there's at least a 50% chance that's the correct answer.
User avatar
Captain Sinico
Auron
Posts: 2675
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Post by Captain Sinico »

... No. As I said, if you don't know anything, then the probability that your buzz will be right is hard for you to determine, but (for a good question) you can be sure it's pretty low. If you know, for example, that the Hagen-Poiseuille equation (I assume that's what you meant...) was developed to describe blood flow, then you can try buzzing off that clue; I wish you luck. However, a question with that as an early clue is a bad question precisely because it makes such a buzz profitable (you're necessarily guessing if that's an early clue, since it's not uniquely identifying even among "canonical" answers, but you'd be unwise not to do so, because a preponderance of the time the question with that clue will have that answer.)
However, if you're badly outgunned, knowledge-wise, then guessing early is a good strategy, since you will probably lose later in the question. Again, this isn't anything novel, strategy-wise; maximizing the utility of the buzz as I've defined it reflects what everyone has known to be the optimal strategy, as it must independent of everything if it's a good equation. The art of playing lies in evaluating the parameters quickly, but the utility comes out of simple definitions, assuming only that your objective is to score the most points (i.e. Edwards' principle.)

MaS
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

I meant Henderson-Hasselbalch -- which is indeed used in some relationship to human blood flow (ask a med student)... and that clue comes up frequently.

thus, throw in the Freeburg corollary: most qb questions are bad questions and one should only assume pyramidity with certain editors.
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

a quick google search demonstrates that the Henderson-Hasselbalch equation has something to do with calculating blood pH and is indeed often used in a medical environment.
User avatar
Stained Diviner
Auron
Posts: 5085
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland
Contact:

Post by Stained Diviner »

There are few rounds that don't have any points at which to make a reasonable guess. From http://www.naqt.com/samples/t1162packet_1.pdf, you get:
>>>
name this novel in which Edward nearly loses the throne to his double

It is best known as the site of a September 1950 amphibious landing

name this twin brother of Louis XIV imprisoned in metallic headgear

Quantized in units of Planck's constant over two pi

one step closer to total dissolution in January 2003

as his name comes from the Phoenician for “lord”
David Reinstein
Head Writer and Editor for Scobol Solo, Masonics, and IESA; TD for Scobol Solo and Reinstein Varsity; IHSSBCA Board Member; IHSSBCA Chair (2004-2014); PACE President (2016-2018)
User avatar
Captain Sinico
Auron
Posts: 2675
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Post by Captain Sinico »

The Henderson-Hasselbalch equation gives the Brønsted-Lowry acidity of a stationary buffered solution. While it can be used on human blood the same as any such fluid, it has nothing to do with blood flow, per se. I don't need to ask anyone to tell you that. Anyway, this proves the point: you can't be very certain buzzing if you don't know much. Buzzing with "Henderson-Hasselbalch equation" from "an equation and something about blood flow" is fine, but you can't be sure from that and shouldn't be. Therefore, a question that makes that buzz profitable probably is a bad question.
As for bad questions, I've never played you on bad questions, so I don't know how you'd play on them. I do know that I've observed you buzz what I would have to consider much too aggressively, given what you purport to know, at Chicago Opens 2003-2005 (which, I think you will agree, were tournaments with excellent questions, written and edited by excellent authors.) Perhaps I'm wrong... but Connecticut comes to mind.
Also, the question isn't whether you can guess. You can always guess, given any question at all. The question is when it is profitable to guess: when is the best time for you to buzz? The answer is the point at which your expected points are optimal; when your utility is maximal.

MaS
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

I think we're talking at cross-purposes here:

I misspoke when I said "blood flow"...I should have just said "equation" and "blood"....it is actually extensively used in medicine.

the point is, I have buzzed off that clue and been right at least 50% of the time...which, considering my lack of science knowledge, makes it the right buzz at the right time...which was my only point -- that a lack of knowledge in an area combined with a fair amount of "qb memory" often leads to middle values for "p"

as for my negging propensities, they are well documented.
Susan
Forums Staff: Administrator
Posts: 1812
Joined: Fri Aug 15, 2003 12:43 am

Post by Susan »

Hey,

I got bored and generated the linear regressions (as described in Paul's initial post) for everyone at IO who scored over 10ppg.

Somewhere Among Us A Stone Is Taking Notes:
Andrew: y = 13.694x + 63.338
Paul: y = -12.739x + 19.732
Jerry: y = -34.395x + 45.478

Kentucky:
Seth: y = -69.066x + 99.342

DePauw:
Adam: y = -3.2805x + 17.771

Three Boys and a Goy:
Dan: y = -5.1887x + 18.094
Chris: y = -17.453x + 31.226
Loren: y = -9.9057x + 19.953
Victor: y = -6.1321x + 21.566

Christ vs. AZ:
Ryan: y = -29.314x + 46.864
Matt: y = -19.256x + 62.436

Chicago B:
Bruce: y = -16.437x + 29.547
Harry: y = -3.3465x + 13.74

UIC:
Rom: y = -37.097x + 35.161

Chicago Zeus:
Seth: y = -5.2029x + 52.445
Selene: y = -26.431x + 24.422
me: y = -15.869x + 29.958

Philosophers With Hammers:
Andrew: y = -33.962x + 65.981
Chris: y = -56.604x + 57.302

Michigan:
Adam: y = -38.462x + 62.692
Will: y = -21.38x + 43.835
Dave: y = -3.3937x + 14.061
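
If anyone wants to check these numbers or run the same thing on another tournament, it's just an ordinary least-squares line fit. A minimal sketch in Python (the round-by-round pairs below are made up for illustration; the real input is each round's opposing-team win percentage paired with the player's points that round):

import numpy as np

# Each pair is (opponent's win percentage, player's points that round).
# These values are hypothetical stand-ins for real round-by-round data.
rounds = [(0.1, 55), (0.3, 40), (0.5, 45), (0.7, 30), (0.9, 20)]

x = np.array([r[0] for r in rounds])
y = np.array([r[1] for r in rounds])

# Degree-1 least-squares fit: y = slope*x + intercept.
slope, intercept = np.polyfit(x, y, 1)
print(f"y = {slope:.4g}x + {intercept:.4g}")

Running that on the fake data above gives a negative slope, as you'd expect for most players.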
User avatar
Birdofredum Sawin
Rikku
Posts: 400
Joined: Tue Nov 09, 2004 11:25 pm
Location: Mountain View

Post by Birdofredum Sawin »

I haven't really been able to follow most of this discussion, but I have one question and one comment.

First, the question. Can somebody explain to a layman what these numbers mean? Does a "slope" of -60 mean you had an absolutely terrible tournament? Should one expect a number of players to have a positive slope? Long arrays of equations mean almost nothing to us humanities people, or at least to this humanities person.

Second, the comment. I think there's been some confusion about negging, and overstatement of the stigma that ought to be attached to negs. Let's say I'm playing Seth Teitler. The first sentence of a tossup indicates that the answer is going to be a female figure from Aztec myth. I don't actually "know" the answer, but I do know a) that Seth knows many more myth clues than I do and b) that I can only think of two or three possible answers anyway. Clearly, the smartest thing for me to do is buzz right in and say "Coatlicue." By the same token, if Seth is playing me and the first sentence of a tossup indicates that the answer is going to be a 20th century American philosopher, he would be well-advised to buzz in and say something like "Quine," knowing that he's almost certainly not going to get the question otherwise.

It's not enough to say that "good editors wouldn't let such questions through," because in fact they do. A number of editors are good enough to remove blatant giveaways from the opening clues of tossups, but very few editors are meticulous enough to edit every single tossup such that the first sentence doesn't contain tacit giveaways. Such clues pinpoint the time and place in which a person lived, for instance, making it plausible to guess "the only Japanese World War II admiral I can think of" if the first sentence allows you to deduce that the answer is a Japanese naval figure who was active in the 1930s. (The answer won't always be Yamamoto, as Illinois Open proved; but the buzz is still a sound one against a team with any history knowledge at all.)

Andrew
yoda4554
Rikku
Posts: 254
Joined: Thu Aug 11, 2005 8:17 pm

Post by yoda4554 »

The basic idea is to try to draw a correspondence between the winning percentage of the opposing team and a player's expected personal score. People are using calculators to fit the actual data with a best-fit first-degree equation, where x is the winning percentage of the opposing team and y is the player's predicted score according to the line.

So, the y-intercept (the number without a variable next to it) is the expected score against a team that lost every game. As has been said, in most cases a player is going to get fewer points as the opposing teams' winning percentage increases, so therefore most players will have a negative slope. The more negative the slope is, the more your score drops as your competition's ability increases.

How negative that slope is seems to be a function of the people you're playing with as much as anything, and to a lesser extent, your depth in comparison to breadth. For example, a very good generalist with weaker teammates (e.g. Seth Kendall) will likely get lots of points against the weakest team, as he'd be by far the most knowledgeable player on most topics, and drop off as the competition gets harder, giving him a high y-intercept and a very negative slope. A player with very strong teammates will probably have a less negative slope, since he already has to face fairly difficult competition to get any tossups, even against a weak team. For that matter, a player with a lot of depth in one area and not much breadth will likely get tossups in that area against everyone and not much else, so his score won't change much even as the field gets stronger, giving him a slope close to zero.

As has been pointed out, though, even the best-fit line is pretty far from the actual data: there aren't many data points, and there can be significant variability across packets that affects player performance, so these equations aren't really reliable. If question difficulty is constant, a positive slope should rarely happen, because it indicates that you do better against better teams. Then again, as people are discussing, strategy changes significantly against teams of different levels of quality, so that might be expected to happen for some people. But overall, I think there are too many situation-specific variables and not enough data from any one tournament for any of this to mean that much.
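
To make Andrew's question concrete with a toy example (this line is invented, not any real player's fit): a player whose equation is y = -60x + 90 is predicted to score 90 against a winless team and 90 - 60 = 30 against an undefeated one. In Python:

# Reading one of these fitted lines; the slope and intercept are made up.
def predicted_score(x, slope=-60.0, intercept=90.0):
    # x is the opposing team's winning percentage, from 0.0 to 1.0.
    return slope * x + intercept

print(predicted_score(0.0))  # 90.0: the y-intercept, vs. a winless team
print(predicted_score(1.0))  # 30.0: vs. an undefeated team

So a slope of -60 doesn't by itself mean a terrible tournament; it means your predicted score falls by 60 points over the full range of opposition, which matters much more for someone with a low intercept than for someone with a high one.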
User avatar
recfreq
Wakka
Posts: 167
Joined: Mon Nov 07, 2005 10:11 pm
Location: Japan.

Post by recfreq »

(Well, I'm not a med student, but I'm in a neuroscience Ph.D. program, so I don't know if I'm totally qualified in saying that Henderson-Hasselbalch is basically just the definition of pKa for an acid, rearranged. I think I've heard the blood clue once, but I'm not sure it's really indicative of a good question, since you could apply H-H to any acid-base system, like buffers. On the other hand, the blood clue does come up.)

According to my analysis earlier, which is a bit oversimplified, I actually think that, taking bonus conversion into account, the more knowledgeable player should buzz in earlier, so as to maximize the opportunity to hear bonuses, which she should ace. The less knowledgeable player should wait until she is sure, b/c negging would imply that the better player gets to grab a bonus that she'd ace. Of course, this comes with all the assumptions we mentioned earlier, but I just felt that bonus conversion should not be neglected here. Of course, if you were playing in a TU-only tourney and you just cared about pts scored, then you'd behave in the opposite way.

BTW I hope the equations I used weren't confusing. Just to clarify, I wrote down the expected value for pts scored in buzzing on a question: you either neg, or you get it right. If you neg, the other team gets it at the end (as assumed), and you lose 5+10+BC2 (the other team's points count the same as points you lost). If you get it right, you gain 10+BC1. So the expectation gives

p(10+BC1) - (1-p)(5+10+BC2),

and we buzz when this is > 0, assuming TUs are independent trials and that you get it right with probability p. Of course, Mike's analysis is much more comprehensive, taking future expected reward into account, but this simple formulation can be a starting point for understanding the process involved.
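
As a quick numerical check of that expression (the bonus-conversion numbers here are assumptions, not measurements):

# Expected value of buzzing, per the expression above.
# BC1 is your bonus conversion, BC2 the other team's; both are assumed.
def expected_buzz_value(p, bc1=15.0, bc2=15.0):
    return p * (10 + bc1) - (1 - p) * (5 + 10 + bc2)

for p in (0.4, 0.5, 0.6):
    print(p, expected_buzz_value(p))

With BC1 = BC2 = 15, the expectation crosses zero at p = 30/55, about 0.55, so under these assumptions you'd only buzz when you're better than 55% sure of being right.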

Ray.
Ray Luo, UCLA.
User avatar
Matt Weiner
Sin
Posts: 8145
Joined: Fri Apr 11, 2003 8:34 pm
Location: Richmond, VA

Post by Matt Weiner »

Birdofredum Sawin wrote:Let's say I'm playing Seth Teitler. The first sentence of a tossup indicates that the answer is going to be a female figure from Aztec myth. I don't actually "know" the answer, but I do know a) that Seth knows many more myth clues than I do and b) that I can only think of two or three possible answers anyway. Clearly, the smartest thing for me to do is buzz right in and say "Coatlicue." By the same token, if Seth is playing me and the first sentence of a tossup indicates that the answer is going to be a 20th century American philosopher, he would be well-advised to buzz in and say something like "Quine," knowing that he's almost certainly not going to get the question otherwise.
Has anyone ever actually employed this strategy? Buzzing with "Quetzalcoatl" when it's, say, an NAQT Division II set that makes it clear an Aztec deity is being sought is more of a deduction based on the limitations of the difficulty level. At normal invitationals or above--perhaps even more so at poorly edited ones than well edited ones, since you never know what impossible questions might have been let in--you really can't predict that it won't be the long-awaited premiere of a Centeotl tossup. The strategy of just randomly guessing seems like it would rarely work, subjectively, and while I've heard many people talk about it, I can't say I've ever seen it done on a consistent basis rather than on one or two questions as a joke. The probability that there will be a poorly placed buzzer-race clue, or that your more knowledgeable opponent will have a moment and blank out after buzzing, seems greater than the probability of getting it right in this fashion and implies that one should wait.
ReinsteinD wrote:His novella The Day the Leader Was Killed recounts Anwar Sadat's assassination.
Isn't this the perfect example of a difficulty-based "guess" situation? Most tournaments' answer spaces contain one and only one writer who discusses Egyptian history in his books.
Matt Weiner
Advisor to Quizbowl at Virginia Commonwealth University / Founder of hsquizbowl.org
NotBhan
Rikku
Posts: 306
Joined: Tue Dec 16, 2003 12:30 pm
Location: Parts Unknown

Post by NotBhan »

Matt Weiner wrote:
Birdofredum Sawin wrote:Let's say I'm playing Seth Teitler. The first sentence of a tossup indicates that the answer is going to be a female figure from Aztec myth. I don't actually "know" the answer, but I do know a) that Seth knows many more myth clues than I do and b) that I can only think of two or three possible answers anyway. Clearly, the smartest thing for me to do is buzz right in and say "Coatlicue." By the same token, if Seth is playing me and the first sentence of a tossup indicates that the answer is going to be a 20th century American philosopher, he would be well-advised to buzz in and say something like "Quine," knowing that he's almost certainly not going to get the question otherwise.
Has anyone ever actually employed this strategy?
Sure. I know I've gambled like this on a few questions on myth or classical history when facing Seth Kendall, for instance, and I know other players (e.g. Freeburg) have gambled on science questions against me. I used that approach quite a bit on CS questions back when I knew nothing about CS. It pays off sometimes, doesn't pay off other times. But if I was far overmatched on some subject and the answer space seemed to be down to 3 or 4 options, I would take the gamble.

--Raj Dhuwalia
"Keep it civil, please." -- Matt Weiner, 6/7/05
Nathan
Lulu
Posts: 97
Joined: Tue Mar 02, 2004 11:42 am

Post by Nathan »

Matt: yup, people do use that strategy. actually, there are two different circumstances: one is the obvious one -- where you don't know the subject and your opponent does; the other is where you do know the subject but have been playing qb long enough to know that there's only a couple of possible answers (if that)....such as Andrew's Yamamoto example. another one would be any question that mentions the "Russian fleet" -- one should always immediately buzz with Tsushima Strait. of course, the Russian fleet was also at the Battle of Copenhagen -- probably by far the most important naval battle to not show up in qb as a tossup (at least, I've never heard one)....but, no one writes that tossup.
Locked