Shcool wrote:Philosophically, I wonder if this is the best way to go.
...The fact that you are talking about a tiebreaker somewhat alleviates this...
Are you talking about 35 PPG over the course of an entire tournament that is a true round robin, or one that has several divisions of teams? If it is the former kind of tournament, then say so explicitly; if it is the latter, then your argument holds little weight, since common opponents must be factored into any metric attempting to break a tie between two teams with identical records. A 35 PPG differential means little, if anything, when the only common opponent between Team A and Team B is the other one. Suppose Team A played some of its six divisional games against middle school teams while Team B played games against Dorman B, Charter C, and RM D, among others. An extra 35 PPG for Team A means very little if it finishes with the same record as Team B but lost to them head-to-head. If I am missing some piece of your argument, please clarify your post.
To continue with sports analogies, there is overwhelming evidence that the Patriots were the best team in the NFL last year. However, that does not mean that they should be considered the NFL Champions. Titles and playoff berths go to teams that earn them through criteria decided ahead of time, not to teams that prove themselves the greatest statistically.
I think that it is actually useful to determine how predictive any given stat is of the outcome of any given game (in order to better quantify "upsets," for instance). However, seeing what you are actually trying to do now, this seems to be a project reserved for a later time.

Captain Scipio wrote:Dwight: I think you're misunderstanding the nature of what I'm proposing to do here. We don't want to compare W-L because that isn't a tiebreaker; only W-L against the same team is. We can easily determine how predictive, for example, PPG differential is of the outcome of any game, but that isn't very useful because we can't make the same comparison to head-to-head except in the case of a repeat matchup, which means we can't isolate the other factors (so no direct comparison can be made). Only in the case of a repeat matchup can we isolate all the factors. Also, the fact that the tiebreakers are correlated isn't important; the proposed measurement measures only the differences between them.
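For concreteness, the repeat-matchup measurement described above can be sketched in a few lines of Python. Everything here is invented for illustration (the `rematches` data, the field names, the stat key); the idea is just: for each pair of games between the same two teams, check whether the team that led in a given stat going into the rematch won it.

```python
# Hypothetical sketch of the repeat-matchup comparison: for each rematch,
# did the team ahead in `stat` beforehand win the second game?
def predictiveness(rematches, stat):
    """Fraction of rematches won by the team leading in `stat` beforehand."""
    hits = total = 0
    for m in rematches:
        a_stat, b_stat = m[stat]           # (Team A value, Team B value)
        if a_stat == b_stat:
            continue                       # stat breaks no tie; skip this game
        predicted = "A" if a_stat > b_stat else "B"
        total += 1
        hits += (predicted == m["rematch_winner"])
    return hits / total if total else None

# Toy data: two repeat matchups with pre-rematch PPG differentials.
rematches = [
    {"ppg_diff": (75, 100), "rematch_winner": "B"},
    {"ppg_diff": (120, 40), "rematch_winner": "A"},
]
print(predictiveness(rematches, "ppg_diff"))  # 1.0 on this toy data
```

Running the same function with different stat keys over the same set of rematches is what would let two tiebreakers be compared head-to-head, as proposed.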
Captain Scipio wrote:I'll add the caveat that I'm currently confused about one thing in these data: how can a team lead in points per game but not in point differential if they've played the same number of games? Perhaps I've misunderstood what Dwight meant by point differential; I took it to mean the difference in total points scored. Dwight, please let me know what's up; I can update this easily to reflect whatever changes.
This is exactly what I meant, and exactly what I think that statistic means (which is why it would be useful as a tiebreaker).

hwhite wrote:If Team A has 350 PPG and 275 PPGA while Team B has 300 PPG and 200 PPGA, then Team A has higher PPG while Team B has higher point differential. Usually that means that Team B is better at answering tossups (hence less chance for the opponent to score) but worse at bonuses (hence lower PPG).
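hwhite's arithmetic checks out, and it resolves the confusion above: point differential here means points scored minus points allowed, not total points scored. A throwaway sketch, using only the numbers from the quote:

```python
# hwhite's example: higher PPG does not imply higher point differential,
# because differential is points scored minus points allowed (PPG - PPGA).
team_a = {"ppg": 350, "ppga": 275}
team_b = {"ppg": 300, "ppga": 200}

diff_a = team_a["ppg"] - team_a["ppga"]   # 75
diff_b = team_b["ppg"] - team_b["ppga"]   # 100

print(diff_a, diff_b)  # 75 100: A leads in PPG, B leads in differential
```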
cvdwightw wrote:Harry, can you elaborate about the margin of error? I think Mike is saying exactly that when he claims that no statistic significantly outperforms any other, though he hasn't quantified that significance/error.
Considering that this is just from one small, isolated circuit that doesn't run a lot of tournaments (as compared to, say, the Midwest), we should be able to find (hopefully) a near-equivalent amount of data from the Midwest, Northeast, Mid-Atlantic, and Southeast circuits. Plus, there's an entire high school circuit, if we can find small enough tournaments that run double RR or single RR + playoff brackets. I'd say it's feasible to get a sample size of ~200-250 games if we work at it and include anything between teams of the same record (not ideal, but hey, it's the best we can do if we're looking at 250 games).

Schweizerkas wrote:Since error scales like 1/sqrt(n), this means we might need 6 times more data than we currently have. Whether that's feasible or not I don't know.
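As a rough illustration of the 1/sqrt(n) scaling Schweizerkas mentions (the sample sizes and proportion below are hypothetical, not from the thread's data):

```python
import math

# The standard error on an observed win proportion p over n games is
# roughly sqrt(p*(1-p)/n), i.e. it shrinks like 1/sqrt(n).
def margin(p, n):
    """Approximate one-standard-error margin for a proportion."""
    return math.sqrt(p * (1 - p) / n)

n_now = 40                        # hypothetical current sample size
print(margin(0.6, n_now))         # error at n = 40
print(margin(0.6, 6 * n_now))     # 6x the data only shrinks it by sqrt(6) ~ 2.45
```

This is why collecting "6 times more data" buys less precision than it might sound like: halving the error requires quadrupling the sample.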
What does this even mean? All the proposed tiebreakers and combinations of tiebreakers hold the following: it is better to win a game than not, it is better to answer tossups than not, and it is better to answer bonus parts than not. We're using West Coast data because I know where those stats are and no one else has volunteered data.

theMoMA wrote:Moreover, they have the benefit of being both intuitive and positive. It makes a lot of sense that the better team will score more points against common opponents, or score more points per bonus on a differing schedule. Furthermore, it's a positive tiebreaker: you start from zero and go up, there is a goalpost out there, and once you pass it and another team doesn't, you win the tiebreaker. Which is more appealing: that a team should strive to score as many total points and as many points per bonus as possible, or that a team should hope that its margin of victory in one game (or some amalgamation of all of the proposed tiebreakers that historically boosts correlation by X%) was good enough that results from 1994 Wahoo Wars combined with data from Tartan Tussle XX will indicate that it has a 2.5% better chance of winning a follow-up game?
theMoMA wrote:I don't think we can work with the assumption that data trends in past quizbowl matches necessarily predict the results of future matches. I'm unconvinced that the body of quizbowl match results as a whole to this point represents the expected outcomes of matches to come, and I certainly reject outright the idea that a data set that mixes non-common and common-opponent schedules, is heavily skewed toward sketchily edited West Coast sets, TRASH regionals, and IS-set tournaments, and has a whole bevy of other problems has any useful extrapolative value whatsoever. These data stem from activities whose commonality barely extends past the use of questions and buzzers. Who says that stirring up all of these (or any other concoction) yields something that will be predictive for future quizbowl as a whole, or, more importantly, any individual tournament?
theMoMA wrote:I hold that there is a hefty burden that resides with those who advocate using past data to remake the tiebreaker system, and that that burden is to show that there is a predictive relationship between what has happened in the past and what will happen in the future. Unless someone can show that one tiebreaker stands above the rest regardless of the type of questions, level of competition, a team's slate of opponents, and a plethora of other variables, I don't think we can safely use this kind of data at all.
Andrew, unless I'm horribly mischaracterizing your argument, you appear to be stating that we cannot use the data that we have because it is not at all useful. Do you agree with the following method:

theMoMA wrote:I am not opposed to looking at data, I am opposed to assuming that the data are useful in describing the situation at hand, which I find a hefty precondition to looking at the data.
theMoMA wrote:You have not addressed my concerns, and your statement about error bounds reflects a fundamental misunderstanding of what I'm saying. Your error bounds are useless outside of the data themselves. You've yet to show that these data have any value outside of themselves (i.e., some kind of extraordinary power to predict future action), and until you do so, I will continue to reject what you're doing. I do hold that your data are useless, just as golf ball trajectory data are useless in determining who should win quizbowl tiebreakers. Until you show that the data are applicable to the situation at hand, I hold that we have no reason to assume that the data are valuable. When Dwight says "I argue that [feeding a bunch of data from past tournaments into a machine and coming up with a statistical tiebreaker] is independent of question quality and independent of strength of schedule," why on earth should we take him at face value? This is the major contention in using past data; you can't simply argue it away by putting "I argue" in front of an opinion.
Moreover, why would the burden be on me to get you data "to my liking"? I am the one making objections here; either find a way to counter them, find new data, or abandon your argument. Don't tell me that I have to counter my own argument for you. And stop mischaracterizing my argument. I am not opposed to looking at data, I am opposed to assuming that the data are useful in describing the situation at hand, which I find a hefty precondition to looking at the data.
I merely offer PPG and PPB as reasonable, intuitive, and positive. I am by no means saying that these are the only reasonable, intuitive, and positive tiebreakers that exist. The fact that some people see head-to-head as a legitimate tiebreaker doesn't do anything to my argument; those people can show up and convincingly justify their beliefs as such, which would only show that there can be more than one legitimate tiebreaker. Or they can be wrong. Neither of these possibilities undermines what I'm saying. I see no reason to accept the "other people believe differently and appeal to some of the same things you do, abandon your argument" argument.
It may very well be that the current mode of tiebreaking is an untested dogma, but you've got a responsibility to show that your test is actually the correct one. You haven't done anything to shift the burden back to me. Show that your data are meaningful, or be forced to submit to bottom-up instead of top-down tiebreakers.
Captain Scipio wrote:How hard would it be to make splits for, like, record or tournament type using your script?
T-test is used for sample means. Z-test is used for sample proportions. We're comparing proportions, not means. Really, the only criticisms that you can make are:

cdcarter wrote:Dwight, Not to be a stats nitpicker, but you probably should be using a t-test here.
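For reference, the kind of two-proportion z-test Dwight is describing can be sketched as follows. The counts are invented for illustration: imagine one tiebreaker correctly predicting 28 of 40 rematches and another predicting 24 of 40.

```python
import math

# Two-proportion z-test: are two success proportions significantly different?
# Here, the "successes" would be correct rematch predictions by each tiebreaker.
def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: the two underlying proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: stat 1 right in 28/40 rematches, stat 2 in 24/40.
z = two_prop_z(28, 40, 24, 40)
print(round(z, 3))  # 0.938: well under 1.96, so not significant at the 5% level
```

On numbers like these, no tiebreaker "significantly outperforms" another, which is consistent with the claim earlier in the thread; a much larger sample would be needed before a difference of this size cleared the significance bar.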