The Quizbowl Resource Center

Posted: **Wed Jan 27, 2010 1:32 pm**

naqt.com wrote:Last year, NAQT began soliciting community input on a revision of its S-value system for ranking teams that competed at its Sectional Championship Tournaments (SCTs) and Community College Sectional Championship Tournaments (CC SCTs). We've looked at a number of models, and are now presenting the one we think is most applicable for public comment.

Posted: **Wed Jan 27, 2010 2:05 pm**

Some quick comments:

1. We'd like to thank everyone who has commented on S-value reform, especially Dwight Wynne. Those who have been following the discussion will recognize that we used a number of elements Dwight proposed in this thread.

2. Ideally, we'd like to see comments and questions by tomorrow night so we can issue CCCT invitations as soon as possible. Obviously, if someone identifies a major flaw in this model, we will have to work out an alternate solution.

3. As the last note on the naqt.com page indicates, we haven't yet decided what to do about combined-field SCTs. The proposal on the table right now is a straightforward multiplier to each team's stats (improving them if a DII team played DI questions, reducing them if a DI team played DII questions). If anyone has a clever idea for how to resolve this with actual data, now would be a great time to propose something. (I'll warn everyone that actual data is very thin on the ground.)

4. This is a draft reformed S-value system for 2010 only. We believe that this system works well; it was chosen from the available options as a balanced combination of mathematical simplicity, public transparency, and (most important) a fair ranking of teams. However: it is entirely possible that some other system might do a better job of predicting ICT performance. To that end, we plan to collect a variety of proposed S-values and S-value-like-systems, including all of those discussed on this forum, and hold a competition between them in the summer of 2010. Objective: calculate a bunch of ranking systems, then compare them to the actual 2010 ICT and see which system actually does the best job of projecting ICT performance.
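One way the summer comparison described in point 4 could be scored is a rank correlation between each system's SCT-based values and actual ICT finish. The thread does not say which metric NAQT intends to use, so the choice of Spearman correlation, the function names, and the numbers below are all assumptions made purely for illustration.

```python
def ranks(values, descending):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(system_values, ict_finish):
    """system_values: higher is better. ict_finish: 1 is the champion."""
    return pearson(ranks(system_values, descending=True),
                   ranks(ict_finish, descending=False))


# Hypothetical values from two candidate systems for the same four teams.
ict_finish = [1, 2, 3, 4]
system_a = [300, 250, 200, 150]   # agrees perfectly with the ICT order
system_b = [250, 300, 150, 200]   # swaps two pairs of teams

score_a = spearman(system_a, ict_finish)
score_b = spearman(system_b, ict_finish)
```

Under this scoring, the system with the higher correlation would be the one that "projects" ICT better; autobid teams with no SCT data would simply be left out of both lists.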

Edit: coherence

Posted: **Wed Jan 27, 2010 2:06 pm**

5. Sometime this afternoon R. will be posting a spreadsheet to naqt.com applying the draft system above to some past SCT data. (We've done this in-house for the 2003-2009 SCTs, so we thought we should share the numbers.)

Posted: **Wed Jan 27, 2010 2:36 pm**

bt_green_warbler wrote:To that end, we plan to collect a variety of proposed S-values and S-value-like-systems, including all of those discussed on this forum, and hold a competition between them in the summer of 2010. Objective: calculate a bunch of ranking systems, then compare them to the actual 2010 ICT and see which system actually does the best job of projecting ICT performance.

How are you proposing to account for teams whose SCT lineup differs from their ICT lineup? And teams using autobids for whom there are no SCT data? These are going to represent a non-trivial portion of the ICT field.

Posted: **Wed Jan 27, 2010 2:39 pm**

Obviously toss out the autobids because there's no base to project from. Probably some kind of adjustment for roster change (depends a lot on the exact quality of the change).

We've got a decade's worth of SCT and ICT data to work with, so there should be plenty of teams with consistent rosters to be confident in the quality of whatever we project.

Posted: **Wed Jan 27, 2010 3:07 pm**

Re: combined fields.

If you want to restrict the data you use only to NAQT tournaments, then it is possible you could do the following:

1) Take all combined DI/DII fields over all past SCTs in which both A) a team qualified for ICT on a set of questions not from its division and B) at least one team from part A had all people responsible for non-negligible (defined as you see appropriate) amounts of scoring on the same team at both SCT and ICT.
2) Calculate the raw D-values of those teams (since the order-of-finish correction probably results in worse predictive value). Call this actD.
3) Calculate the raw D-values of each team that attended ICT that had all people responsible for non-negligible amounts of scoring on the same team at both SCT and ICT, other than those that qualified at combined fields. (Hopefully there are still around more than 5 or 6 in both DI and DII.)

To convert the D-values of DI teams that played DII SCTs:

4) Perform a regression with the DI teams' D-values from step 2 with respect to numerical order of finish. (Obviously, choose the most logical form of regression based on apparent shape of the data)
5) Find the expD = regression model's expected D-value of those DI teams that qualified at combined-field SCTs, given their final placing at ICT. If you're lucky, then there will be multiple such points.
6) Create adjD = expD/actD for each data point that you have.
7) Find AdjD = the mean adjD across all DI data points.
8) Find rawD for the combined-field teams, perform the order-of-finish correction, and then multiply that number by AdjD.

To convert the D-values of DII teams that played DI SCTs, just follow steps 4-8, but replace each instance of DI with DII.
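Steps 4-8 can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a straight-line fit (the post leaves the form of regression open), it reads step 4 as fitting on the consistent-roster reference teams from step 3, and every number and function name here is invented rather than taken from real SCT or ICT results.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def combined_field_multiplier(reference, combined):
    """
    reference: (ICT finish, raw D-value) pairs for ordinary-field teams.
    combined:  (ICT finish, actD) pairs for combined-field qualifiers.
    Returns AdjD, the mean of expD / actD over the combined-field teams.
    """
    slope, intercept = fit_line([finish for finish, _ in reference],
                                [d for _, d in reference])
    ratios = []
    for finish, act_d in combined:
        exp_d = slope * finish + intercept   # step 5: expected D at that finish
        ratios.append(exp_d / act_d)         # step 6: adjD for this team
    return sum(ratios) / len(ratios)         # step 7: AdjD


# Made-up data: four reference teams and two combined-field teams.
reference_teams = [(1, 400.0), (2, 380.0), (3, 360.0), (4, 340.0)]
combined_teams = [(2, 475.0), (4, 425.0)]

adj_d = combined_field_multiplier(reference_teams, combined_teams)

# Step 8: scale a (hypothetical) order-of-finish-corrected D-value.
corrected_d = 450.0
converted_d = corrected_d * adj_d
```

With only a handful of combined-field data points, the mean in step 7 will be very noisy, which is the practical obstacle raised later in the thread.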

Disclaimer: I have not actually looked to see if enough data points of the type required exist. Also, since I only spent an hour thinking about this, there may be flaws in the methodology itself. But, I hope it is a helpful suggestion.

Edit: The requirement A in step 1 was incorrect. It is now fixed.
Edit 2: On second look, both were incorrect.

Posted: **Wed Jan 27, 2010 3:23 pm**

I think this looks pretty good. I appreciate Dwight's proposal for adhering to some of the concepts I used in my initial S-Value revision proposal, while also ironing out the problems that my initial stat had. Particularly, the per-question calculations were a necessary adjustment.

As an aside, I strongly believe that NAQT should stop allowing mixed formats to be played on the DII set. It seems easier to eyeball good performance of a DII team on the DI set than the other way around. Also, knowledge scales down pretty well, but it doesn't always scale up, so you're in much more danger of inviting a DI team that looked good on DII questions but is overmatched at ICT than the other way around. Finally, it seems right to privilege the integrity of the DI field over that of the DII field. If teams must be invited to ICT based on translated stats, they should be DII teams.

Posted: **Wed Jan 27, 2010 3:24 pm**

evilmonkey wrote:Take all combined DI/DII fields over all past SCTs that A) had both a DI AND a DII team qualify for ICT and B) at least one qualifying team in each division had all people responsible for non-negligible (defined as you see appropriate) amounts scoring on the same team at both SCT and ICT.

...

Disclaimer: I have not actually looked to see if enough data points of the type required exist.

I did something roughly similar to this last week and ran into a wall because there's not enough data.

In the last five years, exactly three teams have attended ICT with the same roster that they qualified for from a combined field: 2007 Stanford DII, 2005 LSU and Rice.

That seemed like not enough stats to actually compute anything useful.

Posted: **Wed Jan 27, 2010 3:28 pm**

theMoMA wrote:It seems easier to eyeball good performance of a DII team on the DI set than the other way around.

This seems intuitively correct to me; I'm not, however, sure that this effect is worth the obvious downside. That is: that the DI set may do a worse job of ranking averageish DII teams. (Consider the case of a site with six DII teams and one DI team.)

Edit: note that the existing policy minimizes the number of teams that play the wrong set, and that the usual case for combined fields is a small number of DI teams outnumbered by DII teams.

Posted: **Wed Jan 27, 2010 3:31 pm**

I don't believe that accurately ranking the D2 field should be as big a concern as accurately ranking the D1 field. Let's be honest here, one field doesn't have anywhere near as many implications for true national championships as another.

Posted: **Wed Jan 27, 2010 3:39 pm**

Jeremy Gibbs Free Energy wrote:I don't believe that accurately ranking the D2 field should be as big a concern as accurately ranking the D1 field.

Certainly if we had to choose just one of them, we would choose DI. The question at stake here is: how many DII teams should we be willing to rank less accurately to ensure how much extra precision in the DI rankings?

Posted: **Wed Jan 27, 2010 3:49 pm**

Why not have a system where any DII teams in a combined field either take a written team test, or play one DII packet against empty chairs? It wouldn't have to be part of the formula, but you could use it to eyeball rough translations between DII skill and performance on the DI set, and also to correct any terrible outliers.

Posted: **Wed Jan 27, 2010 3:53 pm**

bt_green_warbler wrote:
Jeremy Gibbs Free Energy wrote:I don't believe that accurately ranking the D2 field should be as big a concern as accurately ranking the D1 field.
Certainly if we had to choose just one of them, we would choose DI. The question at stake here is: how many DII teams should we be willing to rank less accurately to ensure how much extra precision in the DI rankings?

Jeff, I think you're missing a key point here, which is that the vast majority of D2 teams that just miss out due to poorer accuracy are composed of younger players and retain D2 eligibility - that is, assuming any reasonable improvement, they should have a relatively good chance of playing D2 ICT next year. On the other hand, many players in D1 don't have a "next year" - they're grad students or seniors that will be graduating at the end of the year.

Posted: **Wed Jan 27, 2010 3:57 pm**

That's an excellent point, Dwight. I'll run it past R.

Posted: **Wed Jan 27, 2010 4:01 pm**

bt_green_warbler wrote:
evilmonkey wrote:Take all combined DI/DII fields over all past SCTs that A) had both a DI AND a DII team qualify for ICT and B) at least one qualifying team in each division had all people responsible for non-negligible (defined as you see appropriate) amounts scoring on the same team at both SCT and ICT.

...

Disclaimer: I have not actually looked to see if enough data points of the type required exist.
I did something roughly similar to this last week and ran into a wall because there's not enough data.

In the last five years, exactly three teams have attended ICT with the same roster that they qualified for from a combined field: 2007 Stanford DII, 2005 LSU and Rice.

That seemed like not enough stats to actually compute anything useful.

Well, running the numbers might give you a good ballpark figure for the multiplier, at least.

Also, perhaps use non-NAQT tournaments to fill in for the lack of SCT data? You could do the following:
Let's say there is a team that attended a combined-field SCT and attended ICT with the same lineup it used at some tournament x.
Calculate the D-values for all "teams that played x with roughly the same lineup as they played SCT and ICT" using stats from x.
Find the relationship between SCT D-values and Tournament x D-values.
Find the relationship between Tournament x D-values and ICT order of finish.

Perhaps this would give more data points? I would suggest past ACF Regionals or Nationals as likely "Tournament X's".

Obviously, this also would require some digging to ascertain - perhaps I'll try this later tonight.

Posted: **Wed Jan 27, 2010 4:07 pm**

bt_green_warbler wrote:This seems intuitively correct to me; I'm not, however, sure that this effect is worth the obvious downside. That is: that the DI set may do a worse job of ranking averageish DII teams. (Consider the case of a site with six DII teams and one DI team.)

Do averageish DII teams qualify to DII ICT? Do the average DII teams from a field from some region new enough to quizbowl or experiencing enough of a down year to require a combined field qualify for DII ICT? If not, then mis-ranking them is little sin.

Also note that the amount of scaling down and scaling up is much larger when we have DI teams play the DII SCT. In the base case, teams playing a set of level 3 qualify for a set of level 5, and teams playing a set of level 6 qualify for one of level 7. In the "mixed field plays DII SCT" case, teams playing a set of level 3 qualify for a set of levels 5 and 7. In the "mixed field plays DI SCT" case, teams playing a set of level 6 qualify for a set of levels 5 and 7. I think the hazard of misranking the DI teams who will be able to compete at ICT is therefore much greater.

Posted: **Wed Jan 27, 2010 4:17 pm**

evilmonkey wrote:Also, perhaps use non-NAQT tournaments to fill in for lack of SCT data?

Something like this is an excellent idea. You wouldn't actually need combined fields at all, since all we're trying to do is quantify how much harder DI SCT is than DII SCT. Therefore, you would compare a bunch of teams that played SCT in either division to a third tournament that attracted teams from both divisions. (ACF Fall? EFT?) Granted, you'd still have to factor out an unknown quantity for the different distribution...

Posted: **Wed Jan 27, 2010 4:21 pm**

Crazy Andy Watkins wrote:Do averageish DII teams qualify to DII ICT? Do the average DII teams from a field from some region new enough to quizbowl or experiencing enough of a down year to require a combined field qualify for DII ICT? If not, then mis-ranking them is little sin.

Also note that the amount of scaling down and scaling up is much larger when we have DI teams play the DII SCT. In the base case, teams playing a set of level 3 qualify for a set of level 5, and teams playing a set of level 6 qualify for one of level 7. In the "mixed field plays DII SCT" case, teams playing a set of level 3 qualify for a set of levels 5 and 7. In the "mixed field plays DI SCT" case, teams playing a set of level 6 qualify for a set of levels 5 and 7. I think the hazard of misranking the DI teams who will be able to compete at ICT is therefore much greater.

Whether combined-field teams end up on the waitlist bubble probably depends on the actual strength of the Division II field, and I'm not thrilled with trying to predict that into the future (especially for variable numbers of new quizbowl programs).

We don't assume that our difficulty codes scale linearly with actual question difficulty; if we thought they did, we would just use them directly to compute this multiplier. But it's probably true that they don't; that is, the DI SCT --> DI ICT transition might be objectively harder than DII SCT --> DI SCT (there just isn't another set in between like HSNCT to make us use a different number).

Posted: **Wed Jan 27, 2010 9:24 pm**

I've got a question on the order-of-finish calculation: Team A 9-3, 13ppt, 290 D. Team B 9-3, 14ppt, 280 D. Teams are tied but not for a trophy/title, and so in official standings, Team B is ranked higher than Team A (because of PPT). Would that necessitate the OOFA?

I had some more complicated scenarios, but in all likelihood, they probably reduce to how that discrepancy is settled. I would suggest that to avoid any ambiguity and/or strange behavior, in such a case rank tied teams by D-value rather than PPT before applying the OOFA. Also, to clear up, teams are always ranked on all standard (i.e. scheduled, non-tiebreaker) games, like they are at ICT, correct?

The one thing I worry about with such a system is that it might cause bad behavior in the case of fields that are split for playoffs, where one team is crushed by the higher bracket and the other destroys the lower bracket. Most likely, the SOS adjustment fixes that, but I'm not certain.

Next, for the SOS calculation, "the field as a whole" means nationwide/worldwide, correct?

Posted: **Thu Jan 28, 2010 1:16 am**

jonpin wrote:Next, for the SOS calculation, "the field as a whole" means nationwide/worldwide, correct?

Correct.

jonpin wrote:I've got a question on the order-of-finish calculation: Team A 9-3, 13ppt, 290 D. Team B 9-3, 14ppt, 280 D. Teams are tied but not for a trophy/title, and so in official standings, Team B is ranked higher than Team A (because of PPT). Would that necessitate the OOFA?

I had some more complicated scenarios, but in all likelihood, they probably reduce to how that discrepancy is settled. I would suggest that to avoid any ambiguity and/or strange behavior, in such a case rank tied teams by D-value rather than PPT before applying the OOFA. Also, to clear up, teams are always ranked on all standard (i.e. scheduled, non-tiebreaker) games, like they are at ICT, correct?

I believe that (as you suggest) the D-value itself is used to rank teams with identical records for non-trophy/title games. I'll ask R. to clarify this in the final version.

jonpin wrote:The one thing I worry about with such a system is that it might cause bad behavior in the case of fields that are split for playoffs, where one team is crushed by the higher bracket and the other destroys the lower bracket. Most likely, the SOS adjustment fixes that, but I'm not certain.

The SOS adjustment is intended to correct for this sort of thing. Did you have a specific kind of "bad behavior" in mind?

Posted: **Thu Jan 28, 2010 1:35 am**

R. confirms that teams with identical records will be ranked by D-value (rather than by applying other paper tiebreakers).
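As a small illustration of that rule, here is jonpin's two-team example sorted by record first and D-value second. The `Team` structure and `standings` function are hypothetical names for this sketch, and it ignores games played for a trophy or title, which the discussion above treats separately.

```python
from typing import NamedTuple


class Team(NamedTuple):
    name: str
    wins: int
    losses: int
    ppt: float       # kept for reference; deliberately unused in the sort
    d_value: float


def standings(teams):
    """Order by record (more wins, fewer losses), then by higher D-value."""
    return sorted(teams, key=lambda t: (-t.wins, t.losses, -t.d_value))


teams = [
    Team("Team B", wins=9, losses=3, ppt=14, d_value=280),
    Team("Team A", wins=9, losses=3, ppt=13, d_value=290),
]

ordered = [t.name for t in standings(teams)]
```

Because the two teams are ranked by D-value rather than PPT, Team A comes out ahead of Team B, so their D-values are already in order of finish.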

jonpin wrote:Also, to clear up, teams are always ranked on all standard (i.e. scheduled, non-tiebreaker) games, like they are at ICT, correct?

Unlike the ICT situation, we will use data from tiebreaker games in the D-value model.

Posted: **Thu Jan 28, 2010 2:07 am**

Tossup points per tossup heard. This is found for a team by computing each game's tossup points per tossup heard and averaging those values.

Is there a particular reason why this is not "Total tossup points / Total tossups heard"? In timed rounds, this can be different:

Game 1: 300 points on 10 tossups heard. PPTUH = 30
Game 2: 300 points on 20 tossups heard. PPTUH = 15
Current NAQT PPTUH: (30+15)/2 = 22.5
Sum Points / Sum Heard = 600 / 30 = 20
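The two calculations in the example above can be written out directly. The function names are made up for this sketch; the input is the same pair of games, given as (tossup points, tossups heard).

```python
def per_game_average(games):
    """Average of each game's points-per-tossup-heard (the draft's method)."""
    return sum(points / heard for points, heard in games) / len(games)


def pooled_ratio(games):
    """Total tossup points divided by total tossups heard."""
    total_points = sum(points for points, _ in games)
    total_heard = sum(heard for _, heard in games)
    return total_points / total_heard


games = [(300, 10), (300, 20)]

draft_value = per_game_average(games)   # (30 + 15) / 2 = 22.5
pooled_value = pooled_ratio(games)      # 600 / 30 = 20.0
```

The pooled ratio gives the 20-tossup game twice the weight of the 10-tossup game, while the per-game average weights every game equally regardless of how many tossups were read.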

Posted: **Thu Jan 28, 2010 2:23 am**

dschafer wrote:Is there a particular reason why this is not "Total tossup points / Total tossups heard"?

Precisely for the reason you mention. The other option weights the points per tossup heard by round length, which is equivalent to weighting by game speed. That doesn't really seem justified. This part of the D-value is a stand-in for the team's overall ability to score on tossups, and having a slow moderator does not make a team better at buzzing.

Edit: A slow moderator might indeed make a team better at scoring on tossups, but that's not the point here.

Posted: **Thu Jan 28, 2010 2:26 am**

Exactly what Avram just said. An earlier draft did use total tossup points/total tossups heard, and R. pointed out to me that this was generating more variation than we wanted (the final value was fluctuating depending on whether or not a team's games against its strongest opponents also featured the fastest moderators).

Posted: **Fri Jan 29, 2010 5:41 pm**

In the absence of major changes identified in this discussion, we've gone ahead and produced CCCT invitations using this system.

Posted: **Fri Jan 29, 2010 7:09 pm**

Commentary on the first actual application of the D-value
The first fourteen bids (plus the automatic bid to Lamar-Orange) cannot be questioned. Looking at the teams with values 15th-26th, only nine of them can qualify. No team below this region, with the possible exception of Coffeyville (raw score of 136), has a solid argument that it deserved a bid.

The top Alabama block (four teams adjusted to 161) illustrates one potential concern for the D-value. After the preliminaries*, Gadsden did not play any further games, and their performance caused the D-values of teams above them to be raised. However, all of those teams likely deserved a bid anyway, and in a 4-year SCT, Gadsden would in fact have played later games, so this is not a big problem. Put all four of those teams in the field. Also put NE Alabama B in. This leaves us with 7 teams for 4 spots.
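This thread never spells out the arithmetic behind a block of teams sharing one adjusted value. One reading that would produce such blocks is a pooling rule: whenever a lower-finishing team has a higher raw value than the team (or block) above it, the two are merged and replaced by their average. The sketch below implements that assumed rule with invented numbers; it is not taken from NAQT's published description and does not use the Alabama data.

```python
def order_of_finish_adjust(raw_in_finish_order):
    """
    raw_in_finish_order: raw values listed from first place to last.
    Returns adjusted values that never increase going down the standings,
    obtained by averaging any run of teams that is out of order.
    """
    blocks = []  # each block is [sum of raw values, number of teams]
    for value in raw_in_finish_order:
        blocks.append([value, 1])
        # Merge upward while the newest block's mean exceeds the one above it.
        while (len(blocks) > 1
               and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            block_sum, block_count = blocks.pop()
            blocks[-1][0] += block_sum
            blocks[-1][1] += block_count
    adjusted = []
    for block_sum, block_count in blocks:
        adjusted.extend([block_sum / block_count] * block_count)
    return adjusted


# Hypothetical site: the runner-up out-scored the winner statistically.
adjusted = order_of_finish_adjust([200, 220, 180])
```

Under this reading, a strong raw value from a lower-finishing team pulls up the adjusted values of the teams directly above it, which is the effect described for the Alabama block.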

The ensuing situation is, as was warned, the classic S-value dilemma. Gulf Coast had better statistics than Lake Sumter, North Florida, or South Georgia, but worse stats than Manatee. Meanwhile, Lake Sumter was not just above Manatee, but in a higher playoff bracket.# While the new rules mathematically prevent any team with a lower record advancing before a team with a higher record at the same site, the previous rules permitted such lapping, but not when the passing team was in a lower bracket than the passed team. If Manatee is to qualify (and they had a higher raw D-value than any of the Alabama teams previously under consideration), Lake Sumter must go in, leaving only two more spots available.

SC-M put up much better stats than NFla A and SGeo against a similar schedule, but it would be hard under the old system, and impossible under the new one, to leave those two out: each finished ahead of SC-M, though only on the basis of a ten-point win (South Georgia's even appears to have come in overtime).

One potential concern I have is that if Pasco's raw D-value fell from 138 to 128, it would lower the adjusted D-values of several teams above them. I'm not quite sure how to fix this.

Gulf Coast had a better raw value than Sumter, NFla A and SGeo, as well as the just-missed Pasco A, but they lost twice to a statistically weaker team and that appears to be what put them out.

I do wonder: Under the old S-value, where passing was possible, would Gulf Coast have qualified at North Florida or South Georgia's expense, while Manatee would still have qualified? Would Pasco be the last team in?
By my figuring, there would be a handful of teams on the bubble with two spots left: NFla A and SGeo (with SC-M behind them already in) and Pasco (also behind those teams); Pensacola and Gulf Coast (the latter with better stats); Cowley and Coffeyville (the latter with better stats, but also behind another weaker team).

*-The Alabama format appears to be a 2x7 RR where you play a team from the opposite pool on your bye, followed by the top four from each group crossing over. The bottom three teams from each group did not compete in playoffs.

#-The South Florida format appears to have been a 2x8 RR followed by a split into 1st-3rd (crossover), 4th-6th (crossover), 7th-8th (full RR) for three more games. Thus 1-6 in the final standings were in the top playoff group, with 7-12 in the next group.

Posted: **Fri Jan 29, 2010 10:36 pm**

Jon's up to something cool here. I think it might be illuminating to run the same stats in the old S-value program and compare the results.

Posted: **Sat Jan 30, 2010 2:05 am**

I'm only an observer, so maybe my opinion doesn't count, but I don't really like the ironclad order-of-finish requirement for inviting teams. Even if the objectively (statistically) second-best team wins a fluke final match against the best team, the preponderance of the evidence still suggests that the statistically top team will perform better at the next-level tournament (ICT or whatever) than the other team.

Consider the hypothetical situation where the statistically best team is noticeably better than the second in PPTH, PPB, etc, but lost a fluke match to the third-place team by one question. The top two teams have one loss each because the second-best team lost by 150 to the best team. Thus, there is no advantaged final to protect the best team, which loses by one question to the statistically second-best team.

I don't necessarily have an exact idea for implementation, but I would wish for a way to make the S- or D-value reflect a balance between the order-of-finish and the objective statistics in cases where those two measures conflict.

Posted: **Sat Jan 30, 2010 2:12 am**

bt_green_warbler wrote:5. Sometime this afternoon R. will be posting a spreadsheet to naqt.com applying the draft system above to some past SCT data. (We've done this in-house for the 2003-2009 SCTs, so we thought we should share the numbers.)

Did this happen?

Posted: **Sat Jan 30, 2010 2:18 am**

Frater Taciturnus wrote:
bt_green_warbler wrote:5. Sometime this afternoon R. will be posting a spreadsheet to naqt.com applying the draft system above to some past SCT data. (We've done this in-house for the 2003-2009 SCTs, so we thought we should share the numbers.)
Did this happen?

Not yet. I'd post the draft I sent to R., but it doesn't contain the game-by-game adjustment described above in this thread.

Posted: **Sat Jan 30, 2010 2:24 am**

Sun Devil Student wrote:I'm only an observer, so maybe my opinion doesn't count, but I don't really like the ironclad order-of-finish requirement for inviting teams. Even if the objectively (statistically) second-best team wins a fluke final match against the best team, the preponderance of the evidence still suggests that the statistically top team will perform better at the next-level tournament (ICT or whatever) than the other team.

This was a difficult decision for us (and one on which there was no clear consensus among the community; note this page). There are definitely good arguments on both sides of this issue.

Posted: **Mon Feb 01, 2010 1:15 pm**

Frater Taciturnus wrote:
bt_green_warbler wrote:5. Sometime this afternoon R. will be posting a spreadsheet to naqt.com applying the draft system above to some past SCT data. (We've done this in-house for the 2003-2009 SCTs, so we thought we should share the numbers.)
Did this happen?

D-values for the 2009 SCT are now available on naqt.com. Sorry for the delay.

Comment on the draft 2010 S-value here
