The D-value SOS calculation is broken

ryanrosenberg
Auron
Posts: 1890
Joined: Thu May 05, 2011 5:48 pm
Location: Palo Alto, California

The D-value SOS calculation is broken

Post by ryanrosenberg »

Last Saturday, DePaul played at the Missouri SCT site, a combined D1/D2 field using the D2 set. Of DePaul's opponents, only WUSTL A and C averaged 20 PPB on the D2 set, and over half came in under 15 PPB.

And what did this leave DePaul with? The second-highest strength of schedule (SOS), not in the Missouri field, but of every team that played SCT. This is due to a confluence of three serious flaws with the SOS calculation, which I'll lay out below.

To review, SOS is calculated as tossup points per tossup heard (TUPPTH) of the teams you played in their other games, divided by the tossup points per tossup heard over all SCT sites.
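Spelled out as code (my own reading of the calculation; NAQT doesn't publish pseudocode, so whether opponents' other games are averaged per team or pooled is an assumption on my part):

def sos(opp_tuppth_other_games, all_sct_tuppth):
    """Strength of schedule: the average of your opponents' tossup
    points per tossup heard in their other games, divided by the
    tossup points per tossup heard across all SCT sites."""
    avg_opp = sum(opp_tuppth_other_games) / len(opp_tuppth_other_games)
    return avg_opp / all_sct_tuppth

# Example: three opponents with other-game TUPPTH of 7.7, 3.6, and 3.6,
# in a season where the all-sites TUPPTH is 5.0:
print(sos([7.7, 3.6, 3.6], 5.0))  # ~0.99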

1. Tossup points per tossup heard is a heavily field-dependent measure of team strength. The current SOS calculation cannot differentiate reasonably strong teams in good fields from weak teams in weak fields. Looking at the current D-values list, Chicago D (15.69 PPB) has a worse TUPPTH than Ohio State C, Vanderbilt A, and Colorado A (10.61, 10.00, and 8.41 PPB, respectively). So playing Chicago D, a respectable opponent by any standard, lowers the SOS of Chicago D's opponents more than if they had hypothetically been able to play OSU C or Vandy A. That double-penalizes teams at the Northwestern SCT -- not only do they have to face a fairly strong team as their weakest opponent, but their SOS takes a huge hit for it.

2. The effect of point 1 is exacerbated in round-robin scheduling. Let's take the example of two four-team round robin tournaments. The first has four strong teams in the 17-19 PPB range. The second has two of those teams, and then two teams in the 10-12 PPB range. In the first tournament, each team will get roughly half the tossups per round, so each team's opponent's TUPPTH will be ~5.00 (assuming as many powers as negs). In the second tournament, the two strong teams will get about 80% of the tossups against the two weak teams and 50% against each other. The weak teams will get 20% against strong teams and 50% against each other. So a strong team will have played one strong team (~7.72 TUPPTH in other games) and two weak teams (~3.64 TUPPTH in other games), for a SOS of ~5.00. So a strong team gets the benefit of beating up weak teams without any hit to SOS! This effect would be further exacerbated by a final between the two strong teams, which will boost their SOS without dramatically reducing their TUPPTH.

More generally, in a round robin, the tossup points per tossup heard scored by your opponents in their other games comes out to roughly {(10 x Tossup Conversion Rate) + (5 x Power Rate) - (5 x Neg Rate)}. This measure doesn't vary much from site to site, since even in games between relatively weak teams, almost all tossups are still converted, and power rates aren't significantly lower than in games between good teams.
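A minimal sketch of that expression with made-up rates (not real SCT data), just to show how little it moves between a strong field and a weak one:

def opp_tossup_points_per_tossup(conversion_rate, power_rate, neg_rate):
    """Tossup points per tossup read in a round robin: 10 points per
    converted tossup, plus 5 for each power, minus 5 for each neg."""
    return 10 * conversion_rate + 5 * power_rate - 5 * neg_rate

# Hypothetical rates for a strong field and a weak one:
print(opp_tossup_points_per_tossup(0.95, 0.20, 0.05))  # 10.25
print(opp_tossup_points_per_tossup(0.90, 0.15, 0.05))  # 9.5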

3. There is no D2 conversion for opponents' TUPPTH in the SOS calculation. Following from the last sentence of point 2, the tossup conversion rate in combined fields is artificially raised, since a D1 team's opponents are being measured on their ability to convert D2 tossups rather than D1 tossups. It seems fairly clear that if you had forced the Missouri site to play on D1 questions, many more tossups would have gone dead, and the SOS of all teams would be much lower. However, teams at all non-combined sites are being judged on their ability to convert D1 questions, so comparing those two measures seems illogical.


How should NAQT fix the strength of schedule calculation for future years? Use points per tossup heard rather than tossup points per tossup heard, which will incorporate a measure of team strength that doesn't depend on the opposition (bonus conversion) and adjust for the strength of good teams in very competitive fields. Additionally, NAQT should apply a D2 conversion factor to the SOS of combined fields to avoid comparing field strengths on two very different sets.
Ryan Rosenberg
North Carolina '16
ACF
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: The D-value SOS calculation is broken

Post by theMoMA »

It might be useful to share a bit of data from our statistical survey of D values and ICT performance. Using the new D value calculation, the historical r^2 between D value and ICT performance for teams maintaining substantial roster continuity between the tournaments is about 0.75 (roughly speaking, this means that about 75% of the variance in teams' ICT performance is explained by their D value). When the SoS is removed from D value, the r^2 drops to about 0.62, so that just over 62% of the variance in ICT performance is explained. SoS improves the predictiveness of D value by similar amounts regardless of whether you use old or new D value, or whether you look at ICT prelims or overall scoring.
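For anyone who wants to reproduce this kind of check from the public SCT/ICT stats, the computation is just a squared correlation; the arrays below are placeholders, not our actual data:

import numpy as np
from scipy.stats import pearsonr

# One entry per team with substantial roster continuity (placeholder values).
d_value = np.array([420.0, 385.0, 350.0, 310.0, 275.0, 240.0])
ict_pp20tuh = np.array([400.0, 350.0, 355.0, 290.0, 260.0, 205.0])

r, _ = pearsonr(d_value, ict_pp20tuh)
print(r ** 2)  # share of variance in ICT scoring explained by D value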

I certainly credit the idea that the SoS measure could stand improvement (and note that, if D value suffers these issues, then it's something ACF will have to look at for A value as well, as the two are almost exactly equivalent in how they calculate strength of schedule; both have used tossup points per tossup heard from the beginning). The numbers do show, however, that SoS in its current form has a large positive impact on the ability of D value to predict ICT performance.

The DII SoS certainly could be calculated in the way that Ryan suggests, but this would mean that the translation coefficients for tossup and bonus points would have to be raised, because they are currently calibrated for untranslated SoS. This may prove to be an even more accurate way to generate D values from DI teams playing on the DII set, and I'm glad that Ryan suggested it so we can look into those numbers, but I do want to point out that the DII translations are indeed calibrated based on past ICT performances, and so teams playing on the DII set do not have an unfair leg up. In other words, if the DII SoS measure were calculated the way that Ryan suggests, the resulting necessary increases to the translation coefficients for tossup and bonus points would result, on the whole, in similar final D values for DI teams that played the DII set.

As for the larger picture, it certainly appears that the current calculation can produce unexpected and likely inaccurate SoS numbers for a few individual teams for the reasons that Ryan suggested, and as a result, I would like to investigate the effect of changing SoS to be based on holistic team performance instead of just tossup performance. (I hope that our changes this year, which were largely intended to improve D value's performance for the small subset of teams that play DII or in very weak fields, demonstrate our commitment to making D value as accurate as it can be, even if the number of affected teams is very small.) But I'd also like to point out that the SoS is, on the whole, not "broken" in its present form; it is a large net positive for the accuracy of D value, raising D value's r^2 from about 0.62 to about 0.75. To be totally clear, this is no reason to keep things exactly the way they are, and it's very possible that D value's predictive ability would be even better with a tweaked SoS calculation. But it is a reason to be confident that, in its present form, D value (and A value) is doing a very good job inviting the correct teams.
Andrew Hart
Minnesota alum
Judson Laipply
Rikku
Posts: 492
Joined: Sat May 05, 2007 10:02 pm
Location: Bucyrus, Ohio

Re: The D-value SOS calculation is broken

Post by Judson Laipply »

theMoMA wrote:It might be useful to share a bit of data from our statistical survey of D values and ICT performance. Using the new D value calculation, the historical r^2 between D value and ICT performance for teams maintaining substantial roster continuity between the tournaments is about 0.75 (roughly speaking, this means that about 75% of the variance in teams' ICT performance is explained by their D value). When the SoS is removed from D value, the r^2 drops to about 0.62, so that just over 62% of the variance in ICT performance is explained. SoS improves the predictiveness of D value by similar amounts regardless of whether you use old or new D value, or whether you look at ICT prelims or overall scoring.


Can you check the correlation with ACF Nationals performance? I know the formats aren't all that similar, but teams that are good at one tend to be good at the other, and ACF Nationals has not recently suffered from top-25 teams (according to polls) being cut out of its field, which would be a large confounding variable in that r^2 value.
James L.
Kellenberg '10
UPenn '14
UChicago '20
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: The D-value SOS calculation is broken

Post by Fado Alexandrino »

What about looking at SCT ppb only? I've strongly suspected this for years now while looking at recently underranked teams like Chicago B.
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
Judson Laipply
Rikku
Posts: 492
Joined: Sat May 05, 2007 10:02 pm
Location: Bucyrus, Ohio

Re: The D-value SOS calculation is broken

Post by Judson Laipply »

Aaron Manby (ironmaster) wrote:What about looking at SCT ppb only? I've strongly suspected this for years now while looking at recently underranked teams like Chicago B.
PPB should definitely be weighted more heavily in the D-Value, as it is a direct measure of team strength (albeit one made noisier by the strength of the bonus rollercoaster). However, converting tossups is also a skill.

One idea that I liked is to base SOS on field power numbers and PPB, weighted in some empirically valid way.
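A sketch of what that might look like (the weights and baselines here are placeholders to be fit empirically, not anything NAQT uses):

def blended_sos(opp_power_rate, opp_ppb, w_power=0.5, w_ppb=0.5,
                all_sites_power_rate=0.15, all_sites_ppb=14.0):
    """Hypothetical SOS: opponents' power rate and PPB, each normalized
    to the all-sites average, combined with empirically fit weights."""
    return (w_power * opp_power_rate / all_sites_power_rate
            + w_ppb * opp_ppb / all_sites_ppb)

# Example: opponents powering 18% of tossups at 16 PPB.
print(blended_sos(0.18, 16.0))  # ~1.17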


EDIT: Accidentally a few words
James L.
Kellenberg '10
UPenn '14
UChicago '20
naan/steak-holding toll
Auron
Posts: 2515
Joined: Mon Feb 28, 2011 11:53 pm
Location: New York, NY

Re: The D-value SOS calculation is broken

Post by naan/steak-holding toll »

Aaron Manby (ironmaster) wrote:What about looking at SCT ppb only? I've strongly suspected this for years now while looking at recently underranked teams like Chicago B.
PPB is a good but not perfect measure because teams are often able to outperform their PPB by nailing a few categories and playing strategically elsewhere to get to the crucial tossup plurality / majority. Examples of this might include the Ike-led Illinois team circa 2012 (and arguably 2013 as well) and the Myers-led MSU team's penchant for pulling off upsets this year.

Also, SCT PPB can get pretty heavily distorted by how well you cover categories less common in mACF tournaments. This is why I think most teams' PPB on SCT tracks pretty well with their mACF PPB, except for upper-level teams (who tend to have outsized skill in arts and other categories more emphasized in mACF) and those reliant on a single generalist, with CE/geo master Jakob Myers again being an exception.
Will Alston
Dartmouth College '16
Columbia Business School '21
heterodyne
Rikku
Posts: 427
Joined: Tue Jun 26, 2012 9:47 am

Re: The D-value SOS calculation is broken

Post by heterodyne »

As James gestured towards, any correlational argument seems to fail to account for teams that were prevented from qualifying by this very problem, and given the relative difficulty of qualifying via D-value in previous years, I can't imagine the effect is insignificant.
Alston [Montgomery] Boyd
Bloomington High School '15
UChicago '19
UChicago Divinity '21
they
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: The D-value SOS calculation is broken

Post by theMoMA »

heterodyne wrote:As James gestured towards, any correlational argument does seem to fail to account for teams that got prevented from qualifying by this problem - and given the relative difficulty of qualifying via D-value in previous years, I can't imagine the effect is insignificant.
For teams that "comfortably" qualified (teams with new D values of 350 or greater--no teams of this strength are going to be left out of the ICT sample by virtue of failing to qualify), adding the SoS factor doubles the predictiveness of D value with respect to either ICT overall pp20tuh (from an r^2 of 0.29 to 0.58) or ICT prelims pp20tuh (0.32 to 0.71). (Note that r^2 for smaller samples, such as high-D-value teams, tends to be lower than the r^2 of the overall series.)
Andrew Hart
Minnesota alum
theMoMA
Forums Staff: Administrator
Posts: 5993
Joined: Mon Oct 23, 2006 2:00 am

Re: The D-value SOS calculation is broken

Post by theMoMA »

Disclaimer: this is a post containing my personal reflections as someone who has been involved in the development of D value and A value and not an official statement of either NAQT or ACF.

As a community, we've developed the D/A value model and, over time, incrementally tweaked it to better pick out the most qualified teams for ICT and ACF Nationals. NAQT used to invite teams based on S value, which was more opaque than D value and had various other flaws, and took up D value after asking for community input on a new, "open-source" method for inviting teams to ICT. (Another disclaimer: D value, whose name honors Dwight Wynne, was based on his substantial improvements to a basic framework I devised.) Later, ACF needed a qualification procedure when it moved to an invitation-based Nationals, and adopted the framework of D value with a couple of tweaks to the SoS to alleviate weak-field issues; this became A value (a name that, sadly, does not honor yours truly), which to my knowledge ACF hasn't changed since adopting. This year, NAQT changed D value to alleviate weak-field issues and better assess the performances of teams playing on the DII set. That brings us to the present state of these statistics, which are now almost entirely identical. While the organizations' goals and requirements have not always been the same, the resulting statistics have.

There are two ways to go from here. The first is, as NAQT has recently done, to analyze the data we have to make incremental changes to D/A value that will improve their predictiveness. For instance, now that Ryan has identified a SoS issue with D/A value that appears to have both empirical and conceptual validity, at least with respect to a few teams, we can do a statistical survey to see whether tweaks to remedy that issue would have a positive impact on the predictive ability of D/A value. This is what I'd like to do with D value/ICT data before next year's SCTs.

The second path is to look for a radically different model for comparing teams that can accommodate the necessary factors of team performance, field strength, and, for NAQT's purposes, combined field translations. Joe's points-per-bonus suggestion above is an example of this approach (though I suspect that, because tossup performance is the key skill of quizbowl--previous work I've done suggests that tossup performance is about 13 times more predictive of a team's chance of winning than bonus performance--this would not be a particularly fruitful path to go down).

To put it in metaphorical terms, we've built a prediction engine out of various moving parts: tossup performance, bonus performance, strength of schedule, and (for D value but not A value) DII translations. We can either decide to tool up various parts of the current engine, or to build a new one from scratch.

Either way, the data that you'd need to analyze (historical SCT/ICT and Regionals/Nationals results) is entirely public. I enjoy working on these projects, but I want to be totally clear about my own conception of the goal in doing so: I think we should improve the current model rather than devise a new one. To the extent I've worked with the data, they suggest that D value does a very good job at picking out the most qualified teams for ICT. I also think that, with minor changes, it could possibly do an even better job for the subset of teams affected by possible SoS deficiencies that Ryan pointed out, and possibly after other tweaks we haven't yet envisioned, but I'm not comfortable saying so definitively until I can look at the data. I haven't worked on similar projects with A value (I would be happy to do so if ACF were interested in looking into improvements to A value in the future), but my unsupported intuition is that A value works for the same reason D value works, and that it generally does a very good job (that could perhaps be even better with SoS tweaks or other improvements we haven't foreseen).

I say all this to make this point: just because my sense is that D/A value don't need a major overhaul, and just because my work in this area is focused on optimizing the current model rather than creating a new one, doesn't mean that others can't or shouldn't look for new ways to tackle the same problem. It also doesn't mean that I should be the only person looking for ways to improve the current model (though, like I said, I enjoy doing so, and am happy to follow up on suggestions that people have, so this is definitely not a message that "you fix it yourself or nothing gets done"). The data are out there for anyone to use, and I'm interested to see what people can do with it.
Andrew Hart
Minnesota alum
jonah
Auron
Posts: 2383
Joined: Thu Jul 20, 2006 5:51 pm
Location: Chicago

Re: The D-value SOS calculation is broken

Post by jonah »

I'll add that anyone on Andrew's quest (or something similar) who wants data from NAQT's public website in a more convenient format is welcome to contact me ([email protected]) and I'll make reasonable efforts to provide it. (It might have to wait until the summer, but I'll do what I can.)
Jonah Greenthal
National Academic Quiz Tournaments
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: The D-value SOS calculation is broken

Post by Fado Alexandrino »

I was exaggerating when I said ppb meant everything (not looking at you Fred), but it definitely should be the starting point for future improvements of the system.

I did a quick Excel calculation with 2017 data, using teams that had close-enough rosters between SCT and ICT: Stanford, Berkeley A, Northwestern, Toronto, Duke, Berkeley B, McGill, Chicago B, NYU, MIT, Amherst, UCSD, Kenyon, Louisville, Missouri. I took their SCT PPB, their D-value order of finish, and their ICT order of finish. The Spearman rank correlation with ICT finish was 0.82 for SCT PPB and 0.75 for D-value order of finish. Part of the 2017 wonkiness probably has to do with Duke's low finish, McGill's high finish, and Missouri's high finish. Deleting just Duke gets you 0.92 for PPB and 0.89 for D-value. For 2016, the correlation increases to 0.93 for PPB and 0.87 for D-value.
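For reference, the same rank-correlation computation in scipy (the ranks below are placeholders rather than the actual 2017 spreadsheet):

from scipy.stats import spearmanr

# Placeholder rank orders for the same set of teams.
sct_ppb_rank = [1, 2, 3, 4, 5, 6]
d_value_rank = [2, 1, 4, 3, 6, 5]
ict_finish = [1, 3, 2, 4, 5, 6]

print(spearmanr(sct_ppb_rank, ict_finish).correlation)
print(spearmanr(d_value_rank, ict_finish).correlation)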

I think an easy fix for D/A value would be to optimize the weights of all the parameters: everything in the SOS calculation, ppb, pptuh, etc. Two other things that could be helpful are average opponent ppb (since the SOS, as Ryan noted, is not perfect) and power rate. Completely anecdotally, at my D2 ICT, Columbia came 9th after being placed in a circle of death with us and Harvard, but was 6th in bonus conversion; they had an anomalously high SCT power percentage for their D-value rank.
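A sketch of the weight-fitting idea (hypothetical feature columns; in practice you'd fit against historical ICT results for roster-continuous teams):

import numpy as np

# Rows are teams; columns are pptuh, ppb, avg. opponent ppb, power rate
# (all hypothetical numbers).
X = np.array([[5.2, 18.1, 16.0, 0.22],
              [4.1, 15.3, 14.2, 0.15],
              [3.0, 12.7, 15.1, 0.10],
              [2.2, 10.4, 13.8, 0.07]])
y = np.array([390.0, 310.0, 240.0, 180.0])  # ICT pp20tuh, hypothetical

weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)  # least-squares weights for a blended rating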
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
ryanrosenberg
Auron
Posts: 1890
Joined: Thu May 05, 2011 5:48 pm
Location: Palo Alto, California

Re: The D-value SOS calculation is broken

Post by ryanrosenberg »

Is NAQT planning to make changes to the D-value calculation for this year's SCT?
Ryan Rosenberg
North Carolina '16
ACF
ThisIsMyUsername
Auron
Posts: 1005
Joined: Wed Jul 15, 2009 11:36 am
Location: New York, NY

Re: The D-value SOS calculation is broken

Post by ThisIsMyUsername »

Nearly a year has passed since Ryan's original post in this thread, and this year's SCT is fast approaching. Have any steps been taken to tweak the strength-of-schedule metric in the D-value calculation, in response to the flaws that Ryan demonstrated?
John Lawrence
Yale University '12
King's College London '13
University of Chicago '20

“I am not absentminded. It is the presence of mind that makes me unaware of everything else.” - G.K. Chesterton
Judson Laipply
Rikku
Posts: 492
Joined: Sat May 05, 2007 10:02 pm
Location: Bucyrus, Ohio

Re: The D-value SOS calculation is broken

Post by Judson Laipply »

ThisIsMyUsername wrote: Fri Jan 25, 2019 3:31 pm Nearly a year has passed since Ryan's original post in this thread, and this year's SCT is fast approaching. Have any steps been taken to tweak the strength-of-schedule metric in the D-value calculation, in response to the flaws that Ryan demonstrated?

Doubly noted. As someone who has been personally fucked or near-fucked by this very issue in each of the last three years, I'd like to know if we should just register for whatever SCT site has the weakest D1 field, to maximize our chance of qualifying relative to the UChicago thunderdome that will inevitably happen at Northwestern. (EDIT: It has been pointed out to me that I should flag this as sarcasm, and as a reference to what happened several years ago, when a team IIRC took a bid at a site very far from their school and then never used it.)
James L.
Kellenberg '10
UPenn '14
UChicago '20
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: The D-value SOS calculation is broken

Post by Fado Alexandrino »

Canada's SCT D1 field is looking a lot like what the Chicago/Chicago/Northwestern/WUSTL site looked like in 2016. I hope we don't end up in a million-way circle of death and knock each other out of qualifying.
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020
setht
Auron
Posts: 1205
Joined: Mon Oct 18, 2004 2:41 pm
Location: Columbus, Ohio

Re: The D-value SOS calculation is broken

Post by setht »

NAQT is discussing possible changes to the D-value calculation (and the SOS calculation in particular); we'll see what the results look like once we finish running the numbers.


Looking back at Ryan's initial post, I disagree with some of his arguments.
ryanrosenberg wrote: Wed Feb 07, 2018 1:54 pm Last Saturday, DePaul played at the Missouri SCT site, a combined D1/D2 field using the D2 set. Of DePaul's opponents, only WUSTL A and C averaged 20 PPB on the D2 set, and over half came in under 15 PPB.

And what did this leave DePaul with? The second-highest strength of schedule (SOS), not in the Missouri field, but of every team that played SCT. This is due to a confluence of three serious flaws with the SOS calculation, which I'll lay out below.

To review, SOS is calculated as tossup points per tossup heard (TUPPTH) of the teams you played in their other games, divided by the tossup points per tossup heard over all SCT sites.

1. Tossup points per tossup heard is a heavily field-dependent measure of team strength. The current SOS calculation cannot differentiate reasonably strong teams in good fields from weak teams in weak fields. Looking at the current D-values list, Chicago D (15.69 PPB) has a worse TUPPTH than Ohio State C, Vanderbilt A, and Colorado A (10.61, 10.00, and 8.41 PPB, respectively). So playing Chicago D, a respectable opponent by any standard, lowers the SOS of Chicago D's opponents more than if they had hypothetically been able to play OSU C or Vandy A. That double-penalizes teams at the Northwestern SCT -- not only do they have to face a fairly strong team as their weakest opponent, but their SOS takes a huge hit for it.
If I understand Ryan correctly here, he is saying that if we imagine swapping Chicago D with Ohio State C, Vanderbilt A, or Colorado A (i.e. having the two teams swap which SCT sites they played at), we should keep their TUPPTH fixed when trying to predict the resulting TUPPTH_Opp and SOS values for the other teams at the Northwestern site. However, as Ryan points out, TUPPTH is a heavily field-dependent measurement—presumably if Chicago D played at a weaker site than Northwestern, their TUPPTH would rise. Similarly, if Ohio State C, Vanderbilt A, or Colorado A had played at the 2018 Northwestern SCT site, I would imagine their TUPPTH would drop. And then every other team at the Northwestern site would have a lower TUPPTH_Opp and a lower SOS.

ryanrosenberg wrote:2. The effect of point 1 is exacerbated in round-robin scheduling. Let's take the example of two four-team round robin tournaments. The first has four strong teams in the 17-19 PPB range. The second has two of those teams, and then two teams in the 10-12 PPB range. In the first tournament, each team will get roughly half the tossups per round, so each team's opponent's TUPPTH will be ~5.00 (assuming as many powers as negs). In the second tournament, the two strong teams will get about 80% of the tossups against the two weak teams and 50% against each other. The weak teams will get 20% against strong teams and 50% against each other. So a strong team will have played one strong team (~7.72 TUPPTH in other games) and two weak teams (~3.64 TUPPTH in other games), for a SOS of ~5.00. So a strong team gets the benefit of beating up weak teams without any hit to SOS! This effect would be further exacerbated by a final between the two strong teams, which will boost their SOS without dramatically reducing their TUPPTH.

More generally, in a round robin, the tossup points per tossup heard scored by your opponents in their other games comes out to roughly {(10 x Tossup Conversion Rate) + (5 x Power Rate) - (5 x Neg Rate)}. This measure doesn't vary much from site to site, since even in games between relatively weak teams, almost all tossups are still converted, and power rates aren't significantly lower than in games between good teams.
I don't agree with the proposed numbers in this thought experiment either. In particular, I don't think that "even in games between relatively weak teams, almost all tossups are still converted" is a good assumption. Unlike the first thought experiment ("what if two teams swapped SCT sites"), I think we can do a bit of a check on this one.

The teams with the three lowest records at the 2018 Northwestern SCT were Chicago D, Chicago C, and Illinois A. In 6 games between those teams, pairs of teams combined to answer 19, 20, 20, 21, 22, and 22 tossups, for an average of 20.67 tossups answered per match (or 1.33 tossups dead per match). The teams combined to power 2, 3, 3, 4, 4, and 7 tossups, for 3.83 tossups powered per match.

The teams with the three lowest records at the 2018 Mizzou SCT were SIU-Edwardsville, WUSTL E, and East Central. In 3 games between those teams, pairs of teams combined to answer 16, 16, and 16 tossups, for an average of 16 tossups answered and 6 tossups dead per match. The teams combined to power 2, 3, and 3 tossups, for an average of 2.67 tossups powered per match.

The big difference here is the tossup conversion rate. Looking back at Ryan's "two strong, two weak" hypothetical RR tournament, I think the claim that "the two weak teams will get 50% against each other" is especially suspect: it seems like it would be more consistent with real results to project that the two weak teams would each answer about 38% of the tossups in their match. (While we're looking back at Ryan's numbers, can someone explain where the 7.72 and 3.64 figures came from? It seems to me that the "TUPPTH in other games" [not involving one of the strong teams] should be 8 [the second strong team] and 3.5 [the two weak teams].)

I suspect it would also be more realistic to set the percentage of tossups answered by a weak team playing against a strong team to less than 20%, but let's ignore that for a moment. Just plugging a 38% answering rate for both teams into the weak-vs.-weak projection gives a TUPPTH of 2.9 for the two weak teams (in games not involving one of the strong teams), and thus a TUPPTH_Opp for a strong team of (8 + 2.9 + 2.9)/3 ≈ 4.6. This is lower than the TUPPTH_Opp for a strong team in a "four-strong-teams" field (which we've projected to be ~5). If we include factors like "power rates in games between weak teams actually are lower than in games between strong teams," the SOS for the "two strong + two weak" field will decrease further.
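The arithmetic above, spelled out under the assumptions of Ryan's hypothetical (20 tossups per game, an average of 10 points per answered tossup since powers and negs cancel):

TU, PTS = 20, 10  # tossups per game; average points per answered tossup

# A weak team's other games: 20% of tossups vs. the remaining strong
# team, 38% vs. the other weak team (two games, 2 * TU tossups heard).
weak_tuppth = PTS * (0.20 * TU + 0.38 * TU) / (2 * TU)    # 2.9

# The remaining strong team's other games: 80% vs. each weak team.
strong_tuppth = PTS * (0.80 * TU + 0.80 * TU) / (2 * TU)  # 8.0

# TUPPTH_Opp for a strong team, averaged over its three opponents.
print((strong_tuppth + 2 * weak_tuppth) / 3)              # ~4.6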

ryanrosenberg wrote:3. There is no D2 conversion for opponents' TUPPTH in the SOS calculation. Following from the last sentence of point 2, the tossup conversion rate in combined fields is artificially raised, since a D1 team's opponents are being measured on their ability to convert D2 tossups rather than D1 tossups. It seems fairly clear that if you had forced the Missouri site to play on D1 questions, many more tossups would have gone dead, and the SOS of all teams would be much lower. However, teams at all non-combined sites are being judged on their ability to convert D1 questions, so comparing those two measures seems illogical.
SOS shows up in the D-value calculation only through the combination TUPPTH x SOS x DC_T. I'm told that DC_T is meant to correct both a team's own TUPPTH and its SOS, in cases where a team plays on the "wrong" set. I don't have a strong evidence-based sense of exactly how much of an adjustment should be made to TUPPTH and SOS in cases where a team plays on the "wrong" set, but I wanted to point out that the current D-value calculation is meant to include an adjustment to the SOS in such cases.

ryanrosenberg wrote:How should NAQT fix the strength of schedule calculation for future years? Use points per tossup heard rather than tossup points per tossup heard, which will incorporate a measure of team strength that doesn't depend on the opposition (bonus conversion) and adjust for the strength of good teams in very competitive fields. Additionally, NAQT should apply a D2 conversion factor to the SOS of combined fields to avoid comparing field strengths on two very different sets.
Having disagreed with Ryan on various points, I want to be clear that I am (and NAQT is) open to tweaking the D-value calculation, possibly along the lines suggested here. But I think we'll want stronger modeling/theoretical justification, and/or evidence of improved prediction of ICT performance, before making changes.
Seth Teitler
Formerly UC Berkeley and U. Chicago
President of NAQT
Emeritus member of ACF
Important Bird Area
Forums Staff: Administrator
Posts: 6113
Joined: Thu Aug 28, 2003 3:33 pm
Location: San Francisco Bay Area

Re: The D-value SOS calculation is broken

Post by Important Bird Area »

naqt.com wrote:NAQT has made minor revisions to the D-Value system used to compare the performance of teams at different SCTs. ...
Based on research on past SCT and ICT performances, NAQT found that this revision improved the predictiveness of D-Values by a modest, but still significant, amount. We wish to thank Ryan Rosenberg for his suggestion that spurred this research and revision.
full details
Jeff Hoppes
President, Northern California Quiz Bowl Alliance
former HSQB Chief Admin (2012-13)
VP for Communication and history subject editor, NAQT
Editor emeritus, ACF

"I wish to make some kind of joke about Jeff's love of birds, but I always fear he'll turn them on me Hitchcock-style." -Fred
Fado Alexandrino
Yuna
Posts: 834
Joined: Sat Jun 12, 2010 8:46 pm
Location: Farhaven, Ontario

Re: The D-value SOS calculation is broken

Post by Fado Alexandrino »

Thanks Jeff. Would it be possible to see past years' D-value rankings using this new formula?
Joe Su, OCT
Lisgar 2012, McGill 2015, McGill 2019, Queen's 2020