This Time, A Stern Warning

Elaborate on the merits of specific tournaments or have general theoretical discussion here.
Carlos Be
Wakka
Posts: 199
Joined: Sun Jun 25, 2017 11:34 pm

Re: This Time, A Stern Warning

Post by Carlos Be »

kdroge wrote: Wed Apr 15, 2020 6:41 pm I'm not sure exactly what harsh measures would entail; certainly, a 12-month ban from all quiz bowl activities, a post publicly accessible to a person Google-searching their name (to be taken down after 3 years) detailing the cheating, and a formal letter to whoever is in charge of academic integrity at their school relating what has happened would be a good place to start.
How could this be implemented? Any formal punishment would require a formal authority to institute that punishment. I don't know what that authority could be.
Justine French
UCLA
Skepticism and Animal Feed
Auron
Posts: 3209
Joined: Sat Oct 30, 2004 11:47 pm
Location: Arlington, VA

Re: This Time, A Stern Warning

Post by Skepticism and Animal Feed »

Jesus fuck. This is one of the most disappointing moments in the history of quizbowl. Like many of you, I have known Eric for over a decade, respected him greatly, and cheered for him. I was in the room when Eric finally won a championship at ACF Nationals and felt so happy for him that I made my way to the stage and hugged him. I do not recall ever hugging anyone else at a quizbowl tournament. Eric was also notably a victim of cheating. At the 2010 ICT, Harvard played Penn in a one-game playoff for the right to face Chicago in the finals. Penn led at halftime, when Andy Watkins suddenly took over the game and "won". To this day, I think it's unfair that NAQT declared Chicago the winner of that ICT when Watkins was exposed; Penn deserved consideration as well. I felt so strongly about this that I mailed my individual 2010 ICT Champion trophy to Eric.

What matters to me, though, is that prior to this new cheating scandal, Eric had an unblemished record not just as an elite player, but as a writer, editor, quizbowl theorist, and mentor to a great many players. That's why I am going to chime in on the side of those calling for a moderate punishment, like a 1-year ban. Not because cheating isn't bad or online cheating isn't as bad as IRL cheating (a weakness of that argument is that, right now, online quizbowl is the only quizbowl we have) but because Eric was an excellent quizbowl citizen before.

I understand that, to some, this will come off as cronyism, as people being lenient to their own friends, as a double standard for well-connected and established players. I think there's a difference, though. I am for leniency for Eric not because he scored a lot of points and held a lot of important editorships, but because he was a genuinely good person while doing all of that. Let us not forget that Andy Watkins was an excellent science player (yes, even when he did not have the answers ahead of time), had many friends among quizbowl's leaders, held a lot of important editorships, and was even an NAQT member. And nobody really had any trouble with lifetime banning him from everything.
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
ArnavS
Lulu
Posts: 51
Joined: Fri Feb 19, 2016 12:57 pm

Re: This Time, A Stern Warning

Post by ArnavS »

I'm slightly less optimistic than you are about how that isn't suggesting a "double standard for well-connected and established players." Certainly nobody would argue that simply being good at the game entitles you to leniency. The difficulty is that a lot of the more innocuous bases are only unlocked if you are a well-known, top-scoring player.

Quiz bowl is often a community that places a higher premium on skill than effort (it doesn't matter that you are willing to be TD for an important tournament or head editor for an important set; you need to also be selected.) The same goes for more "personal" things like mentorship (you need to actually be an elite player, before people can see you as a mentor). And indeed, if someone is less central in the community, the extent to which people would even know about their personal characteristics is limited. Even normal socializing at tournaments is determined by who you know, who knows you, etc.

Think of it like being rich; it's hard to be a major philanthropist, and to invest heavily in your local community, or even for random people on the street to know who you are, if you weren't sitting on loads of money to begin with.

But I'm also really skeptical about gentleness towards people that we feel good about. It's notoriously difficult to separate "so-and-so is a good dude" from "I really like so-and-so." I know top-notch quiz bowlers who are morally upright but personally abrasive (and certainly, I've been an ass to people in the past without ever being a top-notch quizbowler.) Call it self-interest, but I wouldn't want a system where people like that suffer.

In other words, we should be writing rules and community norms to guard against emotional bias, not baking it right in. Hell, this thread alone proves that it's easy for a community as enlightened and self-aware as this one to start bandwagoning (i.e., everyone started off suspicious, then said they believed Chris and Eric, and now we're here.)

Though with that, I'm going to contradict everything I just said. These are theoretical concerns; in this actual case, I agree with you. I think Eric is a genuinely good guy. I've thought this for years (although I've only met him once, which perhaps says something about whose characteristics we know and who is invisible.) And I don't think a (relatively benign, in the grand scheme of things) moment of weakness changes that assessment. Human beings sometimes succumb to temptation, and I think long patterns of good behavior matter a great deal.

Edit: Clarity in opening.
"We're not going to pay you to come to our tournaments" --- Paul Kasiński
NYU, 2014-2018
University of British Columbia, 2018-Present
gyre and gimble
Tidus
Posts: 730
Joined: Tue Feb 17, 2009 2:45 am

Re: This Time, A Stern Warning

Post by gyre and gimble »

Like Bruce says, Eric has a lot of goodwill built up in this community. Whether you think he has spent some or all of it, that should count for something. In the same way that not everyone gets the same punishment for the same crime in our criminal justice system, I think we should be more lenient with Eric than with someone who has not contributed as much to quizbowl (as a writer, mentor, organizer, and leader--not as a player).

I'm not so concerned about the correlation between someone's playing skill and their ability to build up goodwill. Especially in recent years, there have been lots of people in this community who are not elite players but are nevertheless well known for their contributions to quizbowl. There are also players who may get name-recognition but have contributed relatively little to the community (e.g., me). I don't think it's too difficult, especially for those who are more plugged in (i.e., those whose attitudes and judgments toward cheaters are likely to have a greater impact), to keep these things straight.

It takes a lot of effort and intent (far more than it takes to cheat at online quizbowl, I might add) to do as much as Eric has done for quizbowl. To me, this is very different from a philanthropist who just throws money at problems.
Stephen Liu
Torrey Pines '10
Harvard '14
Stanford '17
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

I'll preface this by saying this post, perhaps with the exception of a few points of clarification, will be my last post in the thread.

I've made an 18-minute YouTube video explaining what my basic method was, and how one might tweak the method to get it to apply in the general case. Eric agreed to let me discuss his case in this video, but he and I agreed to refer to him as "Player Y" in the video so that YouTube's algorithms don't index him. That being said, Eric still appears in the Terrapin Online stats when I hover over them.

Please feel free to DM me with questions. The dataset that I used and compiled is below; anyone is free to use it without crediting me.

Link to Stats Page

EDIT -- Just to be 1000% clear, I'm not accusing anyone of cheating in this video. I use the hypothetical example of Jordan Brownstein to clarify one of my points.
Last edited by Ike on Fri Apr 17, 2020 3:05 pm, edited 1 time in total.
Ike
UIUC 13
k120
Lulu
Posts: 28
Joined: Mon Dec 25, 2017 12:36 am
Location: Chicago

Re: This Time, A Stern Warning

Post by k120 »

gyre and gimble wrote: Thu Apr 16, 2020 2:58 pm not everyone gets the same punishment for the same crime in our criminal justice system
Hold on, is this supposed to be a good thing?

Anyway, I think that whether or not it has a perfect correlation with playing skill, there are some people who are very well-known and well-regarded in quizbowl, and way more people who are not. If you're willing to be lenient with Eric because he's a good person, then you should be ready to assume that a random high schooler who you've never heard of is an equally good person and be as lenient with them for googling clues as you are with Eric.

The other argument in favor of Eric is the contributions he's made to quizbowl. I don't think that's a very good argument against someone having cheated or that their cheating is less bad than anyone else's. It's true that losing Eric would probably be a more significant loss to quizbowl than losing an unknown high schooler. But there are punishments, such as a one-year online ban, that wouldn't cause a huge loss to the community.
Kenneth Martin
Evanston Township HS '18
Beloit College '22
Skepticism and Animal Feed
Auron
Posts: 3209
Joined: Sat Oct 30, 2004 11:47 pm
Location: Arlington, VA

Re: This Time, A Stern Warning

Post by Skepticism and Animal Feed »

k120 wrote: Fri Apr 17, 2020 3:03 pm
gyre and gimble wrote: Thu Apr 16, 2020 2:58 pm not everyone gets the same punishment for the same crime in our criminal justice system
Hold on, is this supposed to be a good thing?
Who are we to say whether or not it is a good thing, but lesser penalties for first-time offenders and/or people who have a long track record of being good citizens are a really common feature of criminal justice systems all over the world.
Bruce
Harvard '10 / UChicago '07 / Roycemore School '04
ACF Member emeritus
My guide to using Wikipedia as a question source
gyre and gimble
Tidus
Posts: 730
Joined: Tue Feb 17, 2009 2:45 am

Re: This Time, A Stern Warning

Post by gyre and gimble »

k120 wrote: Fri Apr 17, 2020 3:03 pm
gyre and gimble wrote: Thu Apr 16, 2020 2:58 pm not everyone gets the same punishment for the same crime in our criminal justice system
Hold on, is this supposed to be a good thing?

Anyway, I think that whether or not it has a perfect correlation with playing skill, there are some people who are very well-known and well-regarded in quizbowl, and way more people who are not. If you're willing to be lenient with Eric because he's a good person, then you should be ready to assume that a random high schooler who you've never heard of is an equally good person and be as lenient with them for googling clues as you are with Eric.

The other argument in favor of Eric is the contributions he's made to quizbowl. I don't think that's a very good argument against someone having cheated or that their cheating is less bad than anyone else's. It's true that losing Eric would probably be a more significant loss to quizbowl than losing an unknown high schooler. But there are punishments, such as a one-year online ban, that wouldn't cause a huge loss to the community.
Well, the argument I am making is both a mix, and neither, of the two you've discussed. We should be lenient with Eric not because he's a good person in general, or because his skills are valuable to quizbowl, but because he has done lots of good things to make this game and this community better.

To be clear, I am not advocating that Eric deserves a free pass. I'm not even sure how much more lenient we should be with him than with the average person. But I do think that the great amount of good Eric has done for this community should be a factor in situations like these.
Stephen Liu
Torrey Pines '10
Harvard '14
Stanford '17
Footscray Western Bulldogs
Lulu
Posts: 78
Joined: Mon Mar 30, 2015 10:43 pm
Location: Tucson, AZ

Re: This Time, A Stern Warning

Post by Footscray Western Bulldogs »

If we are to treat cheating at online tournaments as an existential threat to the quizbowl community, it's really not acceptable to refer to it by a cutesy euphemism.
Sam Rombro
Arizona '20
Maryland '18
Writer, NAQT (inactive)
naan/steak-holding toll
Auron
Posts: 2335
Joined: Mon Feb 28, 2011 11:53 pm
Location: New York, NY

Re: This Time, A Stern Warning

Post by naan/steak-holding toll »

Footscray Western Bulldogs wrote: Fri Apr 17, 2020 4:19 pm If we are to treat cheating at online tournaments as an existential threat to the quizbowl community, it's really not acceptable to refer to it by a cutesy euphemism.
I think in this case "biking" is useful because it refers to a very specific form of cheating, and in this case one that isn't always as statistically obvious, based on the evaluations of the discourse. Since it's a term that exists already, we might as well use it.

Also, the phrase "get off your bike" is hilarious. As in "get off your bike, and who you crappin?"
Will Alston
Bethesda Chevy Chase HS '12, Dartmouth '16, Columbia Business School '21
NAQT Writer and Subject Editor
John Quincy Adams's Alligator
Rikku
Posts: 385
Joined: Fri Feb 24, 2017 4:41 pm

Re: This Time, A Stern Warning

Post by John Quincy Adams's Alligator »

naan/steak-holding toll wrote: Fri Apr 17, 2020 4:23 pm
Footscray Western Bulldogs wrote: Fri Apr 17, 2020 4:19 pm If we are to treat cheating at online tournaments as an existential threat to the quizbowl community, it's really not acceptable to refer to it by a cutesy euphemism.
I think in this case "biking" is useful because it refers to a very specific form of cheating, and in this case one that isn't always as statistically obvious, based on the evaluations of the discourse. Since it's a term that exists already, we might as well use it.

Also, the phrase "get off your bike" is hilarious. As in "get off your bike, and who you crappin?"
Biking isn't a term that I've ever seen used outside of the discord/meme pages (places that are intentionally less serious than the forums!), and it's going to be a confusing term because plenty of people will read these threads that don't interact with those media. Terms like "online cheating" or "googling" or whatever else will almost certainly be more parseable for when these threads get revisited in the future, or when people who mainly interact with the forums read this. We can make a start by _not_ reifying terms like "biking" and giving them legitimacy. Also, the second point strikes me as exactly the point Sam was trying to make - why are we trying to use phrases or words that provoke hilarity/absurdity to discuss something we're treating as a deeply serious issue?
Vishwa Shanmugam
Downingtown STEM '18
UMD '22
Santa Claus
Wakka
Posts: 196
Joined: Fri Aug 23, 2013 10:58 pm

Re: This Time, A Stern Warning

Post by Santa Claus »

naan/steak-holding toll wrote: Fri Apr 17, 2020 4:23 pm I think in this case "biking" is useful because it refers to a very specific form of cheating
Does it? What does it mean?

I thought it meant “cheating in quiz bowl, in any context”, though I’ve also heard that it means “cheating on the Discord”, “cheating via Googling/looking things up”, and “cheating by looking at the packet during a reading of a posted set”. I’m sure most people don’t have any idea what it means, considering how anytime anyone says it anywhere people need to explain that it’s slang with no definite etymology and no fixed definition.
Kevin Wang
Arcadia High School 2015
Amherst College 2019

2018 PACE NSC Champion
2019 PACE NSC Champion
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

All this talk about proscriptions feels odd to me. Whether it's poker, where it's common to use silly-sounding terms like "angle shots" to describe acts of questionable integrity, the goofy-sounding term "doping" to describe the serious act of using drugs to cheat, or "phishing" to describe what Amit did to acquire the question set, it's pretty common for an initially silly term to become part of the common parlance.
Ike
UIUC 13
5 Fingaz to the Male Gaze
Wakka
Posts: 172
Joined: Sun Oct 15, 2017 10:01 pm
Location: Chicago, IL

Re: This Time, A Stern Warning

Post by 5 Fingaz to the Male Gaze »

Ike wrote: Fri Apr 17, 2020 4:44 pm All this talk about proscriptions feels odd to me. Whether it's poker, where it's common to use silly-sounding terms like "angle shots" to describe acts of questionable integrity, the goofy-sounding term "doping" to describe the serious act of using drugs to cheat, or "phishing" to describe what Amit did to acquire the question set, it's pretty common for an initially silly term to become part of the common parlance.
I don't particularly have a problem with the term being "silly," but I do have a problem with the fact that this term is used by a relative minority of people and its usage can thus lead to confusion. Already, in this thread, there appears to be confusion over what the precise definition of "biking" is -- I myself was also unaware that it refers to a specific form of cheating.
Wonyoung Jang
Belmont '18 // UChicago '22
ACF; NAQT
Jack
Lulu
Posts: 87
Joined: Thu Sep 28, 2017 5:07 pm

Re: This Time, A Stern Warning

Post by Jack »

Yeah, I'm gonna go ahead and presume a healthy majority of people currently involved in college quiz bowl don't know that "biking" means some form of cheating, and even fewer know it specifically refers to what "Player Y" did. Just say "googling tossups."

The statistical analysis in the video is definitely a lot more convincing and robust than the earlier so-called 'haruspicy.' I do worry that its applications might be limited to cases like this with very strong players at higher difficulties, but since our collective concerns should be placed, I think, on preventing cheating in the first place and coming up with the right response to current, proven instances of cheating rather than perfecting a way to detect cheating after it happens in the future, I don't think it's worth delineating. This is good work.

I also want to call attention (again) to the fact that the original post in this thread suggested multiple cheaters. I don't really know if it's a good idea to push for a call to out these individuals, but it seems odd that this has, to an extent, been swept under the rug and ignored for the time being -- which is understandable, in part, since what has been revealed to be true is a big deal and it's not easy to focus on multiple things at a time, but I think it's bad to introduce the possibility of more bad actors, since apparently it "is more likely than not the parties named during the course of my investigation cheated" (emphasis mine), and then not really follow up on it. People who played and are innocent deserve to have their names cleared and I hope eventually something like that can happen. Not that people are going around throwing wild accusations anymore, but still. If I had played, I think I would be at least somewhat miffed if it didn't get resolved.

On the other hand, if "parties" only referred to two people, and it was the two people who have been extensively discussed in this thread, then I would hope someone with knowledge of the situation would say something like "yes, all suspected cheaters have either confessed or been 'acquitted,'" or something like that.
Jack
Bermudian Springs HS
Princeton University '21
Stained Diviner
Auron
Posts: 4859
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland

Re: This Time, A Stern Warning

Post by Stained Diviner »

I appreciate Ike's video and method, but the method is very limited. It only applies to somebody who cheats whenever he is having a bad game or possibly to somebody who tries to cheat all the time but has limited success for whatever reason. If somebody only cheats in close matches, somebody on a strong team only cheats against teams that he is worried about losing to, or somebody only cheats against teams that he dislikes, then the method does not work. Also, it uses powers, so it does not apply to tournaments without powers or to situations where somebody waits before buzzing to make the cheating less obvious. Based on what I saw, it also does not correct for the fact that some tournaments have more generous power placements than others.

This is better than nothing--it gave much stronger statistical support to this situation than other methods did, and it possibly could help in other situations--but nobody should think that they can use Ike's method to end quizbowl cheating. To be clear, I am not criticizing Ike, who as far as I can see handled this situation very well.
David Reinstein
PACE VP of Outreach, Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
ArnavS
Lulu
Posts: 51
Joined: Fri Feb 19, 2016 12:57 pm

Re: This Time, A Stern Warning

Post by ArnavS »

Skepticism and Animal Feed wrote: Fri Apr 17, 2020 3:16 pm
k120 wrote: Fri Apr 17, 2020 3:03 pm
gyre and gimble wrote: Thu Apr 16, 2020 2:58 pm not everyone gets the same punishment for the same crime in our criminal justice system
Hold on, is this supposed to be a good thing?
Who are we to say whether or not it is a good thing, but lesser penalties for first-time offenders and/or people who have a long track record of being good citizens are a really common feature of criminal justice systems all over the world.
I'm not sure this analogy quite works. A judge is responsible for actively learning about the defendant and deciding whether they've been upstanding; the online quizbowl community couldn't do that about a random high schooler, even if we wanted to. Also, the community's interests are necessarily rather limited; we care mainly about what people have contributed to quizbowl, and not about the other facets of their lives.

Lesser penalties for first-time offenders make sense, though.

Edit: Also, a vote from me against the use of phrases like "biking." Will says above that the term designates "a very specific form of cheating"; maybe I just haven't followed carefully, but I haven't the faintest idea what form that is.
"We're not going to pay you to come to our tournaments" --- Paul Kasiński
NYU, 2014-2018
University of British Columbia, 2018-Present
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

One more fun tidbit for those who wondered what would happen if we ran the same algorithm for Andy Watkins, a known cheat.

Counting only power-marked tournaments, the surviving stats from that era indicated Watkins played SCT 2009, THUNDER I, Terrapin 2009, and Penn Bowl 2009. That's four regular difficulty tournaments, and here are the 49 data points using the same method of tabulating power counts.

At the 2010 ICT, Watkins powered at least one tossup in 14 games out of 15. Using the binomial distribution, one can see that the probability that he does this is 0.001061841221, or about 1/941. This is dangerously close to the 1/1000 threshold I've proposed for "further investigation is required." However, it is worth noting that my method assumes that ICT will be a regular difficulty tournament -- I think the true p value would go down if I had stats from the more difficult tournaments of the era, which I could not find. (FICHTE stats would be really helpful if anyone has those.) At the 2011 ICT, Watkins powered at least one tossup in 13 games out of 14. Using the binomial distribution, one can see that the probability that he does this is 0.001877101026, or about 1/532. The probability (again assuming that ICT 2011 is a regular difficulty tournament) of doing that well at both of those tournaments is simply calculated by multiplying those probabilities, which amounts to about 1/500,000. Counting the fact that ICTs, especially the 2010 and 2011 iterations, are much, much, much harder than regular difficulty tournaments, I think it's safe to say that Watkins is truly full of shit.
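For anyone who wants to redo this kind of calculation, here is a minimal Python sketch of a binomial tail probability. The baseline rate is not stated in this post; 26/49 (games with at least one power out of the 49 baseline games) is an assumed value, picked only because it lands close to the figures quoted here. The function and variable names are illustrative.

```python
from math import comb


def tail_probability(n_games, k_games, rate):
    """P(X >= k_games) for X ~ Binomial(n_games, rate)."""
    return sum(
        comb(n_games, k) * rate**k * (1 - rate) ** (n_games - k)
        for k in range(k_games, n_games + 1)
    )


# ASSUMPTION: share of baseline games with at least one power.
# The post does not give this number; 26/49 is a hypothetical value.
BASELINE_RATE = 26 / 49

p_2010 = tail_probability(15, 14, BASELINE_RATE)  # 14+ of 15 games
p_2011 = tail_probability(14, 13, BASELINE_RATE)  # 13+ of 14 games

# Multiplying is only valid if the two tournaments are independent.
p_both = p_2010 * p_2011

print(f"2010: {p_2010:.6f}  (about 1 in {1 / p_2010:.0f})")
print(f"2011: {p_2011:.6f}  (about 1 in {1 / p_2011:.0f})")
print(f"both: {p_both:.3e}  (about 1 in {1 / p_both:.0f})")
```

The tail is "at least k games" rather than "exactly k games", since a performance even better than the observed one would count as at least as surprising.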

Again there are caveats with what I did -- was Watkins improving during this time? Did I screw up and mistake another mononymically-named Andy at Harvard for this Andy? (I don't think so.) That being said, 49 data points is pretty strong (even if they're drawn from only four tournaments) and in my opinion, the insights I gained while analyzing biking are probably a bit more applicable to cheating in general.
Ike
UIUC 13
vinteuil
Auron
Posts: 1455
Joined: Sun Oct 23, 2011 12:31 pm

Re: This Time, A Stern Warning

Post by vinteuil »

Ike wrote: Sat Apr 18, 2020 7:13 am The probability (again assuming that ICT 2011 is a regular difficulty tournament) of doing that well at both of those tournaments is simply calculated by multiplying those probabilities, which amounts to about 1/500,000.
Doesn't this assume that the probabilities are independent? Is that a good assumption for two highly correlated events like subsequent performances on a similar set?
JR
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

vinteuil wrote: Sat Apr 18, 2020 10:41 am
Ike wrote: Sat Apr 18, 2020 7:13 am The probability (again assuming that ICT 2011 is a regular difficulty tournament) of doing that well at both of those tournaments is simply calculated by multiplying those probabilities, which amounts to about 1/500,000.
Doesn't this assume that the probabilities are independent? Is that a good assumption for two highly correlated events like subsequent performances on a similar set?
Probably not a good assumption.

Though, in this case I would argue that 2010 ICT and 2011 ICT are two very different sets (perhaps more different than any other pair of consecutive ICTs). I generally tend to think of 2011 as being the first set that really caught up to modern standards of writing in NAQT, especially since they brought on Seth Teitler once he was done playing. If you have the opportunity, give those two sets a readthrough; the difference is pretty whack!
Ike
UIUC 13
1.82
Rikku
Posts: 357
Joined: Thu Feb 05, 2015 9:35 pm
Location: a vibrant metropolis, the equal of Paris or New York

Re: This Time, A Stern Warning

Post by 1.82 »

vinteuil wrote: Sat Apr 18, 2020 10:41 am
Ike wrote: Sat Apr 18, 2020 7:13 am The probability (again assuming that ICT 2011 is a regular difficulty tournament) of doing that well at both of those tournaments is simply calculated by multiplying those probabilities, which amounts to about 1/500,000.
Doesn't this assume that the probabilities are independent? Is that a good assumption for two highly correlated events like subsequent performances on a similar set?
As far as I can tell, going beyond the obvious issues with the minuscule sample sizes in question, Ike's method makes the pretty fundamental assumption that performances by players across multiple tournaments are both independent and identically distributed, when in fact there is no real reason to think that either is the case. No amount of raising (1-f) to the nth power will make up for that.
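To illustrate the "identically distributed" concern with made-up numbers, here is a rough Python sketch. It compares the tail probability under a single fixed per-game rate with the tail probability when the rate varies by tournament around the same average. All rates are hypothetical and are not taken from anyone's actual stats; the two-point mixture is just the simplest way to model a player for whom some sets are a good fit and others are not.

```python
from math import comb


def tail_probability(n_games, k_games, rate):
    """P(X >= k_games) for X ~ Binomial(n_games, rate)."""
    return sum(
        comb(n_games, k) * rate**k * (1 - rate) ** (n_games - k)
        for k in range(k_games, n_games + 1)
    )


N_GAMES, K_GAMES = 15, 14

# Model A (i.i.d.): every game at every tournament has the same rate.
# HYPOTHETICAL value, chosen only for illustration.
MEAN_RATE = 0.53
p_iid = tail_probability(N_GAMES, K_GAMES, MEAN_RATE)

# Model B: the rate depends on the tournament. Half the time the set
# suits the player (0.73), half the time it does not (0.33).
# HYPOTHETICAL values with the same average rate as Model A.
TOURNAMENT_RATES = [0.33, 0.73]
p_mixed = sum(
    tail_probability(N_GAMES, K_GAMES, r) for r in TOURNAMENT_RATES
) / len(TOURNAMENT_RATES)

print(f"fixed rate:   {p_iid:.5f}")
print(f"varying rate: {p_mixed:.5f}")
print(f"ratio:        {p_mixed / p_iid:.1f}x")
```

Under these assumed numbers the honest "very good day" is far more likely than the fixed-rate model suggests, which is the sense in which a p-value computed under the i.i.d. assumption can overstate how surprising a performance is.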

In the example that Ike presents of the 2010 and 2011 ICTs, he himself admits that Andy Watkins's known cheating would fail his own arbitrary threshold for requiring further investigation, which doesn't speak well for the ability of this advanced statistical method to actually identify cheaters. Of course, nobody had to use any statistical manipulation to observe that Andy Watkins was probably cheating; they observed his high point totals and his unbelievably low neg totals and concluded that something was wrong. Neither was any proof of his wrongdoing furnished by statistical methods, because proof came from the server logs that demonstrated his improper access to questions. Given that this kind of statistical manipulation was unnecessary to identify a potential cheater and insufficient to prove cheating, it would have been useless in the Andy Watkins case.

Personally, I don't play online tournaments because I don't enjoy the experience. However, if I had been interested in playing online tournaments, I certainly wouldn't be now! As a generally mediocre player who sometimes has a very good day when the conditions are right and the set is right, I am led to conclude from this discussion that a very good performance at an online tournament could lead to accusations, whether private or public, followed by the use of these statistical methods. Suffice it to say that because of my disbelief in the underlying assumptions, I absolutely do not believe any of these P-values that are being spit out. I have no reason to believe that a so-called "1/1000" event actually only occurs one in a thousand times, and we have no empirical way of saying that this is the case, just numbers massaged atop more numbers. False certainty is worse than no knowledge at all.
Naveed Chowdhury
Maryland '16
Georgia Tech '17
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

Eh, if anything the issue is with the fact there's just a lot of missing data, not the method itself. There's no data that survives for Minnesota Open 2008, 09, 10, or FICHTE I and II which would be incredibly helpful. I played Watkins at three of those tournaments, and I recall that at two of them, he didn't power a single tossup. I don't remember what happened at FICHTE II.

The only thing we can work on is regular difficulty data, which is markedly easier. If you can't understand that, or why it's worth highlighting as a "fun tidbit" that, based on old data, Watkins put up a performance he should only repeat at 1 in 941 tournaments if ICT were regular difficulty, then I think no amount of sense will convince you of anything.
Ike
UIUC 13
vinteuil
Auron
Posts: 1455
Joined: Sun Oct 23, 2011 12:31 pm

Re: This Time, A Stern Warning

Post by vinteuil »

Ike wrote: Sat Apr 18, 2020 3:35 pm Eh, if anything the issue is with the fact there's just a lot of missing data, not the method itself. There's no data that survives for Minnesota Open 2008, 09, 10, or FICHTE I and II which would be incredibly helpful. I played Watkins at three of those tournaments, and I recall that at two of them, he didn't power a single tossup. I don't remember what happened at FICHTE II.
This is interesting and useful evidence, but it's more along the lines of what Naveed pointed out ("eye test") than the quantitative methods you're otherwise advocating. (Although I do take your point: if we had the same data for Watkins as we do for Eric, any calculation we run on them would have substantially reduced uncertainty.)
JR
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

vinteuil wrote: Sat Apr 18, 2020 3:50 pm
Ike wrote: Sat Apr 18, 2020 3:35 pm Eh, if anything the issue is with the fact there's just a lot of missing data, not the method itself. There's no data that survives for Minnesota Open 2008, 09, 10, or FICHTE I and II which would be incredibly helpful. I played Watkins at three of those tournaments, and I recall that at two of them, he didn't power a single tossup. I don't remember what happened at FICHTE II.
This is interesting and useful evidence, but it's more along the lines of what Naveed pointed out ("eye test") than the quantitative methods you're otherwise advocating. (Although I do take your point: if we had the same data for Watkins as we do for Eric, any calculation we run on them would have substantially reduced uncertainty.)
Yeah, agreed with you. The reason I chose Watkins to do this historical exercise is to show that the method (likely) yields results on historical data, especially since NAQT explicitly announced that they could not find any statistical evidence of Watkins' performance being foul. I could have done this on Joshua Alman too, but I suspect that we all know how that's going to bear out.
Ike
UIUC 13
Stained Diviner
Auron
Posts: 4859
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland

Re: This Time, A Stern Warning

Post by Stained Diviner »

I'm not sure about that with Alman. He put up very good numbers at MIT Penn-ance, where he might have cheated as well for all we know. He played fewer tournaments, so it is difficult to compare between tournaments. His numbers before 2012 are much lower than his 2012 numbers, so on the one hand it is suspicious by any statistical method, but on the other hand there is the theoretical possibility of improvement as a player.
David Reinstein
PACE VP of Outreach, Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
Ike
Auron
Posts: 1050
Joined: Sat Jul 26, 2008 5:01 pm

Re: This Time, A Stern Warning

Post by Ike »

Stained Diviner wrote: Sat Apr 18, 2020 4:50 pm I'm not sure about that with Alman. He put up very good numbers at MIT Penn-ance, where he might have cheated as well for all we know. He played fewer tournaments, so it is difficult to compare between tournaments. His numbers before 2012 are much lower than his 2012 numbers, so on the one hand it is suspicious by any statistical method, but on the other hand there is the theoretical possibility of improvement as a player.
Actually it was established he cheated at that tournament by acquiring the question set beforehand as well.
Ike
UIUC 13
Auks Ran Ova
Forums Staff: Chief Administrator
Posts: 4143
Joined: Sun Apr 30, 2006 10:28 pm
Location: Minneapolis

Re: This Time, A Stern Warning

Post by Auks Ran Ova »

Ike wrote: Sat Apr 18, 2020 4:54 pm
Stained Diviner wrote: Sat Apr 18, 2020 4:50 pm I'm not sure about that with Alman. He put up very good numbers at MIT Penn-ance, where he might have cheated as well for all we know. He played fewer tournaments, so it is difficult to compare between tournaments. His numbers before 2012 are much lower than his 2012 numbers, so on the one hand it is suspicious by any statistical method, but on the other hand there is the theoretical possibility of improvement as a player.
Actually it was established he cheated at that tournament by acquiring the question set beforehand as well.
Yeah, it was a whole thing!
Rob Carson
University of Minnesota '11, MCTC '??
Member, ACF
Member, PACE
Writer and Editor, NAQT
Stained Diviner
Auron
Posts: 4859
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland

Re: This Time, A Stern Warning

Post by Stained Diviner »

In that case, I would say that the pattern Ike identified in Eric's stats also shows up in Josh's stats. He got powers in every round of SCT, every round but 2 of ICT, and every round but 1 of Penn-ance. That's, uh, very impressive for a guy who got a combined total of 7 powers in three NAQT tournaments before 2012, one of which was played on an A Set and the other two of which were DII.
David Reinstein
PACE VP of Outreach, Head Writer and Editor for Scobol Solo and Masonics (Illinois), TD for New Trier Scobol Solo and New Trier Varsity, Writer for NAQT (2011-2017), IHSSBCA Board Member, IHSSBCA Chair (2004-2014), PACE Member, PACE President (2016-2018), New Trier Coach (1994-2011)
Captain Sinico
Auron
Posts: 2867
Joined: Sun Sep 21, 2003 1:46 pm
Location: Champaign, Illinois

Re: This Time, A Stern Warning

Post by Captain Sinico »

There are two things I'd like to add here. The first is to say that Kurtis' post somewhat up-thread is something we all ought to read carefully. I share with Kurtis some experience of dealing with cheating in contexts outside quizbowl and come to highly similar conclusions.

One clarifying role of Kurtis' post is that it led me to think about this: there seems to be a near consensus that, for cheating purposes, online quizbowl and the in-person game are different -- most of us have trusted that most people in the in-person game were acting above-board, and by and large this was true and, I'd add, will be again, if we continue to do the right things. On the other hand, it's sensible to look at the norms and expectations in online quizbowl as though it's a separate game, but this only to a point -- it seems unreasonable, for instance, to silo off the online game such that it has no consequences in "the real world". That said, if we're going to treat online quizbowl as a separate activity where cheating is much more rampant, we can draw a clear baseline at consistently and strongly sending the signal that cheating is absolutely intolerable, as Kurtis said. There are many other things we might and should do in this regard -- obviously, that signal is a lot more effective and meaningful if we're able to catch or prevent cheating much better than we do now, for instance -- but you can't fix a rotten situation without at least putting that floor in.


The second thing is to note or remind us that there's a key distinction between evidence that merely raises or quantifies suspicion and evidence that, on its own, is damning. While the highest degree of skepticism of statistical analysis of the latter sort is very sensible and even necessary, we would be patently unwise to totally shoot down statistical methods in the former category.

In this regard, let's address a good point raised earlier. We have to accept that the distribution of powers in a round is neither perfectly independent nor identically distributed in general – there are many obvious things that impact this, question difficulty and opponent quality being the most obvious. That said, it's eminently reasonable to allow suspicions to be raised when streaks in that statistic occur that would be highly improbable if power numbers were IID and, in fact, the factors that move the distributions away from IID can allow us to say the situation is even worse, like when a player has few powers on an easy set against easy competition, but then a large number on a harder set against better competition. Explanations like "The set played to this player's strengths" or "The player improved a lot in the interval" or "The player just had a really good day" or "The field was very weak" or perhaps even "The environment helped the player disproportionately, relative to others" can be cogent and are often testable by statistic or eye. Placing the number in its context -- is it a floor on P? How robust are the assumptions? Etc. -- means the assumptions come out in the wash where it matters – when we debate what these numbers mean. However, we lose that ability to even consider these matters if we simply discard any statistical method that isn't perfect, even for exploratory purposes.
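To make the "highly improbable if power numbers were IID" idea concrete, here is a minimal sketch of that kind of calculation. It assumes, purely for illustration, that each round is an independent trial in which the player powers at least one tossup with a fixed probability; the function name and every number below are hypothetical and are not taken from anyone's actual stat sheet.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes in n IID trials, each with success probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs, for illustration only:
# a player who has historically powered at least once in 30% of rounds,
# and who then does so in 12 of 13 rounds at a single tournament.
p_round = 0.30
rounds_played = 13
rounds_with_power = 12

tail = prob_at_least(rounds_with_power, rounds_played, p_round)
print(f"P(>= {rounds_with_power} of {rounds_played} rounds with a power) = {tail:.2e}")
```

A small number here only raises suspicion in the sense described above. It is as good as the IID assumption and the historical rate fed into it, so the usual explanations (improvement, set fit, weak field) still have to be weighed against it.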

Finally, I'd add three further, related reasons that using statistics to at least inform accusation is advisable. First, accusations of cheating of various kinds and levels of formality are far more common than I suspect most of us realize, even in the in-person game. Second, as has been noted, it's generally accepted to at least suspect someone on the basis of an "eye test", which seems almost certain to be worse than all but the crudest statistical methods and which leaves us to strongly consider: "Whose eyes?"*. The third reason is that both of these things are exacerbated to an extreme degree in the online version of the game. The state of quizbowl is one seething with constant allegations (or worse) of cheating right now -- it seems clear that people are and will continue accusing many others of cheating on the basis of bad or no evidence. Realize that, even in the in-person game, there remains and maybe will always be+ a considerable proportion of people in and on the edges of our game who are happy to level accusations of impropriety for any strong performance, arbitrarily defined (stronger than them, stronger than they've seen before, etc.) It is imperative that we have at least something objective and better than "This person strikes someone as suspicious."

*Regarding this: I'm very skeptical of the "eye tests", even from pretty knowledgeable people. I've seen so many false positives, even from my own eyes, that they're barely worth mentioning. However, the "eye test" is also very prone to false negatives. Consider that a lot of people's eyes reported sharp and sudden improvement ex post facto with respect to Andy Watkins. I'll say what I was on the record for at the time: I never suspected a thing, even in the middle of him taking every science tossup off me in an ICT he most stole our chance at (the same one Bruce mentions, if memory serves). I'll flatter myself that I'm not exactly the most gullible person in quizbowl; maybe you disagree but, as someone who followed a similar track to the one Andy claimed to, "I got better by studying, writing, and editing science" and "NAQT went better for me" added up, and, despite being pretty plugged into everything in those days, I heard literally nobody say anything to the contrary until after the fact.

+It's always salutary, I think, to consider that our game is difficult for most people to understand, especially when it is played well. Otherwise knowledgeable people who know but little of our game routinely literally cannot believe how even fairly good high schoolers can do what they do. This is exacerbated by the seeming appearance of an "insiders' club" dynamic among those with a little knowledge about the game -- it strikes many as bad or even below-board (only slightly removed from viewing the questions themselves in advance) that we can improve so much by studying old packets, or understanding a writer's or editor's tendencies. Such people may take a similar view toward influencing the canon through our own writing and editing, or lodging protests, or the fact that today's editors are yesterday's players who could stitch things up for their old team, or any number of other things. Of course, those who know more understand that quizbowl is highly welcoming and meritocratic; that, with notable but rare exceptions, it's as above-board as any other such game or activity, if not more so. The point is, though, we have to do work to establish that idea even for fair-minded observers.
Mike Sorice
Coach, Centennial High School of Champaign, IL (2014-2020) & Team Illinois (2016-2018)
Alumnus, Illinois ABT (2000-2002; 2003-2009) & Fenwick Scholastic Bowl (1999-2000)
Member, ACF (Emeritus), IHSSBCA, & PACE
setht
Auron
Posts: 1192
Joined: Mon Oct 18, 2004 2:41 pm
Location: Columbus, Ohio

Re: This Time, A Stern Warning

Post by setht »

vinteuil wrote: Sat Apr 18, 2020 10:41 am
Ike wrote: Sat Apr 18, 2020 7:13 amThe probability (again assuming that ICT 2011 is a regular difficulty tournament,) of doing that well at both of those tournaments is simply calculated by multiplying those probabilities, which amounts to about 1/500,000.
Doesn't this assume that the probabilities are independent? Is that a good assumption for two highly correlated events like subsequent performances on a similar set?
I Am Not A Statistician, but this has been bugging me a bit: I think it actually is a pretty reasonable assumption that the probabilities are independent. Perhaps people are confusing 1) possible non-independence between random values drawn from some distribution with 2) the fact that random values drawn (independently!) from a reasonably narrow distribution tend to have similar values. (And so a small sample of random values can give a good estimate of the mean/peak of a narrow/strongly-peaked parent distribution.)

Putting this in a quiz bowl context, (I claim without presenting supporting data that) players tend to have similar PPGs at tournaments of similar difficulty*. If Player X has put up 35-50 PPG on the last three regular-difficulty sets they played, it is pretty likely that X will also put up about 35-50 PPG on the next regular-difficulty set they play**. This means that performances on similar sets are usually pretty similar (for aggregate statistics like PPG), but that's because there's (usually) a reasonably narrow/strongly-peaked parent distribution, not because of independence/non-independence of tournament performances. Put another way, I think a given player's PPG on similar sets tends to be highly clustered, but "highly clustered" is different from "correlated."

* in the absence of changes in important factors like teammate strength, opponent strength, set distributions, etc.
** again, assuming there aren't big changes in important factors.

I think tournament performance is statistically dependent on a number of factors—e.g. teammate strength, opponent strength, whether a player sleeps well the night before a tournament, and so on. I don't think "how Player X did at Tournament Y three months ago, or at Tournament Z two years ago" is likely to affect "how Player X will do at Tournament A tomorrow." (X's performances at Y and Z may be good predictors of X's performance at A, but they do not themselves affect that performance.)
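The "highly clustered" versus "correlated" distinction can be sketched with a toy simulation. The model below is an assumption made only for illustration: a hypothetical player's PPG at each tournament is drawn independently from one narrow normal distribution, with a made-up mean and spread. Independence is built in by construction, so this shows what independent-but-clustered results look like rather than demonstrating that real performances are independent.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical parent distribution for one player's PPG (made-up numbers).
MEAN_PPG, SD_PPG = 42.0, 4.0
N_PAIRS = 10_000

# Each pair is (PPG at one tournament, PPG at the next), drawn independently.
first = [rng.gauss(MEAN_PPG, SD_PPG) for _ in range(N_PAIRS)]
second = [rng.gauss(MEAN_PPG, SD_PPG) for _ in range(N_PAIRS)]

# Clustering: how often the two results land within 10 PPG of each other.
share_close = sum(abs(a - b) < 10 for a, b in zip(first, second)) / N_PAIRS

# Dependence: correlation between the first and second result across pairs.
r = pearson(first, second)

print(f"share of pairs within 10 PPG of each other: {share_close:.2f}")
print(f"correlation between consecutive results:    {r:.3f}")
```

In this toy model the two results are usually close to each other while their correlation stays near zero, which is the situation in which multiplying the two tail probabilities is legitimate. If hidden factors such as teammates or preparation shifted between tournaments, the parent distribution itself would move and the simple product would no longer apply.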
Seth Teitler
Formerly UC Berkeley and U. Chicago
President of NAQT
Emeritus member of ACF