Posted: Wed Jan 26, 2022
ryanrosenberg
Inspired by the data analysis from the bonuses thread, I've scraped and compiled stats from the last five competition years of college quizbowl (including open tournaments listed as college tournaments on the DB). My hope is that this dataset enables more research into how quizbowl works, as well as better record-keeping and preservation of quizbowl history.

The files are too large to attach, so I've linked them on GitHub: team stats and player stats.

What can you do with this dataset?
  • Answer questions. Cody's analysis in the other thread is a good example of one type of question ("how often are games decided by bonus conversion?") that a dataset like this can answer. On a slightly different tack, I should have a post up soon about win probability using this dataset.
  • Compile records for your school/friends/self. Or remember past tournaments.
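As a rough illustration of the "compile records" use, here is a minimal Python sketch. It assumes a team stats file with `tournament`, `team`, `wins`, and `losses` columns; those column names and the sample rows are made up for the example, so check the actual files on GitHub and adjust accordingly.

```python
import csv
import io

# Made-up rows in a hypothetical layout; the real files' columns may differ.
SAMPLE_CSV = """tournament,team,wins,losses
Tournament X,Example State A,8,2
Tournament X,Example State B,3,7
Tournament Y,Example State A,5,5
Tournament Y,Other College,9,1
"""


def school_record(rows, school):
    """Sum wins and losses over every team whose name starts with `school`."""
    wins = losses = 0
    for row in rows:
        if row["team"].startswith(school):
            wins += int(row["wins"])
            losses += int(row["losses"])
    return wins, losses


# Parse the sample; for a real file, pass an open file handle to DictReader.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(school_record(rows, "Example State"))
```

Matching on a name prefix is only a stopgap: it lumps together A and B teams, but it will miss any tournament where the school was entered under a different spelling.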
What are the limitations of the dataset?
  • Tournament coverage. Right now I can only scrape stats from tournaments that are on the Tournament Database (which excludes ICTs and some SCTs). I also did not check for tournaments that were not listed as college tournaments; there are probably a few college tournaments from this time span that I've missed as a result.
  • Some missing stats (order of finish for tournaments, tossups heard for SQBS tournaments). I'm currently only scraping the Scoreboard page; if there's demand for that information, I would need to spend some time setting up the parser to extract it from other pages.
What's next?
(in priority order)
  • Expanding the time range of tournaments covered. The current workflow consists of me manually entering tournament info (year, set, site, stats link) in a Google Sheet. I don't expect this task to be easy to automate, so I would welcome help, and I am willing to pay people for their time compiling tournament links. Contact me if you'd like to help!
  • Doing team/player name rectification to allow for joining data across tournaments. This is also a very manual task that I would appreciate help with, although it will probably have to wait until after there are more years in the dataset.
  • Creating a website to display historical stats, a la the Sports Reference websites or the ACF/NAQT statistics databases. This will definitely have to wait until after name rectification, but I think it is a worthy final goal for this project.
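
For anyone curious what the name rectification step might involve, here is a minimal Python sketch that could pre-sort candidates before manual review. It is not the workflow actually used for this project: the `normalize` and `rectify` helpers, the 0.8 similarity cutoff, and the example names are all assumptions made for illustration.

```python
import difflib
import re


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def rectify(raw_names, canonical, cutoff=0.8):
    """Map each raw name to its closest canonical name.

    Names with no candidate at or above `cutoff` map to None,
    flagging them for a human to resolve.
    """
    lookup = {normalize(c): c for c in canonical}
    result = {}
    for raw in raw_names:
        match = difflib.get_close_matches(
            normalize(raw), list(lookup), n=1, cutoff=cutoff
        )
        result[raw] = lookup[match[0]] if match else None
    return result


# Hypothetical names, not taken from the dataset.
canonical = ["Example State", "Other College"]
raw = ["Example  State.", "Othr College", "Unrelated Univ"]
mapping = rectify(raw, canonical)
for name, resolved in mapping.items():
    print(repr(name), "->", resolved)
```

String similarity only catches spelling and punctuation variants. Renamed schools, nicknames, and players who share a name would still need someone who knows the history, which is why the task stays largely manual.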

Posted: Thu Jan 27, 2022
ryanrosenberg
Thanks to some assistance from Maxwell Ye and Harry White, and some suggestions from Dylan Bowman, I've now been able to scrape stats from the remaining college tournaments on the database. I'll work on cleaning them up and publishing them, hopefully over the weekend.

Posted: Thu Jan 27, 2022
Gene Harrogate
Thanks for this Ryan! I've had a lot of fun playing around with it and tarnishing the names of my enemies.

Posted: Sat Jan 29, 2022
ryanrosenberg
I've added the rest of the college tournaments on the database; there's now fairly complete coverage back to the 2011-12 season, and scattered tournaments before that. The updated files: team stats and player stats.

With the bulk of data scraping done, I'd like to begin the process of cleaning and resolving team and player names. This task requires a fair amount of manual work, but it is both easier and, IMO, more fun if you have a decent knowledge of quizbowl history. If you're interested in volunteering, let me know and I'll add you to the organization server.