College quizbowl statistics database
Posted: Wed Jan 26, 2022 1:47 am
Inspired by the data analysis from the bonuses thread, I've scraped and compiled stats from the last five competition years of college quizbowl (including open tournaments listed as college tournaments on the DB). My hope is that this dataset enables both more research into how quizbowl works and better record-keeping and preservation of quizbowl history.
The files are too large to attach, so I've linked them on GitHub: team stats and player stats.
What can you do with this dataset?
- Answer questions. Cody's analysis in the other thread is a good example of one type of question ("how often are games decided by bonus conversion?") that a dataset like this can answer. On a slightly different tack, I should have a post up soon about win probability using this dataset.
- Compile records for your school/friends/self. Or remember past tournaments.
Current limitations
- Tournament coverage. Right now I can only scrape stats from tournaments that are on the Tournament Database (which excludes ICTs and some SCTs). I also did not check for tournaments that were not listed as college tournaments; there are probably a few college tournaments from this time span that I've missed as a result.
- Some missing stats (order of finish for tournaments, tossups heard for SQBS tournaments). I'm currently only scraping the Scoreboard page; if there's demand for those stats, I would need to spend some time setting up the parser to extract them from other pages.
Future plans (in priority order)
- Expanding the time range of tournaments covered. Right now the workflow consists of me manually entering tournament info (year, set, site, stats link) in a Google Sheet. I don't anticipate that this task can be easily automated, so I would welcome help, and I'm willing to pay people for their time compiling tournament links. Contact me if you'd like to help!
- Doing team/player name rectification to allow for joining data across tournaments. This is also a very manual task that I would appreciate help with, although it will probably have to wait until after there are more years in the dataset.
- Creating a website to display historical stats, a la the Sports Reference websites or the ACF/NAQT statistics databases. This will definitely have to wait until after name rectification, but I think it is a worthy final goal for this project.
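For anyone curious what the name rectification step might look like, here is a minimal sketch in Python. It is not the actual pipeline: the alias table, the school names, and the `canonical_team` and `group_by_school` helpers are all hypothetical, and the rules (lowercase, drop punctuation, drop a trailing team letter, then look up known spelling variants) are just one plausible starting point before the manual cleanup.

```python
import re
from collections import defaultdict

# Hypothetical alias table: maps a cleaned-up spelling variant to one
# canonical school name. A real table would be compiled by hand.
ALIASES = {
    "example st": "example state",
}


def canonical_team(name: str) -> str:
    """Reduce a raw team name to a school-level key (illustrative rules only)."""
    key = name.strip().lower()
    key = re.sub(r"[^\w\s]", "", key)    # drop punctuation such as "." or "&"
    key = re.sub(r"\s+", " ", key)       # collapse repeated whitespace
    key = re.sub(r"\s[a-e]$", "", key)   # drop a trailing team letter such as " A"
    return ALIASES.get(key, key)


def group_by_school(rows):
    """Group (tournament, raw team name) pairs under their canonical key."""
    groups = defaultdict(list)
    for tournament, team in rows:
        groups[canonical_team(team)].append((tournament, team))
    return dict(groups)


# Made-up rows standing in for lines of the team stats file.
rows = [
    ("Tournament 1", "Example State A"),
    ("Tournament 2", "Example St. B"),
    ("Tournament 2", "Sample College"),
]
schools = group_by_school(rows)
```

Rules like these will misfire on some names (for example, a school whose real name ends in a single letter), so anything they produce would still need a manual pass.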