Gender in QuizDB Sets

QuestionCactus
Joined: Sat Nov 17, 2018 9:55 pm

Post by QuestionCactus

I recently read the thread "Women in Sets," in which David Reinstein took two sets and counted the number of times an answerline contained a woman. I was curious whether a script could perform a similar enumeration over the entire QuizDB database, which would have the added benefit of showing how the distribution by gender varies by category and difficulty.

So to begin, I took all 91,070 tossups and divided them into male/female/both/neither. I considered a tossup "male" if it included any of {he, his} and "female" if it included any of {she, her, hers}, so the Male and Female totals below each include the "both" tossups. Results:
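The pronoun rule can be sketched in a few lines of Python. This is a minimal sketch, not the script actually used: it assumes each tossup is a plain-text string and that a pronoun only counts as a whole word (so "hero" or "shed" don't trigger anything). Its `male` and `female` labels are exclusive, matching the "Exclusively Male/Female" lines rather than the broader Male/Female totals.

```python
import re
from collections import Counter

# Pronoun sets from the rule described above.
MALE_PRONOUNS = {"he", "his"}
FEMALE_PRONOUNS = {"she", "her", "hers"}


def classify(tossup: str) -> str:
    """Label a tossup 'male', 'female', 'both', or 'neither' (exclusive labels)."""
    # Whole-word matching only, so "hero" or "shed" are not pronouns.
    words = set(re.findall(r"[a-z]+", tossup.lower()))
    has_male = bool(words & MALE_PRONOUNS)
    has_female = bool(words & FEMALE_PRONOUNS)
    if has_male and has_female:
        return "both"
    if has_male:
        return "male"
    if has_female:
        return "female"
    return "neither"


def tally(tossups):
    """Count how many tossups fall into each category."""
    return Counter(classify(t) for t in tossups)


sample = [
    "He wrote a short story cycle centered on George Willard.",
    "She won two Nobel Prizes.",
    "In addition to Moses, he carved Mary cradling Jesus in her lap.",
    "This element has atomic number 80.",
]
print(tally(sample))
```

As the tossups quoted in this post show, a rule this simple will mislabel plenty of questions; it only looks for pronouns, not at who the answerline is.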


Total: 91070 tossups

Male: 45558 (50.025255298122325%)
Female: 13119 (14.405402437685296%)
Both: 8929 (9.804545953662018%)
Neither: 41322 (45.3738882178544%)

Exclusively Male: 36629 (40.2207093444603%)
Exclusively Female: 4190 (4.600856484023279%)
This methodology is obviously imperfect and certainly produces both false positives and false negatives for both genders. Furthermore, unlike David's methodology, which counts tossups as male or female based on their answerline, this considers gendered pronouns that occur anywhere in the tossup. For example, here are some "both" tossups:
This author wrote about Mary Cochran, whose father dies of a heart attack before he can tell her he loves her, in the short story "Unlighted Lamps." He also wrote a short story in which the narrator's father claims that, unlike Christopher Columbus, he can stand an egg on its end without cheating. In addition to "The Triumph of the Egg," he wrote a short story in which George has sex with Louise Trunnion in a berry field. That story, "Nobody Knows," appears in the same volume as a story about Adolph Myers, who changes his name to Wing Biddlebaum after being accused of molesting his students. That volume, his most famous work, is a short story cycle centered on George Willard. For 10 points, name this author of Winesburg, Ohio.
ANSWER: Sherwood Anderson

One of this author's short stories ends with the narrator saying of the title character, "A couple of years later I went away to college and I didn't know where the (swear word) she went." That story, "Nilda," is one of several stories by this author narrated by a fatherless New Jersey teenager, several of which were collected in Drown. One section of his most famous work is named after the three heartbreaks of the protagonist's mother, Belicia Cabral. In that novel, the protagonist's family suffers from the fuku curse and grapples with its experiences under Rafael Trujillo's regime. For 10 points, name this author who is best known for about an overweight, fantasy-obsessed Dominican-American, The Brief Wondrous Life of Oscar Wao.
ANSWER: Junot Diaz

At one point in this novel, Portia tells her father about a scam artist named B.F. Mason. One character in this novel starts using his wife Alice's perfume after she dies, and also takes up sewing. Another character is forced to take a job at a five-and-dime store to pay the hospital bills after her brother Bubber shoots another character. Jake Blount, Biff Brannon, Dr. Copeland, and Mick Kelly all confide in the protagonist, who commits suicide after Spiros Antonapoulos dies in an insane asylum. That protagonist is the deaf-mute John Singer. For 10 points, name this novel by Carson McCullers.
ANSWER: The Heart Is a Lonely Hunter

This artist designed a series of paired Corinthian columns for the dome of St. Peter's Basilica and a never-built pyramid for the tomb of Pope Julius II. That tomb contains his sculpture of a horned biblical figure with a large beard. In addition to Moses, he also carved a sculpture of Mary cradling Jesus in her lap, as well as a huge sculpture of a biblical figure holding a slingshot. For 10 points, name this sculptor of the Vatican Pieta and the huge marble David.
ANSWER: Michelangelo di Lodovico Buonarroti Simoni

This author's translations of Wang Wei, Du Fu, and Li Po were collected in the book Three Chinese Poets. His own poetic output includes a poem which begins, "Some men like Jack and some like Jill, I'm glad I like them both." That poem, "Dubious," appeared in his first poetry collection, Mappings. A translation by Charles Johnston inspired his novel in verse set in Francisco. He also wrote an epically long novel set in Nehru's India which follows Rupa Mehra's search for a husband for her daughter Lata. For 10 points, name this author of The Golden Gate and A Suitable Boy.
ANSWER: Vikram Seth
Four of these tossups have answerlines that are male but are marked "both" because they contain female pronouns. Further, it's possible (though, from my eyeballing, much less frequent) for a tossup marked as exclusively one gender to actually have an answerline of the other gender. Here's a random sampling of answerlines, the first fifty from male tossups and the last fifty from female tossups:
ANSWER: Daniel Webster
ANSWER: Archimedes of Syracuse
ANSWER: Thomas Woodrow Wilson
ANSWER: Oscar Fingal O'Flahertie Wills Wilde
ANSWER: A Modest Proposal For Preventing The Children of Poor People in Ireland From Being
ANSWER: Edwin Arlington Robinson
ANSWER: Odin
ANSWER: August Wilson
ANSWER: Henry Ford
ANSWER: Huckleberry Finn [or Huckleberry Finn]
ANSWER: McCarthy
ANSWER: Native Son
ANSWER: Langston Hughes
ANSWER: Jerry Seinfeld [or Jerry Seinfeld]
ANSWER: Athens
ANSWER: Stanley Milgram
ANSWER: irony
ANSWER: Republic of India [or Bharatiya Ganarajya]
ANSWER: van der Waals equation
ANSWER: "The Rime of the Ancient Mariner"
ANSWER: Battle of Antietam [or Battle of Sharpsburg]
ANSWER: Aldous Leonard Huxley
ANSWER: Hans Holbein the Younger [prompt on Holbein]
ANSWER: Aleksandr Isayevich Solzhenitsyn
ANSWER: Moby-Dick
ANSWER: Andrew Jackson
ANSWER: hero
ANSWER: the Rolling Stones
ANSWER: "The Fall of the House of Usher"
ANSWER: Duke of Wellington [or Arthur Wellesley, Duke of Wellington]
ANSWER: Aristotle
ANSWER: Samuel Johnson
ANSWER: Modern Family
ANSWER: Richard Milhous Nixon
ANSWER: Leon Trotsky [or Lev Davidovich Bronstein]
ANSWER: "The Outcasts of Poker Flat"
ANSWER: Soren Aabye Kierkegaard
ANSWER: Thomas Stearns Eliot
ANSWER: The Strange Case of Dr. Jekyll and Mr. Hyde
ANSWER: William Hogarth
ANSWER: Joe Biden
ANSWER: Francis Scott Key Fitzgerald
ANSWER: Arthur Asher Miller
ANSWER: George Herbert Walker Bush [prompt on Bush; accept any answer that correctly distinguishes
ANSWER: Prince William Arthur Philip Louis of Wales
ANSWER: James Abram Garfield
ANSWER: Benito Mussolini
ANSWER: Hugo Rafael Chavez Frias
ANSWER: Gene Simmons
ANSWER: William Butler Yeats
ANSWER: Willa Cather
ANSWER: Lady Windermere's Fan, A Play About a Good Woman
ANSWER: William Faulkner [or William Cuthbert Faulkner]
ANSWER: Samaritans [or Good Samaritans; or Kuthim until it is read]
ANSWER: Volpone
ANSWER: I Am Legend
ANSWER: USSR [or Soviet Union; or Union of Soviet Socialist Republics; or Soyuz Sovetskikh Sotsialisticheskikh Respublik; prompt on Russia]
ANSWER: The Prime of Miss Jean Brodie
ANSWER: Madame du Pompadour [or Jeanne Antoinette Poisson, Marquise du Pompadour]
ANSWER: Alain LeRoy Locke
ANSWER: Norma
ANSWER: Toni Morrison [or Chloe Ardelia Wofford]
ANSWER: Temple Drake (either name)
ANSWER: Nike [or Victory; or Athena Nike; prompt on "Athena"]
ANSWER: Sampo [or Sammas]
ANSWER: Willy's funeral [or Loman's funeral or the funeral in Death of a Salesman; accept "Requiem" before mentioned; prompt on funeral or equivalents throughout]
ANSWER: Sylvia Plath
ANSWER: being turned into stone or petrification [or clear-knowledge equivalents]
ANSWER: birds [Accept swans until "speech of these" and accept eagles after "Prometheus."]
ANSWER: Thais
ANSWER: Pre-Raphaelite Brotherhood [accept Pre-Raphaelites or PRB]
ANSWER: Aphrodite [accept Venus until "Dionysus"; do not accept or prompt thereafter]
ANSWER: horses [or stallions; or mares]
ANSWER: spears [or lances; or javelins; or polearms; accept naginata due to ambiguities with the Japan clue]
ANSWER: Queen Elizabeth I [prompt on “(Queen) Elizabeth”]
ANSWER: Ostrogoths [prompt on "Goths" until the last word]
ANSWER: spinning wheels [prompt on partial answer; or rouets; or spinnrade]
ANSWER: The Women of Algiers (in their apartment) [or Femmes d'Alger dans leus appartement]
ANSWER: Ulysses
ANSWER: Isabella I [or Isabella of Castile before mention]
ANSWER: Our Town
ANSWER: Piso's Conspiracy (Pisonian Conspiracy)
ANSWER: dragons [or serpents; prompt on snakes]
ANSWER: Emily [Elizabeth] Dickinson
ANSWER: Resurrection [or Voskreseniye]
ANSWER: Duke of Bilgewater and Dauphin of France [be generous, and accept King in place of "Dauphin"; accept in either order; prompt on descriptions like "conmen/swindlers from The Adventures of Huckleberry Finn"]
ANSWER: Corfu [or Corcyra or Kerkyra before "Corcyra"]
ANSWER: Seville [or Sevilla]
ANSWER: Adrienne Rich
ANSWER: Catherine of Aragon
ANSWER: Ellen Page
ANSWER: Roxane
ANSWER: Eleanor [accept any specific Eleanors, of course]
ANSWER: Calais
ANSWER: Minoan civilization
ANSWER: the Homeric Hymns
ANSWER: Wives of Mao Zedong [accept obvious equivalents]
ANSWER: Japan
ANSWER: Burger's Daughter
ANSWER: Judith
Of these, only Alain Locke and Ulysses look "mis-gendered"; here are their tossups:
This man described becoming enraptured by a "lovesick swan that hath her pool forsaken" whose cheeks were "aflame / with the conquering glow of lofty shame / that hopeless love and longing brings" in the poem "The Moon Maiden." This person's travelogues include "Oxford Contrasts" and "Impressions of Haifa," the latter of which was, like the essay "The Gospel for the Twentieth Century," inspired by a conversion to Baha'i. This writer commissioned Winold Reiss to create the cover and illustrations for a work that grew out of a guest-editing job on the March 1925 issue of The Survey Graphic. This person wrote essays about "The Legacy of Ancestral Arts," how the "Youth Speaks," and spirituals in that work, whose second part, named for its title figure "in a New World," contains Walter (*) White's "The Paradox of Color" and an essay by Melville Herskovits. For 10 points, name this "Dean of the Harlem Renaissance," the editor of The New Negro.
ANSWER: Alain LeRoy Locke

One character in this novel praises amor matris between husband and wife but solemnly holds that "People do not know how dangerous lovesongs can be." A chapter in this novel progresses from a Latinate to an Anglo-Saxon to a Medieval to a Romantic and finally modern prose style, mirroring the nine-month gestation of human birth, in describing a dinner of sardines and beer at the Holles Street maternity and the Gold Cup races. This novel, which uses the word "Chrysostomos" to describe the gold points in the perfect white teeth of one character, contains a (*) telegram reading "Nother dying come home father." In an infamous scene in this novel, Gerty McDowell gradually reveals more of her legs while another character masturbates. It is capped off with the once-longest sentence of the English language, part of a soliloquy ending "yes I said yes I will yes." For 10 points, name this novel about one day in the life of Leopold Bloom, by James Joyce.
ANSWER: Ulysses
Despite these shortcomings, we'll continue taking a look at the data, but keep in mind when drawing conclusions that my references to "male tossups" and "female tossups" are really based on a selection algorithm of limited accuracy and sophistication.

Image

We see that a plurality of tossups are "neither" (and we'll see later that a plurality of those "neither" tossups are science tossups). Just under 3.5 times as many tossups are male-or-both as are female-or-both (45,558 vs. 13,119), and 8.7 times as many are exclusively male as are exclusively female (36,629 vs. 4,190). The difference is not as extreme as David's 78/4 = 19.5 for Set A but is close to his 70/10 = 7 for Set B.
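As a quick check of these figures, the exclusive counts follow from the inclusive ones by subtracting the "both" tossups, and the four exclusive categories add back up to the total. A small Python sketch of that arithmetic, using only the counts posted in the results above:

```python
# Counts from the results above; "male" and "female" include the "both" tossups.
total, male_any, female_any, both, neither = 91070, 45558, 13119, 8929, 41322

male_only = male_any - both      # exclusively male
female_only = female_any - both  # exclusively female
assert male_only + female_only + both + neither == total

print(male_only, female_only)             # 36629 4190
print(round(male_any / female_any, 2))    # 3.47 (male-or-both vs. female-or-both)
print(round(male_only / female_only, 2))  # 8.74 (exclusively male vs. exclusively female)
print(78 / 4, 70 / 10)                    # 19.5 7.0 (David's Set A and Set B)
```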

We can divide these into their component categories:

Image
Image
Image
Image
Image

Science, predictably, is disproportionately found in the "neither" category. As a result, every other category except religion and geography makes up a larger share of male tossups than of tossups overall. The same isn't true of female tossups: they are virtually never science tossups, and are instead literature tossups at over twice the overall rate and history tossups at close to half the overall rate.

We can look at it from the other direction, noting the gender distribution in each of the big three categories:

Image
Image
Image

The most obvious observation is that science tossups are "neither" 92% of the time and are otherwise male, with almost no female presence. In fact, there are only 35 female science tossups. Their answerlines:
ANSWER: Bedbugs
ANSWER: Anaconda [accept: Eunectes; accept "Water Boa" until Boa is mentioned]
ANSWER: Magellanic clouds [prompt on "(dwarf) galaxies"; accept Large and Small Magellanic clouds or Magellanic Bridge/Stream]
ANSWER: Jane Goodall
ANSWER: catfish [prompt on "fish"]
ANSWER: Mercury  (Note: the Candian article is actually important, since my professor who used to work at BASF talked numerous times about how their process was affected)
ANSWER: Zika virus
ANSWER: chimpanzees [or Pan troglodytes; prompt on great apes before it is read; prompt on primates before "primatologist" is read]
ANSWER: Barbara McClintock
ANSWER: mercury
ANSWER: Curie [accept Frederic-Joliot Curie in the first sentence]
ANSWER: mercury [accept Hg before mentioned]
ANSWER: tobacco mosaic virus [accept TMV]
ANSWER: Eastman Kodak Company
ANSWER: matches [or matchsticks]
ANSWER: Lynn Margulis
ANSWER: fertility rates [prompt on birth rate or a description until mentioned; prompt on fertility] <LT>
ANSWER: dark matter <AG>
ANSWER: pregnancy [accept word forms]
ANSWER: Barbara McClintock
ANSWER: Cassiopeiae (prompt on "The Seated Queen")
ANSWER: antibodies [or antibody; or immunoglobulins; or Ig; prompt on "Ab"]
ANSWER: neutron stars (accept pulsars)
ANSWER: Y chromosome (prompt on just chromosome) <Park>
ANSWER: bugs [accept Heisenbugs] <Jose>
ANSWER: angular momentum
ANSWER: mercury (or Hg before end) <Hao> BONUSES
ANSWER: Herschel [accept Caroline Herschel or William Herschel] <Jose, Science - Astronomy>
ANSWER: Marie Skłodowska-Curie (prompt on Curie; prompt on Madame Curie)
ANSWER: Jane Goodall
ANSWER: Jane Morris Goodall
ANSWER: Grace Brewster Murray Hopper
ANSWER: chimpanzees [or Pan troglodytes; prompt on great apes before it is read; prompt on primates before "primatologist" is read]
ANSWER: rabies virus
ANSWER: compilers
If I list the actual women (rather than incidental pronouns like the one in "these insects reproduce when the penis pierces the skin of the female and deposits sperm directly into her body"), I see Henrietta Leavitt, Jane Goodall, some sort of legendary Japanese catfish-goddess, Mary, Barbara McClintock, Karen Wetterhahn, Marie Curie, Karen Wetterhahn again, Rosalind Franklin, "the Kodak Girl," the Little Match Girl (from the Hans Christian Andersen story), Lynn Margulis, Lisa Randall, Barbara McClintock again, Cassiopeia (as a giveaway for the constellation), Jocelyn Bell, Grace Hopper, Karen Wetterhahn a third time, Caroline Herschel, Marie Curie again, Jennifer Williams and Anne Pusey, Jane Goodall again, Jane Goodall yet again, Grace Hopper again, Beatrix Gardner, Jane Goodall again, and Jeanna Giese.

Also, if you're curious, these are the answerlines of the nine "both" science tossups:
ANSWER: debugging [or bug finding]
ANSWER: Franz Anton Mesmer
ANSWER: Q-tip [accept cotton swab before the end; accept Baby Gays before mentioned]
ANSWER: Curie [accept Pierre Curie; accept Marie Skłodowska Curie; accept (Paul-) Jacques Curie]
ANSWER: anglerfish
ANSWER: Julius Robert Oppenheimer
ANSWER: Louis Agassiz
ANSWER: IVF [or in vitro fertilization]
ANSWER: global warming [accept any answer indicating that temperature increased like "climate change"]
We further see that literature and history are each about half male, but that there's a striking difference in the female percentage. A literature tossup is 6.6 times as likely as a history tossup to contain a female pronoun, and 3.8 times as likely to contain female pronouns but not male pronouns.

We can check whether this gender distribution stays consistent across difficulties:

Image

We see that the female tossup percentage is very consistent at roughly 5%, whereas "both" and "male" trend upward and "neither" trends downward.

We can also sort all of the tournaments in the database based on their female/male tossup ratio:


Tournaments with the Highest Female/Male Ratio:
[('2014 Geography Monstrosity', 0.5, '2 female', '4 male'),
 ('2012 YMIR', 0.3466666666666667, '26 female', '75 male'),
 ('2005 Teitler Myth Singles', 0.3262411347517731, '46 female', '141 male'),
 ('2016 A Bit of Lit', 0.3125, '5 female', '16 male'),
 ('2017 Scattergories', 0.29347826086956524, '27 female', '92 male'),
 ('2017 MASSOLIT', 0.2857142857142857, '24 female', '84 male'),
 ('2017 Penn Bowl', 0.27380952380952384, '23 female', '84 male'),
 ('2010 MELD', 0.25287356321839083, '22 female', '87 male'),
 ('2010 Wild Kingdom', 0.25, '3 female', '12 male'),
 ('2012 RAVE', 0.25, '9 female', '36 male'),
 ('2014 SCOP Novice', 0.2459016393442623, '15 female', '61 male'),
 ('2017 Prison Bowl X', 0.24175824175824176, '22 female', '91 male'),
 ('GRAB BAG', 0.2413793103448276, '14 female', '58 male'),
 ('2008 Chicago Open Literature', 0.24031007751937986, '31 female', '129 male'),
 ('2016 Delta Burke', 0.23809523809523808, '5 female', '21 male'),
 ('2014 Mavis Gallant Memorial', 0.23529411764705882, '8 female', '34 male'),
 ('2010 Collaborative MS Tournament', 0.23404255319148937, '11 female', '47 male'),
 ('2015 Penn Bowl', 0.22580645161290322, '21 female', '93 male'),
 ('GRAPHIC', 0.22058823529411764, '15 female', '68 male'),
 ('2017 RMBCT', 0.21978021978021978, '20 female', '91 male'),
 ('2009 Minnesota Open Lit', 0.21518987341772153, '34 female', '158 male'),
 ('2014 New Trier Scobol Solo', 0.21428571428571427, '3 female', '14 male'),
 ('2017 Jordaens Visual Arts', 0.21348314606741572, '19 female', '89 male'),
 ('2014 Gorilla Lit', 0.2127659574468085, '20 female', '94 male'),
 ('2015 RILKE', 0.2125, '17 female', '80 male'),
 ('2012 SCOP Novice', 0.21153846153846154, '11 female', '52 male'),
 ('2012 KABO', 0.20987654320987653, '17 female', '81 male'),
 ("2017 It's Lit", 0.20689655172413793, '18 female', '87 male'),
 ('2016 SCOP MS 6', 0.20408163265306123, '10 female', '49 male'),
 ('2012 WELD', 0.2032520325203252, '25 female', '123 male'),
 ('2018 PACE NSC', 0.20108695652173914, '37 female', '184 male'),
 ('2015 Chicago Open Visual Arts', 0.2, '14 female', '70 male'),
 ('2015 BHSAT', 0.2, '17 female', '85 male'),
 ('2017 XENOPHON', 0.2, '3 female', '15 male'),
 ('2018 Chicago Open', 0.2, '44 female', '220 male'),
 ('2008 MUT', 0.1981981981981982, '22 female', '111 male'),
 ('2013 Collaborative MS Tournament', 0.19672131147540983, '12 female', '61 male'),
 ('2016 MLK', 0.19327731092436976, '23 female', '119 male'),
 ('2010 BELFAST Arts', 0.1896551724137931, '11 female', '58 male'),
 ('2016 Penn Bowl', 0.18867924528301888, '20 female', '106 male'),
 ('2016 Listory', 0.1875, '21 female', '112 male'),
 ('2011 SCOP Novice', 0.1875, '12 female', '64 male'),
 ('2011 HSAPQ VHSL States', 0.1797752808988764, '16 female', '89 male'),
 ('2011 HSAPQ VHSL Districts', 0.1780821917808219, '13 female', '73 male'),
 ('2014 ICCS', 0.17777777777777778, '16 female', '90 male'),
 ('2017 EMT', 0.175, '21 female', '120 male'),
 ('2014 DEES', 0.1746031746031746, '22 female', '126 male'),
 ("2013 Schindler's Lit", 0.17391304347826086, '12 female', '69 male'),
 ('2018 SMT', 0.17094017094017094, '20 female', '117 male'),
 ('2017 BHSAT', 0.1686746987951807, '14 female', '83 male')]


Tournaments with the Lowest Female/Male Ratio:
[('Lederberg Memorial Science Tournament 2: Daughter Cell', 0.0, '0 female', '15 male'),
 ('2013 Scobol Solo', 0.0, '0 female', '1 male'),
 ('2014 ACF Fall', 0.0, '0 female', '12 male'),
 ('2011 Geography Monstrosity', 0.0, '0 female', '11 male'),
 ('2012 Geography Monstrosity', 0.0, '0 female', '5 male'),
 ('2016 Geography Monstrosity', 0.0, '0 female', '7 male'),
 ('2013 Geography Monstrosity', 0.0, '0 female', '1 male'),
 ('2015 Geography Monstrosity', 0.0, '0 female', '3 male'),
 ('2017 Geography Monstrosity', 0.0, '0 female', '5 male'),
 ('2017 Math Monstrosity', 0.0, '0 female', '30 male'),
 ('2015 Claude Shannon Memorial Tournament', 0.0, '0 female', '2 male'),
 ('2011 Illinois Wissenschaftslehre', 0.0, '0 female', '29 male'),
 ('2017 JAKOB', 0.014705882352941176, '1 female', '68 male'),
 ('2008 NTV', 0.017857142857142856, '1 female', '56 male'),
 ('2013 Delta Burke', 0.018691588785046728, '2 female', '107 male'),
 ('2013 Cheyne American History', 0.02564102564102564, '1 female', '39 male'),
 ('2009 VCU Open', 0.0297029702970297, '3 female', '101 male'),
 ('2012 Peaceful Resolution', 0.030120481927710843, '5 female', '166 male'),
 ('2016 A Culture of Improvement', 0.03225806451612903, '3 female', '93 male'),
 ('2013 Arrabal', 0.033707865168539325, '3 female', '89 male'),
 ('2011 Chicago Open History', 0.0425531914893617, '4 female', '94 male'),
 ('2008 RMP Fest', 0.042735042735042736, '5 female', '117 male'),
 ('2014 Cheyne American History People', 0.044444444444444446, '2 female', '45 male'),
 ('2013 ACF Regionals', 0.0457516339869281, '7 female', '153 male'),
 ('2011 Cheyne American History', 0.046875, '3 female', '64 male'),
 ('2012 Maggie Walker GSAC', 0.04861111111111111, '7 female', '144 male'),
 ('2015 We Have Never Been Modern', 0.05, '4 female', '80 male'),
 ('2012 Penn Bowl', 0.05, '6 female', '120 male'),
 ('2009 Prison Bowl', 0.05217391304347826, '6 female', '115 male'),
 ('2009 U of Georgia CCC', 0.05263157894736842, '1 female', '19 male'),
 ('2016 BHSAT', 0.05263157894736842, '1 female', '19 male'),
 ('2017 lIST VI', 0.05309734513274336, '6 female', '113 male'),
 ('2011 Maggie Walker GSAC', 0.05309734513274336, '6 female', '113 male'),
 ('2014 Cane Ridge Revival', 0.05343511450381679, '7 female', '131 male'),
 ('2017 Letras', 0.05555555555555555, '1 female', '18 male'),
 ('2010 VCU Open (Sunday)', 0.055944055944055944, '8 female', '143 male'),
 ('2011 HSAPQ Tournament 16', 0.056451612903225805, '7 female', '124 male'),
 ('2011 BHSAT', 0.05714285714285714, '6 female', '105 male'),
 ('2013 Chicago Open', 0.0582010582010582, '11 female', '189 male'),
 ('2008 ACF Fall', 0.05952380952380952, '10 female', '168 male'),
 ('2010 Harvard International', 0.05982905982905983, '7 female', '117 male'),
 ('2013 Brookwood Invitational Scholars Bowl', 0.06086956521739131, '7 female', '115 male'),
 ('2014 ACF Nationals', 0.0611353711790393, '14 female', '229 male'),
 ('2013 Prison Bowl', 0.06140350877192982, '7 female', '114 male'),
 ('2010 ANGST', 0.061946902654867256, '7 female', '113 male'),
 ('2014 SUBMIT', 0.06206896551724138, '9 female', '145 male'),
 ('2010 Minnesota Open', 0.062111801242236024, '10 female', '161 male'),
 ('2010 Princeton Buzzerfest', 0.06299212598425197, '8 female', '127 male'),
 ('2011 PACE NSC', 0.06324110671936758, '16 female', '253 male'),
 ('2012 Illinois Fall Tournament', 0.06338028169014084, '9 female', '142 male')]
Each row gives the tournament name, its female/male ratio, and then the number of female and male tossups.
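A ranking in this format can be produced by computing female/male per tournament and sorting. The sketch below is a guess at the procedure, not the original script: the `counts` dictionary (tournament name mapped to a hypothetical `(female, male)` pair) is an assumed input format, filled here with a few rows copied from the lists above. Tournaments with zero male tossups are skipped to avoid dividing by zero, which is also an assumption since no such case appears in the lists.

```python
def ranked_by_ratio(counts, highest_first=True):
    """Return (name, ratio, 'N female', 'M male') tuples sorted by female/male."""
    rows = [
        (name, female / male, f"{female} female", f"{male} male")
        for name, (female, male) in counts.items()
        if male > 0  # no finite ratio without any male tossups
    ]
    return sorted(rows, key=lambda row: row[1], reverse=highest_first)


# Hypothetical input: a handful of (female, male) counts from the lists above.
counts = {
    "2014 Geography Monstrosity": (2, 4),
    "2012 YMIR": (26, 75),
    "2005 Teitler Myth Singles": (46, 141),
    "2012 Peaceful Resolution": (5, 166),
    "2017 Math Monstrosity": (0, 30),
}

for row in ranked_by_ratio(counts):
    print(row)
```

Sorting on the raw ratio puts a six-tossup tournament at the top, which is the small-sample problem discussed next.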

Observe that some tournaments, for whatever reason, contain extremely few tossups, and that this decreases the accuracy of the rankings; it would be akin to declaring a baseball player who went up to bat once and got a hit to be the greatest batter in the game because he's batting 1.000. Given that the overall female/male ratio is 0.114390237243714 and the average number of male tossups + female tossups per tournament is 125.59692307692308, I can pull each tournament's ratio toward the overall ratio, more strongly for tournaments with fewer tossups, by changing the formula from female/male to (female + 12.892311)/(male + 112.704612077). The two constants split the average tossup count in the overall proportion: they sum to about 125.597 and their ratio is about 0.1144, so a tournament with no tossups at all would simply get the overall ratio. There's possibly a more sensible way to do this, and I'm vaguely aware of something called a beta distribution that might be relevant, but for now let's look at this:
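This adjustment is a form of additive smoothing: every tournament gets the same number of pseudo-tossups, split in the overall female:male proportion. The sketch below assumes the two constants were derived exactly that way from the overall totals (4,190 exclusively female and 36,629 exclusively male tossups) and the average per-tournament count; under that assumption it reproduces the posted constants to about six significant figures. In beta-distribution terms, `alpha` and `beta` would be the parameters of a Beta(alpha, beta) prior on a tournament's female share, and the adjusted ratio is the ratio of the updated parameters; that reading is my interpretation, not something stated in the post.

```python
def smoothing_constants(total_female, total_male, mean_count):
    """Split mean_count pseudo-tossups in the overall female:male proportion."""
    female_share = total_female / (total_female + total_male)
    alpha = mean_count * female_share       # pseudo female tossups
    beta = mean_count * (1 - female_share)  # pseudo male tossups
    return alpha, beta


def adjusted_ratio(female, male, alpha, beta):
    """Female/male ratio shrunk toward alpha/beta; small samples move the most."""
    return (female + alpha) / (male + beta)


alpha, beta = smoothing_constants(4190, 36629, 125.59692307692308)
print(round(alpha, 3), round(beta, 3))                 # 12.892 112.705
print(round(adjusted_ratio(46, 141, alpha, beta), 4))  # 0.2321 (2005 Teitler Myth Singles)
print(round(adjusted_ratio(2, 4, alpha, beta), 4))     # 0.1276 (raw ratio 0.5)
```

The tiny 2014 Geography Monstrosity drops from a raw 0.5 to about 0.128, while a large tournament like Teitler keeps most of its lead.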


Tournaments with the Highest Adjusted Female/Male Ratio:
[('2005 Teitler Myth Singles', 0.23212944580655095, '46 female', '141 male'),
 ('2012 YMIR', 0.2071995491727483, '26 female', '75 male'),
 ('2017 Scattergories', 0.19487744118336933, '27 female', '92 male'),
 ('2017 MASSOLIT', 0.18755183526433286, '24 female', '84 male'),
 ('2017 Penn Bowl', 0.1824680703772719, '23 female', '84 male'),
 ('2008 Chicago Open Literature', 0.1815948426586796, '31 female', '129 male'),
 ('2010 MELD', 0.1747196053065945, '22 female', '87 male'),
 ('2009 Minnesota Open Lit', 0.17322316986110978, '34 female', '158 male'),
 ('2017 Prison Bowl X', 0.17128876290150352, '22 female', '91 male'),
 ('2018 Chicago Open', 0.17099946599728244, '44 female', '220 male'),
 ('2018 PACE NSC', 0.168154821223514, '37 female', '184 male'),
 ('2015 Penn Bowl', 0.1647620374564734, '21 female', '93 male'),
 ('2017 RMBCT', 0.1614706248652179, '20 female', '91 male'),
 ('2012 WELD', 0.16076185640195, '25 female', '123 male'),
 ('2014 SCOP Novice', 0.16057323214674268, '15 female', '61 male'),
 ('2014 Gorilla Lit', 0.15912712672200663, '20 female', '94 male'),
 ('2017 Jordaens Visual Arts', 0.1581139403387823, '19 female', '89 male'),
 ('GRAB BAG', 0.1575371085338318, '14 female', '58 male'),
 ('2008 MUT', 0.1559749290640013, '22 female', '111 male'),
 ('2015 RILKE', 0.155119852492455, '17 female', '80 male'),
 ('2016 MLK', 0.15490546639646638, '23 female', '119 male'),
 ("2017 It's Lit", 0.15469002282275218, '18 female', '87 male'),
 ('GRAPHIC', 0.1543530664735597, '15 female', '68 male'),
 ('2012 KABO', 0.15431904630188897, '17 female', '81 male'),
 ('2015 BHSAT', 0.15119683190980818, '17 female', '85 male'),
 ('2016 Listory', 0.15083050893671043, '21 female', '112 male'),
 ('2016 Penn Bowl', 0.1503960556095612, '20 female', '106 male'),
 ('2010 Collaborative MS Tournament', 0.14960313724991586, '11 female', '47 male'),
 ('2012 RAVE', 0.14722012111274702, '9 female', '36 male'),
 ('2015 Chicago Open Visual Arts', 0.14719010480516148, '14 female', '70 male'),
 ('2014 DEES', 0.14617359378353625, '22 female', '126 male'),
 ('2017 EMT', 0.14564520529909103, '21 female', '120 male'),
 ('2012 SCOP Novice', 0.14506157841427209, '11 female', '52 male'),
 ('2013 NASAT', 0.1446467770708396, '28 female', '170 male'),
 ('2008 FICHTE', 0.14423923013918608, '25 female', '150 male'),
 ('2013 Collaborative MS Tournament', 0.14330253355026468, '12 female', '61 male'),
 ('2011 HSAPQ VHSL States', 0.14324070581475087, '16 female', '89 male'),
 ('2018 SMT', 0.14319395114702385, '20 female', '117 male'),
 ('2014 ICCS', 0.14253405832238725, '16 female', '90 male'),
 ('2014 Mavis Gallant Memorial', 0.14241073067992147, '8 female', '34 male'),
 ('2016 SCOP MS 6', 0.14156869557374907, '10 female', '49 male'),
 ('2011 SCOP Novice', 0.1408696168561409, '12 female', '64 male'),
 ('2010 BELFAST Arts', 0.13996289092190936, '11 female', '58 male'),
 ('2011 HSAPQ VHSL Districts', 0.1394273987619871, '13 female', '73 male'),
 ('2016 A Bit of Lit', 0.1390184136470229, '5 female', '16 male'),
 ('2009 Mahfouz Memorial Lit', 0.138579841720442, '17 female', '103 male'),
 ('2018 MBAT', 0.13843638007070203, '16 female', '96 male'),
 ('2012 ANFORTAS', 0.138238723157193, '19 female', '118 male'),
 ('2015 GSAC XXIII', 0.1380942069686375, '18 female', '111 male'),
 ('2017 HFT XII', 0.1380942069686375, '18 female', '111 male')]


Tournaments with the Lowest Adjusted Female/Male Ratio:
[('2012 Peaceful Resolution', 0.06419811594311452, '5 female', '166 male'),
 ('2013 Delta Burke', 0.06778333353685213, '2 female', '107 male'),
 ('2009 VCU Open', 0.07436578389929102, '3 female', '101 male'),
 ('2013 ACF Regionals', 0.07486626161474118, '7 female', '153 male'),
 ('2017 JAKOB', 0.07687856353151823, '1 female', '68 male'),
 ('2016 A Culture of Improvement', 0.07725792260822592, '3 female', '93 male'),
 ('2012 Maggie Walker GSAC', 0.07749105416942485, '7 female', '144 male'),
 ('2008 RMP Fest', 0.07789269374357301, '5 female', '117 male'),
 ('2014 ACF Nationals', 0.07870046247412096, '14 female', '229 male'),
 ('2013 Arrabal', 0.0787900228772814, '3 female', '89 male'),
 ('2011 PACE NSC', 0.07900450266653092, '16 female', '253 male'),
 ('2013 Chicago Open', 0.07919106981998104, '11 female', '189 male'),
 ('2012 Penn Bowl', 0.0811858038883591, '6 female', '120 male'),
 ('2008 ACF Fall', 0.0815530276849189, '10 female', '168 male'),
 ('2014 Cane Ridge Revival', 0.08162468010131421, '7 female', '131 male'),
 ('2010 VCU Open (Sunday)', 0.08170486574449712, '8 female', '143 male'),
 ('2011 Chicago Open History', 0.08172198399572918, '4 female', '94 male'),
 ('2008 NTV', 0.08234695441319224, '1 female', '56 male'),
 ('2009 Prison Bowl', 0.08296850392126193, '6 female', '115 male'),
 ('2010 Minnesota Open', 0.08363874772252583, '10 female', '161 male'),
 ('2017 lIST VI', 0.08370369938898199, '6 female', '113 male'),
 ('2011 Maggie Walker GSAC', 0.08370369938898199, '6 female', '113 male'),
 ('2011 HSAPQ Tournament 16', 0.08403854418150937, '7 female', '124 male'),
 ('2008 HSAPQ 4Q 1', 0.08421544797980024, '11 female', '171 male'),
 ('2011 HSAPQ National History Bowl', 0.08481334694466902, '11 female', '169 male'),
 ('2014 SUBMIT', 0.08495118043699877, '9 female', '145 male'),
 ('2012 Illinois Fall Tournament', 0.08595176515053334, '9 female', '142 male'),
 ('2012 BARGE', 0.0862905519169499, '9 female', '141 male'),
 ('2010 Harvard International', 0.08659952806403311, '7 female', '117 male'),
 ('2011 BHSAT', 0.08677956254467395, '6 female', '105 male'),
 ('2010 Princeton Buzzerfest', 0.08715856911959954, '8 female', '127 male'),
 ('2013 Brookwood Invitational Scholars Bowl', 0.08736015849021657, '7 female', '115 male'),
 ('2015 We Have Never Been Modern', 0.08765909034522874, '4 female', '80 male'),
 ('2013 Prison Bowl', 0.08774550644449879, '7 female', '114 male'),
 ('2014 NASAT', 0.08796828682854953, '14 female', '193 male'),
 ('2010 ANGST', 0.08813426902066876, '7 female', '113 male'),
 ('2014 Chicago Open', 0.08825971163433301, '11 female', '158 male'),
 ('2015 VCU Open', 0.0898313364421802, '9 female', '131 male'),
 ('2011 Cheyne American History', 0.08993716017482803, '3 female', '64 male'),
 ('2012 ACF Fall', 0.09020146264486514, '9 female', '130 male'),
 ('2012 College History Bowl', 0.09020146264486514, '9 female', '130 male'),
 ('2008 ACF Nationals', 0.09023214364369586, '10 female', '141 male'),
 ('2017 Math Monstrosity', 0.09034263723055859, '0 female', '30 male'),
 ('2016 PACE NSC', 0.09038976146192963, '17 female', '218 male'),
 ('2008 HSAPQ NSC 2', 0.09060255265093203, '11 female', '151 male'),
 ('2013 Maggie Walker GSAC', 0.09095094111863872, '9 female', '128 male'),
 ('2011 St. Anselms and Torrey Pines', 0.0909551509274823, '7 female', '106 male'),
 ('2011 Illinois Wissenschaftslehre', 0.0909801791983632, '0 female', '29 male'),
 ('2015 SHEIKH', 0.09137294249404457, '7 female', '105 male'),
 ('2013 Cheyne American History', 0.09157474390395426, '1 female', '39 male')]
I have neither data nor a strong intuition about the composition or construction of these sets, so I'm not sure what conclusions to draw. Just by inspection, it looks like more recent sets appear more frequently in the first list than the second. In any case I'll stress that I don't endorse any witch-hunting or wild speculation or unreasonable conclusions based on this data.

With that in mind, we can move into some interesting diversions. Using a little more natural-language processing, I looked for all the instances where the words "he" or "she" were followed by a verb; this gives a rough sense of what men and women are doing in tossups. Furthermore, by subtracting the rate at which a given verb appears after "she" from the rate at which the same verb appears after "he," we can get a sense of the skew of any given action: positive values lean male and negative values lean female. The results are somewhat interesting to browse through:
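A rough sketch of the he/she comparison, with simplifications flagged up front. It counts whatever word immediately follows "he" or "she" instead of running a part-of-speech tagger to keep only verbs, and it does no stemming (the posted words such as 'wa', 'ha', and 'tri' look like the output of a Porter-style stemmer applied to "was", "has", and "tried"). It also assumes both rates share one denominator, the total word count; the post doesn't say which denominator was used. The function names are hypothetical.

```python
import re
from collections import Counter


def words_after(texts, pronoun):
    """Count the word that immediately follows `pronoun` in each text."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for current, following in zip(tokens, tokens[1:]):
            if current == pronoun:
                counts[following] += 1
    return counts


def pronoun_skew(texts):
    """Score each following word by (rate after 'he') - (rate after 'she').

    Assumption: both rates are divided by the same total word count.
    Results run from strongest female skew to strongest male skew.
    """
    total_words = sum(len(re.findall(r"[a-z]+", t.lower())) for t in texts)
    after_he = words_after(texts, "he")
    after_she = words_after(texts, "she")
    scores = {
        word: (after_he[word] - after_she[word]) / total_words
        for word in set(after_he) | set(after_she)
    }
    return sorted(scores.items(), key=lambda item: item[1])


texts = [
    "He wrote a novel. He wrote a play.",
    "She wrote poems. She painted portraits.",
]
print(pronoun_skew(texts))
```

In this toy input "wrote" follows "he" twice and "she" once, so it leans male, while "painted" only follows "she" and leans female.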


Strongest Male Skew:
[('father', 4.310006302186993e-06),
 ('got', 4.4696361652309556e-06),
 ('fail', 4.549451096752937e-06),
 ('give', 4.549451096752938e-06),
 ('began', 4.629266028274918e-06),
 ('tri', 4.629266028274918e-06),
 ('gain', 4.629266028274919e-06),
 ('goe', 4.709080959796899e-06),
 ('came', 4.788895891318881e-06),
 ('propos', 4.8687108228408625e-06),
 ('said', 5.108155617406806e-06),
 ('publish', 5.427415343494732e-06),
 ('refus', 5.666860138060676e-06),
 ('attempt', 5.826490001104639e-06),
 ('show', 5.826490001104639e-06),
 ('work', 6.065934795670582e-06),
 ('term', 6.305379590236527e-06),
 ('receiv', 6.46500945328049e-06),
 ('order', 6.5448243848024705e-06),
 ('found', 6.704454247846433e-06),
 ('serv', 6.784269179368415e-06),
 ('sign', 7.0237139739343595e-06),
 ('want', 7.263158768500303e-06),
 ('depict', 7.74204835763219e-06),
 ('believ', 7.901678220676154e-06),
 ('help', 8.22093794676408e-06),
 ('kill', 9.258532056549838e-06),
 ('die', 9.258532056549838e-06),
 ('lost', 9.49797685111578e-06),
 ('design', 9.737421645681725e-06),
 ('gave', 9.97686644024767e-06),
 ('compos', 1.0615385892423519e-05),
 ('led', 1.0934645618511446e-05),
 ('becam', 1.1014460550033426e-05),
 ('doe', 1.117409041307739e-05),
 ('defeat', 1.1493350139165315e-05),
 ('made', 1.181260986525324e-05),
 ('argu', 1.2052054659819183e-05),
 ('took', 1.2131869591341165e-05),
 ('paint', 1.3089648769604941e-05),
 ('appear', 1.3648353290258812e-05),
 ('did', 1.524465192069844e-05),
 ('creat', 1.6920765482660044e-05),
 ('call', 2.4982073566380163e-05),
 ('use', 2.5381148223990068e-05),
 ('ha', 4.381839740556777e-05),
 ('wrote', 7.96553016589374e-05),
 ('had', 8.069289576872315e-05),
 ('is', 0.00017583229414292492),
 ('wa', 0.0002423181321007354)]


Strongest Female Skew:
[('spill', -2.3944479456594406e-07),
 ('poison', -1.5962986304396272e-07),
 ('cofound', -1.596298630439627e-07),
 ('wed', -1.596298630439627e-07),
 ('have', -1.596298630439627e-07),
 ('protect', -1.596298630439627e-07),
 ('tumbl', -1.596298630439627e-07),
 ('divorc', -1.596298630439627e-07),
 ('strip', -1.596298630439627e-07),
 ('incur', -1.596298630439627e-07),
 ('cure', -7.981493152198137e-08),
 ('conceiv', -7.981493152198137e-08),
 ('beg', -7.981493152198137e-08),
 ('are', -7.981493152198135e-08),
 ('brush', -7.981493152198135e-08),
 ('effect', -7.981493152198135e-08),
 ('piti', -7.981493152198135e-08),
 ('confront', -7.981493152198135e-08),
 ('worship', -7.981493152198135e-08),
 ('sort', -7.981493152198135e-08),
 ('cooper', -7.981493152198135e-08),
 ('evad', -7.981493152198135e-08),
 ('overrul', -7.981493152198135e-08),
 ('blush', -7.981493152198135e-08),
 ('menac', -7.981493152198135e-08),
 ('loan', -7.981493152198135e-08),
 ('consumm', -7.981493152198135e-08),
 ('waslur', -7.981493152198135e-08),
 ('arch', -7.981493152198135e-08),
 ('suckl', -7.981493152198135e-08),
 ('narrowlyavoid', -7.981493152198135e-08),
 ('whisk', -7.981493152198135e-08),
 ('mutter', -7.981493152198135e-08),
 ('offend', -7.981493152198135e-08),
 ('wasdestin', -7.981493152198135e-08),
 ('mail', -7.981493152198135e-08),
 ('past', -7.981493152198135e-08),
 ('keep', -7.981493152198135e-08),
 ('cancel', -7.981493152198135e-08),
 ('care', -7.981493152198135e-08),
 ('desert', -7.981493152198135e-08),
 ('wait', -7.981493152198135e-08),
 ('faint', -7.981493152198135e-08),
 ('kindl', -7.981493152198135e-08),
 ('plagiar', -7.981493152198135e-08),
 ('press', -7.981493152198135e-08),
 ('isbear', -7.981493152198135e-08),
 ('waskil', -7.981493152198135e-08),
 ('twist', -7.981493152198135e-08),
 ('incub', -7.981493152198135e-08)]
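Note: I ran all the verbs through a stemmer, which, for example, reduces playing, plays, and played to just "play." Some words end up as stems that aren't actually words: tries, tried, trying, etc. all become "tri," and proposed, proposes, proposing, etc. all become "propos." Likewise, "wa," "ha," "doe," and "goe" are the stems of "was," "has," "does," and "goes." The nouns below went through the same stemmer, hence entries like "famili" and "stori."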
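For anyone who wants to reproduce lists like these, here is a minimal sketch of the skew score. It assumes each tossup has already been POS-tagged into (word, tag) pairs using Penn Treebank tags (for example by a tagger such as NLTK's `pos_tag`, which is not called here), and it skips the stemming step. The function names and input format are hypothetical, not the code used to produce the lists above. It computes the "he" rate minus the "she" rate, so positive scores are male-skewed, matching the signs in the lists; dividing both counts by the total token count is also an assumption, since the post doesn't say what the rates were normalized by.

```python
from collections import Counter


def pronoun_followers(tagged_tossups, pronoun, tag_prefix):
    """Count words whose POS tag starts with `tag_prefix` and that
    immediately follow `pronoun` (case-insensitive)."""
    counts = Counter()
    for tokens in tagged_tossups:
        # Walk over adjacent (word, tag) pairs within one tossup.
        for (word, _), (nxt, tag) in zip(tokens, tokens[1:]):
            if word.lower() == pronoun and tag.startswith(tag_prefix):
                counts[nxt.lower()] += 1
    return counts


def skew(tagged_tossups, male="he", female="she", tag_prefix="VB"):
    """Male rate minus female rate for every word seen after either pronoun.

    Assumption: both rates share one denominator, the total token count.
    Positive = male skew, negative = female skew.
    """
    total = sum(len(tokens) for tokens in tagged_tossups)
    m = pronoun_followers(tagged_tossups, male, tag_prefix)
    f = pronoun_followers(tagged_tossups, female, tag_prefix)
    # Counter returns 0 for missing keys, so one-sided words work too.
    return {w: (m[w] - f[w]) / total for w in m.keys() | f.keys()}
```

Because both rates share one denominator, a word can look male-skewed simply because "he" is far more common than "she" overall; dividing each count by that pronoun's own total would compare proportions instead. The same helper covers the noun lists with `male="his", female="her", tag_prefix="NN"`.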

We can examine the same sort of skew metric for nouns that follow either "his" or "her":


Greatest Male Skew:
[('master', 1.237131438590711e-05),
 ('mistress', 1.2850203975038996e-05),
 ('arm', 1.300983383808296e-05),
 ('companion', 1.3169463701126923e-05),
 ('role', 1.3329093564170884e-05),
 ('charact', 1.3329093564170887e-05),
 ('uncl', 1.3887798084824756e-05),
 ('student', 1.41272428793907e-05),
 ('capit', 1.4446502605478624e-05),
 ('right', 1.5164836989176458e-05),
 ('armi', 1.5244651920698438e-05),
 ('hors', 1.588317137287429e-05),
 ('“', 1.6681320688094103e-05),
 ('enemi', 1.6681320688094103e-05),
 ('reign', 1.6681320688094103e-05),
 ('oppon', 1.6761135619616084e-05),
 ('nation', 1.7160210277225992e-05),
 ('career', 1.9155583565275525e-05),
 ('theori', 1.9235398496797505e-05),
 ('time', 2.0751882195715153e-05),
 ('sister', 2.1550031510934966e-05),
 ('famili', 2.1629846442456943e-05),
 ('hand', 2.1709661373978927e-05),
 ('collect', 2.2028921100066852e-05),
 ('home', 2.2108736031588832e-05),
 ('opera', 2.242799575767676e-05),
 ('predecessor', 2.242799575767676e-05),
 ('stori', 2.298670027833063e-05),
 ('rival', 2.410410931963837e-05),
 ('essay', 2.418392425116035e-05),
 ('love', 2.5939852744643937e-05),
 ('head', 3.376171603379811e-05),
 ('life', 3.479931014358387e-05),
 ('play', 3.942857617185879e-05),
 ('book', 4.58137706936173e-05),
 ('daughter', 4.6212845351227194e-05),
 ('paint', 4.9964147132760326e-05),
 ('death', 5.2278780146897783e-05),
 ('name', 5.674841631212874e-05),
 ('brother', 6.409139001215103e-05),
 ('mother', 6.624639316324452e-05),
 ('namesak', 6.744361713607424e-05),
 ('novel', 6.76032469991182e-05),
 ('poem', 6.856102617738198e-05),
 ('friend', 6.896010083499189e-05),
 ('countri', 7.654251932958011e-05),
 ('father', 9.202661604484452e-05),
 ('son', 0.00010072644358074048),
 ('work', 0.0001605078272907045),
 ('wife', 0.00023170274620831188)]


Greatest Female Skew:
[('husband', -0.00013137537728518132),
 ('boyfriend', -4.629266028274919e-06),
 ('breast', -3.9907465760990674e-06),
 ('dress', -3.432042055445198e-06),
 ('child', -3.2724121924012357e-06),
 ('babi', -3.1127823293572732e-06),
 ('suitor', -3.112782329357273e-06),
 ('hair', -2.3146330141374594e-06),
 ('fianc', -2.0751882195715154e-06),
 ('maid', -1.8357434250055709e-06),
 ('pregnanc', -1.4366687673956642e-06),
 ('bath', -1.2770389043517017e-06),
 ('womb', -1.2770389043517015e-06),
 ('wed', -1.277038904351701e-06),
 ('skirt', -1.1174090413077389e-06),
 ('look', -8.779642467417949e-07),
 ('lap', -8.779642467417946e-07),
 ('stepson', -7.981493152198136e-07),
 ('girdl', -7.981493152198135e-07),
 ('vagina', -7.981493152198135e-07),
 ('twelfth', -7.981493152198134e-07),
 ('attend', -7.183343836978322e-07),
 ('engag', -5.587045206538698e-07),
 ('sikh', -5.587045206538694e-07),
 ('necklac', -5.587045206538694e-07),
 ('rapist', -5.587045206538694e-07),
 ('collarbon', -5.587045206538694e-07),
 ('loom', -5.587045206538694e-07),
 ('bosom', -4.788895891318881e-07),
 ('maiden', -4.788895891318881e-07),
 ('ex-husband', -4.788895891318881e-07),
 ('wedge-head', -4.788895891318881e-07),
 ('virgin', -3.990746576099069e-07),
 ('milk', -3.990746576099068e-07),
 ('mirror', -3.990746576099068e-07),
 ('specialti', -3.990746576099068e-07),
 ('”', -3.9907465760990675e-07),
 ('abductor', -3.9907465760990675e-07),
 ('crotch', -3.9907465760990675e-07),
 ('puriti', -3.9907465760990675e-07),
 ('ostrich', -3.9907465760990675e-07),
 ('apron', -3.9907465760990675e-07),
 ('ex-boyfriend', -3.9907465760990675e-07),
 ('shawl', -3.9907465760990675e-07),
 ('speak', -3.9907465760990675e-07),
 ('hall', -3.192597260879255e-07),
 ('handkerchief', -3.1925972608792543e-07),
 ('male', -3.1925972608792543e-07),
 ('morphin', -3.192597260879254e-07),
 ('veil', -3.192597260879254e-07)]
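It looks like the nouns separate the genders more sharply than the verbs do, especially on the female side: the strongest female-skewed noun, "husband" (about -1.3e-04), is more than 500 times stronger than the strongest female-skewed verb, "spill" (about -2.4e-07).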
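The lists above appear to be the two ends of one sorted list of scores (the male list ascends to its strongest entry, the female list starts at its most negative one), and stray quotation-mark tokens such as “ and ” slipped in as "nouns." Here is a small sketch of that splitting step with punctuation-only tokens filtered out; it is a presumption about how the lists were built, and `top_skews` and `is_word` are hypothetical helpers.

```python
def is_word(token):
    """True if the token contains at least one letter; drops tokens like “ and ”."""
    return any(ch.isalpha() for ch in token)


def top_skews(skew, n=50):
    """Split a {word: score} dict into two ranked lists of (word, score).

    male:   up to n of the highest positive scores, ascending (strongest last)
    female: up to n of the lowest negative scores, most negative first
    """
    if n <= 0:
        return [], []
    ranked = sorted(
        ((w, s) for w, s in skew.items() if is_word(w)),
        key=lambda pair: pair[1],
    )
    male = [pair for pair in ranked[-n:] if pair[1] > 0]
    female = [pair for pair in ranked[:n] if pair[1] < 0]
    return male, female


# Scores rounded from the noun lists above, plus one punctuation token.
scores = {"wife": 2.3e-4, "husband": -1.3e-4, "“": 1.7e-5, "arm": 1.3e-5, "veil": -3.2e-7}
male, female = top_skews(scores, n=2)
# male   -> [('arm', 1.3e-05), ('wife', 0.00023)]
# female -> [('husband', -0.00013), ('veil', -3.2e-07)]
```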
Arjun Panickssery (he/his)
President, American Quizbowl League
Stained Diviner
Auron
Posts: 5085
Joined: Sun Jun 13, 2004 6:08 am
Location: Chicagoland

Re: Gender in QuizDB Sets

Post by Stained Diviner »

Thanks for doing this. This is some cool data.

The data certainly is not perfect. There are more than two mis-gendered tossups in the group you listed, such as Faulkner and books written by men. Also, some of the set data is off--it says the 2013 Scobol Solo used only one gendered pronoun, which is way off. It is possible that this is due to some sets being only partially in the database rather than anything you did wrong. There's also the fact that your method is simplistic, which you freely admit and I can't blame you for, since it's better than my method, and you weren't getting paid for this.

Another issue is that writers try to avoid female pronouns more than male pronouns, which could skew the data. Also, writers are more willing to use gendered pronouns for a clue, which is less important, as compared to an answer, which is more important. ("One of this author's characters dances with her future husband at Netherfield" is less transparent than "This author wrote about one of her characters dancing at Netherfield", though to be honest neither one is horribly transparent.)

All that being said, my sense is that your numbers are reasonably representative of trends. The list of nouns that skew female is a scary one, and it might be the most fixable thing a writer can take from this. It looks like even the questions that have women in them couldn't pass the Bechdel test, and that's a bad thing. It's not surprising that the female list would skew towards female things, but it's weird that things like arms, roles, characters, and a whole bunch of other normal things skew male. The main part of your data, which shows the overall underrepresentation of women, gives additional evidence of a known problem that writers and editors will hopefully keep working to improve.
David Reinstein
Head Writer and Editor for Scobol Solo, Masonics, and IESA; TD for Scobol Solo and Reinstein Varsity; IHSSBCA Board Member; IHSSBCA Chair (2004-2014); PACE President (2016-2018)
Hypersmart
Lulu
Posts: 23
Joined: Wed Feb 17, 2016 4:02 am

Re: Gender in QuizDB Sets

Post by Hypersmart »

This was very interesting. Thank you for doing this!
Veer R
_____________________________
Somewhere
Aaron's Rod
Sec. of Cursed Images, Chicago SJW Cabal
Posts: 851
Joined: Wed Nov 27, 2013 7:29 pm

Re: Gender in QuizDB Sets

Post by Aaron's Rod »

This is really cool! Thanks for sharing!

A couple of ideas for extensions or next steps, should you be interested in continuing to work on this project:
QuestionCactus wrote: Thu Jul 04, 2019 1:13 pm
Note: I ran all the verbs through a stemmer, which for example will reduce playing, plays, and played to just "play." Some of the words have been converted to stems which aren't actually words; tries, tried, trying, etc. all became "tri" and proposed, proposes, proposing, etc. all become propos.
This is a matter of preference, but you might consider using a lemmatizer instead for more readable results.
QuestionCactus wrote: Thu Jul 04, 2019 1:13 pm Using a little more natural-language processing, I looked for all the instances where the words "he" or "she" were followed by a verb; this can allow us to have some sort of vague insight into what men and women are doing in tossups. Furthermore, by subtracting the rate at which a given verb appears after "he" from the rate at which the same verb appears after "she," we can get a sense of the skew of any given action.
I found the gendered verbs and nouns particularly interesting. I assume you were using part-of-speech tagging for this? If so, it's not too much of a stretch at all to extend this to named-entity recognition.
Deviant Insider wrote: Mon Jul 08, 2019 7:49 pm Also, writers are more willing to use gendered pronouns for a clue, which is less important, as compared to an answer, which is more important.
Especially if you share that attitude, I think it would be both easy and informative to look at NER, at least on answer lines and maybe also for clues. As you correctly note, his/her pronouns don't always refer to the answer line, and they can also produce misleading or mixed results. NER won't gender things for you, but you can always look up the sex/gender of people, mythological beings, etc. in Wikidata as a preferred option for "gendering" a question, and use his/her pronouns when Wikidata isn't informative. (You could also use Wikidata to make a link from a work to the sex/gender of its creator, even if it's not mentioned in the question.)
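The hybrid approach suggested here (look up the answer's gender first, fall back to pronouns) can be sketched as below. The `GENDER_LOOKUP` dict is a hypothetical stand-in for what a Wikidata query would return (Wikidata stores this under property P21, "sex or gender"); a real pipeline would need NER and entity resolution to get from an answer line to a Wikidata item, which this sketch does not attempt. The pronoun fallback uses the same {he, his} and {she, her, hers} sets as the first post.

```python
import re

# Hypothetical stand-in for Wikidata P21 ("sex or gender") lookups,
# keyed by a cleaned-up answer string.
GENDER_LOOKUP = {
    "Sherwood Anderson": "male",
    "Jane Austen": "female",
}

MALE_PRONOUNS = {"he", "his"}
FEMALE_PRONOUNS = {"she", "her", "hers"}


def classify(answer, text, lookup=GENDER_LOOKUP):
    """Prefer the looked-up gender of the answer; otherwise fall back to
    the pronoun heuristic (male / female / both / neither)."""
    if answer in lookup:
        return lookup[answer]
    words = set(re.findall(r"[a-z]+", text.lower()))
    male = bool(words & MALE_PRONOUNS)
    female = bool(words & FEMALE_PRONOUNS)
    if male and female:
        return "both"
    if male:
        return "male"
    if female:
        return "female"
    return "neither"
```

With a lookup hit, a tossup like the Winesburg, Ohio example in the first post would count as "male" rather than "both," even though it contains "her."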
Alex D.
ACF
http://tinyurl.com/qbmisconduct

"You operate at a shorter wavelength and higher frequency than most human beings." —Victor Prieto
QuestionCactus
Lulu
Posts: 60
Joined: Sat Nov 17, 2018 9:55 pm

Re: Gender in QuizDB Sets

Post by QuestionCactus »

Deviant Insider wrote: Mon Jul 08, 2019 7:49 pm The data certainly is not perfect. There are more than two mis-gendered tossups in the group you listed, such as Faulkner and books written by men. Also, some of the set data is off--it says the 2013 Scobol Solo used only one gendered pronoun, which is way off. It is possible that this is due to some sets being only partially in the database rather than anything you did wrong. There's also the fact that your method is simplistic, which you freely admit and I can't blame you for, since it's better than my method, and you weren't getting paid for this.
There are a few weird quirks in the database. As you noticed, 2013 Scobol Solo has only one gendered pronoun, and that's because, according to the database, it only has one tossup total. Here it is:
One character created by this author is born as the boat Wilhelm Gustloff sinks. That character, Konny Pokriefke, is the protagonist of Crabwalk. Another novel features a character whose Adam's apple is attacked by a cat and who is fascinated by the remains of a minesweeper. This author wrote about a man who falls in love with Roswitha Raguna and considers himself to have two fathers, one of whom dies after defending a post office, and the other of whom joins the Nazi Party. Name this writer whose Danzig Trilogy includes Cat and Mouse and his tale of Oskar Matzerath titled The Tin Drum.

ANSWER: Gunter (Wilhelm) Grass
I've noticed a few other strange things. Some questions on the current QuizDB site don't appear in my file, which you might expect, since my file could be out of date. But I also remember finding a few tossups in my file that aren't on the QuizDB site. A few weird things like that.
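Quirks like these can be flagged before running any analysis. Here is a minimal sketch, assuming each tossup is a dict with a "set" field and that question IDs from the local file and from the site are available as collections; that schema and the threshold of 20 tossups are hypothetical choices, not QuizDB's actual export format.

```python
from collections import Counter


def suspicious_sets(tossups, min_tossups=20):
    """Return {set name: tossup count} for sets with fewer than
    `min_tossups` tossups, which are likely only partially loaded."""
    counts = Counter(t["set"] for t in tossups)
    return {name: n for name, n in counts.items() if n < min_tossups}


def id_mismatches(file_ids, site_ids):
    """Return (IDs only in the local file, IDs only on the site)."""
    file_ids, site_ids = set(file_ids), set(site_ids)
    return file_ids - site_ids, site_ids - file_ids
```

Sets flagged this way could be excluded from per-set rankings, since a one-tossup set says little about its overall gender balance.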
Deviant Insider wrote: Mon Jul 08, 2019 7:49 pm It's not surprising that the female list would skew towards female things, but it's weird that things like arms, roles, characters, and a whole bunch of other normal things skew male.
I quickly looked up "arm" out of curiosity, and there are actually more female tossups containing "his arm" than female tossups containing "her arm." I could print all of the tossups, but there are a lot of them, and I don't see any obvious pattern from brief scrolling. There's no obvious pattern for "role" either.
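A check like the "his arm" versus "her arm" comparison can be written as a whole-word phrase count restricted to one classification. This sketch assumes each tossup is a dict with hypothetical "label" and "text" fields. Unlike the stemmed counts above, it matches only the exact phrase, so "his arms" would be missed unless searched separately.

```python
import re


def tossups_with_phrase(tossups, phrase, label):
    """Count tossups with classification `label` whose text contains
    `phrase` as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(
        1 for t in tossups if t["label"] == label and pattern.search(t["text"])
    )
```

The `\b` anchors keep "his armor" from counting as "his arm."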
Aaron's Rod wrote: Thu Jul 11, 2019 3:55 pm
QuestionCactus wrote: Thu Jul 04, 2019 1:13 pm Using a little more natural-language processing, I looked for all the instances where the words "he" or "she" were followed by a verb; this can allow us to have some sort of vague insight into what men and women are doing in tossups. Furthermore, by subtracting the rate at which a given verb appears after "he" from the rate at which the same verb appears after "she," we can get a sense of the skew of any given action.
I found the gendered verbs and nouns particularly interesting. I assume you were using part-of-speech tagging for this? If so, it's not too much of a stretch at all to extend this to named-entity recognition.
A lemmatizer would have been more readable, I agree. I was doing it quickly and wasn't motivated to look up the syntax for nltk's lemmatizer haha.

For the gendered verbs/nouns, I just pos-tagged every word that occurred after "he" or "she." I wasn't familiar with Wikidata; using that would probably make it worthwhile to just redo the classification system. Instead of naively looking for pronouns, one could list the named entities and look them up on Wikidata to find their gender.

I might be able to do this at some point; in any case, I can clean up the Jupyter Notebook I used and then other people can mess around with the analysis as well.
Arjun Panickssery (he/his)
President, American Quizbowl League