LaTeX and packets

Mike Bentley · Post by **Mike Bentley** » Mon May 19, 2008 12:49 pm

Can someone give me an overview of exactly what we're trying to accomplish here? Is it just to parse questions into "tossup text", "answer text", "underlined answer text", etc.? If so, having people use a non-Word solution seems excessive.

I don't really see why we need to use latex or anything else to do this. So long as someone writes a packet in a consistent manner, parsing Word formatted text is pretty much a non-issue. I pretty much already have a program that can receive text pasted from Word and output to, say, an XML file. It would require some modification to load Word documents directly, but I don't imagine that would be overly complicated.

fleurdelivre · Post by **fleurdelivre** » Mon May 19, 2008 2:06 pm

Bentley Like Beckham wrote:Can someone give me an overview of exactly what we're trying to accomplish here?

Let's dream bigger than that. If we're going to bother with any of this, why not work towards a coherent system for managing questions from concept through writing to storing and searching. I might be totally crazy for thinking we can pull it off, but why not at least sit down a few of the tech-inclined people and consider possibilities?

ezubaric wrote: at Caltech we tried to have an online system called Jerome where people just filled in templates and then it got categorized and sent to an editor, but it never really caught on. I used it for myself for editing, but people couldn't be compelled to use it (even though you got real-time feedback from editors and got to see your questions move through the editing process). I'm not saying that Jerome is the answer (he's old, creaky, and built on old technology), but I think the idea was a decent one.

It looks like the first half of such a system has been developed before; let's do it better this time. Why worry about parsing Word packets if we can come up with a cleaner way to produce them in the first place?

grapesmoker wrote:Actually, I think as far as collaborative editing goes, a system that either utilizes Google Docs or some CMS system like Drupal would be the way to go. I'm exploring the possibility of using Drupal for this but I haven't had much experience with it so I don't know how feasible it is.

Finally, I mentioned some of this project to my boss (it's a quiet day) and he's actually pushing Drupal for this one. Given the enthusiasm, I'll see if I can't use some calm afternoon to set up a test environment and see if it makes any sense to me.

dschafer · Post by **dschafer** » Mon May 19, 2008 3:17 pm

Bentley Like Beckham wrote:Can someone give me an overview of exactly what we're trying to accomplish here?

Personally, what I envision is this:

We develop a universally agreed-upon XML quizbowl schema, probably similar to or based on the XML output from xmlize.pl. We use that script (or a variant) to convert current or old Word packets to this XML format. Additionally, we develop a standard plain-text markup system for those of us that would prefer that, and create a script to convert the plain-text to the XML format. Basically, any packet could be easily converted into this standard XML format.

Once everything is in our standard XML format, we can do all sorts of fun things. Developing a script to convert the XML to TeX (with an appropriate .cls file) would be easy. Similarly, an IRC bot could work with the standard XML format, and thus could access the archives of any tournament converted to it. Once we have a universally agreed-upon standard, a lot of possibilities open themselves up, especially if it is easy to convert back archives of packets to that format.

Sen. Estes Kefauver (D-TN) · Mon May 19, 2008 3:59 pm

I feel like openly joining Ryan Westbrook's movement of quizbowl luddites. Please, people, things are pretty much good I think. We have questions to play on, and quite often they are well written. That's really all you need to have good quizbowl. I don't want to play on computer-read quizbowl that automatically scores itself or anything, I want to have a moderator who reads me questions. I don't want to have to learn any kind of computer script to write my packets, especially considering my poor experience trying to learn LaTeX, and beyond Jerry's searchable database, I don't see any real reason to come up with all kinds of codes for quizbowl (especially since it sounds to me like it's not too hard to convert packets that are already ACF formatted into searchable texts for the QBDB). Beyond that, what is so necessary about all of this? It looks good? I'm honestly not following.

cvdwightw · Post by **cvdwightw** » Mon May 19, 2008 4:16 pm

ezubaric wrote:I also second the idea of a quiz bowl question editor that would produce packets in whatever format was desired; at Caltech we tried to have an online system called Jerome where people just filled in templates and then it got categorized and sent to an editor, but it never really caught on. I used it for myself for editing, but people couldn't be compelled to use it (even though you got real-time feedback from editors and got to see your questions move through the editing process). I'm not saying that Jerome is the answer (he's old, creaky, and built on old technology), but I think the idea was a decent one.

As someone who vaguely remembers submitting questions via Jerome, I think the hardest part was that you didn't know what parts of the distribution you had filled already. I don't remember all that much about the process, so this might be wrong, but (1) I don't think there was any code that said "this is RMP, this is Literature, etc." and the system just tried to guess by answer selection, and (2) I don't remember whether you could see all the questions your team wrote or just all the questions you submitted; if the latter, then that's a huge problem. You also had to submit questions one at a time, rather than as a complete file, and I'm more sure on this because I don't have anything saved to my computer from that tournament except a largely incomplete packet, and I know I submitted at least 3/4 of the full packet via Jerome. This system seemed much more inefficient from the packet-submission end than from an editing aspect.

I'm not by any means the go-to guy for markup languages and all that fancy stuff, but I think there's a bit of an issue between how people are submitting packets and how packets come out. For instance, Jerry's algorithm searches for "tossups", then treats everything as a tossup. This doesn't work if instead I submit my packet with "Literature 5/5" and then put five tossups and five bonuses, then "History 5/5" with five tossups and five bonuses, etc. It works great for finished packets, but not so well with submissions. On the other hand, if we make a LaTeX shell like Andrew suggests, then each tournament can customize it to their distribution by using whatever number of tossup and bonus blocks they want, along with a \category{} line that the editor fills in. The advantages of this are great - uniformity, for one thing - and there's not a super-high barrier to entry for new writers, because you've already given them everything that they need; all they need to do is type the question, put the answer in the appropriate spot, put the prompts/do not accepts in the appropriate spot, and send. At this point, maybe I'm dreaming, but there's another shell that allows even a LaTeX-illiterate editor to create a full packet from the edited original version, and then the editor just exports to a .pdf file.
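To make the "Literature 5/5" case concrete, here is a minimal sketch of a splitter that treats such lines as category headers instead of hunting for a single "tossups" marker. The header pattern, the function name `split_by_category`, and the assumption that headers and questions are separated by blank lines are all illustrative guesses; this is not a description of Jerry's actual algorithm.

```python
import re

# Assumed header shape: a category name followed by tossup/bonus counts,
# e.g. "Literature 5/5". Real submissions will vary.
HEADER = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z /&]*?)\s+(?P<tossups>\d+)/(?P<bonuses>\d+)\s*$"
)


def split_by_category(text):
    """Group the blank-line-separated blocks of a submission under their category header."""
    sections = {}
    current = None
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        match = HEADER.match(block)
        if match:
            current = match.group("name")
            sections[current] = {
                "expected": (int(match.group("tossups")), int(match.group("bonuses"))),
                "questions": [],
            }
        elif current is not None:
            # Blocks that appear before the first header are ignored in this sketch.
            sections[current]["questions"].append(block)
    return sections
```

Comparing `expected` against the number of collected questions would at least tell an editor which parts of the distribution are still short.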

I can't imagine that, with all these people who know LaTeX, it would be that hard to do something like this and create a common ground for packets. To Charlie - the idea is to reach a compromise where people can just type their questions without trying to figure out what just happened - honestly, if three different people send me their parts of the packet (in my experience, at least one of them will blatantly disregard all sorts of formatting directions), or I try to just copy from Google Docs into Word, then all sorts of weird stuff happens when I make a full packet and I have to do all sorts of formatting that I shouldn't have to. If I have a shell where I just put everything where they're intuitively supposed to go, then all you have to do if I'm compiling our team's packet is write a simple plaintext file, and I just copy the relevant parts of your file to the parts of the shell where they, intuitively, go.

dschafer · Post by **dschafer** » Mon May 19, 2008 4:24 pm

I agree that we shouldn't make people learn any sort of complicated scripting to write packets. On the other hand, AFT seems no more difficult to write in than the current ACF formatting standards.

I also agree that there is nothing truly necessary about this, but I also don't see a downside to it. I view the XML idea as something that would be created from the Word docs; a well-defined open standard that anyone with an idea of something cool to do with packets can work off of, without having to worry about the weird formatting quirks of word documents. Question databases and IRC bots come to mind as potential uses, but there are certainly more uses for computers out there (repeat detection, automatic packet randomization, etc.), and the more machine-friendly a format we have, the more likely it is such a project could come to be.

As an example of how this could be beneficial while not affecting packet writers at all:

Packet is written in word, just as it is now.
Editor receives packet, converts it to XML (note: all weird formatting issues in the submission now are irrelevant, so long as the script caught them)
Editor runs question-reordering script on XML; questions are now sorted in an aesthetically pleasing manner.
Editor runs script to convert XML to TeX and convert the TeX into a PDF final product.

The input was just a Word doc, as it is now, but the editor didn't have to deal with randomization or formatting of the submission, since the scripts did that automatically.
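The reordering step could look something like the sketch below. It is a greedy heuristic that tries not to put two questions from the same category back to back; it is not a guarantee, since a packet dominated by one category will still have repeats. The `<packet>` root and the `category` attribute are hypothetical, since no schema has been agreed on.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def spread_categories(questions):
    """Greedily order questions so the same category rarely appears twice in a row."""
    buckets = defaultdict(list)
    for q in questions:
        buckets[q.get("category", "")].append(q)

    ordered = []
    last = None
    while any(buckets.values()):
        # Prefer the fullest bucket whose category differs from the previous question.
        candidates = [c for c, qs in buckets.items() if qs and c != last]
        if not candidates:
            candidates = [c for c, qs in buckets.items() if qs]
        choice = max(candidates, key=lambda c: len(buckets[c]))
        ordered.append(buckets[choice].pop(0))
        last = choice
    return ordered


def reorder_packet(xml_text):
    """Return the packet XML with its <tossup> children reordered."""
    root = ET.fromstring(xml_text)
    tossups = root.findall("tossup")
    for t in tossups:
        root.remove(t)
    # Reordered tossups are appended after any other children of the root.
    root.extend(spread_categories(tossups))
    return ET.tostring(root, encoding="unicode")
```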

Regarding automatic scoring: there was a room at ICT where this was done, and it was awesome. It had a scoreboard, what tossup we were on, countdown timers for bonus parts, and even a little indicator for which player had buzzed. It was pretty amazing.

NoahMinkCHS · Post by **NoahMinkCHS** » Mon May 19, 2008 4:52 pm

dschafer wrote:Regarding automatic scoring: there was a room at ICT where this was done, and it was awesome. It had a scoreboard, what tossup we were on, countdown timers for bonus parts, and even a little indicator for which player had buzzed. It was pretty amazing.

This type of thing is the one thing that I would take from Chip Beall and CBI and import into real quizbowl, especially for timed formats like NAQT. Maybe I'm just OCD but I really like knowing the score. Sorry I missed that room at ICT.

Also, for the self-described Luddites: AFT is actually less newfangled than Word. It's platform-independent plain text! Beyond robot readers and databases, there are plenty of reasons to eschew a proprietary, error-laden file format that only complicates the writing, submission, editing, formatting, and reading steps by making dumb guesses about what the writer might be trying to do. Plain text has a much lower "ZOMG fear new technology!" factor than Word. I've never edited anything more advanced than a house-written HS tournament or my team's packet for a submission tournament and even I've seen a multitude of ways Word makes life harder. Now that we have to deal with DOCX format from Word 2007, not to mention people submitting files that were created in all 103 previous versions of Word, plus OpenOffice, plus Mac versions, that's only going to get worse. Meanwhile, everybody has Notepad...

ezubaric · Post by **ezubaric** » Mon May 19, 2008 5:35 pm

cvdwightw wrote:As someone who vaguely remembers submitting questions via Jerome, I think the hardest part was that you didn't know what parts of the distribution you had filled already. I don't remember all that much about the process, so this might be wrong, but (1) I don't think there was any code that said "this is RMP, this is Literature, etc." and the system just tried to guess by answer selection, and (2) I don't remember whether you could see all the questions your team wrote or just all the questions you submitted; if the latter, then that's a huge problem. You also had to submit questions one at a time, rather than as a complete file, and I'm more sure on this because I don't have anything saved to my computer from that tournament except a largely incomplete packet, and I know I submitted at least 3/4 of the full packet via Jerome. This system seemed much more inefficient from the packet-submission end than from an editing aspect.

All of your memories are pretty much correct, and reflect the challenges of such a system. There was indeed a batch submission system, but I never told anyone about it because it was harder to fix people's mistakes when twenty questions come in wrong than just one. One perverse side effect was the lack of categories at submission time; the batch upload guessed (fairly well) the category of each question and put it in the DB.

The take-home lesson of this is that any proposed system must not impose any additional burdens on the end-user. Even if it makes life much easier for editors, it won't get adopted unless it also somehow helps the people that will have to interact with it.

grapesmoker · Post by **grapesmoker** » Mon May 19, 2008 5:42 pm

dschafer wrote:We develop a universally agreed-upon XML quizbowl schema, probably similar to or based on the XML output from xmlize.pl. We use that script (or a variant) to convert current or old Word packets to this XML format. Additionally, we develop a standard plain-text markup system for those of us that would prefer that, and create a script to convert the plain-text to the XML format. Basically, any packet could be easily converted into this standard XML format.

Much of this work is already done (though I wouldn't be so presumptuous as to suggest that my particular approach is "universally agreed-upon"). What really needs to happen, and hasn't yet, is a single script that transforms a packet into XML end-to-end and maybe even contacts the server to upload the packet, to make it easier for people to work with.

cvdwightw wrote: I'm not by any means the go-to guy for markup languages and all that fancy stuff, but I think there's a bit of an issue between how people are submitting packets and how packets come out. For instance, Jerry's algorithm searches for "tossups", then treats everything as a tossup. This doesn't work if instead I submit my packet with "Literature 5/5" and then put five tossups and five bonuses, then "History 5/5" with five tossups and five bonuses, etc. It works great for finished packets, but not so well with submissions. On the other hand, if we make a LaTeX shell like Andrew suggests, then each tournament can customize it to their distribution by using whatever number of tossup and bonus blocks they want, along with a \category{} line that the editor fills in. The advantages of this are great - uniformity, for one thing - and there's not a super-high barrier to entry for new writers, because you've already given them everything that they need; all they need to do is type the question, put the answer in the appropriate spot, put the prompts/do not accepts in the appropriate spot, and send. At this point, maybe I'm dreaming, but there's another shell that allows even a LaTeX-illiterate editor to create a full packet from the edited original version, and then the editor just exports to a .pdf file.

Dwight, these are all good ideas, and I've been thinking about how to realize them. Obviously in the end we'd like to end up with something that is intuitive for the end user and avoids any need to manually do markups. I recognize the shortcomings of my algorithm, which is why at the moment it's only useful for importing compiled packets.

Deesy Does It wrote:I feel like openly joining Ryan Westbrook's movement of quizbowl luddites. Please, people, things are pretty much good I think. We have questions to play on, and quite often they are well written. That's really all you need to have good quizbowl. I don't want to play on computer-read quizbowl that automatically scores itself or anything, I want to have a moderator who reads me questions. I don't want to have to learn any kind of computer script to write my packets, especially considering my poor experience trying to learn LaTeX, and beyond Jerry's searchable database, I don't see any real reason to come up with all kinds of codes for quizbowl (especially since it sounds to me like it's not too hard to convert packets that are already ACF formatted into searchable texts for the QBDB). Beyond that, what is so necessary about all of this? It looks good? I'm honestly not following.

Charlie, I've tried to lay out why I think this is worthwhile. I've already made it clear (I hope) that if we (the technical people) ask the writers in general to make any adjustments at all, they will be minimal. So if you prefer to remain on the Ludditic side of things, feel free to do nothing. I think there's a strong case to be made for the potential of technology to improve quizbowl, and I want to work to make those improvements happen.

Matt Weiner · Post by **Matt Weiner** » Mon May 19, 2008 5:54 pm

NoahMinkCHS wrote:Also, for the self-described Luddites: AFT is actually less newfangled than Word. It's platform-independent plain text! Beyond robot readers and databases, there are plenty of reasons to eschew a proprietary, error-laden file format that only complicates the writing, submission, editing, formatting, and reading steps by making dumb guesses about what the writer might be trying to do. Plain text has a much lower "ZOMG fear new technology!" factor than Word. I've never edited anything more advanced than a house-written HS tournament or my team's packet for a submission tournament and even I've seen a multitude of ways Word makes life harder. Now that we have to deal with DOCX format from Word 2007, not to mention people submitting files that were created in all 103 previous versions of Word, plus OpenOffice, plus Mac versions, that's only going to get worse. Meanwhile, everybody has Notepad...

Please remember to avoid the inherent fallacy of all "you should use Linux/emacs/a magnetized needle that you manually change bits in your RAM with" arguments, and do not assume that everyone is coming into this from the state of nature. People already know how to use Word, and anything else is something they have to learn. It's not an all-things-equal situation.

Mike Bentley · Post by **Mike Bentley** » Mon May 19, 2008 6:08 pm

Yeah one of the issues of plain text is you lack some really useful quizbowl features in Word:

Spell checking (depending on the editor you use)
Line counting (i.e. how many lines in Word with 1 inch margins does this question take up?)

These aren't deal breakers, but they can certainly make the process more annoying.

fleurdelivre · Post by **fleurdelivre** » Mon May 19, 2008 6:08 pm

ezubaric wrote:The take-home lesson of this is that any proposed system must not impose any additional burdens on the end-user. Even if it makes life much easier for editors, it won't get adopted unless it also somehow helps the people that will have to interact with it.

grapesmoker wrote:Obviously in the end we'd like to end up with something that is intuitive for the end user and avoids any need to manually do markups. I recognize the shortcomings of my algorithm, which is why at the moment it's only useful for importing compiled packets.

Amen. Luddites, take note: if we promise not to make your lives any harder, can we please, please try to build a clean system that will allow you to communicate with editors, collaborate on packets and/or store them in an easily-searchable database? None of these things will happen quickly, and it will certainly be some time and effort before we could have a comprehensive system running. But the whole point of any computer-based system is to improve the process, so have a little faith and give our insanity the benefit of a doubt. The worst that happens is it gets widely rejected and the

Matt Weiner wrote:"you should use Linux/emacs/a magnetized needle that you manually change bits in your RAM with"

people become the brunt of yet another round of jokes at your expense.

NoahMinkCHS · Post by **NoahMinkCHS** » Mon May 19, 2008 10:25 pm

Matt Weiner wrote:
NoahMinkCHS wrote:Also, for the self-described Luddites: AFT is actually less newfangled than Word. It's platform-independent plain text! Beyond robot readers and databases, there are plenty of reasons to eschew a proprietary, error-laden file format that only complicates the writing, submission, editing, formatting, and reading steps by making dumb guesses about what the writer might be trying to do. Plain text has a much lower "ZOMG fear new technology!" factor than Word. I've never edited anything more advanced than a house-written HS tournament or my team's packet for a submission tournament and even I've seen a multitude of ways Word makes life harder. Now that we have to deal with DOCX format from Word 2007, not to mention people submitting files that were created in all 103 previous versions of Word, plus OpenOffice, plus Mac versions, that's only going to get worse. Meanwhile, everybody has Notepad...
Please remember to avoid the inherent fallacy of all "you should use Linux/emacs/a magnetized needle that you manually change bits in your RAM with" arguments, and do not assume that everyone is coming into this from the state of nature. People already know how to use Word, and anything else is something they have to learn. It's not an all-things-equal situation.

This is a good argument against LaTeX or XML, but I'm still not sure how AFT is any more complicated than ACF-formatted Word (which is also something that has to be learned, for teams and players that have not done packet-sub in the past). I'm sympathetic to Mike's concerns re: spell check and whatnot, but you could, in fact, use Word for all your typing (taking advantage of red underlines and all!) and then just copy that into an email or Notepad.

It's interesting that while bolded/underlined answers is now the paradigm that "everyone already knows", it obviously hasn't always been the case. In fact, packets once looked like this... which to me looks a lot like Jerry's AFT example. (Which, for those worried about aesthetics, seems like it can easily be transformed into the bold/underline text we all know and love... but can also have other things done to it that Word text cannot.)

dschafer · Post by **dschafer** » Mon May 19, 2008 10:38 pm

grapesmoker wrote:
dschafer wrote:We develop a universally agreed-upon XML quizbowl schema, probably similar to or based on the XML output from xmlize.pl. We use that script (or a variant) to convert current or old Word packets to this XML format. Additionally, we develop a standard plain-text markup system for those of us that would prefer that, and create a script to convert the plain-text to the XML format. Basically, any packet could be easily converted into this standard XML format.
Much of this work is already done (though I wouldn't be so presumptuous as to suggest that my particular approach is "universally agreed-upon"). What really needs to happen, and hasn't yet, is a single script that transforms a packet into XML end-to-end and maybe even contacts the server to upload the packet, to make it easier for people to work with.

Can you create a place (on the qbwiki, maybe) where you outline the XML schema you use? The one thing that has been puzzling me is how to deal with 30-20-10, "5 for one, 10 for two..." and "FFPE and a five point bonus" type bonuses, which (though they may be uncommon), should still be addressable by the XML format.

Sen. Estes Kefauver (D-TN) · Mon May 19, 2008 11:07 pm

Well the solution is to not write those kinds of bonuses.

grapesmoker · Post by **grapesmoker** » Mon May 19, 2008 11:24 pm

dschafer wrote:Can you create a place (on the qbwiki, maybe) where you outline the XML schema you use? The one thing that has been puzzling me is how to deal with 30-20-10, "5 for one, 10 for two..." and "FFPE and a five point bonus" type bonuses, which (though they may be uncommon), should still be addressable by the XML format.

I'll post something on the wiki. As for your question, I don't have a good way to deal with the first and last examples, but then again I don't think they're great questions to write (and aren't actually being written anymore). The middle one is just handled by making the total point value 10, and then writing the conditional in the question. Like this:

[10] For five points for one, and ten for two, identify Jerry Vinokurov's two least favorite pieces of Microsoft software.

edit: Quick rundown of QBML now available for those interested.

dschafer · Post by **dschafer** » Tue May 20, 2008 12:10 am

Believe me, I don't like those bonuses either; however, a quizbowl XML schema should be able to handle them. They certainly exist in archived packets, and could potentially exist in the future as well, so the schema would be incomplete if it was incapable of describing them.

Edit: Can I suggest that "QBML" not be used as the name, as this is the same acronym used by NAQT (according to their new writer's kit) and thus might lead to confusion.

BuzzerZen · Post by **BuzzerZen** » Tue May 20, 2008 12:42 am

One piece of software that I suspect would be interesting to some here is Prince, which transforms X(HT)ML into PostScript/PDF documents using ordinary CSS with some extensions. So it should be possible to transform QBML into PDF packets without using LaTeX as an intermediary. It's proprietary software, and use on a server technically requires an expensive server license, and the free version embeds a watermark on the first page of every document...on the other hand, it's very nifty.

grapesmoker · Post by **grapesmoker** » Tue May 20, 2008 10:47 am

BuzzerZen wrote:One piece of software that I suspect would be interesting to some here is Prince, which transforms X(HT)ML into PostScript/PDF documents using ordinary CSS with some extensions. So it should be possible to transform QBML into PDF packets without using LaTeX as an intermediary. It's proprietary software, and use on a server technically requires an expensive server license, and the free version embeds a watermark on the first page of every document...on the other hand, it's very nifty.

Why bother with this? Writing a script to convert XML to LaTeX is trivial; any web-based tool that's going to have an "export to PDF" option will need it anyway. I'd like to avoid relying on proprietary software if possible.
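For what it's worth, a conversion along these lines does fit in a few lines. This sketch assumes a `<tossup>` element holding a `<clue>` and an `<answer>` whose required parts are wrapped in `<req>`; that nesting is a guess rather than a published schema, and the LaTeX layout (bold and underlined required parts, a bare `ANSWER:` line) is likewise only illustrative.

```python
import xml.etree.ElementTree as ET

# Characters that need escaping in ordinary LaTeX text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape(text):
    """Escape LaTeX special characters one at a time; None becomes an empty string."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in (text or ""))


def answer_to_latex(answer):
    """Render an <answer> element, bolding and underlining each <req> part."""
    parts = [escape(answer.text)]
    for child in answer:
        body = escape("".join(child.itertext()))
        parts.append(r"\underline{\textbf{%s}}" % body if child.tag == "req" else body)
        parts.append(escape(child.tail))
    return "".join(parts).strip()


def tossup_to_latex(tossup):
    """Render one <tossup> as clue text, a LaTeX line break, and an ANSWER line."""
    clue = escape("".join(tossup.find("clue").itertext())).strip()
    return "%s\\\\\nANSWER: %s\n" % (clue, answer_to_latex(tossup.find("answer")))
```

A matching .cls file would still be needed to control margins and numbering; the script only produces the body text.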

dschafer wrote:Edit: Can I suggest that "QBML" not be used as the name, as this is the same acronym used by NAQT (according to their new writer's kit) and thus might lead to confusion.

It's a nice abbreviation, and I don't think there's really going to be any confusion, since no one is planning to appropriate NAQT's format anyway. If you have a better acronym, I'm happy to switch to that, though.

ezubaric · Post by **ezubaric** » Tue May 20, 2008 11:48 am

grapesmoker wrote:edit: Quick rundown of QBML now available for those interested.

Nice.

Some more things that we had in Jerome (obviously more geared toward editing):

1. author for each question
2. editor for each question
3. status of each question (newly submitted, needs editing, finished)
4. category of each question

Also, why are questions within tossup and bonus tags? Why not just have a type for each? It would make submitting questions easier (if you have four people, you could just copy and paste submissions into a single file rather than separating out tossups and bonuses).

grapesmoker · Post by **grapesmoker** » Tue May 20, 2008 12:22 pm

ezubaric wrote:
grapesmoker wrote:edit: Quick rundown of QBML now available for those interested.
Nice.

Some more things that we had in Jerome (obviously more geared toward editing):

1. author for each question
2. editor for each question
3. status of each question (newly submitted, needs editing, finished)
4. category of each question

All points well made. I want to emphasize that this was something I more or less came up with on the fly; it wasn't the result of sitting down and figuring out a full-featured style for all contingencies. I would agree that all of those things should be added to questions.

ezubaric wrote:Also, why are questions within tossup and bonus tags? Why not just have a type for each? It would make submitting questions easier (if you have four people, you could just copy and paste submissions into a single file rather than separating out tossups and bonuses).

Can you give an example of what you mean? I'm not sure I totally understand this.

dschafer · Post by **dschafer** » Tue May 20, 2008 2:30 pm

So I've been working on a fairly comprehensive XML structure to describe tournaments/packets/questions. It's rather complicated when dealing with the weird bonus cases (five for one, ten for two, twenty for three, thirty for all four) and complicated answer lines (accept foo, don't accept bar, prompt on foobar), but standard tossups and FTPE bonuses are fairly easily described. I'm using "CQML" (Comprehensive Quizbowl Markup Language) as a placeholder name.

I've put an example CQML file on the wiki.

To avoid any confusion: I am not advocating questions be written in this format; that would be foolish. Similarly, questions should not be read in this format either. This is designed to be very easily machine-readable and machine-writable, and to unambiguously describe all of the weird quizbowl quirks current packets contain. Ideally, this would be a stopping point on the way to other, cooler computerized things: we would have one master script that would convert any packet in Word format to CQML, then IRC bots / packet archives / TeX translators / robot question readers / etc. would take the CQML file as input, since it is far more machine readable.

Thoughts?

grapesmoker · Post by **grapesmoker** » Tue May 20, 2008 3:17 pm

Dan, that's a pretty neat structure. Here's my one concern: recognizing weird bonus patterns is hard to do automatically. There are any number of ways that people would indicate this in the past, and we'd have to account for all these variants. I'm not saying it's impossible, but it does present an annoying problem. Concentrating our efforts on the better packets from the last 7 or 8 years would probably be more productive than bending over backwards to make sure Tennessee Masters 1994 makes it into an archive.

dschafer · Post by **dschafer** » Tue May 20, 2008 5:12 pm

I agree that parsing "thirty for all four" type bonuses would be rather hard (and not really that critical), but I wanted to make sure the spec supported it.

I just added power mark support; <power value="15">57</power>, where the number on the inside is the number of characters into the question the power is available for. Are there any other common features of a packet that the structure should support?

Sen. Estes Kefauver (D-TN) · Tue May 20, 2008 5:33 pm

Does the power marker only take into account the * symbol, the bolded text, or both? I ask because there are lots of packets that used powers where they are only indicated by bold text without the asterisk.

dschafer · Post by **dschafer** » Tue May 20, 2008 5:51 pm

I would assume the (*) marker is not included in the question content. In general, though, it would just be the number of characters in the tossup text from <tossup>, once all tags are stripped.
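As a rough illustration of that counting rule, the sketch below takes tossup text that still contains a "(*)" marker, removes the marker, and reports how many characters precede it. The function name `split_power` and the whitespace handling around the marker are assumptions; packets that indicate power only with bold text would need a different approach.

```python
POWER_MARK = "(*)"


def split_power(tossup_text):
    """Return (clean_text, offset), where offset counts the characters before the power mark.

    offset is None when the text carries no "(*)" marker.
    """
    index = tossup_text.find(POWER_MARK)
    if index == -1:
        return tossup_text, None
    before = tossup_text[:index].rstrip()
    after = tossup_text[index + len(POWER_MARK):].lstrip()
    # Rejoin the two halves with a single space, skipping an empty half.
    clean = " ".join(part for part in (before, after) if part)
    return clean, len(before)
```

The returned offset is the number that would go inside the `<power value="15">…</power>` element.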

ezubaric · Post by **ezubaric** » Tue May 20, 2008 9:01 pm

grapesmoker wrote:Can you give an example of what you mean? I'm not sure I totally understand this.

Okay, so suppose Jack has a packet with <Tossups>Q1 Q2 Q3 Q4</Tossups> <Bonuses>Q5 Q6</Bonuses> and Jill has a packet with <Tossups>Q_a Q_b</Tossups> <Bonuses>Q_c Q_d Q_e Q_f</Bonuses>. To combine them into a submission, you would have to copy and paste the tossups and bonuses into the appropriate subsection rather than just concatenating the files together.

A minor issue, to be sure, and it would only be an issue if people were using the XML for the submission, or editors would be working directly with the XML.
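For what it's worth, the merge does not have to be a manual copy and paste. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<Packet>` root and the `<Q>` question elements are hypothetical stand-ins, and only `<Tossups>` and `<Bonuses>` come from the example above.

```python
import xml.etree.ElementTree as ET


def merge_packets(*packet_xml):
    """Combine several packets into one, keeping tossups and bonuses apart."""
    merged = ET.Element("Packet")  # hypothetical root element
    tossups = ET.SubElement(merged, "Tossups")
    bonuses = ET.SubElement(merged, "Bonuses")
    for xml_text in packet_xml:
        root = ET.fromstring(xml_text)
        for section, target in (("Tossups", tossups), ("Bonuses", bonuses)):
            for part in root.findall(section):
                # Move every question of this section into the merged section.
                target.extend(list(part))
    return merged


jack = ("<Packet><Tossups><Q>Q1</Q><Q>Q2</Q></Tossups>"
        "<Bonuses><Q>Q5</Q></Bonuses></Packet>")
jill = ("<Packet><Tossups><Q>Q_a</Q></Tossups>"
        "<Bonuses><Q>Q_c</Q><Q>Q_d</Q></Bonuses></Packet>")
merged = merge_packets(jack, jill)
merged_tossups = [q.text for q in merged.find("Tossups")]
merged_bonuses = [q.text for q in merged.find("Bonuses")]
```

Plain file concatenation still would not work, but a few lines like these would cover the submission case.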

ezubaric · Post by **ezubaric** » Tue May 20, 2008 9:05 pm

dschafer wrote:I've put an example CQML file on the wiki.

Thoughts?

How would "ANSWER: _J_ohn Fitzgerald _Kennedy_" be encoded?

dschafer · Post by **dschafer** » Tue May 20, 2008 10:30 pm

ezubaric wrote:
dschafer wrote:I've put an example CQML file on the wiki.

Thoughts?
How would "ANSWER: _J_ohn Fitzgerald _Kennedy_" be encoded?

Code: Select all

<answer><req>J</req>ohn Fitzgerald <req>Kennedy</req></answer>
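As a rough illustration of how a consumer could turn that encoding back into the familiar underlined form, here is a small Python sketch. The function name `to_underscored` is made up for this example, and it assumes `<req>` elements are direct children of `<answer>`.

```python
import xml.etree.ElementTree as ET


def to_underscored(answer_xml):
    """Render an <answer> element as text, wrapping <req> parts in underscores."""
    root = ET.fromstring(answer_xml)
    parts = [root.text or ""]
    for child in root:
        inner = "".join(child.itertext())
        parts.append("_" + inner + "_" if child.tag == "req" else inner)
        # Text that follows the child element, up to the next tag.
        parts.append(child.tail or "")
    return "".join(parts)


kennedy = "<answer><req>J</req>ohn Fitzgerald <req>Kennedy</req></answer>"
rendered = to_underscored(kennedy)
```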

EDIT:

buzzerzen suggested this power mark system, which I think I like better:

Code: Select all

<clue>FAQTP, identify this curved <power value="15" />yellow fruit.</clue>
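With the empty-element marker, a reader of the file no longer needs a character count; the split falls out of the element's position. A sketch in Python, assuming the marker is a direct child of `<clue>` and that no other tags precede it (`split_clue` is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET


def split_clue(clue_xml):
    """Return (power value or None, text before the marker, text after it)."""
    clue = ET.fromstring(clue_xml)
    marker = clue.find("power")
    if marker is None:
        # No power in this clue: everything counts as regular text.
        return None, "".join(clue.itertext()), ""
    # clue.text is the text before the first child; marker.tail follows it.
    return int(marker.get("value")), clue.text or "", marker.tail or ""


clue = '<clue>FAQTP, identify this curved <power value="15" />yellow fruit.</clue>'
value, before, after = split_clue(clue)
```

One practical difference from the offset version: editing the wording of the clue cannot silently move the power mark.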

Sir Thopas · Post by **Sir Thopas** » Tue May 20, 2008 10:39 pm

What about _Philip_ _II_ _of Macedon_, where either Philip II or Philip of Macedon is acceptable?

dschafer · Post by **dschafer** » Tue May 20, 2008 10:56 pm

metsfan001 wrote:What about _Philip_ _II_ _of Macedon_, where either Philip II or Philip of Macedon is acceptable?

Probably this:

Code: Select all

<answer>
   <req>Philip II</req> of Macedon
   <accept><req>Philip</req> II of <req>Macedon</req></accept>
   <prompt>Philip</prompt>
</answer>

grapesmoker · Post by **grapesmoker** » Tue May 20, 2008 11:05 pm

There's no real need to have the "accept" portion be a separate tag; this needlessly complicates parsing. It's enough to just specify the alternately acceptable answer and mark the parts that are acceptable with the <req> tag. It's what we do now anyway and I'm not sure there's need for a special "accept" or "prompt" tag.

dschafer · Post by **dschafer** » Tue May 20, 2008 11:12 pm

grapesmoker wrote:There's no real need to have the "accept" portion be a separate tag; this needlessly complicates parsing. It's enough to just specify the alternately acceptable answer and mark the parts that are acceptable with the <req> tag. It's what we do now anyway and I'm not sure there's need for a special "accept" or "prompt" tag.

I'm not sure I follow; how would you do Philip II?

That is probably an extraneous use of the <prompt> tag, as anything using the XML should know to prompt if only part of the required answer is given. A better example, where the <prompt> tag would actually be needed, is:

Code: Select all

<answer>
   <req>Arsenal</req> F.C.
   <prompt>Gunners</prompt>
</answer>

grapesmoker · Post by **grapesmoker** » Wed May 21, 2008 10:57 am

dschafer wrote:I'm not sure I follow; how would you do Philip II?

I would just write:

Code: Select all

<answer>
   <req>Philip II</req> or <req>Philip of Macedon</req>
</answer>

and likewise for your second example:

Code: Select all

<answer>
   <req>Arsenal</req> F.C. (prompt on <req>Gunners</req>)
</answer>

This is exactly what we do now anyway, and I don't see any advantage to having prompt and accept tags. It just adds extra complexity to the process without an added benefit.

dschafer · Post by **dschafer** » Wed May 21, 2008 11:18 am

Okay, I see. The benefit of those extra answer tags would be for an IRC bot (or something similar) using the CQML as its question source; with <accept>,<prompt> and <dna> tags, the bot can easily figure out what to do with a given answer, without having to do any sort of non-XML parsing.
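To make the bot argument concrete, here is a deliberately naive Python sketch of how a bot might judge a response using only the tags. Several things here are assumptions rather than part of the spec: the matching rule (case-insensitive substring match on every `<req>` fragment), the reading of `<dna>` as "do not accept", and the names `judge`, `PHILIP` and `ARSENAL`.

```python
import xml.etree.ElementTree as ET

PHILIP = """<answer>
   <req>Philip II</req> of Macedon
   <accept><req>Philip</req> II of <req>Macedon</req></accept>
   <prompt>Philip</prompt>
</answer>"""

ARSENAL = """<answer>
   <req>Arsenal</req> F.C.
   <prompt>Gunners</prompt>
</answer>"""


def _norm(text):
    """Lowercase and collapse whitespace so comparisons are forgiving."""
    return " ".join(text.lower().split())


def _text(elem):
    return _norm("".join(elem.itertext()))


def judge(answer_xml, response):
    """Return "correct", "prompt" or "wrong" for a player's response."""
    answer = ET.fromstring(answer_xml)
    given = _norm(response)
    # Assumed meaning of <dna>: an explicitly rejected answer.
    if any(_text(d) == given for d in answer.findall("dna")):
        return "wrong"
    # The primary answer and each <accept> alternative are judged alike:
    # every <req> fragment that is a direct child must appear in the response.
    for candidate in [answer] + answer.findall("accept"):
        required = [_text(r) for r in candidate.findall("req")]
        if required and all(part in given for part in required):
            return "correct"
    if any(_text(p) == given for p in answer.findall("prompt")):
        return "prompt"
    return "wrong"
```

Substring matching is far too loose for real play (it would accept "Philip III", for instance), but it shows that no free-text parsing of the answer line is needed once the tags exist.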

grapesmoker · Post by **grapesmoker** » Wed May 21, 2008 12:47 pm

dschafer wrote:Okay, I see. The benefit of those extra answer tags would be for an IRC bot (or something similar) using the CQML as its question source; with <accept>,<prompt> and <dna> tags, the bot can easily figure out what to do with a given answer, without having to do any sort of non-XML parsing.

Ah, that makes sense. It might be tough to work out how to automatically capture prompts and alternate answers from text, since, again, people are not consistent in how they indicate those parts.

BuzzerZen · Post by **BuzzerZen** » Wed May 21, 2008 1:23 pm

grapesmoker wrote:
dschafer wrote:Okay, I see. The benefit of those extra answer tags would be for an IRC bot (or something similar) using the CQML as its question source; with <accept>,<prompt> and <dna> tags, the bot can easily figure out what to do with a given answer, without having to do any sort of non-XML parsing.
Ah, that makes sense. It might be tough to work out how to automatically capture prompts and alternate answers from text, since, again, people are not consistent in how they indicate those parts.

I think a sensible tack to take would be to, in general, just put everything into an answer line, and trust people who are competent and interested enough to write IRC bots to be competent and interested enough to convert human-readable answer lines into machine-interpretable ones.

grapesmoker · Post by **grapesmoker** » Wed May 21, 2008 2:13 pm

BuzzerZen wrote:I think a sensible tack to take would be to, in general, just put everything into an answer line, and trust people who are competent and interested enough to write IRC bots to be competent and interested enough to convert human-readable answer lines into machine-interpretable ones.

A good solution.

ezubaric · Post by **ezubaric** » Wed May 21, 2008 3:58 pm

dschafer wrote:Okay, I see. The benefit of those extra answer tags would be for an IRC bot (or something similar) using the CQML as its question source; with <accept>,<prompt> and <dna> tags, the bot can easily figure out what to do with a given answer, without having to do any sort of non-XML parsing.

Another nice thing about having really pedantic answer tags would be that we could be sure that, for example, in searching for an answer while editing or for research, we'll get all of the relevant answers. But that's too much to expect of users; we should make it as simple as possible for the base case but still have the flexibility to be pedantic enough for IRC bots, etc.

Since we don't quite have the kitchen sink in there, another nice markup would be in the question itself; imagine if all of the novels, place names, people, dates, etc. were marked in a question. Then you could easily do a search for "ALL PEOPLE MENTIONED IN QUESTIONS ABOUT MANIFEST DESTINY, ORDERED BY FREQUENCY". That would be awesome. Again, I'm not saying that we should make this the standard, but having the schema support this kind of thing would be cool (and make checking for repeats easier if it did exist ... it could even be done semi-automatically, as NE recognition is getting pretty good).

grapesmoker · Post by **grapesmoker** » Wed May 21, 2008 4:11 pm

ezubaric wrote:Since we don't quite have the kitchen sink in there, another nice markup would be in the question itself; imagine if all of the novels, place names, people, dates, etc. were marked in a question. Then you could easily do a search for "ALL PEOPLE MENTIONED IN QUESTIONS ABOUT MANIFEST DESTINY, ORDERED BY FREQUENCY". That would be awesome. Again, I'm not saying that we should make this the standard, but having the schema support this kind of thing would be cool (and make checking for repeats easier if it did exist ... it could even be done semi-automatically, as NE recognition is getting pretty good).

Not sure what the utility of such a search would be. You can already find every question on Manifest Destiny in the database; what would be the point of having proper names marked up specially? We could certainly throw that in at some further point if we're feeling adventurous, I guess.

ezubaric · Post by **ezubaric** » Wed May 21, 2008 4:41 pm

grapesmoker wrote:Not sure what the utility of such a search would be. You can already find every question on Manifest Destiny in the database; what would be the point of having proper names marked up specially? We could certainly throw that in at some further point if we're feeling adventurous, I guess.

Well, seeing that 74% of questions on Manifest Destiny mention Horace Greeley without having to read through all of them would be cool.

fleurdelivre · Post by **fleurdelivre** » Wed May 21, 2008 5:02 pm

ezubaric wrote:Well, seeing that 74% of questions on Manifest Destiny mention Horace Greeley without having to read through all of them would be cool.

Isn't that when you just query the results of the first and do some quick arithmetic, though? I mean, it would be cool, but if we don't start with something practical/simple this will never get off the ground.

ezubaric · Post by **ezubaric** » Wed May 21, 2008 5:45 pm

fleurdelivre wrote:
ezubaric wrote:Well, seeing that 74% of questions on Manifest Destiny mention Horace Greeley without having to read through all of them would be cool.
Isn't that when you just query the results of the first and do some quick arithmetic, though? I mean, it would be cool, but if we don't start with something practical/simple this will never get off the ground.

Well, what's a brainstorming session without feature creep? :)

But really, if the power mark annotation were flexible enough, this could be used like so:

Code: Select all

In an <ANNOTATION TYPE="POWER">editorial</ANNOTATION>, <ANNOTATION TYPE="PERSON">Horace Greeley</ANNOTATION>'s paper ...

So rather than have a specific power tag, we could just have a general "ANNOTATION" that could denote special properties of individual phrases in the text. People could just throw away the annotations they didn't care about. This would allow for flagging other stuff like adult language, meta, etc.
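A quick Python sketch of how a consumer might use such annotations: pull out one type, throw the rest away, or count people across questions. The `<clue>` wrapper and the function names are assumptions made for the example; only the `ANNOTATION` element and its `TYPE` attribute come from the snippet above, and the second sample question is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def annotated(question_xml, wanted_type):
    """List the phrases carrying an ANNOTATION of the given TYPE."""
    root = ET.fromstring(question_xml)
    return ["".join(a.itertext())
            for a in root.iter("ANNOTATION")
            if a.get("TYPE") == wanted_type]


def plain_text(question_xml):
    """Discard every annotation and keep only the readable text."""
    return "".join(ET.fromstring(question_xml).itertext())


def people_by_frequency(questions):
    """Count PERSON annotations across questions, most frequent first."""
    counts = Counter()
    for question in questions:
        counts.update(annotated(question, "PERSON"))
    return counts.most_common()


q1 = ('<clue>In an <ANNOTATION TYPE="POWER">editorial</ANNOTATION>, '
      '<ANNOTATION TYPE="PERSON">Horace Greeley</ANNOTATION>\'s paper ...</clue>')
q2 = ('<clue><ANNOTATION TYPE="PERSON">Horace Greeley</ANNOTATION> and '
      '<ANNOTATION TYPE="PERSON">James K. Polk</ANNOTATION> ...</clue>')
```

A program that does not care about annotations only needs `plain_text`, which is the "throw away what you don't want" case.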

Matt Weiner · Post by **Matt Weiner** » Wed May 21, 2008 6:18 pm

You could implement that as a second session of tinkering with the packets when they are added to the database. I've already talked to Jerry about adding a tagging system that would let trusted users mark questions as "history" "social science" "geography" etc, and then again by subcategory. You could mark out people, places, years, and whatnot in the same way.

dschafer · Post by **dschafer** » Wed May 21, 2008 7:00 pm

ezubaric wrote:
Code: Select all
In an <ANNOTATION TYPE="POWER">editorial</ANNOTATION>, <ANNOTATION TYPE="PERSON">Horace Greeley</ANNOTATION>'s paper ...
So rather than have a specific power tag, we could just have a general "ANNOTATION" that could denote special properties of individual phrases in the text. People could just throw away the annotations they didn't care about. This would allow for flagging other stuff like adult language, meta, etc.

I really, really like the idea of the <annotation> tag. I'll be adding that soon. I think powers should still be separate from annotations, as the two are rather different concepts (one affects gameplay, the other is more metadata).

BuzzerZen wrote:{quotes removed for brevity}
I think a sensible tack to take would be to, in general, just put everything into an answer line, and trust people who are competent and interested enough to write IRC bots to be competent and interested enough to convert human-readable answer lines into machine-interpretable ones.

It seems to me that this would defeat the point of converting it to XML first. One main objective of the XML spec is to avoid duplicating work; there have got to be at least five different "Word document packet to {something}" parsers out there, and their authors had to completely reimplement the algorithms needed to parse a packet. The human-readability parsing of any part of the packet, be it formatting, answer lines, or anything else, is certainly going to have to be done somewhere; the XML can be the end result of that parsing. This would allow us to have a single, global "Word document to CQML" converter that deals with all aspects of human readability and that anybody can use. That converter could then worry about human readability, so the designers of IRC bots / question database / robot moderators wouldn't have to (since they could take CQML as input).
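As a toy illustration of one step such a converter would own, the Python sketch below turns an underscore-marked answer line into the `<req>` encoding shown earlier in the thread. It is only a sketch under stated assumptions: the function name `answer_line_to_cqml` is hypothetical, it handles nothing but paired underscores, and real packets mark required parts in many other ways.

```python
import re
from xml.sax.saxutils import escape


def answer_line_to_cqml(line):
    """Convert 'ANSWER: _J_ohn ...' into '<answer><req>J</req>ohn ...</answer>'."""
    # Drop the "ANSWER:" label if present; otherwise use the whole line.
    body = line.split("ANSWER:", 1)[-1].strip()
    # Escape &, < and > first so the only markup in the result is our own.
    safe = escape(body)
    tagged = re.sub(r"_([^_]+)_", r"<req>\1</req>", safe)
    return "<answer>" + tagged + "</answer>"


converted = answer_line_to_cqml("ANSWER: _J_ohn Fitzgerald _Kennedy_")
```

Writing this once in a shared converter is the point: each downstream tool then reads the tags instead of re-deriving them from underscores.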

Matt Weiner wrote:You could implement that as a second session of tinkering with the packets when they are added to the database. I've already talked to Jerry about adding a tagging system that would let trusted users mark questions as "history" "social science" "geography" etc, and then again by subcategory. You could mark out people, places, years, and whatnot in the same way.

Agreed. The <annotation> tag would certainly be optional, and could definitely be added in a second pass.

ezubaric · Post by **ezubaric** » Wed May 21, 2008 8:48 pm

Matt Weiner wrote:I've already talked to Jerry about adding a tagging system that would let trusted users mark questions as "history" "social science" "geography" etc, and then again by subcategory.

Well, categories are question-specific, while the annotations being discussed here are applied to elements *within* a question. (OFF TOPIC: As I've said elsewhere, I think the first pass of such a categorization of QBDB questions should be done automatically; better to have humans correct the 10% of errors than have humans only categorize 10% of the database.)

Matt Weiner wrote:You could mark out people, places, years, and whatnot in the same way.

I think everybody agrees that annotations should be optional and would likely be done as post-processing either by editors (who want to keep track of specific things appearing in questions) or by interested people with too much time on their hands.