(parse 'Colorless green ideas sleep furiously')
Today Stacey and I were pair programming on a parser for requests to an IRC server. We started with this and this. Using the instaparser library, we ended up with this:
(def request-parser
(insta/parser "message = prefix? command params?
prefix = <':'> word <SPACE+>
command = word | number number number
params = middle? trailing?
middle = (<SPACE+> word)+
trailing = (<SPACE+> <':'> #'[a-zA-Z0-9 ]+')
word = #'[a-zA-z]+'
number = #'\\d'
<SPACE> = #' '"))
If you take a look at the IRC RFCs, you'll immediately see that there are some differences and discrepancies. For example, the issue of nick and message lengths are left untouched by this grammar.
As Stacey and I developed the parser, we began to wonder where certain errors like "invalid nick, length > 9" would be thrown or handled. Specifically, should these be parsing errors? Generally, what parts of the program's logic should be handled by this grammar?
Somehow I recalled Chomsky's famous line "Colorless green ideas sleep
furiously" which is a sentence (grammatically correct, or in our CS context,
it parses), but meaningless. This way of thinking about grammar applies here
too. The IRC command FOOBAR
is grammatically correct but meaningless. Errors
for meaningless commands should certainly be handled, but not in the grammar.
I went to the wikipedia page on "Colorless green ideas sleep furiously," saw this:
and was enlightened.
Postscript
That tree resembles a parse tree. If I didn't know any better, I would say that it is a parse tree.
Happy Halloween!