ABNF Parser and Verifier
Motivation
The Augmented
BNF is used to specify the syntax of a number of very common and
emerging text-based Internet protocols:
- SMTP (Internet mail)
- POP3 (Internet mail)
- IMAP (Internet mail)
- HTTP (WWW)
- RTSP (streaming multimedia)
- SIP (Internet telephony signaling)
A typical example for a BNF (here, for email) is
message-id = "Message-ID:" identifier CRLF ; a comment
in-reply-to = "In-Reply-To:" identifier CRLF
references = "References:" identifier *(*CFWS identifier) CRLF
identifier = "<" addr-spec ">"
ABNF extends context-free grammars (BNFs) by allowing repetitions of
an element or a group of elements within a rule, possibly restricted in
number. It also expresses optional elements more concisely by allowing
bracketed elements, e.g., [foo] is equivalent to
*1(foo). Besides, the syntax differs somewhat from, e.g.,
yacc.
Unfortunately, there are currently no tools that parse a BNF to check
whether a definition is syntactically accurate or complete. Also, each
message parser is implemented by hand.
Specification
The project consists of three (separable) parts:
- Parse ABNF descriptions with grammars as in ABNF
so that specifications can be checked for syntactic correctness and
completeness. Allow BNF elements to be out of order (use before
definition). Provide for file inclusion to simplify use of standard
definitions. Possibly accomodate slight variations found (e.g.) in RFC
822 (email) and RFC 2068 (HTTP). Allow both / and | for alternatives.
Support the notation "#1 identifier" to indicate a
comma-separated list of identifier. Input is a the ABNF
description found in the protocol specification, output are error
indications.
- Check text strings (e.g., SMTP headers) to see if they conform to a
given grammar and possibly show their production. You only have to
process the protocol headers, not the whole message. Your program should
read the headers from stdin, but there should be a function internally
that processes the headers as a single string.
- Build a parser generator that can create run-time parser that parse
messages based on a given ABNF specification.
Note: this project can be split into several (e.g., parsing/checking
and translation).
Investigate whether lex and/or YACC can be made to fit some of these
tasks, e.g., by translating an ABNF specification into something
understandable by these tools.
Environment
I prefer that the project be done in Java (command-line mode), but this
is not a requirement.
Prerequisites
Knowledge of compilers and parsing. Knowledge of protocols not
required.
Last modified: 1997-09-21 by Henning Schulzrinne