I am from Cambridge Massachusetts US, working on program analysis for some Scala projects. My project uses some nice features of Antlr4 so we want an accurate EBNF grammar in Antlr4 form to build the parser and AST walker first.
I saw several previous attempt from stackoverflow:
I made my own attempt by comparing the Antlr4 grammar shown in the post, and correct it to match the 2.13 spec.
And I took a look of the scala syntax spec, and found something inaccurate there, here is an example:
The " ScalaParse" looks nice. I tested it with some random large scala codes and it says it is parsing it successfully? I assume either there is a recursive decent parser implemented for it or the author used some EBNF grammar to generate the parser? But I dont seem to find a grammar file from this project.
I really need a correct EBNF grammar for Antlr4 since it provides lots of convenient features we are using, like the AST walker and the parse tree visualization. Those things are very helpful for debugging.
ScalaParse basically is nothing but a grammar, as far as I’m aware. While it’s not stated in BNF form, I believe that a collection of FastParse Parsers is mostly isomorphic to a BNF grammar. My guess is that going from ScalaParse to BNF would be a largely mechanical exercise. It does require learning how to read FastParse, though…
I tried to test the EBNF grammar from https://www.scala-lang.org/files/archive/spec/2.13/13-syntax-summary.html in online checker here: https://mdkrajnak.github.io/ebnftest/ but I can’t event get past grammar checking. In “Test Output” I’m constantly getting errors like: “Error parsing grammar specification”. I tried tidying up the grammar (converting the exotic apostrophes and quotation marks into normal ones for example), but I still don’t have working version.
thank you for trying. This is what I am facing. I have provided an example above showing a potential problem of the spec about parsing the import statement.
I guess there’s no single industry standard for EBNF grammar and you need to adjust it to a particular parser generator implementation. Also, the EBNF grammar in Scala documentation is intentionally broken, I think:
lower ::= ‘a’ | … | ‘z’ | ‘_’ // and any character in Unicode category Ll, and and any character in Lo or Ml that has contributory property Other_Lowercase
upper ::= ‘A’ | … | ‘Z’ | ‘$’ // and any character in Unicode category Lu, Lt or Nl, and any character in Lo and Ml that don't have contributory property Other_Lowercase
digit ::= ‘0’ | … | ‘9’
opchar ::= // printableChar not matched by (whiteSpace | upper | lower |
// letter | digit | paren | delim | opchar | Unicode_Sm | Unicode_So)
printableChar ::= // all characters in [\u0020, \u007F] inclusive
How the above is supposed to work without adapting it to particular parser generator implementation?
def While = P( `while` ~/ "(" ~ ExprCtx.Expr ~ ")".! ~ Expr.! )
.map { case ((cond, body)) => new AstWhile(cond, body) } // choose this or below code
.map(AstWhile.tupled) // alternative version for case classes
And of course do such change everywhere through the parsing code for it to make sense.