.NET based lexers

Coordinator
Oct 24, 2009 at 12:18 AM

I went through the exercise of creating a Scintilla lexer in C++. Well, not so much created as repurposed another one (the Ada lexer) for my needs. Wow-wee, what a pain! It's not so much the lexing logic as it is the 32,768 different ways strings are handled. I've been programming for almost two-thirds of my life, with C and C++ being the second and third languages I learned, and I still struggle with them.

Besides the whole thing being monumentally difficult for anyone like me who hasn't been a hardcore C++ programmer, there seems to be a Unicode limitation on the lexing classes that read characters from the document. I need to be able to lex specific extended characters in a special way, and I can't figure out how to do so. The Scintilla docs make some mention of this, since pretty much all programming languages limit their significant characters (those outside comments and strings) to the ASCII range.

I'm going with the container lexing approach now, using Jacob's INI lexer as a starting point. I think I can come up with some abstractions that serve as equivalents of the Accessor and StyleContext classes that Scintilla gives to the C++ lexers. Once it reaches the same point as my C++ lexer, I'll compare the two for speed.
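To make the idea concrete, here is a rough sketch of what a StyleContext-style helper for container lexing might look like. It is written in Python only to keep it short; the same shape should port to a .NET class. Everything in it is an assumption made for illustration: the `StyleContext` class here is not Scintilla's, and the `colour_to` callback, the `STYLE_*` numbers, and the `lex` function are hypothetical names. It works on whole characters rather than bytes, so extended characters can be tested directly. Mapping character indexes back to document positions is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical style numbers for this sketch; a real lexer defines its own.
STYLE_DEFAULT, STYLE_COMMENT, STYLE_EXTENDED = 0, 1, 2


class StyleContext:
    """Walks text one character at a time and reports runs of one style."""

    def __init__(self, text: str, start: int, initial_state: int,
                 colour_to: Callable[[int, int, int], None]) -> None:
        self.text = text
        self.pos = start            # index of the current character
        self.run_start = start      # first index that is not styled yet
        self.state = initial_state
        self.colour_to = colour_to  # receives (start, end_exclusive, style)

    @property
    def ch(self) -> str:
        # Empty string once the end of the text has been reached.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def more(self) -> bool:
        return self.pos < len(self.text)

    def forward(self) -> None:
        self.pos += 1

    def set_state(self, new_state: int) -> None:
        # Style everything before the current character, then switch.
        self._flush(self.pos)
        self.state = new_state

    def forward_set_state(self, new_state: int) -> None:
        # Keep the current character in the old run, then switch.
        self.forward()
        self.set_state(new_state)

    def complete(self) -> None:
        # Style whatever is left up to the current position.
        self._flush(self.pos)

    def _flush(self, end: int) -> None:
        if end > self.run_start:
            self.colour_to(self.run_start, end, self.state)
            self.run_start = end


def lex(text: str) -> List[Tuple[int, int, int]]:
    """Toy INI-like lexer: ';' comments plus a style for non-ASCII characters."""
    runs: List[Tuple[int, int, int]] = []
    sc = StyleContext(text, 0, STYLE_DEFAULT,
                      lambda s, e, style: runs.append((s, e, style)))
    while sc.more():
        # First decide whether the current state ends here.
        if sc.state == STYLE_COMMENT:
            if sc.ch == "\n":
                sc.forward_set_state(STYLE_DEFAULT)
                continue
        elif sc.state == STYLE_EXTENDED:
            if ord(sc.ch) < 128:
                sc.set_state(STYLE_DEFAULT)
        # Then decide whether a new state starts here.
        if sc.state == STYLE_DEFAULT:
            if sc.ch == ";":
                sc.set_state(STYLE_COMMENT)
            elif ord(sc.ch) > 127:
                sc.set_state(STYLE_EXTENDED)
        sc.forward()
    sc.complete()
    return runs
```

In a container lexer, the callback is where the styling would actually be applied to the control. Here it just collects `(start, end, style)` tuples so the logic can be checked without an editor attached.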

Anyone have any ideas they'd like to see?