[solved] Extending the HTML-Lexer

Aug 24, 2008 at 12:49 PM


After getting Scintilla to work, I tried to use the HTML lexer. Unfortunately, it does not include the PHP functions, so I modified the Scintilla lexer to accept another keyword list containing them. My code looks something like this:


static const char * const htmlWordListDesc[] = {
"HTML elements and attributes",
"JavaScript keywords",
"VBScript keywords",
"Python keywords",
"PHP keywords",
"PHP keywords (2)",
"SGML and DTD keywords",

For this to work, I had to add another constant to SciLexer.h, named SCE_HPHP_WORD2, with the value 128. After reading the docs, it occurred to me that 127 is the highest style number a lexer can use. So I removed all the Python constants and the corresponding code fragments from the lexer and set SCE_HPHP_WORD2 to 99. It compiles smoothly.
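The style-number limit can be checked at compile time. The following is only a sketch: it assumes the HTML lexer works with 7 style bits, which is consistent with the limit of 127 quoted above. The names kStyleBits, kMaxStyle and FitsStyleRange are made up for illustration and are not part of Scintilla; only SCE_HPHP_WORD2 and its value 99 come from the post.

```cpp
// Assumption: 7 style bits are available to the HTML lexer, giving style
// numbers 0..127. kStyleBits and kMaxStyle are illustrative names.
constexpr int kStyleBits = 7;
constexpr int kMaxStyle = (1 << kStyleBits) - 1;  // 127

// True if a style number can be represented with kStyleBits bits.
constexpr bool FitsStyleRange(int style) {
    return style >= 0 && style <= kMaxStyle;
}

// Value from the post: a number freed by removing the Python styles.
constexpr int SCE_HPHP_WORD2 = 99;

// Fails the build if the new style number does not fit.
static_assert(FitsStyleRange(SCE_HPHP_WORD2),
              "style number must fit into the available style bits");
// The first attempt, 128, would need an eighth bit.
static_assert(!FitsStyleRange(128), "128 is out of range");
```

With a check like this next to the constant, an out-of-range value such as 128 is rejected by the compiler instead of silently producing wrong colors at runtime.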

Back in ScintillaNET, I added another Keywords element to Configuration/Builtin/html.xml:

<Keywords List="5" Name="PHP">defined</Keywords>

(Yeah, "defined" just for testing.)

I also opened Configuration/Builtin/LexerStyleNames/hypertext.txt and added:

PHP.WORD2 = 99

Just to be sure, I copied that file to html.txt.

ScintillaNET now compiles fine as well, but my keyword list is not passed on to Scintilla itself. For testing, I added

WordList* keywords6 = new WordList();
delete keywordlists[5];
keywordlists[5] = keywords6;

in ColouriseHyperTextDoc(). If I use this list as the main PHP keyword list, only the word "defined" is highlighted, so my changes to the lexer are correct. If I instead add this list (keywords6) as an extra list and have it colored with SCE_HPHP_WORD, "defined" turns blue in addition to the other keywords. Great. But I cannot get the keywords from ScintillaNET's html.xml to work.
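The classification step described above (look a word up in the PHP keyword list, then in the extra function list) can be sketched as follows. This is not the code from LexHTML.cxx: std::set stands in for Scintilla's WordList, and PhpStyle, MakeWordList and ClassifyPhpWord are hypothetical names used only for illustration. Case folding is left out for brevity.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Illustrative styles; in the real lexer these would be SCE_HPHP_* numbers.
enum class PhpStyle { Default, Word, Word2 };

// Hypothetical stand-in for a keyword list: splits a whitespace-separated
// string into a set of words.
std::set<std::string> MakeWordList(const std::string& words) {
    std::istringstream in(words);
    std::set<std::string> result;
    std::string word;
    while (in >> word) {
        result.insert(word);
    }
    return result;
}

// Language keywords win over function names; anything else stays default.
PhpStyle ClassifyPhpWord(const std::string& word,
                         const std::set<std::string>& keywords,
                         const std::set<std::string>& functions) {
    if (keywords.count(word) != 0) return PhpStyle::Word;
    if (functions.count(word) != 0) return PhpStyle::Word2;
    return PhpStyle::Default;
}

// Small usage example mirroring the test with "defined" from the post.
void CheckClassifyExample() {
    const std::set<std::string> keywords = MakeWordList("if else while");
    const std::set<std::string> functions = MakeWordList("defined");
    assert(ClassifyPhpWord("defined", keywords, functions) == PhpStyle::Word2);
}
```

The lists are built once and only queried while styling. That is the reason for handing them to the lexer from the outside rather than rebuilding them on every call of the colourise function.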

In addition, I cannot set the style for WORD2. Following the documentation, I used this ScintillaNet.xml:

<?xml version="1.0" encoding="utf-8" ?>
<Language Name="default">
<Style Name="Default" FontName="Consolas"/>
<Language Name="html">

<Indentation TabWidth="4" UseTabs="false"/>


<SubLanguage Name="PHP">
<Style Name="HSTRING" ForeColor="Red" BackColor="White"/>
<Style Name="SIMPLESTRING" ForeColor="Red" BackColor="White"/>
<Style Name="WORD" ForeColor="Blue" BackColor="White"/>
<Style Name="WORD2" ForeColor="#FF00FF" BackColor="White"/>



Changing these values changes the colors in the window, so this is the right place to set them. But "WORD2" is not recognized. :-( When I add Number="99" to the Style element, it works! So setting the style number directly works fine.

So in the end it works, but those two problems (passing the keyword lists and accessing the style names) are a bit annoying. Setting the keyword list directly in the Colourise* function is anything but efficient. Any suggestions?

Aug 24, 2008 at 2:53 PM
Edited Aug 24, 2008 at 3:21 PM
OK, if I specify the new keywords (PHP functions) in my ScintillaNet.xml, it works. The color must still be set with the corresponding style number (99), but at least I can set the function names. :-)
Aug 25, 2008 at 11:46 PM
If you want to use the style name instead of the number, open Configuration/BuiltIn/LexerStyleNames/Html.txt. This is where the mapping between names and numbers is configured. Find style #99 and change its name to WORD2.
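To illustrate what such a name-to-number mapping amounts to, here is a rough sketch of a parser for lines of the form "PHP.WORD2 = 99". The format is inferred only from the single line quoted earlier in this thread. ScintillaNET's real loader is not shown here, and ParseStyleNames and Trim are hypothetical helper names.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Removes leading and trailing whitespace.
static std::string Trim(const std::string& s) {
    const char* whitespace = " \t\r\n";
    const std::string::size_type begin = s.find_first_not_of(whitespace);
    if (begin == std::string::npos) return "";
    const std::string::size_type end = s.find_last_not_of(whitespace);
    return s.substr(begin, end - begin + 1);
}

// Parses "NAME = NUMBER" lines into a name -> style number map.
// Lines without '=' or without a numeric value are skipped.
std::map<std::string, int> ParseStyleNames(const std::string& text) {
    std::map<std::string, int> styles;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string name = Trim(line.substr(0, eq));
        const std::string value = Trim(line.substr(eq + 1));
        if (name.empty() || value.empty()) continue;
        try {
            styles[name] = std::stoi(value);
        } catch (const std::exception&) {
            // Not a number: ignore this line.
        }
    }
    return styles;
}

// Usage example with the line from the post.
void CheckParseExample() {
    const std::map<std::string, int> styles = ParseStyleNames("PHP.WORD2 = 99\n");
    assert(styles.at("PHP.WORD2") == 99);
}
```

If the name is missing from the table, a lookup for WORD2 has nothing to resolve to. That would match the observation that only the explicit style number worked.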

This setup is a bit awkward, and you can't really configure it outside of the built-in configuration (if I remember correctly), because I didn't build in support for either external lexers or changes to the default Scintilla lexers.
Aug 26, 2008 at 11:12 AM
OK, shame on me. The problem was simple: building my project copied the old ScintillaNET.dll (which I had copied into my project directory when I started), so the newly compiled DLL never got to run. Linking the output file from the ScintillaNET project now works fine. :-)