Prototype Lexer/Highlighter Available

Topics: Developer Forum, Project Management Forum, User Forum
Coordinator
Dec 24, 2008 at 3:13 AM
Just wanted to let everyone know I checked in some code that demonstrates how one would go about writing their own container-based lexer for Scintilla.NET. I frequently see questions on this topic in the forums and so I thought a sample might be helpful. I'm calling it a prototype because I'm sure it's not perfect but it should be a good starting point for those interested.

I've added a new item to the Language menu in SCide called INI which lexes INI text files. When selected, it switches Scintilla into "container" lexer mode, meaning that it doesn't use one of the built-in lexers packaged with SciLexer.DLL but instead raises the StyleNeeded event and allows custom lexing. As far as I know, the container lexer mode is just as capable as having a prebuilt lexer and is the primary (only?) means by which we can add custom lexing capabilities in managed code. Each StyleNeeded event is passed to the sample IniLexer class to style appropriately. I've placed plenty of comments in the code so it should be fairly self-documenting. Changeset 45206 contains everything you need.
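
For readers without the changeset at hand, the following is a minimal sketch of the kind of per-line decision an INI lexer has to make. It is not the IniLexer from the changeset: the IniStyle enum, the StyleRun struct and the Classify method are hypothetical names made up for illustration, and treating both ';' and '#' as comment markers is an assumption about the INI dialect. It has no dependency on Scintilla; in a container lexer each run would be handed to the control's styling calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical style categories; the real sample may number and name them differently.
public enum IniStyle { Default, Comment, Section, Key, Value }

// One run of consecutive characters that share a style.
public struct StyleRun
{
    public int Start;
    public int Length;
    public IniStyle Style;

    public StyleRun(int start, int length, IniStyle style)
    {
        Start = start;
        Length = length;
        Style = style;
    }
}

public static class IniLineSketch
{
    // Splits one line (without its line ending) into style runs.
    public static List<StyleRun> Classify(string line)
    {
        var runs = new List<StyleRun>();
        if (line.Length == 0) return runs;

        // The first non-blank character decides between comment, section and key/value.
        string trimmed = line.TrimStart();
        char first = trimmed.Length > 0 ? trimmed[0] : '\0';

        if (first == ';' || first == '#')
        {
            runs.Add(new StyleRun(0, line.Length, IniStyle.Comment));
        }
        else if (first == '[')
        {
            runs.Add(new StyleRun(0, line.Length, IniStyle.Section));
        }
        else
        {
            int eq = line.IndexOf('=');
            if (eq < 0)
            {
                runs.Add(new StyleRun(0, line.Length, IniStyle.Default));
            }
            else
            {
                // key, then the '=' itself, then the value (if any)
                if (eq > 0) runs.Add(new StyleRun(0, eq, IniStyle.Key));
                runs.Add(new StyleRun(eq, 1, IniStyle.Default));
                if (eq + 1 < line.Length)
                    runs.Add(new StyleRun(eq + 1, line.Length - eq - 1, IniStyle.Value));
            }
        }
        return runs;
    }
}
```

Keeping the classification separate from the control makes it easy to check the lexing logic on plain strings before wiring it to StyleNeeded.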

BTW, there is already a lexer for this that ships with Scintilla but using that would defeat the purpose of the sample code now wouldn't it? :)

In addition to the sample code, I'm interested in reworking the Scintilla.NET API to abstract away the need to switch to container lexer at all. I thought it might be nice to create a base LexerProvider class that acts as a provider for any and all types of syntax files. We could have other providers that would interpret the syntax files for SciTE, Programmer's Notepad, Notepad++ and other Scintilla-based programs. Our own current configuration model would be just another type of provider. Custom lexers would also be able to inherit from this, and because it would use container mode internally, developers could easily create their own lexers by just overriding a couple of methods. I would be interested in any other thoughts on the matter.
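
To make the idea a little more concrete, here is one possible shape for such a base class. This is only a sketch of the proposal, not an existing Scintilla.NET API: IStyleSink, the StyleRange signature, DigitLexerProvider and RecordingSink are all hypothetical names invented for illustration. The sink stands in for whatever would forward styling to the control in container mode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sink for styling output; a real one would wrap the Scintilla control.
public interface IStyleSink
{
    void SetStyle(int start, int length, int style);
}

// Hypothetical base class: the framework would call StyleRange whenever
// the control reports that a stretch of text needs styling.
public abstract class LexerProvider
{
    public abstract string Name { get; }

    // text is the content to style; offset is its position in the document.
    public abstract void StyleRange(string text, int offset, IStyleSink sink);
}

// Example custom lexer: styles runs of digits as 1 and everything else as 0.
public sealed class DigitLexerProvider : LexerProvider
{
    public override string Name { get { return "digits"; } }

    public override void StyleRange(string text, int offset, IStyleSink sink)
    {
        int i = 0;
        while (i < text.Length)
        {
            // Extend the run while the characters stay in the same class.
            bool digit = char.IsDigit(text[i]);
            int j = i;
            while (j < text.Length && char.IsDigit(text[j]) == digit) j++;
            sink.SetStyle(offset + i, j - i, digit ? 1 : 0);
            i = j;
        }
    }
}

// Sink that records calls as "start:length:style", handy for trying a provider out.
public sealed class RecordingSink : IStyleSink
{
    public readonly List<string> Calls = new List<string>();

    public void SetStyle(int start, int length, int style)
    {
        Calls.Add(start + ":" + length + ":" + style);
    }
}
```

With this shape a custom lexer overrides a single method, and a provider that reads SciTE or Notepad++ syntax files would be just another subclass.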


Merry Christmas!
Jacob
Apr 6, 2009 at 11:08 AM
Edited Apr 6, 2009 at 11:09 AM

Hi Jacob,

just used your ini-lexer example to set up my own highlighter. So far no big problems but one smaller issue:

I used the Scintilla control as a kind of output window, and as such it doesn't have the focus while data drops in. In this case it sometimes happens that the lexer isn't called for each line right when it arrives. You get just one call for the whole range, and the sample actually processes only the first line. The rest gets formatted some time later (e.g. when more data comes in or the control gets focus).

Because of this I made a little improvement to process the whole range instead of hoping Scintilla will call us multiple times, and after that change everything works like a charm. :-)

To stay as compatible as possible with your example, just substitute your StyleNeeded() function with the following one:

 

public static void StyleNeeded(Scintilla scintilla, Range range)
{
    Line oneLine = range.StartingLine;

    while (oneLine.Length > 0)
    {
        // Create an instance of our lexer and bada-bing the line!
        OutputLexer lexer = new OutputLexer(scintilla, oneLine.StartPosition, oneLine.Length);
        lexer.Style();

        oneLine = oneLine.Next;
    }
}

 

Best regards,
Oliver

Coordinator
Apr 9, 2009 at 11:13 PM
Thanks OliverM. I've also noticed that Scintilla seems to call (or not call) StyleNeeded at unpredictable times. I'm sure if I put a little more thought into it I could figure out the pattern. Until then, I'm sure others will find your code snippet helpful.

Jacob

Dec 8, 2010 at 3:37 AM
Edited Dec 8, 2010 at 3:51 AM

I know this is an old post, but it's the single best reference I've found for custom highlighting from managed code.

I'd just like to share where I got from your sample code, in case it might help somebody else.

Thanks!

 

using System;
using System.Text;
using System.Drawing;
using System.Text.RegularExpressions;
using ScintillaNet;

namespace PwrIDE
{
  class PwrLexer
  {
    private const int ST_DEFAULT  = 32;
    private const int ST_DATATYPE = 11;
    private const int ST_LITERAL  = 12;
    private const int ST_STRING   = 13;
    private const int ST_SECTION  = 14;
    private const int ST_RECORD   = 15;
    private const int ST_COMMENT  = 16;

    private static Regex RegWhiteSpace   = new Regex(@"^\s+");
    private static Regex RegDataType     = new Regex(@"^(INT|CHAR|RATE|DATE|FLOAT|MONEY|VOID|CONST)[^A-Z0-9]");
    private static Regex RegSection      = new Regex(@"^(GLOBAL|SETUP|SELECT|ORDER|PROCESS|FINAL|PROCEDURE|DO|END)[^A-Z0-9]");
    private static Regex RegRecordA      = new Regex(@"^(ACCOUNT|SHARE|LOAN|CARD|PREFERENCE|NOTE)[^A-Z0-9]");
    private static Regex RegRecordB      = new Regex(@"^(ACCOUNT|SHARE|LOAN|CARD|PREFERENCE|NOTE)(\.[A-Z0-9]+)+");
    private static Regex RegRecordC      = new Regex(@"^(ACCOUNT|SHARE|LOAN|CARD|PREFERENCE|NOTE)(:[A-Z0-9]+)+(\.[A-Z0-9]+)*");
    private static Regex RegLiteralDate  = new Regex(@"^'[0-9\-]{1,2}\/[0-9\-]{1,2}\/[0-9\-]{2,4}'");
    private static Regex RegLiteralMoney = new Regex(@"^\$[0-9,]+\.[0-9]{2}");
    private static Regex RegLiteralRate  = new Regex(@"^[0-9]+\.[0-9]+%");
    private static Regex RegLiteralFloat = new Regex(@"^[0-9]+\.[0-9]+");
    private static Regex RegLiteralInt   = new Regex(@"^[0-9]+");
    private static Regex RegWord         = new Regex(@"^[A-Z][A-Z0-9]+");

    public static void Init(Scintilla scintilla)
    {
      scintilla.Indentation.SmartIndentType = SmartIndent.None;
      scintilla.ConfigurationManager.Language = String.Empty;
      scintilla.Lexing.LexerName = "container";
      scintilla.Lexing.Lexer = Lexer.Container;

      scintilla.Styles[ST_DEFAULT ].FontName =
      scintilla.Styles[ST_DATATYPE].FontName =
      scintilla.Styles[ST_LITERAL ].FontName =
      scintilla.Styles[ST_STRING  ].FontName =
      scintilla.Styles[ST_SECTION ].FontName =
      scintilla.Styles[ST_RECORD  ].FontName =
      scintilla.Styles[ST_COMMENT ].FontName = "Consolas";
      scintilla.Styles[ST_DEFAULT ].ForeColor = Color.Black;
      scintilla.Styles[ST_DATATYPE].ForeColor = Color.Navy;
      scintilla.Styles[ST_LITERAL ].ForeColor = Color.Cyan;
      scintilla.Styles[ST_STRING  ].ForeColor = Color.Maroon;
      scintilla.Styles[ST_SECTION ].ForeColor = Color.Blue;
      scintilla.Styles[ST_RECORD  ].ForeColor = Color.Orange;
      scintilla.Styles[ST_COMMENT ].ForeColor = Color.Green;
      scintilla.Styles[ST_SECTION ].Bold      = true;
    }

    public static void StyleNeeded(Scintilla scintilla, Range range)
    {
      int start = range.StartingLine.StartPosition;
      int end   = start;
      int max   = scintilla.Lines[scintilla.Lines.Count-1].StartPosition + scintilla.Lines[scintilla.Lines.Count-1].Length;

      //we'll get the whole update range at once
      //we also get the maximum editor range, in case a new comment goes to the end of the file
      Line curr = range.StartingLine;
      while (curr.Number <= range.EndingLine.Number)
      {
        end += curr.Length;
        curr = curr.Next;
      }

      StyleSection(scintilla, start, end, max);
    }

    public static void StyleSection(Scintilla scintilla, int start, int end, int max)
    {
      int pos = start;
      while (pos < end)
      {
        string curr = scintilla.GetRange(pos, end).Text.ToUpper();

        //make a couple of direct checks for special handling (comments/string literals); pass the rest to a RegEx handler
        if((curr.Length>1) && (curr[0] == '/') && (curr[1] == '*'))       pos += StyleCommentMulti( scintilla,                                     pos, end, max);
        else if ((curr.Length>1) && (curr[0] == '/') && (curr[1] == '/')) pos += StyleCommentSingle(scintilla,                                     pos, end, max);
        else if (curr[0] == '"')                                          pos += StyleString(       scintilla, curr,                               pos, end, max);
        else if (RegDataType.IsMatch(curr))                               pos += StyleRegExSection( scintilla, curr, RegDataType    , ST_DATATYPE, pos, end, max);
        else if (RegSection.IsMatch(curr))                                pos += StyleRegExSection( scintilla, curr, RegSection     , ST_SECTION , pos, end, max);
        else if (RegRecordC.IsMatch(curr))                                pos += StyleRegExWhole(   scintilla, curr, RegRecordC     , ST_RECORD  , pos, end, max);
        else if (RegRecordB.IsMatch(curr))                                pos += StyleRegExWhole(   scintilla, curr, RegRecordB     , ST_RECORD  , pos, end, max);
        else if (RegRecordA.IsMatch(curr))                                pos += StyleRegExSection( scintilla, curr, RegRecordA     , ST_RECORD  , pos, end, max);
        else if (RegWhiteSpace.IsMatch(curr))                             pos += StyleRegExWhole(   scintilla, curr, RegWhiteSpace  , ST_DEFAULT , pos, end, max);
        else if (RegLiteralDate.IsMatch(curr))                            pos += StyleRegExWhole(   scintilla, curr, RegLiteralDate , ST_LITERAL , pos, end, max);
        else if (RegLiteralMoney.IsMatch(curr))                           pos += StyleRegExWhole(   scintilla, curr, RegLiteralMoney, ST_LITERAL , pos, end, max);
        else if (RegLiteralRate.IsMatch(curr))                            pos += StyleRegExWhole(   scintilla, curr, RegLiteralRate , ST_LITERAL , pos, end, max);
        else if (RegLiteralFloat.IsMatch(curr))                           pos += StyleRegExWhole(   scintilla, curr, RegLiteralFloat, ST_LITERAL , pos, end, max);
        else if (RegLiteralInt.IsMatch(curr))                             pos += StyleRegExWhole(   scintilla, curr, RegLiteralInt  , ST_LITERAL , pos, end, max);
        else if (RegWord.IsMatch(curr))                                   pos += StyleRegExWhole(   scintilla, curr, RegWord        , ST_DEFAULT , pos, end, max);
        else                                                              pos++;
      }
    }

    public static int StyleCommentMulti(Scintilla scintilla, int start, int end, int max)
    {
      //'2's : skip over initial "/*"
      string full = scintilla.GetRange(start, max).Text.ToUpper();
      int offset = 2; int depth = 1;
      //stop one short of the end so that full[offset+1] is always in range
      while ((depth > 0) && (offset+1 < full.Length))
      {
        if ((full[offset] == '/') && (full[offset+1] == '*'))
        {
          depth++;
          offset++;
        }
        else if ((full[offset] == '*') && (full[offset+1] == '/'))
        {
          depth--;
          offset++;
        }
        offset++;
      }
      //unterminated comment: style through to the end of the text
      if (depth > 0) offset = full.Length;
      ((INativeScintilla)scintilla).StartStyling(start, 0x1F);
      ((INativeScintilla)scintilla).SetStyling(offset, ST_COMMENT);
      return offset;
    }

    public static int StyleCommentSingle(Scintilla scintilla, int start, int end, int max)
    {
      string full = scintilla.GetRange(start, max).Text.ToUpper();
      int offset = 0;
      //check the bound first so full[offset] is never read past the end of the text
      while ((offset < full.Length) && (full[offset] != '\r') && (full[offset] != '\n'))
        offset++;
      ((INativeScintilla)scintilla).StartStyling(start, 0x1F);
      ((INativeScintilla)scintilla).SetStyling(offset, ST_COMMENT);
      return offset;
    }

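    public static int StyleString(Scintilla scintilla, string text, int start, int end, int max)
    {
      string full = scintilla.GetRange(start, max).Text.ToUpper();
      int offset = 1;
      //check the bound first so full[offset] is never read past the end of the text
      while ((offset < full.Length) && (full[offset] != '\r') && (full[offset] != '\n') && (full[offset] != '"'))
        offset++;
      //include the closing quote when there is one; a line break or end of text is left for the main loop
      if ((offset < full.Length) && (full[offset] == '"'))
        offset++;
      ((INativeScintilla)scintilla).StartStyling(start, 0x1F);
      ((INativeScintilla)scintilla).SetStyling(offset, ST_STRING);
      return offset;
    }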

    public static int StyleRegExWhole(Scintilla scintilla, string text, Regex reg, int style, int start, int end, int max)
    {
      //match & style an entire regex
      string match = reg.Match(text).Value;
      ((INativeScintilla)scintilla).StartStyling(start, 0x1F);
      ((INativeScintilla)scintilla).SetStyling(match.Length, style);
      return match.Length;
    }

    public static int StyleRegExSection(Scintilla scintilla, string text, Regex reg, int style, int start, int end, int max)
    {
      //match & style first subgroup of regex
      string match = reg.Match(text).Groups[1].Value;
      ((INativeScintilla)scintilla).StartStyling(start, 0x1F);
      ((INativeScintilla)scintilla).SetStyling(match.Length, style);
      return match.Length;
    }
  }
}

 

Dec 8, 2010 at 5:24 AM

I tried this but it didn't seem to make a difference in my document. Could you give an example of how to use it to highlight something?

Thanks.

Dec 8, 2010 at 5:54 AM

When you first load scintilla_control/your_form, do a:

yourLexer.Init(scintilla_control);
scintilla_control.Lexing.Colorize(); //this is just to get the initial colorization right after loading a file, the rest will take care of itself

in my case this was:

scintilla.Text = file_contents;
PwrLexer.Init(scintilla);
scintilla.Lexing.Colorize();

Aug 8, 2013 at 3:34 AM
Edited Aug 8, 2013 at 3:42 AM
I'm using a modified form of PwrLexer converted to VB.

Dragging the vertical scroll bar freezes the application until it catches up. Arrow keys, mouse wheel and scroll buttons all scroll smoothly.

It lags badly with 9k lines.

Is there any way to fix the scroll drag so it's smoother?

Imports System.Text
Imports System.Drawing
Imports System.Text.RegularExpressions
Imports ScintillaNET
Imports DSeX.ConfigStructs
Public Class PwrLexer

    Private Shared Lock As Boolean = False
    ' Original thread for this class
    'http://scintillanet.codeplex.com/discussions/42949
    Private Const ST_DEFAULT As Integer = 32
    Private Const ST_STRING_VAR As Integer = 11
    Private Const ST_NUM_VAR As Integer = 12
    Private Const ST_STRING As Integer = 13
    Private Const ST_ID As Integer = 14
    Private Const ST_NUMBER As Integer = 15
    Private Const ST_COMMENT As Integer = 16
    Private Const ST_HEADER As Integer = 17

    Private Shared HEADER As String = KeysIni.GetKeyValue("MS-General", "Header")
    Private Shared RegWhiteSpace As New Regex("^\s+") '\s+
    Private Shared RegStrVar As New Regex("^~([A-Za-z0-9_]+)", RegexOptions.IgnoreCase)
    Private Shared RegNumVar As New Regex("^%([A-Za-z0-9_]+)", RegexOptions.IgnoreCase)
    Private Shared RegString As New Regex("^\{(.*?)\}")
    Private Shared RegLineID As New Regex("^\(([0-9]*)\:([0-9]*)\)")
    Private Shared RegNumber As New Regex("^([0-9#]+)")
    Private Shared RegHeader As New Regex("^(" + HEADER + ")", RegexOptions.IgnoreCase)

    Public Shared Sub Init(scintilla As Scintilla)
        scintilla.Indentation.SmartIndentType = SmartIndent.None
        scintilla.ConfigurationManager.Language = [String].Empty
        scintilla.Lexing.LexerName = "container"
        scintilla.Lexing.Lexer = Lexer.Container

        scintilla.Styles(ST_DEFAULT).ForeColor = Color.Black
        scintilla.Styles(ST_STRING_VAR).ForeColor = EditSettings.StringVariableColor
        scintilla.Styles(ST_NUM_VAR).ForeColor = EditSettings.VariableColor
        scintilla.Styles(ST_STRING).ForeColor = EditSettings.StringColor
        scintilla.Styles(ST_COMMENT).ForeColor = EditSettings.CommentColor
        scintilla.Styles(ST_ID).ForeColor = EditSettings.IDColor
        scintilla.Styles(ST_NUMBER).ForeColor = EditSettings.NumberColor
        scintilla.Styles(ST_HEADER).ForeColor = Color.Green
        scintilla.Styles(ST_HEADER).Bold = True
    End Sub

    Public Shared Sub StyleNeeded(ByRef scintilla As Scintilla, ByRef range As Range)

        Dim start As Integer = range.StartingLine.StartPosition
        Dim [end] As Integer = start
        Dim max As Integer = scintilla.Lines(scintilla.Lines.Count - 1).StartPosition + scintilla.Lines(scintilla.Lines.Count - 1).Length
        Debug.Print("StyleNeededEventArgs()")
        'we'll get the whole update range at once
        'we also get the maximum editor range, in case a new comment goes to the end of the file
        'Dim curr As Line = range.StartingLine
        'While curr.Number <= range.EndingLine.Number
        '    [end] += curr.Length
        '    curr = curr.[Next]
        'End While
        Dim test As String = range.Text
        [end] += test.Length
        StyleSection(scintilla, start, [end], max, test)

    End Sub

    Public Shared Sub StyleSection(ByRef scintilla As Scintilla, ByRef start As Integer, ByRef [end] As Integer, ByRef max As Integer, ByRef Txt As String)
        Dim pos As Integer = start
        Dim i As Integer = 0
        While pos < [end]

            Dim curr As String = scintilla.GetRange(pos, [end]).Text

            'make a couple of direct checks for special handling (comments/string literals) pass the rest to a RegEx handler
            'If (curr.Length > 1)' AndAlso (curr(0) = "*"c) Then
            ' pos += StyleCommentSingle(scintilla, pos, [end], max)
            If RegHeader.IsMatch(curr) Then
                pos += StyleRegExWhole(scintilla, curr, RegHeader, ST_HEADER, pos, [end], max)

                'ElseIf (curr(0) = "{"c) Then
                '    pos += StyleString(scintilla, curr, pos, [end], max)

                'ElseIf RegStrVar.IsMatch(curr) Then
                '    pos += StyleRegExWhole(scintilla, curr, RegStrVar, ST_STRING_VAR, pos, [end], max)

                'ElseIf RegNumVar.IsMatch(curr) Then
                '    pos += StyleRegExWhole(scintilla, curr, RegNumVar, ST_NUM_VAR, pos, [end], max)

                'ElseIf RegLineID.IsMatch(curr) Then
                '    pos += StyleRegExWhole(scintilla, curr, RegLineID, ST_ID, pos, [end], max)

                'ElseIf RegNumber.IsMatch(curr) Then
                '    pos += StyleRegExWhole(scintilla, curr, RegNumber, ST_NUMBER, pos, [end], max)

                'ElseIf RegWhiteSpace.IsMatch(curr) Then
                '    pos += StyleRegExWhole(scintilla, curr, RegWhiteSpace, ST_DEFAULT, pos, [end], max)
            Else

                Select Case curr(0)
                    'Case vbLf
                    '    pos += 1
                    'Case vbCr
                    '    pos += 1
                    Case "*"c
                        i = StyleCommentSingle(scintilla, pos, [end], max)
                    Case "("c
                        i = StyleRegExWhole(scintilla, curr, RegLineID, ST_ID, pos, [end], max)

                    Case "%"c
                        i = StyleRegExWhole(scintilla, curr, RegNumVar, ST_NUM_VAR, pos, [end], max)

                    Case "0"c To "9"c
                        i = StyleRegExWhole(scintilla, curr, RegNumber, ST_NUMBER, pos, [end], max)
                    Case "{"c
                        i = StyleString(scintilla, curr, pos, [end], max)
                    Case "~"c
                        i = StyleRegExWhole(scintilla, curr, RegStrVar, ST_STRING_VAR, pos, [end], max)

                    Case Else
                        i = 0
                End Select
                If i = 0 Then
                    DirectCast(scintilla, INativeScintilla).StartStyling(pos, &H1F)
                    DirectCast(scintilla, INativeScintilla).SetStyling(1, ST_DEFAULT)
                    pos += 1
                Else
                    pos += i
                End If
            End If
        End While
    End Sub

    Public Shared Function StyleHeader(ByRef scintilla As Scintilla, ByRef start As Integer, ByRef [end] As Integer, ByRef max As Integer) As Integer
        Dim full As String = scintilla.GetRange(start, max).Text.ToUpper()
        Dim offset As Integer = 0
        While (offset < full.Length) AndAlso (full(offset) <> ControlChars.Cr) AndAlso (full(offset) <> ControlChars.Lf)
            offset += 1
        End While
        DirectCast(scintilla, INativeScintilla).StartStyling(start, &H1F)
        DirectCast(scintilla, INativeScintilla).SetStyling(offset, ST_HEADER)
        Return offset
    End Function
    Public Shared Function StyleCommentSingle(ByRef scintilla As Scintilla, ByRef start As Integer, ByRef [end] As Integer, ByRef max As Integer) As Integer
        Dim full As String = scintilla.GetRange(start, max).Text.ToUpper()
        Dim offset As Integer = 0
        While (full(offset) <> ControlChars.Cr) AndAlso (full(offset) <> ControlChars.Lf) AndAlso (start + offset < max - 1)
            offset += 1
        End While
        DirectCast(scintilla, INativeScintilla).StartStyling(start, &H1F)
        DirectCast(scintilla, INativeScintilla).SetStyling(offset, ST_COMMENT)
        Return offset
    End Function

    Public Shared Function StyleString(ByRef scintilla As Scintilla, ByRef text As String, ByRef start As Integer, ByRef [end] As Integer, ByRef max As Integer) As Integer
        Dim full As String = scintilla.GetRange(start, max).Text.ToUpper()
        Dim offset As Integer = 1
        While (full(offset) <> ControlChars.Cr) AndAlso (full(offset) <> ControlChars.Lf) AndAlso (full(offset) <> "}"c) AndAlso (start + offset < max - 1)
            offset += 1
        End While
        offset += 1
        DirectCast(scintilla, INativeScintilla).StartStyling(start, &H1F)
        DirectCast(scintilla, INativeScintilla).SetStyling(offset, ST_STRING)
        Return offset
    End Function

    Public Shared Function StyleRegExWhole(ByRef scintilla As Scintilla, ByRef text As String, ByRef reg As Regex, ByRef style As Integer, ByRef start As Integer, ByRef [end] As Integer, _
     ByRef max As Integer) As Integer
        'match & style an entire regex
        Dim match As String = reg.Match(text).Value
        DirectCast(scintilla, INativeScintilla).StartStyling(start, &H1F)
        DirectCast(scintilla, INativeScintilla).SetStyling(match.Length, style)
        Return match.Length
    End Function

    Private Shared Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
        target = value
        Return value
    End Function
End Class
Coordinator
Aug 8, 2013 at 5:55 PM
@Gerolkae,

Your "scrolling performance" issue is really a "lexer performance" issue. To successively attempt to match regular expressions against the unstyled content will simply take too long. In short, that approach doesn't scale... and a quick scroll on a long document makes that obvious.
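
As a rough illustration of the difference (a sketch only, not the implementation from the branch mentioned below): the posted code calls GetRange for the remaining text on every token and then tries each regex in turn, so the work grows much faster than the document does. One way around that is to fetch the text once and walk it with a single combined pattern, using the \G anchor together with Regex.Match(input, startat) so that each match is pinned to the current position without copying a substring. The token kinds and the SinglePassSketch class here are hypothetical, and the result is a list of strings rather than actual styling calls.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassSketch
{
    // One combined pattern; \G pins each match to the position passed to Match().
    private static readonly Regex Token = new Regex(
        @"\G(?:(?<ws>\s+)|(?<num>[0-9]+)|(?<word>[A-Za-z_][A-Za-z0-9_]*))",
        RegexOptions.Compiled);

    // Walks the text once and returns "kind:start:length" for each token.
    public static List<string> Scan(string text)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = Token.Match(text, pos);
            string kind;
            int length;
            if (!m.Success)
            {
                // No alternative matched here: treat a single character as "other".
                kind = "other";
                length = 1;
            }
            else
            {
                length = m.Length;
                kind = m.Groups["ws"].Success ? "ws"
                     : m.Groups["num"].Success ? "num"
                     : "word";
            }
            tokens.Add(kind + ":" + pos + ":" + length);
            pos += length;
        }
        return tokens;
    }
}
```

In a container lexer the same loop would issue one styling call per token instead of building strings, and the text would be read from the control once per StyleNeeded call rather than once per token.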

This thread is old and we have better examples now on how to write a custom lexer. You should check out @blah38621's WPF branch. The WPF portions of it are irrelevant for you, but the custom lexer implementations show best practices. He also details some of his work in this thread:

https://scintillanet.codeplex.com/discussions/435975#post1012843


Cheers,
Jacob
Aug 9, 2013 at 7:09 PM
Thanks!