Custom Lexer Unicode

Topics: Developer Forum, User Forum
Aug 9, 2013 at 7:53 PM
I'm building a custom lexer around the WinForms CustomLexer class, and I've noticed that ConsumeString doesn't account for Unicode bytes...

ConsumeString(STYLE_STRING, '\0', false, false, '}'); is coloring a few characters before the {} brackets I'm using to delimit strings.

How would I compensate for the byte size and character indexing?
Aug 9, 2013 at 10:07 PM
It should already account for Unicode characters: a char is 2 bytes in .NET, and strings are already Unicode. Are you sure you don't have a block of 2 bytes that are both NUL ('\0')? If you need NUL characters to be a valid part of a string (I don't know of any programming language that allows this), then use a different character as the escape character, or else rewrite the ConsumeString function so that it doesn't account for any form of escape sequence. Also, are you sure you loaded the string you're working with as Unicode in the first place? If it was loaded as UTF-8 or ASCII, that would cause problems.
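To make the byte-versus-character distinction concrete, here is a minimal sketch comparing positions in a .NET string (counted in 2-byte chars) with positions in the same text encoded as UTF-8. The sample line and the helper name ByteOffset are made up for illustration and are not part of ScintillaNET; whether the editor's buffer is actually UTF-8 is an assumption here.

```csharp
using System;
using System.Text;

public static class Utf8Offsets
{
    // Converts a char index in a .NET string to the matching byte offset
    // in the UTF-8 encoding of that string. Hypothetical helper; it assumes
    // charIndex does not fall in the middle of a surrogate pair.
    public static int ByteOffset(string text, int charIndex)
    {
        return Encoding.UTF8.GetByteCount(text.Substring(0, charIndex));
    }

    public static void Main()
    {
        // Hypothetical sample line; \u00B0 is the degree sign.
        string line = "{90\u00B0 turn}";
        int closeBrace = line.IndexOf('}');

        Console.WriteLine("chars: " + line.Length);
        Console.WriteLine("UTF-8 bytes: " + Encoding.UTF8.GetByteCount(line));
        Console.WriteLine("'}' char index: " + closeBrace);
        Console.WriteLine("'}' byte offset: " + ByteOffset(line, closeBrace));
    }
}
```

For this line the string holds 10 chars but 11 UTF-8 bytes, so the closing brace sits at char index 9 and byte offset 10. A one-position drift like that for every non-ASCII character is the kind of shift described in the question.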
Aug 10, 2013 at 1:20 AM
Edited Aug 13, 2013 at 12:57 AM
I'm loading a text file with StreamReader, and the file has special characters...
It's a custom script called DragonSpeak for an MMORPG called Furcadia. The game supports some Unicode characters.

I'm a novice at programming and I'm not sure exactly what's happening, but I'm trying to build a custom editor for the script.
Aug 10, 2013 at 6:04 PM
The degree sign there is actually in UTF-8, but everything I'm looking at says it should work. If the highlighting is shifting over, then a character isn't getting properly Consume()'d somewhere in the string. Modify the ConsumeString method to check whether the current character is larger than 0x7F; if it is, add 1 to the number of characters being consumed. That should (hopefully) detect the UTF-8 character and consume both bytes of it.
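The "larger than 0x7F" check can be sketched as a standalone helper that reports how many UTF-8 bytes a character occupies. This is only an illustration under stated assumptions: Utf8Width and ByteLength are hypothetical names, not part of CustomLexer. Note that adding exactly 1 only covers characters up to U+07FF (which includes the degree sign); characters above that need 3 or 4 bytes.

```csharp
using System;

public static class Utf8Width
{
    // Number of bytes the code point starting at text[index] occupies in UTF-8.
    // char.ConvertToUtf32 throws if index points at an unpaired or trailing surrogate.
    public static int ByteLength(string text, int index)
    {
        int codePoint = char.ConvertToUtf32(text, index);
        if (codePoint <= 0x7F) return 1;    // plain ASCII
        if (codePoint <= 0x7FF) return 2;   // e.g. the degree sign U+00B0
        if (codePoint <= 0xFFFF) return 3;  // e.g. the euro sign U+20AC
        return 4;                           // outside the Basic Multilingual Plane
    }

    public static void Main()
    {
        // Hypothetical sample: 'A', degree sign, euro sign.
        string sample = "A\u00B0\u20AC";
        for (int i = 0; i < sample.Length; i++)
        {
            Console.WriteLine("U+" + ((int)sample[i]).ToString("X4")
                + " takes " + ByteLength(sample, i) + " byte(s)");
        }
    }
}
```

A lexer that walks a UTF-8 buffer would need to advance by this many bytes per character rather than by a fixed 1 or 2.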
Aug 12, 2013 at 12:56 AM
Edited Aug 12, 2013 at 2:43 AM
Adding this code helped with the shifting, but now, on just the one line, the characters after the degree sign show up with the default style.

Side note: I added a dragonspeak.xml file and set my custom lexer to dragonspeak, but everything there appears with default highlighting. Things I'm working on with ScintillaNET: https://dsex.codeplex.com/discussions/453114
            else if ((char)CurrentCharacter > 0x7F)
            {
                Consume();
            }
Aug 13, 2013 at 12:56 AM
Edited Aug 13, 2013 at 12:58 AM
New problem with highlight alignment.
I'm attempting to highlight the characters 0-9, # and -, but the highlighting lands one character before the character that should be highlighted.

"DSPK V4.00": the V is highlighted and the dot is highlighted, but the last 0 is not. Any idea what's happening?
using System;
using System.Drawing;
using System.Collections.Generic;

namespace ScintillaNET.Lexers
{
    public sealed class dsLexer : CustomLexer
    {
        public override string LexerName { get { return "dragonspeak"; } }
        
        private const int STYLE_STRING = 11;
        private const int STYLE_NUMBER = 12;
        private const int STYLE_COMMENT = 14;
        private const int STYLE_NUM_VAR = 15;
        private const int STYLE_STR_VAR = 16;
        private const int STYLE_HEADER = 17;

        public dsLexer(Scintilla scintilla) : base(scintilla) { }

        private enum State : int
        {
            Unknown = STATE_UNKNOWN,
            String,
            Comment,
            Number,
            StringVar,
            NumVar,
            Header
        }

        new private State CurrentState
        {
            get { return (State)base.CurrentState; }
            set { base.CurrentState = (int)value; }
        }

        public override Dictionary<string, int> StyleNameMapping
        {
            get
            {
                return new Dictionary<string, int>()
                {
                    { "Comment", STYLE_COMMENT },
                    { "Number", STYLE_NUMBER },
                    { "String", STYLE_STRING },
                    { "StringVariable", STYLE_STR_VAR},
                    { "NumberVariable", STYLE_NUM_VAR},
                    { "Header", STYLE_HEADER}
                };
            }
        }
        public override Dictionary<string, int> KeywordNameMapping
        {
            get
            {
                return new Dictionary<string, int>();
            }
        }
     
        protected override void Initialize()
        {

            base.Initialize();
        }

        protected override void InitializeStateFromStyle(int style)
        {
            switch (style)
            {
                case STYLE_STRING:
                    CurrentState = State.String;
                    break;
                // Otherwise we don't need to carry the
                // state on from the previous line.
                case STYLE_HEADER:
                    CurrentState = State.Header;
                    break;
                case STYLE_NUM_VAR:
                    CurrentState = State.NumVar;
                    break;
                default:
                    break;
            }
        }

        protected override void Style()
        {
            StartStyling();

            while (!EndOfText)
            {
                switch (CurrentState)
                {
                    case State.Unknown:
                        bool consumed = false;
                        switch (CurrentCharacter)
                        {
                            case '0':
                            case '1':
                            case '2':
                            case '3':
                            case '4':
                            case '5':
                            case '6':
                            case '7':
                            case '8':
                            case '9':
                            case '#':
                            case '-':
                                 CurrentState = State.Number;
                                 break;
                            case '{':
                                CurrentState = State.String;
                                break;
                            case '*':
                                CurrentState = State.Comment;

                                break;
                            case '%':
                                CurrentState = State.NumVar;
                                //Consume();
                                //SetStyle(STYLE_NUM_VAR);
                                //consumed = true;
                                break;
                            case '~':
                                CurrentState = State.StringVar;
                                //Consume();
                                //SetStyle(STYLE_STR_VAR);
                                //consumed = true;
                                break;
                            default:
                                if (IsWhitespace(CurrentCharacter))
                                {
                                    ConsumeWhitespace();
                                    consumed = true;
                                }
                                else
                                {
                                    SetStyle(STYLE_DEFAULT);
                                }
                                break;
                        }
                        if (!consumed)
                            Consume();
                        break;
                    case State.Comment:
                        ConsumeUntilEOL(STYLE_COMMENT);
                        CurrentState = State.Unknown;
                        break;
                    case State.String:
                        // We're using '\0' as the escape character because a NUL character
                        // doesn't usually appear in text-based documents.
                        ConsumeString(STYLE_STRING, '\0',false, false, '}');
                        consumed = true;
                        CurrentState = State.Unknown;
                        break;
                    case State.NumVar:
                        if (!IsIdentifier(CurrentCharacter))
                        {
                            CurrentState = State.Unknown;
                            Consume();
                            SetStyle(STYLE_DEFAULT);
                        }
                        else
                        {
                            Consume();
                            SetStyle(STYLE_NUM_VAR); 
                            
                        }

              
                        break;
                    case State.StringVar:
                        if (!IsIdentifier(CurrentCharacter))
                        {
                            CurrentState = State.Unknown;
                            SetStyle(STYLE_DEFAULT);
                        }
                        else
                        {
                            SetStyle(STYLE_STR_VAR);
                            Consume();
                        }
                        
                       
                        break;
                    case State.Number:
                        if (IsNum(CurrentCharacter))
                        {
                            SetStyle(STYLE_NUMBER);
                            Consume();
                        }
                        else
                        {
                            CurrentState = State.Unknown;
                            SetStyle(STYLE_DEFAULT);
                            Consume();
                        }
                        break;
                    default:
                        throw new Exception("Unknown state!");
                }
            }

            switch (CurrentState)
            {
                case State.Unknown: break;
                case State.Comment:
                    SetStyle(STYLE_COMMENT);
                    break;
                case State.NumVar:
                    SetStyle(STYLE_NUM_VAR);
                    break;
                case State.String:
                    SetStyle(STYLE_STRING);
                    StyleNextLine(); // Continue the style to the next line
                    break;
                case State.Number:
                    SetStyle(STYLE_NUMBER);
                    Consume();
                    break;
                default:
                    throw new Exception("Unknown state!");
            }
        }

        bool IsNum(char c)
        {
            switch (c)
            {
                case '0':
                case '1':
                case '2':
                case '3':
                case '4':
                case '5':
                case '6':
                case '7':
                case '8':
                case '9':
                case '#':
                case '-':
                    return true;
                default:
                    return false;
            }
        }

        
    }
}
Aug 14, 2013 at 10:06 AM
Fixed the problem...
I was loading an ASCII file into the editor without converting it to UTF-8.
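For anyone hitting the same thing, here is a minimal sketch of what that kind of fix can look like: decode the file with the encoding it was actually saved in, and the resulting .NET string can then be re-encoded as UTF-8. The thread does not say which single-byte encoding the file used, so ISO-8859-1 is an assumption here, and the sample bytes and the ReadAll helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class LoadWithEncoding
{
    // Reads the whole stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Hypothetical file contents: "90" followed by 0xB0,
        // which is the degree sign in ISO-8859-1 but not valid UTF-8 on its own.
        byte[] fileBytes = { 0x39, 0x30, 0xB0 };
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Decoding with the wrong encoding loses the degree sign.
        string wrong = ReadAll(new MemoryStream(fileBytes), Encoding.UTF8);
        // Decoding with the file's real encoding keeps it.
        string right = ReadAll(new MemoryStream(fileBytes), latin1);

        Console.WriteLine("decoded as UTF-8 keeps degree sign: " + wrong.Contains("\u00B0"));
        Console.WriteLine("decoded as ISO-8859-1 keeps degree sign: " + right.Contains("\u00B0"));

        // Once decoded correctly, the text can be re-encoded as UTF-8.
        byte[] utf8 = Encoding.UTF8.GetBytes(right);
        Console.WriteLine("UTF-8 byte count: " + utf8.Length);
    }
}
```

With a real file, the same idea applies by passing the file path and the matching Encoding to StreamReader instead of a MemoryStream.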