
Selection.Start / TextLength + UTF-8?

Topics: Developer Forum
Jul 1, 2010 at 5:08 PM
Edited Jul 1, 2010 at 8:32 PM


I'm currently writing an IntelliSense implementation for my new program, but there is one problem I'm not able to solve:
When I paste a German umlaut (e.g. "ä" or "ö"), everything seems to work at first, but when I then call Selection.Start, it seems to return a wrong value.

I've tested the following:

Text in control: "ä\r\n"
Scintilla.TextLength: 4
Scintilla.Selection.Start: 4

It would be better if Unicode characters were counted as single characters, not as bytes... or is there a property or function that returns the real number of characters (e.g. "1" rather than "2" for "ä")?

Or is it possible to use ANSI (which would be much easier)?
When I set Scintilla.Encoding = System.Text.Encoding.Default, it throws an exception; using the NativeInterface.SetCodePage() function works, but all pasted German umlauts are displayed as "E4" or similar, while typed characters are displayed correctly (!).
Do I have to convert every character inserted by my program to ANSI (Encoding.Convert(...)), or is there another way to use ANSI?

Please help me. I've written an entire lexer for my new language, my own IntelliSense implementation and a folding function, but I'm failing at this simple problem... :-)

Best regards,

PS: Sorry for any mistakes in my English... I'm German and currently in 11th grade. ;-)

EDIT: Is there a bug in this forum? None of my newlines are displayed...?
Jul 1, 2010 at 6:13 PM

By default ScintillaNET will use UTF-8 encoding. I wouldn't recommend changing it because that is the only way to support Unicode -- a requirement for your application. As you discovered, the TextLength and similar properties return the byte length, not the text length. It's a nuisance for anyone doing Unicode and something that I would like to correct in a future version of ScintillaNET. Until then your only reliable option (that I'm aware of) is to get the complete text and count the chars up until the selection point. For example:

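// Raw document bytes in the control's encoding (UTF-8 by default)
byte[] bytes = scintilla.RawText;
// Selection.Start is a byte offset, not a char index
int selectionStart = scintilla.Selection.Start;
Encoding encoding = scintilla.Encoding;

// Count the chars encoded by the bytes before the selection start
selectionStart = encoding.GetCharCount(bytes, 0, selectionStart);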

That should get you an accurate char index. I would recommend caching this operation (as much as is possible) so as not to cause an impact on performance.
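
To see the conversion in isolation, here is a minimal sketch that needs no Scintilla control at all. It only uses System.Text.Encoding; the class name Utf8Offsets and both helper names are made up for this illustration and are not part of ScintillaNET. It assumes the byte offset falls on a character boundary, and note that the result counts .NET chars (UTF-16 code units), so a character outside the Basic Multilingual Plane counts as two.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration; not part of ScintillaNET.
public static class Utf8Offsets
{
    // Converts a byte offset (such as the value reported by Selection.Start)
    // into a char index by counting the chars encoded before that offset.
    public static int ByteOffsetToCharIndex(byte[] bytes, int byteOffset, Encoding encoding)
    {
        return encoding.GetCharCount(bytes, 0, byteOffset);
    }

    // The inverse: converts a char index into the byte offset at which
    // that char starts in the encoded text.
    public static int CharIndexToByteOffset(string text, int charIndex, Encoding encoding)
    {
        return encoding.GetByteCount(text.ToCharArray(), 0, charIndex);
    }

    public static void Main()
    {
        // The text from the first post: one umlaut followed by CR LF.
        string text = "\u00E4\r\n";
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(bytes.Length);                                   // 4 bytes
        Console.WriteLine(ByteOffsetToCharIndex(bytes, 4, Encoding.UTF8)); // 3 chars
        Console.WriteLine(CharIndexToByteOffset(text, 1, Encoding.UTF8));  // 2 bytes
    }
}
```

This matches the numbers in the first post: "ä" takes two bytes in UTF-8 and "\r\n" takes two more, which is why TextLength and Selection.Start both report 4 for a text of three chars. Since each call scans the bytes from the start of the document, it may be worth caching the last converted offset and only recomputing when the text changes.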



Jul 1, 2010 at 8:29 PM
Edited Jul 1, 2010 at 8:31 PM


First, thank you for the quick answer! :-) OK, I can switch to UTF-8 and then let the compiler convert it to ANSI (it's an AI editor for my favourite game, which is very old: Age of Empires II).
Yes, caching is necessary. Before I used Scintilla my application was slow; now it's very fast, and I don't want to make it slow again... ;-)

Thanks for the simple code example. Implemented correctly, this should be the fastest way! :-)

By the way, ScintillaNET is a very useful project. Writing my own lexer was difficult (it's the third programming language in my program: VB.NET, C# and C++), but now it's 10x faster than before.
Keep it up! :-)


EDIT: Ah, I have to use HTML tags to format my post... =)