How to display (☺☻■▪◄●♪♦)

Topics: Developer Forum, User Forum
Aug 18, 2009 at 3:42 AM

Hi,

Does anyone know how to make ScintillaNet display smileys..

☺☻■▪◄●♪♦

 

Coordinator
Aug 18, 2009 at 5:31 PM

I have no problem just pasting this in. What font are you using?

Aug 18, 2009 at 10:12 PM

Consolas/Courier New, and then tried sans serif and Arial since your post... still comes up like ----|-----.

Are you on a pre- or post-Vista OS?

Coordinator
Oct 14, 2009 at 7:22 AM

Man, I forgot about this. Sorry about that. I'm on Windows 7. The characters that are showing up here aren't the low ASCII characters as I first thought, they are Unicode (may have been translated somewhere along the way from your posting). Now there are a lot of encoding issues I have to work out, but in the meantime if you want to display these particular characters do this:

scintilla.NativeInterface.SetCodePage(System.Text.UTF8Encoding.UTF8.CodePage);

Coordinator
Oct 14, 2009 at 7:38 AM

[workitem:20491] is tracking this

 

Oct 14, 2009 at 9:23 AM

Chris,
I do not know if it is the source of the problem, but ScintillaNET currently assumes that the working
code page is UTF-8, which is really not valid.

By default, Scintilla uses the code page of the OS, so I had to explicitly set it to UTF-8 as part of the
initialization process. You may see issues from this on non-English Windows.

Best regards,
Paul.

Coordinator
Oct 14, 2009 at 6:37 PM

Yeah, it is exactly that. UTF-8 does seem to work the best with Scintilla for everything I've ever used it for. Usually I'll try to set things to platform defaults, but in this case we may have to break that pattern. Scintilla is kind of limited in its character set capabilities; more investigation is needed to see what's really available and how we can best use it.

I would think the ideal default would actually be UTF-16 since that's what .NET strings are, what do you think?

 

Oct 14, 2009 at 8:04 PM

I think that Scintilla needs to display characters such as bullets (▪) in their actual form, not as a weird question mark or broken pipe symbol, etc. UTF8 works great and so does UTF32; I just tested them. Couldn't find UTF16 in the "fix" that you have already supplied me:

 

scintilla.NativeInterface.SetCodePage(System.Text.UTF8Encoding.UTF8.CodePage);

Coordinator
Oct 14, 2009 at 9:36 PM

UTF-16 in .NET encoding namespaces is "Unicode". It doesn't work with Scintilla however.
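As a small illustration of the naming, assuming only the standard System.Text.Encoding class (the class and method names CodePageNames and CodePageOf are made up for this sketch), the code page numbers behind the .NET encoding properties can be printed like this:

```csharp
using System;
using System.Text;

public static class CodePageNames
{
    // Returns the Windows code page number that .NET associates with an encoding.
    public static int CodePageOf(Encoding encoding)
    {
        return encoding.CodePage;
    }

    public static void Main()
    {
        Console.WriteLine(CodePageOf(Encoding.UTF8));             // UTF-8, same value as SC_CP_UTF8
        Console.WriteLine(CodePageOf(Encoding.Unicode));          // UTF-16 little endian
        Console.WriteLine(CodePageOf(Encoding.BigEndianUnicode)); // UTF-16 big endian
        Console.WriteLine(CodePageOf(Encoding.UTF32));            // UTF-32 little endian
    }
}
```

So "UTF16" does not appear by name; Encoding.Unicode is the property to look for.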

 

Oct 14, 2009 at 10:19 PM
Edited Oct 14, 2009 at 11:29 PM

Chris,
In the Scintilla documentation, under Other Settings (7th item in the first column), as part of
SCI_SETCODEPAGE/SCI_GETCODEPAGE

http://www.scintilla.org/ScintillaDoc.html#OtherSettings

the encoding is discussed. Some quotes from there:

1. The default is SCI_SETCODEPAGE(0).

2. Code page SC_CP_UTF8 (65001) sets Scintilla into Unicode mode with the document treated as a
    sequence of characters expressed in UTF-8.

3. Setting codePage to a non-zero value that is not SC_CP_UTF8 is operating system dependent.  

I think the use of UTF-8 internally for Unicode support is due to the fact that good old Scintilla
must work on all platforms and GUI toolkits, including Windows 95/98 etc.

Best regards,
Paul.

Jan 9, 2012 at 3:21 PM
ChrisRickard wrote:

I would think the ideal default would actually be UTF-16 since that's what .NET strings are, what do you think?

101% agreed Chris!

Playing with byte[] strings is really not handy in the world of .NET, especially when finding positions, characters, etc.
For example, the obvious usage:

_scintillaCtrl.Text[_scintillaCtrl.CurrentPos]

will fail for Unicode chars, because Text is a normal .NET String while all *position* ints in Scintilla are byte offsets! Besides this, characters have variable length, so converting back and forth makes the code unclear and hard to read.
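A minimal sketch of the mismatch, using plain .NET only so it runs without ScintillaNET. The byte offset 4 stands in for what a byte-based position such as CurrentPos would hold, and OffsetMismatch and ByteOffsetToCharIndex are names invented for the illustration:

```csharp
using System;
using System.Text;

public static class OffsetMismatch
{
    // Converts a UTF-8 byte offset into a UTF-16 char index for the same text.
    // Assumes byteOffset falls on a character boundary.
    public static int ByteOffsetToCharIndex(string text, int byteOffset)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetCharCount(utf8, 0, byteOffset);
    }

    public static void Main()
    {
        string text = "\u263A smile"; // the smiley takes 3 bytes in UTF-8
        int byteOffset = 4;           // byte position of 's' (3 bytes + 1 space)

        // Indexing the string with the byte offset lands two chars too far.
        Console.WriteLine(text[byteOffset]);                              // i
        // Converting the offset first gives the intended character.
        Console.WriteLine(text[ByteOffsetToCharIndex(text, byteOffset)]); // s
    }
}
```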

 

+1 VOTE for ".NET char" support in ScintillaNET

Coordinator
Jan 9, 2012 at 4:15 PM

I'm not sure how much can really be done with the current model. One of the biggest challenges is that native Scintilla only supports byte based positions throughout its entire API.

Say you wanted to grab 10 characters off of line 30. How would you go about doing this?

  1. Start=Get the document position for line 30
  2. Create a text range for Start to Start + 10
  3. Get the text for the range

This works if your document is constrained to a single byte encoding. If you were using a double byte character set you'd get 5 characters. If using UTF8 you'd be fine unless you have a character that takes up more than one byte. In that case the last character will be lost and instead you'll have garbage. Unfortunately this kind of thinking is pervasive throughout the ScintillaNet API which makes it deeply flawed.
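To make the UTF8 case concrete, here is a sketch in plain .NET (no ScintillaNET calls; ByteRangeDemo, TakeBytes and TakeChars are hypothetical names) that takes "10 characters" as 10 bytes and compares the result with a real 10-char substring:

```csharp
using System;
using System.Text;

public static class ByteRangeDemo
{
    // Mimics "Start to Start + count" on a byte-based API: takes `count` bytes
    // (not characters) from the UTF-8 form of `text` and decodes them.
    public static string TakeBytes(string text, int start, int count)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8, start, count);
    }

    // Takes `count` UTF-16 chars, which is what the caller actually wanted.
    public static string TakeChars(string text, int start, int count)
    {
        return text.Substring(start, count);
    }

    public static void Main()
    {
        string line = "abcdefgh\u266Aj"; // 10 chars, 12 bytes in UTF-8

        string byBytes = TakeBytes(line, 0, 10);
        string byChars = TakeChars(line, 0, 10);

        // The byte-based slice ends inside the 3-byte note character, so the
        // decoder substitutes U+FFFD and the trailing 'j' is never reached.
        Console.WriteLine(byBytes.IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(byChars == line);                // True
    }
}
```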

Scintilla stores its document content in either 1 byte ANSI, 2 byte DBCS which is really only good for certain far east encodings (its usage predates Unicode), or UTF8. If it was UTF16 I could do a hack that doubles the byte size per character (then cross my fingers that I never come across text that extends beyond 2 bytes).

With variable length encoding you actually have to scan your string from the start and count characters until you get as many as you need. Scintilla doesn't provide any help with this. We need to get away from arbitrary text ranges for the purposes of processing text. A reader model much more closely resembles what you really need to do.
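A rough sketch of that kind of scan, written against a plain UTF-8 byte array rather than any Scintilla API. Utf8Scan and its methods are invented for the example; it assumes valid UTF-8, a start offset on a character boundary, and it counts code points (a character outside the BMP still becomes two .NET chars):

```csharp
using System;
using System.Text;

public static class Utf8Scan
{
    // Walks forward from `start` and returns how many bytes are needed to
    // cover `count` code points (or fewer, if the buffer ends first).
    public static int ByteLengthOfCodePoints(byte[] utf8, int start, int count)
    {
        int pos = start;
        int seen = 0;
        while (pos < utf8.Length && seen < count)
        {
            pos++; // consume the lead byte
            // Consume continuation bytes (10xxxxxx) of the same code point.
            while (pos < utf8.Length && (utf8[pos] & 0xC0) == 0x80)
            {
                pos++;
            }
            seen++;
        }
        return pos - start;
    }

    // Reads `count` code points starting at byte offset `start`.
    public static string ReadCodePoints(byte[] utf8, int start, int count)
    {
        int length = ByteLengthOfCodePoints(utf8, start, count);
        return Encoding.UTF8.GetString(utf8, start, length);
    }

    public static void Main()
    {
        byte[] doc = Encoding.UTF8.GetBytes("abcdefgh\u266Ajk");

        Console.WriteLine(ByteLengthOfCodePoints(doc, 0, 10));             // 12
        Console.WriteLine(ReadCodePoints(doc, 0, 10) == "abcdefgh\u266Aj"); // True
    }
}
```

The cost is linear in the number of characters read, which is why scanning from the start of the document for every lookup does not scale.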

Coordinator
Jan 10, 2012 at 9:18 PM

I set out to solve this problem one day in the now defunct 3.0 branch. If anyone is feeling adventurous you are welcome to go digging around the TextAdapter class in that branch to see how it works. As I recall it was quite functional and almost complete. It contains methods such as ByteToCharIndex and CharToByteIndex which, when used correctly, can get you exactly what you're describing.

With an eye to performance it does several things to make the conversion as fast as possible. Again, going from memory it:

  • Uses unsafe code to read byte* pointers directly.
  • Uses the .NET Decoder and Encoder classes to work directly on the byte* and char* pointers to avoid unnecessary string/array creation.
  • Maintains an index of byte to char offsets for line starts to provide fast searching/lookups.

It is asking a lot to convert the entire ScintillaNET API to char offsets, however, maybe someone can pick up where I left off and at least provide a few additional APIs for the most common scenarios.
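To give a feel for the line-start index idea, here is a simplified sketch. It is not the TextAdapter code from the 3.0 branch: LineOffsetIndex is a hypothetical class, it uses safe code and a snapshot of the text instead of byte* pointers into the live document, and it only borrows the two method names mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: caches the byte and char offset of every line start.
public sealed class LineOffsetIndex
{
    private readonly string _text;
    private readonly byte[] _utf8;
    private readonly List<int> _lineStartBytes = new List<int>();
    private readonly List<int> _lineStartChars = new List<int>();

    public LineOffsetIndex(string text)
    {
        _text = text;
        _utf8 = Encoding.UTF8.GetBytes(text);
        _lineStartBytes.Add(0);
        _lineStartChars.Add(0);

        int lineStart = 0;
        int chars = 0;
        for (int i = 0; i < _utf8.Length; i++)
        {
            if (_utf8[i] != (byte)'\n') continue;
            // Count the chars on the line just finished (newline included).
            chars += Encoding.UTF8.GetCharCount(_utf8, lineStart, i + 1 - lineStart);
            lineStart = i + 1;
            _lineStartBytes.Add(lineStart);
            _lineStartChars.Add(chars);
        }
    }

    // Maps a UTF-8 byte offset (on a character boundary) to a UTF-16 char index.
    public int ByteToCharIndex(int bytePos)
    {
        int line = FindLine(_lineStartBytes, bytePos);
        int start = _lineStartBytes[line];
        // Only the part of the line before bytePos has to be decoded.
        return _lineStartChars[line] + Encoding.UTF8.GetCharCount(_utf8, start, bytePos - start);
    }

    // Maps a UTF-16 char index to a UTF-8 byte offset.
    public int CharToByteIndex(int charPos)
    {
        int line = FindLine(_lineStartChars, charPos);
        int start = _lineStartChars[line];
        char[] partialLine = _text.ToCharArray(start, charPos - start);
        return _lineStartBytes[line] + Encoding.UTF8.GetByteCount(partialLine);
    }

    // Index of the last line whose start offset is <= pos.
    private static int FindLine(List<int> lineStarts, int pos)
    {
        int found = lineStarts.BinarySearch(pos);
        return found >= 0 ? found : ~found - 1;
    }

    public static void Main()
    {
        var index = new LineOffsetIndex("\u266Aa\n\u263Ab");
        Console.WriteLine(index.ByteToCharIndex(8)); // 4 ('b' on the second line)
        Console.WriteLine(index.CharToByteIndex(4)); // 8
    }
}
```

With the line starts cached, a lookup costs a binary search plus decoding at most one partial line, instead of a scan from the top of the document. A real implementation would also have to update the index on every edit.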

Cheers,
Jacob

 

Coordinator
Jan 11, 2012 at 12:04 AM

Nice to hear from you!

I had done something similar too when setting out to make a lexer that was Unicode aware. The part I didn't do was caching the byte/position lookups. That would be critical; otherwise the entire document would be scanned from the start for any arbitrary position lookup. I am surprised this hasn't been addressed within Scintilla itself.


Jan 12, 2012 at 1:03 PM
Edited Jan 12, 2012 at 1:10 PM

Thanks guys for the reply.

It is asking a lot to convert the entire ScintillaNET API to char offsets, however, maybe someone can pick up where I left off and at least provide a few additional APIs for the most common scenarios.

You are right; from my usage (which is probably typical), position and range would be the most important.
Btw, is there any planned release date for 3.0?

Until then, this can be used:

Scintilla -> .NET char pos:
int charPos = _scintillaCtrl.Encoding.GetCharCount(_scintillaCtrl.RawText, 0, sciPos);

Reverse:
int sciPos = _scintillaCtrl.Encoding.GetByteCount(_scintillaCtrl.Text.ToCharArray(), 0, charPos);