[Suggestion] Encoding Types

Mar 16, 2012 at 10:16 PM
Edited Mar 17, 2012 at 12:01 AM

Now that there have been some updates, it would be nice to have support for more encoding types like BigEndianUnicode, UTF-32, etc. (maybe all from this list)

thx

Coordinator
Mar 17, 2012 at 12:27 AM

The encodings we can support are limited by the encodings that the native Scintilla component can support. See the information about code pages in the original Scintilla documentation here: http://www.scintilla.org/ScintillaDoc.html#SCI_SETCODEPAGE.

Of course, just because ScintillaNET doesn't support it doesn't mean that you can't write your own load and save routines that will do any encoding conversion you wish.
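As a rough illustration of what such load and save routines do, here is a minimal sketch in Python rather than .NET. The helper names `load_text` and `save_text` and the sample file names are hypothetical and not part of ScintillaNET, and the sketch assumes the caller already knows each file's encoding.

```python
import os
import tempfile


def load_text(path, encoding):
    """Read a file and decode its bytes using the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()


def save_text(path, text, encoding):
    """Encode text with the given encoding and write it to a file."""
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(text)


# Hypothetical round trip: write UTF-16 big-endian, read it back,
# then save the same text as UTF-32.
folder = tempfile.mkdtemp()
source = os.path.join(folder, "sample_utf16be.txt")
target = os.path.join(folder, "sample_utf32.txt")

original = "na\u00efve caf\u00e9"
save_text(source, original, "utf-16-be")
editor_text = load_text(source, "utf-16-be")  # what the editor control would receive
save_text(target, editor_text, "utf-32")
```

The editor only ever sees the decoded string, so the on-disk encoding can be anything the runtime's codec library supports.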

Jacob

Mar 17, 2012 at 12:45 AM
Edited Mar 17, 2012 at 1:17 AM

I did, but I have problems with that: I need to use iso-8859-1 (28591) on the Scintilla component, because the conversion from UTF-8 to this encoding has some problems.

Ex:

UTF-8: "Ðåøèë ñäåëàòü ñèñòåìó áàíà ïî äíÿì íà MySQL"

UTF-8 to iso-8859-1: "?????????? ?????????????? ?????????????? ???????? ???? ???????? ???? MySQL"

edit:

doing some things:

UTF-8 to iso-8859-1: "Ðåøèë ñäåëàòü ñèñòåìó áàíà ïî äíÿì íà MySQL"

Coordinator
Mar 17, 2012 at 2:11 AM

Latin1? You shouldn't be having any problems with that either in Scintilla or .NET string handling. All the "ANSI" character sets are fully supported by Scintilla. Can you provide us with some code that demonstrates the behavior?

Mar 17, 2012 at 2:56 AM

Scintilla config:

.Encoding = System.Text.Encoding.UTF8

Document Load:

    Private Sub OpenToolStripMenuItem_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles OpenToolStripMenuItem.Click
        OFD.InitialDirectory = Settings.DefaultPath
        OFD.ShowDialog()
        If Not OFD.FileName Is Nothing AndAlso OFD.FileName.Length > 0 Then
            If File.Exists(OFD.FileName) Then
                Dim name As String
                name = Mid(OFD.FileName, OFD.FileName.LastIndexOf("\") + 2, OFD.FileName.LastIndexOf(".") - OFD.FileName.LastIndexOf("\") - 1)
                Instances.Add(New Instance(name))
                With Instances(GetInstanceByName(name))
                    TabControl1.SelectedTab = .TabHandle
                    Dim Reader As New StreamReader(OFD.FileName, System.Text.Encoding.UTF8, True)
                    .SyntaxHandle.Text = Reader.ReadToEnd()
                    Reader.Close()
                    .Name = name
                    .Path = OFD.FileName
                End With
            End If
        End If
    End Sub

Document Save:

With Instances(TabControl1.SelectedIndex)
    If Not .Path Is Nothing AndAlso .Path.Length > 0 AndAlso .Ext <> ".inc" Then
        'This document is the one that the user will open later
        Dim Writer As New StreamWriter(.Path, False, System.Text.Encoding.GetEncoding(28591))
        Writer.Write(.SyntaxHandle.Text)
        Writer.Close()
        
        'This document is for the compiler (as it works with iso-8859-1)
        Writer.Write(System.Text.Encoding.GetEncoding(28591).GetString(System.Text.Encoding.UTF8.GetBytes(.SyntaxHandle.Text)))
        Writer.Write(.SyntaxHandle.Text)
        Writer.Close()
    End If
End With

Coordinator
Mar 17, 2012 at 5:47 AM

Your load routine looks funny to me. Aren't you trying to read an "iso-8859-1 (28591)" encoded file? Why then is your StreamReader configured to interpret it as UTF-8?

Try this:

SyntaxHandle.Encoding = System.Text.Encoding.UTF8;

// Load from a "iso-8859-1 (28591)" encoded file
SyntaxHandle.Text = System.IO.File.ReadAllText(OFD.FileName, System.Text.Encoding.GetEncoding(28591));

// Save as a "iso-8859-1 (28591)" encoded file
System.IO.File.WriteAllText(Path, SyntaxHandle.Text, System.Text.Encoding.GetEncoding(28591));

The Encoding property on Scintilla determines how text is handled internally, but it has no bearing on how text is loaded from or saved to a file. In this case Scintilla will internally store text as UTF-8, which preserves all the possible characters you could load or save out of your files. It has nothing to do with the source or destination file encodings.
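To make that separation concrete, here is a small Python sketch (not .NET code). The variable names are illustrative, and `internal_bytes` merely stands in for the UTF-8 buffer the control would hold; that mapping is an assumption made for the example.

```python
# Bytes as they would sit in an iso-8859-1 (Latin-1) file on disk.
file_bytes = b"\xd0\xe5\xf8\xe8\xeb MySQL"

# Loading: decode with the FILE's encoding to get an encoding-neutral string.
text = file_bytes.decode("iso-8859-1")

# Stand-in for the control's internal UTF-8 storage (assumption for illustration).
internal_bytes = text.encode("utf-8")

# Saving: encode the string with the destination FILE's encoding.
saved_bytes = text.encode("iso-8859-1")

print(len(file_bytes), len(internal_bytes))  # 11 16
print(saved_bytes == file_bytes)             # True
```

The internal buffer is longer than the file because each accented character takes two bytes in UTF-8, yet the saved bytes are identical to what was loaded.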

Jacob

Mar 17, 2012 at 6:13 AM

I load and save a UTF-8 file, but the compiler needs an "iso-8859-1" encoded file, so when I compile I use the second piece of code I posted (Document Save, compiler part). That is where the conversion bug occurs.

Coordinator
Mar 17, 2012 at 7:36 AM

The save snippet that I posted should be what you're looking for. What you currently have in your compiler save logic takes the .NET string, converts it to a UTF-8 byte array, and then reinterprets that byte array as "iso-8859-1" text. That only works for plain ASCII; any other character comes out as its UTF-8 bytes rather than as "iso-8859-1". It is also not necessary. You can go directly from a .NET String to the encoding you want.
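The difference between the two save paths can be sketched in Python as a stand-in for the .NET calls. Treating `encode("utf-8")` and `decode("iso-8859-1")` as the equivalents of `Encoding.UTF8.GetBytes` and `GetEncoding(28591).GetString` is an assumption made for illustration, not something tested against the posted VB code.

```python
# Five characters, all of them representable in iso-8859-1.
text = "\u00d0\u00e5\u00f8\u00e8\u00eb"

# Direct conversion: string -> iso-8859-1 bytes, one byte per character.
direct = text.encode("iso-8859-1")

# Roundabout conversion: string -> UTF-8 bytes -> reinterpret those bytes as
# iso-8859-1 characters -> write that string out as iso-8859-1.
reinterpreted = text.encode("utf-8").decode("iso-8859-1")
roundabout = reinterpreted.encode("iso-8859-1")

print(len(direct))                         # 5
print(len(roundabout))                     # 10
print(roundabout == text.encode("utf-8"))  # True
```

In this sketch the roundabout path ends up writing UTF-8 bytes to the file. The doubled length would be consistent with the earlier sample, where each word came out as twice as many "?" as it has characters.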

Typically when you see something convert to an encoding and the character is replaced with a question mark "?" as you mentioned above, it means that there was not an equivalent character representation in your destination encoding. The "?" is a fallback character. Is it possible that your document has characters that cannot be represented in "iso-8859-1"? Are you using any Unicode sequences that cannot be expressed in that character set?
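A short Python sketch of that fallback behavior follows. Python's `errors="replace"` is used here as an analogous stand-in for the replacement fallback described above, and the sample word is only an assumption about what kind of text might trigger it.

```python
# Five Cyrillic letters, none of which exist in iso-8859-1.
cyrillic = "\u0420\u0435\u0448\u0438\u043b"

# Strict encoding refuses characters the target charset cannot represent.
try:
    cyrillic.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With a replacement fallback, each unrepresentable character becomes "?".
replaced = cyrillic.encode("iso-8859-1", errors="replace")

print(strict_ok)  # False
print(replaced)   # b'?????'
```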

Either way, what you're experiencing isn't a problem with ScintillaNET.

If you would like to supply a text file that you think should be convertible to "iso-8859-1" and isn't, I would be happy to take a look at it.

Jacob

Mar 17, 2012 at 6:32 PM

Here's the thing: there are two programs. The first is a very simple IDE called pawno (which I'm sure uses Scintilla, not .NET) for a language called "pawn", and the encoding this program uses is "iso-8859-1". The second is the compiler that pawno uses (it also uses "iso-8859-1"). Since I'm creating a new IDE (you can see it here) to replace pawno, my IDE must open files in this encoding (up to now all files were saved with pawno) and save them in this encoding too (for the compiler, as I'm using the same compiler that pawno uses). But when characters like the ones in my second post are used, I have problems, while pawno doesn't.

Coordinator
Mar 17, 2012 at 7:57 PM

For me to offer any more advice, I really need a sample text file that you believe is being converted incorrectly. I can't troubleshoot with what's pasted above because it has already been through conversion.

Jacob

Mar 17, 2012 at 8:17 PM

Here's some text that will have problems:

"Òîåñòü ñíîñèì ñòàðîå, è íà åãî ìåñòî ñòàâèì íîâîå.

Íî âîò ïàðàäîêñ, èãðîêè íà÷àëè æàëîâàòñÿ (â ñêîðå ïîñëå óñòàíîâêè äàííîé "ñèñòåìû")
÷òî èõ àâòî ïðîïîäàåò, êîãäà äðóãîé èãðîê ñîçäàåò ñâî¸ àâòî.

Ìîæåò ëè áûòü äàííîå ÿâëåíèå âûçâàíî òåì, ÷òî ñåðâåð ïûòàåòñÿ "Äåñòðîèòü" àâòî èãðîêà, íî åãî ïîêà íå ñóùåñòâóåò? (òîåñòü "RCarId[playerid]" åù¸ íå ñîçäàëè, à óæå ïûòàåìñÿ åãî óäàëèòü)?

È êàê ïðîâåðèòü, ñîçäàíî ëè àâòî ñ äàííîé ïåðåìåííîé? (êðîìå êàê ñîçäàíèÿ "ïàðàëåëüíîé" ïåðåìåííîé, êîòîðàÿ áóäåò ìåíÿòñÿ â çàâèñèìîñòè îò òîãî, ñîçäàíî àâòî èëè íåò)"

Coordinator
Mar 17, 2012 at 8:22 PM

I can't do anything with this text. I need the original source file that is causing you problems. Please email it to me. Zipped, not pasted. Before it has been converted, not after.

Jacob

Mar 17, 2012 at 8:41 PM
Edited Mar 17, 2012 at 9:07 PM

Wait, I must create one.

Edit:

http://www.mediafire.com/?sm568r3zym3w6is

Edit2:

Fixed. My fault: I had one encoding set as UTF8, so the whole app didn't work. Sorry.