ZScript: String.CharAt does not like Unicode characters
Moderator: GZDoom Developers
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49226
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: ZScript: String.CharAt does not like Unicode characters
It will return 0 when the string is fully parsed. To convert single characters back to a string, you can use AppendCharacter with an empty source.
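To illustrate the loop pattern described here (read code points until 0 comes back, then rebuild a string by appending each one to an empty string), here is a rough sketch in Python rather than ZScript. The names `get_next_code_point`, `code_points` and `rebuild` are made up for this example and are not the engine's API; the sketch also assumes the input is valid UTF-8.

```python
def get_next_code_point(data, pos):
    """Hypothetical stand-in: decode one UTF-8 code point at byte offset pos.

    Returns (code_point, next_pos). The code point is 0 once the end of
    the data has been reached. Assumes `data` is valid UTF-8.
    """
    if pos >= len(data):
        return 0, pos
    first = data[pos]
    if first < 0x80:
        length = 1          # single-byte (ASCII) sequence
    elif first >> 5 == 0b110:
        length = 2          # lead byte 110xxxxx
    elif first >> 4 == 0b1110:
        length = 3          # lead byte 1110xxxx
    else:
        length = 4          # lead byte 11110xxx
    code_point = ord(data[pos:pos + length].decode("utf-8"))
    return code_point, pos + length


def code_points(text):
    """Collect all code points by looping until 0 is returned."""
    data = text.encode("utf-8")
    result = []
    pos = 0
    while True:
        code_point, pos = get_next_code_point(data, pos)
        if code_point == 0:
            # End of string (an embedded NUL would also stop the loop).
            break
        result.append(code_point)
    return result


def rebuild(points):
    """Start from an empty string and append one character per code point."""
    result = ""
    for code_point in points:
        result += chr(code_point)
    return result
```

With this sketch, `rebuild(code_points(s))` gives back `s` for any string without embedded NUL characters, even though the loop steps over a varying number of bytes per character.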
- Player701
- Posts: 1710
- Joined: Wed May 13, 2009 3:15 am
- Graphics Processor: nVidia with Vulkan support
Re: ZScript: String.CharAt does not like Unicode characters
All right, thank you again. These new methods will definitely come in handy...
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49226
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: ZScript: String.CharAt does not like Unicode characters
I also added some Unicode-aware case conversion utilities to the String class. They should be able to handle everything except the Turkish special case for I with dot and i without dot. (This is one place where the Unicode consortium truly messed up by making the case conversion locale-dependent; it is nearly impossible to solve unless you know the actual language of the input string, and that it doesn't mix languages.)
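For anyone wondering what the Turkish special case looks like in practice, here is a small Python illustration. It uses Python's built-in, locale-independent case mapping, which is an assumption made for the sake of the example and says nothing about how the new String methods behave internally.

```python
# Turkish has four distinct letters: I / ı (dotless) and İ / i (dotted).
dotted_capital_i = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small_i = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I

# Locale-independent lowercasing of İ yields 'i' plus a combining dot above,
# so a single code point turns into two.
lowered = dotted_capital_i.lower()

# Locale-independent uppercasing of ı yields plain ASCII 'I'.
uppered = dotless_small_i.upper()

# Lowercasing that result gives 'i', not the original 'ı': the round trip
# is lossy unless the code knows the text is Turkish.
round_trip = uppered.lower()
```

In Turkish text the correct pairs are I/ı and İ/i, so a converter that does not know the language cannot choose the right mapping for a plain `I` or `i`.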
Re: ZScript: String.CharAt does not like Unicode characters
I wonder, doesn't this leave the same problem we had when moving from ASCII to UTF-8? That is, the assumption before was that byte=character, but now multiple bytes can be a character... but the assumption that's likely to be made now is that codepoint=character; what happens if multi-codepoint characters become a thing in the future (support for combining diacritics is added, say)?
[edit: I see you've already considered this in the commit message.]
I was thinking, maybe a way to future-proof against this is to also create GetNextCharacter as an alias for GetNextCodePoint. Then, if multi-codepoint-characters are added later, the appropriate changes can be made to GetNextCharacter, while leaving GetNextCodePoint alone. Modders would use whichever one is appropriate: if we want to deal with individual codepoints, we use GetNextCodePoint, and if we want the higher level of abstraction for characters, we use GetNextCharacter, even if they do exactly the same thing for now.
Every function named with "CodePoint" would get a (functionally identical, for now) "Character" alias. These names are also a bit more familiar and probably what people would tend to use who'd want the higher level of abstraction; people who explicitly want to deal with codepoints can use the "CodePoint" versions and be confident that their behavior won't change. What do you think?
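To make the byte / codepoint / character distinction concrete, here is a quick Python sketch. The helper name `count_units` is made up for this example, and counting "characters" as code points that are not combining marks is only a crude approximation of what a real character-level function would have to do.

```python
import unicodedata


def count_units(text):
    """Return (utf8_bytes, code_points, approx_characters) for text.

    The third value only skips combining marks; it is a rough stand-in
    for user-perceived characters, not full grapheme segmentation.
    """
    byte_count = len(text.encode("utf-8"))
    code_point_count = len(text)
    approx_characters = sum(
        1 for ch in text if unicodedata.combining(ch) == 0
    )
    return byte_count, code_point_count, approx_characters


precomposed = "\u00e9"    # é as a single code point
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

print(count_units(precomposed))
print(count_units(decomposed))
```

Both strings render as the same letter, yet only the third count agrees between them, which is the gap a separate "Character" function would eventually have to cover.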
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49226
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: ZScript: String.CharAt does not like Unicode characters
I'd add such a function if I had sufficient documentation to handle it properly. Ideally, combining diacritics should never reach mod space, unless there is no precomposed alternative. But in the end my knowledge of all this is still far too limited to do it properly. Don't forget that there's also things like variation selectors, that, unlike combining diacritics are not placed AFTER but BEFORE the modified character.
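The "precomposed alternative" point can be shown with Unicode normalization. This is a Python sketch using the standard `unicodedata` module; the helper name `precompose` is made up here, and nothing in this thread says the engine performs this step.

```python
import unicodedata


def precompose(text):
    """Apply NFC normalization, which merges base + combining mark
    into a single precomposed code point wherever one exists."""
    return unicodedata.normalize("NFC", text)


# e + combining acute has a precomposed form (U+00E9), so it collapses.
with_alternative = precompose("e\u0301")

# q + combining acute has no precomposed form, so both code points remain.
without_alternative = precompose("q\u0301")
```

Text normalized this way only carries combining diacritics in the cases where no precomposed code point exists.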
Remember what I said: Unicode processing is a minefield, and whatever you try to cook up yourself will inevitably break if the feature set gets expanded. Although unlikely, what if I added Arabic support? Not only is that a right-to-left script, it also has so many oddities that no left-to-right code trying to process it character by character will ever work. What's there is for analyzing a string, not for breaking it apart for printing.
- Player701
- Posts: 1710
- Joined: Wed May 13, 2009 3:15 am
- Graphics Processor: nVidia with Vulkan support
Re: ZScript: String.CharAt does not like Unicode characters
I'm sorry if I should have started a new thread for this, but since monospacing has been mentioned here before: I see that monospacing support has been added to Screen.DrawText, but what about BaseStatusBar.DrawString?
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49226
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: ZScript: String.CharAt does not like Unicode characters
That already supports monospacing. For the status bar it's a font option.
- Player701
- Posts: 1710
- Joined: Wed May 13, 2009 3:15 am
- Graphics Processor: nVidia with Vulkan support
Re: ZScript: String.CharAt does not like Unicode characters
Ah, yes, I forgot. The problem was that the characters were incorrectly aligned. I see that there is an enum for that now, but there is no argument for it in the HUDFont constructor.
- Graf Zahl
- Lead GZDoom+Raze Developer
- Posts: 49226
- Joined: Sat Jul 19, 2003 10:19 am
- Location: Germany
Re: ZScript: String.CharAt does not like Unicode characters
If there's a problem, please report a bug and provide an example. This uses completely different code for aligning the font, and I either need to fix it or map it to the generic variant.
- Player701
- Posts: 1710
- Joined: Wed May 13, 2009 3:15 am
- Graphics Processor: nVidia with Vulkan support
Re: ZScript: String.CharAt does not like Unicode characters
I think it's probably a missing feature instead of a bug. Okay, will make a new thread about it soon.
Upd: See here