ZScript: String.CharAt does not like Unicode characters

Graf Zahl
Lead GZDoom+Raze Developer
Posts: 49226
Joined: Sat Jul 19, 2003 10:19 am
Location: Germany

Re: ZScript: String.CharAt does not like Unicode characters

Post by Graf Zahl »

It will return 0 when the string is fully parsed. To convert single characters back to a string, you can use AppendCharacter with an empty source.
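
For readers following along outside GZDoom, the iteration pattern described above can be modeled in Python. This is only a sketch under stated assumptions: `get_next_code_point`, `code_points` and `rebuild` are hypothetical helpers that imitate the described behavior (a result of 0 once the string is fully parsed, and rebuilding text by appending code points to an empty string). They are not the engine's implementation or the ZScript API.

```python
def get_next_code_point(data, pos):
    """Decode one UTF-8 sequence from `data` starting at byte offset `pos`.

    Returns (code point, offset of the next sequence), or (0, pos) once
    the whole input has been consumed.
    """
    if pos >= len(data):
        return 0, pos
    lead = data[pos]
    # The lead byte tells us how many bytes the sequence occupies.
    if lead < 0x80:
        length = 1
    elif lead >> 5 == 0b110:
        length = 2
    elif lead >> 4 == 0b1110:
        length = 3
    elif lead >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte at offset %d" % pos)
    return ord(data[pos:pos + length].decode("utf-8")), pos + length


def code_points(text):
    """Collect all code points of `text`, stopping when 0 is returned."""
    data = text.encode("utf-8")
    result, pos = [], 0
    while True:
        cp, pos = get_next_code_point(data, pos)
        if cp == 0:  # end of string (a literal U+0000 would also stop here)
            break
        result.append(cp)
    return result


def rebuild(cps):
    """Turn code points back into text by appending to an empty string."""
    out = ""
    for cp in cps:
        out += chr(cp)
    return out
```

Note that the position advances by one to four bytes per step, which is why indexing by byte offset alone cannot find character boundaries.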
Player701
Posts: 1710
Joined: Wed May 13, 2009 3:15 am
Graphics Processor: nVidia with Vulkan support

Post by Player701 »

All right, thank you again. These new methods will definitely come in handy...

Post by Graf Zahl »

I also added some Unicode-aware case conversion utilities to the String class. They should be able to handle everything except the Turkish special case for I with dot and i without dot. This is one place where the Unicode Consortium truly messed up by making case conversion locale-dependent; it is nearly impossible to solve unless you know the actual language of the input string, and that it doesn't mix languages.
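
The Turkish problem can be reproduced in any language with locale-independent case mapping. The Python sketch below is an illustration, not the GZDoom code: `upper_default` uses Python's built-in mapping, and `upper_turkish` is a hypothetical helper that applies the Turkish rules (i becomes İ, ı becomes I) before falling back to the default.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # ı, LATIN SMALL LETTER DOTLESS I


def upper_default(text):
    """Locale-independent uppercase: both i and ı end up as plain I."""
    return text.upper()


def upper_turkish(text):
    """Uppercase using Turkish rules; only correct if the text is Turkish."""
    text = text.replace("i", DOTTED_CAPITAL_I)
    text = text.replace(DOTLESS_SMALL_I, "I")
    return text.upper()
```

For the place name "diyarbakır", the default mapping collapses both letters into the same capital I, while the Turkish mapping keeps them distinct. Applying `upper_turkish` to English text would be just as wrong in the other direction, which is why the language of the input has to be known.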
gramps
Posts: 300
Joined: Thu Oct 18, 2018 2:16 pm

Post by gramps »

I wonder, doesn't this leave us with the same problem we had when moving from ASCII to UTF-8? The assumption before was that byte=character, but now a character can span multiple bytes. The assumption likely to be made now is that codepoint=character. What happens if multi-codepoint characters become a thing in the future (say, if support for combining diacritics is added)?
[edit: I see you've already considered this in the commit message.]
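
The three levels in question (bytes, codepoints, user-perceived characters) can be made concrete with a small Python sketch. The helper name `sizes` is made up for this illustration.

```python
def sizes(text):
    """Return (number of UTF-8 bytes, number of code points) for `text`."""
    return len(text.encode("utf-8")), len(text)


ascii_a = "A"              # one byte, one code point
precomposed_e = "\u00e9"   # é as a single code point
decomposed_e = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

for sample in (ascii_a, precomposed_e, decomposed_e):
    print(repr(sample), sizes(sample))
```

All three samples display as a single character, yet only the first satisfies byte=character and only the first two satisfy codepoint=character.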

I was thinking, maybe a way to future-proof against this is to also create GetNextCharacter as an alias for GetNextCodePoint. Then, if multi-codepoint-characters are added later, the appropriate changes can be made to GetNextCharacter, while leaving GetNextCodePoint alone. Modders would use whichever one is appropriate: if we want to deal with individual codepoints, we use GetNextCodePoint, and if we want the higher level of abstraction for characters, we use GetNextCharacter, even if they do exactly the same thing for now.

Every function named with "CodePoint" would get a (functionally identical, for now) "Character" alias. These names are also a bit more familiar and probably what people would tend to use who'd want the higher level of abstraction; people who explicitly want to deal with codepoints can use the "CodePoint" versions and be confident that their behavior won't change. What do you think?
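
A rough Python sketch of the proposed split, under loud assumptions: both functions are hypothetical stand-ins that work on code point indices, and the "character" version groups a base code point only with following marks that have a non-zero canonical combining class. Real grapheme segmentation (Unicode Standard Annex #29) has many more rules, so this shows the shape of the idea rather than a correct implementation.

```python
import unicodedata


def get_next_code_point(text, pos):
    """Return (single code point as a string, next position)."""
    if pos >= len(text):
        return "", pos
    return text[pos], pos + 1


def get_next_character(text, pos):
    """Return (base code point plus trailing combining marks, next position)."""
    if pos >= len(text):
        return "", pos
    end = pos + 1
    # Simplification: absorb code points with a non-zero combining class.
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return text[pos:end], end


def split_characters(text):
    """Split `text` using get_next_character until it is exhausted."""
    parts, pos = [], 0
    while pos < len(text):
        part, pos = get_next_character(text, pos)
        parts.append(part)
    return parts
```

On input without combining marks the two functions return the same results, which matches the "functionally identical, for now" alias described above. The sketch does not handle anything with combining class 0, such as variation selectors.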

Post by Graf Zahl »

I'd add such a function if I had sufficient documentation to handle it properly. Ideally, combining diacritics should never reach mod space, unless there is no precomposed alternative. But in the end my knowledge of all this is still far too limited to do it properly. Don't forget that there are also things like variation selectors, which, unlike combining diacritics, are not placed AFTER but BEFORE the modified character.
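
The "precomposed alternative" point can be illustrated with Unicode normalization. The Python sketch below uses the standard `unicodedata` module; `precompose` and `has_precomposed_form` are hypothetical helper names, and nothing here is claimed to be what GZDoom does internally.

```python
import unicodedata


def precompose(text):
    """Apply NFC normalization, which composes sequences where possible."""
    return unicodedata.normalize("NFC", text)


def has_precomposed_form(base, mark):
    """True if base + combining mark collapses to one code point under NFC."""
    return len(precompose(base + mark)) == 1


ACUTE = "\u0301"  # COMBINING ACUTE ACCENT
print(has_precomposed_form("e", ACUTE))  # e + acute composes to U+00E9
print(has_precomposed_form("q", ACUTE))  # no single code point exists for this
```

Text normalized this way only keeps combining marks in the cases where no precomposed code point exists, which is the situation the post describes as unavoidable.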

Remember what I said: Unicode processing is a minefield, and whatever you try to cook up yourself will inevitably break if the feature set gets expanded. Although unlikely, what if I added Arabic support? Not only is that a right-to-left script, it also has so many oddities that no left-to-right code trying to process it character by character will ever work. What's there is meant for analyzing a string, not for breaking it apart for printing.

Post by Player701 »

I'm sorry if I should have started a new thread for this, but since monospacing has been mentioned here before: I see that monospacing support has been added to Screen.DrawText, but what about BaseStatusBar.DrawString?

Post by Graf Zahl »

That already supports monospacing. For the status bar it's a font option.

Post by Player701 »

Ah, yes, I forgot. The problem was that the characters were incorrectly aligned. I see that there is an enum for that now, but there is no argument for it in the HUDFont constructor.

Post by Graf Zahl »

If there's a problem, please report a bug and provide an example. This uses completely different code for aligning the font and I either need to fix it or map to the generic variant.

Post by Player701 »

I think it's probably a missing feature instead of a bug. Okay, will make a new thread about it soon.

Upd: See here