ZScript: String.CharAt does not like Unicode characters

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 6:30 am

I think it's probably a missing feature instead of a bug. Okay, will make a new thread about it soon.

Upd: See here

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 6:25 am

If there's a problem, please report a bug and provide an example. This uses completely different code for aligning the font and I either need to fix it or map to the generic variant.

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 6:06 am

Ah, yes, I forgot. The problem was that the characters were incorrectly aligned. I see that there is an enum for that now, but there is no argument for it in the HUDFont constructor.

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 6:00 am

That already supports monospacing. For the status bar it's a font option.

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 5:47 am

I'm sorry if I should have started a new thread for this, but since monospacing has been mentioned here before: I see that monospacing support has been added to Screen.DrawText, but what about BaseStatusBar.DrawString?

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 4:34 am

I'd add such a function if I had sufficient documentation to handle it properly. Ideally, combining diacritics should never reach mod space, unless there is no precomposed alternative. But in the end my knowledge of all this is still far too limited to do it properly. Don't forget that there's also things like variation selectors, that, unlike combining diacritics are not placed AFTER but BEFORE the modified character.

Remember what I said: Unicode processing is a minefield, and whatever you try to cook up yourself will inevitably break if the feature set gets expanded. Although unlikely, what if I added Arabic support? Not only is that a right-to-left script, it also has so many oddities that no left-to-right code trying to process it character by character will ever work. What's there is meant for analyzing a string, not for breaking it apart for printing.

Re: ZScript: String.CharAt does not like Unicode characters

by gramps » Sat Apr 13, 2019 4:15 am

I wonder, doesn't this leave the same problem we had when moving from ASCII to UTF-8? Before, the assumption was that a byte equals a character, but now multiple bytes can form one character. The assumption likely to be made now is that a code point equals a character; what happens if multi-code-point characters become a thing in the future (say, support for combining diacritics is added)?
[edit: I see you've already considered this in the commit message.]

I was thinking, maybe a way to future-proof against this is to also create GetNextCharacter as an alias for GetNextCodePoint. Then, if multi-codepoint-characters are added later, the appropriate changes can be made to GetNextCharacter, while leaving GetNextCodePoint alone. Modders would use whichever one is appropriate: if we want to deal with individual codepoints, we use GetNextCodePoint, and if we want the higher level of abstraction for characters, we use GetNextCharacter, even if they do exactly the same thing for now.

Every function named with "CodePoint" would get a (functionally identical, for now) "Character" alias. These names are also a bit more familiar and probably what people would tend to use who'd want the higher level of abstraction; people who explicitly want to deal with codepoints can use the "CodePoint" versions and be confident that their behavior won't change. What do you think?
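
To make the code point versus character distinction concrete, here is a small Python sketch (Python is used only because it is easy to run; it is not ZScript and says nothing about how the engine works internally). The helper name `count_user_characters` is hypothetical, and skipping combining marks is a deliberate simplification rather than full grapheme segmentation.

```python
import unicodedata

# The same visible letter, written two ways.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT (two code points)


def count_user_characters(text):
    """Count code points that are not combining marks.

    This is only a rough stand-in for "characters as the user sees them";
    real segmentation has many more rules than this.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# Counting code points gives different answers for the two spellings...
print(len(precomposed), len(decomposed))
# ...while the rough character count agrees.
print(count_user_characters(precomposed), count_user_characters(decomposed))
# NFC normalization maps the decomposed form to the precomposed one when it exists.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

A "Character" alias that later diverges from the "CodePoint" function would sit roughly where `count_user_characters` sits here: above the raw code point sequence.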

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 2:31 am

I also added some Unicode-aware case conversion utilities to the String class. They should be able to handle everything except the Turkish special case for I with dot and i without dot. That is one place where the Unicode Consortium truly messed up by making case conversion locale-dependent; it is nearly impossible to solve unless you know the actual language of the input string, and that it doesn't mix languages.
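
The Turkish case can be reproduced in any language with locale-independent case mapping. The Python sketch below only illustrates the Unicode behavior; it does not show the engine's String methods, and `turkish_upper` is a hypothetical helper that works only if the caller already knows the text is Turkish.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı, LATIN SMALL LETTER DOTLESS I

# Locale-independent mapping: correct for most languages, wrong for Turkish.
default_upper = "istanbul".upper()            # "ISTANBUL"; Turkish expects "İSTANBUL"
dotless_upper = DOTLESS_SMALL_I.upper()       # "I", so ı and i collapse to one capital
dotted_lower = DOTTED_CAPITAL_I.lower()       # "i" plus U+0307 COMBINING DOT ABOVE


def turkish_upper(text):
    """Uppercase text under Turkish rules (assumes the whole string is Turkish)."""
    # Map the two special letters first, then let the generic mapping do the rest.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


print(default_upper, turkish_upper("istanbul"))
```

Note that the generic mapping cannot tell which behavior is wanted from the string alone, which is exactly the problem described above.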

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 2:13 am

All right, thank you again. These new methods will definitely come in handy...

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 2:09 am

It will return 0 when the string is fully parsed. To convert single characters back to a string, you can use AppendCharacter with an empty source.
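
For readers who want to see the iteration pattern spelled out, here is a Python model of it. This is an assumption-laden sketch, not the engine's code: `get_next_code_point`, `code_points`, and `to_string` are hypothetical stand-ins, the tuple return is a choice made for this sketch, and the input is assumed to be valid UTF-8. The only behavior taken from the post above is that 0 signals the end of the string and that a single code point becomes a string by appending it to an empty one.

```python
from typing import List, Tuple


def get_next_code_point(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one UTF-8 code point at byte offset pos.

    Returns (code_point, next_pos). The code point is 0 once pos has
    reached the end of the data. Assumes data is valid UTF-8.
    """
    if pos >= len(data):
        return 0, pos
    first = data[pos]
    if first < 0x80:          # 1-byte sequence (ASCII)
        length, cp = 1, first
    elif first < 0xE0:        # 2-byte sequence
        length, cp = 2, first & 0x1F
    elif first < 0xF0:        # 3-byte sequence
        length, cp = 3, first & 0x0F
    else:                     # 4-byte sequence
        length, cp = 4, first & 0x07
    for byte in data[pos + 1:pos + length]:
        cp = (cp << 6) | (byte & 0x3F)   # fold in 6 payload bits per continuation byte
    return cp, pos + length


def code_points(text: str) -> List[int]:
    """Walk the string until the 0 end marker is returned."""
    data = text.encode("utf-8")
    result, pos = [], 0
    while True:
        cp, pos = get_next_code_point(data, pos)
        if cp == 0:
            break
        result.append(cp)
    return result


def to_string(cp: int) -> str:
    """Turn one code point back into a string by appending to an empty one."""
    out = ""
    out += chr(cp)
    return out


print([to_string(cp) for cp in code_points("Привет")])
```

One consequence of using 0 as the end marker is that a string containing an embedded NUL code point would stop the loop early in this model.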

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 1:58 am

Thank you very much. :D

Question: Is there a way to know when GetNextCodePoint has reached the end of the string? Or do I have to use CodePointCount for that?

Edit: Also, how do I convert the resulting integers to strings to pass them to Screen.DrawText / BaseStatusBar.DrawString etc. ?

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Sat Apr 13, 2019 1:50 am

Proper handlers have been added for extracting code points from strings.

Re: ZScript: String.CharAt does not like Unicode characters

by Player701 » Sat Apr 13, 2019 12:33 am

Graf Zahl wrote:
"The problem with letting this be handled by the community is that it will produce lots of broken code and no means to fix released mods.
Unicode handling is a minefield that seriously cannot be left to amateurs. Regarding monospacing, that really (*REALLY*) needs to be made part of the DrawString function. Even engine-internally, it's currently hacked in in the places that have it, most importantly the status bar's text drawer."

Yes, this is exactly why I want the API to be part of the engine. Same about monospacing. Yes, my method is largely a workaround and it indeed might not work correctly with more exotic characters. But right now it doesn't even work with, say, Cyrillic characters. Same problem with capitalization.

Regarding padding: the main problem is that I need to know the lengths of my strings to position them correctly. Whether this is done by forcing them all to the same length with invisible characters or by specifying the correct positions from the start does not really matter. It is also possible that I might need padding with non-invisible characters as well (say, "*" or "-") in certain cases.
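
The length problem behind the padding question can be shown with a short Python sketch. It is illustrative only and not ZScript: `pad_to_width` is a hypothetical helper, and it assumes a monospaced font in which every code point occupies one cell, which does not hold for combining marks.

```python
def pad_to_width(text, width, fill=" "):
    """Right-pad text with fill so that it is at least width code points long."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one code point")
    return text + fill * max(0, width - len(text))


label = "Привет"
byte_length = len(label.encode("utf-8"))   # 12: each Cyrillic letter takes 2 bytes in UTF-8
code_point_length = len(label)             # 6: what positioning actually needs

# Padding by code point count lines up Latin and Cyrillic labels alike.
print(pad_to_width(label, 10, "*"))
print(pad_to_width("Health", 10, "-"))
```

Padding computed from the byte length would give the Cyrillic label six fewer fill characters than the Latin one, which is the misalignment a byte-based length produces.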

Re: ZScript: String.CharAt does not like Unicode characters

by Graf Zahl » Fri Apr 12, 2019 11:43 pm

The problem with letting this be handled by the community is that it will produce lots of broken code and no means to fix released mods.
Unicode handling is a minefield that seriously cannot be left to amateurs. Regarding monospacing, that really (*REALLY*) needs to be made part of the DrawString function. Even engine-internally, it's currently hacked in in the places that have it, most importantly the status bar's text drawer.

Re: ZScript: String.CharAt does not like Unicode characters

by gramps » Fri Apr 12, 2019 9:00 pm

I'm still not really seeing the advantage of doing it engine-side over letting users create a library to support our own needs, letting that mature, and then maybe integrating something similar into the engine once UTF-8 support has had a little time to settle.

To the extent that examples like monospacing and uppercasing can be handled by a general-purpose UTF-8 class engine-side, what would be the benefit of doing it that way over letting the community do it as a separate library? Wouldn't a simple iterator like the example I posted earlier handle a majority of the cases that can actually be handled?
