by gramps » Sat Apr 13, 2019 4:15 am
I wonder, doesn't this leave the same problem we had when moving from ASCII to UTF-8? The assumption before was that byte = character, but now multiple bytes can make up one character. The assumption likely to be made now is that code point = character, so what happens if multi-code-point characters become a thing in the future (if support for combining diacritics is added, say)?
[edit: I see you've already considered this in the commit message.]
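To make the concern concrete, here is a small Python illustration. Python is only a convenient choice here; the thread doesn't say what language the engine or the mods use. The same visible "é" can be stored as one code point or as two (a base letter plus a combining accent), and the byte counts differ as well:

```python
import unicodedata

# "é" written two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (two code points)

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    utf8 = s.encode("utf-8")
    # precomposed: 2 bytes, 1 code point; decomposed: 3 bytes, 2 code points
    print(label, "bytes:", len(utf8), "code points:", len(s))

# Both display as a single character, and NFC normalization maps one to the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Code that assumes one code point per character would treat the second form as two characters, which is exactly the byte = character mistake one level up.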
I was thinking, one way to future-proof against this would be to also create GetNextCharacter as an alias for GetNextCodePoint. Then, if multi-code-point characters are added later, the appropriate changes can be made to GetNextCharacter while GetNextCodePoint stays as it is. Modders would use whichever one fits: GetNextCodePoint when we want to deal with individual code points, and GetNextCharacter when we want the higher-level abstraction of characters, even though both do exactly the same thing for now.
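A minimal sketch of the alias idea, again in Python purely for illustration. `get_next_code_point` and `get_next_character` are hypothetical stand-ins for GetNextCodePoint and GetNextCharacter; I'm assuming they take a UTF-8 buffer and a byte offset and return the decoded code point plus the offset of the next one, which may not match the real signatures.

```python
from __future__ import annotations


def get_next_code_point(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one UTF-8 code point starting at byte offset `pos`.

    Returns (code_point, next_pos). Hypothetical stand-in for GetNextCodePoint;
    minimal validation only (no overlong or surrogate checks).
    """
    lead = data[pos]
    if lead < 0x80:                      # 1-byte (ASCII) sequence
        return lead, pos + 1
    elif lead >> 5 == 0b110:             # 2-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:            # 3-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:           # 4-byte sequence
        length, cp = 4, lead & 0x07
    else:
        raise ValueError(f"invalid UTF-8 lead byte at offset {pos}")
    if pos + length > len(data):
        raise ValueError("truncated UTF-8 sequence")
    for i in range(1, length):
        cont = data[pos + i]
        if cont >> 6 != 0b10:            # continuation bytes look like 10xxxxxx
            raise ValueError(f"invalid continuation byte at offset {pos + i}")
        cp = (cp << 6) | (cont & 0x3F)
    return cp, pos + length


def get_next_character(data: bytes, pos: int) -> tuple[int, int]:
    """Hypothetical stand-in for GetNextCharacter.

    For now a "character" is exactly one code point, so this just forwards.
    If multi-code-point characters are supported later, only this function
    changes; get_next_code_point keeps its current behaviour.
    """
    return get_next_code_point(data, pos)


# Usage: walk "a" + "e" + COMBINING ACUTE ACCENT one "character" at a time.
data = "ae\u0301".encode("utf-8")
pos = 0
while pos < len(data):
    cp, pos = get_next_character(data, pos)
    print(hex(cp))  # prints 0x61, then 0x65, then 0x301
```

Today the "e" and its combining accent come back as two separate results. The point of the wrapper is that only `get_next_character` would need to change if that pair should later count as one character, although such a change would probably also need a different return type (for example, a sequence of code points).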
Every function with "CodePoint" in its name would get a "Character" alias (functionally identical, for now). The "Character" names are also a bit more familiar, and they're probably what people who want the higher level of abstraction would reach for anyway; people who explicitly want to work with code points can use the "CodePoint" versions and be confident that their behavior won't change. What do you think?