From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Fri, 8 Apr 2022 11:26:05 +0200

On 2022-04-08 10:56, Simon Wright wrote:
> "Randy Brukardt" writes:
>
>> If you had an Ada-like language that used a universal UTF-8 string
>> internally, you then would have a lot of old and mostly useless
>> operations supported for array types (since things like slices are
>> mainly useful for string operations).
>
> Just off the top of my head, wouldn't it be better to use UTF32-encoded
> Wide_Wide_Character internally?

Yep, that is exactly the problem: a confusion between interface and implementation.

Encoding /= interface, e.g. the interface of a string viewed as an array of characters. That interface is just the same for ASCII, Latin-1, EBCDIC, RADIX50, UTF-8 etc. strings. Why do you care what is inside?

The Ada type system's inability to implement this interface is another issue. The usefulness of this interface is yet another. For immutable strings it is quite useful.
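To make the interface-versus-encoding point concrete, here is a minimal sketch. It is written in Python rather than Ada, purely for illustration, and every name in it (CharArray, Latin1String, Utf8String, characters) is hypothetical. It assumes an immutable string whose only public view is "array of characters"; the octets stored inside differ per class, but client code cannot tell.

```python
from abc import ABC, abstractmethod


class CharArray(ABC):
    """Hypothetical interface: an immutable string as an array of characters."""

    @abstractmethod
    def __len__(self) -> int: ...

    @abstractmethod
    def __getitem__(self, index: int) -> str: ...


class Latin1String(CharArray):
    """Stores Latin-1 octets: one octet per character, so indexing is direct."""

    def __init__(self, octets: bytes) -> None:
        self._octets = bytes(octets)

    def __len__(self) -> int:
        return len(self._octets)

    def __getitem__(self, index: int) -> str:
        return chr(self._octets[index])


class Utf8String(CharArray):
    """Stores UTF-8 octets: characters have variable length inside."""

    def __init__(self, octets: bytes) -> None:
        self._octets = bytes(octets)
        self._octets.decode("utf-8")  # reject malformed input early
        # Offset of each character start: every octet that is not a
        # continuation octet (bit pattern 10xxxxxx) begins a character.
        self._starts = [
            i for i, b in enumerate(self._octets) if (b & 0xC0) != 0x80
        ]

    def __len__(self) -> int:
        return len(self._starts)

    def __getitem__(self, index: int) -> str:
        # Normalize negative indices; raises IndexError when out of range.
        index = range(len(self._starts))[index]
        start = self._starts[index]
        if index + 1 < len(self._starts):
            end = self._starts[index + 1]
        else:
            end = len(self._octets)
        return self._octets[start:end].decode("utf-8")


def characters(s: CharArray) -> list:
    """Client code: uses only the character-array interface."""
    return [s[i] for i in range(len(s))]
```

Given the same text encoded once as Latin-1 and once as UTF-8, characters() returns the same list for both objects, although the two hold different octet sequences of different lengths. Neither class offers a way to replace a character in place, which is where a variable-length encoding would start to hurt.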
For mutable strings it might appear too constrained, e.g. for packed encodings like UTF-8 and UTF-16.

Also, this interface should have nothing to do with the interface of a UTF-8 string as an array of octets or the interface of a UTF-16LE string as an array of little-endian words.

Since Ada cannot separate these interfaces, for practical purposes Strings are arrays of octets considered as UTF-8 encoded. The rest goes into coding guidelines under the title "never ever do this."

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de