From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on ip-172-31-65-14.ec2.internal
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.6
Path: eternal-september.org!reader02.eternal-september.org!aioe.org!hzzNxxMX5IPvnEV4b74Cww.user.46.165.242.91.POSTED!not-for-mail
From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Fri, 8 Apr 2022 21:45:18 +0200
Organization: Aioe.org NNTP Server
Message-ID:
References: <607b5b20$0$27442$426a74cc@news.free.fr> <86mttuk5f0.fsf@stephe-leake.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="11645"; posting-host="hzzNxxMX5IPvnEV4b74Cww.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
Xref: reader02.eternal-september.org comp.lang.ada:63718
List-Id:

On 2022-04-08 21:19, Simon Wright wrote:
> "Dmitry A. Kazakov" writes:
>
>> On 2022-04-08 10:56, Simon Wright wrote:
>>> "Randy Brukardt" writes:
>>>
>>>> If you had an Ada-like language that used a universal UTF-8 string
>>>> internally, you then would have a lot of old and mostly useless
>>>> operations supported for array types (since things like slices are
>>>> mainly useful for string operations).
>>>
>>> Just off the top of my head, wouldn't it be better to use
>>> UTF32-encoded Wide_Wide_Character internally?
>>
>> Yep, that is exactly the problem, a confusion between interface
>> and implementation.
>
> Don't understand. My point was that *when you are implementing this* it
> might be easier to deal with 32-bit characters/code points/whatever the
> proper jargon is than with UTF8.

I think it would be more difficult, because you would have to convert
from and to UTF-8, either under the hood or explicitly. UTF-8 is the
de facto interface and I/O standard. That covers 60-70% of all cases
where you need a string. Most string operations, like search, comparison
and slicing, are isomorphic between code points and octets. So you would
gain nothing from keeping strings internally as arrays of code points.

The situation is comparable to Unbounded_Strings. The implementation is
relatively simple, but the user must carry the burden of calling
To_String and To_Unbounded_String all over the application, and the
processor must suffer the overhead of copying arrays here and there.

>> Encoding /= interface, e.g. an interface of a string viewed as an
>> array of characters. That interface is just the same for ASCII, Latin-1,
>> EBCDIC, RADIX50, UTF-8 etc. strings. Why do you care what is inside?
>
> With a user's hat on, I don't. Implementers might have a different point
> of view.

Sure, but in the Ada philosophy their opinion should carry less weight
than, say, in C.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
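
To illustrate the remark above that search, comparison and slicing are isomorphic between code points and octets, here is a minimal sketch. It is in Python rather than Ada, purely for brevity. The helper names find_all and octet_to_code_point_index and the sample strings are illustrative assumptions and do not come from the thread. It relies on two known properties of UTF-8: it is self-synchronizing, so a valid encoded needle can only match on a character boundary, and octet-wise ordering agrees with code point ordering.

```python
def find_all(haystack, needle):
    """Return every start index of needle in haystack (works for str and bytes)."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def octet_to_code_point_index(encoded, octet_index):
    """Count the code points that begin before octet_index in UTF-8 data."""
    # UTF-8 continuation octets have the bit pattern 10xxxxxx; every other
    # octet starts a new code point.
    return sum(1 for octet in encoded[:octet_index] if octet & 0xC0 != 0x80)


# Hypothetical sample data mixing 1-, 2-, 3- and 4-octet characters.
text = "Привет, wörld! Привет!"
needle = "Привет"
encoded_text = text.encode("utf-8")
encoded_needle = needle.encode("utf-8")

# Search: octet-level hits map one-to-one onto code-point-level hits.
code_point_hits = find_all(text, needle)
octet_hits = find_all(encoded_text, encoded_needle)
mapped_hits = [octet_to_code_point_index(encoded_text, i) for i in octet_hits]

# Slicing: bounds obtained from an octet-level search never split a character,
# so the slice is itself valid UTF-8.
first = octet_hits[0]
slice_text = encoded_text[first:first + len(encoded_needle)].decode("utf-8")

# Comparison: sorting the encoded octets gives the same order as sorting
# by code point.
words = ["zebra", "Ärger", "apple", "日本", "éclair", "😀"]
by_code_point = sorted(words)
by_octet = [w.decode("utf-8") for w in sorted(w.encode("utf-8") for w in words)]
same_order = by_code_point == by_octet

print(code_point_hits, octet_hits, mapped_hits, slice_text, same_order)
```

The indices themselves differ between the two views, so the correspondence only holds as long as octet indices come from operations on the encoded string itself (search results, iteration) rather than from arbitrary arithmetic.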