From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
	ip-172-31-65-14.ec2.internal
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.6
Path: eternal-september.org!reader02.eternal-september.org!aioe.org!hzzNxxMX5IPvnEV4b74Cww.user.46.165.242.91.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Fri, 8 Apr 2022 21:45:18 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t2q3cb$bbt$1@gioia.aioe.org>
References: <607b5b20$0$27442$426a74cc@news.free.fr>
 <86mttuk5f0.fsf@stephe-leake.org> <s5jr59$1tkq$1@gioia.aioe.org>
 <s5juep$1lbu$1@gioia.aioe.org> <s5jute$1s08$1@gioia.aioe.org>
 <s5k0ai$bb5$1@dont-email.me>
 <fantome.forums.tDeContes-E8EAB8.20043603042022@news.free.fr>
 <t2knpr$s26$1@dont-email.me> <t2lesj$d2f$1@dont-email.me>
 <lysfqolzrg.fsf@pushface.org> <t2ov3c$10au$1@gioia.aioe.org>
 <lyfsmn2xjn.fsf@pushface.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="11645"; posting-host="hzzNxxMX5IPvnEV4b74Cww.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.8.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
Xref: reader02.eternal-september.org comp.lang.ada:63718
List-Id: <comp.lang.ada>

On 2022-04-08 21:19, Simon Wright wrote:
> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> On 2022-04-08 10:56, Simon Wright wrote:
>>> "Randy Brukardt" <randy@rrsoftware.com> writes:
>>>
>>>> If you had an Ada-like language that used a universal UTF-8 string
>>>> internally, you then would have a lot of old and mostly useless
>>>> operations supported for array types (since things like slices are
>>>> mainly useful for string operations).
>>>
>>> Just off the top of my head, wouldn't it be better to use
>>> UTF32-encoded Wide_Wide_Character internally?
>>
>> Yep, that is exactly the problem, a confusion between interface
>> and implementation.
> 
> Don't understand. My point was that *when you are implementing this* it
> might be easier to deal with 32-bit characters/code points/whatever the
> proper jargon is than with UTF8.

I think it would be more difficult, because you would have to convert 
from and to UTF-8, either under the hood or explicitly. UTF-8 is the 
de facto standard for interfaces and I/O. That would be 60-70% of all 
cases where you need a string. Most string operations, such as search, 
comparison, and slicing, are isomorphic between code points and octets. 
So you would gain nothing by keeping strings internally as arrays of 
code points.
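
To illustrate the isomorphism claim, here is a minimal sketch in Python 
(not Ada, and not anyone's actual implementation). The function names 
and the sample text are made up for the illustration. It assumes 
well-formed UTF-8 on both sides.

```python
def find_by_code_points(text: str, pattern: str) -> int:
    """Search the decoded string; return a code point index or -1."""
    return text.find(pattern)


def find_by_octets(text: str, pattern: str) -> int:
    """Search the raw UTF-8 octets; return a code point index or -1."""
    octets = text.encode("utf-8")
    at = octets.find(pattern.encode("utf-8"))
    if at < 0:
        return -1
    # UTF-8 is self-synchronizing: a match of a well-formed pattern
    # starts on a character boundary, so the prefix decodes cleanly and
    # its length is the code point index of the match.
    return len(octets[:at].decode("utf-8"))


def same_order(a: str, b: str) -> bool:
    """True if octet-wise and code-point-wise '<' agree for a and b."""
    return (a < b) == (a.encode("utf-8") < b.encode("utf-8"))


# Hypothetical sample data mixing 1-, 2- and 3-octet characters.
text = "Ада and Unicode: naïve café 日本語"
for pattern in ["Unicode", "ï", "é 日", "本語", "missing"]:
    assert find_by_code_points(text, pattern) == find_by_octets(text, pattern)

words = ["zebra", "été", "日本", "Ада", "abc"]
assert sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))
```

Both searches find the same match, and sorting by octets gives the same 
order as sorting by code points, so the decoded form buys nothing for 
these operations. The only extra work in the octet version is 
translating the offset back to a code point index, and that is needed 
only if the caller insists on such an index.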

The situation is comparable to Unbounded_String. The implementation is 
relatively simple, but the user must carry the burden of calling 
To_String and To_Unbounded_String all over the application, and the 
processor must suffer the overhead of copying arrays here and there.

>> Encoding /= interface, e.g. an interface of a string viewed as an
>> array of characters. That interface is just the same for ASCII, Latin-1,
>> EBCDIC, RADIX50, UTF-8 etc. strings. Why do you care what is inside?
> 
> With a user's hat on, I don't. Implementers might have a different point
> of view.

Sure, but in the Ada philosophy their opinion should carry less weight 
than, say, in C.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de