comp.lang.ada
 help / color / mirror / Atom feed
From: "Luke A. Guest" <laguest@archeia.com>
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 12:56:34 +0100	[thread overview]
Message-ID: <s5jr59$1tkq$1@gioia.aioe.org> (raw)
In-Reply-To: 86mttuk5f0.fsf@stephe-leake.org

On 19/04/2021 10:08, Stephen Leake wrote:
>> What's the way to manage Unicode correctly ?
> 
> There are two issues: Unicode in source code, that the compiler must
> understand, and Unicode in strings, that your program must understand.

And this is there the Ada standard gets it wrong, in the encodings 
package re utf-8.

Unicode is a superset of 7-bit ASCII not Latin 1. The high bit in the 
leading octet indicates whether there are trailing octets. See 
https://github.com/Lucretia/uca/blob/master/src/uca.ads#L70 for the data 
layout. The first 128 "characters" in Unicode match that of 7-bit ASCII, 
not 8-bit ASCII, and certainly not Latin 1. Therefore this:

package Ada.Strings.UTF_Encoding
    ...
    subtype UTF_8_String is String;
    ...
end Ada.Strings.UTF_Encoding;

Was absolutely and totally wrong.

  parent reply	other threads:[~2021-04-19 11:56 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-17 22:03 Ada and Unicode DrPi
2021-04-18  0:02 ` Luke A. Guest
2021-04-19  9:09   ` DrPi
2021-04-19  8:29 ` Maxim Reznik
2021-04-19  9:28   ` DrPi
2021-04-19 13:50     ` Maxim Reznik
2021-04-19 15:51       ` DrPi
2021-04-19 11:15   ` Simon Wright
2021-04-19 11:50     ` Luke A. Guest
2021-04-19 15:53     ` DrPi
2022-04-03 19:20     ` Thomas
2022-04-04  6:10       ` Vadim Godunko
2022-04-04 14:19         ` Simon Wright
2022-04-04 15:11           ` Simon Wright
2022-04-05  7:59           ` Vadim Godunko
2022-04-08  9:01             ` Simon Wright
2023-03-30 23:35         ` Thomas
2022-04-04 14:33       ` Simon Wright
2021-04-19  9:08 ` Stephen Leake
2021-04-19  9:34   ` Dmitry A. Kazakov
2021-04-19 11:56   ` Luke A. Guest [this message]
2021-04-19 12:13     ` Luke A. Guest
2021-04-19 15:48       ` DrPi
2021-04-19 12:52     ` Dmitry A. Kazakov
2021-04-19 13:00       ` Luke A. Guest
2021-04-19 13:10         ` Dmitry A. Kazakov
2021-04-19 13:15           ` Luke A. Guest
2021-04-19 13:31             ` Dmitry A. Kazakov
2022-04-03 17:24               ` Thomas
2021-04-19 13:24         ` J-P. Rosen
2021-04-20 19:13           ` Randy Brukardt
2022-04-03 18:04           ` Thomas
2022-04-06 18:57             ` J-P. Rosen
2022-04-07  1:30               ` Randy Brukardt
2022-04-08  8:56                 ` Simon Wright
2022-04-08  9:26                   ` Dmitry A. Kazakov
2022-04-08 19:19                     ` Simon Wright
2022-04-08 19:45                       ` Dmitry A. Kazakov
2022-04-09  4:05                         ` Randy Brukardt
2022-04-09  7:43                           ` Simon Wright
2022-04-09 10:27                           ` DrPi
2022-04-09 16:46                             ` Dennis Lee Bieber
2022-04-09 18:59                               ` DrPi
2022-04-10  5:58                             ` Vadim Godunko
2022-04-10 18:59                               ` DrPi
2022-04-12  6:13                               ` Randy Brukardt
2021-04-19 16:07         ` DrPi
2021-04-20 19:06         ` Randy Brukardt
2022-04-03 18:37           ` Thomas
2022-04-04 23:52             ` Randy Brukardt
2023-03-31  3:06               ` Thomas
2023-04-01 10:18                 ` Randy Brukardt
2021-04-19 16:14   ` DrPi
2021-04-19 17:12     ` Björn Lundin
2021-04-19 19:44       ` DrPi
2022-04-16  2:32   ` Thomas
2021-04-19 13:18 ` Vadim Godunko
2022-04-03 16:51   ` Thomas
2023-04-04  0:02     ` Thomas
2021-04-19 22:40 ` Shark8
2021-04-20 15:05   ` Simon Wright
2021-04-20 19:17     ` Randy Brukardt
2021-04-20 20:04       ` Simon Wright
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox