From: Thomas
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
References: <607b5b20$0$27442$426a74cc@news.free.fr>
Date: Tue, 04 Apr 2023 02:02:03 +0200
Message-ID: <642b68fb$0$3206$426a34cc@news.free.fr>

Thomas wrote:

> Vadim Godunko wrote:
>
> > On Sunday, April 18, 2021 at 1:03:14 AM UTC+3, DrPi wrote:
> > > What's the way to manage Unicode correctly ?
> > Ada doesn't have good Unicode support. :( So, you need to find suitable set
> > of "workarounds".
> >
> > There are few different aspects of Unicode support need to be considered:
> >
> > 1. Representation of string literals. If you want to use non-ASCII
> > characters in source code, you need to use -gnatW8 switch and it will
> > require use of Wide_Wide_String everywhere.
> > 2. Internal representation during application execution. You are forced to
> > use Wide_Wide_String at previous step, so it will be UCS4/UTF32.
>
> > It is hard to say that it is reasonable set of features for modern world.
>
> I don't think Ada would be lacking that much, for having good UTF-8
> support.
>
> the cardinal point is to be able to fill a
> Ada.Strings.UTF_Encoding.UTF_8_String with a litteral.
> (once you got it, when you'll try to fill a Standard.String with a
> non-Latin-1 character, it'll make an error, i think it's fine :-) )
>
> does Ada 202x allow it ?

Hi!

I think I found quite a nice solution (after reading the thread again; not
tested yet).

It's not perfectly by the book, but it is:
- Ada 2012 compatible
- better than writing UTF-8 Ada source and then telling GNAT it is Latin-1
  (that way GNAT would take UTF_8_String for what it is, an array of octets,
  but it would not detect an invalid UTF-8 string, and everything goes wrong
  as soon as someone tells it the source really is UTF-8)
- better than being limited to ASCII in string literals
- free of explicit Wide_Wide_String declarations: the Wide_Wide_String is
  always implicit, lives for a very short time, and AFAIK is eligible for
  optimization

with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

package UTF_Encoding is
   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  An expression function rather than a renaming: Encode also takes an
   --  Output_BOM parameter, so a one-parameter profile would not conform.
   function "+" (A : in Wide_Wide_String) return UTF_8_String is
     (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (A));
end UTF_Encoding;

Then we can do:

with UTF_Encoding;

package User is
   use UTF_Encoding;
   My_String : UTF_8_String := + "Greek characters + smileys";
end User;

If you want to avoid "use UTF_Encoding;", I think
"use type UTF_Encoding.UTF_8_String;" doesn't work, because UTF_8_String is
only a subtype of String and "+" is not one of String's primitive operators.
But this should work:

with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

package UTF_Encoding is
   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  A distinct type, so that "use type" makes its "+" directly visible.
   type Literals_For_UTF_8_String is new Wide_Wide_String;

   function "+" (A : in Literals_For_UTF_8_String) return UTF_8_String is
     (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
        (Wide_Wide_String (A)));
end UTF_Encoding;

with UTF_Encoding;

package User is
   use type UTF_Encoding.Literals_For_UTF_8_String;
   My_String : UTF_Encoding.UTF_8_String := + "Greek characters + smileys";
end User;

What do you think about that? Good idea or not? :-)

-- 
RAPID maintainer
http://savannah.nongnu.org/projects/rapid/