From: Vadim Godunko
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 06:18:22 -0700 (PDT)

On Sunday, April 18, 2021 at 1:03:14 AM UTC+3, DrPi wrote:
>
> I have a good knowledge of Unicode: code points, encoding...
> What I don't understand is how to manage Unicode strings with Ada. I've
> read part of the ARM and did some tests without success.
>
> I managed to be partly successful with source code encoded in Latin-1.
> Any other encoding failed.
> Any way to use source code encoded in UTF-8?
> In some languages, it is possible to set a tag at the beginning of the
> source file to direct the compiler which encoding to use.
> I wasn't successful using -gnatW8 switch.
> But maybe I made too many tests
> and my brain was scrambled.
>
> Even with source code encoded in Latin-1, I've not been able to manage
> Unicode strings correctly.
>
> What's the way to manage Unicode correctly?
>

Ada doesn't have good Unicode support. :( So you need to find a suitable
set of "workarounds".

There are a few different aspects of Unicode support that need to be
considered:

1. Representation of string literals. If you want to use non-ASCII
characters in source code, you need to use the -gnatW8 switch, and it will
require the use of Wide_Wide_String everywhere.

2. Internal representation during application execution. You are forced
to use Wide_Wide_String at the previous step, so it will be UCS-4/UTF-32.

3. Text encoding/decoding on input/output operations. GNAT allows you to
use UTF-8 by providing a special string for the Form parameter of the
Text_IO packages (for example, "WCEM=8" for Wide_Wide_Text_IO).

It is hard to call this a reasonable set of features for the modern world.
To fix some of the drawbacks of the current situation, we are developing a
new text processing library, known as VSS.

https://github.com/AdaCore/VSS

At the current stage it provides an encoding-independent API for text
manipulation, encoder and decoder APIs for I/O, and a JSON reader/writer;
regexp support should come soon.

An encoding-independent API means that the application always uses Unicode
characters to process text, independently of the real encoding used to
store information in memory (UTF-8 is used for now; UTF-16 will be added
later for interoperability with the Windows API and WASM). Decoders and
encoders allow translation from/to different encodings when the
application exchanges information with the world.
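For illustration, here is a minimal sketch of the three "workaround" aspects in plain GNAT Ada, without VSS. It assumes GNAT with the -gnatW8 switch and uses the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings for explicit conversion. The procedure name Unicode_Demo and the output file name greeting_utf8.txt are hypothetical. It also assumes GNAT's documented default that standard files use the -gnatW encoding when no WCEM is given.

```ada
--  Hypothetical file: unicode_demo.adb
--  Build (assumption): gnatmake -gnatW8 unicode_demo.adb
--  -gnatW8 tells GNAT to read this source file as UTF-8.
with Ada.Wide_Wide_Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Unicode_Demo is
   package WWIO renames Ada.Wide_Wide_Text_IO;
   package UTF  renames Ada.Strings.UTF_Encoding;
   package WWS  renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  1. Non-ASCII literal: requires -gnatW8 and Wide_Wide_String.
   Greeting : constant Wide_Wide_String := "Привет, мир";

   --  2. Internally each element is one code point (UCS-4/UTF-32),
   --     so 'Length counts code points, not bytes.
   Code_Points : constant Natural := Greeting'Length;

   --  Explicit conversion at an I/O boundary using the standard
   --  Ada 2012 package (this is not the VSS API).
   Bytes : constant UTF.UTF_8_String := WWS.Encode (Greeting);
   Back  : constant Wide_Wide_String := WWS.Decode (Bytes);

   F : WWIO.File_Type;
begin
   --  Standard output: GNAT uses the -gnatW encoding (UTF-8 here)
   --  when no WCEM is given (assumption based on the GNAT RM).
   WWIO.Put_Line (Greeting);
   WWIO.Put_Line ("Code points:" & Natural'Wide_Wide_Image (Code_Points));
   WWIO.Put_Line ("UTF-8 bytes:" & Natural'Wide_Wide_Image (Bytes'Length));
   WWIO.Put_Line ("Round trip: " & Boolean'Wide_Wide_Image (Back = Greeting));

   --  3. File output with UTF-8 chosen explicitly via GNAT's Form string.
   --     "greeting_utf8.txt" is a hypothetical file name.
   WWIO.Create (F, WWIO.Out_File, "greeting_utf8.txt", Form => "WCEM=8");
   WWIO.Put_Line (F, Greeting);
   WWIO.Close (F);
end Unicode_Demo;
```

Built with -gnatW8, this should report 11 code points but 20 UTF-8 bytes (nine two-byte Cyrillic letters plus a comma and a space). That difference is the cost of point 2: every character occupies a full Wide_Wide_Character in memory, and encoding only happens at the I/O boundary.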