From mboxrd@z Thu Jan  1 00:00:00 1970
X-Received: by 2002:ac8:4d03:: with SMTP id w3mr6355715qtv.222.1618838302951;
        Mon, 19 Apr 2021 06:18:22 -0700 (PDT)
X-Received: by 2002:a25:d6d2:: with SMTP id n201mr616469ybg.504.1618838302610;
 Mon, 19 Apr 2021 06:18:22 -0700 (PDT)
Path: eternal-september.org!reader02.eternal-september.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 19 Apr 2021 06:18:22 -0700 (PDT)
In-Reply-To: <607b5b20$0$27442$426a74cc@news.free.fr>
Injection-Info: google-groups.googlegroups.com; posting-host=77.75.10.58; posting-account=niG3UgoAAAD7iQ3takWjEn_gw6D9X3ww
NNTP-Posting-Host: 77.75.10.58
References: <607b5b20$0$27442$426a74cc@news.free.fr>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f9d91cb0-c9bb-4d42-a1a9-0cd546da436cn@googlegroups.com>
Subject: Re: Ada and Unicode
From: Vadim Godunko <vgodunko@gmail.com>
Injection-Date: Mon, 19 Apr 2021 13:18:22 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: reader02.eternal-september.org comp.lang.ada:61844
List-Id: <comp.lang.ada>

On Sunday, April 18, 2021 at 1:03:14 AM UTC+3, DrPi wrote:
>
> I have a good knowledge of Unicode : code points, encoding...
> What I don't understand is how to manage Unicode strings with Ada. I've
> read part of ARM and did some tests without success.
>
> I managed to be partly successful with source code encoded in Latin-1.
> Any other encoding failed.
> Any way to use source code encoded in UTF-8 ?
> In some languages, it is possible to set a tag at the beginning of the
> source file to direct the compiler which encoding to use.
> I wasn't successful using -gnatW8 switch. But maybe I made to many tests
> and my brain was scrambled.
>
> Even with source code encoded in Latin-1, I've not been able to manage
> Unicode strings correctly.
>
> What's the way to manage Unicode correctly ?
>

Ada doesn't have good Unicode support. :( So you need to find a suitable set of "workarounds".

There are a few different aspects of Unicode support that need to be considered:

1. Representation of string literals. If you want to use non-ASCII characters in source code, you need to use the -gnatW8 switch, and that will require the use of Wide_Wide_String everywhere.
2. Internal representation during application execution. You are forced to use Wide_Wide_String by the previous step, so it will be UCS-4/UTF-32.
3. Text encoding/decoding on input/output operations. GNAT allows the use of UTF-8 if you provide a special "magic" string in the Form parameter of Text_IO.
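
To make the three points concrete, here is a minimal sketch, not a definitive recipe. It assumes GNAT, a source file saved as UTF-8, and compilation with -gnatW8. The file name greeting.txt and the procedure name are made up for illustration. The Form value "WCEM=8" is my understanding of the GNAT-specific string for selecting UTF-8; please check it against the GNAT Reference Manual for your compiler version.

```ada
--  Sketch only. Assumes: GNAT, this file saved as UTF-8, built with
--    gnatmake -gnatW8 unicode_demo.adb
with Ada.Wide_Wide_Text_IO;

procedure Unicode_Demo is
   package WWIO renames Ada.Wide_Wide_Text_IO;

   --  Point 1: a non-ASCII literal in the source, typed as
   --  Wide_Wide_String.
   --  Point 2: in memory each element is one code point (UCS-4/UTF-32).
   Greeting : constant Wide_Wide_String := "Привет, мир";

   File : WWIO.File_Type;
begin
   --  Point 3: ask GNAT to encode this file as UTF-8 through the Form
   --  parameter ("WCEM=8" is GNAT-specific, not standard Ada).
   WWIO.Create
     (File => File,
      Mode => WWIO.Out_File,
      Name => "greeting.txt",   --  hypothetical file name
      Form => "WCEM=8");
   WWIO.Put_Line (File, Greeting);
   WWIO.Close (File);
end Unicode_Demo;
```

Other compilers will need different switches and Form strings, which is part of why this counts as a workaround rather than real support.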

It is hard to call this a reasonable set of features for the modern world. To fix some of the drawbacks of the current situation, we are developing a new text processing library, known as VSS.

https://github.com/AdaCore/VSS

At the current stage it provides an encoding-independent API for text manipulation, an encoder and decoder API for I/O, and a JSON reader/writer; regexp support should come soon.

An encoding-independent API means that the application always uses Unicode characters to process text, independently of the actual encoding used to store the information in memory (UTF-8 is used for now; UTF-16 will be added later for interoperability with the Windows API and WASM). Decoders and encoders allow translation from/to different encodings when the application exchanges information with the outside world.
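
The "decode at the boundary, process characters inside" idea can be illustrated without VSS. The sketch below deliberately uses only the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings; it is not the VSS API, and the procedure name and sample data are made up for illustration.

```ada
--  Sketch only: standard Ada 2012 library, not VSS.
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Boundary_Demo is
   use Ada.Strings.UTF_Encoding;

   --  Sample input as it might arrive from outside: the UTF-8 bytes
   --  for U+00E9 (two bytes) followed by "!" (one byte).
   Raw : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#) & "!";

   --  Decode once on input; from here on the program works with
   --  Unicode code points, not bytes.
   Text : constant Wide_Wide_String := Wide_Wide_Strings.Decode (Raw);

   --  Encode once on output.
   Encoded : constant UTF_8_String := Wide_Wide_Strings.Encode (Text);
begin
   Ada.Text_IO.Put_Line ("bytes:" & Natural'Image (Raw'Length));
   Ada.Text_IO.Put_Line ("characters:" & Natural'Image (Text'Length));

   --  The round trip should give back the original bytes.
   pragma Assert (Encoded = Raw);
end Boundary_Demo;
```

The difference from an encoding-independent API is that here the in-memory form is fixed to UTF-32, so the application code still depends on the representation.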