From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.5-pre1 (2020-06-20) on
	ip-172-31-74-118.ec2.internal
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.5-pre1
Path: eternal-september.org!reader02.eternal-september.org!news.gegeweb.eu!gegeweb.org!usenet-fr.net!proxad.net!feeder1-2.proxad.net!cleanfeed1-a.proxad.net!nnrp1-1.free.fr!not-for-mail
Newsgroups: comp.lang.ada
References: <607b5b20$0$27442$426a74cc@news.free.fr>
 <660e25a5-506b-43c0-b4ac-e7738e5500e5n@googlegroups.com>
From: DrPi <314@drpi.fr>
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 11:28:34 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.9.1
MIME-Version: 1.0
In-Reply-To: <660e25a5-506b-43c0-b4ac-e7738e5500e5n@googlegroups.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Message-ID: <607d4d46$0$3684$426a34cc@news.free.fr>
Organization: Guest of ProXad - France
NNTP-Posting-Date: 19 Apr 2021 11:28:38 CEST
NNTP-Posting-Host: 82.65.30.55
X-Trace: 1618824518 news-4.free.fr 3684 82.65.30.55:52367
X-Complaints-To: abuse@proxad.net
Xref: reader02.eternal-september.org comp.lang.ada:61833
List-Id: <comp.lang.ada>

On 19/04/2021 at 10:29, Maxim Reznik wrote:
> Sunday, 18 April 2021 at 01:03:14 UTC+3, DrPi:
>>
>> Any way to use source code encoded in UTF-8 ?
> 
> Yes, with GNAT just use "-gnatW8" as a compiler flag (on the command line or in your project file):
> 
> --  main.adb:
> with Ada.Wide_Wide_Text_IO;
> 
> procedure Main is
>     Привет : constant Wide_Wide_String := "Привет";
> begin
>     Ada.Wide_Wide_Text_IO.Put_Line (Привет);
> end Main;
> 
> $ gprbuild -gnatW8 main.adb
> $ ./main
> Привет
> 
> 
>> In some languages, it is possible to set a tag at the beginning of the
>> source file to direct the compiler which encoding to use.
> 
> You can do this by putting the Wide_Character_Encoding pragma (a GNAT-specific pragma) at the top of the file. Take a look:
> 
> --  main.adb:
> pragma Wide_Character_Encoding (UTF8);
> 
> with Ada.Wide_Wide_Text_IO;
> 
> procedure Main is
>     Привет : constant Wide_Wide_String := "Привет";
> begin
>     Ada.Wide_Wide_Text_IO.Put_Line (Привет);
> end Main;
> 
> $ gprbuild main.adb
> $ ./main
> Привет
> 
Wide/Wide_Wide characters and UTF-8 are two distinct things.
Wide_Character and Wide_Wide_Character values are supposed to hold
Unicode code points (Unicode characters).
UTF-8 is a stream of bytes: one possible encoding of those code points.
So what is the purpose of "pragma Wide_Character_Encoding (UTF8);"?
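
To make the distinction concrete, here is a minimal sketch of my own (not
taken from the example above). It assumes an Ada 2012 compiler, since it
uses the standard package Ada.Strings.UTF_Encoding.Wide_Wide_Strings. The
procedure name is just something I made up. The same six code points end
up as twelve bytes once encoded:

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Code_Points_Vs_Bytes is
   use Ada.Strings.UTF_Encoding;

   --  The six Cyrillic letters of the word used in the example above,
   --  given by code point number so that this source file is pure ASCII
   --  and compiles without -gnatW8 or any encoding pragma.
   Hello : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#041F#),
      Wide_Wide_Character'Val (16#0440#),
      Wide_Wide_Character'Val (16#0438#),
      Wide_Wide_Character'Val (16#0432#),
      Wide_Wide_Character'Val (16#0435#),
      Wide_Wide_Character'Val (16#0442#));

   --  UTF_8_String is a subtype of String: each element is one byte
   --  of the encoded form, not one character.
   Bytes : constant UTF_8_String := Wide_Wide_Strings.Encode (Hello);
begin
   --  Prints "Code points: 6"
   Ada.Text_IO.Put_Line ("Code points:" & Integer'Image (Hello'Length));
   --  Prints "UTF-8 bytes: 12" (each of these letters takes two bytes)
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (Bytes'Length));
end Code_Points_Vs_Bytes;
```

So the Wide_Wide_String is the sequence of characters, and the UTF-8
String is only one possible byte representation of it.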

> 
> 
>> What's the way to manage Unicode correctly ?
>>
> 
> You can use the Wide_Wide_String and Unbounded_Wide_Wide_String types to process Unicode strings. But this is not very handy. I use the Matreshka library for Unicode strings. It has a lot of features (regexp, string vectors, XML, JSON, databases, Web Servlets, template engine, etc.). URL: https://forge.ada-ru.org/matreshka

Thanks
> 
>> Regards,
>> Nicolas