From: DrPi <314@drpi.fr>
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 11:28:34 +0200
Message-ID: <607d4d46$0$3684$426a34cc@news.free.fr>
In-Reply-To: <660e25a5-506b-43c0-b4ac-e7738e5500e5n@googlegroups.com>
References: <607b5b20$0$27442$426a74cc@news.free.fr> <660e25a5-506b-43c0-b4ac-e7738e5500e5n@googlegroups.com>

On 19/04/2021 at 10:29, Maxim Reznik wrote:
> Sunday, 18 April 2021 at 01:03:14 UTC+3, DrPi:
>>
>> Any way to use source code encoded in UTF-8 ?
>
> Yes, with GNAT just use "-gnatW8" as a compiler flag (on the command line or in your project file):
>
> -- main.adb:
> with Ada.Wide_Wide_Text_IO;
>
> procedure Main is
>    Привет : constant Wide_Wide_String := "Привет";
> begin
>    Ada.Wide_Wide_Text_IO.Put_Line (Привет);
> end Main;
>
> $ gprbuild -gnatW8 main.adb
> $ ./main
> Привет
>
>
>> In some languages, it is possible to set a tag at the beginning of the
>> source file to direct the compiler which encoding to use.
>
> You can do this by putting the Wide_Character_Encoding pragma (this is a GNAT-specific pragma) at the top of the file. Take a look:
>
> -- main.adb:
> pragma Wide_Character_Encoding (UTF8);
>
> with Ada.Wide_Wide_Text_IO;
>
> procedure Main is
>    Привет : constant Wide_Wide_String := "Привет";
> begin
>    Ada.Wide_Wide_Text_IO.Put_Line (Привет);
> end Main;
>
> $ gprbuild main.adb
> $ ./main
> Привет
>

Wide and Wide_Wide characters and UTF-8 are two distinct things.
Wide and Wide_Wide characters are supposed to hold Unicode code points (Unicode characters).
UTF-8 is a stream of bytes: an encoding of Wide or Wide_Wide characters.
So what is the purpose of "pragma Wide_Character_Encoding (UTF8);"?

>
>
>> What's the way to manage Unicode correctly ?
>>
>
> You can use the Wide_Wide_String and Unbounded_Wide_Wide_String types to process Unicode strings. But this is not very handy. I use the Matreshka library for Unicode strings. It has a lot of features (regexp, string vectors, XML, JSON, databases, Web Servlets, template engine, etc.). URL: https://forge.ada-ru.org/matreshka

Thanks

>
>> Regards,
>> Nicolas