From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.5-pre1 (2020-06-20) on
	ip-172-31-74-118.ec2.internal
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.5-pre1
X-Received: by 2002:a37:441:: with SMTP id 62mr10127772qke.366.1618872036225;
        Mon, 19 Apr 2021 15:40:36 -0700 (PDT)
X-Received: by 2002:a5b:448:: with SMTP id s8mr19839278ybp.363.1618872036022;
 Mon, 19 Apr 2021 15:40:36 -0700 (PDT)
Path: eternal-september.org!reader02.eternal-september.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 19 Apr 2021 15:40:35 -0700 (PDT)
In-Reply-To: <607b5b20$0$27442$426a74cc@news.free.fr>
Injection-Info: google-groups.googlegroups.com; posting-host=146.5.2.231; posting-account=lJ3JNwoAAAAQfH3VV9vttJLkThaxtTfC
NNTP-Posting-Host: 146.5.2.231
References: <607b5b20$0$27442$426a74cc@news.free.fr>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <71f7d65f-ba29-46f1-a187-4c8347a1db03n@googlegroups.com>
Subject: Re: Ada and Unicode
From: Shark8 <onewingedshark@gmail.com>
Injection-Date: Mon, 19 Apr 2021 22:40:36 +0000
Content-Type: text/plain; charset="UTF-8"
Xref: reader02.eternal-september.org comp.lang.ada:61855
List-Id: <comp.lang.ada>

On Saturday, April 17, 2021 at 4:03:14 PM UTC-6, DrPi wrote:
> Hi, 
> 
> I have a good knowledge of Unicode : code points, encoding... 
> What I don't understand is how to manage Unicode strings with Ada. I've 
> read part of ARM and did some tests without success. 
> 
> I managed to be partly successful with source code encoded in Latin-1. 
Ah.
Yes, this is an issue in GNAT, and possibly in other compilers.
The easiest method for me is to right-click the file's text buffer in GPS, click Properties in the menu that pops up, then in the dialog pick "Unicode UTF-#" from the Character Set drop-down.
> Any other encoding failed. 
> Any way to use source code encoded in UTF-8 ? 
There's the above method with GPS.
IIRC there's also a pragma (Wide_Character_Encoding) and a compiler flag (-gnatW8) for GNAT.
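
For the pragma route, here is a minimal sketch, assuming GNAT and a source file that really is saved as UTF-8. The unit name Unicode_Demo and the literal are made up for illustration; the pragma itself is GNAT-specific, not standard Ada.

```ada
--  GNAT-specific: tells the compiler that the source text that follows
--  is encoded in UTF-8. Passing -gnatW8 on the command line is the
--  flag equivalent for the whole compilation.
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Wide_Text_IO;

procedure Unicode_Demo is
   --  Hypothetical demo unit. With the pragma in effect, non-ASCII
   --  characters in the literal are read as single code points.
   Greeting : constant Wide_Wide_String := "Grüße, κόσμε";
begin
   --  How this appears on the terminal depends on the output encoding,
   --  which is a separate setting from the source encoding.
   Ada.Wide_Wide_Text_IO.Put_Line (Greeting);
end Unicode_Demo;
```

The point is only that the editor's save encoding and the pragma (or flag) have to agree; if they don't, the literal gets misread.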

It's actually a non-issue for Byron, because the file reader does a BOM check [IIRC defaulting to ASCII in the absence of a BOM] and hands the lexer the Wide_Wide_Character equivalent of the input, whatever its encoding.
See: https://github.com/OneWingedShark/Byron/blob/master/src/reader/readington.adb
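
To give the idea of a BOM check, here is a rough sketch. This is NOT the code from readington.adb; the names BOM_Check_Sketch, Encoding, Detect and Starts_With are invented for the example, and it only covers the UTF-8 and UTF-16 marks.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure BOM_Check_Sketch is

   --  No_BOM is where a reader would fall back to its default
   --  (ASCII, in the behaviour described above).
   type Encoding is (No_BOM, UTF_8, UTF_16_BE, UTF_16_LE);

   --  Classify a file by its first few bytes.
   function Detect (Prefix : Stream_Element_Array) return Encoding is

      --  True when Prefix begins with the bytes in Mark.
      function Starts_With (Mark : Stream_Element_Array) return Boolean is
        (Prefix'Length >= Mark'Length
         and then Prefix (Prefix'First .. Prefix'First + Mark'Length - 1)
                    = Mark);

   begin
      if Starts_With ((16#EF#, 16#BB#, 16#BF#)) then
         return UTF_8;
      elsif Starts_With ((16#FE#, 16#FF#)) then
         return UTF_16_BE;
      elsif Starts_With ((16#FF#, 16#FE#)) then
         --  Simplification: the UTF-32 LE mark starts with the same
         --  two bytes and is not distinguished here.
         return UTF_16_LE;
      else
         return No_BOM;
      end if;
   end Detect;

begin
   --  A UTF-8 BOM followed by the letter 'A'.
   Ada.Text_IO.Put_Line
     (Encoding'Image (Detect ((16#EF#, 16#BB#, 16#BF#, 16#41#))));
end BOM_Check_Sketch;
```

Once the encoding is known, the reader can decode everything after the mark into Wide_Wide_Character, so the lexer never has to care what the file looked like on disk.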

> In some languages, it is possible to set a tag at the beginning of the 
> source file to direct the compiler which encoding to use. 
> I wasn't successful using -gnatW8 switch. But maybe I made too many tests 
> and my brain was scrambled.
IIRC the -gnatW8 flag sets the source encoding to UTF-8, so if your editor is saving in something else, like UTF-16 BE, the compiler [probably] won't read it correctly.

> Even with source code encoded in Latin-1, I've not been able to manage 
> Unicode strings correctly. 
> 
> What's the way to manage Unicode correctly ? 
I typically use the GPS file/properties method above, and then I might also use the pragma.