From: Shark8
Newsgroups: comp.lang.ada
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 15:40:35 -0700 (PDT)
Message-ID: <71f7d65f-ba29-46f1-a187-4c8347a1db03n@googlegroups.com>
In-Reply-To: <607b5b20$0$27442$426a74cc@news.free.fr>

On Saturday, April 17, 2021 at 4:03:14 PM UTC-6, DrPi wrote:
> Hi,
>
> I have a good knowledge of Unicode : code points, encoding...
> What I don't understand is how to manage Unicode strings with Ada. I've
> read part of ARM and did some tests without success.
>
> I managed to be partly successful with source code encoded in Latin-1.

Ah. Yes, this is an issue in GNAT, and possibly other compilers.
The easiest method for me is to right-click the text buffer for the file in GPS, click Properties in the menu that pops up, then in the dialog select "Unicode UTF-#" from the Character Set drop-down.

> Any other encoding failed.
> Any way to use source code encoded in UTF-8 ?

There's the above method with GPS. IIRC there's also a pragma and a compiler flag for GNAT.

It's actually a non-issue for Byron, because the file reader does a BOM check [IIRC defaulting to ASCII in the absence of a BOM] and outputs to the lexer the Wide_Wide_Character equivalent of the input encoding.
See: https://github.com/OneWingedShark/Byron/blob/master/src/reader/readington.adb

> In some languages, it is possible to set a tag at the beginning of the
> source file to direct the compiler which encoding to use.
> I wasn't successful using -gnatW8 switch. But maybe I made too many tests
> and my brain was scrambled.

IIRC the -gnatW8 flag sets the source encoding to UTF-8, so if your editor is saving in something else, like UTF-16 BE, the compiler [probably] won't read it correctly.

> Even with source code encoded in Latin-1, I've not been able to manage
> Unicode strings correctly.
>
> What's the way to manage Unicode correctly ?

I typically use the GPS file/properties method above, and then I might also use the pragma.
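
For illustration, here is a minimal sketch of the pragma-plus-flag route under GNAT. The assumptions: the file is really saved as UTF-8, the pragma meant is GNAT's Wide_Character_Encoding, and the unit name Unicode_Demo and the sample string are made up for the example. Treat it as a starting point rather than the definitive recipe; I have not checked it against other compilers.

```ada
--  Assumption: this file is saved as UTF-8 under the name unicode_demo.adb.
--  GNAT-specific pragma: tells the compiler that non-ASCII characters in
--  the source text that follows are UTF-8 encoded.
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Wide_Text_IO;

procedure Unicode_Demo is
   --  Hypothetical sample text; any non-ASCII literal would do.
   --  Wide_Wide_String holds one full Unicode code point per element.
   Greeting : constant Wide_Wide_String := "Ωμέγα";
begin
   Ada.Wide_Wide_Text_IO.Put_Line (Greeting);
end Unicode_Demo;

--  One possible build command (the switch alone also selects UTF-8
--  for the source, so the pragma is then redundant but harmless):
--     gnatmake -gnatW8 unicode_demo.adb
```

If the editor saved the file as something other than UTF-8, expect either compile errors on the literal or garbage in the string, which matches the symptoms described above.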