comp.lang.ada
From: Vadim Godunko <vgodunko@gmail.com>
Subject: Re: Ada and Unicode
Date: Mon, 19 Apr 2021 06:18:22 -0700 (PDT)
Message-ID: <f9d91cb0-c9bb-4d42-a1a9-0cd546da436cn@googlegroups.com>
In-Reply-To: <607b5b20$0$27442$426a74cc@news.free.fr>

On Sunday, April 18, 2021 at 1:03:14 AM UTC+3, DrPi wrote:
> 
> I have a good knowledge of Unicode: code points, encodings...
> What I don't understand is how to manage Unicode strings with Ada. I've
> read part of the ARM and done some tests, without success.
>
> I managed to be partly successful with source code encoded in Latin-1.
> Any other encoding failed.
> Is there any way to use source code encoded in UTF-8?
> In some languages, it is possible to set a tag at the beginning of the
> source file to tell the compiler which encoding to use.
> I wasn't successful using the -gnatW8 switch. But maybe I made too many
> tests and my brain was scrambled.
>
> Even with source code encoded in Latin-1, I've not been able to manage
> Unicode strings correctly.
>
> What's the way to manage Unicode correctly?
> 

Ada doesn't have good Unicode support. :( So you need to find a suitable set of "workarounds".

There are a few different aspects of Unicode support that need to be considered:

1. Representation of string literals. If you want to use non-ASCII characters in source code, you need to use the -gnatW8 switch, and it will require the use of Wide_Wide_String everywhere.
2. Internal representation during application execution. Since you are forced to use Wide_Wide_String in the previous step, it will be UCS-4/UTF-32.
3. Text encoding/decoding in input/output operations. GNAT allows the use of UTF-8 by passing a special "magic" string as the Form parameter of Text_IO.
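Here is a minimal sketch of all three points together, assuming GNAT and a source file saved as UTF-8. The procedure name Hello_Unicode and the output file name greeting.txt are hypothetical. As far as I know, the "magic" Form string GNAT documents for UTF-8 is "WCEM=8" (see the Wide_Text_IO section of the GNAT Reference Manual for your version); it is GNAT-specific, so other compilers may ignore or reject it.

```ada
--  Build with:  gnatmake -gnatW8 hello_unicode.adb
--  -gnatW8 tells GNAT that this source file is encoded in UTF-8.
with Ada.Wide_Wide_Text_IO; use Ada.Wide_Wide_Text_IO;

procedure Hello_Unicode is
   --  (1) Non-ASCII literal: Cyrillic and CJK characters cannot be stored
   --      in a plain (Latin-1) String, so a Wide_Wide_String is used.
   Greeting : constant Wide_Wide_String := "Привет, 世界";

   Output : File_Type;
begin
   --  (2) In memory each Wide_Wide_Character is one code point (UTF-32),
   --      so 'Length counts code points (10 here), not UTF-8 bytes.
   --      Self-check: fails if the source was not decoded as UTF-8.
   if Greeting'Length /= 10 then
      raise Program_Error;
   end if;

   --  (3) GNAT-specific Form string (assumed "WCEM=8"):
   --      the file is written encoded as UTF-8.
   Create (File => Output,
           Mode => Out_File,
           Name => "greeting.txt",
           Form => "WCEM=8");
   Put_Line (Output, Greeting);
   Close (Output);
end Hello_Unicode;
```

If -gnatW8 is left out (and the file has no UTF-8 BOM), GNAT reads the source as Latin-1 by default, so the literal turns into mojibake and the self-check raises Program_Error.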

It is hard to call this a reasonable set of features for the modern world. To fix some of the drawbacks of the current situation, we are developing a new text processing library, known as VSS.

https://github.com/AdaCore/VSS

At the current stage, it provides an encoding-independent API for text manipulation, an encoder/decoder API for I/O, and a JSON reader/writer; regexp support should come soon.

An encoding-independent API means that the application always uses Unicode characters to process text, independently of the actual encoding used to store the information in memory (UTF-8 is used for now; UTF-16 will be added later for interoperability with the Windows API and WASM). Encoders and decoders allow translation from/to different encodings when the application exchanges information with the outside world.
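VSS's own API is not shown here. As a rough illustration of the "decode at the boundary, work on characters inside" idea using only the standard library, here is a sketch based on Ada 2012's Ada.Strings.UTF_Encoding.Wide_Wide_Strings. Note that this is the standard package, not VSS, and unlike VSS it keeps text as UTF-32 in memory. The procedure name Decode_Demo and the sample bytes are hypothetical.

```ada
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Decode_Demo is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Bytes as they might arrive from a file or socket: "cafe" with an
   --  acute accent on the final e, encoded in UTF-8.
   --  U+00E9 is encoded as the two octets 16#C3# 16#A9#.
   Raw : constant UTF_8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);

   --  Decode once on input; the program then works on code points.
   Text : constant Wide_Wide_String := WWS.Decode (Raw);

   --  Encode again (no BOM) before handing the text back to the world.
   Back : constant UTF_8_String := WWS.Encode (Text);
begin
   Ada.Text_IO.Put_Line ("Bytes:      " & Integer'Image (Raw'Length));   --  5
   Ada.Text_IO.Put_Line ("Code points:" & Integer'Image (Text'Length));  --  4
   Ada.Text_IO.Put_Line ("Round trip: " & Boolean'Image (Back = Raw));   --  TRUE
end Decode_Demo;
```

The source itself is pure ASCII, so this one builds without -gnatW8. Keeping all encoding work at the I/O boundary means the rest of the program never deals with byte sequences.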

