comp.lang.ada
From: "196...@googlemail.com" <1963bib@googlemail.com>
Subject: Re: XMLAda & unicode symbols
Date: Mon, 21 Jun 2021 13:06:58 -0700 (PDT)
Message-ID: <7da5a442-2ad9-4bfd-9d6c-c8885da02d05n@googlegroups.com>
In-Reply-To: <8d443406-48dc-4d4e-868c-832caabebd1en@googlegroups.com>

On Monday, 21 June 2021 at 19:33:58 UTC+1, briot.e...@gmail.com wrote:
> > A scan through XML/Ada shows that the only uses of Unicode_Char are in 
> > the SAX subset. I don't see any way in the DOM subset of XML/Ada of 
> > using them - someone please prove me wrong!
> Those two subsets are not independent; in fact, the DOM subset is entirely based on the SAX one.
> So anything that applies to SAX also applies to DOM. 
> 
> That said, the DOM standard (at the time I built XML/Ada, roughly 20 years ago) likely
> did not have standard functions that receive unicode characters, only strings.
> DOM implementations are free to use any internal representation they want, and I think they did not
> have to accept any random encoding. XML/Ada is not user-friendly; it really is only a fairly low-level
> implementation of the DOM standard. Using DOM without high-level things like XPath is a real 
> pain. At the time, someone else had done an XPath implementation, so I never took the time to 
> duplicate that effort. 
> 
> Conversion between various encodings (8-bit, unicode utf-8, utf-16 or utf-32) is done via the
> `unicode` module of XML/Ada, for instance `unicode-ces-utf8.ads`. They all provide a similar API. In this case
> you want the `Encode` procedure. This is not a function (so it doesn't return a Byte_Sequence directly) for efficiency
> reasons, even if a function would admittedly be more convenient for end-users.
> 
> As someone rightly mentioned, it doesn't really make sense to use XML/Ada to build a tree in memory just for the
> sake of printing it, though. Ada.Text_IO or streams will be much more efficient. XML/Ada is only useful
> to parse XML streams (in which case you generally never have to encode a character to a byte sequence
> yourself).
> > > we need to convert it, then let us do so outside of it. 
> > That is *exactly* what you have to do (convert outside, not throw any 
> > old sequence of octets and 32-bit values somehow mashed together at 
> > it)
> Well said, Simon, thanks. Basically, the whole application should be utf-8 if you care at all about international
> characters (if you don't, feel free to use latin-1, or any encoding your terminal supports). So conversion should not
> happen at the interface to XML/Ada, but only on input to and output from your program.
> XML/Ada just assumes a string is a sequence of bytes. The actual encoding has to be known by the application, 
> and be consistent. 
> If for some reason (Windows?) you prefer utf-16 internally, you can change `sax-encodings.ads` and recompile.
> (It would have been neater to use generic traits packages, but I did not learn about them until a few years later.)
> 
> It would also have been nicer to use a string type that knows about the encoding. I wrote GNATCOLL.Strings for
> that purpose several years later, too. XML/Ada was never used extensively, so it was never a priority for AdaCore
> to update it to use all these packages, at the risk of either breaking backward compatibility, or duplicating the 
> whole API to allow for the various string types. Not worth it. 
> 
> Emmanuel

Okay, now I think I am getting somewhere. A push and a prod is always welcome.
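
For anyone following along, here is a minimal sketch of the `Encode` procedure Emmanuel describes, wrapped so that it hands back a string. Treat it as an untested sketch under assumptions: it assumes that `Encode` in `unicode-ces-utf8.ads` takes (Char, Output : in out Byte_Sequence, Index : in out Natural), that it writes starting at Output (Index + 1) and leaves Index on the last byte written, and that Byte_Sequence is a subtype of String. Check the spec in your installed XML/Ada before relying on any of that. The names `Encode_Demo` and `To_Utf8` are my own and are not part of XML/Ada, and the degree sign is only an example character.

```ada
with Ada.Text_IO;
with Unicode;          use Unicode;
with Unicode.CES;      use Unicode.CES;
with Unicode.CES.Utf8;

procedure Encode_Demo is

   --  Hypothetical convenience wrapper (not part of XML/Ada): returns
   --  the UTF-8 bytes for a single code point.
   function To_Utf8 (Char : Unicode_Char) return Byte_Sequence is
      --  Generously sized scratch buffer for one encoded character.
      Buffer : Byte_Sequence (1 .. 6);
      --  Assumption: Encode writes from Buffer (Last + 1) onwards and
      --  leaves Last on the final byte it wrote.
      Last   : Natural := Buffer'First - 1;
   begin
      Unicode.CES.Utf8.Encode (Char, Buffer, Last);
      return Buffer (Buffer'First .. Last);
   end To_Utf8;

begin
   --  16#00B0# is DEGREE SIGN. The result is a plain byte string that
   --  can be concatenated with other UTF-8 text, written with
   --  Ada.Text_IO, or passed to XML/Ada wherever it expects a string.
   Ada.Text_IO.Put_Line ("25 " & To_Utf8 (16#00B0#) & "C");
end Encode_Demo;
```

This keeps with the advice above: the rest of the program deals only in UTF-8 strings, and the one place that knows about code points is the small wrapper.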


Thread overview: 28+ messages
2021-06-19 18:28 XMLAda & unicode symbols 196...@googlemail.com
2021-06-19 19:53 ` Jeffrey R. Carter
2021-06-20 17:02   ` 196...@googlemail.com
2021-06-20 17:23     ` Dmitry A. Kazakov
2021-06-20 17:58       ` 196...@googlemail.com
2021-06-20 18:16         ` Dmitry A. Kazakov
2021-06-21 19:40           ` 196...@googlemail.com
2021-06-21 20:18             ` Dmitry A. Kazakov
2021-06-21 15:37         ` Simon Wright
2021-06-21 19:49           ` 196...@googlemail.com
2021-06-21 20:23             ` Dmitry A. Kazakov
2021-06-21 20:47             ` Simon Wright
2021-06-22  0:30             ` Spiros Bousbouras
2021-06-20 18:21     ` Jeffrey R. Carter
2021-06-20 18:47       ` Dmitry A. Kazakov
2021-06-20 22:50         ` Jeffrey R. Carter
2021-06-21  4:16           ` Marius Amado-Alves
2021-06-21  9:39             ` Jeffrey R. Carter
2021-06-21  6:14           ` Dmitry A. Kazakov
2021-06-19 21:24 ` Simon Wright
2021-06-20 17:10   ` 196...@googlemail.com
2021-06-21 15:26     ` Simon Wright
2021-06-21 18:33       ` Emmanuel Briot
2021-06-21 20:06         ` 196...@googlemail.com [this message]
2021-06-21 21:26         ` Simon Wright
2021-06-22  6:52           ` Emmanuel Briot
2021-06-21 21:22       ` Simon Wright
2021-06-21  6:07 ` Vadim Godunko