comp.lang.ada
* XMLAda & unicode symbols
@ 2021-06-19 18:28 196...@googlemail.com
  2021-06-19 19:53 ` Jeffrey R. Carter
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-19 18:28 UTC (permalink / raw)


I'm creating SVG files with XMLAda and I need to have a degree symbol within some text.

I have:
procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
      Text_Node : DOM.Core.Element;
      Text      : DOM.Core.Text;
   begin
      Text_Node := DOM.Core.Documents.Create_Element (LDocument, "text");
      DOM.Core.Elements.Set_Attribute (Text_Node, "x", X_Pos);
      DOM.Core.Elements.Set_Attribute (Text_Node, "y", Y_Pos);
      DOM.Core.Elements.Set_Attribute (Text_Node, "class", "def-maroon");
      DOM.Core.Elements.Set_Attribute (Text_Node, "text-anchor", "start");
      Text_Node := DOM.Core.Nodes.Append_Child (Root_Node, Text_Node);
      Text := DOM.Core.Documents.Create_Text_Node (LDocument, Min_Max_Str);
      Text := DOM.Core.Nodes.Append_Child (Text_Node, Text);
   end Add_Min_Max;

and I just pass a string in. The degree symbol is Unicode U+00B0, and you would normally write it as the character reference &#x00B0; except that if I do, XMLAda changes the initial '&' to '&amp;', so what is actually written is '&amp;#x00B0;' and it fails to display properly.

Nor can I apply Unicode.Names.Latin_1_Supplement.Degree_Sign to the string, since, well, strict typing...

To me it seems like XMLAda is being far too eager and is not willing to just publish what I enter.

I raised an issue on the GitHub repository, but it was closed with a reply that basically said to use the Unicode name, which fails.

Does anyone have a clue how this can be done?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-19 18:28 XMLAda & unicode symbols 196...@googlemail.com
@ 2021-06-19 19:53 ` Jeffrey R. Carter
  2021-06-20 17:02   ` 196...@googlemail.com
  2021-06-19 21:24 ` Simon Wright
  2021-06-21  6:07 ` Vadim Godunko
  2 siblings, 1 reply; 28+ messages in thread
From: Jeffrey R. Carter @ 2021-06-19 19:53 UTC (permalink / raw)


On 6/19/21 8:28 PM, 196...@googlemail.com wrote:
> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text.
> 
> I have:
> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is

The degree symbol is part of Latin-1, so why not include it directly in your string?

S : constant String := "50" & Ada.Characters.Latin_1.Degree_Sign;

-- 
Jeff Carter
"I would never want to belong to any club that
would have someone like me for a member."
Annie Hall
41

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-19 18:28 XMLAda & unicode symbols 196...@googlemail.com
  2021-06-19 19:53 ` Jeffrey R. Carter
@ 2021-06-19 21:24 ` Simon Wright
  2021-06-20 17:10   ` 196...@googlemail.com
  2021-06-21  6:07 ` Vadim Godunko
  2 siblings, 1 reply; 28+ messages in thread
From: Simon Wright @ 2021-06-19 21:24 UTC (permalink / raw)


"196...@googlemail.com" <1963bib@googlemail.com> writes:

> I'm creating SVG files with XMLAda and I need to have a degree symbol
> within some text.
>
> I have:
> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
>       Text_Node : DOM.Core.Element;
>       Text      : DOM.Core.Text;
>    begin
>       Text_Node := DOM.Core.Documents.Create_Element (LDocument, "text");
>       DOM.Core.Elements.Set_Attribute (Text_Node, "x", X_Pos);
>       DOM.Core.Elements.Set_Attribute (Text_Node, "y", Y_Pos);
>       DOM.Core.Elements.Set_Attribute (Text_Node, "class", "def-maroon");
>       DOM.Core.Elements.Set_Attribute (Text_Node, "text-anchor", "left");
>       Text_Node := DOM.Core.Nodes.Append_Child (Root_Node, Text_Node);
>       Text := DOM.Core.Documents.Create_Text_Node (LDocument, Min_Max_Str);
>       Text := DOM.Core.Nodes.Append_Child (Text_Node, Text);
>    end Add_Min_Max;
>
> and I just pass a string in. The degree symbol is unicode 00B0 and you
> would then normally have it as &#00B0, except if I do, then XMLAda
> changes that initial '&' to '&amp' and so what is then coded is
> '&amp#00B0' and it fails to display properly.
>
> Nor can I apply Unicode.Names.Latin_1_Supplement.Degree_Sign to the
> string, since, well, strict typing...
>
> To me it seems like XMLAda is being far too eager and is not willing
> to just publish what I enter.
>
> I raised a call on the github repository, but it was closed saying
> basically use the unicode name, which fails.

Set_Attribute takes a Dom_String, which is a subtype of
Unicode.CES.Byte_Sequence, which is a subtype of String. The question
is, what encoding? I suspect it's utf-8, so we need to encode
Ada.Characters.Latin_1.Degree_Sign in utf-8, & this code using XML/Ada
support seems to do the trick:

   with Ada.Characters.Latin_1;
   with Ada.Text_IO;
   with Unicode.CES;
   with Unicode.Encodings;
   procedure Conversion is
      --  Latin-1 text: the degree sign is the single byte 16#B0#.
      Fifty_Degrees_Latin1 : constant String
        := "50" & Ada.Characters.Latin_1.Degree_Sign;
      --  The same text with the degree sign re-encoded as utf-8.
      --  iso-8859-15 and Latin-1 agree on the degree sign (16#B0#).
      Fifty_Degrees_UTF8 : constant Unicode.CES.Byte_Sequence
        := "50"
          & Unicode.Encodings.Convert
            ((1 => Ada.Characters.Latin_1.Degree_Sign),
             From => Unicode.Encodings.Get_By_Name ("iso-8859-15"),
             To => Unicode.Encodings.Get_By_Name ("utf-8"));
   begin
      Ada.Text_IO.Put_Line (Fifty_Degrees_Latin1);
      Ada.Text_IO.Put_Line (Fifty_Degrees_UTF8);
   end Conversion;

(Note that the From and To values passed to Convert here are its
defaults, so they could be omitted.) On this Mac (Terminal displays
utf-8 text) the first line is garbage, the second fine.

I'm So Wildly Impressed (maybe "cast down" would be more accurate) by
all that subtyping in our wondrously safe language.

I also agree with you that suggesting you use a Unicode_Char
(Wide_Wide_Character) without saying *how* is less helpful than it could
be.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-19 19:53 ` Jeffrey R. Carter
@ 2021-06-20 17:02   ` 196...@googlemail.com
  2021-06-20 17:23     ` Dmitry A. Kazakov
  2021-06-20 18:21     ` Jeffrey R. Carter
  0 siblings, 2 replies; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-20 17:02 UTC (permalink / raw)


On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote:
> On 6/19/21 8:28 PM, 196...@googlemail.com wrote: 
> > I'm creating SVG files with XMLAda and I need to have a degree symbol within some text. 
> > 
> > I have: 
> > procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
> The degree symbol is part of Latin-1, so why not include it directly in your string? 
> 
> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign; 
> 
> -- 
> Jeff Carter 
> "I would never want to belong to any club that 
> would have someone like me for a member." 
> Annie Hall 
> 41

Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with:
raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-19 21:24 ` Simon Wright
@ 2021-06-20 17:10   ` 196...@googlemail.com
  2021-06-21 15:26     ` Simon Wright
  0 siblings, 1 reply; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-20 17:10 UTC (permalink / raw)


On Saturday, 19 June 2021 at 22:24:47 UTC+1, Simon Wright wrote:
> "196...@googlemail.com" <196...@googlemail.com> writes: 
> 
> > I'm creating SVG files with XMLAda and I need to have a degree symbol 
> > within some text. 
> > 
> > I have: 
> > procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is 
> > Text_Node : DOM.Core.Element; 
> > Text : DOM.Core.Text; 
> > begin 
> > Text_Node := DOM.Core.Documents.Create_Element (LDocument, "text"); 
> > DOM.Core.Elements.Set_Attribute (Text_Node, "x", X_Pos); 
> > DOM.Core.Elements.Set_Attribute (Text_Node, "y", Y_Pos); 
> > DOM.Core.Elements.Set_Attribute (Text_Node, "class", "def-maroon"); 
> > DOM.Core.Elements.Set_Attribute (Text_Node, "text-anchor", "left"); 
> > Text_Node := DOM.Core.Nodes.Append_Child (Root_Node, Text_Node); 
> > Text := DOM.Core.Documents.Create_Text_Node (LDocument, Min_Max_Str); 
> > Text := DOM.Core.Nodes.Append_Child (Text_Node, Text); 
> > end Add_Min_Max; 
> > 
> > and I just pass a string in. The degree symbol is unicode 00B0 and you 
> > would then normally have it as &#00B0, except if I do, then XMLAda 
> > changes that initial '&' to '&amp' and so what is then coded is 
> > '&amp#00B0' and it fails to display properly. 
> > 
> > Nor can I apply Unicode.Names.Latin_1_Supplement.Degree_Sign to the 
> > string, since, well, strict typing... 
> > 
> > To me it seems like XMLAda is being far too eager and is not willing 
> > to just publish what I enter. 
> > 
> > I raised a call on the github repository, but it was closed saying 
> > basically use the unicode name, which fails.
> Set_Attribute takes a Dom_String, which is a subtype of 
> Unicode.CES.Byte_Sequence, which is a subtype of String. The question 
> is, what encoding? I suspect it's utf-8, so we need to encode 
> Ada.Characters.Latin_1.Degree_Sign in utf-8, & this code using XML/Ada 
> support seems to do the trick: 
> 
> with Ada.Characters.Latin_1; 
> with Ada.Text_IO; 
> with Unicode.CES; 
> with Unicode.Encodings; 
> procedure Conversion is 
> Fifty_Degrees_Latin1 : constant String 
> := "50" & Ada.Characters.Latin_1.Degree_Sign; 
> Fifty_Degrees_UTF8 : constant Unicode.CES.Byte_Sequence 
> := "50" 
> & Unicode.Encodings.Convert 
> ((1 => Ada.Characters.Latin_1.Degree_Sign), 
> From => Unicode.Encodings.Get_By_Name ("iso-8859-15"), 
> To => Unicode.Encodings.Get_By_Name ("utf-8")); 
> begin 
> Ada.Text_IO.Put_Line (Fifty_Degrees_Latin1); 
> Ada.Text_IO.Put_Line (Fifty_Degrees_UTF8); 
> end Conversion; 
> 
> (note that Convert's From and To parameters are the default). On this 
> Mac (Terminal displays utf-8 text) the first line is garbage, the second 
> fine. 
> 
> I'm So Wildly Impressed (maybe "cast down" would be more accurate) by 
> all that subtyping in our wondrously safe language. 
> 
> I also agree with you that suggesting you use a Unicode_Char 
> (Wide_Wide_Character) without saying *how* is less helpful than it could 
> be.

Asking for the degree sign was probably a slight mistake. There is Degree_Celsius and also Degree_Fahrenheit for those who have not yet embraced metric. These are the "correct" symbols.

Both of these exist in Unicode.Names.Letterlike_Symbols, and probably elsewhere, but trying to shoehorn them in seems impossible.

I just wish XMLAda could just accept whatever we throw at it, and if we need to convert it, then let us do so outside of it.

Using Text_IO is fine, but not where XMLAda is concerned.


B

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:02   ` 196...@googlemail.com
@ 2021-06-20 17:23     ` Dmitry A. Kazakov
  2021-06-20 17:58       ` 196...@googlemail.com
  2021-06-20 18:21     ` Jeffrey R. Carter
  1 sibling, 1 reply; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-20 17:23 UTC (permalink / raw)


On 2021-06-20 19:02, 196...@googlemail.com wrote:
> On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote:
>> On 6/19/21 8:28 PM, 196...@googlemail.com wrote:
>>> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text.
>>>
>>> I have:
>>> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
>> The degree symbol is part of Latin-1, so why not include it directly in your string?
>>
>> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign;
> 
> Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with:
> raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258

Maybe it expects UTF-8, as most third party Ada libraries do. In that 
case use:

    Character'Val (16#C2#) & Character'Val (16#B0#)
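
A minimal sketch (not part of the original post) of where those two bytes come from. It assumes only the standard Ada 2012 package Ada.Strings.UTF_Encoding.Strings; the procedure name Degree_To_UTF_8 is made up for illustration. A lone 16#B0# byte has the bit pattern of a UTF-8 continuation byte, so it cannot start a character, which is a plausible reason for the Invalid_Encoding exception above.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

procedure Degree_To_UTF_8 is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  Latin-1 text: the degree sign is the single byte 16#B0#.
   Latin_1_Text : constant String :=
     "50" & Ada.Characters.Latin_1.Degree_Sign;

   --  The same text re-encoded as UTF-8 by the standard library.
   UTF_8_Text : constant UTF.UTF_8_String :=
     UTF.Strings.Encode (Latin_1_Text);

   --  The two bytes suggested above, appended by hand.
   By_Hand : constant String :=
     "50" & Character'Val (16#C2#) & Character'Val (16#B0#);
begin
   if UTF_8_Text = By_Hand then
      Ada.Text_IO.Put_Line ("same bytes: " & UTF_8_Text);
   else
      Ada.Text_IO.Put_Line ("encodings differ");
   end if;
end Degree_To_UTF_8;
```

Either string should then be acceptable wherever the library expects UTF-8, assuming that guess about its encoding is right.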

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:23     ` Dmitry A. Kazakov
@ 2021-06-20 17:58       ` 196...@googlemail.com
  2021-06-20 18:16         ` Dmitry A. Kazakov
  2021-06-21 15:37         ` Simon Wright
  0 siblings, 2 replies; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-20 17:58 UTC (permalink / raw)


On Sunday, 20 June 2021 at 18:23:35 UTC+1, Dmitry A. Kazakov wrote:
> On 2021-06-20 19:02, 196...@googlemail.com wrote: 
> > On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote: 
> >> On 6/19/21 8:28 PM, 196...@googlemail.com wrote: 
> >>> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text. 
> >>> 
> >>> I have: 
> >>> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is 
> >> The degree symbol is part of Latin-1, so why not include it directly in your string? 
> >> 
> >> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign; 
> >
> > Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with: 
> > raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258
> Maybe it expects UTF-8, as most third party Ada libraries do. In that 
> case use: 
> 
> Character'Val (16#C2#) & Character'Val (16#B0#) 

That's the degree symbol; what I really need is the degree centigrade symbol (DEGREE CELSIUS), which is U+2103.

Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.

I'm sure it's easy enough, and when I get it, I'll be banging my head against the desk.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:58       ` 196...@googlemail.com
@ 2021-06-20 18:16         ` Dmitry A. Kazakov
  2021-06-21 19:40           ` 196...@googlemail.com
  2021-06-21 15:37         ` Simon Wright
  1 sibling, 1 reply; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-20 18:16 UTC (permalink / raw)


On 2021-06-20 19:58, 196...@googlemail.com wrote:
> On Sunday, 20 June 2021 at 18:23:35 UTC+1, Dmitry A. Kazakov wrote:
>> On 2021-06-20 19:02, 196...@googlemail.com wrote:
>>> On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote:
>>>> On 6/19/21 8:28 PM, 196...@googlemail.com wrote:
>>>>> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text.
>>>>>
>>>>> I have:
>>>>> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
>>>> The degree symbol is part of Latin-1, so why not include it directly in your string?
>>>>
>>>> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign;
>>>
>>> Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with:
>>> raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258
>> Maybe it expects UTF-8, as most third party Ada libraries do. In that
>> case use:
>>
>> Character'Val (16#C2#) & Character'Val (16#B0#)
> 
> That's the degree symbol, what I really need is the degree centigrade symbol which is U+2103.
> 
> Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
> 
> I'm sure it's easy enough, and when I get it, I'll be banging my head against the desk.

Why do you use XMLAda? SVG is a text file; I would write it directly. It is
the reverse, rendering an SVG image, that is difficult to write from scratch.

And why do you want to create SVG files?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:02   ` 196...@googlemail.com
  2021-06-20 17:23     ` Dmitry A. Kazakov
@ 2021-06-20 18:21     ` Jeffrey R. Carter
  2021-06-20 18:47       ` Dmitry A. Kazakov
  1 sibling, 1 reply; 28+ messages in thread
From: Jeffrey R. Carter @ 2021-06-20 18:21 UTC (permalink / raw)


On 6/20/21 7:02 PM, 196...@googlemail.com wrote:
> On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote:
>> On 6/19/21 8:28 PM, 196...@googlemail.com wrote:
>>> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text.
>>>
>>> I have:
>>> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is
>> The degree symbol is part of Latin-1, so why not include it directly in your string?
>>
>> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign;
>>
>> -- 
>> Jeff Carter
>> "I would never want to belong to any club that
>> would have someone like me for a member."
>> Annie Hall
>> 41
> 
> Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with:
> raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258

I would call that an error in XMLAda. Anything that uses String should accept 
any String.

The exception name indicates that XMLAda is probably misusing String to hold 
encoded Unicode text, probably with UTF-8 encoding. Any use of String as 
anything other than its intended use, as a sequence of Latin-1 characters, is a 
mistake.

-- 
Jeff Carter
"Help! Help! I'm being repressed!"
Monty Python & the Holy Grail
67

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 18:21     ` Jeffrey R. Carter
@ 2021-06-20 18:47       ` Dmitry A. Kazakov
  2021-06-20 22:50         ` Jeffrey R. Carter
  0 siblings, 1 reply; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-20 18:47 UTC (permalink / raw)


On 2021-06-20 20:21, Jeffrey R. Carter wrote:

> The exception name indicates that XMLAda is probably misusing String to 
> hold encoded Unicode text, probably with UTF-8 encoding. Any use of 
> String as anything other than its intended use, as a sequence of Latin-1 
> characters, is a mistake.

That ship has sailed. I would say that any use of String as Latin-1 is a 
mistake now because most of the libraries would use UTF-8 encoding 
instead of Latin-1. Latin is a dead language, you know... (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 18:47       ` Dmitry A. Kazakov
@ 2021-06-20 22:50         ` Jeffrey R. Carter
  2021-06-21  4:16           ` Marius Amado-Alves
  2021-06-21  6:14           ` Dmitry A. Kazakov
  0 siblings, 2 replies; 28+ messages in thread
From: Jeffrey R. Carter @ 2021-06-20 22:50 UTC (permalink / raw)


On 6/20/21 8:47 PM, Dmitry A. Kazakov wrote:
> On 2021-06-20 20:21, Jeffrey R. Carter wrote:
> 
> That ship has sailed. I would say that any use of String as Latin-1 is a mistake 
> now because most of the libraries would use UTF-8 encoding instead of Latin-1. 

I have never subscribed to the illogic that if enough people make the same 
mistake, it ceases to be a mistake.

> Latin is a dead language, you know... (:-))

Some people still speak it. No one has ever spoken Unicode.

-- 
Jeff Carter
"Help! Help! I'm being repressed!"
Monty Python & the Holy Grail
67

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 22:50         ` Jeffrey R. Carter
@ 2021-06-21  4:16           ` Marius Amado-Alves
  2021-06-21  9:39             ` Jeffrey R. Carter
  2021-06-21  6:14           ` Dmitry A. Kazakov
  1 sibling, 1 reply; 28+ messages in thread
From: Marius Amado-Alves @ 2021-06-21  4:16 UTC (permalink / raw)


> No one has ever spoken Unicode.

Tell that to the billions of speakers using the thousands of languages written in the hundreds of Unicode scripts.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-19 18:28 XMLAda & unicode symbols 196...@googlemail.com
  2021-06-19 19:53 ` Jeffrey R. Carter
  2021-06-19 21:24 ` Simon Wright
@ 2021-06-21  6:07 ` Vadim Godunko
  2 siblings, 0 replies; 28+ messages in thread
From: Vadim Godunko @ 2021-06-21  6:07 UTC (permalink / raw)


There is another library which can generate XML documents and uses real Unicode for all data manipulations; see the XML writer example in Matreshka:

http://forge.ada-ru.org/matreshka/wiki/XML/SAX

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 22:50         ` Jeffrey R. Carter
  2021-06-21  4:16           ` Marius Amado-Alves
@ 2021-06-21  6:14           ` Dmitry A. Kazakov
  1 sibling, 0 replies; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-21  6:14 UTC (permalink / raw)


On 2021-06-21 00:50, Jeffrey R. Carter wrote:
> On 6/20/21 8:47 PM, Dmitry A. Kazakov wrote:
>> On 2021-06-20 20:21, Jeffrey R. Carter wrote:
>>
>> That ship has sailed. I would say that any use of String as Latin-1 is 
>> a mistake now because most of the libraries would use UTF-8 encoding 
>> instead of Latin-1. 
> 
> I have never subscribed to the illogic that if enough people make the 
> same mistake, it ceases to be a mistake.

The mistake is on the Ada type system design side. People repurposed 
Latin-1 strings for UTF-8 strings because there was no other feasible way.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21  4:16           ` Marius Amado-Alves
@ 2021-06-21  9:39             ` Jeffrey R. Carter
  0 siblings, 0 replies; 28+ messages in thread
From: Jeffrey R. Carter @ 2021-06-21  9:39 UTC (permalink / raw)


On 6/21/21 6:16 AM, Marius Amado-Alves wrote:
>> No one has ever spoken Unicode.
> 
> Tell that to the billions of speakers using the thousands of languages written in the hundreds of Unicode scripts.

None of whom has ever spoken Unicode.

-- 
Jeff Carter
"[I]f we should ever separate, my little plum,
I want to give you one little bit of fatherly advice. ... Never
give a sucker an even break."
Poppy
97

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:10   ` 196...@googlemail.com
@ 2021-06-21 15:26     ` Simon Wright
  2021-06-21 18:33       ` Emmanuel Briot
  2021-06-21 21:22       ` Simon Wright
  0 siblings, 2 replies; 28+ messages in thread
From: Simon Wright @ 2021-06-21 15:26 UTC (permalink / raw)


"196...@googlemail.com" <1963bib@googlemail.com> writes:

> Asking for the degree sign, was probably a slight mistake. There is
> Degree_Celsius and also Degree_Fahrenheit for those who have not yet
> embraced metric. These are the "correct" symbols.

You might equally have meant angular degrees.

> Both of these exist in Unicode.Names.Letterlike_Symbols, and probably
> elsewhere,but trying to shoehorn these in seems impossible.

A scan through XML/Ada shows that the only uses of Unicode_Char are in
the SAX subset. I don't see any way in the DOM subset of XML/Ada of
using them - someone please prove me wrong!

You could build a Unicode_Char to UTF_8_String converter using
Ada.Strings.UTF_Encoding.Wide_Wide_Strings, ARM A.4.11(30)
http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-A-4-11.html#p30
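
A rough sketch of such a converter, under some assumptions: it takes a Wide_Wide_Character rather than XML/Ada's Unicode_Char (how to convert between the two is not shown here), and the names To_UTF_8 and Celsius_UTF_8 are hypothetical. Only the standard Encode function from ARM A.4.11 is used.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Celsius_UTF_8 is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  Hypothetical helper: encode a single code point as UTF-8.
   function To_UTF_8
     (Code_Point : Wide_Wide_Character) return UTF.UTF_8_String is
   begin
      --  Wrap the character in a one-element Wide_Wide_String and
      --  let the standard library produce the UTF-8 bytes.
      return UTF.Wide_Wide_Strings.Encode ((1 => Code_Point));
   end To_UTF_8;

   --  U+2103 DEGREE CELSIUS
   Degree_Celsius : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#2103#);

   --  An ordinary String holding UTF-8 bytes, ready to hand to the DOM.
   Label : constant String := "21" & To_UTF_8 (Degree_Celsius);
begin
   Ada.Text_IO.Put_Line (Label);
end Celsius_UTF_8;
```

The result is a plain String of UTF-8 bytes, so it could be passed as Min_Max_Str to something like the Add_Min_Max procedure from the original question.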

> I just wish XMLAda could just accept whatever we throw at it, and if
> we need to convert it, then let us do so outside of it.

That is *exactly* what you have to do (convert outside, not throw any
old sequence of octets and 32-bit values somehow mashed together at
it). It wants a utf-8-encoded string (though XML/Ada doesn't seem to say
so - RFC 3076 implies it, 7303 (8.1) recommends it).

OK, Text_IO might not prove the point to you, but what about this?

   with Ada.Characters.Latin_1;
   with DOM.Core.Documents;
   with DOM.Core.Elements;
   with DOM.Core.Nodes;
   with DOM.Core;
   with Unicode.CES;
   with Unicode.Encodings;

   procedure Utf is
      Impl : DOM.Core.DOM_Implementation;
      Doc : DOM.Core.Document;
      Dummy, Element : DOM.Core.Node;
      Fifty_Degrees_Latin1 : constant String
        := "50" & Ada.Characters.Latin_1.Degree_Sign;
      Fifty_Degrees_UTF8 : constant Unicode.CES.Byte_Sequence
        := Unicode.Encodings.Convert
          (Fifty_Degrees_Latin1,
           From => Unicode.Encodings.Get_By_Name ("iso-8859-15"),
           To => Unicode.Encodings.Get_By_Name ("utf-8"));
   begin
      Doc := DOM.Core.Create_Document (Impl);

      Element := DOM.Core.Documents.Create_Element (Doc, "utf");
      DOM.Core.Elements.Set_Attribute (Element, "temp", Fifty_Degrees_UTF8);
      Dummy := DOM.Core.Nodes.Append_Child (Doc, Element);

      DOM.Core.Nodes.Print (Doc);
   end Utf;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 17:58       ` 196...@googlemail.com
  2021-06-20 18:16         ` Dmitry A. Kazakov
@ 2021-06-21 15:37         ` Simon Wright
  2021-06-21 19:49           ` 196...@googlemail.com
  1 sibling, 1 reply; 28+ messages in thread
From: Simon Wright @ 2021-06-21 15:37 UTC (permalink / raw)


"196...@googlemail.com" <1963bib@googlemail.com> writes:

> That's the degree symbol, what I really need is the degree centigrade
> symbol which is U+2103.
>
> Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.

That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83
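
For the curious, a small sketch of the arithmetic, not something you would normally write by hand. Code points from U+0800 to U+FFFF use the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, with the 16 bits of the code point filling the x positions from left to right. The type and function names below are made up for illustration, and surrogate code points are not checked.

```ada
with Ada.Text_IO;

procedure Encode_2103 is
   type Code_Point is mod 2 ** 32;
   subtype Three_Byte_Range is Code_Point range 16#0800# .. 16#FFFF#;

   --  Split a code point into the 4 + 6 + 6 bits of the three-byte form.
   function Three_Byte_UTF_8 (C : Three_Byte_Range) return String is
   begin
      return
        Character'Val (16#E0# or (C / 2 ** 12))               --  top 4 bits
        & Character'Val (16#80# or ((C / 2 ** 6) and 16#3F#)) --  middle 6
        & Character'Val (16#80# or (C and 16#3F#));           --  low 6
   end Three_Byte_UTF_8;

   Hex_Digits : constant String := "0123456789ABCDEF";

   --  Two-digit hexadecimal image of one byte.
   function Hex (Ch : Character) return String is
      V : constant Natural := Character'Pos (Ch);
   begin
      return (Hex_Digits (V / 16 + 1), Hex_Digits (V mod 16 + 1));
   end Hex;

   Encoded : constant String := Three_Byte_UTF_8 (16#2103#);
begin
   for Ch of Encoded loop
      Ada.Text_IO.Put (Hex (Ch) & " ");
   end loop;
   Ada.Text_IO.New_Line;  --  prints: E2 84 83
end Encode_2103;
```

For U+2103 the bits are 0010 000100 000011, which gives 16#E2#, 16#84# and 16#83#.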

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 15:26     ` Simon Wright
@ 2021-06-21 18:33       ` Emmanuel Briot
  2021-06-21 20:06         ` 196...@googlemail.com
  2021-06-21 21:26         ` Simon Wright
  2021-06-21 21:22       ` Simon Wright
  1 sibling, 2 replies; 28+ messages in thread
From: Emmanuel Briot @ 2021-06-21 18:33 UTC (permalink / raw)


> A scan through XML/Ada shows that the only uses of Unicode_Char are in 
> the SAX subset. I don't see any way in the DOM subset of XML/Ada of 
> using them - someone please prove me wrong! 

Those two subsets are not independent, in fact the DOM subset is entirely based on the SAX one.
So anything that applies to SAX also applies to DOM.

That said, the DOM standard (at the time I built XML/Ada, about 20 years ago) likely
did not have standard functions that receive Unicode characters, only strings.
DOM implementations are free to use any internal representation they want, and I think they did not
have to accept any random encoding. XML/Ada is not user-friendly; it really is only a fairly low-level
implementation of the DOM standard. Using DOM without high-level things like XPath is a real
pain. At the time, someone else had done an XPath implementation, so I never took the time to
duplicate that effort.

Conversion between various encodings (8-bit, UTF-8, UTF-16 or UTF-32) is done via the
`unicode` module of XML/Ada, for instance `unicode-ces-utf8.ads`. They all provide a similar API. In this case
you want the `Encode` procedure. This is not a function (so it doesn't return a Byte_Sequence directly) for efficiency
reasons, even if a function would admittedly be more convenient for end users.
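
A hedged sketch of calling that `Encode` procedure. The parameter profile (character, output buffer, index) and the convention that the bytes are written starting at `Index + 1`, leaving `Index` on the last byte written, are assumptions based on a reading of `unicode-ces-utf8.ads`; check them against your installed version. The procedure name Encode_Celsius is made up.

```ada
with Ada.Text_IO;
with Unicode.CES;
with Unicode.CES.Utf8;
with Unicode.Names.Letterlike_Symbols;

procedure Encode_Celsius is
   --  Room for the longest encoding the procedure might produce.
   Buffer : Unicode.CES.Byte_Sequence (1 .. 6) := (others => ' ');

   --  Assumption: Encode writes from Buffer (Last + 1) onwards and
   --  leaves Last on the final byte it wrote.
   Last : Natural := Buffer'First - 1;
begin
   Unicode.CES.Utf8.Encode
     (Unicode.Names.Letterlike_Symbols.Degree_Celsius, Buffer, Last);

   --  Only the slice up to Last holds the encoded character.
   Ada.Text_IO.Put_Line ("21" & Buffer (Buffer'First .. Last));
end Encode_Celsius;
```

If those assumptions hold, the slice `Buffer (Buffer'First .. Last)` is the Byte_Sequence to concatenate into the text passed to the DOM.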

As someone rightly mentioned, it doesn't really make sense to use XML/Ada to build a tree in memory just for the
sake of printing it, though. Ada.Text_IO or streams will be much much more efficient. XML/Ada is only useful
to parse XML streams (in which case you generally never have to encode a character to a byte sequence
yourself).
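
To illustrate the Ada.Text_IO route, here is a minimal sketch that writes a tiny SVG file directly. The file name, coordinates, text and the Escape helper are invented for the example, and it assumes Text_IO writes the bytes of a String unchanged, as GNAT does by default.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Write_SVG is
   package TIO renames Ada.Text_IO;
   package UB renames Ada.Strings.Unbounded;

   --  UTF-8 bytes of U+2103 DEGREE CELSIUS.
   Degree_Celsius : constant String :=
     Character'Val (16#E2#) & Character'Val (16#84#) & Character'Val (16#83#);

   --  Hypothetical helper: escape the characters that are special in
   --  XML character data; everything else is copied through as is.
   function Escape (S : String) return String is
      Result : UB.Unbounded_String;
   begin
      for Ch of S loop
         case Ch is
            when '&'    => UB.Append (Result, "&amp;");
            when '<'    => UB.Append (Result, "&lt;");
            when '>'    => UB.Append (Result, "&gt;");
            when others => UB.Append (Result, Ch);
         end case;
      end loop;
      return UB.To_String (Result);
   end Escape;

   File : TIO.File_Type;
begin
   TIO.Create (File, TIO.Out_File, "temps.svg");
   TIO.Put_Line (File, "<?xml version=""1.0"" encoding=""UTF-8""?>");
   TIO.Put_Line
     (File,
      "<svg xmlns=""http://www.w3.org/2000/svg"""
      & " width=""200"" height=""50"">");
   TIO.Put_Line
     (File,
      "  <text x=""10"" y=""30"">"
      & Escape ("Min 12" & Degree_Celsius & " & max 25" & Degree_Celsius)
      & "</text>");
   TIO.Put_Line (File, "</svg>");
   TIO.Close (File);
end Write_SVG;
```

The escaping is the part a DOM library otherwise does for you, which is also why a literal '&' typed into a text node comes out as an entity.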

> > we need to convert it, then let us do so outside of it.
> That is *exactly* what you have to do (convert outside, not throw any 
> old sequence of octets and 32-bit values somehow mashed together at 
> it

Well said Simon, thanks. Basically, the whole application should be utf-8 if you at all care about international
characters (if you don't, feel free to use latin-1, or any encoding your terminal supports). So conversion should not
occur just at the interface to XML/Ada, but only on input and output of your program.
XML/Ada just assumes a string is a sequence of bytes. The actual encoding has to be known by the application,
and be consistent.
If for some reason (Windows?) you prefer utf-16 internally, you can change `sax-encodings.ads` and recompile.
(It would have been neater to use generic traits packages, but I did not learn about them until a few years later.)

It would also have been nicer to use a string type that knows about the encoding. I wrote GNATCOLL.Strings for
that purpose several years later too. XML/Ada was never used extensively, so it was never a priority for AdaCore
to update it to use all these packages, at the risk of either breaking backward compatibility, or duplicating the
whole API to allow for the various string types. Not worth it.

Emmanuel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-20 18:16         ` Dmitry A. Kazakov
@ 2021-06-21 19:40           ` 196...@googlemail.com
  2021-06-21 20:18             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-21 19:40 UTC (permalink / raw)


On Sunday, 20 June 2021 at 19:16:57 UTC+1, Dmitry A. Kazakov wrote:
> On 2021-06-20 19:58, 196...@googlemail.com wrote: 
> > On Sunday, 20 June 2021 at 18:23:35 UTC+1, Dmitry A. Kazakov wrote: 
> >> On 2021-06-20 19:02, 196...@googlemail.com wrote: 
> >>> On Saturday, 19 June 2021 at 20:53:49 UTC+1, Jeffrey R. Carter wrote: 
> >>>> On 6/19/21 8:28 PM, 196...@googlemail.com wrote: 
> >>>>> I'm creating SVG files with XMLAda and I need to have a degree symbol within some text. 
> >>>>> 
> >>>>> I have: 
> >>>>> procedure Add_Min_Max (Min_Max_Str : String; X_Pos : String; Y_Pos : String) is 
> >>>> The degree symbol is part of Latin-1, so why not include it directly in your string? 
> >>>> 
> >>>> S : constant String := "50" & Ada.Characters.Handling.Latin_1.Degree_Sign; 
> >>> 
> >>> Unfortunately, when XMLAda comes to exporting the DOM tree, it crashed with: 
> >>> raised UNICODE.CES.INVALID_ENCODING : unicode-ces-utf8.adb:258 
> >> Maybe it expects UTF-8, as most third party Ada libraries do. In that 
> >> case use: 
> >> 
> >> Character'Val (16#C2#) & Character'Val (16#B0#) 
> > 
> > That's the degree symbol, what I really need is the degree centigrade symbol which is U+2103. 
> > 
> > Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime. 
> > 
> > I'm sure it's easy enough, and when I get it, I'll be banging my head against the desk.
> Why do you use XMLAda? SVG is a text file, I would write directly. It is 
> the reverse, rendering SVG image, that is difficult to write from scratch. 
> 
> And why do you want to create SVG files?
> -- 
> Regards, 
> Dmitry A. Kazakov 
> http://www.dmitry-kazakov.de

I am using XML/Ada as I wish to do it "properly"; it's the way you learn.

As for SVG, I am graphing temperatures, humidity & pressure, and when you zoom in, it still looks sharp. The previous system, which I coded in C, used PNGs, which were screwed up when Google forced HiDPI settings on Chrome users. The SVGs will also contain code to highlight points etc.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 15:37         ` Simon Wright
@ 2021-06-21 19:49           ` 196...@googlemail.com
  2021-06-21 20:23             ` Dmitry A. Kazakov
                               ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-21 19:49 UTC (permalink / raw)


On Monday, 21 June 2021 at 16:37:21 UTC+1, Simon Wright wrote:
> "196...@googlemail.com" <196...@googlemail.com> writes: 
> 
> > That's the degree symbol, what I really need is the degree centigrade 
> > symbol which is U+2103. 
> > 
> > Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
> That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83

Yup, that works, but just how the heck do you get from U+2103 to those 3 octets?

I can see from http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=2103&mode=hex that it gives it.

Anyway, the dent in my desk is now a couple of mill deeper.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 18:33       ` Emmanuel Briot
@ 2021-06-21 20:06         ` 196...@googlemail.com
  2021-06-21 21:26         ` Simon Wright
  1 sibling, 0 replies; 28+ messages in thread
From: 196...@googlemail.com @ 2021-06-21 20:06 UTC (permalink / raw)


On Monday, 21 June 2021 at 19:33:58 UTC+1, briot.e...@gmail.com wrote:
> > A scan through XML/Ada shows that the only uses of Unicode_Char are in 
> > the SAX subset. I don't see any way in the DOM subset of XML/Ada of 
> > using them - someone please prove me wrong!
> Those two subsets are not independent, in fact the DOM subset is entirely based on the SAX one. 
> So anything that applies to SAX also applies to DOM. 
> 
> That said, the DOM standard (at the time I built XML/Ada, which was roughly 20 years ago) likely 
> did not have standard functions that receive unicode characters, only strings. 
> DOM implementations are free to use any internal representation they want, and I think they did not 
> have to accept any random encoding. XML/Ada is not user-friendly, it really is only a fairly low-level 
> implementation of the DOM standard. Using DOM without high-level things like XPath is a real 
> pain. At the time, someone else had done an XPath implementation, so I never took the time to 
> duplicate that effort. 
> 
> Conversion between various encodings (8bit, unicode utf-8, utf-16 or utf-32) is done via the 
> `unicode` module of XML/Ada, namely for instance `unicode-ces-utf8.ads`. They all provide a similar API. In this case 
> you want the `Encode` procedure. This is not a function (so doesn't return a Byte_Sequence directly) for efficiency 
> reasons, even if it would be convenient for end-users, admittedly. 
> 
> As someone rightly mentioned, it doesn't really make sense to use XML/Ada to build a tree in memory just for the 
> sake of printing it, though. Ada.Text_IO or streams will be much much more efficient. XML/Ada is only useful 
> to parse XML streams (in which case you never have to yourself encode a character to a byte sequence in 
> general).
> > > we need to convert it, then let us do so outside of it. 
> > That is *exactly* what you have to do (convert outside, not throw any 
> > old sequence of octets and 32-bit values somehow mashed together at 
> > it
> Well said Simon, thanks. Basically, the whole application should be utf-8 if you at all care about international 
> characters (if you don't, feel free to use latin-1, or any encoding your terminal supports). So conversion should not 
> occur just at the interface to XML/Ada, but only on input and output of your program. 
> XML/Ada just assumes a string is a sequence of bytes. The actual encoding has to be known by the application, 
> and be consistent. 
> If for some reason (Windows ?) you prefer utf-16 internally, you can change `sax-encodings.ads` and recompile. 
> (would have been neater to use generic traits packages, but I did not realize about them until a few years later). 
> 
> It would also have been nicer to use a string type that knows about the encoding. I wrote GNATCOLL.Strings for 
> that purpose several years later too. XML/Ada was never used extensively, so it was never a priority for AdaCore 
> to update it to use all these packages, at the risk of either breaking backward compatibility, or duplicating the 
> whole API to allow for the various string types. Not worth it. 
> 
> Emmanuel

Okay, now I think I am getting somewhere. A push and a prod is always welcome.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 19:40           ` 196...@googlemail.com
@ 2021-06-21 20:18             ` Dmitry A. Kazakov
  0 siblings, 0 replies; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-21 20:18 UTC (permalink / raw)


On 2021-06-21 21:40, 196...@googlemail.com wrote:

> I am using XML/Ada as I wish to do it "properly", it's the way you learn.

It is a huge overhead, and, honestly, there is nothing useful to learn 
about XML.

> As for SVG, I am graphing temps, humidity & pressure, and when you zoom in, it still looks sharp.

Why don't you render things directly? Rendering SVG files for this 
purpose is like scratching behind your ear with your foot.

> The previous system, which I coded in C, used PNGs, which were screwed up when Google forced HiDPI settings on Chrome users. The SVGs will also contain code to highlight points, etc.

Is it an HTTP server you are writing?

What is more, I would never write any files, but rather generate page 
content on the fly, embedding all images.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 19:49           ` 196...@googlemail.com
@ 2021-06-21 20:23             ` Dmitry A. Kazakov
  2021-06-21 20:47             ` Simon Wright
  2021-06-22  0:30             ` Spiros Bousbouras
  2 siblings, 0 replies; 28+ messages in thread
From: Dmitry A. Kazakov @ 2021-06-21 20:23 UTC (permalink / raw)


On 2021-06-21 21:49, 196...@googlemail.com wrote:
> On Monday, 21 June 2021 at 16:37:21 UTC+1, Simon Wright wrote:
>> "196...@googlemail.com" <196...@googlemail.com> writes:
>>
>>> That's the degree symbol, what I really need is the degree centigrade
>>> symbol which is U+2103.
>>>
>>> Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
>> That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83
> 
> Yup, that works, but just how the heck do you get from U+2103 to those 3 octets?

This is how UTF-8 encoding works. It is variable length. The larger the 
code point, the more octets you need.

    https://en.wikipedia.org/wiki/UTF-8

has a nice table explaining how the code point bits get distributed across 
the octets.
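
For U+2103 specifically, the three-octet row of that table (used for code points U+0800 to U+FFFF) can be worked through by hand. The following is only a minimal sketch in standard Ada; the procedure name Show_Utf8 and the constant names are made up for illustration, and it handles the three-octet case only.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Show_Utf8 is
   --  Code point to encode: U+2103 DEGREE CELSIUS.
   Code : constant Unsigned_32 := 16#2103#;

   --  Three-octet layout: 1110xxxx 10xxxxxx 10xxxxxx
   --  The 16 code point bits are split 4 + 6 + 6 across the octets.
   B1 : constant Unsigned_32 := 16#E0# or Shift_Right (Code, 12);
   B2 : constant Unsigned_32 :=
     16#80# or (Shift_Right (Code, 6) and 16#3F#);
   B3 : constant Unsigned_32 := 16#80# or (Code and 16#3F#);

   package Hex_IO is new Ada.Text_IO.Modular_IO (Unsigned_32);
begin
   --  B1 = 16#E2#, B2 = 16#84#, B3 = 16#83#
   Hex_IO.Put (B1, Width => 0, Base => 16);
   Ada.Text_IO.Put (' ');
   Hex_IO.Put (B2, Width => 0, Base => 16);
   Ada.Text_IO.Put (' ');
   Hex_IO.Put (B3, Width => 0, Base => 16);
   Ada.Text_IO.New_Line;
end Show_Utf8;
```

In binary, 16#2103# is 0010 000100 000011: the top 4 bits go into the first octet, and each group of 6 bits goes into one continuation octet. The resulting values match the 0xE2 0x84 0x83 quoted above.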

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 19:49           ` 196...@googlemail.com
  2021-06-21 20:23             ` Dmitry A. Kazakov
@ 2021-06-21 20:47             ` Simon Wright
  2021-06-22  0:30             ` Spiros Bousbouras
  2 siblings, 0 replies; 28+ messages in thread
From: Simon Wright @ 2021-06-21 20:47 UTC (permalink / raw)


"196...@googlemail.com" <1963bib@googlemail.com> writes:

> On Monday, 21 June 2021 at 16:37:21 UTC+1, Simon Wright wrote:
>> "196...@googlemail.com" <196...@googlemail.com> writes: 
>> 
>> > That's the degree symbol, what I really need is the degree centigrade 
>> > symbol which is U+2103. 
>> > 
>> > Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
>> That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83
>
> Yup, that works, but just how the heck do you get from U+2103 to those
> 3 octets?
>
> I can see from
> http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=2103&mode=hex that it
> gives it.

Google was my friend (not that site, tho)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 15:26     ` Simon Wright
  2021-06-21 18:33       ` Emmanuel Briot
@ 2021-06-21 21:22       ` Simon Wright
  1 sibling, 0 replies; 28+ messages in thread
From: Simon Wright @ 2021-06-21 21:22 UTC (permalink / raw)


Simon Wright <simon@pushface.org> writes:

> A scan through XML/Ada shows that the only uses of Unicode_Char are in
> the SAX subset. I don't see any way in the DOM subset of XML/Ada of
> using them - someone please prove me wrong!

I missed Unicode itself.

   function To_Utf8 (U : Unicode.Unicode_Char)
                    return Unicode.CES.Byte_Sequence
   is
      Bytes : Unicode.CES.Byte_Sequence (1 .. 8);
      Index : Natural := 0; -- "previously written" position
   begin
      Unicode.CES.Utf8.Encode (U,
                               Output => Bytes,
                               Index => Index);
      return Bytes (1 .. Index);
   end To_Utf8;
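
A possible way to use this, sketched under the assumption that XML/Ada is available to the build and that Unicode.CES.Byte_Sequence is a subtype of String (which is how the thread treats it): the result can be concatenated with ordinary text and passed wherever a String is expected, for example as the Min_Max_Str argument of Add_Min_Max from the first message. The procedure name Degree_Demo and the sample reading "21.5 " are invented for illustration.

```ada
with Ada.Text_IO;
with Unicode.CES;
with Unicode.CES.Utf8;

procedure Degree_Demo is

   --  Same function as above, repeated so the example is complete.
   function To_Utf8 (U : Unicode.Unicode_Char)
                    return Unicode.CES.Byte_Sequence
   is
      Bytes : Unicode.CES.Byte_Sequence (1 .. 8);
      Index : Natural := 0; -- "previously written" position
   begin
      Unicode.CES.Utf8.Encode (U,
                               Output => Bytes,
                               Index => Index);
      return Bytes (1 .. Index);
   end To_Utf8;

   --  U+2103 DEGREE CELSIUS, appended to a made-up reading.
   Label : constant String := "21.5 " & To_Utf8 (16#2103#);

begin
   --  Label now holds UTF-8 octets; the same string could be
   --  handed to DOM.Core.Documents.Create_Text_Node.
   Ada.Text_IO.Put_Line (Label);
end Degree_Demo;
```

Whether the symbol displays correctly on the console depends on the terminal expecting UTF-8; the octets in Label are the same either way.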

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 18:33       ` Emmanuel Briot
  2021-06-21 20:06         ` 196...@googlemail.com
@ 2021-06-21 21:26         ` Simon Wright
  2021-06-22  6:52           ` Emmanuel Briot
  1 sibling, 1 reply; 28+ messages in thread
From: Simon Wright @ 2021-06-21 21:26 UTC (permalink / raw)


Emmanuel Briot <briot.emmanuel@gmail.com> writes:

> At the time, someone else had done an XPath implementation, so I never
> took the time to duplicate that effort.

e.g. Marc Criley, source now at https://github.com/simonjwright/xia

I'm working on a more useful README (the doc/ folder was never
wonderful, but relied on AdaBrowse, which relies on ASIS, which ...)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 19:49           ` 196...@googlemail.com
  2021-06-21 20:23             ` Dmitry A. Kazakov
  2021-06-21 20:47             ` Simon Wright
@ 2021-06-22  0:30             ` Spiros Bousbouras
  2 siblings, 0 replies; 28+ messages in thread
From: Spiros Bousbouras @ 2021-06-22  0:30 UTC (permalink / raw)


On Mon, 21 Jun 2021 12:49:01 -0700 (PDT)
"196...@googlemail.com" <1963bib@googlemail.com> wrote:
> On Monday, 21 June 2021 at 16:37:21 UTC+1, Simon Wright wrote:
> > "196...@googlemail.com" <196...@googlemail.com> writes: 
> > 
> > > That's the degree symbol, what I really need is the degree centigrade 
> > > symbol which is U+2103. 
> > > 
> > > Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
> > That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83
> 
> Yup, that works, but just how the heck do you get from U+2103 to those 3 octets?

If you don't mind downloading and compiling a small C programme you can use my
own http://vlaho.ninja/prog/#literal . If you name the executable  lit  then
    lit -h u2103h
gives
    E2 84 83
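
An Ada-only way to get the same answer, for comparison, is the Ada 2012 standard package Ada.Strings.UTF_Encoding.Wide_Wide_Strings. This is only a minimal sketch; the procedure name Utf8_Octets and the Hex lookup string are made up for illustration.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Utf8_Octets is
   --  Encode a one-character Wide_Wide_String holding U+2103.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
       ((1 => Wide_Wide_Character'Val (16#2103#)));

   --  Lookup table for printing each octet as two hex digits.
   Hex : constant String := "0123456789ABCDEF";
begin
   for C of Encoded loop
      Ada.Text_IO.Put (Hex (Character'Pos (C) / 16 + 1));
      Ada.Text_IO.Put (Hex (Character'Pos (C) mod 16 + 1));
      Ada.Text_IO.Put (' ');
   end loop;
   Ada.Text_IO.New_Line;
end Utf8_Octets;
```

This prints E2 84 83, and UTF_8_String is a subtype of String, so the encoded value can be used directly in string concatenation.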

> I can see from http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=2103&mode=hex that it gives it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: XMLAda & unicode symbols
  2021-06-21 21:26         ` Simon Wright
@ 2021-06-22  6:52           ` Emmanuel Briot
  0 siblings, 0 replies; 28+ messages in thread
From: Emmanuel Briot @ 2021-06-22  6:52 UTC (permalink / raw)


> e.g. Marc Criley, source now at https://github.com/simonjwright/xia 

That was indeed Marc ! As I remember, he was trying to sell support for this package, too, so it
would have been undermining his effort.

> I'm working on a more useful README (the doc/ folder was never 
> wonderful, but relied on AdaBrowse, which relies on ASIS, which ...)

All in good hands now ! :-)

I had also missed (and forgotten) about the `Unicode.To_UTF8` function.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-06-22  6:52 UTC | newest]

Thread overview: 28+ messages
2021-06-19 18:28 XMLAda & unicode symbols 196...@googlemail.com
2021-06-19 19:53 ` Jeffrey R. Carter
2021-06-20 17:02   ` 196...@googlemail.com
2021-06-20 17:23     ` Dmitry A. Kazakov
2021-06-20 17:58       ` 196...@googlemail.com
2021-06-20 18:16         ` Dmitry A. Kazakov
2021-06-21 19:40           ` 196...@googlemail.com
2021-06-21 20:18             ` Dmitry A. Kazakov
2021-06-21 15:37         ` Simon Wright
2021-06-21 19:49           ` 196...@googlemail.com
2021-06-21 20:23             ` Dmitry A. Kazakov
2021-06-21 20:47             ` Simon Wright
2021-06-22  0:30             ` Spiros Bousbouras
2021-06-20 18:21     ` Jeffrey R. Carter
2021-06-20 18:47       ` Dmitry A. Kazakov
2021-06-20 22:50         ` Jeffrey R. Carter
2021-06-21  4:16           ` Marius Amado-Alves
2021-06-21  9:39             ` Jeffrey R. Carter
2021-06-21  6:14           ` Dmitry A. Kazakov
2021-06-19 21:24 ` Simon Wright
2021-06-20 17:10   ` 196...@googlemail.com
2021-06-21 15:26     ` Simon Wright
2021-06-21 18:33       ` Emmanuel Briot
2021-06-21 20:06         ` 196...@googlemail.com
2021-06-21 21:26         ` Simon Wright
2021-06-22  6:52           ` Emmanuel Briot
2021-06-21 21:22       ` Simon Wright
2021-06-21  6:07 ` Vadim Godunko
