From: Simon Wright
Newsgroups: comp.lang.ada
Subject: Re: XMLAda & unicode symbols
Date: Mon, 21 Jun 2021 16:26:01 +0100

"196...@googlemail.com" <1963bib@googlemail.com> writes:

> Asking for the degree sign, was probably a slight mistake. There is
> Degree_Celsius and also Degree_Fahrenheit for those who have not yet
> embraced metric. These are the "correct" symbols.

You might equally have meant angular degrees.

> Both of these exist in Unicode.Names.Letterlike_Symbols, and probably
> elsewhere, but trying to shoehorn these in seems impossible.

A scan through XML/Ada shows that the only uses of Unicode_Char are in
the SAX subset. I don't see any way in the DOM subset of XML/Ada of
using them - someone please prove me wrong!

You could build a Unicode_Char to UTF_8_String converter using
Ada.Strings.UTF_Encoding.Wide_Wide_Strings, ARM A.4.11(30)
http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-A-4-11.html#p30

> I just wish XMLAda could just accept whatever we throw at it, and if
> we need to convert it, then let us do so outside of it.

That is *exactly* what you have to do (convert outside, not throw any
old sequence of octets and 32-bit values somehow mashed together at
it).
It wants a UTF-8-encoded string (though XML/Ada doesn't seem to say so -
RFC 3076 implies it, RFC 7303 (8.1) recommends it).

OK, Text_IO might not prove the point to you, but what about this?

   with Ada.Characters.Latin_1;
   with DOM.Core.Documents;
   with DOM.Core.Elements;
   with DOM.Core.Nodes;
   with DOM.Core;
   with Unicode.CES;
   with Unicode.Encodings;
   procedure Utf is
      Impl : DOM.Core.DOM_Implementation;
      Doc : DOM.Core.Document;
      Dummy, Element : DOM.Core.Node;
      --  An ordinary Ada String, i.e. one octet per character.
      --  Degree_Sign is 16#B0# in both ISO 8859-1 and ISO 8859-15.
      Fifty_Degrees_Latin1 : constant String
        := "50" & Ada.Characters.Latin_1.Degree_Sign;
      --  The same text re-encoded as UTF-8, done outside the DOM calls.
      Fifty_Degrees_UTF8 : constant Unicode.CES.Byte_Sequence
        := Unicode.Encodings.Convert
          (Fifty_Degrees_Latin1,
           From => Unicode.Encodings.Get_By_Name ("iso-8859-15"),
           To => Unicode.Encodings.Get_By_Name ("utf-8"));
   begin
      Doc := DOM.Core.Create_Document (Impl);
      Element := DOM.Core.Documents.Create_Element (Doc, "utf");
      --  The attribute value is handed over already UTF-8-encoded.
      DOM.Core.Elements.Set_Attribute (Element, "temp", Fifty_Degrees_UTF8);
      Dummy := DOM.Core.Nodes.Append_Child (Doc, Element);
      DOM.Core.Nodes.Print (Doc);
   end Utf;
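
As for the Unicode_Char to UTF_8_String converter suggested earlier,
something along these lines might do as a starting point. It is only a
sketch and uses nothing beyond the standard library: the names
Code_Point_To_UTF8 and To_UTF_8 are made up for illustration, and it
assumes the code point arrives as a plain integer, so a Unicode_Char
value from XML/Ada would have to be converted to one first. 16#2103# is
the Unicode code point for DEGREE CELSIUS.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Code_Point_To_UTF8 is

   --  Hypothetical helper, not part of XML/Ada: encode a single
   --  code point as a UTF-8 string via ARM A.4.11.
   function To_UTF_8
     (Code_Point : Natural)
      return Ada.Strings.UTF_Encoding.UTF_8_String
   is
      --  A one-character Wide_Wide_String holding the code point.
      Item : constant Wide_Wide_String (1 .. 1) :=
        (1 => Wide_Wide_Character'Val (Code_Point));
   begin
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Item);
   end To_UTF_8;

   --  U+2103 DEGREE CELSIUS, written as a plain number here.
   Degree_Celsius : constant := 16#2103#;

begin
   --  Text_IO passes the octets through unchanged, so a UTF-8
   --  terminal is needed to see the symbol.
   Ada.Text_IO.Put_Line ("50" & To_UTF_8 (Degree_Celsius));
end Code_Point_To_UTF8;
```

The result of To_UTF_8 is an ordinary String holding UTF-8 octets, so it
could be concatenated with other UTF-8 text and passed to Set_Attribute
in the same way as Fifty_Degrees_UTF8 above.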