From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: XMLAda & unicode symbols
Date: Mon, 21 Jun 2021 22:23:09 +0200

On 2021-06-21 21:49, 196...@googlemail.com wrote:
> On Monday, 21 June 2021 at 16:37:21 UTC+1, Simon Wright wrote:
>> "196...@googlemail.com" <196...@googlemail.com> writes:
>>
>>> That's the degree symbol, what I really need is the degree centigrade
>>> symbol which is U+2103.
>>>
>>> Having Character'Val (16#21#) & Character'Val (16#03#) fails at runtime.
>> That's because the utf-8 encoding is 3 octets, 0xE2 0x84 0x83
>
> Yup, that works, but just how the heck do you get from U+2103 to those 3 octets?

This is how UTF-8 encoding works. It is variable-length: the larger the
code point, the more octets you need.

https://en.wikipedia.org/wiki/UTF-8 has a nice table explaining how the
code point bits get distributed across the octets.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
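
As a worked illustration of the question in this post (how U+2103 becomes 0xE2 0x84 0x83), here is a minimal sketch of the UTF-8 bit distribution. It is written in Python only so it can be run and checked quickly; the function name `utf8_octets` is hypothetical and not part of XMLAda or any Ada library. The same shifts and masks carry over to any language.

```python
def utf8_octets(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (illustrative sketch)."""
    # Reject values outside the Unicode range and the UTF-16 surrogate block.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")

    if code_point < 0x80:
        # 1 octet:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+2103 is 0010 0001 0000 0011 in binary; split 4 + 6 + 6 bits:
#   0010 | 000100 | 000011  ->  1110_0010  10_000100  10_000011
print(utf8_octets(0x2103).hex(" "))
```

U+2103 falls in the three-octet range (U+0800 to U+FFFF), which is why two `Character'Val` calls with 16#21# and 16#03# cannot work. In Ada terms, the three octets quoted earlier in the thread would presumably be written as Character'Val (16#E2#) & Character'Val (16#84#) & Character'Val (16#83#).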