From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Thu, 4 Nov 2021 13:13:12 +0100
Message-ID: <sm0ion$1m0r$1@gioia.aioe.org>
In-Reply-To: c1973b0d-7f3e-487f-8766-586b2d8c69edn@googlegroups.com
On 2021-11-04 12:43, Marius Amado-Alves wrote:
> Great libraries, thanks.
>
> It still seems to me that Wide_Wide_Character is useful. It allows the character to be represented directly in the source code, e.g.
>
> if C = '±' then ...

If the source supports Unicode, it should support UTF-8 as well. So you
would write

   if C = "±" then ...

where C is a String.

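
A minimal sketch of that comparison, assuming C holds the UTF-8 encoding of a single character. The octets are spelled out here instead of being typed as a literal, because how a compiler reads "±" in the source depends on its source-encoding settings. The names Compare_UTF8, Plus_Minus and Is_Plus_Minus are made up for the illustration.

```ada
with Ada.Text_IO;

procedure Compare_UTF8 is
   --  U+00B1 PLUS-MINUS SIGN is the two octets C2 B1 in UTF-8.
   Plus_Minus : constant String :=
     Character'Val (16#C2#) & Character'Val (16#B1#);

   --  C is a String holding one UTF-8 encoded character,
   --  not a Character or Wide_Wide_Character.
   function Is_Plus_Minus (C : String) return Boolean is
   begin
      return C = Plus_Minus;
   end Is_Plus_Minus;
begin
   if Is_Plus_Minus (Plus_Minus) then
      Ada.Text_IO.Put_Line ("match");
   end if;
end Compare_UTF8;
```

The comparison is plain String equality on octets, so it needs no decoding step at all.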
> And Wide_Wide_Character'Pos should give the codepoint.

Yes, but you do not need Wide_Wide_Character to get an integer value, and
if your objective is Unicode categorization, that is too complicated for
manual comparisons. Use a library function (generated from
UnicodeData.txt) instead.
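
To illustrate the first point, here is a rough sketch of getting the code point as an integer straight from a UTF-8 String, with no Wide_Wide types involved. It is an assumption-laden illustration, not any particular library's API: the names Code_Point_Demo, Code_Point and Get are hypothetical, and the decoder does not check continuation octets, overlong forms or surrogates.

```ada
with Ada.Text_IO;

procedure Code_Point_Demo is
   --  Hypothetical integer type for Unicode code points.
   subtype Code_Point is Natural range 0 .. 16#10FFFF#;

   --  Decode the UTF-8 sequence starting at S (Pointer) and advance
   --  Pointer past it. Sketch only: malformed input is not rejected,
   --  apart from an invalid lead octet or running off the string.
   procedure Get
     (S       : String;
      Pointer : in out Positive;
      Value   : out Code_Point)
   is
      Lead  : constant Natural := Character'Pos (S (Pointer));
      Extra : Natural;  --  number of continuation octets
      Acc   : Natural;  --  accumulated code point bits
   begin
      if Lead < 16#80# then
         Acc := Lead;
         Extra := 0;
      elsif Lead in 16#C0# .. 16#DF# then
         Acc := Lead mod 16#20#;
         Extra := 1;
      elsif Lead in 16#E0# .. 16#EF# then
         Acc := Lead mod 16#10#;
         Extra := 2;
      elsif Lead in 16#F0# .. 16#F7# then
         Acc := Lead mod 16#08#;
         Extra := 3;
      else
         raise Constraint_Error with "not a UTF-8 lead octet";
      end if;

      --  Each continuation octet contributes its low six bits.
      for I in 1 .. Extra loop
         Acc := Acc * 64 + Character'Pos (S (Pointer + I)) mod 64;
      end loop;

      Value   := Acc;
      Pointer := Pointer + Extra + 1;
   end Get;

   --  "±" (U+00B1) as UTF-8 octets.
   S : constant String :=
     Character'Val (16#C2#) & Character'Val (16#B1#);
   P : Positive := S'First;
   V : Code_Point;
begin
   Get (S, P, V);  --  V is now 16#B1#, P points past the sequence
   Ada.Text_IO.Put_Line (Code_Point'Image (V));
end Code_Point_Demo;
```

The resulting integer is what you would hand to a categorization function; the category tables themselves are better taken from a library than written by hand.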
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Thread overview: 15+ messages
2021-11-02 17:42 How to read in a (long) UTF-8 file, incrementally? Marius Amado-Alves
2021-11-02 18:17 ` Dmitry A. Kazakov
2021-11-03 7:43 ` Vadim Godunko
2021-11-03 8:48 ` Luke A. Guest
2021-11-04 11:43 ` Marius Amado-Alves
2021-11-04 12:13 ` Dmitry A. Kazakov [this message]
2021-11-04 14:30 ` Luke A. Guest
2021-11-05 10:56 ` Marius Amado-Alves
2021-11-05 19:55 ` Simon Wright
2021-11-16 11:55 ` Marius Amado-Alves
2021-11-16 12:36 ` Dmitry A. Kazakov
2021-11-16 13:52 ` Marius Amado-Alves
2021-11-16 20:23 ` Randy Brukardt
2021-11-16 15:25 ` Luke A. Guest
2021-11-16 17:38 ` Vadim Godunko