From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 2 Nov 2021 19:17:58 +0100
Message-ID: <slrvcr$1inu$1@gioia.aioe.org>
In-Reply-To: d1c5ba75-bc0a-4e7b-a2df-394bc710cbcen@googlegroups.com
On 2021-11-02 18:42, Marius Amado-Alves wrote:
> So it should be possible to read a single UTF-8 character, right? Which might be 1, 2, 3, or 4 bytes long, so it must be read into a String, right? Or directly to Wide_Wide. Are there such functions?
You simply read a stream of Characters into a buffer. Never ever use
Wide or Wide_Wide; they are useless. A UTF-8 character occupies up to 4
Characters, so the buffer must hold at least 4 Characters ahead of the
cursor unless the end of the file has been reached. Usually you read up
to some separator, such as a line end.
Then you call this:
http://www.dmitry-kazakov.de/ada/strings_edit.htm#Strings_Edit.UTF8.Get
That will give you a code point and advance the cursor to the next UTF-8
character.
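
To make the read-then-decode loop concrete, here is a minimal sketch in Python rather than Ada, so it is only an illustration and not the Strings_Edit implementation. The names get and code_points are hypothetical; get merely mimics the described behaviour (return a code point, advance the cursor). It assumes well-formed input and, for brevity, does not reject overlong 3- and 4-byte forms or surrogates. Reading line by line keeps every character whole, because the line-end octet never occurs inside a multi-octet UTF-8 sequence.

```python
import io


def get(source: bytes, pointer: int) -> tuple[int, int]:
    """Decode one UTF-8 character starting at source[pointer].

    Returns (code_point, next_pointer). Hypothetical helper; simplified
    validation only (see the assumptions stated above).
    """
    lead = source[pointer]
    if lead < 0x80:                      # 1 octet: plain ASCII
        return lead, pointer + 1
    if 0xC2 <= lead <= 0xDF:             # 2 octets
        length, value = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:           # 3 octets
        length, value = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF4:           # 4 octets
        length, value = 4, lead & 0x07
    else:
        raise ValueError(f"invalid lead octet at {pointer}")
    if pointer + length > len(source):
        raise ValueError(f"truncated character at {pointer}")
    for octet in source[pointer + 1:pointer + length]:
        if octet & 0xC0 != 0x80:         # must be a continuation octet
            raise ValueError(f"invalid continuation octet after {pointer}")
        value = (value << 6) | (octet & 0x3F)
    return value, pointer + length


def code_points(stream):
    """Yield code points from a binary stream, one line-sized buffer at a time."""
    for line in stream:                  # binary streams split on b"\n"
        pointer = 0
        while pointer < len(line):
            code_point, pointer = get(line, pointer)
            yield code_point


# Usage: any binary stream works, e.g. open(path, "rb") for a long file.
sample = io.BytesIO(b"a\xc3\xa9\xe2\x82\xac\xf0\x9f\x98\x80\n")
decoded = list(code_points(sample))
```

Only one line is held in memory at a time, which is the point of reading up to a separator.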
However, normally no text processing task needs that. Whatever you want
to do, you can accomplish it using normal String operations and normal
String-based data structures like maps and tables. You never need to
care about UTF-8 character boundaries.
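
A small sketch of why that holds, again in Python only as an illustration. The encoding of one UTF-8 character never occurs inside the encoding of another, so a search for an encoded substring can only match at a character boundary, and encoded words can serve directly as map keys. The helpers find_all and word_counts are hypothetical names, and Python's bytes stands in here for an Ada String holding UTF-8 octets.

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return the octet offsets of every occurrence of needle in haystack."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def word_counts(line: bytes) -> dict[bytes, int]:
    """Count whitespace-separated words, using the raw octets as map keys."""
    counts: dict[bytes, int] = {}
    for word in line.split():            # splits on ASCII whitespace only
        counts[word] = counts.get(word, 0) + 1
    return counts


# "caf\u00e9" contains a 2-octet character; the middle word is three
# 3-octet characters. Nothing below ever decodes a code point.
line = "caf\u00e9 \u65e5\u672c\u8a9e caf\u00e9".encode("utf-8")
needle = "caf\u00e9".encode("utf-8")
hits = find_all(line, needle)
counts = word_counts(line)
```

Searching, splitting, and keying all operate on octets here; decoding is needed only if you actually want the numeric code points.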
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de