comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 16 Nov 2021 13:36:00 +0100
Message-ID: <sn08jf$pkq$1@gioia.aioe.org>
In-Reply-To: f0d17e38-58c7-4914-ab9c-8632cecc8215n@googlegroups.com

On 2021-11-16 12:55, Marius Amado-Alves wrote:
> I'm worried. I need the concept of character, for proper text processing.

Simply ignore or reject decomposed characters.
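
A minimal Python sketch of the "reject" option, assuming that a
"decomposed character" can be detected by looking for code points in a
Unicode mark category (Mn, Mc, Me). The function names are hypothetical,
and the filter is deliberately coarse: it also flags marks that some
scripts legitimately need.

```python
import unicodedata


def find_marks(text: str) -> list:
    """Return the indices of code points whose general category is a mark."""
    return [i for i, ch in enumerate(text)
            if unicodedata.category(ch).startswith("M")]


def reject_decomposed(text: str) -> str:
    """Return text unchanged, or raise ValueError if it contains a mark."""
    marks = find_marks(text)
    if marks:
        raise ValueError(f"combining mark at code point index {marks[0]}")
    return text
```

With this, `reject_decomposed("\u00C9")` passes the precomposed É through,
while `reject_decomposed("E\u0301")` raises because of the combining acute
accent at index 1.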

> For example, I want to reference characters in a text file by their position.

That is no problem either. There are two alternatives:

1. Fixed-width font. Reduce everything to normal glyphs and use the
string position at which each UTF-8 sequence begins.

2. Proportional font. Use a GUI toolkit such as GTK. The GTK
text buffer has a data type (iterator) to indicate a place in the
buffer, e.g. when a selection happens. These iterators are consistent
with the glyphs as rendered on the screen, and you can convert between
them and string positions.
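
The first alternative can be sketched as follows, in Python for brevity.
This assumes the input is valid UTF-8; the function names are
hypothetical. A byte starts a UTF-8 sequence exactly when it is not a
continuation byte (bit pattern 10xxxxxx), so the list of such offsets
maps a character index to a string position.

```python
def sequence_starts(data: bytes) -> list:
    """Byte offsets at which a UTF-8 sequence begins.

    Continuation bytes look like 0b10xxxxxx; any other byte starts a sequence.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def code_point_at(data: bytes, starts: list, n: int) -> str:
    """Decode the n-th code point (0-based) using the precomputed offsets."""
    begin = starts[n]
    end = starts[n + 1] if n + 1 < len(starts) else len(data)
    return data[begin:end].decode("utf-8")


data = "a\u00C9z".encode("utf-8")   # 'a', precomposed É (2 bytes), 'z'
starts = sequence_starts(data)      # [0, 1, 3]
```

Here `code_point_at(data, starts, 1)` returns É, which occupies bytes 1
and 2. For a long file the offsets could be collected incrementally while
reading, since the test looks at one byte at a time.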

> (For me, a combining character is not a character, the combination is. Unicode agrees, right?)

No, Unicode disagrees, e.g. É can be encoded as a single code point
(U+00C9) or composed from E (U+0045) and a combining acute accent
(U+0301). But I would advise simply ignoring all this nonsense and
assuming:

    code point = character
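
To illustrate the two encodings of É, here is a small Python sketch. It
assumes that composing the text with NFC normalization is an acceptable
way to "ignore" the decomposed form. That is an assumption added here,
not something stated in this thread. The helper `code_points` is
hypothetical.

```python
import unicodedata

precomposed = "\u00C9"   # É as one code point
decomposed = "E\u0301"   # E followed by COMBINING ACUTE ACCENT


def code_points(text: str) -> list:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# NFC composes E + U+0301 into the single code point U+00C9.
composed = unicodedata.normalize("NFC", decomposed)
```

After normalization the rule "code point = character" holds for this
example: `code_points(composed)` has one entry, whereas
`code_points(decomposed)` has two.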

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
