From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on ip-172-31-74-118.ec2.internal X-Spam-Level: X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.6 Path: eternal-september.org!reader02.eternal-september.org!aioe.org!Lx7EM+81f32E0bqku+QpCA.user.46.165.242.75.POSTED!not-for-mail From: "Luke A. Guest" Newsgroups: comp.lang.ada Subject: Re: How to read in a (long) UTF-8 file, incrementally? Date: Tue, 16 Nov 2021 15:25:10 +0000 Organization: Aioe.org NNTP Server Message-ID: References: <1c6b151b-f017-496d-b381-ba08bef1bbb7n@googlegroups.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: gioia.aioe.org; logging-data="8162"; posting-host="Lx7EM+81f32E0bqku+QpCA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org"; User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 X-Notice: Filtered by postfilter v. 0.9.2 Content-Language: en-GB Xref: reader02.eternal-september.org comp.lang.ada:63127 List-Id: On 16/11/2021 11:55, Marius Amado-Alves wrote: > I'm worried. I need the concept of character, for proper text processing. For example, I want to reference characters in a text file by their position. Any tips/references on how to deal with combining characters, or any other perturbating feature of Unicode, greatly appreciated. > > (For me, a combining character is not a character, the combination is. Unicode agrees, right?) > You can't. The concept of character is dead, the new concept are grapheme clusters.