From: Marius Amado-Alves
Newsgroups: comp.lang.ada
Date: Tue, 16 Nov 2021 03:55:05 -0800 (PST)
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
References: <1c6b151b-f017-496d-b381-ba08bef1bbb7n@googlegroups.com>

I'm worried. I need the concept of a character for proper text processing. For example, I want to reference characters in a text file by their position.

Any tips or references on how to deal with combining characters, or any other troublesome feature of Unicode, would be greatly appreciated.

(For me, a combining character is not a character; the combination is. Unicode agrees, right?)
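Unicode's name for "the combination" is a user-perceived character, formalized as an extended grapheme cluster in UAX #29 (Text Segmentation). A string indexed by code point, such as an Ada Wide_Wide_String, therefore gives positions that can split an accent from its base letter. The sketch below is in Python rather than Ada and is only an approximation under a stated assumption: a cluster is one non-mark code point followed by any code points whose general category is a mark (Mn, Mc, Me). It does not implement the full UAX #29 rules, so emoji ZWJ sequences, Hangul jamo and regional-indicator flags are not handled. The helper names `approx_graphemes` and `char_at` are hypothetical.

```python
import unicodedata


def approx_graphemes(text):
    """Split text into approximate user-perceived characters.

    Assumption: a cluster is one non-mark code point plus any following
    code points whose general category is a mark (Mn, Mc, Me). This is a
    simplification, not the full UAX #29 segmentation algorithm.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster (or a lone leading mark)
    return clusters


def char_at(text, position):
    """Return the approximate user-perceived character at a 0-based position."""
    return approx_graphemes(text)[position]


if __name__ == "__main__":
    s = "cafe\u0301 na\u0303o"  # 'café não' spelled with combining accents
    print(len(s))  # 10 code points
    print(len(approx_graphemes(s)))  # 8 user-perceived characters
    # NFC composes these pairs into single code points (U+00E9, U+00E3)
    print(unicodedata.normalize("NFC", s) == "caf\u00e9 n\u00e3o")  # True
```

Normalizing to NFC first shrinks the problem but does not remove it: many base-plus-mark combinations have no precomposed code point, so positions that must match what a reader sees still need cluster-based counting.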