From: Marius Amado-Alves
Newsgroups: comp.lang.ada
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 16 Nov 2021 05:52:59 -0800 (PST)

> Simply ignore or reject decomposed characters.

Brilliant!

> 1. Fixed font representation. Reduce everything to normal glyphs, use
> string position corresponding to the beginning of an UTF-8 sequence.

I am indeed resorting to the byte position in the UTF-8 file as the character position, treating UTF-8 entities as the strings that they are :-)

(I am not dealing with fonts or graphics yet, just plain text.)

Thanks a lot.
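A minimal sketch of the byte-position approach, written in Python only for illustration: the file is read in fixed-size chunks, and each character is reported together with the byte offset where its UTF-8 sequence begins. The function names and the default chunk size are hypothetical. A sequence split across a chunk boundary is carried over to the next read, and malformed input raises an error rather than being silently skipped.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def utf8_seq_len(lead: int) -> int:
    """Length (1-4) of a UTF-8 sequence, derived from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # Continuation bytes (0x80-0xBF) and never-valid bytes cannot start a sequence.
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def iter_chars_with_offsets(
    stream: BinaryIO, chunk_size: int = 65536
) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, character) pairs, reading `stream` incrementally.

    Hypothetical helper: the offset is the position of the sequence's first
    byte in the file, used here as the "character position".
    """
    pending = b""  # incomplete sequence left over from the previous chunk
    base = 0       # absolute file offset of pending[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = pending + chunk
        i = 0
        while i < len(buf):
            n = utf8_seq_len(buf[i])
            if i + n > len(buf):
                break  # the sequence continues in the next chunk
            # decode() also validates the continuation bytes.
            yield base + i, buf[i:i + n].decode("utf-8")
            i += n
        pending = buf[i:]
        base += i
    if pending:
        raise ValueError(f"truncated UTF-8 sequence at byte {base}")
```

For example, `list(iter_chars_with_offsets(io.BytesIO("aé€😀".encode("utf-8")), chunk_size=2))` gives `[(0, 'a'), (1, 'é'), (3, '€'), (6, '😀')]`, even though every multi-byte sequence straddles a chunk boundary. Decomposed characters come out as separate code points (a base letter followed by a combining mark), so rejecting or ignoring them can be done as a check on each yielded character.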