From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Tue, 2 Nov 2021 19:17:58 +0100
Organization: Aioe.org NNTP Server

On 2021-11-02 18:42, Marius Amado-Alves wrote:

> So it should be possible to read a single UTF-8 character, right?
> Which might be 1, 2, 3, or 4 bytes long, so it must be read into a
> String, right? Or directly to Wide_Wide. Are there such functions?

You simply read a stream of Characters into a buffer. Never ever use
Wide or Wide_Wide; they are useless.

Inside the buffer you must have 4 Characters ahead (the longest UTF-8
encoding of one code point) unless the file end is reached. Usually you
read until some separator like line end. Then you call this:

   http://www.dmitry-kazakov.de/ada/strings_edit.htm#Strings_Edit.UTF8.Get

That will give you a code point and advance the cursor to the next
UTF-8 character.

However, normally, no text processing task needs that. Whatever you
want to do, you can accomplish it using normal String operations and
normal String-based data structures like maps and tables. You need
never care about UTF-8 character boundaries.
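
The buffering scheme described above could be sketched roughly as
follows, using only the standard library. This is an illustration
under stated assumptions, not the Strings_Edit implementation: the
local Get is a hand-written stand-in that only mirrors the shape of
Strings_Edit.UTF8.Get (source, cursor, code point) and does not
validate continuation bytes. Count_Code_Points, Refill, and the
4096-Character buffer size are names and values chosen for the
example.

```ada
with Ada.Command_Line;
with Ada.Streams;            use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Count_Code_Points is
   package SIO renames Ada.Streams.Stream_IO;

   subtype Code_Point is Natural range 0 .. 16#10FFFF#;

   File      : SIO.File_Type;
   Buffer    : String (1 .. 4096);  --  raw UTF-8 bytes as Characters
   Last      : Natural  := 0;       --  last valid index in Buffer
   Cursor    : Positive := 1;       --  next Character not yet decoded
   Total     : Natural  := 0;
   Non_ASCII : Natural  := 0;
   Code      : Code_Point;

   --  Hypothetical stand-in for Strings_Edit.UTF8.Get: decodes the code
   --  point starting at Source (Pointer) and advances Pointer past it.
   --  Continuation bytes are NOT validated in this sketch.
   procedure Get
     (Source  : String;
      Pointer : in out Integer;
      Value   : out Code_Point)
   is
      Lead   : constant Natural := Character'Pos (Source (Pointer));
      Length : Positive;
      Result : Natural;
   begin
      if Lead < 16#80# then
         Length := 1;  Result := Lead;
      elsif Lead in 16#C2# .. 16#DF# then
         Length := 2;  Result := Lead mod 16#20#;
      elsif Lead in 16#E0# .. 16#EF# then
         Length := 3;  Result := Lead mod 16#10#;
      elsif Lead in 16#F0# .. 16#F4# then
         Length := 4;  Result := Lead mod 16#08#;
      else
         raise Constraint_Error with "invalid UTF-8 lead byte";
      end if;
      --  Each continuation byte contributes its low 6 bits.  Indexing
      --  past Source'Last (truncated input) raises Constraint_Error.
      for Offset in 1 .. Length - 1 loop
         Result :=
           Result * 64 + Character'Pos (Source (Pointer + Offset)) mod 64;
      end loop;
      Value   := Result;
      Pointer := Pointer + Length;
   end Get;

   --  Slide the undecoded tail to the front and top the buffer up.
   procedure Refill is
      Rest  : constant Natural := Last - Cursor + 1;
      Chunk : Stream_Element_Array
                (1 .. Stream_Element_Offset (Buffer'Last - Rest));
      Got   : Stream_Element_Offset;
   begin
      Buffer (1 .. Rest) := Buffer (Cursor .. Last);
      SIO.Read (File, Chunk, Got);
      for I in 1 .. Got loop
         Buffer (Rest + Integer (I)) := Character'Val (Chunk (I));
      end loop;
      Cursor := 1;
      Last   := Rest + Natural (Got);
   end Refill;

begin
   SIO.Open (File, SIO.In_File, Ada.Command_Line.Argument (1));
   loop
      --  Keep 4 Characters ahead unless the file end is reached.
      if Last - Cursor + 1 < 4 and then not SIO.End_Of_File (File) then
         Refill;
      end if;
      exit when Cursor > Last;
      Get (Buffer (1 .. Last), Cursor, Code);
      Total := Total + 1;
      if Code > 16#7F# then
         Non_ASCII := Non_ASCII + 1;
      end if;
   end loop;
   SIO.Close (File);
   Ada.Text_IO.Put_Line ("Code points:" & Natural'Image (Total));
   Ada.Text_IO.Put_Line ("Non-ASCII:  " & Natural'Image (Non_ASCII));
end Count_Code_Points;
```

The program takes the file name as its first command-line argument.
Because the buffer is only refilled when fewer than 4 Characters
remain, a multi-byte sequence is never split across two reads, so Get
always sees a complete character unless the file itself is truncated.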
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de