From: Simon Wright
Newsgroups: comp.lang.ada
Subject: Re: How to read in a (long) UTF-8 file, incrementally?
Date: Fri, 05 Nov 2021 19:55:33 +0000

Marius Amado-Alves writes:

>> Characters no longer exist as a thing as one can even be represented as
>> multiple utf-32 code points.
>
> You're alluding to combining characters?

Fun & games on macOS[1]:

> $ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake -c p*.ads
> gcc -c páck3.ads
> páck3.ads:1:10: warning: file name does not match unit name, should be "páck3.ads"
>
> The reason for this apparently-bizarre message is that macOS takes the
> composed form (lowercase a acute) and converts it under the hood to
> what HFS+ insists on, the fully decomposed form (lowercase a,
> combining acute); thus the names are actually different even though
> they _look_ the same.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114#c1
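
The composed versus decomposed mismatch described above can be reproduced without a Mac or GNAT. What follows is only a sketch in Python of the Unicode side of the problem, not of what GNAT or HFS+ actually do internally. It assumes standard NFC and NFD as stand-ins for the two forms (HFS+ uses its own variant of the decomposed form), and same_name is a hypothetical helper, not anything from the bug report.

```python
import unicodedata

# Composed form: "a acute" is the single code point U+00E1.
composed = unicodedata.normalize("NFC", "p\u00e1ck3.ads")

# Fully decomposed form: plain "a" followed by U+0301 (combining acute).
# Standard NFD is an assumption here, standing in for what the file system stores.
decomposed = unicodedata.normalize("NFD", composed)


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two strings render identically but differ code point by code point.
print(composed == decomposed)           # naive comparison
print(len(composed), len(decomposed))   # lengths in code points
print(same_name(composed, decomposed))  # comparison after normalization
```

A plain string comparison sees two different names, which is consistent with the warning quoted above; normalizing both sides to one form before comparing makes them equal.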