comp.lang.ada
* How to read in a (long) UTF-8 file, incrementally?
From: Marius Amado-Alves @ 2021-11-02 17:42 UTC


As I understand it, to work with Unicode text inside the program it is better to use the Wide_Wide (UTF-32) variants of everything.

Now, Unicode files usually are in UTF-8.

One solution is to read the entire file in one gulp into a String, then convert it to Wide_Wide. This solution is not memory efficient, and it may not be possible in some tasks, e.g. real-time processing of lines of text.

If the file has lines, I guess we can also work line by line (Text_IO). But the text may not have lines. It can be one long XML object, for example.

So it should be possible to read a single UTF-8 character, right? Which might be 1, 2, 3, or 4 bytes long, so it must be read into a String, right? Or directly to Wide_Wide. Are there such functions?
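
For illustration only, here is a minimal sketch of what such a function could look like, using nothing but Ada.Streams.Stream_IO and Interfaces. The names Count_UTF_8, Next_Character and Encoding_Error are hypothetical, not part of any library. The sketch reads one lead byte, works out from it how many continuation bytes follow (0 to 3), and assembles the code point directly into a Wide_Wide_Character. It assumes the file name is given as the first command-line argument, and it does not reject overlong encodings or surrogate code points, which a production decoder should do.

```ada
with Ada.Command_Line;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Count_UTF_8 is

   package SIO renames Ada.Streams.Stream_IO;

   Encoding_Error : exception;

   --  Hypothetical helper: read exactly one UTF-8 encoded character
   --  (1 to 4 bytes) from File and return it as a Wide_Wide_Character.
   function Next_Character
     (File : SIO.File_Type) return Wide_Wide_Character
   is
      S     : constant SIO.Stream_Access := SIO.Stream (File);
      Lead  : Unsigned_8;
      Next  : Unsigned_8;
      Code  : Unsigned_32;
      Extra : Natural;  --  number of continuation bytes to read
   begin
      Unsigned_8'Read (S, Lead);

      --  The high bits of the lead byte give the sequence length.
      if Lead < 16#80# then                      --  0xxxxxxx
         Code  := Unsigned_32 (Lead);
         Extra := 0;
      elsif (Lead and 16#E0#) = 16#C0# then      --  110xxxxx
         Code  := Unsigned_32 (Lead and 16#1F#);
         Extra := 1;
      elsif (Lead and 16#F0#) = 16#E0# then      --  1110xxxx
         Code  := Unsigned_32 (Lead and 16#0F#);
         Extra := 2;
      elsif (Lead and 16#F8#) = 16#F0# then      --  11110xxx
         Code  := Unsigned_32 (Lead and 16#07#);
         Extra := 3;
      else
         raise Encoding_Error with "invalid lead byte";
      end if;

      --  Each continuation byte (10xxxxxx) contributes 6 more bits.
      for I in 1 .. Extra loop
         Unsigned_8'Read (S, Next);
         if (Next and 16#C0#) /= 16#80# then
            raise Encoding_Error with "invalid continuation byte";
         end if;
         Code := Shift_Left (Code, 6) or Unsigned_32 (Next and 16#3F#);
      end loop;

      return Wide_Wide_Character'Val (Code);
   end Next_Character;

   File  : SIO.File_Type;
   Count : Natural := 0;
   C     : Wide_Wide_Character;

begin
   SIO.Open (File, SIO.In_File, Ada.Command_Line.Argument (1));

   --  Only one character is held in memory at a time.
   while not SIO.End_Of_File (File) loop
      C     := Next_Character (File);
      Count := Count + 1;
   end loop;

   SIO.Close (File);
   Ada.Text_IO.Put_Line ("Characters read:" & Natural'Image (Count));
end Count_UTF_8;
```

If the file ends in the middle of a multi-byte sequence, the 'Read call raises End_Error. An alternative would be to collect the 1 to 4 bytes into a short String and pass it to Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode, which also validates the sequence.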

Thanks a lot.



