Newsgroups: comp.lang.ada
Date: Wed, 19 Jun 2019 04:45:40 -0700 (PDT)
In-Reply-To: <800240ae-4c5f-424e-869f-2791e07a50d2@googlegroups.com>
References: <728c4668-8fa0-4a57-a502-2bf476fc3940@googlegroups.com> <800240ae-4c5f-424e-869f-2791e07a50d2@googlegroups.com>
Message-ID: <6be082a3-b8aa-4e7d-825c-bd998894f077@googlegroups.com>
Subject: Re: Latest suggestion for 202x
From: briot.emmanuel@gmail.com

> 1) Follow String and Unbounded_String, by having a static length Unicode_String which would be UTF-8. Then have a number of iterators which act on the basic array:
>
> a) The normal array iterator, built-in.
> b) Code-point iterator which returns 32-bit code points.
> c) Grapheme cluster iterators.
> d) Other iterators, e.g. words.
>
> 2) Then the unbounded version which utilises the static stuff, same set of iterators.
>
> 3) The character database with access via Unicode names and index numbers.
>
> 4) Unicode regular expression engine.

It seems to me that all of this can be implemented as a library, and doesn't need to be in the language itself. The nice thing with libraries is that users can provide their own implementation tailored to their needs.

When I implemented GNATCOLL.Strings, I was careful to optionally support Unicode via various formal parameters: internally, we store an array of code points. Encoding and decoding to UTF-8, UTF-16 and others is orthogonal to string manipulation (and right now you would have to use some other package for the encoding).

The various iterators you suggest are nice, but can also be implemented on top of it (iterating by word, sentence, paragraph, ... is tricky, and irrelevant for most applications). You would provide a `Word_Iterator_Type` with the GNAT-specific `Iterable` aspect, so that it can be used in `for ... of` loops, and a function that takes a string and returns that iterator.

Regexps are very difficult to implement for Unicode, so I would suggest a binding to an existing library like PCRE.

I would love to see more such libraries, and this is why I started GNATCOLL in the first place. The more the merrier, even when they compete with each other: users will have more choices. If this is part of the language, it is harder to provide competitors.

And distributions can package the compiler along with a number of such libs to make things easier for newcomers to the language.

Project Alire (https://github.com/alire-project/alire) might be a nice way to contribute such libs. GNATCOLL has the same drawback as other libraries regularly mentioned here: it tends to be too monolithic.
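A minimal sketch of the `Word_Iterator_Type` idea mentioned above, in Ada 2012 with the GNAT-specific `Iterable` aspect. This is not GNATCOLL code: `Word_Iterator_Type`, `Words`, `Word_Bounds` and the integer cursor are hypothetical names chosen for illustration. Decoding is done up front with the standard `Ada.Strings.UTF_Encoding.Wide_Wide_Strings` package, so the iterator only ever sees an array of code points. A "word" is assumed to be a maximal run of characters for which `Is_Alphanumeric` (from `Ada.Wide_Wide_Characters.Handling`) returns True, which is much cruder than real Unicode word segmentation (UAX #29).

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

--  Hypothetical sketch, not GNATCOLL code: a word is a maximal run of
--  Is_Alphanumeric code points (much simpler than Unicode UAX #29).
procedure Word_Demo is
   package Word_Iterators is
      --  Cursor: index of the first code point of a word, 0 = no word.
      subtype Cursor is Natural;

      --  Loop element: bounds of one word inside Text.
      type Word_Bounds is record
         First, Last : Positive;
      end record;

      --  The iterable object stores the text as an array of code points.
      type Word_Iterator_Type (Length : Natural) is record
         Text : Wide_Wide_String (1 .. Length);
      end record
        with Iterable => (First       => First,
                          Next        => Next,
                          Has_Element => Has_Element,
                          Element     => Element);

      function Words (S : Wide_Wide_String) return Word_Iterator_Type;
      function First (It : Word_Iterator_Type) return Cursor;
      function Next (It : Word_Iterator_Type; C : Cursor) return Cursor;
      function Has_Element
        (It : Word_Iterator_Type; C : Cursor) return Boolean;
      function Element
        (It : Word_Iterator_Type; C : Cursor) return Word_Bounds;
   end Word_Iterators;

   package body Word_Iterators is
      use Ada.Wide_Wide_Characters.Handling;

      --  Index of the first word character at or after From, or 0.
      function Skip_Separators
        (It : Word_Iterator_Type; From : Positive) return Cursor is
      begin
         for I in From .. It.Length loop
            if Is_Alphanumeric (It.Text (I)) then
               return I;
            end if;
         end loop;
         return 0;
      end Skip_Separators;

      --  Index of the last code point of the word starting at Start.
      function Word_End
        (It : Word_Iterator_Type; Start : Positive) return Positive
      is
         Last : Positive := Start;
      begin
         while Last < It.Length
           and then Is_Alphanumeric (It.Text (Last + 1))
         loop
            Last := Last + 1;
         end loop;
         return Last;
      end Word_End;

      --  The aggregate slides S to the bounds 1 .. S'Length.
      function Words (S : Wide_Wide_String) return Word_Iterator_Type is
        ((Length => S'Length, Text => S));

      function First (It : Word_Iterator_Type) return Cursor is
        (Skip_Separators (It, 1));

      function Next (It : Word_Iterator_Type; C : Cursor) return Cursor is
        (Skip_Separators (It, Word_End (It, C) + 1));

      function Has_Element
        (It : Word_Iterator_Type; C : Cursor) return Boolean is (C /= 0);

      function Element
        (It : Word_Iterator_Type; C : Cursor) return Word_Bounds is
        ((First => C, Last => Word_End (It, C)));
   end Word_Iterators;

   use Word_Iterators;
   use Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  "naïve code points" as raw UTF-8 bytes (U+00EF is C3 AF in UTF-8).
   Input : constant String := "na" & Character'Val (16#C3#)
     & Character'Val (16#AF#) & "ve code points";
   It    : constant Word_Iterator_Type := Words (Decode (Input));
   Count : Natural := 0;
begin
   for W of It loop  --  GNAT drives this via First/Next/Has_Element/Element
      Count := Count + 1;
      --  Re-encode the word's code points to UTF-8 for output.
      Ada.Text_IO.Put_Line (Encode (It.Text (W.First .. W.Last)));
   end loop;
   Ada.Text_IO.Put_Line ("words:" & Natural'Image (Count));
end Word_Demo;
```

Because decoding and re-encoding happen outside the iterator, the same First/Next/Has_Element/Element shape could back a sentence or grapheme-cluster iterator by changing only the boundary rule.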