From: Optikos
Newsgroups: comp.lang.ada
Subject: Re: Latest suggestion for 202x
Date: Wed, 19 Jun 2019 07:34:14 -0700 (PDT)

On Wednesday, June 19, 2019 at 6:45:42 AM UTC-5, briot....@gmail.com wrote:
> On Wednesday, June 19, 2019 at 6:14 AM UTC-5, Lucretia wrote:
> > As is often the case, I think if someone's implementation of Unicode support is correct enough, but
> > small as well, it might be added
> > to a compiler library first. This would help some people and help
> > those responsible for the next official standard of Ada to vet the techniques of that implementation for
> > the most future-proof way to accomplish it in the standard.
> >
> > I was thinking about doing this but have decided on writing my own language instead.
> >
> > My plan was this:
> > 1) Follow String and Unbounded_String, by having a static length Unicode_String which would be
> > UTF8. Then have a number of iterators which act on the basic array:
> >
> > a) The normal array iterator, built-in.
> > b) Code-point iterator which returns 32-bit code points.
> > c) Grapheme cluster iterators.

Luke, I assume that a so-called “cluster” is the denormalization of diacritical marks and combining characters and the like, not some other definition of “cluster” unrelated to normalization. If you are utilizing a definition of “cluster” other than multiple codepoints not normalized into one codepoint, then please consider adding an iterator related to normalization and denormalization (utilizing Unicode Consortium terms).

> > d) Other iterators, i.e. words.
> >
> > 2) Then the unbounded version which utilises the static stuff, same set of iterators.
> >
> > 3) The character database with access via unicode names and index numbers.
> >
> > 4) Unicode regular expression engine.
>
> It seems to me that all of this can be implemented as a library, and doesn't need to be in the language
> itself. The nice thing with libraries is that users can provide their own implementation tailored to their
> needs.
>
> When I implemented GNATCOLL.Strings, I was careful to optionally support unicode via various formal
> parameters: internally, we store an array of codepoints.
> Encoding and decoding to utf-8, utf-16 and
> others is orthogonal to string manipulation (and right now you would have to use some other package for
> the encoding).
>
> The various iterators you suggest are nice, but can also be implemented on top of it (iterating by words,
> sentence, paragraph,... is tricky, and irrelevant for most applications). You would provide a
> `Word_Iterator_Type`, with a GNAT `Iterable` aspect to use, and a function that takes a string and returns
> that iterator.
>
> Regexps are very difficult to implement for unicode, but I would suggest a binding to an existing library
> like pcre.
>
> I would love to see more such libraries, and this is why I had started GNATCOLL initially. The more the
> merrier, even when they compete with each other. Users will have more choices. If this is part of the
> language, it is harder to provide competitors.
> And distributions can package the compiler along with a number of such libs to make things easier for
> newcomers to the language.
>
> Project Alire (https://github.com/alire-project/alire) might be a nice way to contribute such libs.
> GNATCOLL has the same drawback as other libraries regularly mentioned here: it tends to be too
> monolithic.

Luke, I concur with everything that Emmanuel Briot said above. But if you find any portion of your vision for a new language (especially the Unicode/ISO10646-related portion) that needs more help from the language proper than a library can provide, then please consider the following:

1) Fork FSF GNAT.
2) Extend FSF GNAT to overcome whichever obstacle(s) you face for which a library alone would not suffice (e.g., optional compiler-enforced rejection of all ISO8652-standardized string types other than your modern complete-solution Unicode string type).
3) Consider making each of your extensions a command-line option, a pragma, or an aspect, so that it can be opted into or out of (and so that your code is bracketed, clearly demarcating your vision from AdaCore's vision & maintenance and from the ARG's standardization). In prior posts on c.l.a in prior months, you have proposed language features that alter Ada's fundamental syntax (e.g., {} braces instead of begin-end-esque blocks). As Randy correctly stated in c.l.a postings a month or 2 ago, syntax is a veneer in a compiler (my paraphrasing, not a quotation of Randy), compared to the vastly more complex semantic core of a compiler. FSF GNAT could be taught to have 2 different syntaxes for Ada, opted into or out of via command-line option, pragma, or aspect: the current ISO-standardized one and Luke's alternative vision for the bulk of the same semantics. The semantic differences (Luke's course-corrections here and Luke's tweaks there) would then be isolated in the evolved forked FSF GNAT source code by conditionally testing for that opt-in or opt-out.
4) In Luke's evolution of forked FSF GNAT, establish industrial practice, which is supposed to be the primary input to ISO standardization. Creating a nonstandard variant of Ada establishes industrial practice for extensions & variants, which can then gain strong standing according to ISO rules (which the ARG must obey). As I understand ISO regulations, such divergent industrial practice is viewed as stronger than proposed AIs that lack prior industrial practice.
5) Devise an Ada-esque name that is not Ada for these Ada-language extensions, for example Lucretia Ada, or L-Ada (or Lady Ada). That way there is a crystal-clear differentiated name for your industrial practice versus all the ISO-standard-obeying Ada compilers.
6) Join the ARG.
(And if you can establish citizenship in a nation or Crown possession or territory that lacks an ISO representative on ISO8652, join the ISO8652 committee as a representative of that nation or Crown possession or territory.) Lobby hard for accepting your established industrial practice as a variant of Ada that needs to be standardized as per ISO regulations.
7) Fix bugs in FSF GNAT on your own timeline (i.e., faster) rather than waiting on AdaCore's timeline. Establish your industrial-practice evolved FSF GNAT as superior in some metric(s), as a better brand from which to obtain Ada.

All of this together would be less effort in total (and quicker to market) than writing a new compiler from scratch. Plus, it would more likely reach ISO or ECMA standardization sooner, and hence reshape the programming world in your image more thoroughly & more quickly than writing a compiler from scratch would.
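
P.S. Regarding the iterator levels in the quoted list (array elements, code points, clusters) and my question about normalization: a rough sketch of how they differ may help the discussion. It is in Python rather than Ada purely for brevity. The names `code_points` and `simple_clusters` are hypothetical, and the clustering is a deliberately crude assumption (a base character plus any combining marks that follow it), not the full Unicode text segmentation rules.

```python
import unicodedata


def code_points(utf8: bytes):
    """Yield integer code points decoded from a UTF-8 byte array."""
    for ch in utf8.decode("utf-8"):
        yield ord(ch)


def simple_clusters(text: str):
    """Yield a base character together with its trailing combining marks.

    Crude stand-in for real grapheme cluster segmentation (assumption:
    only combining marks extend a cluster).
    """
    cluster = ""
    for ch in text:
        # A non-combining character starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


# 'e' followed by COMBINING ACUTE ACCENT, then 'a' (decomposed form).
decomposed = "e\u0301a"
# NFC normalization composes 'e' + accent into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

utf8 = decomposed.encode("utf-8")
byte_count = len(utf8)                             # array-level view
code_point_list = list(code_points(utf8))          # code-point view
cluster_list = list(simple_clusters(decomposed))   # cluster view

print(byte_count, len(code_point_list), len(cluster_list))
print(len(composed), len(list(simple_clusters(composed))))
```

In this sketch the decomposed string has 4 bytes, 3 code points, and 2 clusters, while its NFC-normalized form has 2 code points and still 2 clusters. Normalization changes the code-point sequence and clustering groups code points for display, so a normalization-aware iterator would be a separate thing from a cluster iterator.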