From: Blady
Newsgroups: comp.lang.ada
Subject: Re: GNOGA - RFC UXStrings package.
Date: Tue, 12 May 2020 10:13:46 +0200
References: <4065382c-ef1f-47c4-a0ea-74d736536447@googlegroups.com>

On 11/05/2020 at 19:44, Jere wrote:
> On Monday, May 11, 2020 at 4:59:33 AM UTC-4, Blady wrote:
>> Hello,
>>
>> Gnoga (https://sourceforge.net/p/gnoga) internal character strings
>> implementation is based on both Ada types String and Unbounded_String.
>> The native Ada String encoding is Latin-1 whereas transactions with the
>> Javascript part are in UTF-8 encoding.
>>
>> Some drawbacks come up, for instance, with internationalization of
>> programs (see Localize Gnoga demo
>> https://sourceforge.net/p/gnoga/code/ci/dev_1.6/tree/demo/localize):
>>
>> • several conversions between String and Unbounded_String objects
>> • it isn't usable out of Latin-1 character set, characters out of
>>   Latin-1 set are blanked
>> • continuous conversions between Latin-1 and UTF-8, each sent and
>>   received transaction between Ada and Javascript parts
>>
>> Two ways of improvement: native dynamic length handling and Unicode support.
>> ...
>>
>> Feel free to send feedback about UXStrings
>> (https://sourceforge.net/p/gnoga/code/ci/dev_1.6/tree/deps/uxstrings/src/uxstrings.ads)
>> specification source code on the forum or on Gnoga mailing list
>> (https://sourceforge.net/p/gnoga/mailman/gnoga-list).
>>
>> Thanks, Pascal.
>> https://blady.pagesperso-orange.fr
>
> I would be hesitant to use gnatcoll directly. The one nice
> thing about Gnoga is that (at least previously), it tried not
> to fully rely on GNAT. Using gnatcoll would be a step in the
> wrong direction in that respect. From personal experience,
> I used Gnoga to create a program for a GPU module using a variant
> of linux. If Gnoga suddenly started requiring gnatcoll, then
> that program would no longer work as I was unable to get most
> of Adacore's additional libraries to even compile in that variant
> of linux. This included gnatcoll.
>
> Additionally, the library Gnoga already leverages (Simple
> Components [1]) already has some UTF-8 functionality you
> might be able to leverage. You might check that out.
>
> One other thing, if you are interested, you might send a
> message to a fellow who goes by Entomy on github. His
> area of expertise is text parsing, localization, etc. and
> he is experienced in Ada. He might have some libraries
> or tools you could leverage.
> You could probably catch
> him at his twitter pretty easily:
> https://twitter.com/pkell7
>
>
> [1]: http://www.dmitry-kazakov.de/ada/strings_edit.htm#Strings_Edit.UTF8.Maps.Unicode_Set
>

Hello Jere,

I agree that a GNATColl dependency would be too heavy for Gnoga. At
least GNATColl might serve as an inspiration for the implementation.

I've checked Simple Components; it could be extended with some parsing
functions in order to cover all of Gnoga's needs. But I think that a
UTF-8 (or UTF-16) internal representation would carry too high a penalty
in terms of execution time, which is critical for Gnoga as a server.
That's why I would like to experiment with a data structure that has a
smart character size (1, 2 or 4 bytes) and a smart string length (either
static or dynamic).

Thanks for the Entomy pointer.

Regards, Pascal.
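
As an illustration of the "smart character size (1, 2 or 4 bytes)" idea
mentioned above, a minimal sketch might look like the following. It is
not the UXStrings implementation: the names pick_width, pack and char_at
are hypothetical, the little-endian layout is an arbitrary assumption,
and Python is used only to keep the example short.

```python
def pick_width(text: str) -> int:
    """Smallest fixed code-unit size (1, 2 or 4 bytes) that can hold
    every character of text. Hypothetical helper, for illustration."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1  # everything fits in Latin-1
    if highest <= 0xFFFF:
        return 2  # everything fits in the Basic Multilingual Plane
    return 4      # at least one character lies beyond the BMP


def pack(text: str):
    """Store text as (width, bytes), one fixed-size unit per character."""
    width = pick_width(text)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def char_at(width: int, data: bytes, index: int) -> str:
    """Fetch the character at a zero-based index by direct offset."""
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))
```

Because every character occupies the same number of bytes within one
string, char_at computes its offset directly; with UTF-8 storage,
reaching the n-th character means scanning the preceding variable-length
sequences. The price is that a single character outside Latin-1 widens
the whole string.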