From: Blady
Newsgroups: comp.lang.ada
Subject: Re: [ANN] Release of UXStrings 0.5.0
Date: Sat, 1 Jul 2023 16:41:37 +0200
References: <0162bf97-8a37-4244-a368-1bf7ae00077bn@googlegroups.com>
In-Reply-To: <0162bf97-8a37-4244-a368-1bf7ae00077bn@googlegroups.com>

Hello Vincent,

On 29/06/2023 at 10:49, Vincent D. wrote:
> Hello Pascal,
>
> Thank you for this contribution. Here are some comments:
> - since UTFString is a class ("a tagged record type"), why don't you create an abstract root "UXString" and then derive specialized object types ? Like UTF_8_XString, UTF_16_XString, ASCII_XString, Win_1252_XString, Latin_XString, etc.

Well, that's an approach chosen in some other Ada string libraries. I preferred to keep the API for the legacy Ada "string" types close to that of the Ada library, so that adapting existing code would be easy. These types are not intended to be used outside legacy code adaptation.
Note that I've renamed them as character arrays rather than strings in order to emphasize the semantic difference.

> - The default format to convert between different encodings should be UTF-8 as it is now ubiquitous.

Conversions are between UXString and encodings, not between encodings.

>> [...] moreover in the case of strings accentuated in French and strings containing emojis the process times are also improved (factor 7 to 8 by compared to UXStrings1
> - I find quite astonishing to have a factor 8 compared to UTF-8 encoding. Do you have an explanation ? This looks like a poor implementation because UTF-8 encoding is fast and allows direct manipulation in most cases. Maybe because random access is treated as sequential access for UTF-8 encoded strings but this again is poor implementation.

You got it: "most cases". Apart from complex implementations, if you want to access a specific position in UTF-8 data, you have to parse from the beginning, as UXStrings1 does. UXStrings2 always computes whether the resulting data are all ASCII; when they are, access is direct. UXStrings3 is internally like a Unicode array, so access is direct.

Best regards, Pascal.
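
The point about positional access can be illustrated with a minimal sketch. It assumes well-formed UTF-8 input; Byte_Index is a hypothetical helper written for this example, not part of UXStrings, and it does not claim to reflect how any UXStrings implementation is actually coded. It only shows why finding the N-th character in UTF-8 data needs a scan from the start, whereas an array of Unicode code points can be indexed directly.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure UTF8_Access_Sketch is

   --  Hypothetical helper (not a UXStrings subprogram): returns the index
   --  in Item of the first byte of the Position-th code point (1-based).
   --  Item is assumed to hold well-formed UTF-8.
   function Byte_Index (Item : String; Position : Positive) return Positive is
      Count : Natural := 0;
   begin
      for I in Item'Range loop
         --  Continuation bytes have the form 2#10xxxxxx#; any other byte
         --  starts a new code point.
         if Character'Pos (Item (I)) / 64 /= 2 then
            Count := Count + 1;
            if Count = Position then
               return I;
            end if;
         end if;
      end loop;
      raise Constraint_Error with "position out of range";
   end Byte_Index;

   --  E with acute accent takes two bytes in UTF-8: 16#C3# 16#A9#.
   E_Acute : constant String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   --  The three characters e-acute, t, e-acute as five UTF-8 bytes.
   UTF_8 : constant String := E_Acute & "t" & E_Acute;

   --  The same three characters as an array of Unicode code points.
   Wide : constant Wide_Wide_String :=
     Wide_Wide_Character'Val (16#E9#) & "t" & Wide_Wide_Character'Val (16#E9#);

begin
   --  Sequential scan: the 3rd character starts at byte 4.
   Put_Line (Positive'Image (Byte_Index (UTF_8, 3)));

   --  Direct access: the 3rd character is simply Wide (3), code point 233.
   Put_Line (Integer'Image (Wide_Wide_Character'Pos (Wide (3))));
end UTF8_Access_Sketch;
```

In the scan, the cost grows with the position requested, while the indexed access is constant time. When the data are all ASCII, byte index and character position coincide, so no scan is needed.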