From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: GNOGA - RFC UXStrings package.
Date: Tue, 12 May 2020 11:35:53 +0200

On 2020-05-12 10:13, Blady wrote:
> I've checked Simple Components, it might be completed with some parsing
> functions in order to fulfill all Gnoga needs.

You are welcome to ask. I am not sure what kind of parsing you mean; most of the nightmarish legacy encodings are already supported, e.g.

http://www.dmitry-kazakov.de/ada/strings_edit.htm#7.10

> But I think that UTF-8
> (or UTF-16) internal representation would make too much penalties in
> term of execution time which is critical for Gnoga as server.

Well, whatever minor overhead UTF-8 may have, it is many orders of magnitude less than what Unbounded_String, or whatever your UXStrings code does, would inflict.

> That's why I would like to experiment some data structure with smart
> character size (1, 2 or 4 bytes) and smart string length (either static
> or dynamic).

When I am concerned about performance:

1. I keep all content in UTF-8. Anything I get from outside I convert to UTF-8 first.

2. I never use dynamically allocated strings in any form, and never in the standard memory pool. If I really, really need a pool, I use a custom arena pool and allocate a String there. As a nice side effect, the server becomes resilient to all sorts of something-is-too-large attacks: no space in the arena, drop the connection, bye.

3. I never copy anything. Thus, again, never Unbounded_String, only String and its slices.

4. I never tokenize anything. I walk down the string in a single pass, note the start/stop indices of a token, and pass the string slice down to a semantic callback or, better, straight to a look-up table. No string copies.

5. I never use Wide or Wide_Wide. They are a mess and require conversions => copying => a lot of resources.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
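Points 1 and 5 above can be illustrated with a minimal Ada sketch, assuming (for illustration only) that the outside input arrives as Latin-1. The input is converted once at the boundary with the standard Ada.Strings.UTF_Encoding.Strings.Encode; after that the program works on the UTF-8 bytes directly, for example counting code points without ever building a Wide_Wide_String. Code_Points is a hypothetical helper that assumes valid UTF-8.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Strings.UTF_Encoding; use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

procedure UTF8_Demo is

   --  Hypothetical helper: count code points in place. Every code point
   --  has exactly one byte outside the continuation range 16#80#..16#BF#.
   --  Assumes Text is valid UTF-8; no Wide_Wide_String is created.
   function Code_Points (Text : UTF_8_String) return Natural is
      Count : Natural := 0;
   begin
      for C of Text loop
         if Character'Pos (C) not in 16#80# .. 16#BF# then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Code_Points;

   --  Assumed input from outside, encoded in Latin-1: "café"
   Latin_1_Input : constant String := "caf" & Character'Val (16#E9#);

   --  Converted once, at the boundary; everything inside stays UTF-8
   Content : constant UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1_Input);
begin
   pragma Assert (Content'Length = 5);          --  é takes two bytes
   pragma Assert (Code_Points (Content) = 4);
   Put_Line ("bytes:" & Natural'Image (Content'Length)
             & ", code points:" & Natural'Image (Code_Points (Content)));
   --  Prints: bytes: 5, code points: 4
end UTF8_Demo;
```

The single conversion on entry is the only copy; after that, byte-level scans such as this one replace any need for Wide or Wide_Wide strings.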
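The arena idea in point 2 can be sketched with the standard System.Storage_Pools interface. This is a minimal bump-pointer pool under stated assumptions, not the Simple Components implementation: the Arenas package, the 64-byte size and the sequential allocation strategy are all illustrative. Memory is handed out sequentially from one fixed buffer, individual Deallocate calls are ignored, and a full arena raises Storage_Error, which a server would treat as "drop the connection".

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with System;                  use System;
with System.Storage_Pools;    use System.Storage_Pools;
with System.Storage_Elements; use System.Storage_Elements;

procedure Arena_Demo is

   --  Hypothetical bump-pointer arena: allocations are carved sequentially
   --  out of one fixed buffer; nothing is ever freed individually.
   package Arenas is
      type Arena (Size : Storage_Count) is new Root_Storage_Pool with record
         Buffer : Storage_Array (1 .. Size);
         Next   : Storage_Offset := 1;  --  first free element of Buffer
      end record;

      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count);

      --  Individual frees are ignored; the whole arena is discarded at once
      overriding procedure Deallocate
        (Pool                     : in out Arena;
         Storage_Address          : Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count) is null;

      overriding function Storage_Size (Pool : Arena) return Storage_Count is
        (Pool.Size);
   end Arenas;

   package body Arenas is
      overriding procedure Allocate
        (Pool                     : in out Arena;
         Storage_Address          : out Address;
         Size_In_Storage_Elements : Storage_Count;
         Alignment                : Storage_Count)
      is
         Align : constant Integer_Address :=
           Integer_Address (Storage_Count'Max (Alignment, 1));
         --  Absolute address of the first free element
         Here  : constant Integer_Address :=
           To_Integer (Pool.Buffer'Address) + Integer_Address (Pool.Next - 1);
         --  Padding that rounds Here up to a multiple of Align
         Pad   : constant Integer_Address := (Align - Here mod Align) mod Align;
         Start : constant Storage_Offset := Pool.Next + Storage_Offset (Pad);
      begin
         if Start > Pool.Size
           or else Start - 1 + Size_In_Storage_Elements > Pool.Size
         then
            raise Storage_Error;  --  arena full
         end if;
         Storage_Address := Pool.Buffer (Start)'Address;
         Pool.Next := Start + Size_In_Storage_Elements;
      end Allocate;
   end Arenas;

   Pool : Arenas.Arena (Size => 64);

   type String_Ptr is access String;
   for String_Ptr'Storage_Pool use Pool;

   P         : String_Ptr;
   Count     : Natural := 0;
   Exhausted : Boolean := False;
begin
   while not Exhausted and then Count < 100 loop
      begin
         P := new String'("0123456789");
         pragma Assert (P.all'Length = 10);
         Count := Count + 1;
      exception
         when Storage_Error =>
            Exhausted := True;  --  in a server: drop the connection
      end;
   end loop;
   pragma Assert (Exhausted and then Count in 1 .. 99);
   Put_Line ("arena full after" & Natural'Image (Count) & " allocations");
end Arena_Demo;
```

The exact number of allocations that fit depends on how the compiler lays out the String bounds and on alignment, so only exhaustion itself is checked. A real per-connection arena would also need a way to reset Next or to discard the whole pool when the request ends.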
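Points 3 and 4 above (no copies, no tokenizing, a single pass handing slices to a callback) might look like the following minimal sketch. Scan, Token_Handler and On_Token are hypothetical names; splitting on blanks and comparing against "GET" and "POST" stand in for a real grammar and look-up table. Ada slices keep the indices of the original string, and in practice GNAT passes a String slice by reference, so the handler sees the request bytes themselves rather than a copy.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Tokenize_Demo is

   --  Hypothetical semantic callback: receives each token as a slice
   type Token_Handler is access procedure (Token : String);

   --  Single pass over Text: remember where the current token starts and
   --  hand Text (Start .. Index - 1) to the handler when a blank ends it.
   --  Splitting on ' ' stands in for a real grammar.
   procedure Scan (Text : String; Handler : not null Token_Handler) is
      Start : Integer := Text'First;  --  first index of the current token
   begin
      for Index in Text'Range loop
         if Text (Index) = ' ' then
            if Index > Start then
               Handler (Text (Start .. Index - 1));
            end if;
            Start := Index + 1;
         end if;
      end loop;
      if Start <= Text'Last then
         Handler (Text (Start .. Text'Last));  --  trailing token
      end if;
   end Scan;

   Tokens   : Natural := 0;
   Keywords : Natural := 0;

   procedure On_Token (Token : String) is
   begin
      Tokens := Tokens + 1;
      --  Stand-in for a look-up table: the slice is compared in place
      if Token = "GET" or else Token = "POST" then
         Keywords := Keywords + 1;
      end if;
      --  The concatenation copies, but only for printing in this demo;
      --  Token'First is the index within the original request string
      Put_Line ("[" & Token & "] at" & Integer'Image (Token'First));
   end On_Token;

   Request : constant String := "GET /index.html HTTP/1.1";
begin
   Scan (Request, On_Token'Access);
   --  Prints:
   --  [GET] at 1
   --  [/index.html] at 5
   --  [HTTP/1.1] at 17
   pragma Assert (Tokens = 3 and then Keywords = 1);
end Tokenize_Demo;
```

No intermediate token list or Unbounded_String is built: the scanner only tracks two indices, and each token exists solely as a view into Request.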