comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: GNOGA - RFC UXStrings package.
Date: Tue, 12 May 2020 11:35:53 +0200
Message-ID: <r9dqlo$1tkv$1@gioia.aioe.org>
In-Reply-To: r9dlrq$1im8$1@gioia.aioe.org

On 2020-05-12 10:13, Blady wrote:

> I've checked Simple Components, it might be completed with some parsing 
> functions in order to fulfill all Gnoga needs.

You are welcome to ask.

I am not sure what kind of parsing you mean; most of the nightmarish
legacy encodings are already supported, e.g.

    http://www.dmitry-kazakov.de/ada/strings_edit.htm#7.10

> But I think that UTF-8 
> (or UTF-16) internal representation would make too much penalties in 
> term of execution time which is critical for Gnoga as server.

Well, whatever minor overhead UTF-8 may have, it is many orders of
magnitude less than what Unbounded_String, or what you do in your code
for UXStrings, would inflict.

> That's why I would like to experiment some data structure with smart 
> character size (1, 2 or 4 bytes) and smart string length (either static 
> or dynamic).

When I am concerned about performance:

1. I make all content in UTF-8. I convert anything to UTF-8 first, if I 
get it from outside.

2. I never use dynamically allocated strings in any form, never in the 
standard memory pool. If I really, really need a pool, I use a custom 
arena pool and allocate a String there. As a nice side effect the server 
will be resilient to all sorts of something-is-too-large attacks: no 
space in the arena, drop the connection, bye.

3. I never copy anything. Thus, again, never Unbounded_String, only 
String and its slices.

4. I never tokenize anything. I walk down the string in a single pass, 
notice start/stop indices of a token, pass a string slice down to a 
semantic callback, better, pass it straight to a look-up table. No 
string copies.

5. I never use Wide or Wide_Wide. They are a mess and require conversions 
=> copying => a lot of resources.
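
Points 3 and 4 can be illustrated with a minimal sketch. It assumes
space-separated tokens, and the names Scan_Demo, Scan and On_Token are
made up for illustration; they are not taken from Simple Components or
Gnoga. The scanner walks the string once, notes the start and stop
indices of each token and hands a slice of the original String to a
callback.

```ada
with Ada.Text_IO;

procedure Scan_Demo is

   --  Hypothetical semantic callback. Token is a slice of the caller's
   --  buffer, so Token'First and Token'Last are indices into the
   --  original string, not into a fresh copy.
   procedure On_Token (Token : String) is
   begin
      Ada.Text_IO.Put_Line
        ("token at" & Integer'Image (Token'First) & " .." &
         Integer'Image (Token'Last) & ": " & Token);
   end On_Token;

   --  Walk Source in a single pass and pass each space-separated token
   --  to Process as a slice. No token list is built and no
   --  Unbounded_String is involved.
   procedure Scan
     (Source  : String;
      Process : not null access procedure (Token : String))
   is
      Pointer : Integer := Source'First;
      Start   : Integer;
   begin
      while Pointer <= Source'Last loop
         if Source (Pointer) = ' ' then
            Pointer := Pointer + 1;          --  skip separators
         else
            Start := Pointer;                --  token starts here
            while Pointer <= Source'Last
              and then Source (Pointer) /= ' '
            loop
               Pointer := Pointer + 1;
            end loop;
            Process (Source (Start .. Pointer - 1));
         end if;
      end loop;
   end Scan;

begin
   --  Example input chosen for illustration only.
   Scan ("GET /index.html HTTP/1.1", On_Token'Access);
end Scan_Demo;
```

Whether a slice parameter is passed by reference or by copy is left to
the compiler by the language; in practice array slices are normally
passed by reference, which is what makes this approach cheap. In a real
server the callback would typically be replaced by a look-up in a table
keyed by the slice.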

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
