comp.lang.ada
 help / color / mirror / Atom feed
* String Buffer
@ 2021-12-02 18:17 Kevin Chadwick
  2021-12-02 19:56 ` Jeffrey R.Carter
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Kevin Chadwick @ 2021-12-02 18:17 UTC (permalink / raw)


In this thread bounded and unbounded get quite a bashing.

"https://groups.google.com/g/comp.lang.ada/c/NINmFln-YS4/m/5De5DeUAAAAJ"

I thought bounded looked useful but then I realised that it allocates the max immediately anyway. It may be useful in constrained environments but then I do not use Strings in constrained environments.

Unbounded is said to be inefficient because it re-allocates.

In Go they have strings.Builder. I assume that is what Text_Buffer is aimed to be. (Actually Go seems to have copied a lot from Ada such as AWS API, unless they both are similar to something else like JAVA). 

Is Text_Buffer usable today with GCC 11?

strings.Builder in Go behaves similarly to unbounded in that it doubles the allocation as required but it only returns a string when needed and does not have string operations. You can Grow the builder to avoid re-allocations.

"https://pkg.go.dev/strings#Builder"

If possible without breaking all of the string functions (length and separate capacity) and Unbounded Strings had a Grow function, then wouldn't that relieve the efficiency issue?

In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation most of the alleged 10% of the time that it useful, but it would be nice to know of and use something akin to strings.Builder, preferably from the standard library, if it is available?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 18:17 String Buffer Kevin Chadwick
@ 2021-12-02 19:56 ` Jeffrey R.Carter
  2021-12-02 20:15   ` Dmitry A. Kazakov
  2021-12-03  8:31   ` ldries46
  2021-12-02 20:51 ` Simon Wright
  2021-12-03  8:11 ` Vadim Godunko
  2 siblings, 2 replies; 10+ messages in thread
From: Jeffrey R.Carter @ 2021-12-02 19:56 UTC (permalink / raw)


On 2021-12-02 19:17, Kevin Chadwick wrote:
> 
> Unbounded is said to be inefficient because it re-allocates.
> 
> In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation
"Efficiency" is only meaningful in the context of a project's quantitative 
timing requirements: an efficient implementation allows the project to meet 
those requirements, while an inefficient implementation does not.

In the absence of such context, any claims of "inefficiency" (and especially 
blanket claims such as "Unbounded_String is inefficient") simply demonstrate the 
speaker's incompetence.

For an example of context, a decade ago I worked on a soft-real-time system that 
made extensive use of Unbounded_String and had no problem meeting its timing 
requirements. Unbounded_String was clearly efficient for that project. Since 
most projects that would use Unbounded_String have even less restrictive timing 
requirements than that system, it seems likely that Unbounded_String will be 
efficient for them, too.

-- 
Jeff Carter
"After fifteen minutes I wanted to marry her, and
after a half hour I completely gave up the idea of
snatching her purse."
Take the Money and Run
136

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 19:56 ` Jeffrey R.Carter
@ 2021-12-02 20:15   ` Dmitry A. Kazakov
  2021-12-02 21:06     ` Jeffrey R.Carter
  2021-12-03  8:31   ` ldries46
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2021-12-02 20:15 UTC (permalink / raw)


On 2021-12-02 20:56, Jeffrey R.Carter wrote:
> On 2021-12-02 19:17, Kevin Chadwick wrote:
>>
>> Unbounded is said to be inefficient because it re-allocates.
>>
>> In any case avoiding unbounded strings is almost certainly in the 
>> realm of premature optimisation

I see it as a design question. Code that modifies a string as a whole is 
most likely broken. Unbounded_String property of varying length is not 
needed except for very special cases, like passing parameters where no 
result is allowed, e.g. in the task entries. Most, if not all such cases 
are Ada language design deficiencies. Normally there should be no need 
for Unbounded_String.

> In the absence of such context, any claims of "inefficiency" (and 
> especially blanket claims such as "Unbounded_String is inefficient") 
> simply demonstrate the speaker's incompetence.

It is true only to a certain degree. When comparing algorithms it is 
valid to claim inefficiency without any context if computational 
complexity sufficiently differs.

For example, an O(N) algorithm is unquestionably inefficient comparing 
to O(log N) one. I leave marginal cases of very small N.

Unbounded_String uses storage pool and that is a qualitative difference 
that requires in any context. For very large text buffers they are 
unusable either. Again, it is a design question, any time I see 
Unbounded_String alarm bells start ringing.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 18:17 String Buffer Kevin Chadwick
  2021-12-02 19:56 ` Jeffrey R.Carter
@ 2021-12-02 20:51 ` Simon Wright
  2021-12-03  8:11 ` Vadim Godunko
  2 siblings, 0 replies; 10+ messages in thread
From: Simon Wright @ 2021-12-02 20:51 UTC (permalink / raw)


Kevin Chadwick <kevc3no4@gmail.com> writes:

> In Go they have strings.Builder. I assume that is what Text_Buffer is
> aimed to be. (Actually Go seems to have copied a lot from Ada such as
> AWS API, unless they both are similar to something else like JAVA).
>
> Is Text_Buffer usable today with GCC 11?

If you mean Universal Text Buffers, no. There is
Ada.Strings.Text_Output, but it looks as though that's actually an
internal package to support T'Put_Image - so probably best avoided.

GCC 12 has universal text buffers.

But I don't see that Ada.Strings.Text_Buffers.Unbounded is going to be
any more or less efficient than Unbounded_Strings?

I seem to remember that AdaCore made considerable performance
improvements to Unbounded_Strings, for example copy-on-write sharing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 20:15   ` Dmitry A. Kazakov
@ 2021-12-02 21:06     ` Jeffrey R.Carter
  2021-12-02 21:45       ` Dmitry A. Kazakov
  2021-12-03  5:25       ` Randy Brukardt
  0 siblings, 2 replies; 10+ messages in thread
From: Jeffrey R.Carter @ 2021-12-02 21:06 UTC (permalink / raw)


On 2021-12-02 21:15, Dmitry A. Kazakov wrote:
> 
> It is true only to a certain degree. When comparing algorithms it is valid to 
> claim inefficiency without any context if computational complexity sufficiently 
> differs.
> 
> For example, an O(N) algorithm is unquestionably inefficient comparing to O(log 
> N) one. I leave marginal cases of very small N.

This is false. If you have the O(N) algorithm and it meets your requirements, 
then using it is more efficient than implementing the O(log N) algorithm. If you 
have both and the O(N) algorithm meets your requirements, the effort to get your 
data into a form where you can apply the O(log N) algorithm may still outweigh 
the time difference.

As for "very small N", I have worked on systems where linear search was 
efficient for sequences of 100,000 items.

The only thing you can meaningfully say about such algorithms is that one is 
O(N) and the other O(log N). Both have their uses.

Unbounded_String is needed far less than many people think, but there are 
application domains where it is much easier to achieve correctness and clarity 
with a variable-length string abstraction than without. Blanket statements about 
"efficiency" are dangerous for those working in such domains.

-- 
Jeff Carter
"After fifteen minutes I wanted to marry her, and
after a half hour I completely gave up the idea of
snatching her purse."
Take the Money and Run
136

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 21:06     ` Jeffrey R.Carter
@ 2021-12-02 21:45       ` Dmitry A. Kazakov
  2021-12-03  0:49         ` Kevin Chadwick
  2021-12-03  5:25       ` Randy Brukardt
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2021-12-02 21:45 UTC (permalink / raw)


On 2021-12-02 22:06, Jeffrey R.Carter wrote:
> On 2021-12-02 21:15, Dmitry A. Kazakov wrote:
>>
>> It is true only to a certain degree. When comparing algorithms it is 
>> valid to claim inefficiency without any context if computational 
>> complexity sufficiently differs.
>>
>> For example, an O(N) algorithm is unquestionably inefficient comparing 
>> to O(log N) one. I leave marginal cases of very small N.
> 
> This is false. If you have the O(N) algorithm and it meets your 
> requirements, then using it is more efficient than implementing the 
> O(log N) algorithm.

You have to show that it meets these requirements. The burden of proof 
is on you as you select an objectively inferior algorithm. Selecting a 
better algorithm is a safe choice under ill-defined conditions.

> Unbounded_String is needed far less than many people think, but there 
> are application domains where it is much easier to achieve correctness 
> and clarity with a variable-length string abstraction than without. 
> Blanket statements about "efficiency" are dangerous for those working in 
> such domains.

It is a good argument because introducing String requires more initial 
efforts in Ada than a thoughtless application Unbounded_String.

And note, that in most cases it is really thoughtless as the choice is 
made on the basis of how easy it is to declare a string component of a 
record type and then rewrite it.

Later on throughout the rest of the program the user of Unbounded_String 
will be consistently punished for that poor choice because normal string 
operations are very uncomfortable with Unbounded_String. But that 
happens later. Right now and here, let us save a couple of code lines.

So the simplest and most persuasive blanket statement is OK to dissuade 
people from poor choices.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 21:45       ` Dmitry A. Kazakov
@ 2021-12-03  0:49         ` Kevin Chadwick
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Chadwick @ 2021-12-03  0:49 UTC (permalink / raw)


> And note, that in most cases it is really thoughtless as the choice is 
> made on the basis of how easy it is to declare a string component of a 
> record type and then rewrite it. 
> 
> Later on throughout the rest of the program the user of Unbounded_String 
> will be consistently punished for that poor choice because normal string 
> operations are very uncomfortable with Unbounded_String. But that 
> happens later. Right now and here, let us save a couple of code lines. 
> 
> So the simplest and most persuasive blanket statement is OK to dissuade 
> people from poor choices.

I think I am glad that I am understanding this point about preferring strings early on in my Ada usage. Of course it is very easy to convert to a String as needed and whilst Randy mentioned yuck on "use" use, which I never use. I find package renames work well. I was thinking maybe you should never propagate an unbounded but then I am sure there will be the occasional scenario, where you expect the caller most likely wants to append and the code reads better without declares. As always...it depends..I guess?

With regard to a String buffer. It just occurred to me that the reason bounded is more efficient than unbounded is because, you could consider the bound max as a one time equivalent to strings.Builders grow(). In Go, I would try to only call grow, once anyway.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 21:06     ` Jeffrey R.Carter
  2021-12-02 21:45       ` Dmitry A. Kazakov
@ 2021-12-03  5:25       ` Randy Brukardt
  1 sibling, 0 replies; 10+ messages in thread
From: Randy Brukardt @ 2021-12-03  5:25 UTC (permalink / raw)


"Jeffrey R.Carter" <spam.jrcarter.not@spam.acm.org.not> wrote in message 
news:sobcg4$tc4$1@dont-email.me...
> On 2021-12-02 21:15, Dmitry A. Kazakov wrote:
>>
>> It is true only to a certain degree. When comparing algorithms it is 
>> valid to claim inefficiency without any context if computational 
>> complexity sufficiently differs.
>>
>> For example, an O(N) algorithm is unquestionably inefficient comparing to 
>> O(log N) one. I leave marginal cases of very small N.
>
> This is false. If you have the O(N) algorithm and it meets your 
> requirements, then using it is more efficient than implementing the O(log 
> N) algorithm. If you have both and the O(N) algorithm meets your 
> requirements, the effort to get your data into a form where you can apply 
> the O(log N) algorithm may still outweigh the time difference.
>
> As for "very small N", I have worked on systems where linear search was 
> efficient for sequences of 100,000 items.

And it isn't unusual that you *are* working mostly with small N. Case in 
point: The Janus/Ada parser table lookup routine. At one point, I tried to 
speed it up by implementing a binary search algorithm. (The table is 
logically a 2-dimensional array with the action being the element, but the 
majority of the elements are error markers [which have no information]. The 
usual storage is to omit those and store pairs of terminal/action. That's 
many times smaller for a language like Ada.) The binary search seemed to 
*slow down* parsing! I spent a week building instrumented versions, and 
determined that the average number of pairs was about 8, and the binary 
search (which necessarily was much more complex than a linear search) was 
slower until the number of pairs was about 20. So it only helped the largest 
states.

I ended up doing two other things instead: coding the lookup routine in 
assembler, which got the lookup loop down to about 5 instructions (it might 
be possible for a compiler to write that loop with fancy optimizations, but 
I've never seen one that could, and Janus/Ada certainly isn't able), and 
then I restructured the parse table to optimize lookups (terminals don't 
appear with a uniform frequency in code). The net result was that average 
lookup takes 1.6 iterations (of a 5 instruction loop); the binary version 
(which executed a lot more code) took somewhere around 3.5 on average.

The point is, simpler can often be better (especially as it is more likely 
to work the first time). And if it proves necessary to improve it, nothing 
really substitutes for instrumenting the actual application.

                                  Randy.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 18:17 String Buffer Kevin Chadwick
  2021-12-02 19:56 ` Jeffrey R.Carter
  2021-12-02 20:51 ` Simon Wright
@ 2021-12-03  8:11 ` Vadim Godunko
  2 siblings, 0 replies; 10+ messages in thread
From: Vadim Godunko @ 2021-12-03  8:11 UTC (permalink / raw)


On Thursday, December 2, 2021 at 9:17:38 PM UTC+3, kevc...@gmail.com wrote:
> In this thread bounded and unbounded get quite a bashing. 
> 
> "https://groups.google.com/g/comp.lang.ada/c/NINmFln-YS4/m/5De5DeUAAAAJ" 
> 
> I thought bounded looked useful but then I realised that it allocates the max immediately anyway. It may be useful in constrained environments but then I do not use Strings in constrained environments. 
> 
> Unbounded is said to be inefficient because it re-allocates. 
> 
> In Go they have strings.Builder. I assume that is what Text_Buffer is aimed to be. (Actually Go seems to have copied a lot from Ada such as AWS API, unless they both are similar to something else like JAVA). 
> 
> Is Text_Buffer usable today with GCC 11? 
> 
> strings.Builder in Go behaves similarly to unbounded in that it doubles the allocation as required but it only returns a string when needed and does not have string operations. You can Grow the builder to avoid re-allocations. 
> 
> "https://pkg.go.dev/strings#Builder" 
> 
> If possible without breaking all of the string functions (length and separate capacity) and Unbounded Strings had a Grow function, then wouldn't that relieve the efficiency issue? 
> 
> In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation most of the alleged 10% of the time that it useful, but it would be nice to know of and use something akin to strings.Builder, preferably from the standard library, if it is available?

For VSS.Strings.Virtual_String we used two kinds of optimization:

1. Short strings are stored without any memory allocation. It saves a lot of time. This is not very visible in multithread applications due to runtime cost of controlled objects; however it is very visible on manycore due to less amount of involved atomic operations.  How "short" string should be depends from underlying encoding, content and machine architecture, on modern 64bit systems when UTF-8 encoding is used it is 17 ASCII characters, or 8 Cyrillic characters, or 4 math characters. In context of Ada Language Server most of cases are such small strings.

2. It is possible to set capacity for the particular string object. Memory will be reallocated on next modification operation of the string object. This may be useful for large strings, when approximate size of the string is known and allows to save few allocate/move memory cycles on append. I don't know any real use cases of this feature right now.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: String Buffer
  2021-12-02 19:56 ` Jeffrey R.Carter
  2021-12-02 20:15   ` Dmitry A. Kazakov
@ 2021-12-03  8:31   ` ldries46
  1 sibling, 0 replies; 10+ messages in thread
From: ldries46 @ 2021-12-03  8:31 UTC (permalink / raw)


Op 2-12-2021 om 20:56 schreef Jeffrey R.Carter:
> On 2021-12-02 19:17, Kevin Chadwick wrote:
>>
>> Unbounded is said to be inefficient because it re-allocates.
>>
>> In any case avoiding unbounded strings is almost certainly in the 
>> realm of premature optimisation
> "Efficiency" is only meaningful in the context of a project's 
> quantitative timing requirements: an efficient implementation allows 
> the project to meet those requirements, while an inefficient 
> implementation does not.
>
> In the absence of such context, any claims of "inefficiency" (and 
> especially blanket claims such as "Unbounded_String is inefficient") 
> simply demonstrate the speaker's incompetence.
>
> For an example of context, a decade ago I worked on a soft-real-time 
> system that made extensive use of Unbounded_String and had no problem 
> meeting its timing requirements. Unbounded_String was clearly 
> efficient for that project. Since most projects that would use 
> Unbounded_String have even less restrictive timing requirements than 
> that system, it seems likely that Unbounded_String will be efficient 
> for them, too.
>
I don't like the word "Efficiency". because what doe it mean. In the 
time that I started programming 1966/1967 computers were relative slow 
and small. You did not like any kind of strings because they toke lots 
of memory and lots of power. Gradually computers started to become 
faster and bigger (in memory, smaller in size) that meant that string 
operations also became faster and so use of string operations became 
more normal. In my opinion programs that need a lot of string 
manipulation do not need to be very fast because in general there the 
output is the limiting factor. In most other programs string 
manipulation is only a small part of the program still becoming faster 
along with new developments. My statement is that string manipulation 
does no have to become more efficient because other factors do limit the 
efficciency of the program much more.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-12-03  8:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-02 18:17 String Buffer Kevin Chadwick
2021-12-02 19:56 ` Jeffrey R.Carter
2021-12-02 20:15   ` Dmitry A. Kazakov
2021-12-02 21:06     ` Jeffrey R.Carter
2021-12-02 21:45       ` Dmitry A. Kazakov
2021-12-03  0:49         ` Kevin Chadwick
2021-12-03  5:25       ` Randy Brukardt
2021-12-03  8:31   ` ldries46
2021-12-02 20:51 ` Simon Wright
2021-12-03  8:11 ` Vadim Godunko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox