From: Paul Rubin
Newsgroups: comp.lang.ada
Subject: Re: How to get Ada to “cross the chasm”?
Date: Tue, 08 May 2018 22:02:32 -0700
Message-ID: <87h8nhwhef.fsf@nightsong.com>

"Dmitry A. Kazakov" writes:
> If referenced object counts are > 1 they are not going to be
> finalized. So the point stands, no locking is ever required upon
> finalization.

I'm still perplexed by this. You have to decrement all those refcounts, and while that is happening, other threads may also be messing with them. You need locks (at least in the form of atomic instructions, which rely on hardware locking and are much slower than ordinary instructions) to prevent data races. And when you decrement the counts, some of them might reach zero, so their objects need freeing (and traversal). That too can be arbitrarily complicated.
Niklas mentions you can possibly do that incrementally, but that sounds complicated, and GC algorithms can do it too. So I don't see why not to use one.

> For large objects there is usually additional knowledge about their
> allocation order which is not available for the compiler

Meh, maybe, though it's unclear whether this helps enough to matter in practice, so it would have to be justified by concrete evidence on a case-by-case basis.

> Not at all. This is an old discussion about up-front analysis and
> design vs. "spinal-cord-programming". Ada was designed for people who
> do not consider investing their time in software design useless.

But in this case it sounds like you're spending the effort on solving a problem that someone else already solved. If your application has a matrix and you need its inverse, you can call a general-purpose solver from a math library, or you can write a special one that exploits some property of your application's matrix. Will the special solver be of practical benefit even if it's somehow objectively better (say, saving a cpu-millisecond when it's called once a week)? Very possibly not. Even on your specific problem, will it *actually* beat the general-purpose solver that was optimized for years by specialist numerics geeks? Again, very possibly not. I believe the default presumption is to prefer the general-purpose one. For one thing, you don't have to redesign it when your application changes and the matrix acquires different properties.

GC is like the general-purpose math library: highly tuned and optimized, and probably adequate even for specific problems where you can find a way to beat it, which you might not be able to. Why spend your time on the one-off solution before encountering concrete problems with the general one?

> I don't want objects moving in the memory that is for sure.
> It is a huge distributed performance hit

There's enough experience with these GCs that a claim of a significant performance hit is only credible if it's backed by profile data showing the GC taking too much time for that particular app. The usual advice for Java is to configure the GC so it uses around 10% of the cpu cycles (assuming you have enough memory). Even if a non-defragmenting scheme uses 0% of the cycles, you're likely to lose more than 10% to the cache misses that a compacting scheme prevents. So for a particular program this question can only be settled by benchmarks, but the general pattern of observations over lots of different programs is that the GC tends to win.

> On Intel it could be fetch-and-add. Anything a modern processor has is
> in order of magnitude faster than any GC implementation,

Ok, it looks like there's LOCK XADD, though that's less powerful than LOCK CMPXCHG. You need the LOCK prefix either way, but it does look like on recent x86's LOCK is less expensive than I remembered, so maybe you're onto something. LOCK CMPXCHG on Skylake-X has 10 cycle latency vs. 9 cycles for LOCK XADD, according to p. 252 of http://agner.org/optimize/instruction_tables.pdf . That's actually pretty good; I thought it was much worse. But an ordinary ADD is 1 cycle and can often be overlapped with other instructions.

> No, that is irrelevant. Determinism is a property of the system and
> not of its inputs. Consider it a black box. You feed the inputs and
> get the outputs. How many little threads are in the box does not
> matter.

The system includes the program and its input sources. My usual picture of a concurrent system is a network server connected to thousands of clients over the internet, so the internet and its random delays are part of the system. It can't usefully be seen as deterministic.