From: Paul Rubin
Newsgroups: comp.lang.ada
Subject: Re: How to get Ada to “cross the chasm”?
Date: Tue, 08 May 2018 22:02:32 -0700
Message-ID: <87h8nhwhef.fsf@nightsong.com>

"Dmitry A. Kazakov" writes:
> If referenced object counts are > 1 they are not going to be
> finalized. So the point stands, no locking is ever required upon
> finalization.

I'm still perplexed by this. You have to decrement all those refcounts, and while that is happening, other threads may also be messing with them. You need locks (at least in the form of atomic instructions, which rely on hardware locking and are much slower than ordinary instructions) to prevent data races. And when you decrement the counts, some of them might reach zero, so their objects need freeing (and traversal). That too can be arbitrarily complicated.
Niklas mentions you can possibly do that incrementally, but that sounds complicated, and GC algorithms can do it too. So I don't see why not to use one.

> For large objects there is usually additional knowledge about their
> allocation order which is not available for the compiler

Meh, maybe, though it's unclear whether this helps enough to matter in practice, so it would have to be justified by concrete evidence on a case-by-case basis.

> Not at all. This is an old discussion about up-front analysis and
> design vs. "spinal-cord-programming". Ada was designed for people who
> do not consider investing their time in software design useless.

But in this case it sounds like you're spending the effort on solving a problem that someone else already solved. If your application has a matrix and you need its inverse, you can call a general-purpose solver from a math library, or you can write a special one that exploits some property of your application's matrix. Will the special solver be of practical benefit even if it's somehow objectively better (say, saving a cpu-millisecond when it's called once a week)? Very possibly not. Even on your specific problem, will it *actually* beat the general-purpose solver that was optimized for years by specialist numerics geeks? Again, very possibly not. I believe the default presumption is to prefer the general-purpose one. For one thing, you don't have to redesign it when your application changes and the matrix acquires different properties.

GC is like the general-purpose math library: highly tuned and optimized, and probably adequate even for specific problems where you can find a way to beat it, which you might not be able to. Why spend your time on the one-off solution before encountering concrete problems with the general one?

> I don't want objects moving in the memory that is for sure.
> It is a huge distributed performance hit

There's enough experience with these GCs that a claim of a significant performance hit is only credible if it's backed by profile data showing the GC taking too much time for that particular app. The usual advice for Java is to configure the GC so it uses around 10% of the cpu cycles (assuming you have enough memory). Even if a non-defragmenting scheme uses 0% of the cycles, you're likely to lose more than 10% to the cache misses that a compacting scheme prevents. So for a particular program this question can only be settled by benchmarks, but the general pattern of observations over lots of different programs is that the GC tends to win.

> On Intel it could be fetch-and-add. Anything a modern processor has is
> in order of magnitude faster than any GC implementation,

Ok, it looks like there's LOCK XADD, though that's less powerful than LOCK CMPXCHG. You need the LOCK prefix either way, but it does look like on recent x86's LOCK is less expensive than I remembered, so maybe you're onto something. LOCK CMPXCHG on Skylake-X has 10 cycle latency vs. 9 cycles for LOCK XADD, according to p. 252 of http://agner.org/optimize/instruction_tables.pdf . That's actually pretty good; I thought it was much worse. But an ordinary ADD is 1 cycle and can often be overlapped with other instructions.

> No, that is irrelevant. Determinism is a property of the system and
> not of its inputs. Consider it a black box. You feed the inputs and
> get the outputs. How many little threads are in the box does not
> matter.

The system includes the program and its input sources. My usual picture of a concurrent system is a network server connected to thousands of clients over the internet, so the internet and its random delays are part of the system. It can't usefully be seen as deterministic.