comp.lang.ada
* Re: ADA Performance
@ 1993-04-29 16:12 Robert Dewar
  0 siblings, 0 replies; 8+ messages in thread
From: Robert Dewar @ 1993-04-29 16:12 UTC (permalink / raw)


The subject line of this thread is seriously misleading. The post had nothing
to do with Ada performance per se, but rather with the performance of a
particular Ada compiler compared to a particular Fortran compiler. Well no
one would claim that Ada (or Fortran for that matter) has a secret guarantee
that it is impossible to write slow compilers! Indeed we all know
counterexamples.


But that should not lead one to believe that somehow there is something
inherently wrong with Ada performance. Certainly for example in GCC, if
you write a program in C, and then you write the same program in Ada, using
GNAT, you will get exactly the same optimized code because it's going through
the same back end. At least that's true if you use roughly the same level of
abstraction. Clearly introducing extra levels of non-inlined calls will slow
things down, but many Ada semantic features like packages, overloading,
generic instantiations, won't affect the code in any way, since they are
essentially front end abstractions that are lost long before the code 
generator is reached. Similarly if you have checks on, you can expect a
(not terribly big) degradation -- FORTRAN looping type code is actually a
best case for minimal impact of checks in a decent Ada compiler.

That being said, comparing performance on type Complex between Fortran and
Ada 83 is definitely a worst case comparison from Ada's point of view. That's
because Fortran has the Complex abstraction built in, and it is always easier
to get a built-in abstraction working at high efficiency. Ada 9X recognizes
this particular problem by providing Complex as a predefined abstraction in
the numerics annex. This doesn't guarantee efficient implementation, but in
practice it will help.

What would be useful from the original poster, is not so much the names of
the products involved, but rather an analysis of where the inefficiency is
coming from. Random mucking with the sources won't give this information.
What is really needed is to look at and analyze the resulting object code.

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: ADA Performance
@ 1993-04-29 17:08 Jeffrey M. Creem x5700
  0 siblings, 0 replies; 8+ messages in thread
From: Jeffrey M. Creem x5700 @ 1993-04-29 17:08 UTC (permalink / raw)


>         So the other day he asks how things are going and I
>         squirm as I try to explain why my Ada Compiler generates
>         code 3 to 8 times slower than FORTRAN!!!
>              ================================

>be more maintainable. In a just world ADA should generate code as
>optimized as FORTRAN when all exceptions are suppressed. However if
>the code is more than 50% slower I start having problems and I
>go ballistic at 2 times.


  While I should reserve judgement until I have run these tests myself
there are a few things that need to be said:

1) I can only assume that you chose the correct compiler switches to
   optimize and suppress all range checks since you did not mention
   the compiler or switches or even include pragma Suppress_All in
   the source code.  On our compiler, VADS (by Verdix), on both a Sun
   and a Motorola 68040, range check suppression and full optimization
   along with a few pragma Inlines usually cut execution time in half.

2) Size of the floats. Since you didn't explicitly rep spec the size of
   your Ada floats but used "long_float", I can make no assumption about
   the size that you got. On some compilers Float is 64 bits (and
   Short_Float is 32 bits), so Long_Float might be some awful
   software-emulated float, or it may be exactly what you expect.

3) Aggregate assignment.  Not all Ada compilers do this well.  VADS Ada for
   instance creates the aggregate on the stack and then copies it into
   the destination. On a short structure like a complex this might not
   be too bad, but put that in a loop and you can kiss performance
   good-bye. Although the aggregate assignment is obviously the
   preferable style, with some compilers it is not a good thing.

4) Timing: In some compilers Calendar.Clock is not the best way to
   time things. It can often be slow. I think it is ok here since the
   amount of work done in the complex loop is large compared to
   any amount of overhead I could think of in Calendar.Clock.

5) Object Code: A good thing to do might be to look at the object
   code that is generated by both compilers for a small subset of the
   program. It is not fun and not something that one normally likes to
   do but it may give you some hint as to why this is happening. It
   is possible that the Ada compiler you have is not the most recent
   for the processor you are on and is not using the best instructions
   for the given processor (i.e. not using a co-processor).

  All this said, in the end the Ada could very well be much slower, but
it shouldn't have to be that way. If you are going to ask for timings
on things, you should be sure that the code you put out will generate
what you expect on a wide variety of platforms (i.e. the comment about
the short_float/long_float ...)

Jeff Creem

* Re: ADA performance
@ 1993-04-30 17:18 Wes Groleau X7574
  0 siblings, 0 replies; 8+ messages in thread
From: Wes Groleau X7574 @ 1993-04-30 17:18 UTC (permalink / raw)


There is a saying that the first step in learning to do genealogy is learning
to spell it (A little teasing about the subject line).

Seriously, y'all had good responses about the performance problem.  One more
factor, though:  To expand and generalize on the comments about a) size of
floats, b) use or non-use of co-processor, c) built-in abstraction vs.
hand-coded abstraction --  Some computers (such as the high-end VAXen) and some
co-processors have complex as a simple data type.  In such a system, a complex
multiplication may be a single instruction:  certainly more efficient than
four floating-point multiplications and two floating-point additions,
i.e. (a+bi)*(c+di) = [ac-bd]+[ad+bc]i, where the + outside the brackets
doesn't count.  One complex add is probably more efficient even than two
floating-point adds ( [a+c] + [b+d]i ).  Any optimizer smart enough to
recognize the Ada record or two-float versions as a complex operation is
probably smart enough to "recognize" complex operations where there
aren't any.  All four of these reasons (a, b, c, and mine) are why many
compiler vendors provide complex math packages.

Some of these vendor packages use assembler or pragma INTERFACE for the body.
Verdix, unfortunately, uses the Ada record approach with the body in Ada.
(Also, there is a coding error and a STUPID design error in the Verdix complex
package--e-mail me if you can't find it.)

Conclusions:  1. Don't compare two compilers by using non-equivalent code.
              2. RTFM - with Verdix, you have to read it VERY carefully to
                 find out they have a complex math package.
             
Wes G.

(Back to our regularly scheduled info-mercial about marketing Ada.)

* Re: ADA Performance
@ 1993-05-03 11:54 cis.ohio-state.edu!news.sei.cmu.edu!firth
  0 siblings, 0 replies; 8+ messages in thread
From: cis.ohio-state.edu!news.sei.cmu.edu!firth @ 1993-05-03 11:54 UTC (permalink / raw)


In article <1993Apr28.204735.19177@netfs.dnd.ca> BERRYMAN@orca.drep.dnd.ca (DON BERRYMAN) writes:

>begin
>
>   for i in v1'range loop
>       v1(i) := cmplx(1.0,2.0);
>       v2(i) := cmplx(0.1,0.2);
>   end loop;
>
>   start := clock;                       -- Note the start time
>
>   for n in 1..100 loop                  -- Do the Vector Sum 100 times
>       a := cmplx (0.0, 0.0);
>       for i in v1'range loop
>           a := a + v1(i) * v2(i);
>       end loop;
>   end loop;

As a matter of interest, did any of the compilers optimise away the
entire program?  In principle, value tracking and constant folding
can compute the final value of A at compile time.

* Re: ADA Performance
@ 1993-05-03 21:21 Robert I. Eachus
  0 siblings, 0 replies; 8+ messages in thread
From: Robert I. Eachus @ 1993-05-03 21:21 UTC (permalink / raw)


     I'd like to go a little further than Robert Dewar and say that
this test violates several important principles about interlanguage
benchmarking.  The first and most important is that you must compare
two versions of the SAME program with the same data.  The difference
in the data I'll assume was a typo, but the difference in programs is
major.  All the Ada versions except case8 and case9 start the timed
loop with a function call, while the FORTRAN version starts with a
static value.  On most compilers I know of this is enough to defeat
movement of code out of the loop...

    Second, use the same clock!  Wall clock time is okay, but know
what you are doing and use a dedicated machine.  (A dedicated machine
is usually needed for other reasons...)  Usually when comparing two
programming languages you either have to do some implementation
dependent coding to get to the same OS clock call, or time the
execution time for the entire benchmark in the OS environment, such as
the Unix "time;test_program;time".

    Third, use the features of each language as they are expected
to be used.  If you are benchmarking FORTRAN 77 vs. Ada 83, the Ada
program should use (user defined) vector operations instead of loops
in the main program and multiple calls to user written subroutines.
(If the comparison is to FORTRAN 90, then both programs should use
vector operations...)

     I won't write and run such a version because of the other
problems with this benchmark, but it is relatively easy:

     type Complex is private;
     type Complex_Vector is array (Integer range <>) of Complex;
     ...
     function "*"(L,R: Complex_Vector) return Complex;
     ...

     Now a good Ada compiler will do pretty well, since it can keep
everything except the big vectors in registers.  Of course, a good
FORTRAN 90 compiler will skip 99 iterations of the outer loop. :-)

--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...

* Re: ADA Performance
@ 1993-05-04 15:03 cis.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!howland.
  0 siblings, 0 replies; 8+ messages in thread
From: cis.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!howland. @ 1993-05-04 15:03 UTC (permalink / raw)


In article <1993May4.142456.13012@convex.com> sercely@convex.com (Ron Sercely) writes:

>In the specific benchmark posted, IMHO, the BIGGEST difference between
>the Ada and FORTRAN implementation is the constraint checking.  The FORTRAN
>code does NOT check array bounds, nor bounds of the individual elements.  Ada,
>on the other hand, MUST check both.

I don't think so.  In the benchmark code, both loops were written

	for i in v1'range loop

and both v1 and v2 were declared of the same subtype.  Any Ada compiler
that introduced an index range check in such an obviously safe situation
would deserve to fail miserably.

* Re: ADA Performance
@ 1993-05-04 15:18 Boris Pelakh
  0 siblings, 0 replies; 8+ messages in thread
From: Boris Pelakh @ 1993-05-04 15:18 UTC (permalink / raw)


In article <1993May3.075449.12264@sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>As a matter of interest, did any of the copmpilers optimise away the
>entire program?  In principle, value tracking and constant folding
>can compute the final value of A at compile time.

Actually, I had to put a use of A (a call to a routine) after the timing
ended in order to get any results -- otherwise, everything other than the
timing got tossed.


-- 
Boris Pelakh		Ada Project Leader          pelakh@convex.com
		     Convex Computer Corporation
"If winning isn't important, why keep score ?"	-- Lt. Worf, Star Trek TNG.
			

* Re: ADA Performance
@ 1993-05-04 19:31 Robert I. Eachus
  0 siblings, 0 replies; 8+ messages in thread
From: Robert I. Eachus @ 1993-05-04 19:31 UTC (permalink / raw)


   I had said:

   >>    ...The first and most important is that you must compare two
   >> versions of the SAME program with the same data.

   And later I said...

   >>    Third, use the features of each language as they are expected
   >> to be used...

In article <1993May4.080922.22901@lth.se> dag@control.lth.se (Dag Bruck) writes:

   > You either have to compare compilers with programs that are as similar
   > as possible, or compare languages by writing "idiomatic" code in each
   > language.  I think both kinds of comparisons are useful and complement
   > each other.

   Not a contradiction, just a fundamental problem with cross-language
benchmarking.  First you need two programs which accomplish the same
functional operations in the same order but are written in different
languages.  Second (or third :-) they should be written to take
advantage of the features of the language, not to avoid otherwise
normal operations.  For a good example, assume I decide to use a 1024
point Fourier Transform as a benchmark.  Okay, I should use the same
algorithm for both languages (assume FFT).  If I am going to use a
complex transform in FORTRAN I shouldn't do the transform modulo a
large integer in Ada.  I should use the same input data, read from the
same file, etc., etc.  But the Ada code might be much more elegantly
written as a set of recursive calls using vector operations...as long
as I use the same algorithm to get the same answers.  (With reals they
may not be exact, but they should be very close.  In this case I
actually prefer to do the transform and the inverse and measure the
error...)

   On the other hand, if I want to sum the first 1,234,567
integers to benchmark Ada vs APL, I have an insoluble problem.  In APL
I will write:
       
       N <-- +/iota 1234567  

    and be very surprised to get anything other than an instant
response.  (All APL systems I know of use a compact representation for
such vectors, and compute the sum from the three numbers actually
generated.)  What is the equivalent code in Ada or FORTRAN?  Writing
an explicit loop is probably wrong, and writing (X * (X+1)) / 2 is
worse.  So this particular benchmark is inappropriate for comparing
these languages.

    This was the problem with the benchmark proposed in the article
that started all this.  A truly aggressive FORTRAN compiler can get rid
of almost everything, while no sensible Ada compiler is going to try
to determine, at compile time, whether or not an exception is going
to be raised.  Put in vectors which contain many different values, and
rotate the vectors so that you get 100 different answers, print all of
the answers out, and you might get a valid benchmark.
--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...
