LA AdaTEC Ada Fair '84 Report
From: colbert @ 1984-12-07 15:04 UTC
Report on the
L.A. AdaTEC Ada* Fair '84
Compiler Test Results
Bryce M. Bardin
Hughes Aircraft Company
Software Engineering Division
Ground Systems Group
Fullerton, CA
On June 30th, 1984, L.A. AdaTEC held its second annual Ada Fair. Again
this year, compiler vendors were invited to run a suite of test programs
selected by L.A. AdaTEC. Each vendor was asked to report its own
results in accordance with the rules supplied with the test suite. This
report summarizes the results reported by the vendors.
Source listings of the programs and copies of the rules were distributed
to the people who attended the Fair. They are now available on the
ARPAnet by logging into EV-INFORMATION at ECLB (with a password of EV)
and typing "HELP TESTS-ADA-FAIR-84" or by FTPing
<EV-INFO>TESTS-ADA-FAIR-84.HLP. As an alternative, L.A. AdaTEC, in the
person of Ed Colbert, will mail you the tests over usenet if you contact
him at "trwrb!trwspp!colbert". The test suite was assembled by Ed
Colbert (TRW), Gerry Fisher (IBM Research), and me.
The vendors who participated by running the tests were:
1) Data General Corporation (DG), running the DGC/Rolm ADE compiler
on a DG MV8000 under AOS/VS,
2) Irvine Computer Sciences Corporation (ICSC), running the ICSC-Ada
Compiler on a Gould 32/87, and
3) RR Software, Inc. (RR), running the JANUS/Ada compiler on an IBM
PC-XT under DOS.
This year, with the advent of more validated compilers, the tests were
chosen without trying to limit the Ada constructs used in any way. The
intent of the suite was to reveal the current status of Ada
implementations to the entire Ada community, to the extent this is
possible with a very small set of tests.
* Ada is a registered trademark of the U.S. Government,
Ada Joint Program Office.
Since we wished to enable vendors and end users alike to make simple
performance comparisons on a uniform and equitable basis, we assumed
that package Calendar was implemented. Because performance differences
that stem from slight changes in the source code are almost impossible
to evaluate, we established the rule that making unauthorized changes
to any test automatically removes a vendor from consideration on that
test.
Additionally, in order to challenge the vendors of validated compilers a
bit, we included a few tests of features that are needed in order to
build serious real-time embedded systems -- features that only a rather
complete Ada implementation would be likely to support. Where possible,
the tests were designed to be self-checking and to report their success
or failure.
The tests were checked out as far as possible with validated versions of
NYU Ada/Ed, although some features not supported by Ada/Ed were
simulated. In spite of our best efforts, two tests were clearly
incorrect as given to the vendors and, in accordance with the rules,
these tests were dropped from the suite. The Boolean vector "and" test
had two errors: "v2(N) := true;" should have been "v1(N) := true;" and
"vector_result(n) := v1(n) and v2(n);" should have been "vector_result
:= v1 and v2;". The derived type inter-conversion test had the record
representation clause and length clause commented out, which defeats the
purpose of the test. (Although it turns out that no vendor could have
performed this test even if the source text had been correct.) A third
test, the sets package, was challenged by Data General at the time their
results were submitted. Several experts have now agreed that the test
(and also version 1.2.9 of Ada/Ed) is in error, so the test has been
dropped. The ARPAnet version of the test suite has been corrected.
One group of tests attempted to produce serious timing results using
package Calendar. These tests were quite interesting because of the
problems in test construction they revealed. To ensure adequate
precision in the results, the vendors were instructed to modify the
loop counts to obtain significant net time differences. The criterion
used to decide whether a loop count was adequate assumed that the
resolution of the Clock function is determined by Duration'Small. The
tests therefore compared the net time with 100 times Duration'Small in
order to guarantee at least one percent precision in the average times.
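The measurement scheme described above can be sketched minimally in Python rather than Ada. The sketch times a loop containing the operation and subtracts the time of an otherwise identical empty control loop to get the net time. It then accepts the loop count only if the net time reaches 100 times Duration'Small. Several details are assumptions for illustration, not the Fair's actual test code: the control-loop subtraction, the helper names, and the use of `time.perf_counter` as a stand-in for `Calendar.Clock`.

```python
import time
from typing import Callable

# Stand-in for Calendar.Clock (assumption: any monotonic clock in seconds).
Clock = Callable[[], float]


def net_loop_time(op: Callable[[], None], count: int,
                  clock: Clock = time.perf_counter) -> float:
    """Time `count` calls of `op`, then subtract the time of an empty
    control loop with the same count, leaving the net time of the calls.
    (Hypothetical helper; the control-loop scheme is an assumption.)"""
    start = clock()
    for _ in range(count):
        op()
    test_time = clock() - start

    start = clock()
    for _ in range(count):
        pass  # control loop: loop overhead only
    control_time = clock() - start

    return test_time - control_time


def adequate_by_small(net_time: float, duration_small: float) -> bool:
    """The criterion the tests used: net time >= 100 * Duration'Small,
    intended to give about 1% precision in the average time."""
    return net_time >= 100.0 * duration_small


if __name__ == "__main__":
    n = 100_000
    net = net_loop_time(lambda: None, n)
    print(f"average per call: {net / n:.3e} s")
    print("adequate:", adequate_by_small(net, 2.0 ** -9))
```

The average time per operation is then the net time divided by the loop count. This criterion is only as good as the assumption that the clock really resolves Duration'Small.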
However, according to the Ada Reference Manual (ARM), "Duration'Small
need not correspond to the basic clock cycle, the named number
System.Tick" (ARM 9.6/4). Although the ARM does not define "basic clock
cycle", I interpret it to mean the resolution of the function
Calendar.Clock. On that interpretation, the comparison should instead
have been against 100 times the larger of Duration'Small and System.Tick.
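Under the interpretation above, the adequacy check becomes a comparison against 100 times the larger of the two values. The following is a hypothetical Python sketch. The function names are illustrative, and both quantities are assumed to be supplied in seconds.

```python
def precision_threshold(duration_small: float, system_tick: float,
                        intervals: float = 100.0) -> float:
    """Minimum net time (seconds) for roughly 1/intervals relative precision,
    treating the clock resolution as max(Duration'Small, System.Tick)."""
    return intervals * max(duration_small, system_tick)


def adequate(net_time: float, duration_small: float, system_tick: float) -> bool:
    """Corrected adequacy check for a timing loop's net time."""
    return net_time >= precision_threshold(duration_small, system_tick)


if __name__ == "__main__":
    # DG MV8000 values from the report: Duration'Small = 2**-9 s, System.Tick = 0.1 s.
    print(precision_threshold(2.0 ** -9, 0.1))
```

This check still trusts System.Tick. The DG implementation reports System.Tick as 0.1 seconds while its Clock resolves only 1.0 seconds. On that machine, even the corrected threshold (10 s) would fall a factor of 10 short of the 100 s actually needed.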
Since the disparity between the clock resolution and Duration'Small may
be very large (e.g., in the case of Data General it is 1.0 vs. 1/(2**9)
seconds, a ratio of 512 to 1), the results of the timing tests as
written are not guaranteed to be very accurate even when the test itself
announces that it "passed". It should be emphasized that the cause of
this problem is primarily poor test design.
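The following worked version of the Data General case assumes that each timed interval is uncertain by roughly one clock resolution unit r. It takes r = 1.0 s, the resolution stated above.

```latex
\begin{aligned}
T_{\min} &= 100 \times \text{Duration'Small} = 100 \times 2^{-9}\ \mathrm{s} = 0.1953125\ \mathrm{s} \\
\frac{r}{T_{\min}} &= \frac{1.0\ \mathrm{s}}{0.1953125\ \mathrm{s}} = 5.12 = 512\% \quad \text{(intended: } 1\%\text{)}
\end{aligned}
```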
The major reasons that compilers did not pass some tests can be simply
stated:
1) The test was not attempted. (We speculate that this is most likely
because some feature or features necessary to the proper functioning of
the test are not implemented or have significant bugs.)
2) The vendor was disqualified on the test for making unauthorized
changes to the source code. (Initially, all vendors were disqualified
on one or more tests for this reason. This was particularly likely for
the non-validated implementations, which need work-arounds for
unimplemented features in order to make a program compilable. However,
in a few cases there was no apparent reason for the vendor to modify
the code; in such cases we asked the vendor to re-run the test without
the modifications.)
3) The test was run correctly, but the results did not meet the
accuracy criterion, so the test itself indicated that it failed.
(This was generally due to poor test design.)
The following overall comments apply to the results from each of the
vendors individually:
1) The DG implementation has an apparent inconsistency in the
implementation of the Calendar.Clock function and the definition
of System.Tick. The value of System.Tick is 0.1 seconds and the
resolution of Clock is 1.0 seconds. I believe their implementation
to be incorrect. (DG says that they are aware of this discrepancy
and are taking steps to improve the resolution of their clock
function to equal System.Tick.) Errors were present in the output
format for type Duration and an apparent bug was revealed in the
operation of division of type Duration by type Integer.
2) The ICSC compiler, which is not yet validated, currently
implements the predefined type Duration as a (hidden) subtype of
Float and uses the floating point output routines. This leads to an
incorrect format for a Put of Duration values with both Fore and
Exp set to 0.
3) The RR compiler is also not yet validated. Contrary to the
benchmarking rules, no compilation or execution listings were
provided by RR. Their results have been compiled from the
summary they submitted.
How the vendors fared on each individual test is given in Table 1.
Most of the timing results reported by the vendors are summarized in
Table 2, regardless of whether the test was passed, "failed" due to
insufficient precision, or the vendor was disqualified on the test,
since these results are generally not too sensitive to the work-arounds
which may have been used. The original intent, to provide times
accurate to 1%, was not realized because of problems in test design.
Some of the times are only accurate to about one significant digit.
Therefore we are reporting the results in the exact format given by the
vendor, where possible, in order to avoid biasing the data further.
Interpretation of the data may be easier with the aid of the values of
the clock function resolution and Duration'Small, which are included in
the table along with their ratio. The greater the ratio of the
resolution value to Duration'Small, the less accurate the results would
be if the minimum iteration count that met the precision criterion were
used in the test. In general, the iteration counts used by the vendors
were greater than necessary to pass the Duration'Small criterion, but
not greatly so. All times are given in seconds.
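The Ratio row of Table 2 can be reproduced from the two rows above it. The Python sketch below uses the values as printed in Table 2, with DG's Duration'Small taken as exactly 2**-9 s, as stated earlier. The dictionary and function names are illustrative only. Suppose a net time just meets the 100 times Duration'Small criterion while the clock resolves only r seconds. Its uncertainty in percent is then r / Duration'Small, which is numerically the ratio itself.

```python
# Clock resolution and Duration'Small, in seconds, as reported in Table 2.
# DG's Duration'Small is taken as exactly 2**-9 s (printed as 1.95312E-03).
REPORTED = {
    "DG": (1.0, 2.0 ** -9),
    "ICSC": (1.66667e-02, 1.66667e-02),
    "RR": (0.0549, 0.01),
}


def resolution_ratio(resolution: float, duration_small: float) -> float:
    """Ratio of the clock resolution to Duration'Small (a pure number)."""
    return resolution / duration_small


def uncertainty_percent_at_minimum(resolution: float, duration_small: float) -> float:
    """Approximate uncertainty, in percent, of a net time equal to exactly
    100 * Duration'Small when the clock only resolves `resolution` seconds."""
    return 100.0 * resolution / (100.0 * duration_small)


if __name__ == "__main__":
    for vendor, (res, small) in REPORTED.items():
        print(f"{vendor:5s} ratio = {resolution_ratio(res, small):7.2f}  "
              f"~{uncertainty_percent_at_minimum(res, small):.2f}% at minimum count")
```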
Some of the size information supplied by the vendors is summarized in
Table 3. Because most vendors did not report all of the sizes
requested, only the size of the object module compiled for the test (the
columns labelled "Object") and the maximum memory size used (the columns
labelled "Memory") are given here. It should be noted that the DG data
include the stack/heap allocation in the size reported. All sizes are
given in (decimal) bytes.
One thing is clear from the results: all of the timing tests need
further refinement and, in some cases, drastic surgery to improve their
precision. In particular, besides checking the precision against both
System.Tick and Duration'Small, better strategies are needed for
measuring some of the I/O times.
Another problem is that some of the tests were nominally "failed" for
reasons of inadequate precision because iteration counts or array sizes
greater than the maximum the implementation can support would have been
required. This is manifestly unfair when the goal of a test is to
measure timing rather than capacity. Future tests should have a better
separation of test concerns, making sure that timing tests and capacity
tests are kept distinct, and designing timing tests to run properly on
machines with small word sizes and small address spaces wherever that is
feasible.
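One way to keep timing and capacity concerns separate is sketched below in Python as a hypothetical harness; it is not part of the Fair's suite. The harness doubles the iteration count until the net time reaches the precision threshold. If an implementation limit is reached first, it reports the measurement as capacity-limited rather than as a failure. Such a limit might be the largest supported Integer or array size. Several details are assumptions: the `measure` callback, the 32767 default limit (a 16-bit Integer'Last), and the outcome names.

```python
from enum import Enum
from typing import Callable, Tuple


class Outcome(Enum):
    PRECISE = "precise"            # net time met the precision threshold
    CAPACITY_LIMITED = "capacity"  # hit the limit first; timing kept, not "failed"


def calibrate(measure: Callable[[int], float], threshold: float,
              start_count: int = 1,
              max_count: int = 32767) -> Tuple[int, float, Outcome]:
    """Double the iteration count until measure(count) >= threshold.

    `measure` is a hypothetical callback returning the net time (seconds)
    for a given count. If `max_count` is reached first, the last
    measurement is returned flagged CAPACITY_LIMITED."""
    count = start_count
    while True:
        net = measure(count)
        if net >= threshold:
            return count, net, Outcome.PRECISE
        if count >= max_count:
            return count, net, Outcome.CAPACITY_LIMITED
        count = min(count * 2, max_count)
```

Reporting a capacity-limited result separately keeps the timing data usable on small machines while still recording that the precision goal was not reached.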
We need to iterate the test design and trial use process until the
results are satisfactory to users and implementers alike. I believe the
current set of tests will have served their purpose, in spite of their
obvious flaws, if they help to point us in the right direction.
Test Name                   Vendor:  DG       ICSC     RR
Ackermann's Function                 A[a,b]   A[a,c]   D[d,e]
Boolean Vector And Test              I        I        I
Binary Search                        P        N        N
Cauchy Matrices - Floating Point     F[f]     N        N
Cauchy Matrices - Fixed Point        F[f]     N        N
Cauchy Matrices - Universal Numbers  F[f]     N        N
Character Direct I/O                 P[a]     P[a]     D[d,e]
Character Enumeration I/O            P[a]     P[a]     N
Character Text I/O                   P[a]     P[a]     D[d,e]
Consumer/Producer                    P        N        N
Derived Type Inter-conversion        I        I        I
Floating Point Vector Addition       P[a]     F[a,g]   D[d,e]
Friendliness Test                    P[h]     N        N
Integer Direct I/O                   P[a]     P[a]     D[d,e]
Integer Text I/O                     P[a]     P[a]     D[d,e]
Integer Vector Addition              P[a]     F[a,g]   D[d,e]
Low Level Test                       N        N        N
Procedure Call Timing                P[a]     P[a]     D[d,e]
Quick Sort - Parallel                P        D[d]     N
Quick Sort - Sequential              P        D[d]     N
Readers/Writers Problem              P        N        N
Rendezvous Call Timing               P[a]     P[a]     N
Sets Package                         I[i]     I        I
Legend: P = Passed
        A = Anomalous (Program behavior was slightly anomalous)
        F = Failed
        N = Not Attempted
        D = Disqualified
        I = Invalid Test (Test Dropped)
Notes:
  a  Output had errors in format.
  b  Output had errors in values.
  c  Stack overflow occurred after Ackermann (3,7), but Storage_Error
     was not raised or handled.
  d  Disqualified due to source code changes.
  e  No listing was provided by the vendor.
  f  Compiler passed the syntax and semantics checking phases, but
     couldn't generate correct code.
  g  Array size could not be set large enough to give adequate timing
     precision. Otherwise, program executed correctly.
  h  Compiled and executed correctly, but no set/use errors (use of a
     variable before initialization) or 'hard' exceptions (exceptions
     which will always be raised by the program) were detected by the
     compiler. Procedure Dont_Do_It was not called in the generated
     code, but was included in the load module. The run-time
     environment did not identify the name of the exception which is
     deliberately raised by the program (Program_Error).
  i  Compiler diagnosed source errors. Vendor successfully challenged
     the validity of the test.
Table 1: Overall Results
Test Name              Vendor:  DG                ICSC              RR
                       Machine: DG MV8000         Gould 32/87       IBM PC-XT
Clock resolution (System.Tick)  1.0[a]            1.66667E-02       0.0549
Duration'Small                  1.95312E-03       1.66667E-02       0.01
Ratio (a pure number)           512.0             1.0               5.49
Ackermann's function:                                               3.26E-4[b]
  (3,1)                         0.00000E+00[c]    0.00000E+00[c]    --
  (3,2)                         0.00000E+00[c]    3.08059E-05[d]    --
  (3,3)                         0.00000E+00[c]    1.37056E-05[d]    --
  (3,4)                         9.70214E-05[d,f]  1.13187E-05[d]    --
  (3,5)                         2.35638E-05[d,f]  1.13887E-05[e]    --
  (3,6)                         3.48365E-05[d,f]  1.13214E-05       --
  (3,7)                         3.60249E-05[e,f]  1.15515E-05       --
  (3,8)                         3.62527E-05[f]    [g]               --
  (3,9)                         [h]               --                --
Character Direct I/O Write      1.00000E-04[d]    8.49966E-04       4.73E-3
Character Direct I/O Read       8.33333E-05[d]    5.20812E-04       3.63E-3
Character Enumer. I/O Write     1.33333E-03[e]    9.83294E-04       --
Character Enumer. I/O Read      5.60000E-03[e]    1.54994E-03       --
Character Text I/O Write        4.33333E-04[e]    4.79147E-05       1.54E-3
Character Text I/O Read         5.33333E-04[e]    9.41629E-05       1.40E-3
Float Vector Add                1.53846E-05[d]    0.00000E+00[c]    3.30E-4
Integer Direct I/O Write        2.70000E-04[e]    1.09579E-03       4.88E-3
Integer Direct I/O Read         1.10000E-04[e]    5.33312E-04       3.79E-3
Integer Text I/O Write          2.80000E-03       1.64993E-03       3.93E-3
Integer Text I/O Read           3.97500E-03       2.26658E-03       4.81E-3
Integer Vector Add              2.50000E-05[d]    3.33320E-06[d]    2.70E-4
No Parameter Call               1.50000E-05[d]    6.19975E-06       1.37E-4
In Parameter Call               1.50000E-05[d]    5.49978E-06       2.11E-4
Out Parameter Call              2.00000E-05[d]    5.89976E-06       1.77E-4
In Out Parameter Call           2.00000E-05[d]    6.06642E-06       1.77E-4
No Parameter Rendezvous         8.36666E-03       8.99964E-04       --
Notes:
  a  Clock resolution is 1.0, although System.Tick is 0.1 seconds.
  b  No individual results were provided by vendor.
  c  Net time was less than one resolution interval.
  d  Net time was at least 1 but less than 10 resolution intervals.
  e  Net time was at least 10 but less than 100 resolution intervals.
  f  Calculated by hand from intermediate results. (Due to a compiler
     bug the values printed were all zero.)
  g  Storage_Error exception not raised or handled. The system
     detected stack overflow and terminated the program.
  h  Terminated (as expected) by Storage_Error exception.
Table 2: Timing Results
Test Name              Vendor:  DG[a]           ICSC[b]         RR[b]
                       Size:    Object  Memory  Object  Memory  Object  Memory
Ackermann's function            --      348160  1792    75016   1540    86784
Binary Search                   --      243712  --      --      --      --
Character Direct I/O            --      251904  3392    81872   2531    90112
Character Enumeration I/O       --      251904  4104    77328   --      --
Character Text I/O              --      249856  3336    76560   2527    87680
Consumer/Producer               --      352256  --      --      --      --
Floating Point Vector Addition  --      948224  1744    74968   1467    86656
Friendliness Test               --      241664  --      --      --      --
Integer Direct I/O              --      251904  3392    81872   2528    90240
Integer Text I/O                --      249856  3352    76576   2557    87680
Integer Vector Addition         --      948224  1712    74936   1422    86656
Procedure Call Timing           --      249856  2080    75304   2083    87296
Quick Sort - Parallel           --      354304  3080    84648   --      --
Quick Sort - Sequential         --      243712  3296    84864   --      --
Readers/Writers Problem         --      356352  --      --      --      --
Rendezvous Call Timing          --      360448  1672    86824   --      --
Notes:
  a  Stack/heap storage is included in size.
  b  Stack/heap storage is not included in size.