comp.lang.ada
 help / color / mirror / Atom feed
From: alice!peju@ucbvax.Berkeley.EDU  (Peter Juhl)
Subject: Re: Hoare's gripes about Ada (should be so what)
Date: 1 Sep 93 22:32:42 GMT	[thread overview]
Message-ID: <26475@alice.att.com> (raw)

I apologize if this appears twice in different forms, I posted
a preliminary version. Doug McIlroy had numerous suggestions
for improvements, I hereby send a more readable article (the first
one should have been canceled, but one never knows.) 

I think that it has some interest.

Colin James's remarks about quicksort vs distribution sort (hereafter
called radix sort) must be tempered according to circumstances.  It is
not clear exactly which big-O estimates he referred to, but it is
worth warning that the occasionally made claim that quicksort takes
time N log N while radix sort only takes time N makes sense only if
you give entirely different interpretations to N in the two cases.

If you have N records comprising a total volume of V "digits" of data,
then performance scales like

        quicksort, expected:  N log N comparisons
        quicksort, worst:  N^2 comparisons
        big-end radix, expected:  N log N digit inspections
        big-end radix, worst:  V digit inspections
        little-end radix, always:  V digit inspections

V is usually bigger - often much bigger - than N log N.

If you are comparing strings, comparisons take expected time at least
log N, not time 1.  Thus for strings quicksort has expected time N
log^2 N. The worst case is VN.

So it all depends on what you are sorting.  If keys can be represented
as strings, and keys must be compared as strings, then big-end radix
sort is a good thing.

If it should happen that V is not much bigger than N log N and keys
are essentially strings, then little-end radix is a good thing.

Radix sorting has considerable bookkeeping overhead.  Under the
conditions set forth above, it is definitely a win, but not without
substantial programming effort.  So quicksort is not dead:  it is easy
to get right with reasonable efficiency, and it is the only game in
town when keys cannot be compared as strings.

A little-known hybrid of radix sort and quicksort gets around the
problem of string comparison taking expected time log N, at the cost
of a larger scaling constant, without the bookkeeping hassle of radix
sorting.  Partition on first digits into three parts
(less,equal,greater).  In the less and greater parts partition
recursively on first digits.  In the equal part partition sort
recursively on second digits and so on.

Both research Unix and BSD Unix currently use radix sort in their sort
utilities.

All this and more can be found in the article, "Engineering radix
sort", by McIlroy, Bostic, and McIlroy in "Computing Systems" 6 (1993)
5-27.  Another recent article is Davis, "A fast radix sort", in
"Computer Journal" 35 (1992) 636-642.  Both articles contain effective
programs written in C. For a fast C qsort, based on quicksort, see
Bentley and D McIlroy to appear next November in "Software Practice
and Experience".  For the speed-champion qsort (which isn't based on
quicksort) see P McIlroy in the proceedings of the Symposium on
Discrete Algorithms, Austin, 1993.

Incidentally, Hoare is an excellent software engineer.

--- peter (peju@research.att.com)

             reply	other threads:[~1993-09-01 22:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1993-09-01 22:32 Peter Juhl [this message]
  -- strict thread matches above, loose matches on Subject: below --
1993-09-13 13:32 Hoare's gripes about Ada (should be so what) Wes Groleau x1240 C73-8
1993-09-10 18:08 Dag Bruck
1993-09-10  0:49 Michael Feldman
1993-09-09 21:07 Robert Dewar
1993-09-09 18:53 Dag Bruck
1993-09-03 19:44 Wes Groleau x1240 C73-8
1993-09-02  3:10 Michael Feldman
1993-09-02  2:40 Robert Dewar
1993-09-02  2:38 Robert Dewar
1993-09-01 17:02 Mark A Biggar
1993-09-01 14:15 agate!howland.reston.ans.net!darwin.sura.net!jabba.ess.harris.com!dr3w!sm
1993-09-01 13:14 Ste ve Masticola
1993-09-01  3:03 Michael Feldman
1993-08-26  3:04 Colin James 0621
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox