comp.lang.ada
 help / color / mirror / Atom feed
* WE NEED A GOOD Ada SORT PACKAGE!
@ 1994-11-01 22:10 ferguson
  0 siblings, 0 replies; 6+ messages in thread
From: ferguson @ 1994-11-01 22:10 UTC (permalink / raw)



We are looking for a generic Ada sort package that will be efficient for 
extremely large numbers of records residing on disk (~700,000), occupying about 
20 MB of memory, which are locally out of order (i.e. the data are only 
unordered in a small region of the whole).  The application will reside on a 
UNIX workstation.  If there are commercially available sort packages, we'd sure 
like to know where they are!  We are not sure whether the GRACE or BOOCH 
components have any robust sort packages for large data sets.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: WE NEED A GOOD Ada SORT PACKAGE!
@ 1994-11-02 21:19 Bennett, Chip (KTR) ~U
  1994-11-03 17:42 ` Jahn Rentmeister
       [not found] ` <39bvj7$543@goanna.cs.rmit.oz.au>
  0 siblings, 2 replies; 6+ messages in thread
From: Bennett, Chip (KTR) ~U @ 1994-11-02 21:19 UTC (permalink / raw)


ferguson@TRON.BWI.WEC.COM wrote:

> We are looking for a generic Ada sort package that will be efficient for
> extremely large numbers of records residing on disk (~700,000), occupying
about
> 20 MB of memory, which are locally out of order (i.e. the data are only
> unordered in a small region of the whole).  The application will reside on
a
> UNIX workstation.  If there are commercially available sort packages, we'd
sure
> like to know where they are!  We are not sure whether the GRACE or BOOCH
> components have any robust sort packages for large data sets.

I am looking at a copy of "Software Components with Ada", the book that
accompanies the Booch components.  Under the topic of "external sorting", it
shows the specs for two utilities: natural merge sort and polyphase sort.  I
believe the book is available separately.  Any help?

(By the way, I know we're sensitive around here about not capitalizing Ada,
but since you were shouting in your "Subject", it probably would have been
OK.  :-)

*****************************************************************
* Chip Bennett, GDE Systems Inc | BennettC@j64.stratcom.af.mil  *
* USSTRATCOM/J64213             | Voice (402)294-7360           *
* 901 SAC Blvd, Suite 2B24      | FAX   (402)294-7912           *
* Offutt AFB, NE 68113-6600     | Opinions expressed are my own *
*****************************************************************



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: WE NEED A GOOD Ada SORT PACKAGE!
  1994-11-02 21:19 WE NEED A GOOD Ada SORT PACKAGE! Bennett, Chip (KTR) ~U
@ 1994-11-03 17:42 ` Jahn Rentmeister
  1994-11-05  5:43   ` Robert Dewar
       [not found] ` <39bvj7$543@goanna.cs.rmit.oz.au>
  1 sibling, 1 reply; 6+ messages in thread
From: Jahn Rentmeister @ 1994-11-03 17:42 UTC (permalink / raw)


Bennett, Chip (KTR) ~U (BennettC@J64.STRATCOM.AF.MIL) wrote:

: I am looking at a copy of "Software Components with Ada", the book that
: accompanies the Booch components.  Under the topic of "external sorting", it
: shows the specs for two utilities: natural merge sort and polyphase sort.  I
: believe the book is available separately.  Any help?

Maybe I don't get it right, but aren't these algorithm's designed for tape
drives? On a Disk using Direct_IO, one could for example use QuickSort.
(Because there is direct access.) There should be even faster possibilities
than a direct translation of quicksort, because one could use available RAM
to presort parts of the data. 

At least polyphase sort will work with sequential access and is bound to it. 
When using direct_io, much faster methods should be available. 
(OK, for variable-length data sets it won't work, i.e. text-files etc.)

Though, I don't know of any readily available packages that do it. But it
does not sound like a big thing to me. (I probably missed the point, eh?)

--
This .sig deliberately left empty.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: WE NEED A GOOD Ada SORT PACKAGE!
  1994-11-03 17:42 ` Jahn Rentmeister
@ 1994-11-05  5:43   ` Robert Dewar
  1994-11-06 15:28     ` Larry Kahn
  0 siblings, 1 reply; 6+ messages in thread
From: Robert Dewar @ 1994-11-05  5:43 UTC (permalink / raw)


Jahn, the idea of doing quicksort directly on a disk gives me a headache
just thinking about the poor disk arm!

no, you definitely can't do this if you are interested in efficiency,
even on disk files you definitely need some multi-phase merge sort.
For Realia-COBOL, I implemented the disk sort with a 64 way polyphase
merge, and it seems to be the fastest external sort around, or at least
was at the time I was paying attention to COBOL!




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: WE NEED A GOOD Ada SORT PACKAGE!
  1994-11-05  5:43   ` Robert Dewar
@ 1994-11-06 15:28     ` Larry Kahn
  0 siblings, 0 replies; 6+ messages in thread
From: Larry Kahn @ 1994-11-06 15:28 UTC (permalink / raw)


In article <39f622$p7d@gnat.cs.nyu.edu>, dewar@cs.nyu.edu (Robert Dewar) says:
>
>Jahn, the idea of doing quicksort directly on a disk gives me a headache
>just thinking about the poor disk arm!
>
>no, you definitely can't do this if you are interested in efficiency,
>even on disk files you definitely need some multi-phase merge sort.
>For Realia-COBOL, I implemented the disk sort with a 64 way polyphase
>merge, and it seems to be the fastest external sort around, or at least
>was at the time I was paying attention to COBOL!
>

I have a shell sort example of a generic in Ada but it does the sort in memory..

For the tests we Did it was faster than a quicksort probably due to the overhead
of the quicksort code (this on a dec compiler with around 5000 items)

don't remember where the trade off point was where quicksort was faster
but sure there was one.....

The original source was from the following book:

-- from Software Components With Ada, Grady Booch, page 469

Laurence G. Kahn
Senior Software Engineer
Dynamics Research Corp.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: WE NEED A GOOD Ada SORT PACKAGE!
       [not found] ` <39bvj7$543@goanna.cs.rmit.oz.au>
@ 1994-11-09  7:30   ` Richard A. O'Keefe
  0 siblings, 0 replies; 6+ messages in thread
From: Richard A. O'Keefe @ 1994-11-09  7:30 UTC (permalink / raw)


> ferguson@TRON.BWI.WEC.COM wrote:

> We are looking for a generic Ada sort package that will be efficient for
> extremely large numbers of records residing on disk (~700,000), occupying about
> 20 MB of memory, which are locally out of order (i.e. the data are only
> unordered in a small region of the whole).  The application will reside on a
> UNIX workstation.  If there are commercially available sort packages,
> we'd sure like to know where they are!

It would be most illuminating if you could tell us what is wrong with the
UNIX sort(1) utility.  If your records (28.5 bytes average, from your figures)
are normal lines of readable text (and if they aren't, you have some nasty
portability problems waiting to bite you) it should be able to do the job.

I have a little utility that I posted to the net a couple of years ago,
written in C.  It's called nsort, and it operates by
    - reading the entire file into memory as a singly-linked list
    - performing a bottom up "natural merge" sort.
      This is one of the best internal sorts around anyway (yes, I do mean
      that it usually beats "quick"sort) and it is especially good with
      mostly-ordered data
    - then it writes the lines out.
I get rather large speedups compared with sort(1).  For example, I just
created a test file:
g% wc /tmp/TEXT
  259533  940803 7206497 /tmp/TEXT
7.2 Mbytes, ~260 000 lines, ~27.8 bytes/record including line terminator
so the average record size is not too far off your figure.  On the SPARC
Solaris 2.3 system that I use, 
    Command                     User System (time in seconds)
    sort  </tmp/TEXT >/dev/null   38+1	-- to my surprise, -y didn't help
    gsort </tmp/TEXT >/dev/null   16+4	-- GNU sort (in textutils.tar.gz)
    nsort </tmp/TEXT >/dev/null   11+1

That's a speedup ratio of 39/12 > 3, which is rather gratifying, because
sort(1) is supposed to be fairly well tuned.  The guts of nsort is about
60 lines of C.  It would be about the same in Ada.

-- 
"The complex-type shall be a simple-type."  ISO 10206:1991 (Extended Pascal)
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~1994-11-09  7:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1994-11-02 21:19 WE NEED A GOOD Ada SORT PACKAGE! Bennett, Chip (KTR) ~U
1994-11-03 17:42 ` Jahn Rentmeister
1994-11-05  5:43   ` Robert Dewar
1994-11-06 15:28     ` Larry Kahn
     [not found] ` <39bvj7$543@goanna.cs.rmit.oz.au>
1994-11-09  7:30   ` Richard A. O'Keefe
  -- strict thread matches above, loose matches on Subject: below --
1994-11-01 22:10 ferguson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox