baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I was wondering does anyone know a trick to reduce memory requirement during a bucket sort without significantly reducing the speed

Let say I have 30 numbers:

2 53 23456 75434532 233 45654 665446 33 234 5 6665 3 2 23 43 62553 27263 2 33 75434532 987762 34455 22 3345 24 567 54 43 12 23
in order to sort them using bucket sort algorithm i would need to either create an array of size 75434532 and assign each number to a bucket. duplicates are resolved using linked lists. another strategy is to avoid arrays and create a linked list and then upon insertion re-assign pointers. However in order to figure out where a number needs to be inserted a binary search needs to be implemented (so O(log N) algorithm instead array based O(1) -> N -average size of the list) plus if the procedure is repeated many times over each allocated memory unit has to be freed (erased) after the sort in order to be used again without any memory leaks (ok perl has a GC so that is not that difficult to do - but still some time needs to be wasted in order to do it).

are there any other solutions or is that it?? Are there any other fast O(M) algorithms fo sorting (M - the number of elements)?

thnx

Replies are listed 'Best First'.
Re: Bucket sort - reducing memory requirements and preserving speed
by BrowserUk (Patriarch) on Jan 13, 2015 at 12:49 UTC
    I was wondering does anyone know a trick to reduce memory requirement during a bucket sort without significantly reducing the speed

    How? (which module or post code); and why? -- reasoning -- are you using a "bucket sort" from Perl?

    Given that:

    • Your data is so unsuited to a bucket sort:

    • Perl's in-built (merge) sort (since 5.8x???) will sort numerical data quite quickly for ranges that most hardware will handle;
      #! perl -slw use strict; use Time::HiRes qw[ time ]; my @data = qw[ 2 53 23456 75434532 233 45654 665446 33 234 5 6665 3 2 23 43 62553 27263 2 33 75434532 987762 34455 22 3345 24 567 54 43 12 23 ]; @data = ( (@data) x 10 ); my $start = time; @data = sort{ $a <=> $b } @data; printf "sorting %10u integers took: %.6f\n", scalar( @data ), time() - + $start; @data = ( @data ) x 10; $start = time; @data = sort{ $a <=> $b } @data; printf "sorting %10u integers took: %.6f\n", scalar( @data ), time() - + $start; @data = ( @data ) x 10; $start = time; @data = sort{ $a <=> $b } @data; printf "sorting %10u integers took: %.6f\n", scalar( @data ), time() - + $start; @data = ( @data ) x 10; $start = time; @data = sort{ $a <=> $b } @data; printf "sorting %10u integers took: %.6f\n", scalar( @data ), time() - + $start; @data = ( @data ) x 10; $start = time; @data = sort{ $a <=> $b } @data; printf "sorting %10u integers took: %.6f\n", scalar( @data ), time() - + $start; __END__ C:\test>1113070 sorting 300 integers took: 0.000152 sorting 3000 integers took: 0.000316 sorting 30000 integers took: 0.001214 sorting 300000 integers took: 0.018363 sorting 3000000 integers took: 0.192435

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      Running your code on my box:

      karls-mac-mini:monks karl$ ./kgb-1113079.pl sorting 300 integers took: 0.000077 sorting 3000 integers took: 0.000099 sorting 30000 integers took: 0.000770 sorting 300000 integers took: 0.008718 sorting 3000000 integers took: 0.084943

      Seems like you got some slow machine bro ;-)

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

        Seems like you got some slow machine bro ;-)

        Indeed. It's a 7 year old Core™2 Quad Processor Q6600 running at 2.4GHz; but its more than adequate for development.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re: Bucket sort - reducing memory requirements and preserving speed
by salva (Canon) on Jan 13, 2015 at 12:41 UTC
    You can not implement bucket sort using linked lists while still being O(N).

    The right solution is to use bigger buckets and then to recurse.

    Anyway, is this home work? because otherwise you should probably go for some CPAN module, for instance Sort::Key::Radix:

    use Sort::Key::Radix qw(isort); my @data = map int, qw(2 53 23456 75434532 233 45654 665446 33 234 5 6 +665 3 2 23 43 62553 27263 2 33 75434532 987762 34455 22 3345 +24 567 54 43 12 23); my @sorted = isort @data; print "@sorted";
Re: Bucket sort - reducing memory requirements and preserving speed
by Anonymous Monk on Jan 13, 2015 at 12:31 UTC

    Some more background would be nice: Why do you need a bucket sort, or why isn't Perl's sort appropriate for whatever your application is? If you really need to implement your own algorithm, what do you expect your data to usually look like?

    See also comparison of sorting algorithms on Wikipedia.

Re: Bucket sort - reducing memory requirements and preserving speed
by locked_user sundialsvc4 (Abbot) on Jan 13, 2015 at 14:50 UTC

    The bucket algorithm depends, as you know, upon first distributing the data into “buckets,” then, by some unspecified means, sorting each bucket.   Many other sorting algorithms designed for use with very large datasets also work in part by subdividing the inputs into smaller, more manageable chunks that can then be sorted e.g. “in memory,” but their approach toward that splitting process are much more aware of clustering of key distributions, to which simple-minded algorithms like Bucket are especially prone.

    In short, if you want a practical sort for in-memory work, the sort verb is hard to beat ... and “in memory” is often sufficient in today’s multi-gigabyte RAM sizes.   Otherwise, modules such as Sort::External may be appropriate.   Or, the use of an external sorting command or tool, e.g. SyncSort.