in reply to sorting a vec

If by "sort a vec" you mean, treat a bitstring as a collection of n-bit values and then sort them using perl's built in sort function without converting it/them to an array, you don't. At least, not easily.

The input to sort is an array as is the return. if you bit-fields are of equal length, which is also a power of 2, then you can use vec (or unpack/pack with a pattern of '(bn)*' to convert them to/from an array.

vec( $vec, $_, 4) = rand(16) for 0 .. 99; print map{ vec( $vec, $_, 4)} 0 .. 99; 6 15 0 10 4 6 10 1 13 9 7 1 1 1 7 12 3 6 14 7 4 7 8 7 0 4 10 0 13 6 6 9 14 7 10 5 12 4 15 11 13 12 7 2 13 14 4 7 0 10 9 4 10 5 3 12 10 9 15 12 5 3 8 0 0 6 7 0 2 14 14 1 3 13 4 15 8 10 2 8 11 2 0 14 10 4 8 6 10 1 10 7 10 13 6 5 12 6 0 15 $p=0; map{ vec( $vec, $p++, 4) = $_ } sort{ $a <=> $b } map{ vec( $vec, $_, 4) } 0 .. 99; print map{ vec( $vec, $_, 4)} 0 .. 99; 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15

It would be possible supply sort with a list of integers 0 to the length (in bitfields) of your vector and to play games inside the sort block, performing the switching of the bit-strings yourself using vec and returning the approriate value (-1, 0, 1) yourself to indicate what you had done. This is difficult to get right, fragile as heck, and grossly inefficient as the convertion for testing would be done for every comparison of every field, rather than just once at the beginning and once at the end. Nothing more than an intellectual curiosity. I probably shouldn't have even mentioned it:)


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
Wanted!

Replies are listed 'Best First'.
Re: Re: sorting a vec
by Anonymous Monk on Nov 02, 2003 at 16:08 UTC
    Thanks, for your help. I appreciate it. I am trying not convert the vec to an array because I don't have that much memory. I created a couple different sorting functions where I do switch the bitstrings myself. I used the algorithms for insertion, substitution and quicksort just to do a comparison on which one would be the best for this type of problem. Insertion and substitution are really slow once I try to sort a vec of a million integers which makes sense but I had to try. Quicksort right now is giving me problems, I think due to recursion but I am not certain. I was hoping there was clever way to do it within Perl. I wanted to say thanks for the help!

      Your welcome, but if you gave us a little more information, we could probably suggest a better solution.

      For example, how many bits are in each of your integers?

      How many do you need to sort?

      Storing a million integers in a standard perl array and sorting them is an almost everyday procedure. Even on my 233Mhz laptop it takes less than five minutes using perls builtin sort. So, unless you have a very large number, or the integers are very large, the standard sort would probably suffice.

      If you have a very large number of small integers, say 100_000_000 x 8-bits or less, or 1_000_000 x say 128 bit, this could still be done with a custom sort routine without too much trouble.

      Knowing what your requirements are would allow someone here to suggest a solution.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!
      Wanted!

        I am going to be loading 50 million integers into the vec so I am using a bit parameter of 32 for my vec. Right now I am trying to figure out how to efficiently sort a randomly assigned vec containing integers from 1 - 50 mil. After that, I would then like to expand this sort to beable to sort this vec based on the integer values that exists in another vec. Why am I doing this? I am working on implementing suffix arrays in Perl so that I can expand a statistics package called nsp that currently uses hash tables. The problem with the hash table is that after a certain load it runs out of memory. Therefore, I am trying to see if there exists other data structures that will hold more data efficiently. So I am converting words to integers and then loading them into a suffix array. Sorry about the lack of information. I am not use to posting and am not certain where the point of too much information is. I hope this helps clarify more, and again thanks for the advice.