Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: A way to pack string into another light weight structure

by BrowserUk (Patriarch)
on Dec 29, 2011 at 10:44 UTC ( [id://945477]=note: print w/replies, xml ) Need Help??


in reply to A way to pack string into another light weight structure

trying pack('s', $number) way now...

That won't work. pack template 's' is signed shorts which only handles numbers from -32768 through +32767.

If your 10 digit numbers can go all the way up to 9,999,999,999, then the smallest pack template you could use is 'Q' -- if your perl supports 64-bit integers -- and that will produce 8-byte values which would only save you 20%.

Besides which, if you are going to store your packed numbers individually in an array, you won't save much memory as the size of the element itself is insignificant compared to the internals structures that hold them.

But, if instead, you stored the packed integers into a single string and then shuffled the string, you avoid teh storage overhead of the array.

If your largest number is less than 4,294,967,296 then you could get away with 'V' (or 'N') which is 4-bytes that would reduce the memory requirement to around 430MB which is doable.

You could then shuffle that using:

#! perl -slw use strict; use Time::HiRes qw[ time ]; my $t = time; my $s = chr(0); $s x= 112e6 * 4; substr( $s, $_ * 4, 4 ) = pack 'V', scalar <> for 0 .. 112e6-1; my $n = length( $s ) / 4; for( 0 ..$n ) { my $p = $_ + rand( $n - $_ ); my $x = substr( $s, $_*4, 4 ); substr( $s, $_*4,4 ) = substr( $s, $p*4,4 ); substr( $s, $p*4, 4 ) = $x; } print unpack 'V', substr( $s, $_ * 4, 4 ) for 0 .. $n; print STDERR time - $t;

It'll take about 10 minutes to shuffle your 112 million numbers, which should be quick enough if this is a one off task.

Perhaps, if your need for true randomness isn't too crucial, then the simplest way to randomise your list would be to sort it using an external sort utility using an offset other than the first digit. That would give you 9 'random' possibilities.

If that's not sufficient you could pick two offset/lengths:sort -k 8,9 -k 3,4 in > out which gives you many possibilities. Especially if you realise that your keys can be of variable length and even overlap.

The downside is that unless your sort is substantially quicker than mine, it'll take well over half an hour. But again that might be okay if it is an infrequent requirement.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://945477]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (3)
As of 2024-04-25 07:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found