Category: Utilities
Author/Contact Info: Sean Kelly / mail@shortestpath.org
Description: Take a file on standard input, randomise order of lines and print to standard output
#!/usr/bin/perl -w
##
## randomiseLines.pl
## Take a file on standard input, randomise order of lines and
## print to standard output
##
## Sean Kelly <mail@shortestpath.org>
##
## v0.1 1/June/2001

use strict;

my %lines;
while (<>)
{
        $lines{ rand() } = $_;
}

my $line;
foreach $line (sort keys %lines)
{
        print $lines{$line};
}
Re: randomiseLines
by blakem (Monsignor) on Jun 01, 2001 at 14:04 UTC
    See the Cookbook recipes 8.7 and 4.17.

    Writing correct shuffling programs is tricky. It looks like yours might have a key collision issue. I've learned to trust the Cookbook when I'm not sure.
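For readers without the book to hand, here is a minimal sketch of a Fisher-Yates shuffle, a standard technique that avoids the collision problem because lines are held by array position rather than as hash keys. It is written from scratch for illustration and is not a quotation of the Cookbook's code; the sub name and the sample data are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fisher-Yates shuffle: permute the array in place, one swap per element.
# Lines are stored by position, never as hash keys, so two equal random
# numbers cannot overwrite each other.
sub fisher_yates_shuffle {
    my ($array) = @_;                  # array reference, shuffled in place
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);      # random index in 0..$i
        @$array[$i, $j] = @$array[$j, $i];
    }
    return;
}

# Small demonstration on a fixed list (hypothetical sample data).
my @lines = map { "line $_\n" } 1 .. 10;
fisher_yates_shuffle(\@lines);
print @lines;
```

To use it as a filter like the original script, read the input with my @lines = <>; instead of building the sample list.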

    -Blake
    p.s. Are the above links legitimate? I've never been there before (since I have my dog-eared Cookbook right here). Does O'Reilly really condone the CD bookshelf being put on a public webserver like this?

    Update: Guess I got my question answered... Removed links to recipes. (There actually are books out there that encourage this type of usage. In fact, at least one of them is published by O'Reilly, if I'm not mistaken.) I stumbled onto the bookshelf site when I went looking for the code ftp site. Didn't know if it was legit or not. Sorry.

      Does O'Reilly really condone the CD bookshelf being put on a public webserver like this?

      See this entry from gnat's journal at use Perl; - Pirate Books.

      So, I guess the answer is "No".

      I also think that posting links to sites like this on PerlMonks is a really bad idea and would suggest that you remove them from your post.

      --
      <http://www.dave.org.uk>

      "Perl makes the fun jobs fun
      and the boring jobs bearable" - me

Re: randomiseLines
by ChOas (Curate) on Jun 01, 2001 at 14:09 UTC
    Cool...

    But on a sufficiently large file, rand
    might return the same number, and you will lose data...

    Just my fl 0.02

    GreetZ!,
      ChOas

    print "profeth still\n" if /bird|devil/;
Re: randomiseLines
by bwana147 (Pilgrim) on Jun 01, 2001 at 19:52 UTC

    Neat, but rand might give the same result several times (as has already been pointed out), and you'll lose some lines, although that's unlikely. Here's how I'd do it anyway:

    print sort { sprintf("%u",rand(2)) || -1 } <>;

    Explanations: rand(2) yields a float between 0 and 2 (2 excluded), sprintf returns the integer part, i.e. 0 or 1. If it's 0, take -1 instead. So the expression in curlies randomly returns 1 or -1, which causes the lines from <> to be randomly exchanged.
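One caveat worth spelling out: a comparator that answers at random is inconsistent (it can claim a sorts before b on one call and after it on the next), and perl's sort documentation describes the result as not well-defined in that case, so the output is not guaranteed to be a uniform shuffle. A hedged alternative, assuming the List::Util module is installed, is its shuffle function; the wrapper sub name and the sample data below are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# shuffle() returns its arguments in random order; no comparator is
# involved, so there is nothing for sort to be inconsistent about.
sub shuffled_lines {
    my @lines = @_;
    return shuffle(@lines);
}

# Hypothetical sample data standing in for the lines read from <>.
my @sample = map { "line $_\n" } 1 .. 5;
print shuffled_lines(@sample);
```

As a one-liner in the same spirit: perl -MList::Util=shuffle -e 'print shuffle(<>)'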

    My € 0.02

    btw, I'd be delighted if someone could show me a better/shorter way than sprintf "%u" of getting the integer part of a float.

    Update: of course, int (thanks mycocom)! I think I tried it once, but I must've messed up something, and I've been convinced that it didn't work ever since :-(

      Perhaps int(rand(2)) would do what you're looking for...

Re: randomiseLines
by bikeNomad (Priest) on Jun 02, 2001 at 20:26 UTC
    This version runs about 20% faster, probably since it slurps the file in all at once. It also doesn't care about collisions.

    #!/usr/bin/perl -w
    use strict;

    my @lines = <>;
    # Pair each line index with a random number, then sort numerically on that number.
    my @order = sort { $a->[0] <=> $b->[0] }
                map  { [ rand($#lines), $_ ] } (0..$#lines);
    foreach my $line (@order) { print $lines[$line->[1]] }

Re: randomiseLines
by fx (Pilgrim) on Jun 02, 2001 at 15:47 UTC

    It is true that key collision is possible. I needed to knock something together quickly (hence this is v0.1 :) ) and seeing that:

    perl -e 'print rand, "\n"';
    0.315919870045036

    and that I only had 9000 lines in my file, I judged the probability of a key collision to be small.

    I actually tested it on the file about 10 times and saw that the number of lines in the randomised file was equal to the number in the original file.
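That intuition can be checked with the usual birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n keys drawn uniformly from N possible values. The sketch below is only an estimate: how many bits of randomness rand() really provides depends on the perl build, so both the 48-bit and the 15-bit figures are assumptions chosen for illustration, and the sub name is made up.

```perl
use strict;
use warnings;

# Birthday-problem approximation: probability that at least two of $n
# random keys coincide when each key is drawn uniformly from 2**$bits
# possible values.
sub collision_probability {
    my ($n, $bits) = @_;
    my $space = 2 ** $bits;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

# 9000 lines, under two assumed amounts of randomness in rand().
printf "48 bits: %.3g\n", collision_probability(9000, 48);
printf "15 bits: %.3g\n", collision_probability(9000, 15);
```

Under the 48-bit assumption the chance of losing a line in a 9000-line file comes out at roughly 1.4e-7, which fits ten clean test runs; under the 15-bit assumption a collision is practically certain, so the approach is only as safe as the underlying generator.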

    No doubt I will modify this one day when I feel like it...

    Thanks to all those who replied,

    == fx, Infinity is Colourless