adiuva has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to generate a file of random data. I thought the best way would be to use rand() to get some randomness, but it looks like rand() is ridiculously slow compared to just writing the same character into the file.
Let's have a look at the code first:
use strict;
use IO::Handle;

# Arguments: output file name, then size in megabytes.
if ($#ARGV < 1) {
    die("usage: <file_name> <size_in_mb>\n");
}
open(my $gen_file_h, ">" . $ARGV[0]) or die "Can't open file for writing\n";
$gen_file_h->autoflush(1);
superfast();
close($gen_file_h);
die();

sub superfast {
    # each time through the loop should be 1 meg
    for (1 .. $ARGV[1]) {
        # print 1 meg of Zs
        print {$gen_file_h} "Z" x (1024*1024)
    }
}

sub generateRandomFile {
    my $final_size = $ARGV[1];
    for (my $mbytes = 0; $mbytes < $final_size; $mbytes++) {
        my $string = "";
        my @chars = split(" ", "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z - _ ! & ? = 0 1 2 3 4 5 6 7 8 9");
        for (1..1048576) {
            my $rand = int(rand(68));
            $string .= $chars[$rand];
        }
        print $gen_file_h $string;
    }
}
So the sub superfast() just writes the character "Z" into the file, and this is fast like hell. I could fill my SSD completely within seconds.
The sub generateRandomFile() is really slow. It takes several seconds to create, for example, a 10MB file.
Do you have any suggestions for how I could speed things up?

BTW: The files don't need "secure" randomness; I just need files with a specific size. I'm using ActiveState Perl 5.6.1 635 (and I'm bound to this version) on Windows 7.

Thanks in advance,
Stefan

Replies are listed 'Best First'.
Re: Is rand() really that slow or am I just using it wrong?
by BrowserUk (Patriarch) on Aug 08, 2013 at 18:27 UTC

    Most of your time is being spent building an 8-char string from 8 random bytes, then copying that to a 16-byte string and adding 8 more random bytes; then copying that to a 32-byte string and adding 16 more random bytes; then copying that to a 64-byte string ...

    Try this version:

    #! perl -slw
    # Note: -l makes print append a newline after each buffer.
    use strict;

    my @chars = ('a'..'z', 'A'..'Z', 0..9, qw[- _ ! & ? = ]);

    sub genFile {
        my( $fh, $mb ) = @_;
        # Allocate the 1MB buffer once, then overwrite it in place.
        my $buf = chr(0) x 1024**2;
        for( 1 .. $mb ) {
            substr $buf, $_, 1, $chars[ rand @chars ] for 0 .. 1024**2-1;
            print $fh $buf;
        }
    }
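For reference, a minimal sketch of how a sub like genFile above might be driven. The name gen_file, its extra $chunk parameter, and the output name random.dat are assumptions made for illustration and are not part of the original reply; binmode is added on the assumption that exact byte counts matter on Windows.

```perl
use strict;
use warnings;

my @chars = ('a'..'z', 'A'..'Z', 0..9, qw[- _ ! & ? =]);

# Hypothetical variant of genFile with a configurable chunk size,
# so the sketch can be tried with small files.
sub gen_file {
    my ($fh, $chunks, $chunk) = @_;
    my $buf = chr(0) x $chunk;              # allocate the buffer once
    for (1 .. $chunks) {
        # overwrite each byte in place instead of appending
        substr($buf, $_, 1, $chars[rand @chars]) for 0 .. $chunk - 1;
        print {$fh} $buf;
    }
}

my $file = 'random.dat';                    # assumed output name
open(my $fh, '>', $file) or die "Can't open $file: $!";
binmode $fh;                                # no newline translation on Windows
gen_file($fh, 4, 64 * 1024);                # 4 chunks of 64 KiB
close($fh) or die "Can't close $file: $!";

print "wrote ", (-s $file), " bytes\n";
```

Passing the chunk size as an argument is only a convenience for experimenting; the original fixes it at 1024**2.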

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks for the fast reply! But even though your code looks a lot nicer than mine, it's not faster. :(
      If you generate a 10MB file, you can still see 1MB chunks being written to the hard disk, whereas with the sub superfast() the file is there instantly.
        "it's not faster"

        It *is* faster than your method.

        Not as fast as writing 10MB of a single character, for sure, but then you are calling a function, rand, 10 million times; it has to be slower.

        The only question is: how much faster do you need it to be?
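One way to put numbers on that question is to time the approaches side by side. The following is only a sketch using the core Benchmark module: the sub names and the 64 KiB chunk size are assumptions chosen for illustration, and the relative results will depend on the Perl build and platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @chars = ('a'..'z', 'A'..'Z', 0..9, qw[- _ ! & ? =]);
my $n = 64 * 1024;    # assumed chunk size, kept small so the run is quick

# Build a chunk by appending one random character at a time.
sub by_append {
    my $s = '';
    $s .= $chars[rand @chars] for 1 .. $n;
    return $s;
}

# Build a chunk by overwriting a preallocated buffer in place.
sub by_substr {
    my $buf = chr(0) x $n;
    substr($buf, $_, 1, $chars[rand @chars]) for 0 .. $n - 1;
    return $buf;
}

# Baseline with no rand calls at all.
sub by_repeat { return 'Z' x $n }

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    append => \&by_append,
    substr => \&by_substr,
    repeat => \&by_repeat,
});
```

The repeat baseline shows the cost of producing the data without any calls to rand, which is the comparison the original question was making.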


Re: Is rand() really that slow or am I just using it wrong?
by mtmcc (Hermit) on Aug 08, 2013 at 18:59 UTC
    rand() slows down pretty quickly on Windows alright. I found this node helpful in figuring out the alternative options: Random numbers are not random enough on Windows.

    Math::Random::MT on Strawberry Perl solved my problem, and I think it's available for ActiveState also.

    Having said that, my money would be on BrowserUk's code being better; I've found that to be a good general rule...

    EDIT: Math::Random::MT isn't faster than rand(), see Browser's comment below...

      Math::Random::MT is over an order of magnitude slower than the default rand function. Far superior, but much slower:

      require Math::Random::MT;;

      cmpthese -1,{
          a=>q[my $a; $a=rand() for 1..1e6],
          b=>q[my $a; $a=Math::Random::MT::rand() for 1..1e6]
      };;
      (warning: too few iterations for a reliable count)
        s/iter     b     a
      b   1.58    --  -93%
      a  0.105 1407%    --

      And for picking 1 of 68 for random test data, the built-in is perfectly adequate.
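As a quick illustration of picking 1 of 68 with the built-in: rand @chars returns a fractional number from 0 up to (but not including) the array length, and using it as an array index truncates it to an integer. The tally below is a hypothetical check, not something from the thread, that every character in the set turns up.

```perl
use strict;
use warnings;

my @chars = ('a'..'z', 'A'..'Z', 0..9, qw[- _ ! & ? =]);

# Count how often each character is drawn.
my %seen;
$seen{ $chars[rand @chars] }++ for 1 .. 100_000;

printf "%d of %d characters seen\n", scalar(keys %seen), scalar(@chars);
```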


        Fair point!

        My confusion came from a recent experience of using rand() to generate 50,000 unique random numbers over a range of 1 to ~3,000,000,000. It worked fine on OS X and Linux, but not when I tested it on Windows XP, where it struggled to ~32,000 unique numbers and gave up. I can see, though, that if the numbers don't need to be unique and are within a small range, this wouldn't be a problem.

        I still don't quite understand why this was only an issue on Windows - will have to go and google it a bit more.

        In any case, thanks for the correction!
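A likely explanation for the ~32,000 ceiling, offered here as an assumption rather than something confirmed in the thread: on Windows builds of Perl from that era, rand was backed by a C library generator with only 15 random bits, so it could return at most 32768 distinct values however large the requested range. The sketch below reports the build's randbits setting from Config and shows one possible workaround; wide_rand is a hypothetical helper that combines two 15-bit draws, and it yields values from 0 rather than 1.

```perl
use strict;
use warnings;
use Config;

# Number of random bits this build's rand() provides.
print "randbits = $Config{randbits}\n";

# Hypothetical helper: combine two 15-bit draws into one 30-bit value,
# then scale it into the range 0 .. $range - 1.
sub wide_rand {
    my ($range) = @_;
    my $r = int(rand(32768)) * 32768 + int(rand(32768));   # 0 .. 2**30 - 1
    return int($r / 2**30 * $range);
}

# Collect 50,000 unique values from a range of about 3 billion.
my %seen;
$seen{ wide_rand(3_000_000_000) } = 1 while keys(%seen) < 50_000;
print scalar(keys %seen), " unique values\n";
```

This is not a substitute for a better generator such as Math::Random::MT; it only widens the set of values the built-in can produce.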

Re: Is rand() really that slow or am I just using it wrong?
by AnomalousMonk (Archbishop) on Aug 09, 2013 at 02:26 UTC
    ... [I] don't need "secure" randomness, I just need files with a specific size ...

    The general tone of your posts and the fact that you were comfortable with the 15-bit built-in rand in the first place lead one to imagine that you might be satisfied with data having the first-glance appearance of randomness. The following is quite fast: well under a second by an eyeball timing on my laptop. I didn't try to fill up my HD (I don't have an SSD), but even so...

    >perl -wMstrict -e
    "use List::Util qw(shuffle);
     ;;
     my @chars = ('a'..'z', 'A'..'Z', '0'..'9', qw[- _ ! & ? =]);
     @chars = (@chars) x 100;
     my $jumble = join '', shuffle @chars;
     $jumble .= reverse $jumble;
     ;;
     use constant MAX => 10 * 1024 * 1024;
     ;;
     my $j_len = length $jumble;
     die 'wrong assumptions' unless $j_len < MAX;
     ;;
     open my $fh, '>', 'faux-rand' or die qq{opening: $!};
     print $fh $jumble or die qq{writing: $!} for 1 .. int(MAX / $j_len);
     my $shortfall = MAX % $j_len;
     print $fh substr($jumble, 0, $shortfall) or die qq{writing: $!};
     close $fh or die qq{closing: $!};
    "