bronto has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody

Days ago I posted a small read/write tester. It works, but performances are really poor, especially with large block sizes; that is obvious, because each time a write is needed, a new random block is created.

I would like to improve performance while retaining a significant amount of randomness in the data that gets written down on the disk.

I am working on a solution that caches an amount of blocks and replaces one every time that a read operation happens; this solution would involve $inperc and $initsize (see code); the first variable tells me how much data I need to write (100-$inperc percent of total operations), and the second tells me how much random data is generated prior to starting tests.

Of course, caching all the initial random records in memory is not feasible :-)

Any suggestions for a good caching algorithm?

Ciao!
--bronto

Update: I've just made a first "patch" that makes the code run four times faster (of course, the memory footprint is bigger).

I wanted to post a patch, but since it is 115 lines long, and the whole code is 138, I am putting the whole code here

Any advice would be apreciated. Thanks

#!/usr/bin/perl # $Id: dbcreate.pl,v 1.4 2003/11/26 11:37:02 bronto Exp $ use strict ; use warnings ; use Tie::File ; die "\nUsage: $0 NProc OutputBlockSize InputPerc InitSize OutputSize D +BFile\n\n" unless @ARGV == 6 ; use constant DEBUG => 1 ; my %suffix = ( K => 1024, M => 1024*1024, G => 1024*1024*1024 ) ; my $maxchild = shift ; die "Invalid number of processes" unless $maxchild >= 1 ; my $outblocksize = shift ; convert_to_bytes($outblocksize) ; die "Output block size should be > 0" unless $outblocksize >= 1 ; my $inperc = shift ; die "Input percentage too big" if $inperc > 100 ; die "Input percentage too small" if $inperc <= 0 ; my $initsize = shift ; convert_to_bytes($initsize) ; die "Initial file size should be > 0" unless $initsize >= 1 ; my $outsize = shift ; convert_to_bytes($outsize) ; die "Output size should be > 0" unless $outsize >= 1 ; my @db ; my $dbfile = shift ; my $initlines = sprintf "%.0f",($initsize/$outblocksize) ; my $cachelines = $initlines*(100-$inperc)/100 ; $cachelines = 100 if $cachelines > 100 ; my $c1 = $cachelines -1 ; tie @db, 'Tie::File', $dbfile or die "Cannot open file $dbfile: $!" ; my @cache ; { my @randomchars = ('A'..'Z','a'..'z','0'..'9') ; my $randindex = scalar @randomchars ; my $lines = $initlines ; my $l1 = $lines - 1 ; sub input { my $i = rand($l1) ; my $r = $db[$i] ; $cache[rand($c1)] = $r ; return "r" ; } sub output { my $i = sprintf "%.0f",rand($lines) ; $db[$i] = $cache[rand($c1)] ; if ($i > $lines) { $lines = $i ; $l1 = $i - 1 ; } return "w" ; } sub random_record { return join("", @randomchars[ map( rand($randindex), (1..$outblocksize)) ] ) ; } } print STDERR "Initializing file and cache..." ; @db = () ; for (my $i = 1 ; $i<= $initlines ; $i++) { my $r = random_record($outsize) ; push @db,$r or die "Cannot write record, aborting [$!]" ; push @cache,$r if $#cache <= $c1 ; } print STDERR "done\n" ; my $chunks = $outsize/$outblocksize ; my $chunksperchild = $chunks/$maxchild ; my %childpid ; while (keys(%childpid) < $maxchild) { my $pid = fork ; # The following instruction comes from node 237098 die "Cannot fork: $!" unless defined $pid ; if ($pid) { # Parent process $childpid{$pid} = '' ; } else { # Child process for (my $i = 0 ; $i <= $chunksperchild ; $i++) { my $random ; $random = rand(100) ; my $result = $random <= $inperc? input(): output() ; print STDERR "$result" if DEBUG ; #print STDERR "$random/$inperc: $result\n" ; } exit ; } } while (keys(%childpid) > 0) { my $dead = wait ; # This die added after merlyn's suggestion # If you want to check a waitpid solution, see the code # posted by merlyn and zentara die "Something weird happened while wait!" if $dead == -1 ; delete $childpid{$dead} ; } ; print STDERR "\n" if DEBUG ; exit 0 ; sub convert_to_bytes { if ($_[0] =~ /^\d+(k|m|g)$/i) { my $factor = chop $_[0] ; $_[0] *= $suffix{uc($factor)} ; } return $_[0] ; }

The very nature of Perl to be like natural language--inconsistant and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
--John M. Dlugosz