in reply to Re^6: Speed Improvement (insignificant)
in thread Speed Improvement
> I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up-to-300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").
The lines being modified are obviously part of some kind of templating system. For what? We can only guess, but perhaps some kind of scientific Monte Carlo simulation runs... At some point, the templated random values have to be populated with actual numbers.
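For concreteness, a placeholder like `{\d4}` here means "replace me with 4 random digits". A minimal sketch of that population step (the template string is my own example, not from the thread; the substitution idiom is the one used in my version below):

```perl
#! perl -slw
use strict;

## Each {\dN} placeholder requests N random digits.
my $template = 'id={\d6} reading={\d2}.{\d3}';

( my $populated = $template ) =~
    s[\{\\d(\d+)\}][ join '', map { int rand 10 } 1 .. $1 ]ge;

print $populated;    ## e.g. id=481736 reading=05.992
```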
To that end, I generated 100 files averaging 500k using this code:
```perl
#! perl -slw
use strict;

sub uniq{ my %x; @x{@_} = (); keys %x }    ## de-dup the positions

for my $filenum ( '00' .. '99' ) {
    open O, '>', "templ$filenum.txt" or die $!;
    for ( 1 .. int( rand 10000 ) ) {
        my $line = 'X' x 100;
        ## Splice placeholders in right-to-left, so earlier insertions
        ## don't shift the positions of later ones.
        for my $pos ( sort{ $b <=> $a } uniq( map int( rand 100 ), 0 .. int( rand 10 ) ) ) {
            substr $line, $pos, 0, '{\d' . int( rand 9 ) . '}';
        }
        print O $line;
    }
    close O;
}
```
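A typical generated line looks something like this (illustrative and truncated for display; real lines start as 100 'X's with up to ten placeholders spliced in at random positions):

```
XXXXXXXXXX{\d2}XXXXXXXXXXXXXXXXXXXX{\d7}XXXXXXXXXXXXXXXX{\d5}XXXXXXXXXX
```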
Not exotic, but 'good enough'.
I then used the following code to slurp each file in turn, perform the substitutions, and write the modified data to a new file in another directory, once using Nar's code from the OP (tweaked to work) and once using my posted version:
```perl
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

## Nar's OP version: collect every placeholder, then run a separate
## s/// over the whole string for each one.
sub nar_substitute {
    my @numeric_chars = ( 0 .. 9 );
    my $message = shift;
    my @numeric_matches = ( $message =~ m/\{\\d\d+\}/g );
    foreach ( @numeric_matches ) {
        my $substitution = $_;
        my( $substitution_count ) = ( $substitution =~ m/(\d+)/ );
        my $number = '';
        for ( 1 .. $substitution_count ) {
            $number .= $numeric_chars[ int rand @numeric_chars ];
        }
        $message =~ s[\Q$substitution][$number]e;
    }
    return $message;
}

## My version: a single global s///ge pass; each {\dN} is replaced by
## N digits sliced from a random integer (the constant leading '1' of
## 1e10 + rand 1e10 is skipped).
sub buk_substitute {
    my $s = shift;
    $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
    return $s;
}

our $O //= 0;    ## -O=1 selects buk_substitute; default is nar_substitute
$|++;

my $start = time;
for my $fname ( glob 'templ*.txt' ) {
    printf "\rProcessing $fname";
    my $file = do{ local( @ARGV, $/ ) = $fname; <> };    ## slurp
    $file = $O ? buk_substitute( $file ) : nar_substitute( $file );
    open O, '>', "modified/$fname" or die $!;
    print O $file;
    close O;
}
printf "\n\nTook %.6f secs\n", time() - $start;
```
```
[16:39:37.79] C:\test\junk>..\junk63
Processing templ99.txt

Took 2064.037575 secs

[17:16:13.24] C:\test\junk>del modified\*
C:\test\junk\modified\*, Are you sure (Y/N)? y

[17:25:20.32] C:\test\junk>..\junk63 -O=1
Processing templ99.txt

Took 6.626883 secs

[17:25:35.03] C:\test\junk>
```
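That is 2064.04 seconds versus 6.63 seconds over the same 100 files: 2064.037575 / 6.626883 ≈ 311 times faster.

The gap is algorithmic rather than a micro-optimisation: nar_substitute re-scans the entire string once per placeholder (every `s[\Q$substitution][$number]e` is a full pass over the whole file), while buk_substitute handles every placeholder in a single `s///ge` pass. A minimal sketch isolating just that difference with the core Benchmark module (the `per_match`/`single_pass` labels and the 1,000-placeholder test string are my own, not from the thread):

```perl
#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

## Hypothetical test string: 1,000 placeholders separated by runs of
## X's, shaped like the generated template files above.
my $template = join 'X' x 100,
               map { '{\d' . ( 1 + int rand 8 ) . '}' } 1 .. 1000;

cmpthese( -3, {
    per_match => sub {          ## one full s/// scan per placeholder
        my $s = $template;
        my @matches = $s =~ m/\{\\d\d+\}/g;
        for my $m ( @matches ) {
            my( $n ) = $m =~ /(\d+)/;
            my $digits = join '', map { int rand 10 } 1 .. $n;
            $s =~ s[\Q$m][$digits];
        }
    },
    single_pass => sub {        ## one s///ge scan does them all
        my $s = $template;
        $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
    },
} );
```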
Maybe this needs doing once per run of the simulation. Maybe once a day; maybe hundreds of times. Maybe 100 files is overkill; maybe it requires thousands of files. Maybe 500k average is oversized; maybe they are in the GB range. I don't know...and neither do you.