I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up-to 300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").

The lines being modified are obviously some kind of templating system. For what? We can only guess, but perhaps some kind of scientific Monte Carlo simulation runs... At some point, the templated random values have to be populated with actual numbers.

To that end, I generated 100 files averaging 500k using this code:

Not exotic, but 'good enough'.

I then used the following code to slurp the files in turn, do the substitutions and then write the modified data to new files in another directory, once using Nar's OP code (tweaked to work) and once using my posted version:

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

sub nar_substitute {
    my @numeric_chars = ( 0 .. 9 );
    my $message = shift;
    my @numeric_matches = ($message =~ m/\{\\d\d+\}/g);
    foreach (@numeric_matches) {
        my $substitution = $_;
        my ($substitution_count) = ($substitution =~ m/(\d+)/);
        my $number = '';
        for (1..$substitution_count) {
            $number .= $numeric_chars[int rand @numeric_chars];;
        }
        $message =~ s[\Q$substitution][$number]e;
    }
    return $message;
}

sub buk_substitute{
    my $s = shift;
    $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
    return $s
}

our $O //= 0;

$|++;
my $start = time;
for my $fname ( glob 'templ*.txt' ) {
    printf "\rProcessing $fname";
    my $file = do{ local( @ARGV, $/ ) = $fname; <> };
    $file = $O ? buk_substitute( $file ) : nar_substitute( $file );
    open O, '>', "modified/$fname" or die $!;
    print O $file;
    close O;
}
printf "\n\nTook %.6f secs\n", time() - $start;

[16:39:37.79] C:\test\junk>..\junk63
Processing templ99.txt

Took 2064.037575 secs

[17:16:13.24] C:\test\junk>del modified\*
C:\test\junk\modified\*, Are you sure (Y/N)? y

[17:25:20.32] C:\test\junk>..\junk63 -O=1
Processing templ99.txt

Took 6.626883 secs

[17:25:35.03] C:\test\junk>
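
For anyone who wants to see the single-pass substitution in isolation, here is a minimal sketch that runs it against a small in-memory string instead of files. The template text and the name one_pass are made up for illustration; only the s///ge expression is taken from buk_substitute. A likely reason for the gap in the timings is that the OP version issues a fresh s/// for every placeholder, and each of those searches starts again from the beginning of the string, whereas /g walks the string once.

```perl
use strict;
use warnings;

# Hypothetical template text; the real templ*.txt files are not shown in the post.
my $template = join ' ', map { "id={\\d4} code={\\d8}" } 1 .. 3;

# Single pass: /g visits each placeholder once, /e evaluates the replacement
# expression for every match. Same expression as buk_substitute.
sub one_pass {
    my $s = shift;
    $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
    return $s;
}

my $filled = one_pass( $template );
print "$template\n";
print "$filled\n";

# Report how many digits each placeholder received.
my @fields = $filled =~ /=(\d+)/g;
print join( ',', map { length } @fields ), "\n";
```

One caveat that follows from the expression itself: int( 1e10 + rand 1e10 ) is an 11-digit number and substr skips its first digit, so a placeholder asking for more than 10 digits gets only 10.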

100 - ( 6.626883 / 2064.037575 * 100 ) = 99.7% saving or 311 times faster!
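
The two figures come from the same pair of timings, expressed two ways. A small sketch to reproduce them (the helper names saving_pct and speedup are just illustrative):

```perl
use strict;
use warnings;

# Wall-clock timings in seconds, copied from the two runs above.
my ( $slow, $fast ) = ( 2064.037575, 6.626883 );

# Percentage of the original run-time that was saved.
sub saving_pct { my ( $old, $new ) = @_; return 100 - ( $new / $old * 100 ) }

# How many times faster the new run is.
sub speedup { my ( $old, $new ) = @_; return $old / $new }

printf "saving:  %.1f%%\n",     saving_pct( $slow, $fast );   # saving:  99.7%
printf "speedup: %.0f times\n", speedup( $slow, $fast );      # speedup: 311 times
```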

Worth the effort I would say.

Maybe this needs doing once per run of the simulation. Maybe once a day; maybe hundreds. Maybe 100 files is overkill; maybe it requires thousands of files. Maybe 500k average is oversized; maybe they are in the GB range. I don't know...and neither do you.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^7: Speed Improvement (Is 311 times faster: "insignificant"?) by BrowserUk
in thread Speed Improvement by Nar
