Re:{4} how do I line-wrap while copying to stdout?
by jeroenes (Priest) on Apr 20, 2001 at 17:22 UTC
use Benchmark;
undef $/;
open DATA, "/home/jeroen/texs/review/reviewnew.tex" or die $!; #just some lengthy manuscript
$str = <DATA>;
open DUMP, ">/dev/null";
timethese( -1, {'regex' =>
sub {
$a = $str;
print DUMP map "$_\n", $a=~/\G(.{1,80})/gs;
},
'substr' =>
sub {
$a = $str; $b='';
$b .= substr( $a, 0, 80, '')."\n" while length($a) >80;
print DUMP "$b$a";
}
});
#gives
Benchmark: running regex, substr, each for at least 1 CPU seconds...
regex: 1 wallclock secs ( 1.05 usr + 0.01 sys = 1.06 CPU) @ 287.74/s (n=305)
substr: 1 wallclock secs ( 1.26 usr + 0.01 sys = 1.27 CPU) @ 556.69/s (n=707)
#This changes a bit when leaving the map out:
Benchmark: running regex, substr, each for at least 1 CPU seconds...
regex: 2 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ 396.23/s (n=420)
substr: 2 wallclock secs ( 1.04 usr + 0.02 sys = 1.06 CPU) @ 547.17/s (n=580)
#I left the print out of the substr test at first (didn't want to test
# print), but putting it back in gives:
Benchmark: running regex, substr, each for at least 1 CPU seconds...
regex: 1 wallclock secs ( 1.30 usr + 0.01 sys = 1.31 CPU) @ 366.41/s (n=480)
substr: 1 wallclock secs ( 1.06 usr + 0.02 sys = 1.08 CPU) @ 473.15/s (n=511)
#I added a better comparison:
         Rate regmap  regex fixreg substr
regmap  285/s     --   -19%   -22%   -56%
regex   350/s    23%     --    -4%   -46%
fixreg  364/s    28%     4%     --   -44%
substr  650/s   128%    86%    78%     --
Apparently, substr is simply much more efficient than the regex.
And the print is only a bit inefficient compared to storing stuff
in memory.
Jeroen
"We are not alone"(FZ)
By the undef $/, the file is read in as a huge string.
Of course I checked, and with a n=1000 string I get:
Rate regex substr
regex 21775/s -- -27%
substr 29748/s 37% --
Which makes sense, as my file has some 50k chars.
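For anyone following along, here is a minimal sketch of the slurp idiom the benchmark relies on. The helper name slurp_file is made up for illustration; the only difference from the benchmark script is that $/ is undefined with local, so the change does not leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undef only inside this sub; restored on return
    my $content = <$fh>;    # with $/ undefined, one read returns the whole file
    close $fh;
    return $content;
}
```

Calling slurp_file on a roughly 50k manuscript gives the same kind of single huge string that the regex and substr variants chew through.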
Jeroen
"We are not alone"(FZ)
Update: (I'm not going to make a
Re:{9} post)
I'd say, start checking the source {grin}
$str='a'x1E6;
=>
Rate regex substr
regex 15.1/s -- -16%
substr 18.0/s 19% --
$str='a'x1E7;
=>
Rate regex substr
regex 1.41/s -- -21%
substr 1.79/s 27% --
At 100M, I'm testing my swap ;-)..... I tried it nevertheless, but now I'm waiting for my box
to stop swapping... /me is afraid that may take a while... :-)...
Finally, I had to use that reboot button :-<
25M still went OK:
s/iter regex substr
regex 1.80 -- -21%
substr 1.42 27% --
at 50M, benchmark produced a division by zero .....
Too efficient? Execution efficiency, programming efficiency, or maintenance efficiency? There are many types. A regex may not always be as fast (though in many cases it is faster: try using index and substr to find word space), but in most instances regexes are more maintainable, more readable, and faster to write.
use Benchmark;
my $n = 1000;
open(STDERR,">/dev/null");
cmpthese (1000,
{
match => sub { local $_ = "abcde " x $n;
print STDERR "$1\n" while /\G(.{1,80})/gs;
},
swap => sub { local $_ = "abcde " x $n;
s/\G(.{1,80})/$1\n/gs;
print STDERR $_;
},
subst => sub { local $a = "abcde " x $n;
$b='';
$b .= substr( $a, 0, 80, '')."\n" while length($a) >80;
print STDERR "$b$a";
},
});
Produces
Benchmark: timing 1000 iterations of match, subst, swap...
match: 1 wallclock secs ( 0.97 usr + 0.00 sys = 0.97 CPU) @ 1030.93/s (n=1000)
subst: 1 wallclock secs ( 0.58 usr + 0.00 sys = 0.58 CPU) @ 1724.14/s (n=1000)
swap: 1 wallclock secs ( 0.75 usr + 0.00 sys = 0.75 CPU) @ 1333.33/s (n=1000)
        Rate match  swap subst
match 1031/s    --  -23%  -40%
swap  1333/s   29%    --  -23%
subst 1724/s   67%   29%    --
A substr method is faster this time, but if it gets any more complex than that a regex will do just fine. If done once per script will you notice the difference between 1700 per second and 1300 per second? Maybe.
         Rate inssub regmap  regex fixreg substr
inssub 13.7/s     --   -95%   -97%   -97%   -97%
regmap  260/s  1800%     --   -34%   -35%   -47%
regex   395/s  2784%    52%     --    -1%   -20%
fixreg  398/s  2809%    53%     1%     --   -19%
substr  491/s  3483%    89%    24%    23%     --
That insert must be *really* inefficient :-)
Jeroen
"We are not alone"(FZ)
Let me add the new code:
use Benchmark;
undef $/;
open DATA, "/home/jeroen/texs/review/reviewnew.tex" or die $!;
$str = <DATA>;
open DUMP, ">/dev/null";
$result = timethese( -5, {
'regex' =>
sub {
$a = $str;
$b = '';
$b .= "$1\n" while $a=~/\G(.{1,80})/gs;
print DUMP "$b";
},
'regmap' =>
sub {
$a = $str;
$b = '';
print DUMP map "$_\n", $a=~/\G(.{1,80})/gs;
},
'fixreg'=>
sub {
$a = $str;
$b = '';
$b .= "$1\n" while $a=~/\G(.{1,80})/gos;
print DUMP "$b";
},
'substr' =>
sub {
$a = $str;
$b='';
$b .= substr( $a, 0, 80, '')."\n" while length($a) >80;
print DUMP "$b$a";
},
'inssub' =>
sub {
$a = $str;
$idx = -1; # start at -1 so the first newline lands after exactly 80 characters, not 81
substr( $a, $idx+=81, 0)="\n" while $idx< (length( $a) - 80 );
print DUMP "$a";
}
}, 'none');
Benchmark::cmpthese($result);
Just to make a little more trouble -- what about:
local $,=$\;
and either
print /(.{1,80})/g
or
print grep /./, split /(.{1,80})/
p
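To make the two variants above a bit more concrete, here is a small sketch. The function names are invented for illustration, and the width is a parameter so the result is easy to eyeball with a short string. It assumes $\ holds a newline in the original snippet (as it would under perl -l); here both separators are set explicitly instead.

```perl
use strict;
use warnings;

# List-context match: returns every captured chunk of up to $width characters.
sub chunks_by_match {
    my ($text, $width) = @_;
    return $text =~ /(.{1,$width})/g;
}

# split with a capturing pattern also returns the "separators" (the chunks).
# The pattern consumes the whole string, so the fields in between are empty
# strings, and grep /./ throws them away.
sub chunks_by_split {
    my ($text, $width) = @_;
    return grep /./, split /(.{1,$width})/, $text;
}

{
    local $, = "\n";    # output field separator: printed between list items
    local $\ = "\n";    # output record separator: printed after the last item
    print chunks_by_match('abcdefghijklmnopqrstuvwxyz', 10);
}
```

Both functions should return the same list for text without embedded newlines; the split version just does extra work producing and filtering the empty fields.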
abcdefghijklm
nopqrstuvwxyz
The proper result is:
abcdefghij
klm
nopqrstuvw
xyz
However, the solutions in the Benchmark will produce:
abcdefghij
klm
nopqrs
tuvwxyz
The regex solution is easy to fix:
print DUMP map "$_\n", $a=~/(.{1,80})/g;
The substr() solution requires more work to get right, such as splitting on newlines and wrapping each line separately, or sticking any partial lines back onto the beginning of the string after each substr().
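A rough sketch of the first option (split on newlines, then wrap each line separately with substr) might look like the following. The name wrap_lines and the width parameter are made up for illustration, and this version quietly drops blank lines at the very end of the input, since split discards trailing empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper: hard-wrap $text at $width columns, handling each
# existing line on its own so embedded newlines are preserved.
sub wrap_lines {
    my ($text, $width) = @_;
    my $out = '';
    for my $line (split /\n/, $text) {
        # 4-arg substr removes and returns the first $width characters.
        $out .= substr($line, 0, $width, '') . "\n" while length($line) > $width;
        $out .= "$line\n";    # whatever is left (possibly an empty line)
    }
    return $out;
}

print wrap_lines("abcdefghijklm\nnopqrstuvwxyz\n", 10);
```

With the two-line sample and a width of 10 this should print the four lines listed as the proper result.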
Re: Re: Re: Re: how do I line-wrap while copying to stdout?
by Rhandom (Curate) on Apr 20, 2001 at 17:46 UTC
As above (in merlyn's code) but with a swap
s/\G(.{1,80})/$1\n/gs;
print;
Maybe I should benchmark that. This doesn't have the advantage of not affecting long strings and it does put multiple lines into one variable, but that might be OK.
Couldn't help but benchmark this thing...
use Benchmark qw(cmpthese);
open(STDERR,">/dev/null");
cmpthese (10000,
{
match => sub { local $_ = "abcde " x 100;
print STDERR "$1\n" while /\G(.{1,80})/gs;
},
swap => sub { local $_ = "abcde " x 100;
s/\G(.{1,80})/$1\n/gs;
print STDERR $_;
},
});
Produces
Benchmark: timing 10000 iterations of match, swap...
match: 1 wallclock secs ( 1.26 usr + 0.01 sys = 1.27 CPU) @ 7874.02/s (n=10000)
swap: 1 wallclock secs ( 1.01 usr + 0.01 sys = 1.02 CPU) @ 9803.92/s (n=10000)
        Rate match swap
match 7874/s    -- -20%
swap  9804/s   25%   --
So the swap will save you time (if you are nitpicky about speed).
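As a sanity check that the two benchmarked variants really produce the same text, here is a small sketch that returns the wrapped string instead of printing it. The function names and the width parameter are invented for illustration. Note that with /s the dot also matches newlines, so line breaks already in the input are counted as ordinary characters.

```perl
use strict;
use warnings;

# "swap": insert a newline after every run of up to $width characters.
sub wrap_by_substitution {
    my ($text, $width) = @_;
    (my $copy = $text) =~ s/\G(.{1,$width})/$1\n/gs;
    return $copy;
}

# "match": walk the string with \G and collect each chunk plus a newline.
sub wrap_by_match {
    my ($text, $width) = @_;
    my $out = '';
    $out .= "$1\n" while $text =~ /\G(.{1,$width})/gs;
    return $out;
}

my $sample = "abcde " x 3;
print wrap_by_substitution($sample, 10);
```

Since both return identical strings, the speed difference in the benchmark comes down to how the chunks are assembled, not to what is produced.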