Q: is there a processor saving in using substr rather than a regex?
In general yes, but not if you have to combine the substr with a regex (as MZSanford does here).
The benchmark shows that the pure regex approach suggested by tye is quickest for your problem, closely followed by japhy's version using fancier regex constructs. MZSanford's substr/substitute is slow (and a bit buggy, I fixed that below :) because the regex engine tries to start the match at every interior whitespace. But you can improve on it:
($chunk) = substr($string,0,201) =~ /(.*)\s+\w*$/;
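To make the one-liner easier to follow, here is a minimal sketch with a small limit. The helper name chunk_at_word and the sample sentence are my own illustrations, not from the thread. The point is the "limit + 1" window: taking one extra character means a word that ends exactly at the limit still has its trailing whitespace inside the window, so it is kept whole.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): cut $string to at most
# $limit characters without splitting a word.
sub chunk_at_word {
    my ($string, $limit) = @_;
    # Look at one character more than the limit, then drop the last
    # whitespace run and whatever (possibly partial) word follows it.
    my ($chunk) = substr($string, 0, $limit + 1) =~ /(.*)\s+\w*$/;
    return $chunk;    # undef if the window contains no whitespace
}

my $text = "the quick brown fox jumps";
print chunk_at_word($text, 12), "\n";    # window is "the quick bro"
print chunk_at_word($text, 9),  "\n";    # window is "the quick "
```

Both calls should print "the quick". One caveat of this sketch: for a string shorter than the limit the last word is still dropped, so check the length first if that matters to you.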
Here are the results of the benchmark:
Benchmark: running Hofmator, MZSanford, japhy, tye, each for at least 3 CPU seconds...
Hofmator: 3 wallclock secs ( 2.99 usr + 0.01 sys = 3.00 CPU) @ 206100.67/s (n=618302)
MZSanford: 4 wallclock secs ( 3.03 usr + 0.00 sys = 3.03 CPU) @ 55936.63/s (n=169488)
japhy: 4 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 256036.67/s (n=768110)
tye: 4 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 292146.67/s (n=876440)
generated by this code:
#!/usr/bin/perl
use Benchmark qw/timethese/;
$string = q/Some text repeated / x 50;
timethese(-3, {
MZSanford => '$chunk = substr($string,0,201);$chunk =~ s/\s+\w*$//',
Hofmator => '($chunk) = substr($string,0,201) =~ /(.*)\s+\w*$/',
japhy => '($chunk) = $string =~ /^(.{1,200})(?<!\s)(?!\w)/;',
tye => '($chunk) = $string =~ /^(.{0,199}\S)\s/',
});
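A speed comparison only means something if the candidates return the same thing, so here is a small sanity check you can run alongside the benchmark. It is a sketch of my own, not part of the original timing run: the %chunk hash and the output format are assumptions, while the four expressions are copied from the benchmark above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same test string as in the benchmark.
my $string = q/Some text repeated / x 50;
my %chunk;

# The four benchmarked expressions, storing each result under its name.
$chunk{MZSanford} = substr($string, 0, 201);
$chunk{MZSanford} =~ s/\s+\w*$//;
($chunk{Hofmator}) = substr($string, 0, 201) =~ /(.*)\s+\w*$/;
($chunk{japhy})    = $string =~ /^(.{1,200})(?<!\s)(?!\w)/;
($chunk{tye})      = $string =~ /^(.{0,199}\S)\s/;

# Report each length and whether the result agrees with tye's version.
for my $name (sort keys %chunk) {
    printf "%-10s length %3d, %s\n", $name, length($chunk{$name}),
        $chunk{$name} eq $chunk{tye} ? 'same as tye' : 'DIFFERS from tye';
}
```

By my reading, all four should give the same 199-character chunk ending in "Some text" for this test string. Other inputs (leading whitespace, no whitespace at all, strings shorter than the limit) may well behave differently between the versions.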
-- Hofmator