Re^4: Converting multiple spaces to nbsp
by GrandFather (Saint) on Jun 17, 2005 at 05:43 UTC
I think you are right about the execution speed. It would be interesting to benchmark.
I would like to think that my solution is slightly easier to understand, but I consider that I am writing toddler Perl (up from baby Perl) and still have a lot to learn. And I have learned from your answer, thank you.
Perl is Huffman encoded by design.
| [reply] |
I thought it was a little easier on the eyes, too. I'm surprised the look behind is faster.
| [reply] |
We don't know that yet, but I'll play with benchmarking it this weekend.
My guess is that the look behind translates into a "find a space, slurp any more spaces, do the replace". My version does similar work in the search phase, but does more work in the replace phase.
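To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The sample string is made up, and the `&nbsp;` replacement text is assumed from the thread title; treat it as an illustration, not a measurement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: runs of 1, 2 and 4 spaces.
my $sample = "a b  c    d";

# Capture approach: match a space plus the run after it, then put the
# first space back and emit one &nbsp; per remaining space.
(my $captured = $sample) =~ s/ ( +)/" " . ("&nbsp;" x length($1))/ge;

# Lookbehind approach: match only the run that follows a space, so the
# leading space is never consumed and needs no re-insertion.
(my $lookbehind = $sample) =~ s/(?<= )( +)/"&nbsp;" x length($1)/ge;

print "$captured\n";
print "$lookbehind\n";
```

Both substitutions leave single spaces alone and keep the first space of each longer run; the only difference is whether that first space is matched and re-inserted or merely asserted.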
Perl is Huffman encoded by design.
| [reply] |
Re^4: Converting multiple spaces to nbsp
by bart (Canon) on Jun 17, 2005 at 08:43 UTC
I thought executed regexs are experimental
/e doesn't produce an executed regex! Instead, it tells perl that the substitution part is to be parsed, treated and executed as Perl code. Furthermore, there's no eval taking place: the code is parsed and compiled at compile time.
Note: LHS = regex, RHS = substitution
All those features that ikegami lists as experimental, are to be used in the regex part. But /e isn't.
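A small sketch of the difference, using a made-up string: the pattern is identical in both substitutions, and only the treatment of the right-hand side changes.

```perl
use strict;
use warnings;

my $plain = "3 apples";
my $evald = "3 apples";

# Without /e the right-hand side is an interpolated string.
$plain =~ s/(\d+)/$1 * 2/;     # gives "3 * 2 apples"

# With /e the right-hand side is a Perl expression; its value is the
# replacement. The regex on the left is an ordinary regex either way.
$evald =~ s/(\d+)/$1 * 2/e;    # gives "6 apples"

print "$plain\n$evald\n";
```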
| [reply] |
Re^4: Converting multiple spaces to nbsp
by reasonablekeith (Deacon) on Jun 17, 2005 at 08:06 UTC
well it appears it's the slower of the two...
#!/usr/bin/perl
use Benchmark qw(cmpthese);
my $test_text = q|Wow, that was quick!<br/>
Two points:<br/>
1) I only want space, not tabs or new lines - so shouldn't the \s be replaced with " "? <br/>
2) Is there a difference between inkgmi's and GrandFather's entry? <br/>
PS I thought executed regexs are experimental (so says the man page) - is there a problem with them?<br/>
|;
my $working_var;
my $count = 1000000;
cmpthese($count, {
    'grandfather' => sub {$working_var = $test_text; $working_var =~ s/ ( +)/" " . ("&nbsp;" x length ($1))/ge},
    'ikegami' => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}
});
__OUTPUT__
Benchmark: timing 1000000 iterations of grandfather, ikegami...
grandfather: 31 wallclock secs (31.01 usr + 0.00 sys = 31.01 CPU) @ 32252.86/s (n=1000000)
    ikegami: 52 wallclock secs (51.26 usr + 0.00 sys = 51.26 CPU) @ 19506.87/s (n=1000000)
               Rate     ikegami grandfather
ikegami     19507/s          --        -40%
grandfather 32253/s         65%          --
Personally I'd go with GrandFather's solution even if it were the slower, on the grounds that I think it'd be more readable to more people.
---
my name's not Keith, and I'm not reasonable.
| [reply] [d/l] |
'ikegami' => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}
That code is wrong: there's a + missing after the second space, which means that $1 always has a length of one! When benchmarking code, first check that all of your variants yield the same answer before timing them.
However, in this particular case it doesn't seem to make much difference to the timings.
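One way to run that check is sketched below. The sample text and the `%variant` hash are made up for illustration, and the `&nbsp;` replacement strings are assumed from the thread title rather than copied from anyone's post.

```perl
use strict;
use warnings;

# Hypothetical test input with runs of 1 to 4 spaces.
my $sample = "one two  three   four    five";

# Each variant takes the text and returns the converted copy.
my %variant = (
    GrandFather => sub { my $t = shift; $t =~ s/ ( +)/" " . ("&nbsp;" x length($1))/ge; $t },
    ikegami     => sub { my $t = shift; $t =~ s/(?<= )( +)/"&nbsp;" x length($1)/ge;    $t },
    Smylers     => sub { my $t = shift; $t =~ s/(?<= )( )/&nbsp;/g;                     $t },
);

# Run every variant once and collect the distinct outputs.
my %result   = map { $_ => $variant{$_}->($sample) } keys %variant;
my %distinct = map { $_ => 1 } values %result;

if (scalar(keys %distinct) == 1) {
    print "All variants agree: $result{ikegami}\n";
}
else {
    print "$_: $result{$_}\n" for sort keys %result;
    die "Variants disagree; fix them before benchmarking\n";
}
```

Only once this passes is it worth handing the same subs to cmpthese.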
Personally I'd go with GrandFather's solution even if it were the slower, on the grounds I think it'd be more readable ...
Personally I'd go with Ikegami's variant over GrandFather's, even though it is slower, because I think Ikegami's is more readable*! GrandFather's variant involves matching something that you don't intend replacing, then sticking it back in the substitution, which is a little messy. By using the lookbehind assertion Ikegami's way clearly documents that you wish to perform the substitution just after a space, but that the space itself isn't going to be replaced.
... to more people.
That's probably true, in the sense that the people who know the lookbehind assertion are a subset of those who know about regexps. But I think I should write my production Perl code for a target audience of people who do know Perl, and not worry that people who aren't Perl coders might not understand it: I'm employed to write Perl programs, in Perl, and I don't think it'd be reasonable of my employer to expect a Java programmer to understand them unaided.
(In the same way, when writing documentation in English I want to be able to choose the best way of saying what I want to say in English, rather than intentionally writing it more sloppily on the grounds that when I write it precisely and accurately I may be using words that are unfamiliar to those who don't speak English: I'm employed to write English documentation, in English (in England, for other English people to read), and I don't think it'd be reasonable of my employer to expect a Brazilian to understand it unaided.)
* Actually, I'd probably go with my own variant (see above), which happens to be faster than either of these.
Smylers
| [reply] [d/l] |
cmpthese(-3, {
    GrandFather => sub {
        local $_ = $test_text;
        scalar s/ ( +)/" " . ("&nbsp;" x length ($1))/ge
    },
    ikegami => sub {
        local $_ = $test_text;
        scalar s/(?<= )( +)/'&nbsp;' x length($1)/eg
    },
    Smylers => sub {
        local $_ = $test_text;
        scalar s/(?<= )( )/&nbsp;/g
    },
});
__END__
               Rate     ikegami     Smylers GrandFather
ikegami     22180/s          --        -25%        -31%
Smylers     29772/s         34%          --         -8%
GrandFather 32268/s         45%          8%          --
| [reply] [d/l] [select] |