Re^3: Converting multiple spaces to nbsp

Re^4: Converting multiple spaces to nbsp by GrandFather (Saint) on Jun 17, 2005 at 05:43 UTC
I think you are right about the execution speed. It would be interesting to benchmark. I would like to think that my solution is slightly easier to understand, but I consider that I am writing toddler Perl (up from baby Perl) and still have a lot to learn. And I have learned from your answer, thank you.
Perl is Huffman encoded by design.
Re^5: Converting multiple spaces to nbsp by Cap'n Steve (Friar) on Jun 17, 2005 at 07:20 UTC
I thought it was a little easier on the eyes, too. I'm surprised the lookbehind is faster.
Re^6: Converting multiple spaces to nbsp by GrandFather (Saint) on Jun 17, 2005 at 07:48 UTC
We don't know that yet, but I'll play with benchmarking it this weekend. My guess is that the lookbehind translates into "find a space, slurp any more spaces, do the replace". My version does similar work in the search phase, but more work in the replace phase.
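A minimal sketch of the difference being guessed at here, showing only what each pattern consumes and making no claim about how the regex engine runs it internally. The sample string and the variable names are assumptions made up for illustration:

```perl
use strict;
use warnings;

my $text = "a   b";   # three spaces between the letters

# Capture-and-re-emit form: the leading space is part of the match,
# so the replacement has to put it back.
$text =~ / ( +)/ or die "no match";
my $capture_form_len = length $&;

# Lookbehind form: the leading space is only asserted, not consumed,
# so the replacement covers just the spaces that change.
$text =~ /(?<= )( +)/ or die "no match";
my $lookbehind_form_len = length $&;

print "capture form matched $capture_form_len characters\n";
print "lookbehind form matched $lookbehind_form_len characters\n";
```

The capture form matches one character more than the lookbehind form, which is the extra space it has to stick back in during the replace.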
Re^4: Converting multiple spaces to nbsp by bart (Canon) on Jun 17, 2005 at 08:43 UTC
"I thought executed regexs are experimental"
`/e` doesn't produce an executed regex! Instead, it tells perl that the substitution part is to be parsed and executed as Perl code. Furthermore, there's no eval taking place: the code is parsed and compiled at compile time. Note: LHS = regex, RHS = substitution. All those features that ikegami lists as experimental are to be used in the regex part. But `/e` isn't.
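To make the distinction concrete, here is a small sketch comparing the same substitution with and without `/e`. The sample string and variable names are assumptions for illustration, and the replacement text is assumed to be the `&nbsp;` entity from the thread title:

```perl
use strict;
use warnings;

my $text = "a   b";

# Without /e the right-hand side is an interpolated string, so the
# expression text itself ends up in the result.
(my $plain = $text) =~ s/(?<= )( +)/'&nbsp;' x length($1)/g;

# With /e the right-hand side is compiled as Perl code and its value
# becomes the replacement. The pattern is the same in both cases.
(my $evaluated = $text) =~ s/(?<= )( +)/'&nbsp;' x length($1)/ge;

print "without /e: $plain\n";
print "with /e:    $evaluated\n";
```

Only the treatment of the right-hand side changes; nothing experimental is involved on the regex side.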
Re^4: Converting multiple spaces to nbsp by reasonablekeith (Deacon) on Jun 17, 2005 at 08:06 UTC
Well, it appears it's the slower of the two...

#!/usr/bin/perl
use Benchmark qw(cmpthese);

my $test_text = q|Wow, that was quick!<br/>
Two points:<br/>
1) I only want space, not tabs or new lines - so shouldn't the \s be replaced with " "? <br/>
2) Is there a difference between inkgmi's and GrandFather's entry? <br/>
PS I thought executed regexs are experimental (so says the man page) - is there a problem with them?<br/>
|;

my $working_var;
my $count = 1000000;

cmpthese($count, {
    'grandfather' => sub {$working_var = $test_text; $working_var =~ s/ ( +)/" " . ("&nbsp;" x length ($1))/ge},
    'ikegami' => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}
});

__OUTPUT__
Benchmark: timing 1000000 iterations of grandfather, ikegami...
grandfather: 31 wallclock secs (31.01 usr + 0.00 sys = 31.01 CPU) @ 32252.86/s (n=1000000)
    ikegami: 52 wallclock secs (51.26 usr + 0.00 sys = 51.26 CPU) @ 19506.87/s (n=1000000)
               Rate     ikegami grandfather
ikegami     19507/s          --        -40%
grandfather 32253/s         65%          --

Personally I'd go with GrandFather's solution even if it were the slower, on the grounds I think it'd be more readable to more people.
---
my name's not Keith, and I'm not reasonable.
Re^5: Converting multiple spaces to nbsp by Smylers (Pilgrim) on Jun 17, 2005 at 14:11 UTC
`'ikegami' => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}`

That code is wrong: there's a `+` missing after the second space, which means that `$1` always has a length of one! When benchmarking code, first check that all of your variants yield the same answer before timing them. However, in this particular case it doesn't seem to make much difference to the timings.

"Personally I'd go with GrandFather's solution even if it were the slower, on the grounds I think it'd be more readable ..."

Personally I'd go with Ikegami's variant over GrandFather's, even though it is slower, because I think Ikegami's is more readable^! GrandFather's variant involves matching something that you don't intend to replace, then sticking it back in the substitution, which is a little messy. By using the lookbehind assertion, Ikegami's way clearly documents that you wish to perform the substitution just after a space, but that the space itself isn't going to be replaced.

"... to more people."

That's probably true, in the sense that the people who know the lookbehind assertion are a subset of those who know about regexps. But I think I should write my production Perl code for a target audience of people who do know Perl, and not worry that people who aren't Perl coders might not understand it: I'm employed to write Perl programs, in Perl, and I don't think it'd be reasonable of my employer to expect a Java programmer to understand them unaided.

(In the same way, when writing documentation in English I want to be able to choose the best way of saying what I want to say in English, rather than intentionally writing it more sloppily on the grounds that when I write it precisely and accurately I may be using words that are unfamiliar to those who don't speak English: I'm employed to write English documentation, in English (in England, for other English people to read), and I don't think it'd be reasonable of my employer to expect a Brazilian to understand it unaided.)

^ Actually, I'd probably go with my own variant (see above), which happens to be faster than either of these.

Smylers
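The advice about checking that every variant gives the same answer before timing it could be sketched as follows. This is an illustration rather than anyone's actual harness: the sample string, the hash names, and the `ikegami_no_plus` label are assumptions, and the replacement text is assumed to be the `&nbsp;` entity from the thread title.

```perl
use strict;
use warnings;

my $sample = "a   b    c";   # runs of three and four spaces

# Each variant works on a copy of the input and returns the result
# together with the number of substitutions s///g reported.
my %variant = (
    grandfather => sub {
        my $s = shift;
        my $n = ($s =~ s/ ( +)/" " . ("&nbsp;" x length($1))/ge);
        return ($s, $n);
    },
    ikegami => sub {
        my $s = shift;
        my $n = ($s =~ s/(?<= )( +)/'&nbsp;' x length($1)/ge);
        return ($s, $n);
    },
    ikegami_no_plus => sub {   # the form that was benchmarked by mistake
        my $s = shift;
        my $n = ($s =~ s/(?<= )( )/'&nbsp;' x length($1)/ge);
        return ($s, $n);
    },
    smylers => sub {
        my $s = shift;
        my $n = ($s =~ s/(?<= )( )/&nbsp;/g);
        return ($s, $n);
    },
);

my (%result, %count);
for my $name (sort keys %variant) {
    ($result{$name}, $count{$name}) = $variant{$name}->($sample);
    print "$name: '$result{$name}' ($count{$name} substitutions)\n";
}

# Refuse to go on to any timing if the variants disagree.
my ($first, @rest) = sort keys %result;
for my $name (@rest) {
    die "$name disagrees with $first\n" if $result{$name} ne $result{$first};
}
```

On this input all four produce the same string, but the variant without the `+` performs one substitution per space instead of one per run, which is why `$1` always has a length of one there.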
Re^6: Converting multiple spaces to nbsp by ikegami (Patriarch) on Jun 17, 2005 at 15:48 UTC
"Actually, I'd probably go with my own variant (see above), which happens to be faster than either of these."

Yours isn't the fastest for me, although I've added `use warnings;`, `use strict;` and forced a scalar context onto the substitution:

cmpthese(-3, {
    GrandFather => sub { local $_ = $test_text; scalar s/ ( +)/" " . ("&nbsp;" x length ($1))/ge },
    ikegami => sub { local $_ = $test_text; scalar s/(?<= )( +)/'&nbsp;' x length($1)/eg },
    Smylers => sub { local $_ = $test_text; scalar s/(?<= )( )/&nbsp;/g },
});
__END__
               Rate ikegami Smylers GrandFather
ikegami     22180/s      --    -25%        -31%
Smylers     29772/s     34%      --         -8%
GrandFather 32268/s     45%      8%          --
Re^7: Converting multiple spaces to nbsp by Smylers (Pilgrim) on Jun 17, 2005 at 16:55 UTC
Re^6: Converting multiple spaces to nbsp by GrandFather (Saint) on Jun 17, 2005 at 23:29 UTC
"Personally I'd go with Ikegami's variant ..."

Actually, I agree. Nice point, well made. (I did say I was still writing toddler code, didn't I?)