in reply to Re: Converting multiple spaces to nbsp
in thread Converting multiple spaces to nbsp

Wow, that was quick!

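Two points:

1) I only want space, not tabs or new lines - so shouldn't the \s be replaced with " "?

2) Is there a difference between ikegami's and GrandFather's entry?

PS I thought executed regexes are experimental (so says the man page) - is there a problem with them?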

Re^3: Converting multiple spaces to nbsp
by ikegami (Patriarch) on Jun 17, 2005 at 05:08 UTC
    I only want space, not tabs or new lines

    Then yes, replace "\s" with " " or "\040". Keep in mind that HTML doesn't know the difference between spaces, tabs and newlines.
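A minimal sketch of what the space-only versions might look like, assuming the two substitutions discussed in this thread with \s swapped for \040. The sub names nbsp_capture and nbsp_lookbehind are made up for illustration and are not from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper names; \040 is a literal space, written in octal
# so that it stays visible in the pattern.

# Capture variant: match a space plus the run after it, put the first
# space back, and turn the rest of the run into &nbsp; entities.
sub nbsp_capture {
    my ($text) = @_;
    $text =~ s/\040(\040+)/' ' . ('&nbsp;' x length($1))/ge;
    return $text;
}

# Lookbehind variant: only the spaces that follow a space are matched,
# so the leading space never has to be put back.
sub nbsp_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<=\040)(\040+)/'&nbsp;' x length($1)/ge;
    return $text;
}

# Three spaces, then a tab and a newline that should be left alone.
my $sample = 'a' . ' ' x 3 . "b\tc\n";
print nbsp_capture($sample);      # a &nbsp;&nbsp;b<TAB>c
print nbsp_lookbehind($sample);   # same line again
```

Since neither pattern mentions \s any more, the tab and the newline pass through untouched.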

    Is there a difference between ikegami's and GrandFather's entry?

    I think mine is a teeny bit faster. (One less character to add to $1, one less operation in building the replacement string, one less character to replace.)

    I thought executed regexes are experimental

    Only (?{ code }), (??{ code }), (?>pattern) and (?(condition)yes-pattern|no-pattern) are marked experimental, and none of them were used here. /e has been around for quite some time and is reliable.

      I thought executed regexes are experimental
      /e doesn't produce an executed regex! Instead, it tells perl that the substitution part is to be parsed and executed as Perl code. Furthermore, there's no eval taking place: the code is parsed and compiled at compile time.

      Note: LHS = regex, RHS = substitution

      All those features that ikegami lists as experimental are used in the regex part (the LHS). /e isn't: it applies to the substitution part (the RHS).
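To make the LHS/RHS point concrete, here is a small sketch. The sample string and variable names are invented for illustration. The pattern is identical in both statements, and only the treatment of the replacement changes.

```perl
use strict;
use warnings;

# Without /e the right-hand side is just an interpolated string.
(my $plain = 'width=21') =~ s/(\d+)/$1 * 2/;

# With /e the right-hand side is Perl code, and its value is the replacement.
(my $evaluated = 'width=21') =~ s/(\d+)/$1 * 2/e;

print "$plain\n";       # width=21 * 2
print "$evaluated\n";   # width=42
```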

      I think you are right about the execution speed. It would be interesting to benchmark.

      I would like to think that my solution is slightly easier to understand, but I consider that I am writing toddler Perl (up from baby Perl) and still have a lot to learn. And I have learned from your answer, thank you.


      Perl is Huffman encoded by design.
        I thought it was a little easier on the eyes, too. I'm surprised the lookbehind is faster.
      well it appears it's the slower of the two...
      #!/usr/bin/perl
      use Benchmark qw(cmpthese);

      my $test_text = q|Wow, that was quick!<br/>
      Two points:<br/>
      1) I only want space, not tabs or new lines - so shouldn't the \s be replaced with " "? <br/>
      2) Is there a difference between inkgmi's and GrandFather's entry? <br/>
      PS I thought executed regexs are experimental (so says the man page) - is there a problem with them?<br/>
      |;
      my $working_var;
      my $count = 1000000;

      cmpthese($count, {
          'grandfather' => sub {$working_var = $test_text; $working_var =~ s/ ( +)/" " . ("&nbsp;" x length ($1))/ge},
          'ikegami'     => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}
      });

      __OUTPUT__
      Benchmark: timing 1000000 iterations of grandfather, ikegami...
      grandfather: 31 wallclock secs (31.01 usr + 0.00 sys = 31.01 CPU) @ 32252.86/s (n=1000000)
          ikegami: 52 wallclock secs (51.26 usr + 0.00 sys = 51.26 CPU) @ 19506.87/s (n=1000000)
                     Rate     ikegami grandfather
      ikegami     19507/s          --        -40%
      grandfather 32253/s         65%          --
      Personally I'd go with GrandFather's solution even if it were the slower, on the grounds I think it'd be more readable to more people.
      ---
      my name's not Keith, and I'm not reasonable.
            'ikegami'     => sub {$working_var = $test_text; $working_var =~ s/(?<= )( )/'&nbsp;' x length($1)/eg}

        That code is wrong: there's a + missing from after the 2nd space, which means that $1 always has a length of one! When benchmarking code, first check that each of your variants yields the same answer as the others before timing them.
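A rough sketch of such a pre-benchmark check. The hash keys ikegami_typo and ikegami_fixed are labels invented here, and spaces are written as \040 so they stay visible. The patterns are the two from the benchmark plus the lookbehind one with the + restored.

```perl
use strict;
use warnings;

# Sample input with runs of two and three spaces.
my $text = 'one' . ' ' x 2 . 'two' . ' ' x 3 . 'three';

# Each variant works on a copy and returns the converted string.
my %variant = (
    grandfather   => sub { my $t = shift; $t =~ s/\040(\040+)/' ' . ('&nbsp;' x length($1))/ge; $t },
    ikegami_typo  => sub { my $t = shift; $t =~ s/(?<=\040)(\040)/'&nbsp;' x length($1)/ge; $t },
    ikegami_fixed => sub { my $t = shift; $t =~ s/(?<=\040)(\040+)/'&nbsp;' x length($1)/ge; $t },
);

# Run every variant once on the same input.
my %result;
$result{$_} = $variant{$_}->($text) for keys %variant;

for my $name (sort keys %result) {
    printf "%-14s %s\n", $name, $result{$name};
}

# Count distinct outputs; anything other than one means a variant is off.
my %distinct;
$distinct{$_}++ for values %result;
my $verdict = scalar(keys %distinct) == 1
    ? "all variants agree\n"
    : "variants DISAGREE, fix before timing\n";
print $verdict;
```

On this sample all three produce the same string. With /g the pattern without the + still replaces every space that follows a space, just one match at a time, so the missing + changes how many matches are made rather than the result.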

        However, in this particular case it doesn't seem to make much difference to the timings.

        Personally I'd go with GrandFather's solution even if it were the slower, on the grounds I think it'd be more readable ...

        Personally I'd go with Ikegami's variant over GrandFather's, even though it is slower, because I think Ikegami's is more readable*! GrandFather's variant involves matching something that you don't intend replacing, then sticking it back in the substitution, which is a little messy. By using the lookbehind assertion Ikegami's way clearly documents that you wish to perform the substitution just after a space, but that the space itself isn't going to be replaced.

        ... to more people.

        That's probably true, in the sense that the people who know the lookbehind assertion are a subset of those who know about regexps. But I think I should write my production Perl code for a target audience of people who do know Perl, and not worry that people who aren't Perl coders might not understand it: I'm employed to write Perl programs, in Perl, and I don't think it'd be reasonable of my employer to expect a Java programmer to understand them unaided.

        (In the same way, when writing documentation in English I want to be able to choose the best way of saying what I want to say in English, rather than intentionally writing it more sloppily on the grounds that when I write it precisely and accurately I may be using words that are unfamiliar to those who don't speak English: I'm employed to write English documentation, in English (in England, for other English people to read), and I don't think it'd be reasonable of my employer to expect a Brazilian to understand it unaided.)

        * Actually, I'd probably go with my own variant (see above), which happens to be faster than either of these.

        Smylers

Re^3: Converting multiple spaces to nbsp
by davido (Cardinal) on Jun 17, 2005 at 04:50 UTC

    The /e modifier (evaluation) is not experimental. You're probably thinking of (?{...}) and (??{....}), which are considered experimental. Actually, they've proven to be fairly stable over the last few Perl releases, other than having a few kinks worked out.


    Dave

Re^3: Converting multiple spaces to nbsp
by GrandFather (Saint) on Jun 17, 2005 at 04:46 UTC

    Yes, substitute spaces for the \s in either version.

    GrandFather's solution comes with an example :-)


    Perl is Huffman encoded by design.