http://qs1969.pair.com?node_id=11121800


in reply to Re^6: What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff
in thread What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff

No elegant alternative. As I showed in the benchmark, this works fine:

sub splt{split" ",reverse((split" ",(reverse$x),1)[0]),1;};

To be honest, in all the years that I "do" Perl (since perl-4.016), I have never seen anyone using split to trim leading (or trailing) whitespace.
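
For anyone puzzling over why that one-liner trims at all, here is a minimal sketch of the same idea spelled out step by step. It relies on two documented behaviours of split: a pattern that is a single-space string skips leading whitespace, and a limit of 1 returns the rest of the string as one field. The sub name trim_via_split is made up for illustration and does not appear in the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the thread):
# trim both ends of a string using only split and reverse.
sub trim_via_split {
    my ($str) = @_;

    # split " " skips leading whitespace; limit 1 keeps the rest as one field.
    my ($no_lead) = split " ", $str, 1;
    return "" unless defined $no_lead && length $no_lead;

    # Reverse the string so trailing whitespace becomes leading whitespace,
    # trim it the same way, then reverse back.
    my $reversed = reverse $no_lead;
    my ($trimmed_rev) = split " ", $reversed, 1;
    return scalar reverse $trimmed_rev;
}

print "|", trim_via_split("  abc  def  "), "|\n";
```

Internal whitespace is left untouched; only the two ends are trimmed.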


Enjoy, Have FUN! H.Merijn

Re^8: What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff
by Fletch (Bishop) on Sep 15, 2020 at 15:51 UTC

    Had a reeeeeaaaaly vague recollection and the Perl Cookbook (Ch 1 section 19) does actually offer $_ = join(' ', split(' ')); as an alternative to these three substitutions:

    s/^\s+//; s/\s+$//; s/\s+/ /g;

    to strip and canonicalize to single spaces; and does offer this trim sub:

    sub trim {
        my @out = @_ ? @_ : $_;
        $_ = join(' ', split(' ')) for @out;
        return wantarray ? @out : "@out";
    }
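
    A quick way to convince yourself that the two forms agree is to run them side by side. This is only a sketch: the sub names squeeze_regex and squeeze_split and the sample input are made up for illustration, not taken from the Cookbook.

```perl
use strict;
use warnings;

# Variant 1: the three substitutions (strip both ends, then squeeze runs).
sub squeeze_regex {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    $s =~ s/\s+/ /g;
    return $s;
}

# Variant 2: split on whitespace (leading whitespace is skipped and
# trailing empty fields are dropped), then rejoin with single spaces.
sub squeeze_split {
    my ($s) = @_;
    return join ' ', split ' ', $s;
}

my $in = "  foo \t bar\n  baz  ";
print "regex: |", squeeze_regex($in), "|\n";
print "split: |", squeeze_split($in), "|\n";
```

    Both variants also rewrite internal tabs and newlines to single spaces, which is exactly the "canonicalize" part and not a pure trim.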

    Edit: That being said, I don't recall having seen this construct in the wild otherwise, and I had only the vaguest of hunches that PC mentioned anything like this, so I'd hardly call it a "common idiom" either.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      How often do you need to compress internal whitespace? Anyway, it is faster:

      $ perl -MBenchmark=cmpthese -wE'my$x=join" "=>"",("abc")x5,"";say"sourc: |$x|";sub trim{join" ",split" ",$x};sub rgx{$x=~s/^\s+//r=~s/\s+$//r=~s/\s\s+/ /gr};say "split: |",trim(),"|";say"regex: |",rgx(),"|";cmpthese(-2,{splt=>\&trim,rgx=>\&rgx})'
      sourc: | abc abc abc abc abc |
      split: |abc abc abc abc abc|
      regex: |abc abc abc abc abc|
                Rate  rgx splt
      rgx   506423/s   -- -71%
      splt 1767763/s 249%   --

      Enjoy, Have FUN! H.Merijn

        I think I always want whitespace normalized. I can’t think of a case where it wouldn’t bother me, even if, as in HTML rendering defaults, it generally will not be apparent. Two spaces between words in a title or regular text is as much a typographical error as a misspelling.

        $ perl -MBenchmark=cmpthese -wE'my$x=join"    "=>"",("abc")x5,"";say"sourc: ... yadda yadda ...

        Sorry, but this overly wide code-line is breaking the formatting of most of the thread. (took a bit to find it too)

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re^8: What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO staff
by likbez (Sexton) on Sep 16, 2020 at 05:02 UTC
    No elegant alternative. As I showed in the benchmark, this works fine:

    sub splt{split" ",reverse((split" ",(reverse$x),1)[0]),1;};
    The most elegant approach to this problem is to use the tr function. The implementation below is about three times faster than the regex, and it could be made faster still by a trivial extension of tr mentioned in my previous post: something like an 'x' option that stops the translation at the first symbol outside set1 and returns that position. Such an option could replace the more general index function when searching for single characters in a string, and, like rindex, it could be made to search in the reverse direction too.

    It looks like the Cookbook solution for trim mentioned by Fletch, besides potentially deforming the string, is slower than the regex in my test (on my machine it took 3.56 sec real time).

    Here is the "tr based" algorithm for trim:

    time perl -e 'for (1..1000000) { $line=" aaa bbb ccc ddd eee fff "; $_=$line; $_=~tr/ /x/c; $start=index($_,'x'); $line=substr($line,$start,rindex($_,'x')-$start+1); }'
    
    real 0m1.112s
    user 0m1.076s
    sys 0m0.031s
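
    For readability, the same steps can be written as a sub. This is a minimal sketch, assuming only plain spaces need trimming (tabs and newlines count as content here). The name trim_tr and the guard for an all-space string are additions for illustration and are not part of the timed one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): trim leading and trailing
# spaces (the space character only, not all of \s) with tr, index and rindex.
sub trim_tr {
    my ($line) = @_;

    # Build a mask: every non-space character becomes "x", spaces stay.
    (my $mask = $line) =~ tr/ /x/c;

    my $start = index $mask, 'x';          # first non-space position
    return '' if $start < 0;               # empty or nothing but spaces
    my $end = rindex $mask, 'x';           # last non-space position

    return substr $line, $start, $end - $start + 1;
}

print "|", trim_tr("  aaa bbb  ccc "), "|\n";
```

    The mask costs one extra copy of the string, which is what the proposed tr extension would avoid.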
    
    It can be made into a single statement, which makes it slightly (~7%) slower:
    $line=substr($line,($start=index($_=$line=~tr/ /x/cr,'x')),rindex($_,'x')-$start+1);
    
    # time perl -e 'for (1..1000000) { $line=" aaa bbb ccc ddd eee fff "; $line=substr($line,($start=index($_=$line=~tr/ /x/cr,'x')),rindex($_,'x')-$start+1); }' 
    real 0m1.189s
    user 0m1.154s
    sys 0m0.015s
    
    
      or
      $line=substr($line,index($_=$line=~tr/ /x/cr,'x')),rindex($_,'x')-length($line)+1);

        So, restricting to solutions that do not compress internal spaces, and ignoring the fact that " " is not identical to \s, I am surprised that split is faster than tr. (Converted the one-liner to a script for readability):

        $ cat test.pl
        use 5.18.0;
        use warnings;
        use Benchmark qw( cmpthese );

        my $x = join " " => "", ("abc") x 5, "";

        sub splt {
            split " ", reverse ((split " ", (reverse $x), 1)[0]), 1;
        }
        sub rgx {
            $x =~ s/^\s+//r =~ s/\s+$//r;
        }
        sub trx {
            my $y = $x =~ y/ /x/cr;
            substr ($x, index ($y, "x"), rindex ($y, "x") - length ($x) + 1);
        }

        say "sourc: |$x|";
        say "split: |", splt (), "|";
        say "regex: |", rgx (), "|";
        say "tr/x/: |", trx (), "|";

        cmpthese (-2, { splt => \&splt, rgx => \&rgx, trx => \&trx });
        $ perl test.pl
        sourc: | abc abc abc abc abc |
        split: |abc abc abc abc abc|
        regex: |abc abc abc abc abc|
        tr/x/: |abc abc abc abc abc|
                  Rate  rgx  trx splt
        rgx  1047602/s   -- -59% -65%
        trx  2553722/s 144%   -- -14%
        splt 2958598/s 182%  16%   --

        So far, none of the presented alternatives to s{^\s+}{}r and friends appeals to me. Even though they are twice as fast or more, I would definitely choose the regex one over the magic of the other two. YMMV.

        Here's the one that also squeezes internal spaces with no \s, and there I would seriously consider the join/split variant:

        $ cat test.pl
        use 5.18.0;
        use warnings;
        use Benchmark qw( cmpthese );

        my $x = join " " => "", ("abc") x 5, "";

        sub splt {
            join " " => split " " => $x;
        }
        sub rgx {
            $x =~ tr/ / /sr =~ s/^ //r =~ s/ $//r;
        }
        sub trx {
            my $y = $x =~ tr/ /x/cr;
            substr ($x, index ($y, "x"), rindex ($y, "x") - length ($x) + 1) =~ tr/ / /sr;
        }

        say "sourc: |$x|";
        say "split: |", splt (), "|";
        say "regex: |", rgx (), "|";
        say "tr/x/: |", trx (), "|";

        cmpthese (-2, { splt => \&splt, rgx => \&rgx, trx => \&trx });
        $ perl test.pl
        sourc: | abc abc abc abc abc |
        split: |abc abc abc abc abc|
        regex: |abc abc abc abc abc|
        tr/x/: |abc abc abc abc abc|
                  Rate  rgx splt  trx
        rgx  1012851/s   -- -42% -52%
        splt 1739341/s  72%   -- -17%
        trx  2105282/s 108%  21%   --
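
        One caveat that the fixed benchmark input does not exercise: the trx form passes rindex(...) - length(...) + 1 as the length argument of substr, which is negative when there are trailing spaces (meaning "leave that many characters off the end"). As far as I can tell, when the string has no trailing space at all the expression evaluates to 0 and substr returns an empty string. A minimal sketch to check this, with a hypothetical sub name wrapping the same expression:

```perl
use 5.014;      # for the /r flag on tr
use warnings;

# Same expression as in trx, but taking the string as an argument.
# The sub name is an assumption made for this illustration.
sub trx_negative_len {
    my ($x) = @_;
    my $y = $x =~ tr/ /x/cr;
    return substr($x, index($y, "x"), rindex($y, "x") - length($x) + 1);
}

print "|", trx_negative_len(" abc "), "|\n";   # one trailing space: |abc|
print "|", trx_negative_len(" abc"),  "|\n";   # no trailing space:  ||
```

        The variant that computes the length as rindex minus start plus 1 does not share this edge case.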

        Enjoy, Have FUN! H.Merijn
        You probably meant
        substr($x, index($_ = $x =~ tr/ /x/cr, 'x'), rindex($_, 'x') - length($x) + 1)
        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]