PerlMonks  

regex verses join/split

by aquacade (Scribe)
on Jul 27, 2001 at 07:27 UTC ( [id://100221]=perlquestion )

aquacade has asked for the wisdom of the Perl Monks concerning the following question:

Which of these two functionally similar versions is "more efficient" in your opinion, and why? Thanks!

use strict;

# Using regex
sub trimall_v1 {
    # Trims all extra whitespace from left, right and middle!
    my @out = @_;    # Pass by value!
    for (@out) {
        s/^\s+//;
        s/\s+$//;
        s/\s+/ /g;
    }
    return wantarray ? @out : $out[0];
}

# Using specialized split on ' ' and $_
sub trimall_v2 {
    # Trims all extra whitespace from left, right and middle!
    my @out = @_;    # Pass by value
    for (@out) { $_ = join ' ', split; }
    return wantarray ? @out : $out[0];
}

my $val  = " Scalar \r\r\n\t\t\n ";
my @list = (" List\t\t\n 1 \n \n \n", " \t\t List 2 ", " List\t\t\t\n 3\t\n\n\r ");

print "'$_'\n" for (trimall_v1($val));
print "'$_'\n" for (trimall_v1(@list));
print "'$_'\n" for (trimall_v2($val));
print "'$_'\n" for (trimall_v2(@list));
__END__

Replies are listed 'Best First'.
Re: regex verses join/split
by aquacade (Scribe) on Jul 27, 2001 at 12:51 UTC

    Thanks! I learned how to use Benchmark tonight! I added (I think) middle-whitespace removal regexes to your test routines (using runrig's benchmark example from your "already mentioned" link above). What amazes me (if I understand the benchmark output) is that the split/join method is THE fastest way to trim extra whitespace. I hope I modified the other regexes "the best way" to match the functionality of trimall_v1?

    use strict;
    use Benchmark 'cmpthese';

    my $str = " a b c d ";

    cmpthese(-5, {
        ALTERNATE  => \&alternate,
        LTSAVE     => \&lt_save,
        LTSEXEGER  => \&ltsexeger,
        WHILE_SUB  => \&while_sub,
        TRIMALL_V1 => \&trimall_v1,
        TRIMALL_V2 => \&trimall_v2,
    });

    # Using regex
    sub trimall_v1 {
        local $_ = $str;
        s/^\s+//;
        s/\s+$//;
        s/\s+/ /g;
        $_;
    }

    # Using specialized split on ' ' and $_
    sub trimall_v2 {
        local $_ = $str;
        $_ = join ' ', split;
    }

    # Used runrig's benchmark example, but made ones below trim leading,
    # trailing, and extra whitespace
    sub alternate {
        local $_ = $str;
        s/^\s+|\s+$//g;
        s/\s+/ /g;
        $_;
    }

    sub lt_save {
        local $_ = $str;
        s/^\s*(.*?)\s*$/$1/;
        s/\s+/ /g;
        $_;
    }

    # As posted: the second reverse operates on $str, not $_, so the first
    # trim is discarded, and the sub returns the substitution count rather
    # than the string. The timings below were taken with this version.
    sub ltsexeger {
        local $_ = reverse $str;
        s/^\s+//;
        $_ = reverse $str;
        s/^\s+//;
        s/\s+/ /g;
    }

    sub while_sub {
        local $_ = $str;
        1 while s/^\s//;
        1 while s/\s$//;
        1 while s/\s\s/ /;
        $_;
    }

    =Benchmarks

    Benchmark: running ALTERNATE, LTSAVE, LTSEXEGER, TRIMALL_V1, TRIMALL_V2, WHILE_SUB, each for at least 5 CPU seconds...
     ALTERNATE:  5 wallclock secs ( 5.10 usr +  0.00 sys =  5.10 CPU) @  80293.53/s (n=409497)
        LTSAVE:  6 wallclock secs ( 5.27 usr +  0.00 sys =  5.27 CPU) @  48685.39/s (n=256572)
     LTSEXEGER:  5 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 124402.60/s (n=622013)
    TRIMALL_V1:  5 wallclock secs ( 5.12 usr +  0.00 sys =  5.12 CPU) @ 121486.91/s (n=622013)
    TRIMALL_V2:  5 wallclock secs ( 5.05 usr +  0.00 sys =  5.05 CPU) @ 154931.88/s (n=782406)
     WHILE_SUB:  6 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @  68421.40/s (n=342107)

                   Rate LTSAVE WHILE_SUB ALTERNATE TRIMALL_V1 LTSEXEGER TRIMALL_V2
    LTSAVE      48685/s     --      -29%      -39%       -60%      -61%       -69%
    WHILE_SUB   68421/s    41%        --      -15%       -44%      -45%       -56%
    ALTERNATE   80294/s    65%       17%        --       -34%      -35%       -48%
    TRIMALL_V1 121487/s   150%       78%       51%         --       -2%       -22%
    LTSEXEGER  124403/s   156%       82%       55%         2%        --       -20%
    TRIMALL_V2 154932/s   218%      126%       93%        28%       25%         --

    =cut
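Before comparing speeds, it can help to confirm that every candidate returns the same string, since a variant that does less work will look faster. The following is a minimal sketch of such a check, not part of the original benchmark. The `%variant` hash and its key names are made up for illustration, and the `ltsexeger` entry assumes the intended behaviour (reversing `$_` the second time and returning the string).

```perl
use strict;
use warnings;

my $str = "  a  b \t c\n d  ";

# Each variant takes a string and returns a trimmed, squeezed copy.
# The names are illustrative only.
my %variant = (
    regex3 => sub {
        local $_ = $_[0];
        s/^\s+//; s/\s+$//; s/\s+/ /g;
        $_;
    },
    split_join => sub {
        join ' ', split(' ', $_[0]);
    },
    alternate => sub {
        local $_ = $_[0];
        s/^\s+|\s+$//g; s/\s+/ /g;
        $_;
    },
    ltsexeger => sub {
        local $_ = scalar reverse $_[0];
        s/^\s+//;                   # strips what was trailing whitespace
        $_ = scalar reverse $_;     # assumption: reverse $_, not the input
        s/^\s+//;
        s/\s+/ /g;
        $_;
    },
);

# Print each result in quotes so stray whitespace is visible.
for my $name (sort keys %variant) {
    printf "%-10s '%s'\n", $name, $variant{$name}->($str);
}
```

If any line of output differs from the others, that variant is not doing the same job and its timing should not be compared directly.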
Re: regex verses join/split
by clemburg (Curate) on Jul 27, 2001 at 12:57 UTC

    Looks like the split is slightly faster:

    h:\>perl benchmark.pl
    Benchmark: timing 100000 iterations of trimall_v1, trimall_v2...
    trimall_v1: 13 wallclock secs (13.29 usr +  0.00 sys = 13.29 CPU)
    trimall_v2: 12 wallclock secs (11.74 usr +  0.01 sys = 11.75 CPU)

    Example code for testing with benchmark:

    #!/usr/bin/perl -w

    use strict;
    use Benchmark;

    my $iterations = $ARGV[0] || 100000;

    timethese($iterations, {
        trimall_v1 => \&try_trimall_v1,
        trimall_v2 => \&try_trimall_v2,
    });

    sub try_trimall_v1 {
        my ($val, $list) = setup();
        1 for (trimall_v1($val));
        1 for (trimall_v1(@{$list}));
    }

    sub try_trimall_v2 {
        my ($val, $list) = setup();
        1 for (trimall_v2($val));
        1 for (trimall_v2(@{$list}));
    }

    sub setup {
        my $val  = " Scalar \r\r\n\t\t\n ";
        my @list = (" List\t\t\n 1 \n \n \n", " \t\t List 2 ", " List\t\t\t\n 3\t\n\n\r ");
        return $val, \@list;
    }

    # Using regex
    sub trimall_v1 {
        # Trims all extra whitespace from left, right and middle!
        my @out = @_;    # Pass by value!
        for (@out) {
            s/^\s+//;
            s/\s+$//;
            s/\s+/ /g;
        }
        return wantarray ? @out : $out[0];
    }

    # Using specialized split on ' ' and $_
    sub trimall_v2 {
        # Trims all extra whitespace from left, right and middle!
        my @out = @_;    # Pass by value
        for (@out) { $_ = join ' ', split; }
        return wantarray ? @out : $out[0];
    }

    Update: Argh ... too late ...

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: regex verses join/split
by lshatzer (Friar) on Jul 27, 2001 at 07:35 UTC
    Using the benchmark module you can run some tests and find out yourself.

    For an example check this node out.
Re: regex verses join/split
by japhy (Canon) on Jul 27, 2001 at 09:56 UTC
    Update: thanks to MeowChow and lemming, I've enrolled in an excellent course in which I will learn to read.

    I have a comment to make about your first method, with the regexes. You do three distinct substitutions. That's just silly. The whitespace at the beginning of a string and at the end of a string is going to be treated like ALL the OTHER whitespace, so you're only being inefficient by separating the clauses. And, as I've already mentioned, doing s/\s+$// is particularly inefficient. So your regex-method should merely be s/\s+//g. But you can probably do better with tr/\n\r\f\t //d.

    _____________________________________________________
    Jeff japhy Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Interesting, but the OP wanted to remove leading/trailing spaces, and collapse any sequence of whitespaces into one single space. You propose to remove any whitespace anywhere, which is not what was asked for.

      Still, it's true that tr/\n\r\f\t / /s would be more efficient than s/\s+/ /g. So, taking your remarks into account, I would say:

      s/^\s+//; 1 while s/\s$//; tr/\n\r\f\t / /s;

      --bwana147
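To make the difference between the two `tr` forms concrete, here is a small sketch on a sample string of my own choosing. The sub name `trim_squeeze` is hypothetical; its body is the three-step sequence suggested above. Note that the `tr` list covers only those five characters, whereas `\s` can match more (Unicode whitespace in character strings, for example), so the two are interchangeable only for plain ASCII input like this.

```perl
use strict;
use warnings;

my $in = "  foo \t\n bar  ";

(my $deleted  = $in) =~ tr/\n\r\f\t //d;    # deletes every listed character
(my $squeezed = $in) =~ tr/\n\r\f\t / /s;   # maps each to a space, squeezes runs
(my $regex    = $in) =~ s/\s+/ /g;          # regex equivalent of the squeeze

print "deleted:  '$deleted'\n";     # 'foobar'
print "squeezed: '$squeezed'\n";    # ' foo bar '
print "regex:    '$regex'\n";       # ' foo bar '

# Hypothetical wrapper around the combined suggestion:
# trim both ends, then squeeze what is left with tr.
sub trim_squeeze {
    my ($s) = @_;
    for ($s) {
        s/^\s+//;
        1 while s/\s$//;
        tr/\n\r\f\t / /s;
    }
    return $s;
}

print "combined: '", trim_squeeze($in), "'\n";    # 'foo bar'
```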

Re: regex verses join/split
by runrig (Abbot) on Jul 27, 2001 at 18:49 UTC
    As long as you want to squeeze whitespace, do that first, and the regex engine might optimize the leading/trailing substitution, so then here's another possibility to benchmark:
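    # Squeeze whitespace (map every whitespace character to a space,
    # then collapse runs; with an empty replacement list, tr///s would
    # only collapse runs of the same character)
    tr/\n\r\f\t / /s;

    # Then
    # This:
    s/^\s|\s$//g;
    # Or this:
    s/^\s//;
    s/\s$//;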
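The squeeze-first ordering can be wrapped up as a sub for benchmarking. This is a sketch only: the name `squeeze_then_trim` is made up, and it assumes the squeeze step maps every listed whitespace character to a space (`tr/\n\r\f\t / /s`), so that afterwards at most one space can remain at each end.

```perl
use strict;
use warnings;

# Hypothetical helper: squeeze first, then trim a single character per end.
sub squeeze_then_trim {
    my ($s) = @_;
    $s =~ tr/\n\r\f\t / /s;    # every whitespace run becomes one space
    $s =~ s/^ //;              # at most one leading space is left
    $s =~ s/ $//;              # at most one trailing space is left
    return $s;
}

print "'", squeeze_then_trim("  a \t b\n\n"), "'\n";    # 'a b'
```

Because the anchored substitutions only ever need to remove one character, neither needs a `+` quantifier, which is the point of doing the squeeze first. Whether that is actually faster on a given Perl build is something to measure with Benchmark rather than assume.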
