regex verses join/split

aquacade has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regex verses join/split by aquacade (Scribe) on Jul 27, 2001 at 12:51 UTC
Thanks! I learned how to use Benchmark tonight! I added (I think) middle-whitespace-removal regexes to your test routines, using runrig's benchmark example from your "already mentioned" link above. What amazes me (if I understand the benchmark output) is that the split/join method is THE fastest way to trim extra whitespace. I hope I modified the other regexes "the best way" to match the functionality of trimall_v1?

    use strict;
    use Benchmark 'cmpthese';

    my $str = " a b c d ";

    cmpthese(-5, {
        ALTERNATE  => \&alternate,
        LTSAVE     => \&lt_save,
        LTSEXEGER  => \&ltsexeger,
        WHILE_SUB  => \&while_sub,
        TRIMALL_V1 => \&trimall_v1,
        TRIMALL_V2 => \&trimall_v2,
    });

    # Using regex
    sub trimall_v1 {
        local $_ = $str;
        s/^\s+//;
        s/\s+$//;
        s/\s+/ /g;
        $_;
    }

    # Using specialized split on ' ' and $_
    sub trimall_v2 {
        local $_ = $str;
        $_ = join ' ', split;
    }

    # Used runrig's benchmark example, but made ones below trim leading,
    # trailing, and extra whitespace
    sub alternate {
        local $_ = $str;
        s/^\s+|\s+$//g;
        s/\s+/ /g;
        $_;
    }

    sub lt_save {
        local $_ = $str;
        s/^\s*(.*?)\s*$/$1/;
        s/\s+/ /g;
        $_;
    }

    sub ltsexeger {
        local $_ = reverse $str;
        s/^\s+//;           # trims what was the trailing whitespace
        $_ = reverse $_;    # flip back, keeping the trim above
        s/^\s+//;
        s/\s+/ /g;
        $_;
    }

    sub while_sub {
        local $_ = $str;
        1 while s/^\s//;
        1 while s/\s$//;
        1 while s/\s\s/ /;
        $_;
    }

    =Benchmarks
    Benchmark: running ALTERNATE, LTSAVE, LTSEXEGER, TRIMALL_V1, TRIMALL_V2, WHILE_SUB, each for at least 5 CPU seconds...
     ALTERNATE:  5 wallclock secs ( 5.10 usr +  0.00 sys =  5.10 CPU) @  80293.53/s (n=409497)
        LTSAVE:  6 wallclock secs ( 5.27 usr +  0.00 sys =  5.27 CPU) @  48685.39/s (n=256572)
     LTSEXEGER:  5 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 124402.60/s (n=622013)
    TRIMALL_V1:  5 wallclock secs ( 5.12 usr +  0.00 sys =  5.12 CPU) @ 121486.91/s (n=622013)
    TRIMALL_V2:  5 wallclock secs ( 5.05 usr +  0.00 sys =  5.05 CPU) @ 154931.88/s (n=782406)
     WHILE_SUB:  6 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @  68421.40/s (n=342107)

                   Rate LTSAVE WHILE_SUB ALTERNATE TRIMALL_V1 LTSEXEGER TRIMALL_V2
    LTSAVE      48685/s     --      -29%      -39%       -60%      -61%       -69%
    WHILE_SUB   68421/s    41%        --      -15%       -44%      -45%       -56%
    ALTERNATE   80294/s    65%       17%        --       -34%      -35%       -48%
    TRIMALL_V1 121487/s   150%       78%       51%         --       -2%       -22%
    LTSEXEGER  124403/s   156%       82%       55%         2%        --       -20%
    TRIMALL_V2 154932/s   218%      126%       93%        28%       25%         --
    =cut
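A speed comparison like the one above only means something if every candidate produces the same string. The following is a minimal sketch of such a check, not part of the original post. The test input and the `%candidate` table are assumptions made for illustration, and each sub takes its input as an argument rather than reading a file-level `$str`.

```perl
use strict;
use warnings;

# Assumed test input (not from the post): mixed runs of whitespace
# at the start, in the middle, and at the end.
my $str = "  a \t b\n\n c  ";

# Hypothetical table of candidates, adapted to take the string as an argument.
my %candidate = (
    TRIMALL_V1 => sub { local $_ = shift; s/^\s+//; s/\s+$//; s/\s+/ /g; $_ },
    TRIMALL_V2 => sub { local $_ = shift; join ' ', split ' ', $_ },
    ALTERNATE  => sub { local $_ = shift; s/^\s+|\s+$//g; s/\s+/ /g; $_ },
    WHILE_SUB  => sub {
        local $_ = shift;
        1 while s/^\s//;
        1 while s/\s$//;
        1 while s/\s\s/ /;
        $_;
    },
);

my $expected = 'a b c';
for my $name (sort keys %candidate) {
    my $got = $candidate{$name}->($str);
    printf "%-10s %s\n", $name, $got eq $expected ? 'ok' : "MISMATCH: [$got]";
}
```

Note that WHILE_SUB only rewrites pairs of whitespace characters, so a lone tab between two words would survive it; the input here happens not to contain one.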
Re: regex verses join/split by clemburg (Curate) on Jul 27, 2001 at 12:57 UTC
Looks like the split is slightly faster:

    h:\>perl benchmark.pl
    perl benchmark.pl
    Benchmark: timing 100000 iterations of trimall_v1, trimall_v2...
    trimall_v1: 13 wallclock secs (13.29 usr +  0.00 sys = 13.29 CPU)
    trimall_v2: 12 wallclock secs (11.74 usr +  0.01 sys = 11.75 CPU)

Example code for testing with Benchmark:

    #!/usr/bin/perl -w

    use strict;
    use Benchmark;

    my $iterations = $ARGV[0] || 100000;

    timethese($iterations, {
        trimall_v1 => \&try_trimall_v1,
        trimall_v2 => \&try_trimall_v2,
    });

    sub try_trimall_v1 {
        my ($val, $list) = setup();
        1 for (trimall_v1($val));
        1 for (trimall_v1(@{$list}));
    }

    sub try_trimall_v2 {
        my ($val, $list) = setup();
        1 for (trimall_v2($val));
        1 for (trimall_v2(@{$list}));
    }

    sub setup {
        my $val  = " Scalar \r\r\n\t\t\n ";
        my @list = (" List\t\t\n 1 \n \n \n", " \t\t List 2 ", " List\t\t\t\n 3\t\n\n\r ");
        return $val, \@list;
    }

    # Using regex
    sub trimall_v1 {
        # Trims all extra whitespace from left, right and middle!
        my @out = @_;    # Pass by value!
        for (@out) {
            s/^\s+//;
            s/\s+$//;
            s/\s+/ /g;
        }
        return wantarray ? @out : $out[0];
    }

    # Using specialized split on ' ' and $_
    sub trimall_v2 {
        # Trims all extra whitespace from left, right and middle!
        my @out = @_;    # Pass by value
        for (@out) {
            $_ = join ' ', split;
        }
        return wantarray ? @out : $out[0];
    }

Update: Argh ... too late ...

Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com
Re: regex verses join/split by lshatzer (Friar) on Jul 27, 2001 at 07:35 UTC
Using the Benchmark module, you can run some tests and find out for yourself. For an example, check this node out.
Re: regex verses join/split by japhy (Canon) on Jul 27, 2001 at 09:56 UTC
Update: thanks to MeowChow and lemming, I've enrolled in an excellent course in which I will learn to read.

I have a comment to make about your first method, with the regexes. You do three distinct substitutions. That's just silly. The whitespace at the beginning of a string and at the end of a string is going to be treated like ALL the OTHER whitespace, so you're only being inefficient by separating the clauses. And, as I've already mentioned, doing `s/\s+$//` is particularly inefficient. So your regex-method should merely be `s/\s+//g`. But you can probably do better with `tr/\n\r\f\t //d`.

_____________________________________________________
Jeff `japhy` Pinyan: Perl, regex, and perl hacker.
`s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`
Re: Re: regex verses join/split by bwana147 (Pilgrim) on Jul 27, 2001 at 13:04 UTC
Interesting, but the OP wanted to remove leading and trailing whitespace, and to collapse any run of whitespace into a single space. You propose removing all whitespace everywhere, which is not what was asked for. Still, it's true that `tr/\n\r\f\t / /s` would be more efficient than `s/\s+/ /g`. So, taking your remarks into account, I would say:

    s/^\s+//;
    1 while s/\s$//;
    tr/\n\r\f\t / /s;

--bwana147
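For readers who want to try the three statements above on a sample string, here is a minimal sketch. The wrapper name `trim_squeeze` and the sample input are assumptions made for illustration; only the three statements inside come from the post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the three statements from the post.
sub trim_squeeze {
    my ($s) = @_;
    $s =~ s/^\s+//;            # drop leading whitespace
    1 while $s =~ s/\s$//;     # drop trailing whitespace, one character per pass
    $s =~ tr/\n\r\f\t / /s;    # turn each listed character into a space and squeeze runs
    return $s;
}

print '[', trim_squeeze("  a \t b\n\n c  "), "]\n";   # prints "[a b c]"
```

The `tr` list names its characters explicitly, so any whitespace character that `\s` matches but the list omits would be left alone in the middle of the string.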
Re: regex verses join/split by runrig (Abbot) on Jul 27, 2001 at 18:49 UTC
As long as you want to squeeze whitespace, do that first; the regex engine might then optimize the leading/trailing substitution. Here's another possibility to benchmark:

    # Squeeze whitespace
    tr/\n\r\f\t //s;

    # Then
    # This:
    s/^\s|\s$//g;

    # Or this:
    s/^\s//;
    s/\s$//;
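One detail worth checking before benchmarking this: with an empty replacement list, `tr///s` maps each character to itself, so it only squeezes runs of the same character, and a mixed run such as a space followed by a tab is left in place. The sketch below is an assumption-laden variant, not runrig's code. It swaps in the `tr/\n\r\f\t / /s` form from bwana147's reply, so that at most one space can remain at each end and the single-character trims are then enough. The helper name and sample input are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: squeeze first, then trim at most one character per end.
sub squeeze_then_trim {
    my ($s) = @_;
    $s =~ tr/\n\r\f\t / /s;   # every listed character becomes a space; runs collapse
    $s =~ s/^\s//;            # at most one leading space can remain
    $s =~ s/\s$//;            # likewise at the end
    return $s;
}

my $mixed = " \t a \n\n b \t ";

# For comparison: the empty-replacement form squeezes only identical neighbours.
(my $same_char_only = $mixed) =~ tr/\n\r\f\t //s;

print '[', squeeze_then_trim($mixed), "]\n";   # prints "[a b]"
print length($mixed), ' -> ', length($same_char_only), "\n";
```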

