Please consider the following two programs (both work fine; suggestions for better function names are welcome). The first replaces each run of whitespace characters with a single space.
#!/usr/bin/perl -d:NYTProf
use warnings;
use strict;
use String::Util 'trim';
use Benchmark qw(cmpthese timethese);

cmpthese(
    -30,
    {
        compress_1 => q|compress_1(' Mary had a little lamb. ');|,
        compress_2 => q|compress_2(' Mary had a little lamb. ');|,
        compress_3 => q|compress_3(' Mary had a little lamb. ');|,
        squash     => q|squash(' Mary had a little lamb. ');|,
        split_join => q|split_join(' Mary had a little lamb. ');|,
    }
);

print "'compress_1' => '",compress_1(' Mary had a little lamb. '),"'\n";
print "'compress_2' => '",compress_2(' Mary had a little lamb. '),"'\n";
print "'compress_3' => '",compress_3(' Mary had a little lamb. '),"'\n";
print "'squash' => '",squash(' Mary had a little lamb. '),"'\n";
print "'split_join' => '",split_join(' Mary had a little lamb. '),"'\n";
exit;

sub compress_1 {
    my $string = shift;
    $string =~ s/ +/ /g;
    return $string;
}

sub compress_2 {
    my $string = shift;
    $string =~ s/\h+/ /g;
    return $string;
}

sub compress_3 {
    my $string = shift;
    $string =~ s/ {1,}/ /g;
    return $string;
}

sub squash {
    my $string = shift;
    $string =~ tr/ //s;
    return $string;
}

sub split_join {
    my $string = shift;
    $string = join ' ', split ' ', $string;
    return $string;
}
The next trims leading and trailing whitespace.
#!/usr/bin/perl -d:NYTProf
use warnings;
use strict;
use String::Util 'trim';
use Benchmark qw(cmpthese timethese);

cmpthese(
    -30,
    {
        'double_star'  => q|double_star(' Mary had a little lamb. ');|,
        'double_plus'  => q|double_plus(' Mary had a little lamb. ');|,
        'double_plus2' => q|double_plus(' Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. ');|,
        'replace'      => q|replace( ' Mary had a little lamb. ');|,
        'for_star'     => q|for_star( ' Mary had a little lamb. ');|,
        'for_plus'     => q|for_plus( ' Mary had a little lamb. ');|,
        'regex_or'     => q|regex_or( ' Mary had a little lamb. ');|,
        'one_liner'    => q|one_liner( ' Mary had a little lamb. ');|,
        'trim'         => q|trim( ' Mary had a little lamb. ');|,
    }
);

print "'trim' => '",trim(' Mary had a little lamb. '),"'\n";
print "'double_star' => '",double_star(' Mary had a little lamb. '),"'\n";
print "'double_plus' => '",double_plus(' Mary had a little lamb. '),"'\n";
print "'double_plus2' => '",double_plus(' Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. '),"'\n";
print "'replace' => '",replace( ' Mary had a little lamb. '),"'\n";
print "'for_star' => '",for_star( ' Mary had a little lamb. '),"'\n";
print "'for_plus' => '",for_plus( ' Mary had a little lamb. '),"'\n";
print "'regex_or' => '",regex_or( ' Mary had a little lamb. '),"'\n";
print "'one_liner' => '",one_liner( ' Mary had a little lamb. '),"'\n";
exit;

sub one_liner {
    my $string = shift;
    # $string =~ s/^\ *([A-Z,a-z,0-9]*)\ *$/$1/g;
    $string =~ s/^\s+|\s+$//g ;
    return $string;
}

sub double_star {
    my $string = shift;
    $string =~ s/^\s*//;
    $string =~ s/\s*$//;
    return $string;
}

sub double_plus {
    my $string = shift;
    $string =~ s/^\s+//; #remove leading spaces
    $string =~ s/\s+$//; #remove trailing spaces
    return $string;
}

sub replace {
    my $string = shift;
    $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    return $string;
}

sub for_star {
    my $string = shift;
    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
    return $string;
}

sub for_plus {
    my $string = shift;
    for ($string) {
        s/^\s*//;
        s/\s*$//;
    }
    return $string;
}

sub regex_or {
    my $string = shift;
    $string =~ s/(?:^ +)||(?: +$)//g;
    return $string;
}
And here is what I get when I execute these programs.
ted@linux-jp04:~/Work/Projects/misc.tests> ./compress.multiple.spaces.to.single.space.pl
               Rate compress_3 compress_1 compress_2 split_join squash
compress_3 135174/s         --        -2%        -6%       -34%   -45%
compress_1 137798/s         2%         --        -4%       -33%   -44%
compress_2 143178/s         6%         4%         --       -30%   -42%
split_join 205421/s        52%        49%        43%         --   -17%
squash     247547/s        83%        80%        73%        21%     --
'compress_1' => ' Mary had a little lamb. '
'compress_2' => ' Mary had a little lamb. '
'compress_3' => ' Mary had a little lamb. '
'squash' => ' Mary had a little lamb. '
'split_join' => 'Mary had a little lamb.'
ted@linux-jp04:~/Work/Projects/misc.tests> ./trim.ws.pl
                 Rate double_plus2 regex_or trim for_plus for_star double_star one_liner double_plus replace
double_plus2  69971/s           --      -5% -21%     -28%     -36%        -37%      -43%        -46%    -46%
regex_or      73562/s           5%       -- -17%     -24%     -33%        -34%      -40%        -43%    -44%
trim          88942/s          27%      21%   --      -8%     -19%        -20%      -27%        -32%    -32%
for_plus      96591/s          38%      31%   9%       --     -12%        -13%      -21%        -26%    -26%
for_star     109941/s          57%      49%  24%      14%       --         -1%      -10%        -16%    -16%
double_star  111060/s          59%      51%  25%      15%       1%          --       -9%        -15%    -15%
one_liner    122651/s          75%      67%  38%      27%      12%         10%        --         -6%     -6%
double_plus  130149/s          86%      77%  46%      35%      18%         17%        6%          --     -0%
replace      130236/s          86%      77%  46%      35%      18%         17%        6%          0%     --
'trim' => 'Mary had a little lamb.'
'double_star' => 'Mary had a little lamb.'
'double_plus' => 'Mary had a little lamb.'
'double_plus2' => 'Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb. Mary had a little lamb.'
'replace' => 'Mary had a little lamb.'
'for_star' => 'Mary had a little lamb.'
'for_plus' => 'Mary had a little lamb.'
'regex_or' => 'Mary had a little lamb.'
'one_liner' => 'Mary had a little lamb.'
ted@linux-jp04:~/Work/Projects/misc.tests>
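One caveat when reading the first table: the compress variants do not treat every kind of whitespace the same way, so their rates are only comparable on inputs made of plain spaces. The following is a minimal sketch of that difference, using an assumed test string with a tab and a trailing newline (not one of the strings benchmarked above). The `show` helper is hypothetical and exists only to make tabs and newlines visible.

```perl
use strict;
use warnings;

# Assumed test string: spaces mixed with tabs, plus a trailing newline.
my $input = "  Mary \t had  a\tlittle   lamb. \n";

# tr///s squeezes runs of the literal space character only.
(my $squashed = $input) =~ tr/ //s;

# \h+ matches runs of horizontal whitespace (spaces and tabs), not newlines.
(my $horizontal = $input) =~ s/\h+/ /g;

# split ' ' splits on any whitespace and drops leading and trailing runs.
my $rejoined = join ' ', split ' ', $input;

# Hypothetical display helper: render tabs and newlines as escapes.
sub show {
    my $s = shift;
    $s =~ s/\t/\\t/g;
    $s =~ s/\n/\\n/g;
    return "'$s'";
}

print "tr/ //s     => ", show($squashed),   "\n";
print "s/\\h+/ /g   => ", show($horizontal), "\n";
print "split/join  => ", show($rejoined),   "\n";
```

On this input, `tr/ //s` leaves the tabs in place, `s/\h+/ /g` converts them but keeps the newline, and only split/join also trims both ends.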
First, I would like to understand the differences in performance among these regular expressions. I realize that the specific numbers depend on the hardware being used and its load, and that a much longer string will obviously lower the absolute rates, but I am interested in the ranking. Do the functions scale differently, or will I get the same ranking regardless of the length of the string?
Second, I am curious why the split/join approach is so much faster than the fastest regular expression.
Third, I can see that if I want to both trim leading and trailing whitespace AND replace each run of whitespace characters with a single space, I can use the split/join algorithm, but what about combining the regular expressions? I have included only those regular expressions and algorithms that I found on the web, plus one or two I came up with, all tested to work as advertised. Are there other regular expressions or functions that would serve one or both requirements and be faster still?
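To make the combining and scaling questions concrete, here is a minimal sketch of one way the regular expressions could be chained, compared against split/join on a short and a longer input. The names `trim_compress_regex` and `trim_compress_split`, the test strings, and the one-second timing budget are all assumptions for illustration; this is not a claim about which variant wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical combination: trim both ends, then collapse inner runs.
sub trim_compress_regex {
    my $string = shift;
    $string =~ s/^\s+//;     # remove leading whitespace
    $string =~ s/\s+$//;     # remove trailing whitespace
    $string =~ s/\s+/ /g;    # collapse each remaining run to one space
    return $string;
}

# The split/join approach does both jobs in one expression.
sub trim_compress_split {
    my $string = shift;
    return join ' ', split ' ', $string;
}

# Assumed inputs: a short string and the same string repeated 14 times.
my $short = '  Mary   had a  little   lamb.  ';
my $long  = $short x 14;

for my $case ([short => $short], [long => $long]) {
    my ($label, $text) = @$case;

    # Only compare speed once both variants agree on the result.
    die "variants disagree on $label input"
        unless trim_compress_regex($text) eq trim_compress_split($text);

    print "--- $label input (", length($text), " characters) ---\n";
    cmpthese(-1, {
        regex      => sub { trim_compress_regex($text) },
        split_join => sub { trim_compress_split($text) },
    });
}
```

Running the same comparison at several lengths is one way to see whether the ranking holds as the string grows. Passing code references instead of `q|...|` strings avoids re-parsing the string literal's source on each compile, and dropping `-d:NYTProf` from the shebang while benchmarking keeps profiler overhead out of the rates.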
Thanks
ted
Why do I get this ridiculous splitting of my lines of code, so that the code begins at the far left of my screen and stops only a quarter of the way across my screen, and is there a way to stop that?
In reply to Question about regex performance by ted.byers