Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by hippo (Archbishop) on Aug 14, 2020 at 06:46 UTC
|
without resorting to regular expressions (which are definitely an overkill for this particular purpose)?
Sure, just write your own function to do it. Having written that you will then come to the conclusion that regular expressions are definitely not an overkill for this particular purpose.
This is clearly an important special case. ... which clearly is an abuse of regex.
You keep using that word. I don't think it means what you think it means.
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by kcott (Archbishop) on Aug 14, 2020 at 09:35 UTC
|
G'day likbez,
I will usually reach for one of Perl's string handling functions (e.g.
index,
rindex,
substr, and so on)
in preference to a regex when that is appropriate;
however, in this case, I would say that the regex makes for much cleaner code.
You could implement a trim() function using the guts of this code
(which uses neither a regex nor any modules, standard or otherwise):
$ perl -E '
    my @x = (" a b c ", "d e f ", " g h i", "j k l", " ", "");
    say "*** Initial strings ***";
    say "|$_|" for @x;
    for my $i (0 .. $#x) {
        my $str = $x[$i];
        # Drop leading spaces one character at a time.
        while (0 == index $str, " ") {
            $str = substr $str, 1;
        }
        # Drop trailing spaces one character at a time.
        my $str_end = length($str) - 1;
        while ($str_end == rindex $str, " ") {
            $str = substr $str, 0, $str_end;
            --$str_end;
        }
        $x[$i] = $str;
    }
    say "*** Final strings ***";
    say "|$_|" for @x;
'
*** Initial strings ***
| a b c |
|d e f |
| g h i|
|j k l|
| |
||
*** Final strings ***
|a b c|
|d e f|
|g h i|
|j k l|
||
||
If your question was genuinely serious, please Benchmark
a trim() function using something like I've provided against another trim() function using a regex.
You could obviously do the same for ltrim() and rtrim() functions.
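A minimal sketch of what such functions might look like, using the same idea as the loops above and comparing them against a regex version with the core Benchmark module. The names ltrim, rtrim, trim and trim_regex are illustrative choices, only the space character is treated as a blank, and the iteration count is arbitrary. This version tests the first and last character with substr rather than index and rindex, which is a small variation on the posted code, not the same loop.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Remove leading spaces without a regex.
sub ltrim {
    my ($str) = @_;
    $str = substr($str, 1) while substr($str, 0, 1) eq ' ';
    return $str;
}

# Remove trailing spaces without a regex.
sub rtrim {
    my ($str) = @_;
    $str = substr($str, 0, -1) while length($str) && substr($str, -1) eq ' ';
    return $str;
}

# Remove spaces from both ends without a regex.
sub trim { return rtrim(ltrim($_[0])) }

# Regex counterpart, restricted to spaces so the comparison is like for like.
sub trim_regex {
    my ($str) = @_;
    s/\A +//, s/ +\z// for $str;
    return $str;
}

my @samples = (' a b c ', 'd e f ', ' g h i', 'j k l', ' ', '');

# Sanity check: both implementations must agree before timing them.
for my $s (@samples) {
    die "mismatch for |$s|" unless trim($s) eq trim_regex($s);
}

# Arbitrary iteration count; raise it for steadier numbers.
cmpthese(20_000, {
    no_regex => sub { trim($_)       for @samples },
    regex    => sub { trim_regex($_) for @samples },
});
```

The timings depend on the Perl build and on the input strings, so no result is claimed here.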
[As others have either asked or alluded to,
please explain phrases such as "definitely an overkill", "important special case" and "abuse of regex".
Unfortunately, use of such language makes your post come across as some sort of trollish rant
— I'm not saying that was your intent, just how it presents itself.]
|
DB<11> $a="x \n \n \n "
DB<12> $a =~ s/\s+$//
DB<13> x $a
0 'x'
DB<14>
The OP should be clearer about the semantics he wants.
see also Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
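To make the question of semantics concrete, here is a small sketch that applies three different definitions of "blank" to the string from the debugger session above. The variable names are made up for the illustration, and \h requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my $line = "x \n \n \n ";

# \s covers spaces, tabs and newlines, so everything after "x" goes.
(my $any_ws = $line) =~ s/\s+\z//;      # "x"

# A literal space only: the last newline stops the match.
(my $spaces = $line) =~ s/ +\z//;       # "x \n \n \n"

# \h is horizontal whitespace (space, tab, ...), so newlines survive too.
(my $horiz = $line) =~ s/\h+\z//;       # "x \n \n \n"

printf "%-8s %d characters left\n", 'any_ws', length $any_ws;
printf "%-8s %d characters left\n", 'spaces', length $spaces;
printf "%-8s %d characters left\n", 'horiz',  length $horiz;
```

Which of these counts as "trimming" depends entirely on what the caller means by a blank.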
|
G'day Rolf,
That's a valid point.
My main intent with that code was really to show the complexity of the solution when a regex or module were not used.
Anyway, adding a little more complexity, you can trim whatever blanks you want:
$ perl -E '
    my @blanks = (" ", "\n", "\r", "\t");
    my @x = (
        " a b c ", "d e f \r ", " \t g h i",
        "j k l", " ", "\n", "\n\nXYZ\n\n", ""
    );
    say "*** Initial strings ***";
    say "|$_|" for @x;
    for my $i (0 .. $#x) {
        my $str = $x[$i];
        # Drop one leading character while the string starts with any blank.
        while (grep { 0 == index $str, $_ } @blanks) {
            $str = substr $str, 1;
        }
        # Drop one trailing character while the string ends with any blank.
        my $str_end = length($str) - 1;
        while (grep { $str_end == rindex $str, $_ } @blanks) {
            $str = substr $str, 0, $str_end;
            --$str_end;
        }
        $x[$i] = $str;
    }
    say "*** Final strings ***";
    say "|$_|" for @x;
'
*** Initial strings ***
| a b c |
| e f
| g h i|
|j k l|
| |
|
|
|
XYZ
|
||
*** Final strings ***
|a b c|
|d e f|
|g h i|
|j k l|
||
||
|XYZ|
||
You're quite correct about "The OP should be clearer ...".
The word 'blank' is often used to mean various things:
a single space, multiple consecutive spaces, a whitespace character, multiple consecutive whitespace characters,
and I have also seen it used to refer to a zero-length string.
Similarly, the word 'space' can mean a single space, any gap between visible characters, and so on.
So, as with many posts, we're left with guessing the most likely meaning from the context.
My belief, that a regex is a better option, strengthens
as the complexity of the non-regex and non-module code increases. :-)
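For comparison, the configurable-blanks loop above could be packaged as a function along these lines. This is only a sketch: trim_chars is a hypothetical name, the default blank set simply mirrors @blanks from the code above, and it walks two indices inward and calls substr once instead of shortening the string on every step.

```perl
use strict;
use warnings;

# Hypothetical helper: strip any of the given single characters from both
# ends of a string, using neither a regex nor any module.
sub trim_chars {
    my ($str, @blanks) = @_;
    @blanks = (' ', "\n", "\r", "\t") unless @blanks;   # assumed default set
    my %is_blank = map { $_ => 1 } @blanks;

    my ($start, $end) = (0, length($str) - 1);

    # Move $start right past leading blanks, and $end left past trailing ones.
    $start++ while $start <= $end && $is_blank{ substr($str, $start, 1) };
    $end--   while $end >= $start && $is_blank{ substr($str, $end, 1) };

    return substr($str, $start, $end - $start + 1);
}

print '|', trim_chars("\n\nXYZ\n\n"), "|\n";     # default blank set
print '|', trim_chars("\ta b\t ", ' '), "|\n";   # spaces only, tabs are kept
```

Even in this form it is noticeably longer than the two-substitution regex, which is the point being made above.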
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by LanX (Saint) on Aug 14, 2020 at 03:28 UTC
|
|
So if you want the exact same semantic, it'll become far more complicated than this regex.
I agree. That's a good point. Thank you!
In other words, it is not easy to design a good trim function without a regex, but it is possible to design one that uses a regex while treating a single-quoted string as a special case.
For example
trim(' ',$line)
vs
trim(/\s/,$line)
BTW, this is impossible in Python, which implements regexes via a library, unless you add a new lexical type to the language (a regex string instead of the raw string that is currently used).
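A rough sketch of how such a two-mode interface might be approximated in today's Perl. The argument order follows the example above, but everything else is an assumption: the regex has to be passed as a compiled qr// object, because a bare /\s/ in an argument list would be executed as a match against $_ rather than passed along, and the plain-string branch treats its argument as a set of single characters to strip.

```perl
use strict;
use warnings;

# Hypothetical interface: trim($what, $line).
#   $what is either a qr// object matching one "blank",
#   or a plain string whose characters are the blanks to strip.
sub trim {
    my ($what, $line) = @_;

    if (ref($what) eq 'Regexp') {
        # Regex mode: strip repeated matches at each end.
        $line =~ s/\A(?:$what)+//;
        $line =~ s/(?:$what)+\z//;
        return $line;
    }

    # Plain-string mode: no regex involved, just index and substr.
    my ($start, $end) = (0, length($line) - 1);
    $start++ while $start <= $end && index($what, substr($line, $start, 1)) >= 0;
    $end--   while $end >= $start && index($what, substr($line, $end, 1)) >= 0;
    return substr($line, $start, $end - $start + 1);
}

print '|', trim(' ',     "  a b  "),     "|\n";   # |a b|
print '|', trim(qr/\s/,  " \n a \t\n"),  "|\n";   # |a|
```

The dispatch on ref() is what stands in for the "special case": the caller chooses the cheap path by passing a string and the general path by passing qr//.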
|
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by jwkrahn (Abbot) on Aug 14, 2020 at 03:58 UTC
|
s/^\s+//, s/\s+$// for $line;
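The one-liner relies on the statement-modifier form of for, which aliases $_ to each item in turn, so both substitutions edit the original variable in place. A short illustration (the sample data is made up):

```perl
use strict;
use warnings;

my $line = "  some text \n";
s/^\s+//, s/\s+$// for $line;    # $_ is an alias for $line here
print "|$line|\n";               # prints |some text|

# The same statement trims every element of an array in place.
my @lines = ("  a ", "b  ", "\tc");
s/^\s+//, s/\s+$// for @lines;
print join(',', @lines), "\n";   # prints a,b,c
```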
|
|
That benchmark shows s/^\s+|\s+$//g as always being slower than two regexes. It proposes a third option, to extract the inner portion with a match, which it shows as faster in the benchmark given. But that is a very limited benchmark. A more complete benchmark shows that doing two regexes is almost always the fastest.
use strict;
use warnings;
use Benchmark::Dumb qw(cmpthese);

# Name/value pairs: strings that need no trimming first, then padded ones.
my @strings = (
    no_trim_short        => 'asd',
    no_trim_mid          => 'asdasdasdasdasdasdasd',
    no_trim_long         => 'asd' x 500,
    no_trim_mid_with_ws  => 'asd asd asd asd asd asd asd',
    no_trim_long_with_ws => (join ' ', ('asd') x 500),
    short                => ' asd ',
    mid                  => ' asdasdasdasdasdasdasd ',
    long                 => ' '.('asd' x 500).' ',
    mid_with_ws          => ' asd asd asd asd asd asd asd ',
    long_with_ws         => ' '.(join ' ', ('asd') x 500).' ',
);

while (my ($name, $string) = splice @strings, 0, 2) {
    print "$name:\n";
    cmpthese(0.0005, {
        # One substitution with alternation and /g.
        global => sub {
            my $s = $string;
            $s =~ s/\A\s+|\s+\z//g;
        },
        # Two anchored substitutions.
        startend => sub {
            my $s = $string;
            s/\A\s+//, s/\s+\z// for $s;
        },
        # Capture the inner portion with a single match.
        match => sub {
            my $s = $string;
            ($s) = $s =~ /\A\s*(.*?)\s*\z/s;
        },
    });
    print "\n";
}
__END__
no_trim_short:
Rate match global startend
match 1.42373e+06+-420/s -- -29.2% -52.9%
global 2.01075e+06+-840/s 41.2% -- -33.4%
startend 3.02e+06+-1800/s 112.12+-0.14% 50.2+-0.11% --
no_trim_mid:
Rate global match startend
global 519190+-150/s -- -12.3% -81.1%
match 591890+-290/s 14.0% -- -78.5%
startend 2.7543e+06+-1400/s 430.5% 365.3% --
no_trim_long:
Rate global match startend
global 8420.17+-0.00081/s -- -31.7% -97.8%
match 12324.1+-4/s 46.4% -- -96.8%
startend 384590+-0.95/s 4467.5% 3020.6% --
no_trim_mid_with_ws:
Rate global match startend
global 388912+-98/s -- -19.5% -70.2%
match 482948+-4/s 24.2% -- -63.0%
startend 1.30366e+06+-19/s 235.2% 169.9% --
no_trim_long_with_ws:
Rate global match startend
global 5750.5+-2.8/s -- -33.6% -81.3%
match 8663.4+-3.4/s 50.7% -- -71.9%
startend 30807.4+-0.011/s 435.7% 255.6% --
short:
Rate global startend match
global 968450+-390/s -- -12.8% -32.0%
startend 1.11124e+06+-460/s 14.7% -- -22.0%
match 1.42383e+06+-490/s 47.0% 28.1% --
mid:
Rate global match startend
global 387160+-190/s -- -35.3% -56.4%
match 598710+-260/s 54.64+-0.1% -- -32.5%
startend 887420+-380/s 129.21+-0.15% 48.2% --
long:
Rate global match startend
global 8410.31+-0.0012/s -- -31.8% -97.2%
match 12323.2+-4/s 46.5% -- -95.9%
startend 298990+-140/s 3455.0% 2326.2% --
mid_with_ws:
Rate global match startend
global 303500+-130/s -- -37.2% -48.5%
match 482925+-4/s 59.1% -- -18.0%
startend 589220+-300/s 94.14+-0.13% 22.0% --
long_with_ws:
Rate global match startend
global 5691.4+-2.6/s -- -34.4% -81.1%
match 8672.9+-4/s 52.4% -- -71.1%
startend 30035.1+-0/s 427.7% 246.3% --
|
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
by perlfan (Parson) on Aug 14, 2020 at 12:23 UTC
|
>$line =~ s/^\s+|\s+$//g which clearly is an abuse of regex.
Why do you say that?
>trim function which BTW was present in Perl 6
You say this like it's a good thing. I bet there is also one in PHP.
|
You won
«The Crux of the Biscuit is the Apostrophe»
perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help
|
easily re-implemented in Perl. It seems ...
DB<33> sub trim { $_[1] //= qr/\s/; $_[0] =~ s/^[$_[1]]+|[$_[1]]+$//g }
DB<34> $a = $b = " \n . aaa . \n "
DB<35> trim $a
DB<36> trim $b, " "
DB<37> x $a,$b
0 '. aaa .'
1 '
. aaa .
'
DB<38>
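A slightly more defensive variation on the sub above, offered as a sketch and not as a correction of the session. It assumes the second argument is a plain string of literal characters to strip. As far as I can tell, interpolating a qr// object inside square brackets also adds the characters of its (?^:...) wrapper to the class, so this version keeps the default as the string '\s' instead. The name trim_class is made up to keep it distinct from the original.

```perl
use strict;
use warnings;

# Trim in place. The optional second argument is a string of literal
# characters to strip; the default is any whitespace.
sub trim_class {
    my $class = defined($_[1]) ? quotemeta($_[1]) : '\s';
    $_[0] =~ s/\A[$class]+|[$class]+\z//g;   # $_[0] aliases the caller's variable
    return;
}

my ($x, $y) = (" \n . aaa . \n ") x 2;
trim_class($x);          # strips spaces and newlines
trim_class($y, ' ');     # strips spaces only, newlines stay
print "[$x]\n";
print "[$y]\n";
```

Because it modifies its first argument through @_, it must be called with a variable, not a string literal.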