How to trim a line from leading and trailing blanks without using regex or non-standard modules

likbez has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules by hippo (Archbishop) on Aug 14, 2020 at 06:46 UTC
> without resorting to regular expressions (which are definitely an overkill for this particular purpose)?

Sure, just write your own function to do it. Having written that you will then come to the conclusion that regular expressions are definitely not an overkill for this particular purpose.

> This is clearly an important special case. ... which clearly is an abuse of regex.

You keep using that word. I don't think it means what you think it means. 🦛
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules by kcott (Archbishop) on Aug 14, 2020 at 09:35 UTC
G'day likbez,

I will usually reach for one of Perl's string handling functions (e.g. index, rindex, substr, and so on) in preference to a regex when that is appropriate; however, in this case, I would say that the regex makes for much cleaner code.

You could implement a `trim()` function using the guts of this code (which uses neither a regex nor any modules, standard or otherwise):

```
$ perl -E '
my @x = (" a b c ", "d e f ", " g h i", "j k l", " ", "");
say "*** Initial strings ***";
say "|$_|" for @x;
for my $i (0 .. $#x) {
    my $str = $x[$i];
    while (0 == index $str, " ") {
        $str = substr $str, 1;
    }
    my $str_end = length($str) - 1;
    while ($str_end == rindex $str, " ") {
        $str = substr $str, 0, $str_end;
        --$str_end;
    }
    $x[$i] = $str;
}
say "*** Final strings ***";
say "|$_|" for @x;
'
*** Initial strings ***
| a b c |
|d e f |
| g h i|
|j k l|
| |
||
*** Final strings ***
|a b c|
|d e f|
|g h i|
|j k l|
||
||
```

If your question was genuinely serious, please Benchmark a `trim()` function using something like I've provided against another `trim()` function using a regex. You could obviously do the same for `ltrim()` and `rtrim()` functions.

[As others have either asked or alluded to, please explain phrases such as "definitely an overkill", "important special case" and "abuse of regex". Unfortunately, use of such language makes your post come across as some sort of trollish rant — I'm not saying that was your intent, just how it presents itself.]

— Ken
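Taking up Ken's suggestion, here is a minimal sketch (not Ken's code) of how the loop above might be packaged as a reusable function and timed against a regex version with the core `Benchmark` module. The sub names `trim_noregex` and `trim_regex` are hypothetical. Rather than rebuilding the string once per removed blank, `trim_noregex` finds the first and last non-space positions and calls `substr` once; like the original, it treats only the space character `" "` as a blank.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical non-regex trim: strips only " " (space) from both ends.
sub trim_noregex {
    my ($str) = @_;
    my $len   = length $str;
    my $start = 0;
    # Advance past leading spaces.
    $start++ while $start < $len && substr($str, $start, 1) eq ' ';
    my $end = $len - 1;
    # Back up past trailing spaces, never crossing $start.
    $end-- while $end >= $start && substr($str, $end, 1) eq ' ';
    return substr $str, $start, $end - $start + 1;
}

# Hypothetical regex trim with the same semantics (spaces only).
sub trim_regex {
    my ($str) = @_;
    $str =~ s/\A +//;
    $str =~ s/ +\z//;
    return $str;
}

my @samples = (" a b c ", "d e f ", " g h i", "j k l", " ", "");

# Sanity check: both versions must agree before timing them.
for my $s (@samples) {
    die "mismatch on '$s'\n" unless trim_noregex($s) eq trim_regex($s);
}

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    noregex => sub { trim_noregex($_) for @samples },
    regex   => sub { trim_regex($_)   for @samples },
});
```

Benchmark results depend on the Perl version and on the inputs, so treat any numbers from this as indicative only.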
Re^2: How to trim a line from leading and trailing blanks without using regex or non-standard modules by LanX (Saint) on Aug 14, 2020 at 11:22 UTC
Hi Ken,

I suppose your solution works only for the blank `" "` and not for other whitespace characters like `"\n"`.

So it's not exactly the same as with `\s`°:

```
DB<11> $a="x \n \n \n "

DB<12> $a =~ s/\s+$//

DB<13> x $a
0  'x'

DB<14>
```

The OP should be clearer about the semantics he wants.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

see also Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules
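To make Rolf's point concrete, this small sketch asks Perl which ASCII characters `\s` matches. On Perl 5.18 or later it should report tab, newline, vertical tab, form feed, carriage return and space (earlier perls leave out the vertical tab). A non-regex `trim()` aiming for `\s` semantics would have to check for each of them, and for Unicode strings `\s` matches further characters as well.

```perl
use strict;
use warnings;

# Code points 0..127 whose single character is matched by \s.
my @ws = grep { chr($_) =~ /\A\s\z/ } 0 .. 127;

# Show them in hex, one per line.
printf "0x%02X\n", $_ for @ws;
```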
Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules by kcott (Archbishop) on Aug 15, 2020 at 11:02 UTC
G'day Rolf,

That's a valid point. My main intent with that code was really to show the complexity of the solution when a regex or module was not used.

Anyway, adding a little more complexity, you can trim whatever blanks you want:

```
$ perl -E '
my @blanks = (" ", "\n", "\r", "\t");
my @x = (
    " a b c ", "d e f \r ", " \t g h i", "j k l", " ", "\n", "\n\nXYZ\n\n", ""
);
say "*** Initial strings ***";
say "|$_|" for @x;
for my $i (0 .. $#x) {
    my $str = $x[$i];
    while (grep { 0 == index $str, $_ } @blanks) {
        $str = substr $str, 1;
    }
    my $str_end = length($str) - 1;
    while (grep { $str_end == rindex $str, $_ } @blanks) {
        $str = substr $str, 0, $str_end;
        --$str_end;
    }
    $x[$i] = $str;
}
say "*** Final strings ***";
say "|$_|" for @x;
'
*** Initial strings ***
| a b c |
 | e f
|        g h i|
|j k l|
| |
|
|
|

XYZ

|
||
*** Final strings ***
|a b c|
|d e f|
|g h i|
|j k l|
||
||
|XYZ|
||
```

You're quite correct about "The OP should be clearer ...". The word 'blank' is often used to mean various things: a single space, multiple consecutive spaces, a whitespace character, multiple consecutive whitespace characters, and I have also seen it used to refer to a zero-length string. Similarly, the word 'space' can mean a single space, any gap between visible characters, and so on. So, as with many posts, we're left with guessing the most likely meaning from the context.

My belief, that a regex is a better option, strengthens as the complexity of the non-regex and non-module code increases. :-)

— Ken
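Ken's `@blanks` loop can also be written as a function that scans once from each end, which avoids the repeated `grep`/`index` calls and string copies. This is a hedged sketch; `trim_chars` is a hypothetical name, and because it compares single characters, each element of `@blanks` is assumed to be exactly one character long.

```perl
use strict;
use warnings;

# Hypothetical non-regex trim: removes any character listed in @blanks
# from both ends of $str, using a hash for constant-time membership tests.
sub trim_chars {
    my ($str, @blanks) = @_;
    my %is_blank = map { $_ => 1 } @blanks;
    my ($start, $end) = (0, length($str) - 1);
    # Move $start right and $end left past any blank characters.
    $start++ while $start <= $end && $is_blank{ substr($str, $start, 1) };
    $end--   while $end >= $start && $is_blank{ substr($str, $end, 1) };
    # One substr call extracts the surviving middle part.
    return substr $str, $start, $end - $start + 1;
}

my @blanks = (" ", "\n", "\r", "\t");
for my $s (" a b c ", "d e f \r ", " \t g h i", "\n\nXYZ\n\n", "") {
    print "|", trim_chars($s, @blanks), "|\n";
}
```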
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules by LanX (Saint) on Aug 14, 2020 at 03:28 UTC
> which clearly is an abuse of regex.

Why is it an abuse of regex?

The problem is that `\s` is a metacharacter for any whitespace, not only the blank `" "`, but it is only usable inside a regex.°

So if you want the exact same semantic, it'll become far more complicated than this regex.

Better to define your own `trim()` with a regex inside.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) compare Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules
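A minimal sketch of what Rolf suggests, assuming the usual `\s` semantics are wanted. `ltrim`, `rtrim` and `trim` here are hypothetical helpers, not core Perl functions; the two anchored substitutions follow the same two-regex approach discussed elsewhere in this thread.

```perl
use strict;
use warnings;

# \A anchors to the start of the string; \z anchors to the very end,
# whereas $ may also match just before a trailing newline.
sub ltrim { my $s = shift; $s =~ s/\A\s+//; return $s }
sub rtrim { my $s = shift; $s =~ s/\s+\z//; return $s }
sub trim  { return ltrim(rtrim(shift)) }

print "[", trim("  \t hello world \n"), "]\n";
```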
Re^2: How to trim a line from leading and trailing blanks without using regex or non-standard modules by likbez (Sexton) on Aug 14, 2020 at 19:39 UTC
> So if you want the exact same semantic, it'll become far more complicated than this regex.

I agree. That's a good point. Thank you!

In other words, it is not easy to design a good trim function without a regex, but it is possible to design one that uses a regex while treating a single-quoted string as a special case. For example `trim(' ',$line)` vs `trim(/\s/.$line)`.

BTW, this is impossible in Python, which implements regex via a library, unless you add a new lexical type to the language (a regex string instead of the raw string that is used).
Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules by LanX (Saint) on Aug 15, 2020 at 01:04 UTC
> `trim(/\s/.$line)`

I doubt this is valid syntax. You probably mean `trim( qr/\s/, $line)`.

See Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules for a slightly better implementation.

> this is impossible in Python

Passing a regex inside a string is fine in Perl, why not in Python?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
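likbez's idea, with Rolf's `qr//` correction, could be sketched roughly as follows. This `trim($spec, $line)` is hypothetical: when `$spec` is a compiled `qr//` object it strips leading and trailing runs of whatever that pattern matches; otherwise it treats `$spec` as a literal set of characters, passed through `quotemeta` so that characters such as `]` or `-` cannot break the character class.

```perl
use strict;
use warnings;

# Hypothetical trim($spec, $line):
#   qr// object  -> strip leading/trailing runs of whatever it matches
#   plain string -> strip leading/trailing characters found in that string
sub trim {
    my ($spec, $line) = @_;
    # An empty character set would produce an invalid [] class; nothing to strip.
    return $line if !ref($spec) && !length $spec;
    my $re = ref($spec) eq 'Regexp'
        ? $spec
        : do { my $set = quotemeta $spec; qr/[$set]/ };
    $line =~ s/\A(?:$re)+//;
    $line =~ s/(?:$re)+\z//;
    return $line;
}

my $line = " \t padded text \n";
print '[', trim(qr/\s/, $line), "]\n";   # strips any whitespace
print '[', trim(' ', $line), "]\n";      # strips spaces only
```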
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules by jwkrahn (Abbot) on Aug 14, 2020 at 03:58 UTC
(IMHO) the most common solution is:

```
s/^\s+//, s/\s+$// for $line;
```
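The `for` in that idiom aliases `$_` to `$line`, so both substitutions edit it in place. Assuming you have several lines to clean, the same statement modifier works on a whole array, as this small sketch shows:

```perl
use strict;
use warnings;

my @lines = ("  alpha  \n", "\tbeta", "gamma   ", "   ");

# $_ is aliased to each array element in turn, so both s/// operations
# edit @lines in place; the comma operator simply runs them in sequence.
s/^\s+//, s/\s+$// for @lines;

print "[$_]\n" for @lines;
```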
Re^2: How to trim a line from leading and trailing blanks without using regex or non-standard modules by Marshall (Canon) on Aug 14, 2020 at 04:33 UTC
That used to be the "standard way" to do this and was recommended in the Perl docs, and I thought it was just fine. `s/^\s+|\s+$//g` has been benchmarked, and I now think this is faster and "better" than 2 statements. There is one post at Re^3: script optmization that shows some benchmarks.

This is certainly not an "abuse" of regex. This is what regex is for! The Perl regex engine continually becomes better and usually faster between releases.
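One detail worth checking in the single-regex form is the `/g` flag: without it, `s///` stops after the first match, which for a string padded on both sides is the leading run. A quick sketch:

```perl
use strict;
use warnings;

my $padded = "  both sides  ";

# Without /g only the first match (here the leading run) is removed.
(my $once = $padded) =~ s/^\s+|\s+$//;

# With /g the engine keeps going and also removes the trailing run.
(my $all = $padded) =~ s/^\s+|\s+$//g;

print "[$once]\n[$all]\n";
```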
Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules by Haarg (Priest) on Aug 17, 2020 at 07:21 UTC
That benchmark shows `s/^\s+|\s+$//g` as always being slower than two regexes. It proposes a third option, to extract the inner portion with a match, which it shows as faster in the benchmark given. But that is a very limited benchmark. A more complete benchmark shows that doing two regexes is almost always the fastest:

```perl
use strict;
use warnings;
use Benchmark::Dumb qw(cmpthese);

my @strings = (
    no_trim_short        => 'asd',
    no_trim_mid          => 'asdasdasdasdasdasdasd',
    no_trim_long         => 'asd' x 500,
    no_trim_mid_with_ws  => 'asd asd asd asd asd asd asd',
    no_trim_long_with_ws => (join ' ', ('asd') x 500),
    short                => ' asd ',
    mid                  => ' asdasdasdasdasdasdasd ',
    long                 => ' '.('asd' x 500).' ',
    mid_with_ws          => ' asd asd asd asd asd asd asd ',
    long_with_ws         => ' '.(join ' ', ('asd') x 500).' ',
);

while (my ($name, $string) = splice @strings, 0, 2) {
    print "$name:\n";
    cmpthese(0.0005, {
        global => sub {
            my $s = $string;
            $s =~ s/\A\s+|\s+\z//g;
        },
        startend => sub {
            my $s = $string;
            s/\A\s+//, s/\s+\z// for $s;
        },
        match => sub {
            my $s = $string;
            ($s) = $s =~ /\A\s*(.*?)\s*\z/s;
        },
    });
    print "\n";
}

__END__
no_trim_short:
                           Rate          match      global startend
match       1.42373e+06+-420/s             --      -29.2%   -52.9%
global      2.01075e+06+-840/s          41.2%          --   -33.4%
startend      3.02e+06+-1800/s  112.12+-0.14% 50.2+-0.11%       --

no_trim_mid:
                           Rate   global    match startend
global           519190+-150/s       --   -12.3%   -81.1%
match            591890+-290/s    14.0%       --   -78.5%
startend     2.7543e+06+-1400/s  430.5%   365.3%       --

no_trim_long:
                           Rate   global    match startend
global       8420.17+-0.00081/s      --   -31.7%   -97.8%
match              12324.1+-4/s   46.4%       --   -96.8%
startend         384590+-0.95/s 4467.5%  3020.6%       --

no_trim_mid_with_ws:
                           Rate   global    match startend
global             388912+-98/s      --   -19.5%   -70.2%
match               482948+-4/s   24.2%       --   -63.0%
startend     1.30366e+06+-19/s   235.2%   169.9%       --

no_trim_long_with_ws:
                           Rate   global    match startend
global            5750.5+-2.8/s      --   -33.6%   -81.3%
match             8663.4+-3.4/s   50.7%       --   -71.9%
startend       30807.4+-0.011/s  435.7%   255.6%       --

short:
                           Rate   global startend    match
global            968450+-390/s      --    -12.8%   -32.0%
startend     1.11124e+06+-460/s   14.7%        --   -22.0%
match        1.42383e+06+-490/s   47.0%     28.1%       --

mid:
                           Rate         global    match startend
global            387160+-190/s             --   -35.3%   -56.4%
match             598710+-260/s    54.64+-0.1%       --   -32.5%
startend          887420+-380/s  129.21+-0.15%    48.2%       --

long:
                           Rate   global    match startend
global        8410.31+-0.0012/s      --   -31.8%   -97.2%
match              12323.2+-4/s   46.5%       --   -95.9%
startend          298990+-140/s 3455.0%  2326.2%       --

mid_with_ws:
                           Rate         global    match startend
global            303500+-130/s             --   -37.2%   -48.5%
match               482925+-4/s          59.1%       --   -18.0%
startend          589220+-300/s   94.14+-0.13%    22.0%       --

long_with_ws:
                           Rate   global    match startend
global            5691.4+-2.6/s      --   -34.4%   -81.1%
match               8672.9+-4/s   52.4%       --   -71.1%
startend         30035.1+-0/s    427.7%   246.3%       --
```
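A timing comparison is only meaningful if the variants return the same result. As a sanity check that needs only core Perl (no `Benchmark::Dumb`), this sketch wraps the three strategies from the benchmark as functions and compares them on a few of its inputs plus some extra edge cases (empty, all-whitespace, embedded newlines). Those extra strings are illustrative additions, not part of the original benchmark.

```perl
use strict;
use warnings;

# The three strategies from the benchmark, wrapped as functions.
my %trim = (
    global   => sub { my $s = shift; $s =~ s/\A\s+|\s+\z//g; $s },
    startend => sub { my $s = shift; s/\A\s+//, s/\s+\z// for $s; $s },
    match    => sub { my $s = shift; ($s) = $s =~ /\A\s*(.*?)\s*\z/s; $s },
);

my @inputs = (
    'asd', ' asd ', ' asd asd asd ', ' ' . ('asd' x 50) . ' ',  # like the benchmark
    '', '   ', "\n\tx\n", " a\nb ",                             # extra edge cases
);

for my $in (@inputs) {
    my ($g, $s, $m) = map { $trim{$_}->($in) } qw(global startend match);
    die "variants disagree on '$in'\n" unless $g eq $s && $s eq $m;
}
print "all variants agree\n";
```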
Re^4: How to trim a line from leading and trailing blanks without using regex or non-standard modules by Marshall (Canon) on Aug 28, 2020 at 02:33 UTC
Re: How to trim a line from leading and trailing blanks without using regex or non-standard modules by perlfan (Parson) on Aug 14, 2020 at 12:23 UTC
> `$line =~ s/^\s+|\s+$//g` which clearly is an abuse of regex.

Why do you say that?

> trim function which BTW was present in Perl 6

You say this like it's a good thing. I bet there is also one in PHP.
Re^2: How to trim a line from leading and trailing blanks without using regex or non-standard modules by karlgoethebier (Abbot) on Aug 14, 2020 at 12:34 UTC
You won

«The Crux of the Biscuit is the Apostrophe»

`perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re^3: How to trim a line from leading and trailing blanks without using regex or non-standard modules by LanX (Saint) on Aug 14, 2020 at 14:43 UTC
easily re-implemented in Perl. It seems ...

```
DB<33> sub trim { $_[1] //= '\s'; $_[0] =~ s/^[$_[1]]+|[$_[1]]+$//g }

DB<34> $a = $b = " \n . aaa . \n "

DB<35> trim $a

DB<36> trim $b, " "

DB<37> x $a,$b
0  '. aaa .'
1  ' . aaa . '

DB<38>
```

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^4: How to trim a line from leading and trailing blanks without using regex or non-standard modules by marto (Cardinal) on Aug 14, 2020 at 15:13 UTC
Re^5: How to trim a line from leading and trailing blanks without using regex or non-standard modules by LanX (Saint) on Aug 14, 2020 at 16:06 UTC
Re^4: How to trim a line from leading and trailing blanks without using regex or non-standard modules by karlgoethebier (Abbot) on Aug 14, 2020 at 17:12 UTC