thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks,

I am trying to split a string every two characters, then trim leading / trailing white space from the result, and finally join all the pieces together with white space in between. Of course this is easy in two steps. Is it possible to do it in one step?

Sample of code:

#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use feature 'say';

use Encode qw(decode encode);
use String::HexConvert ':all';

binmode( STDOUT, ':utf8' );

my $Chinese = '北亰'; # Chinese characters for Bei Jing (U+5317 U+4EB0)

say 'UTF-8';
my $utf8 = encode( 'UTF-8', $Chinese );
my $ascii2hexUTF8 = ascii_to_hex($utf8);
$ascii2hexUTF8 = join(' ', split(/(..)/, $ascii2hexUTF8));
say $ascii2hexUTF8;
$ascii2hexUTF8 =~ s/^\s+|\s+$//g;
say $ascii2hexUTF8;

__END__

$ perl test.pl
UTF-8
 e5  8c  97  e4  ba  b0
e5  8c  97  e4  ba  b0

Thank you for your time and effort.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re: How to split, join and trim leading / leading white space
by kcott (Archbishop) on Sep 06, 2017 at 04:47 UTC

    G'day thanos1983,

    "Is it possible to be done in one step?"

    If you're using Perl 5.14, or later, you can chain those operations using the 'r' modifier. See "perl5140delta: Non-destructive substitution".
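    As a minimal sketch of that chaining (the helper name trimmed_upper is made up for illustration, and Perl 5.14 or later is assumed), each /r operation returns a modified copy, which the next =~ then binds to:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only; needs Perl 5.14+ for /r.
sub trimmed_upper {
    my ($text) = @_;
    # =~ is left-associative, so each /r result feeds the next operation.
    # $text itself is never modified.
    return $text =~ s/^\s+//r =~ s/\s+$//r =~ tr/a-z/A-Z/r;
}

my $original = "  e58c97  ";
my $result   = trimmed_upper($original);
print "[$original] -> [$result]\n";    # [  e58c97  ] -> [E58C97]
```

    Because nothing is modified in place, the whole chain can sit inside a larger expression such as a say or join call.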

    It's somewhat unclear what you're actually trying to achieve here. The use of Chinese characters seems superfluous to the actual question asked. The use of the 'g' modifier on the substitution, together with the '^' and '$' assertions, makes me wonder if you're perhaps dealing with multiline strings; however, the absence of the 'm' modifier suggests otherwise.

    Here's some guesses as to the type of thing you might want:

    $ perl -Mutf8 -C -E 'say join " ", split //, "北亰"'
    北 亰
    
    $ perl -Mutf8 -C -E 'say join " ", split //, " 北亰 " =~ s/^\s+|\s+$//r'
    北 亰
    
    $ perl -E 'say join(" ", split /(..)/, "e58c97e4bab0")'
     e5  8c  97  e4  ba  b0
    $ perl -E 'say join(" ", split /(..)/, "e58c97e4bab0") =~ s/^\s+|\s+$//r'
    e5  8c  97  e4  ba  b0

    If you're simply unfamiliar with what's going on with split, that's explained at the end of that documentation: "If the PATTERN contains capturing groups, ...".

    $ perl -E 'my @x = split /(..)/, "1234"; say "|$_|" for @x'
    ||
    |12|
    ||
    |34|
    $ perl -E 'my $x = join "_", split /(..)/, "1234"; say $x'
    _12__34
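    One way to sidestep those empty fields, sketched here with a made-up helper name (pairs_via_split), is to filter them out with grep before joining. This is only an illustration of the split behaviour described above, not necessarily the best approach:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub pairs_via_split {
    my ($text) = @_;
    # With a capturing group, split returns the captured pairs as well as
    # the empty fields between them; grep { length } drops the empties.
    return grep { length } split /(..)/, $text;
}

my @pairs = pairs_via_split("e58c97e4bab0");
print join(" ", @pairs), "\n";    # e5 8c 97 e4 ba b0
```

    With an odd-length input, the final single character survives as its own field, so "12345" yields "12", "34" and "5".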

    Update (additional information): As an additional example, to extend that last chaining example, you could do this to reduce multiple embedded spaces to a single space:

    $ perl -E 'say join(" ", split /(..)/, "e58c97e4bab0") =~ s/^\s+|\s+$//r =~ y/ / /rs'
    e5 8c 97 e4 ba b0

    See "perlop: y/SEARCHLIST/REPLACEMENTLIST/cdsr" for more about that.
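    To show what the 's' (squeeze) modifier does in isolation, here is a small sketch; squeeze_spaces is a hypothetical name, and Perl 5.14 or later is assumed for the 'r' modifier:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only; needs Perl 5.14+ for /r.
sub squeeze_spaces {
    my ($text) = @_;
    # Translating space to space changes nothing by itself; the /s modifier
    # then squeezes each run of spaces down to one, and /r returns a copy.
    return $text =~ tr/ / /sr;
}

print squeeze_spaces("e5  8c   97"), "\n";    # e5 8c 97
```

    Note that squeezing is not trimming: a run of leading or trailing spaces is reduced to a single space, not removed, which is why the substitution is still needed in the chained example.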

    Update (further discussion): See my subsequent response (below) for further discussion and "some clarifications and corrections".

    — Ken

      Hello kcott,

      That is perfect, thanks a lot for your time and effort. :)

      Seeking for Perl wisdom...on the process of learning...not there...yet!
        "That is perfect, ..."

        Well, not quite! :-)

        I saw your meditation after I read, and responded to, your OP in this thread. I now see where the Chinese characters come from; although, I still think they're superfluous in the context of this specific question.

        The focus of my answer was the 'r' modifier (in response to your "possible ... in one step?"). I probably should have paid more attention to your regex (/^\s+|\s+$/), rather than just copying it verbatim. With the Chinese issue out of the way, and having spent some time looking more closely at what I wrote, here's some clarifications and corrections.

        The substitution example with the Chinese characters should have included a 'g' modifier. I'm now reasonably certain that wasn't what you wanted; however, it should have been written like this:

        $ perl -Mutf8 -C -E 'say join " ", split //, " 北亰 " =~ s/^\s+|\s+$//gr'
        北 亰
        

        I was correct in not using the 'g' modifier in the other two substitution examples; however, I should have also removed the alternation. As the two examples splitting "1234" clearly demonstrate, there's no trailing whitespace: you only need to remove the leading whitespace. For those examples, these would have been better:

        $ perl -E 'say join(" ", split /(..)/, "e58c97e4bab0") =~ s/^\s+//r'
        e5  8c  97  e4  ba  b0
        $ perl -E 'say join(" ", split /(..)/, "e58c97e4bab0") =~ s/^\s+//r =~ y/ / /rs'
        e5 8c 97 e4 ba b0

        Now, hopefully, it's "perfect". :-)

        — Ken

Re: How to split, join and trim leading / leading white space
by Your Mother (Archbishop) on Sep 06, 2017 at 00:41 UTC

    Your posts on the topic make it so much harder and weirder than it should be–

    moo@cow~>perl -CSD -le '$_ = " \N{U+5317}\N{U+4EB0}  "; print "<$_>"; s/\A\s+|\s+\z//g; print "<$_>"; print join " ", split //;'
    < 北亰  >
    <北亰>
    北 亰
    

      On Perl v5.22 and higher, splitting on \b{gcb} (extended grapheme cluster boundary) might be better:

      $ perl -CSD -le 'print map "-$_- ", split //, "u\x{0308}ber"'
      -u- -̈- -b- -e- -r- 
      $ perl -CSD -le 'print map "-$_- ", split /\b{gcb}/, "u\x{0308}ber"'
      -ü- -b- -e- -r- 
      $ perl -CSD -le 'print map "-$_- ", split //,
          "k\x{0301}u\x{032D}o\x{0304}\x{0301}n"'
      -k- -́- -u- -̭- -o- -̄- -́- -n- 
      $ perl -CSD -le 'print map "-$_- ", split /\b{gcb}/,
          "k\x{0301}u\x{032D}o\x{0304}\x{0301}n"'
      -ḱ- -ṷ- -ṓ- -n- 
      

      (If the 2nd and 4th outputs above aren't displaying correctly, like in my browser, they should be "-ü- -b- -e- -r-" and "-ḱ- -ṷ- -ṓ- -n-".)

      As an alternative in Perl v5.12 and above, \X can be used. Update 2: E.g. split /\X\K(?=\X)/, ...
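      A simpler way to use \X than the split shown above, offered as a sketch only, is a global match in list context, which returns one element per extended grapheme cluster. The helper name grapheme_clusters is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub grapheme_clusters {
    my ($text) = @_;
    # In list context, /\X/g returns every extended grapheme cluster,
    # keeping each base character together with its combining marks.
    return $text =~ /\X/g;
}

my @clusters = grapheme_clusters("u\x{0308}ber");
print scalar(@clusters), "\n";    # 4
```

      Here the first element is the two-code-point sequence "u\x{0308}", so joining @clusters with spaces keeps the diaeresis attached to its letter.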

      Update: Made last sentence more clear.

      Hello Your Mother,

      Thanks a lot for the time and effort. I was under the impression that the trim should be done after the split.

      It looks like I was wrong :). Thanks again, BR.

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: How to split, join and trim leading / leading white space
by Anonymous Monk on Sep 06, 2017 at 01:25 UTC
    Next you'll be asking why there are two spaces between bytes instead of one. Here, maybe this will help.
    use Data::Dump;
    my $banana = 'banana';
    my @a = split /(..)/, $banana;
    dd(\@a);
    my @b = $banana =~ /(..)/g;
    dd(\@b);
    __END__
    ["", "ba", "", "na", "", "na"]
    ["ba", "na", "na"]
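    Applied to the original question, the list-context match gives a one-step version with no trimming needed. The sketch below uses made-up helper names (pairs_via_match, utf8_hex_pairs). The second helper is a swapped-in alternative that was not part of the thread: it uses the core unpack function with an '(H2)*' template in place of String::HexConvert.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helpers, for illustration only.

# Two-character chunks via a global match in list context: no empty
# fields are produced, so there is nothing to trim afterwards.
sub pairs_via_match {
    my ($text) = @_;
    return join ' ', $text =~ /(..)/g;
}

# Hex pairs straight from the encoded bytes: each H2 in the repeated
# group unpacks one byte as two lowercase hex digits.
sub utf8_hex_pairs {
    my ($chars) = @_;
    return join ' ', unpack( '(H2)*', encode( 'UTF-8', $chars ) );
}

print pairs_via_match('e58c97e4bab0'), "\n";       # e5 8c 97 e4 ba b0
print utf8_hex_pairs("\x{5317}\x{4EB0}"), "\n";    # e5 8c 97 e4 ba b0
```

    One caveat with the match version: a trailing unpaired character is silently dropped, since /(..)/ only matches complete pairs.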