clamport has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, long-time lurker, first-time poster. I've been working on a script and I have a function that _should_ allow me to split based on a separator character.

I am currently attempting to figure out how to get the split to work correctly if the separator is a special escape character. I've tried the following methods. My goal is to get the given character replaced with a new substring '<SEP_CHAR>' which is assigned to $SEP_CHAR.

$character = "/\Q$character\E/" if $character eq '+';
my $stringToProcess = join($SEP_CHAR, split($character, $string));
and
$character = "\\\$character" if $character eq '+';
my $stringToProcess = join($SEP_CHAR, split($character, $string));

Does anyone have any recommendations?

Replies are listed 'Best First'.
Re: Splitting string based on potentially escaped character
by kennethk (Abbot) on Mar 07, 2017 at 18:53 UTC
    First, I'll note that this has some XY Problem smell to it. As a long-time lurker, I'm sure you've read about that one. Why do you expect the new token to be less problematic in your process than the existing escape character? How does this factor into a broader parsing problem?

    To solve the asked question, split uses a regex to act on a string. Therefore, you should be feeding it a regular expression, not a static character. Assuming you've stored the character literal in $character, you should get your desired result from:

    my $stringToProcess = join($SEP_CHAR, split(/\Q$character\E/, $string));
    See quotemeta. If you explicitly want the escaping only in that particular scenario, you can get your desired result with
    $character = qr/\Q$character\E/ if $character eq '+';
    my $stringToProcess = join($SEP_CHAR, split($character, $string));
    where you store a regular expression as an object; see Regexp Quote-Like Operators in perlop.
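    To make the difference concrete, here is a minimal sketch of what the two attempts in the question actually put into $character, next to the qr// form. The helper name replace_separator is hypothetical and used only for illustration; the sample string is borrowed from elsewhere in this thread.

```perl
use strict;
use warnings;

my $character = '+';

# Attempt 1 from the question: this is a double-quoted string, not a regex
# literal, so the slashes become part of the pattern text.
my $attempt1 = "/\Q$character\E/";
print "$attempt1\n";    # prints /\+/  (would only match the text "/+/")

# Attempt 2 from the question: \\ is a backslash and \$ is a literal dollar
# sign, so the variable is never interpolated at all.
my $attempt2 = "\\\$character";
print "$attempt2\n";    # prints \$character

# Hypothetical helper: split on the literal separator, then re-join.
sub replace_separator {
    my ($string, $separator, $sep_char) = @_;
    my $pattern = qr/\Q$separator\E/;    # metacharacters quoted by \Q
    return join($sep_char, split($pattern, $string));
}

print replace_separator('foo+bar+baz', '+', '<SEP_CHAR>'), "\n";
# prints foo<SEP_CHAR>bar<SEP_CHAR>baz
```

    Neither of the first two strings matches a bare '+', which is why split returned the input unchanged.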

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Splitting string based on potentially escaped character
by AnomalousMonk (Archbishop) on Mar 07, 2017 at 19:02 UTC

    The split/join sequence suggests  s/// substitution:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "my $string = 'foo+bar+baz+boff'; my $separator = '+'; my $replace = '<REPLACE>'; ;; $string =~ s{ \Q$separator\E }{$replace}xmsg; dd $string; "
    "foo<REPLACE>bar<REPLACE>baz<REPLACE>boff"
    Note that this works regardless of the meta-nature of the separator character/string:
    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "my $string = 'fooXbarXbazXboff'; my $separator = 'X'; my $replace = '<REPLACE>'; ;; $string =~ s{ \Q$separator\E }{$replace}xmsg; dd $string; "
    "foo<REPLACE>bar<REPLACE>baz<REPLACE>boff"

    Update: See quotemeta; see perlre, perlretut, and perlrequick; see Regexp Quote-Like Operators in perlop for s///.
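    One practical difference between the two approaches, sketched below under the assumption that trailing separators can occur in the input: split without a limit drops trailing empty fields, so a split/join round trip loses a trailing separator, while s///g replaces every occurrence. The sub names are hypothetical and for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: replace via global substitution.
sub replace_by_substitution {
    my ($string, $separator, $replace) = @_;
    $string =~ s/\Q$separator\E/$replace/g;    # operates on the local copy
    return $string;
}

# Hypothetical helper: replace via split and join, as in the question.
sub replace_by_split_join {
    my ($string, $separator, $replace) = @_;
    return join($replace, split(/\Q$separator\E/, $string));
}

# Same as above, but a limit of -1 makes split keep trailing empty fields.
sub replace_by_split_join_keep_trailing {
    my ($string, $separator, $replace) = @_;
    return join($replace, split(/\Q$separator\E/, $string, -1));
}

print replace_by_substitution('a+b+', '+', '<R>'), "\n";              # a<R>b<R>
print replace_by_split_join('a+b+', '+', '<R>'), "\n";                # a<R>b
print replace_by_split_join_keep_trailing('a+b+', '+', '<R>'), "\n";  # a<R>b<R>
```

    If the input never ends with the separator, all three give the same result.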


    Give a man a fish:  <%-{-{-{-<

      It is unnecessary and extremely annoying for you to provide your path and to wrap the code the way you do. Consider:
      Invocation:
      perl -wMstrict -MData::Dump -le
      
      Code:
      my $string = 'fooXbarXbazXboff'; my $separator = 'X'; my $replace = '<REPLACE>'; $string =~ s{ \Q$separator\E }{$replace}xmsg; dd $string;
      Yields
      foo<REPLACE>bar<REPLACE>baz<REPLACE>boff
      
      So much more useful for the OP and no additional effort on your behalf.
Re: Splitting string based on potentially escaped character
by kcott (Archbishop) on Mar 08, 2017 at 06:56 UTC

    G'day clamport,

    Welcome to the Monastery.

    "Does anyone have any recommendations?"

    If $SEP_CHAR really is a character, then transliteration is possibly your best bet.

    $ perl -E 'say "a+b+c" =~ y/+/_/r'
    a_b_c

    If $SEP_CHAR isn't a character, I'd suggest giving it a more meaningful, and less confusing, name; perhaps $SEP_STR for a separator string. In this case, substitution (as already suggested) would be a better option.

    $ perl -E 'my ($x, $y, $z) = qw{a+b+c + __}; say $x =~ s/\Q$y/$z/gr'
    a__b__c

    If you're using '\Q' to escape characters to the end of the string, the '\E' is superfluous. You really only need this if you want to escape part of a string.

    $ perl -E 'my ($x, $y) = qw{++ --}; say for "\Q$x\E$y", "$x\Q$y", "\Q$x$y"'
    \+\+--
    ++\-\-
    \+\+\-\-
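    As a small sketch of the same point outside a one-liner: "\Q$var" with no \E gives the same string as calling quotemeta on the variable, and \E only matters when an unquoted part follows. The variable names and sample values here are illustrative assumptions, not from the original post.

```perl
use strict;
use warnings;

my $separator = '+.';

# \Q with no \E quotes everything up to the end of the string.
my $quoted_inline = "\Q$separator";

# The quotemeta function does the same job as a plain function call.
my $quoted_function = quotemeta($separator);

print "$quoted_inline\n";      # prints \+\.
print "$quoted_function\n";    # prints \+\.

# \E is needed only when part of the pattern should stay "live":
# here the separator is literal, but the trailing \d+ is a real regex.
my $input = 'id+.42';
if ($input =~ /\Q$separator\E(\d+)/) {
    print "number after separator: $1\n";    # prints number after separator: 42
}
```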

    — Ken

Re: Splitting string based on potentially escaped character
by 1nickt (Canon) on Mar 07, 2017 at 18:47 UTC

    You could use a compiled regexp:

    perl -E '
        my $x = "foo+bar+baz";
        my $y = "+";
        my $z = qr/\Q$y\E/;
        say for split $z, $x;
    '
    Output:
    foo
    bar
    baz
    ... but why not just use:
    say for split /\Q$y\E/, $x;
    in all cases? (There's probably a reason; I don't know it.)
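    One plausible reason to keep the qr// form, offered as a guess rather than anything stated in this thread: a compiled pattern can be built once, stored, and passed around like any other value, which is handy when the separator is configured in one place and used in another. A minimal sketch, with hypothetical names:

```perl
use strict;
use warnings;

# Build the pattern once from a configured separator (assumed to be '+').
my $separator = '+';
my $pattern   = qr/\Q$separator\E/;

# Hypothetical helper: takes the compiled pattern as an ordinary argument.
sub fields_for {
    my ($regex, $line) = @_;
    return [ split $regex, $line ];
}

for my $line ('foo+bar+baz', 'one+two') {
    print join(',', @{ fields_for($pattern, $line) }), "\n";
}
# prints foo,bar,baz
# prints one,two
```

    For a single split in one place, the inline /\Q$y\E/ form does the same job.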

    Hope this helps!


    The way forward always starts with a minimal test.

      That's great, thank you! I'm relatively new to Perl, so I must not have had the syntax correct when attempting to use \Q and \E. Much appreciated!