jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I would like to split a string. Sounds simple, but somehow I couldn't do it. The string can have two shapes:
$str  = "AAAAA_BBBB";   # or
$str1 = "AAAAA";

($part1, $part2) = ($str  =~ /^(\w+)_(\w+)$/);     # works fine
($part1, $part2) = ($str1 =~ /^(\w+)[_(\w+)]$/);   # NOPE
(Of course, 'AAAAA' and 'BBBB' just stand for arbitrary text.)
Any suggestions on how I can parse both cases with a single regexp?

Thanks
Luca

Re: simple regexp problem
by davidrw (Prior) on Dec 05, 2005 at 17:09 UTC
    What about just using split (see perldoc -f split)?
    ($part1, $part2) = split(/_/, $str);
    ($part1, $part2) = split(/_/, $str1);

    # or, possibly better depending on context:
    @parts = split(/_/, $str);
      I think you need to add a limit parameter to split here, so you don't lose additional underscore-delimited fields:

      D:\temp>perl -e "$x='a_b_c'; ($j, $k) = split /_/, $x; print $k"
      b
      D:\temp>perl -e "$x='a_b_c'; ($j, $k) = split /_/, $x, 2; print $k"
      b_c
        True, but that depends entirely on the OP's requirements: should "a_b_c" give part1=a and part2=b (with c lost), or part1=a and part2=b_c?
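Both readings, and both input shapes from the question, can be compared in one short script. This is a sketch reusing the sample strings from this thread; the comment about an implied LIMIT reflects perldoc -f split, which states that when assigning to a list with LIMIT omitted, Perl uses one more than the number of variables in the list.

```perl
use strict;
use warnings;

# The two input shapes from the question, plus a three-field string.
my $str  = "AAAAA_BBBB";
my $str1 = "AAAAA";
my $x    = "a_b_c";

# One or two fields: split handles both shapes without a regex.
my ($part1, $part2) = split /_/, $str;    # "AAAAA", "BBBB"
my ($only,  $none)  = split /_/, $str1;   # "AAAAA", undef

# No explicit LIMIT: assigning to two scalars implies LIMIT 3,
# so "c" is split off and then dropped.
my ($j1, $k1) = split /_/, $x;            # $k1 is "b"

# LIMIT 2: split stops after the first "_", the rest stays in $k2.
my ($j2, $k2) = split /_/, $x, 2;         # $k2 is "b_c"

# An array keeps every field, so nothing is lost silently.
my @all = split /_/, $x;                  # ("a", "b", "c")

print "str:  $part1 / $part2\n";
print "str1: $only / ", (defined $none ? $none : "undef"), "\n";
print "a_b_c without limit: $k1, with limit 2: $k2, array: @all\n";
```

Which variant is right depends on whether extra fields should be dropped, kept in the last part, or kept separately.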
Re: simple regexp problem
by choedebeck (Beadle) on Dec 05, 2005 at 17:08 UTC
    How about:
    my ($part1, $part2) = split /_/, $str;

      Sometimes you don't want to use split because it's too lenient! Your example will happily accept AAAA_BBBB_CCCC. The original poster probably wants that string rejected, but split will return AAAA and BBBB and silently discard CCCC.

        Which is exactly why the original poster should be capturing the output into an array instead of multiple sequentially numbered scalars.

        I think split is a very good and acceptable solution for this problem.

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
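Capturing into an array, as suggested above, turns the field count into something you can check. A minimal sketch: split_one_or_two is a hypothetical helper name, and rejecting anything other than one or two fields is an assumed requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fields, or dies if there are not 1 or 2.
sub split_one_or_two {
    my ($s) = @_;
    my @parts = split /_/, $s;
    die "expected 1 or 2 fields in '$s', got " . scalar(@parts) . "\n"
        unless @parts == 1 || @parts == 2;
    return @parts;
}

for my $s ("AAAA_BBBB", "AAAA", "AAAA_BBBB_CCCC") {
    my @parts = eval { split_one_or_two($s) };
    my $msg   = $@ ? "rejected: $@" : "ok: @parts\n";
    print $msg;
}
```

Note that split drops trailing empty fields, so "AAAA_" comes back as a single field and would be accepted; add an explicit check if that matters.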
        

        In that case, split still works:

        my ($part1, $part2) = split /_/, $str, 2;
        That extra parameter asks for at most two items: split stops at the first _, so any further underscores are left untouched in the second item.
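Applied to the three-part string from this subthread, LIMIT 2 keeps the extra field attached to part2 instead of discarding it, so the string is still accepted but nothing is lost. A short sketch; the follow-up check for a second underscore is an addition that assumes at most one underscore is allowed, not part of the original reply.

```perl
use strict;
use warnings;

my $str = "AAAA_BBBB_CCCC";

# LIMIT 2: split at the first underscore only; the rest stays in $part2.
my ($part1, $part2) = split /_/, $str, 2;
print "part1=$part1\npart2=$part2\n";    # part2 is "BBBB_CCCC"

# Assumed requirement: at most one underscore. Since the extra field is
# kept rather than dropped, it can still be detected and rejected.
my $too_many = defined $part2 && $part2 =~ /_/;
my $verdict  = $too_many ? "rejected: more than one underscore" : "accepted";
print "$verdict\n";
```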

Re: simple regexp problem
by ikegami (Patriarch) on Dec 05, 2005 at 17:05 UTC

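    Square brackets don't mean what you think they mean: they define a character class, so [_(\w+)] matches exactly one character that is _, (, ), + or a word character. Use (?:...) to group things without capturing, and use the ? suffix to make something optional.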

    /^(\w+)(?:_(\w+))?$/
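A quick check of both points: the character class in the original attempt matches a single character, and the grouped pattern is still greedy because \w also matches _. This sketch only illustrates the behaviour; the variable names are arbitrary.

```perl
use strict;
use warnings;

# [_(\w+)] is a character class: it matches exactly ONE character that
# is "_", "(", ")", "+" or a word character.
my $one_char = ("X"  =~ /^[_(\w+)]$/) ? 1 : 0;   # 1
my $two_char = ("XY" =~ /^[_(\w+)]$/) ? 1 : 0;   # 0

# So the original attempt matches "AAAAA" as (\w+) = "AAAA" plus one
# more character, and a second part is never captured.
my ($op1, $op2) = "AAAAA" =~ /^(\w+)[_(\w+)]$/;

# With (?:...)? the group is optional, but \w also matches "_", so the
# greedy (\w+) takes the whole string and $2 stays undefined.
my ($g1, $g2) = "AAAAA_BBBB" =~ /^(\w+)(?:_(\w+))?$/;

print "one char: $one_char, two chars: $two_char\n";
print "original attempt: part1=$op1, part2=", (defined $op2 ? $op2 : "undef"), "\n";
print "grouped: part1=$g1, part2=", (defined $g2 ? $g2 : "undef"), "\n";
```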
      Ah shoot! While the advice in the above post is sound, the code won't work because \w includes _, so the greedy (\w+) swallows the entire string and the optional group matches nothing. Fix:
      /^([a-zA-Z0-9]+)(?:_(\w+))?$/

      or

      /^((?:(?!_)\w)+)(?:_(\w+))?$/
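Both corrected patterns can be run against the two input shapes from the question. A sketch; it writes the first fix as [a-zA-Z0-9]+, with the + outside the character class so that the class repeats.

```perl
use strict;
use warnings;

# The two fixes: a character class without "_", and a negative lookahead
# that excludes "_" from \w. Both keep the second part optional.
my %patterns = (
    'char class' => qr/^([a-zA-Z0-9]+)(?:_(\w+))?$/,
    'lookahead'  => qr/^((?:(?!_)\w)+)(?:_(\w+))?$/,
);

for my $name (sort keys %patterns) {
    for my $s ("AAAAA_BBBB", "AAAAA") {
        my ($p1, $p2) = $s =~ $patterns{$name};
        printf "%-10s %-11s part1=%s part2=%s\n",
            $name, $s, $p1, defined $p2 ? $p2 : 'undef';
    }
}
```

Both patterns still allow underscores in the second part, so "A_B_C" yields part1=A and part2=B_C.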
        This works too:

        /^(\w+?)(?:_(\w+))?$/

        But why the last ? (the one after the group)? If I leave it off, the string "AAAAAAA" doesn't match at all.

        Thanks
        Luca
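On the question about the last ?: after the (?:...) group it is a quantifier meaning "zero or one time", which makes the whole _BBBB part optional; without it, the pattern demands an underscore. The ? after \w+ is a different thing: it makes + non-greedy. A small comparison sketch:

```perl
use strict;
use warnings;

my $with_q    = qr/^(\w+?)(?:_(\w+))?$/;   # ? after the group: "_..." is optional
my $without_q = qr/^(\w+?)(?:_(\w+))$/;    # no ?: an underscore is required

for my $s ("AAAAAAA", "AAAAA_BBBB") {
    for my $case ([with => $with_q], [without => $without_q]) {
        my ($label, $re) = @$case;
        if (my ($p1, $p2) = $s =~ $re) {
            printf "%-11s %-8s part1=%s part2=%s\n",
                $s, $label, $p1, defined $p2 ? $p2 : 'undef';
        }
        else {
            printf "%-11s %-8s no match\n", $s, $label;
        }
    }
}
```

The non-greedy \w+? is what lets this version work without excluding _: it grows one character at a time and stops at the first point where the rest of the pattern can match.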