princepawn has asked for the wisdom of the Perl Monks concerning the following question:

I am upset. First off, I know that a new positional variable is created each time a new set of parentheses is encountered in a regexp. Such that:
my $C='((w)hite|(b)lack)';
open F, $file;
my ($c) = grep /$C\s+"princepawn"/i, <F>;
yields
[ 'Black', undef, 'B' ]; ### or [ 'White', 'W', undef ];
but you would think that alternation would realize that only one of the alternatives would bind, right? But oh well.

The other thing is that it is too hard to get the first character from a string in Perl. If I can push and pop arrays, why can't I push and pop strings?

Re: string processing and regexp alternation...
by mirod (Canon) on Jan 26, 2002 at 15:27 UTC

    Pushing and popping strings can be done with substr:

    # take the first character off the front
    my( $first_char, $rest)= (substr( $string, 0, 1), substr( $string, 1));

    # take the last character off the end
    my( $last_char, $rest)= (substr( $string, -1), substr( $string, 0, -1));

    or if you really want to use push and pop:

    # shift the first character
    my @char= split //, $string;
    my $first_char= shift @char;
    my $rest= join "", @char;

    # pop the last character
    my @char= split //, $string;
    my $last_char= pop @char;
    my $rest= join "", @char;

    Note that using this is slower than using substr.
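
    If you want to check the speed claim on your own machine, a rough sketch with the core Benchmark module could look like this. The test string, the iteration count and the %variant hash are assumptions made for illustration, and the timings will vary with the Perl version and the string length.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test string; longer strings make split relatively costlier.
my $string = 'testing' x 5;

# Both variants return the first character and the remainder.
my %variant = (
    substr => sub {
        my ($first, $rest) = (substr($string, 0, 1), substr($string, 1));
        return ($first, $rest);
    },
    split => sub {
        my @char  = split //, $string;
        my $first = shift @char;
        my $rest  = join '', @char;
        return ($first, $rest);
    },
);

# Run each variant 100_000 times and print a comparison table.
cmpthese(100_000, \%variant);
```

    Benchmark may warn about too few iterations for the faster variant; raising the count gives steadier numbers.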

Re: string processing and regexp alternation...
by Juerd (Abbot) on Jan 26, 2002 at 16:29 UTC
    Maybe this helps:
    my $C='((w)hite|(b)lack)';
    my @c;
    my @d;
    while (<DATA>) {
        if (/$C\s+"princepawn"/i) {
            push @c, [ // ];
            push @d, [ $1, $2 || $3 ];
        }
    }
    use Data::Dumper;
    print Dumper(\@c); # What you said
    print Dumper(\@d); # ['black', 'b'] or ['white', 'w']
    __DATA__
    white "princepawn"
    foo
    black "princepawn"


    About popping and pushing strings:
    sub popstr { substr $_[0], -1, 1, '' } # or chop
    sub shiftstr { substr $_[0], 0, 1, '' }
    sub pushstr { length($_[0] .= join '', @_[1..$#_]) }
    sub unshiftstr { length($_[0] = join('', @_[1..$#_]) . $_[0]) }

    my $string = 'testing';
    my $foo = shiftstr $string; # esting
    my $bar = popstr $string; # estin
    pushstr $string, $foo; # estint
    unshiftstr $string, $bar; # gestint
    print "$string\n";

    Or, using regexes:
    sub popstr { $_[0] =~ s/(.)\z//s; $1 }
    sub shiftstr { $_[0] =~ s/^(.)//s; $1 }
    sub pushstr {
        $_[0] =~ s/\z/join '', @_[1..$#_]/e;
        length $_[0];
    }
    sub unshiftstr {
        $_[0] =~ s/^/join '', @_[1..$#_]/e;
        length $_[0];
    }

    my $string = 'testing';
    my $foo = shiftstr $string; # esting
    my $bar = popstr $string; # estin
    pushstr $string, $foo; # estint
    unshiftstr $string, $bar; # gestint
    print "$string\n";

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$

Re: string processing and regexp alternation...
by n3dst4 (Scribe) on Jan 26, 2002 at 16:45 UTC
    The number of positional variables is fixed by the pattern, not by which branch matches. In your example, there are three sets of parens, so there are three vars. If the expression were:

    '((b)li(n)d|mi(c)e)'

    there would be four. (Sorry, I know you know this :) But if the alternation left out the 'unused' parens, there could be two or three vars. So you'd have to do some other checking before moving on, or you wouldn't know which parens the vars came from. And hey, you can always grep { defined }.
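
    As a small sketch of the point above (the variable names are made up for illustration), a list-context match returns one slot per set of parens no matter which branch matched, and grep { defined } trims the unused slots:

```perl
use strict;
use warnings;

my $re = qr/((w)hite|(b)lack)/i;

# A list-context match returns one slot per capture group,
# including undef for groups that took no part in the match.
my @white = 'white' =~ $re;   # ('white', 'w', undef)
my @black = 'Black' =~ $re;   # ('Black', undef, 'B')

printf "white: %d slots, black: %d slots\n", scalar @white, scalar @black;

# Dropping the undefined slots gives the same shape for either branch.
my @white_used = grep { defined } @white;   # ('white', 'w')
my @black_used = grep { defined } @black;   # ('Black', 'B')

print join(', ', @white_used), "\n";
print join(', ', @black_used), "\n";
```

    The catch is the one already mentioned: after the grep you no longer know which parens a value came from, only the order in which the groups appear.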

Re: string processing and regexp alternation...
by danger (Priest) on Jan 27, 2002 at 13:17 UTC

    There is the oft neglected $+ RE variable that refers to the highest numbered capturing parens that actually matched --- when the choice is limited as in your example it comes in handy:

    my $pat = qr/\w+ ((w)hite|(b)lack)/;

    $_ = 'mostly black';
    print "$1 starts with $+\n" if /$pat/;

    $_ = 'but some white spaces';
    print "$1 starts with $+\n" if /$pat/;

    Of course, when your choice revolves around multiple captures per alternation, as in  /((w)hit(e)|(b)lac(k))/, then we are back to the same problem: do we have $2 and $3, or $4 and $5? $+ doesn't provide much help here. In some such cases, simply grep'ing the return values can give you what you want (the uncaptured parens would be undefined):

    my $pat = qr/\w+ ((w)hit(e)|(b)lac(k))/;

    $_ = 'mostly black';
    my($word, $first, $last) = grep defined $_, /$pat/;
    print "$word starts with $first and ends with $last\n" if $word;

    $_ = 'but some white spaces';
    ($word, $first, $last) = grep defined $_, /$pat/;
    print "$word starts with $first and ends with $last\n" if $word;

    But that can break down if you want to iterate over a /match/g in scalar context. In that case, you might formulate a quickie routine that returned only the successfully matched subgroups (in order) along the lines of:

    my $pat = qr/\w+ ((w)hit(e)|(b)lac(k))/;
    $str = 'mostly black but some white spaces';
    while ($str =~ /$pat/g) {
        my @m = get_captures($str);
        print "$m[1] starts with $m[2] and ends with $m[3]\n";
    }

    # Return the whole match plus only those groups that actually matched,
    # using the offsets in @- and @+ from the last successful match.
    sub get_captures {
        map {
            defined $+[$_] ? substr($_[0], $-[$_], $+[$_] - $-[$_]) : ()
        } 0..$#+;
    }

    Another possibility is that you may be using regular expressions when some other function is more appropriate --- getting the first or last character from a string (that you've already captured) can be done with substr as previously mentioned.
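
    A minimal sketch of that last suggestion, assuming the same sample string as above: capture only the whole word, then take its first and last characters with substr. The @report array is just an illustrative name.

```perl
use strict;
use warnings;

my $str = 'mostly black but some white spaces';
my @report;

# Capture only the whole word; with no nested groups, $1 is always the word.
while ($str =~ /\w+ (white|black)/g) {
    my $word  = $1;
    my $first = substr($word, 0, 1);   # first character
    my $last  = substr($word, -1);     # last character
    push @report, "$word starts with $first and ends with $last";
}

print "$_\n" for @report;
```

    With a single capture group there is nothing to disambiguate, so the question of which numbered variable got set never comes up.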