string processing and regexp alternation...

princepawn has asked for the wisdom of the Perl Monks concerning the following question:

Re: string processing and regexp alternation... by mirod (Canon) on Jan 26, 2002 at 15:27 UTC
Pushing and popping strings can be done with `substr`. To split off the first character:
`my ($first_char, $rest) = (substr($string, 0, 1), substr($string, 1));`
and to split off the last character:
`my ($last_char, $rest) = (substr($string, -1), substr($string, 0, -1));`
Or, if you really want to use the array functions (`shift` and `pop` here), for the first character:
`my @char = split //, $string;`
`my $first_char = shift @char;`
`my $rest = join "", @char;`
and for the last character:
`my @char = split //, $string;`
`my $last_char = pop @char;`
`my $rest = join "", @char;`
Note that this is slower than using `substr`.
Re: string processing and regexp alternation... by Juerd (Abbot) on Jan 26, 2002 at 16:29 UTC
Maybe this helps:
`my $C = '((w)hite|(b)lack)';`
`my @c;`
`my @d;`
`while (<DATA>) {`
`    if (/$C\s+"princepawn"/i) {`
`        push @c, [ // ];             # empty pattern reuses the last successful match; list context returns all captures`
`        push @d, [ $1, $2 || $3 ];   # whichever of $2 and $3 took part in the match`
`    }`
`}`
`use Data::Dumper;`
`print Dumper(\@c); # What you said`
`print Dumper(\@d); # ['black', 'b'] or ['white', 'w']`
`__DATA__`
`white "princepawn"`
`foo`
`black "princepawn"`
About popping and pushing strings:
`sub popstr     { substr $_[0], -1, 1, '' } # or chop`
`sub shiftstr   { substr $_[0], 0, 1, '' }`
`sub pushstr    { length($_[0] .= join '', @_[1..$#_]) }`
`sub unshiftstr { length($_[0] = join('', @_[1..$#_]) . $_[0]) }`
`my $string = 'testing';`
`my $foo = shiftstr $string; # esting`
`my $bar = popstr $string;   # estin`
`pushstr $string, $foo;      # estint`
`unshiftstr $string, $bar;   # gestint`
`print "$string\n";`
Or, using regexes:
`sub popstr     { $_[0] =~ s/(.)\z//s; $1 }`
`sub shiftstr   { $_[0] =~ s/^(.)//s; $1 }`
`sub pushstr    { $_[0] =~ s/\z/join '', @_[1..$#_]/e; length $_[0]; }`
`sub unshiftstr { $_[0] =~ s/^/join '', @_[1..$#_]/e; length $_[0]; }`
`my $string = 'testing';`
`my $foo = shiftstr $string; # esting`
`my $bar = popstr $string;   # estin`
`pushstr $string, $foo;      # estint`
`unshiftstr $string, $bar;   # gestint`
`print "$string\n";`
Re: string processing and regexp alternation... by n3dst4 (Scribe) on Jan 26, 2002 at 16:45 UTC
The number of positional variables is fixed by the pattern. In your example, there are three sets of parens, so there are three vars. If the expression were `'((b)li(n)d|mi(c)e)'` there would be four. (Sorry, I know you know this :) But if the alternation left out the 'unused' parens, there could be two or three vars, so you'd have to do some other checking before moving on, or you wouldn't know which parens the vars came from. And hey, you can always `grep {defined;}`.
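To make the point above concrete, here is a minimal sketch (the variable names `@caps` and `@seen` are only for illustration, and the pattern is the one from the question without the surrounding text). A list-context match returns one slot per set of parens, with `undef` for the group in the branch that did not match, and `grep { defined }` then drops those slots:

```perl
use strict;
use warnings;

my $pat = qr/((w)hite|(b)lack)/;

# The pattern always has three groups; the group in the branch that
# did not take part in the match comes back as undef.
my @caps = 'black' =~ $pat;
print scalar(@caps), " captures\n";
print defined $caps[1] ? "\$2 is defined\n" : "\$2 is undef\n";

# Dropping the undefined entries keeps the order but loses the
# original group numbers.
my @seen = grep { defined } @caps;
print "@seen\n";
```

After the `grep`, nothing in `@seen` tells you whether the second element came from `$2` or `$3`, which is the trade-off mentioned above.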
Re: string processing and regexp alternation... by danger (Priest) on Jan 27, 2002 at 13:17 UTC
There is the oft-neglected `$+` RE variable, which refers to the highest-numbered capturing parens that actually matched. When the choice is limited, as in your example, it comes in handy:
`my $pat = qr/\w+ ((w)hite|(b)lack)/;`
`$_ = 'mostly black';`
`print "$1 starts with $+\n" if /$pat/;`
`$_ = 'but some white spaces';`
`print "$1 starts with $+\n" if /$pat/;`
Of course, when your choice revolves around multiple captures per alternation, as in `/((w)hit(e)|(b)lac(k))/`, we are back to the same problem: do we have `$2` and `$3`, or `$4` and `$5`? `$+` doesn't provide much help here. In some such cases, simply grep'ing the return values can give you what you want (the parens that did not capture would be undefined):
`my $pat = qr/\w+ ((w)hit(e)|(b)lac(k))/;`
`$_ = 'mostly black';`
`my ($word, $first, $last) = grep defined $_, /$pat/;`
`print "$word starts with $first and ends with $last\n" if $word;`
`$_ = 'but some white spaces';`
`($word, $first, $last) = grep defined $_, /$pat/;`
`print "$word starts with $first and ends with $last\n" if $word;`
But that can break down if you want to iterate over a /match/g in scalar context. In that case, you might write a quickie routine that returns only the successfully matched subgroups (in order), along the lines of:
`my $pat = qr/\w+ ((w)hit(e)|(b)lac(k))/;`
`my $str = 'mostly black but some white spaces';`
`while ($str =~ /$pat/g) {`
`    my @m = get_captures($str);   # $m[0] is the whole match, $m[1..3] the groups that matched`
`    print "$m[1] starts with $m[2] and ends with $m[3]\n";`
`}`
`sub get_captures {`
`    # @- and @+ hold the start and end offsets of each group; unmatched groups are undef`
`    map { defined $+[$_] ? substr($_[0], $-[$_], $+[$_] - $-[$_]) : () } 0..$#+;`
`}`
Another possibility is that you may be using regular expressions when some other function is more appropriate: getting the first or last character from a string (that you've already captured) can be done with `substr`, as previously mentioned.
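A different technique from the ones in this post, for what it's worth: on Perl 5.10 or later, the branch-reset construct `(?|...)` restarts group numbering in each alternative, so the captures land in `$1`, `$2` and `$3` whichever branch matched. The sketch below assumes such a Perl is available; the `@found` array is just an illustrative name.

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) needs Perl 5.10 or later

# Inside (?|...), each alternative numbers its groups from the same
# starting point: $1 is the word, $2 its first letter, $3 its last.
my $pat = qr/\w+ (?|((w)hit(e))|((b)lac(k)))/;

my $str = 'mostly black but some white spaces';
my @found;
while ($str =~ /$pat/g) {
    push @found, "$1 starts with $2 and ends with $3";
}
print "$_\n" for @found;
```

This sidesteps the "`$2` and `$3`, or `$4` and `$5`?" question for /g loops in scalar context, at the cost of requiring a newer Perl than the one this thread was written against.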