fiat has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all

I need to process two (and possibly a third) groups of patterns in my input. My current code snippet for this is:

while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??/ig) {
    $data_out = "$2, $1$3 ";    # errors here
    # further processing here
}

However, I am getting this error: 'Use of uninitialized value in concatenation (.) or string', referring, I believe, to $3 when it does not exist.

I did try the following:

while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??/ig) {
    if ($+ =~ /<\?query\?>/i) {    # errors here now
        $data_out = "$2, $1$3 ";
    } else {
        $data_out = "$2, $1 ";
    }
    # further processing here
}

But then I get this:
Use of uninitialized value in pattern match (m//)

Of course there is probably a much simpler way of achieving this.

Thanks for any advice
fiat

Re: Special Variables in Regular Expression
by kyle (Abbot) on Jul 25, 2008 at 14:36 UTC

    That warning is just a warning, so you can turn it off if you want to.

    while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??/ig) {
        {
            no warnings 'uninitialized';
            $data_out = "$2, $1$3 ";
        }
        # further processing
    }

    When you turn off a warning that way, it's good to keep that change to the smallest scope possible. In this case, I've scoped it just to that one assignment. See perllexwarn for more.
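
To illustrate the scoping point, here is a minimal sketch, not taken from the original post. It collects warnings through a `$SIG{__WARN__}` handler so you can see that the same concatenation is silent inside the block and warns outside it. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so each scope's effect is visible.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $maybe;    # deliberately left undefined

{
    no warnings 'uninitialized';
    my $quiet = "value: $maybe";    # no warning inside this block
}

my $noisy = "value: $maybe";        # the warning is active again out here

print scalar(@caught), " warning(s) caught\n";
print $caught[0];
```

Only the assignment outside the inner block should add an entry to `@caught`, which is why keeping the `no warnings` scope small leaves the rest of the loop protected.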

    Alternatively, you could check whether $3 is defined:

    while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??/ig) {
        if ( defined $3 ) {
            $data_out = "$2, $1$3 ";
        } else {
            $data_out = "$2, $1 ";
        }
        # further processing
    }

    Same thing but different:

    while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??/ig) {
        $data_out = ( defined $3 ) ? "$2, $1$3 " : "$2, $1 ";
    }
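
A further variation on the defined check, offered as a sketch rather than as part of the original reply: on Perl 5.10 or later, the defined-or operator (`//`) can supply an empty string when the optional group did not capture. The helper `format_pair` and the sample data are hypothetical. The pattern here uses a single `?` on the last group so that the group can actually capture.

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operator (//) needs Perl 5.10 or later

# format_pair is a hypothetical helper for this sketch.
sub format_pair {
    my ($first, $second, $suffix) = @_;
    return "$second, $first" . ($suffix // '') . ' ';
}

# Made-up sample input: the first record has no <?query?>, the second does.
my $data_in = '<container><a>aaa</a><b>bbb</b></container>'
            . '<container><a>ccc</a><b>ddd</b></container><?query?>';

while ($data_in =~ m{<container><a>(.+?)</a><b>(.+?)</b></container>(<\?query\?>)?}ig) {
    print format_pair($1, $2, $3), "\n";
}
```

This keeps the assignment on one line without disabling any warnings. The explicit `defined` test is still the clearer choice if the two cases need different handling.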

    P. S.—Yay for warnings!

Re: Special Variables in Regular Expression
by toolic (Bishop) on Jul 25, 2008 at 14:47 UTC
    The ?? at the end of your regex looks a little suspicious. One question mark would mean "match the string <?query?> zero or one times". Do you really want the 2nd question mark un-escaped there?

    It would have been more helpful if you had provided an example of the $data_in string. Is this what you had in mind? ...

    use strict;
    use warnings;

    my $data_in = '<container><a>bbb</a><b>cccc</b></container><?query?>';
    my $data_out;
    while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)?/ig) {
        print "1=$1\n";
        print "2=$2\n";
        print "3=$3\n";
        $data_out = "$2, $1$3 ";
    }

    __END__
    1=bbb
    2=cccc
    3=<?query?>

    fiat wrote: "Of course there is probably a much simpler way of achieving this."

    I agree with Fletch that you probably would be better off using a CPAN parser than your own regex solution.

    Update: Another general note on regexes. You can use alternate delimiters in order to avoid excessive escaping of forward slashes. For example, you can replace // with m{}:

    while ($data_in =~ m{<container><a>(.+?)</a><b>(.+?)</b></container>(<\?query\?>)?}ig) {
      The lazy quantifier on the group at the very end of the pattern -- the (...)?? -- makes the engine first try to match without that group. Since nothing follows it in the pattern, the overall match always succeeds at that point, so the final capture is never needed and the corresponding capture variable (i.e., $3) will never be defined.
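
A small sketch of this behaviour, assuming the sample input from toolic's post. The helper `third_capture` is hypothetical and only exists to compare the two quantifiers side by side.

```perl
use strict;
use warnings;

my $data_in = '<container><a>bbb</a><b>cccc</b></container><?query?>';

# Hypothetical helper: returns what the optional third group captured, if anything.
sub third_capture {
    my ($regex, $text) = @_;
    return $text =~ $regex ? $3 : undef;
}

# Same pattern twice; only the quantifier on the last group differs.
my $greedy = qr{<a>(.+?)</a><b>(.+?)</b></container>(<\?query\?>)?}i;
my $lazy   = qr{<a>(.+?)</a><b>(.+?)</b></container>(<\?query\?>)??}i;

for my $case ( [ greedy => $greedy ], [ lazy => $lazy ] ) {
    my ($name, $regex) = @$case;
    my $third = third_capture($regex, $data_in);
    printf "%-6s \$3 is %s\n", $name, defined $third ? "'$third'" : 'undef';
}
```

With the greedy `?` the group captures `<?query?>`. With the lazy `??` the group is skipped, so `$3` stays undefined even though the text is present in the input.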
Re: Special Variables in Regular Expression
by Fletch (Bishop) on Jul 25, 2008 at 14:35 UTC

    The much simpler way's probably going to involve using a real XML parser (see XML::Twig or XML::Simple) rather than trying to kludge something together with regexen.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Special Variables in Regular Expression
by ikegami (Patriarch) on Jul 25, 2008 at 14:39 UTC

    I tend to avoid putting ?, +, *, etc on captures. I don't know if the behaviour is even documented, but the behaviour is as follows:

    (...)?       Returns undef if it matched 0 items.
    ((?:...)?)   Returns an empty but defined string if it matched 0 items.

    You can use defined to test in the first case.
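
The two forms can be compared with a short sketch. The sample text and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text = 'one two';    # no trailing " 3", so the optional part matches nothing

# In list context a successful match returns the captured groups.
my ($outer_quantified) = $text =~ /one two( 3)?/;        # (...)?
my ($inner_quantified) = $text =~ /one two((?: 3)?)/;    # ((?:...)?)

print '(...)?     gives ', (defined $outer_quantified ? "'$outer_quantified'" : 'undef'), "\n";
print '((?:...)?) gives ', (defined $inner_quantified ? "'$inner_quantified'" : 'undef'), "\n";
```

The second form moves the quantifier inside the capture, so the group always takes part in the match and its variable can be interpolated without triggering the uninitialized-value warning.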

Re: Special Variables in Regular Expression
by InfiniteSilence (Curate) on Jul 25, 2008 at 16:04 UTC
    toolic was right, the problem is that extra question mark:
    my ($data_in, $data_out);
    $data_in = q|<container><a>smile!!! </a><b> smile again!! </b></container><?query?>|;
    while ($data_in =~ /<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)?/ig) {
        $data_out = "$2, $1$3 ";    # errors here
        # further processing here
        print $data_out;
    }

    If you put back the second question mark at the end you will get the error:

    Use of uninitialized value $3 in concatenation (.) or string at perlmonks_700143.pl line 8.

    If you are new to using regular expressions and are on Windows, you can try this free tool: Regex Coach. It will help you see, graphically, how the regex breaks down and will allow you to step through your expression.

    Celebrate Intellectual Diversity

      To decipher foreign regexes, I also use YAPE::Regex::Explain (which is also free because it is on CPAN :). Using the OP's original regex (excluding the modifiers, /ig):
      use warnings;
      use strict;
      use YAPE::Regex::Explain;

      my $re = '<container><a>(.+?)<\/a><b>(.+?)<\/b><\/container>(<\?query\?>)??';
      print YAPE::Regex::Explain->new($re)->explain;

      produces this output:

      While helpful, I still did not grok the ?? usage :(

      Thanks for the info InfiniteSilence. RegexCoach is neat.

Re: Special Variables in Regular Expression
by Anonymous Monk on Jul 25, 2008 at 14:37 UTC
    $_ = q~ one one two one two 3 ~;
    while ( /(one) (two)( 3)?/gs ) {
        print 'Got 1 ', $1, "\n" if defined $1;
        print 'Got 2 ', $2, "\n" if defined $2;
        print 'Got 3 ', $3, "\n" if defined $3;
        print "----\n";
    }

    __END__
    Got 1 one
    Got 2 two
    ----
    Got 1 one
    Got 2 two
    Got 3  3
    ----