archimca has asked for the wisdom of the Perl Monks concerning the following question:

Could anyone please help me understand what's going on in these regular expressions? I am new to Perl.
$page =~ m/(%TABLE{.*?name\=\"History[^"]*"[^}]*}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*/o;

$page =~ s/(%TABLE{.*?name\=\"History[^"]*"[^}]*}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*/$table/eo;

Update: Thank you so much for all your help. I am all set now. I appreciate everybody's input in helping me understand this complicated part of Perl regular expressions. :)

Re: Regular expression Problem
by kennethk (Abbot) on Jan 13, 2011 at 22:10 UTC

    As best I can tell, there are syntax problems with your posted regular expressions. This might just be because the square brackets ( [ ] ) in your post got converted into links, which happens because you didn't wrap your code in <code> tags. See Writeup Formatting Tips.

    You can get Perl to explain a regular expression for you using the YAPE::Regex::Explain module from CPAN. You might use that module something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use YAPE::Regex::Explain;

    my $re = qr/regex_here/;
    print YAPE::Regex::Explain->new($re)->explain();

    Alternatively, if you clean up your post or post again, I might be able to explain the results to you.

    Update: AnomalousMonk's comment made me reread the above; by "post again" I meant in this thread, but clearly that was not obvious.

      ... if you ... post again ...

      archimca: Please clean up this thread. Please don't post the same thing in another thread, even if it is more readable.

      Thanks for your suggestion.
        Thank you for cleaning that up. The code:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use YAPE::Regex::Explain;

        my $re = qr/(%TABLE{.*?name\=\"History[^"]*"[^}]*}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*/o;
        print YAPE::Regex::Explain->new($re)->explain();
        outputs
        The regular expression:

        (?-imsx:(%TABLE{.*?name="History[^"]*"[^}]*}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            %TABLE{                  '%TABLE{'
        ----------------------------------------------------------------------
            .*?                      any character except \n (0 or more times
                                     (matching the least amount possible))
        ----------------------------------------------------------------------
            name="History            'name="History'
        ----------------------------------------------------------------------
            [^"]*                    any character except: '"' (0 or more
                                     times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
            "                        '"'
        ----------------------------------------------------------------------
            [^}]*                    any character except: '}' (0 or more
                                     times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
            }%                       '}%'
        ----------------------------------------------------------------------
            \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                                     or more times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
            (                        group and capture to \2 (3 times):
        ----------------------------------------------------------------------
              \|                       '|'
        ----------------------------------------------------------------------
              [^\|]*                   any character except: '\|' (0 or more
                                       times (matching the most amount
                                       possible))
        ----------------------------------------------------------------------
            ){3}                     end of \2 (NOTE: because you are using a
                                     quantifier on this capture, only the
                                     LAST repetition of the captured pattern
                                     will be stored in \2)
        ----------------------------------------------------------------------
            \|                       '|'
        ----------------------------------------------------------------------
            \s                       whitespace (\n, \r, \t, \f, and " ")
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          (                        group and capture to \3 (0 or more times
                                   (matching the most amount possible)):
        ----------------------------------------------------------------------
            (                        group and capture to \4 (3 times):
        ----------------------------------------------------------------------
              \|                       '|'
        ----------------------------------------------------------------------
              [^\|]*                   any character except: '\|' (0 or more
                                       times (matching the most amount
                                       possible))
        ----------------------------------------------------------------------
            ){3}                     end of \4 (NOTE: because you are using a
                                     quantifier on this capture, only the
                                     LAST repetition of the captured pattern
                                     will be stored in \4)
        ----------------------------------------------------------------------
            \|                       '|'
        ----------------------------------------------------------------------
            \s                       whitespace (\n, \r, \t, \f, and " ")
        ----------------------------------------------------------------------
          )*                       end of \3 (NOTE: because you are using a
                                   quantifier on this capture, only the LAST
                                   repetition of the captured pattern will be
                                   stored in \3)
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
        The match and the substitution have identical bodies, so they will match the same thing. Let us know if the above is unclear.

        As JavaFan points out below, you should probably omit the /o and /e modifiers. The discussion between bart and AnomalousMonk below is also very useful: you can likely get exactly the same behavior in your code with only one instance of this long and fragile regex.
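To see concretely what the pattern grabs, here is a minimal sketch run against a made-up page (the sample text, including the table name "HistoryTable", is an assumption, not the poster's real data). The pattern is the question's, with /o dropped and the literal braces escaped as \{ and \}, which is always safe and avoids the unescaped-brace warnings or errors newer perls can raise.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page: a %TABLE{...}% directive whose name starts with
# "History", then a header row and two data rows of three cells each.
my $page = <<'END_PAGE';
Intro text.
%TABLE{ name="HistoryTable" sort="off" }%
| *Date* | *Author* | *Change* |
| 2011-01-10 | alice | created |
| 2011-01-12 | bob | fixed typo |

Text after the table.
END_PAGE

# The question's pattern with /o dropped and the literal braces escaped.
my $re = qr/(%TABLE\{.*?name="History[^"]*"[^}]*\}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*/;

my ($whole, $group1, $last_row) = ('', '', '');
if ($page =~ $re) {
    # @- and @+ hold start and end offsets; index 0 is the whole match.
    $whole    = substr($page, $-[0], $+[0] - $-[0]);
    $group1   = $1;    # the directive plus the first (header) row
    $last_row = $3;    # quantified group: only the LAST data row is kept
}

print "Whole match:\n$whole";
print "Group 1:\n$group1";
print "Group 3:\n$last_row";
```

The whole match runs from %TABLE{ through the last data row, $1 stops after the header row, $3 holds only the final row (as YAPE's NOTE warns), and the text after the table is left alone.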

Re: Regular expression Problem
by bart (Canon) on Jan 13, 2011 at 23:42 UTC
    As far as I can tell, those two regexes are identical: one as a plain match, and one as a substitution.

    I'm guessing somebody is first testing to see if anything is found, and if so, replacing it.

    Don't do that. Plain s/// is enough: it won't do anything if nothing is found, so it's safe.

    Now all you've got is needless repetition, and a source of errors if somebody does a bad job copy/pasting after the regex is updated.
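A minimal sketch of that advice: keep the pattern in one qr// variable and call s/// unconditionally. The replacement text in $table and both sample strings are hypothetical stand-ins for whatever the original code uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's pattern, written once (braces escaped, /o dropped).
my $history_re = qr/(%TABLE\{.*?name="History[^"]*"[^}]*\}%\s*(\|[^\|]*){3}\|\s)((\|[^\|]*){3}\|\s)*/;

# Hypothetical replacement; the real $table is built elsewhere in the code.
my $table = "[new history table]\n";

my $with_table    = "%TABLE{ name=\"History\" }%\n| a | b | c |\n| 1 | 2 | 3 |\n";
my $without_table = "Just some text, no table here.\n";

# No preceding m// test is needed: when the pattern does not match, s///
# leaves the string untouched and returns false.
(my $out1 = $with_table)    =~ s/$history_re/$table/;
(my $out2 = $without_table) =~ s/$history_re/$table/;

print $out1;   # the whole table was replaced
print $out2;   # unchanged
```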

    Now, as far as what it's trying to match, it's hard to tell formatted like this, but it looks like somebody tried to match a string looking like "%TABLE ... %" with something (of a specific format) instead of the "...", and in that something, the string 'name="History' (plus something more) ought to be found.

    Well: it is wrong. It'll most likely do a substitution on a larger string, spanning more than one such string, if the first one doesn't contain that substring. As a simple example: it'd match a string like "%TABLE% blah blah blah %TABLE name="History"(something)%" as a single match. I don't think that is what's intended.

    Better would be to try to match the string in a first step, and in the substitution, check whether it contains that substring. If it does: substitute; if it doesn't: leave the original matched string.
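That two-step idea can be sketched with a single s///ge: the pattern matches any %TABLE{...}% directive plus its rows, and the /e code decides whether to replace. Here /e is genuinely needed, because the replacement is an expression rather than a single variable. The three-cell row pattern is copied from the question; the sample page and $replacement are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page with two tables; only the second is a "History" table.
my $page = <<'END_PAGE';
%TABLE{ name="Summary" }%
| a | b | c |
%TABLE{ name="HistoryTable" }%
| d | e | f |
| g | h | i |
END_PAGE

# Hypothetical replacement text.
my $replacement = "(history table replaced)\n";

# Step 1 (the pattern): match any %TABLE{...}% directive plus the three-cell
# rows after it; [^}]* keeps the attribute match inside a single directive.
# Step 2 (the /e code): substitute only if the directive names a History
# table, otherwise return the matched text unchanged.
(my $out = $page) =~ s/(%TABLE\{([^}]*)\}%\s*(?:(?:\|[^|]*){3}\|\s)+)/
    index($2, 'name="History') >= 0 ? $replacement : $1
/ge;

print $out;
```

The "Summary" table comes back untouched and only the "HistoryTable" block is replaced.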

    At the very least, you should forbid presence of a "%" character between the "%TABLE" and the "name=" parts.
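A small demonstration of the over-match and of the "forbid %" fix, using only the directive part of the pattern on a hypothetical line (the table names are made up). Because . does not match a newline without /s, the over-match only happens when both directives sit on the same line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line holding two directives; only the second names a History table.
my $line = '%TABLE{ name="Summary" }% %TABLE{ name="HistoryTable" }%';

# Directive part of the question's pattern: .*? may run past the end of the
# first directive to find name="History in the second one.
my ($loose) = $line =~ /(%TABLE\{.*?name="History[^"]*"[^}]*\}%)/;

# Forbid "%" between "%TABLE" and "name=", so a match cannot start in one
# directive and finish in the next.
my ($strict) = $line =~ /(%TABLE\{[^%]*?name="History[^"]*"[^}]*\}%)/;

print "loose:  $loose\n";
print "strict: $strict\n";
```

The loose version captures the entire line, starting at the "Summary" directive; the strict version captures only the "HistoryTable" directive.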

      As far as I can tell, those two regexes are identical: one as a plain match, and one as a substitution.
      I'm guessing somebody is first testing to see if anything is found, and if it does, replace it.
      Don't do that. Just plain s/// is enough, it won't do anything if nothing is found, so it's safe.

      And if it happens to be the case that the existence of a match is used to conditionally control more than just a substitution, the s/// operator returns the number of substitutions done, or false if there is no match, so it works just fine as a conditional expression. See s/PATTERN/REPLACEMENT/msixpogce in perlop.

      >perl -wMstrict -le "my $str = 'foo'; ;; if ($str =~ s{ foo }{bar}xms) { print qq{substituted with $str}; } "
      substituted with bar
Re: Regular expression Problem
by luis.roca (Deacon) on Jan 13, 2011 at 22:28 UTC

    What specifically don't you understand about it? Please don't say "Everything". Can you describe the data (text) these regular expressions are being applied to?

    A little more information will go a long way to getting your question answered. :)


    "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
Re: Regular expression Problem
by JavaFan (Canon) on Jan 14, 2011 at 10:17 UTC
    On top of what has already been said, lose the modifiers. In your case, the /o is pointless. In the few regexes where it may matter, /o at best gives you a very, very tiny performance gain*, but it can easily lead to incorrect code. As for the /e, it's useless here as well. It tells the operator not to use the replacement directly, but to evaluate it as code. But the result of an expression consisting of a single variable is just the value of that variable anyway.

    *There used to be a more significant speedup (which only matters if you execute the same regex repeatedly), but that speedup was made largely redundant by regex improvements a long, long time ago (around the 5.003/5.004 era, IIRC).
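A short sketch of both points, with made-up strings and loop values: /e on a lone variable changes nothing, and /o freezes an interpolated pattern after its first use, which is exactly how it leads to incorrect code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $table = 'NEW';

# /e evaluates the replacement as Perl code; for a lone variable that yields
# the same string as ordinary interpolation.
(my $plain = 'old text') =~ s/old/$table/;
(my $evald = 'old text') =~ s/old/$table/e;

# /o: an interpolated pattern is compiled the first time the op runs and then
# reused, so later changes to $word are silently ignored.
my (@with_o, @without_o);
for my $word (qw(apple banana)) {
    push @with_o,    ('banana split' =~ /$word/o) ? 1 : 0;
    push @without_o, ('banana split' =~ /$word/)  ? 1 : 0;
}

print "$plain | $evald\n";        # both "NEW text"
print "with /o:    @with_o\n";    # 0 0
print "without /o: @without_o\n"; # 0 1
```

With /o, the second iteration still searches for "apple", so "banana" is never found.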

Re: Regular expression Problem
by planetscape (Chancellor) on Jan 15, 2011 at 08:31 UTC