in reply to regex catch pattern that doesn't contain a pattern

To shed some light, here is tip #4 from the Basic debugging checklist (Data::Dumper):

    use warnings;
    use strict;
    use Data::Dumper;

    "<div></div>" =~ /(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/;
    print Dumper(\%+);
    __END__
    $VAR1 = {
              'start' => ''
            };

Tip #9: YAPE::Regex::Explain

    ----------------------------------------------------------------------
      .*?                      any character except \n (0 or more times
                               (matching the least amount possible))
    ----------------------------------------------------------------------

Your regex is telling Perl that nothing (the empty string) is a valid match for start, and because .*? is non-greedy, nothing is exactly what it captures. Have you considered using an HTML parser module from CPAN?
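To see what the regex actually consumed, you can print the whole match next to both captures. This is a small core-Perl sketch: `$&` holds the text of the last successful match and `%+` holds the named captures. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $html = "<div></div>";
my ($whole, $start, $last);

# Same regex as in the Dumper example above.
if ($html =~ /(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/) {
    $whole = $&;          # the entire matched text
    $start = $+{start};   # named capture: the lazy .*? settles for ''
    $last  = $2;          # a repeated group keeps only its final iteration
}

print "whole: '$whole'\n";   # '<div>'
print "start: '$start'\n";   # ''
print "\$2:    '$last'\n";   # '>'
```

So the regex does match. The repeated group consumes `<div>` and stops at the closing tag, but `start` stays empty and `$2` ends up holding only the final character before the closing tag.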

Re^2: regex catch pattern that doesn't contain a pattern
by AnomalousMonk (Archbishop) on Apr 29, 2015 at 18:03 UTC

    I agree that the OPed regex will match and capture the first empty string it finds (i.e., the one at the beginning of the string), but is YAPE::Regex::Explain at all valid for constructs such as (?<NAME>pattern) introduced with Perl version 5.10?


    Give a man a fish:  <%-(-(-(-<

      but is YAPE::Regex::Explain at all valid for constructs such as (?<NAME>pattern) introduced with Perl version 5.10?
      Nope. According to the POD (LIMITATIONS):
      There is no support for regular expression syntax added after Perl version 5.6, particularly any constructs added in 5.10.

      But it is valid for .*?

        But the .*? is wrapped in a (?<start>.*?). Is something like that explained reliably in all cases?
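For reference, wrapping .*? in (?<start>...) does not change what .*? matches. Since 5.10 a named group is still an ordinary numbered capture, and its value is also exposed through `%+`. Here is a minimal sketch using a made-up pattern that requires `</` to follow, so the lazy .*? has to grow instead of settling for the empty string:

```perl
use strict;
use warnings;

my $text = "<div></div>";
my ($named, $numbered, @names);

if ($text =~ /(?<start>.*?)(<\/)/) {
    $named    = $+{start};   # value via the named-capture hash
    $numbered = $1;          # the same group, addressed by number
    @names    = keys %+;     # only named groups show up in %+
}

print "named: '$named', numbered: '$numbered', names: @names\n";
# named: '<div>', numbered: '<div>', names: start
```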



Re^2: regex catch pattern that doesn't contain a pattern
by Anonymous Monk on Apr 29, 2015 at 22:42 UTC

    There is always wxPPIxregexplain.pl / ppixregexplain.pl

    And rxrx
    (?<start>     # The start of a named capturing block (also $1)
      .*?         #   Match any character (except newline), zero-or-more times (as few as possible)
    )             # The end of the named capturing block
    (             # The start of a capturing block ($2)
      (?!         #   Match negative lookahead
        \< *      #     Match a literal '<' character, zero-or-more times (as many as possible)
        /         #     Match a literal '/' character
        [\w\d\-]+ #     Match any of the listed characters, one-or-more times (as many as possible)
        \>        #     Match a literal '>' character
      )           #   The end of negative lookahead
      .           #   Match any character (except newline)
    )*            # The end of $2 (matching zero-or-more times (as many as possible))
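The last line of that breakdown explains the odd `$2`: the quantifier sits outside the capture, so the group is overwritten on every iteration and keeps a single character. Below is a sketch that moves the repetition inside the capture with a non-capturing `(?:...)`. It assumes the goal was the whole run of text before the first closing tag, which is a guess about the OP's intent. The sample string is made up.

```perl
use strict;
use warnings;

my $html = "<div>hello</div>";

# Same shape as the OPed group: the * applies to the group, so $1 keeps only the last character.
my ($last_char) = $html =~ /((?!\< *\/[\w\d\-]+\>).)*/;

# Quantifier moved inside the capture: (?:...) repeats, the outer (...) keeps the whole run.
my ($run) = $html =~ /((?:(?!\< *\/[\w\d\-]+\>).)*)/;

print "last char: '$last_char'\n";   # 'o'
print "run:       '$run'\n";         # '<div>hello'
```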

    Neither is perfect, but they're almost perfect :P
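The rxrx layout is essentially a /x regex, so writing the pattern that way yourself is a cheap alternative to an explainer tool. One catch: under /x, unescaped whitespace is ignored, so `\< *` would silently become `\<*` (zero or more `<`). The space has to be written as `[ ]` or `\ `. Also note that regex comments are still subject to variable interpolation, so it is safest to keep `$` and `@` out of them. A sketch that checks the commented version against the one-line original:

```perl
use strict;
use warnings;

my $one_line = qr/(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/;

# Same pattern laid out with /x; qr{} delimiters so a slash needs no escaping in comments.
my $commented = qr{
    (?<start> .*? )        # named capture, as few characters as possible
    (                      # capture group 2
      (?!                  #   not looking at a closing tag:
        \< [ ]*            #     a literal '<' then optional spaces (escaped for x)
        \/                 #     a literal slash
        [\w\d\-]+          #     the tag name
        \>                 #     a literal '>'
      )
      .                    #   any character except newline
    )*                     # repeated, so group 2 keeps only the last iteration
}x;

my @results;
for my $re ($one_line, $commented) {
    "<div></div>" =~ $re or die "no match";
    push @results, join '|', $&, $+{start}, $2;   # whole match | start | group 2
}

print "$_\n" for @results;   # both lines: <div>||>
```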