Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I thought this would work, but it did not:
"<div></div>" =~ /(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/; print $+{start}; # it should print <div>

Re: regex catch pattern that doesn't contain a pattern
by toolic (Bishop) on Apr 29, 2015 at 17:26 UTC

    To shed some light, tip #4 from the Basic debugging checklist (Data::Dumper):

    use warnings;
    use strict;
    use Data::Dumper;

    "<div></div>" =~ /(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/;
    print Dumper(\%+);

    __END__
    $VAR1 = {
              'start' => ''
            };

    Tip #9: YAPE::Regex::Explain

    ----------------------------------------------------------------------
      .*?                      any character except \n (0 or more times
                               (matching the least amount possible))
    ----------------------------------------------------------------------

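    Your regex is telling it that nothing (the empty string) is a valid match: .*? is non-greedy, and because the group after it is allowed to match on its own, .*? never has to consume a character. Have you considered using an HTML parser module from CPAN?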
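    For comparison, here is a minimal sketch of one way to get "<div>" into the named capture: put the negative-lookahead-plus-dot construct inside the (?<start>...) group instead of after it. The helper name text_before_close and the variable $close_tag are made up for illustration, and the closing-tag pattern is only a slightly tidied copy of the one in the question, not a robust HTML matcher.

```perl
use strict;
use warnings;

# Closing-tag pattern adapted from the question (optional spaces after '<').
my $close_tag = qr{< */[\w-]+>};

# Hypothetical helper: returns everything before the first closing tag.
sub text_before_close {
    my ($html) = @_;
    # The guarded dot now lives inside the named group, so the group itself
    # consumes every character that does not start a closing tag.
    return $html =~ /\A(?<start>(?:(?!$close_tag).)*)/ ? $+{start} : undef;
}

# Original pattern, for comparison: the lazy .*? is satisfied with nothing,
# and the second group does all the consuming.
my $original = qr/(?<start>.*?)((?!\< *\/[\w\d\-]+\>).)*/;
my $original_start = "<div></div>" =~ $original ? $+{start} : undef;

print "original: '$original_start'\n";                          # original: ''
print "reworked: '", text_before_close("<div></div>"), "'\n";   # reworked: '<div>'
```

    This still treats HTML as flat text, so it will be fooled by comments, attributes containing "</", and similar cases; a parser module remains the safer choice.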

      I agree that the OP's regex will match and capture the first empty string it finds (i.e., the one at the beginning of the string), but is YAPE::Regex::Explain at all valid for constructs such as (?<NAME>pattern) introduced with Perl version 5.10?


      Give a man a fish:  <%-(-(-(-<

        but is YAPE::Regex::Explain at all valid for constructs such as (?<NAME>pattern) introduced with Perl version 5.10?
        Nope. According to the POD (LIMITATIONS):
        There is no support for regular expression syntax added after Perl version 5.6, particularly any constructs added in 5.10.

        But, it is valid for .*?

      There is always wxPPIxregexplain.pl / ppixregexplain.pl

      And rxrx
      (?<start>     # The start of a named capturing block (also $1)
        .*?         #   Match any character (except newline), zero-or-more times (as few as possible)
      )             # The end of the named capturing block
      (             # The start of a capturing block ($2)
        (?!         #   Match negative lookahead
          \< *      #     Match a literal '<' character, zero-or-more times (as many as possible)
          /         #     Match a literal '/' character
          [\w\d\-]+ #     Match any of the listed characters, one-or-more times (as many as possible)
          \>        #     Match a literal '>' character
        )           #   The end of negative lookahead
        .           #   Match any character (except newline)
      )*            # The end of $2 (matching zero-or-more times (as many as possible))

      Neither are perfect but they're almost perfect :P
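      One imperfection in the listing above, as far as I can tell: the pattern in the question is not compiled with /x, so the space in "\< *" is a literal character and the * applies to that space, not to the '<'. A small sketch to check this (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# The lookahead body from the question, compiled without /x.
my $close_tag = qr/\< *\/[\w\d\-]+\>/;

# Spaces between '<' and '/' are accepted ...
my $with_spaces = "<   /div>" =~ /\A$close_tag\z/ ? 1 : 0;

# ... but a repeated '<' is not, which it would be if '*' applied to '<'.
my $repeated_lt = "<</div>" =~ /\A$close_tag\z/ ? 1 : 0;

# A missing '<' is rejected too: exactly one '<' is required.
my $missing_lt = "/div>" =~ /\A$close_tag\z/ ? 1 : 0;

print "with spaces: $with_spaces\n";   # with spaces: 1
print "repeated <:  $repeated_lt\n";   # repeated <:  0
print "missing <:   $missing_lt\n";    # missing <:   0
```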

Re: regex catch pattern that doesn't contain a pattern
by edimusrex (Monk) on Apr 29, 2015 at 17:26 UTC
    Could you explain a little more about what you are trying to do? The syntax looks a little strange, and I would like some clarification. Thanks.