in reply to Regular Expression Matching

Be very careful about using regular expressions to parse HTML. What if they use single quotes around attributes? What if they drop the quotes altogether? Your regex could fail.
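
To make that concrete, here is a small sketch. The pattern and the three sample tags are made up for illustration and are not taken from the original question; the pattern simply assumes double-quoted attribute values.

```perl
use strict;
use warnings;

# Hypothetical pattern that assumes double-quoted attribute values.
my $re = qr/href="([^"]*)"/;

# Three ways a page might write the same (made-up) link.
my @tags = (
    '<a href="page.html">',
    "<a href='page.html'>",
    '<a href=page.html>',
);

# Record whether the pattern matched each form.
my @status;
for my $tag (@tags) {
    push @status, $tag =~ $re ? "matched $1" : 'no match';
}

print "$tags[$_] => $status[$_]\n" for 0 .. $#tags;
```

Only the double-quoted form matches; the other two slip past the pattern without any warning.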

danger pointed out the benefit of using a negated character class. This is not only more precise, it can have huge performance benefits. This node can give you a good background on this.
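
A rough sketch of the difference, assuming a line with two links on it (the sample string and variable names are invented for this example):

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for this illustration.
my $html = '<a href="one.html">One</a> <a href="two.html">Two</a>';

# Greedy dot star: runs on to the last double quote in the string.
my ($greedy) = $html =~ /href="(.*)"/;

# Negated character class: stops at the first closing quote.
my ($negated) = $html =~ /href="([^"]*)"/;

print "greedy:  $greedy\n";
print "negated: $negated\n";
```

The dot star version has to grab everything and then backtrack to find a quote, and it still ends up capturing across both links. The negated class can never run past the closing quote in the first place.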

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: (Ovid) Re: Regular Expression Matching
by Anonymous Monk on Mar 08, 2001 at 05:53 UTC
    Thanks everyone! The negated character class is definitely my solution. Ovid, I looked at the discussion you sent and I think I (for the most part) understood it. However, I'm confused about what the "non-backreferencing parentheses", (?:...), are for. I tried to find more info on them and came up a little empty-handed. Thanks for your help!
      Occasionally, you'll find a need to write a complicated regular expression, but you want to be able to group elements of it without capturing them to a dollar/number ($1, $2, etc.) variable. For example, imagine a simple log file in this format:
      line number: action filename
      A typical section of the log may have data as follows:
      9248: OPEN perl.doc
      9249: DELETE incriminating_evidence.txt
      9250: EDIT autoexec.bat
      Ignoring the over-simplicity of this example, what if you wanted to write a logfile analyzer that just extracts records for files that have been deleted or edited? One way, though perhaps not the best way, to do that would be the following:
      while (<>) {
          if ( /^(\d+):\s(?:EDIT|DELETE)\s(.*)$/ ) {
              $results{ $1 } = $2;
          }
      }
      What the (?:xxx) does is allow me to group that alternation without capturing the value. It's useful in that it is faster than capturing the value and there's no sense in capturing data if I really don't need it (though I'd probably want to know if a file was edited or deleted).
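
      As a small sketch of the difference, here is the same pattern run against one sample record, once with (?:...) and once with plain capturing parentheses. The variable names are just for illustration.

      ```perl
      use strict;
      use warnings;

      # Sample record in the log format described above.
      my $line = '9249: DELETE incriminating_evidence.txt';

      # Non-capturing group: the alternation is grouped but not saved.
      my @without = $line =~ /^(\d+):\s(?:EDIT|DELETE)\s(.*)$/;

      # Capturing group: the action becomes $2 and the filename moves to $3.
      my @with = $line =~ /^(\d+):\s(EDIT|DELETE)\s(.*)$/;

      print scalar(@without), " captures: @without\n";
      print scalar(@with), " captures: @with\n";
      ```

      With the non-capturing version the filename stays in $2. Switching to ordinary parentheses shifts it to $3, which is an easy way to break code that depends on the numbering.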

      Also, note that I do have a dot star at the end. This is appropriate in this case because it's doing exactly what I wanted it to do: slurp up the rest of the line.

      Also, in case you weren't aware: a regular expression without a binding operator ('=~' or '!~') automatically matches against $_, as in the above example.
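
      A minimal sketch of the two forms side by side, using a couple of made-up log lines in place of real input:

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample lines standing in for the log file.
      my @lines = ('9248: OPEN perl.doc', '9250: EDIT autoexec.bat');
      my (@implicit, @explicit);

      for (@lines) {
          # No binding operator: the match is applied to $_.
          push @implicit, $_ if /EDIT/;
      }

      for my $line (@lines) {
          # Explicit binding operator: the match is applied to $line.
          push @explicit, $line if $line =~ /EDIT/;
      }

      print "implicit: @implicit\n";
      print "explicit: @explicit\n";
      ```

      Both loops pick out the same line; the first just relies on $_ being the default target.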

      Cheers,
      Ovid
