Cap'n Steve has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently writing something that uses a variant of bbcode to allow the user to create an HTML form. It works so far, but in an effort to idiot-proof it, I've decided to make putting quotes around the attributes optional. This is the regexp that does that:
    /name=(")?             # "name" attribute followed by optional quote
     ([^(?(1)"|\s\])]+)    # if the quotation mark was present, match until another one,
                           # otherwise stop at whitespace or a bracket
     (?(1)")               # match end quote, if applicable
    /xi
What I'm wondering now is if I should allow users to escape quotes in the attribute. I know this is more trouble than it's worth, but I'm curious. How would I go about ignoring the quotation marks preceded by a backslash? I've tried a negative lookbehind within the conditional, but it didn't work.
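One possible approach, offered as a sketch rather than a definitive answer: a conditional such as `(?(1)...)` placed inside a bracketed character class is read as a set of literal characters, so it may be simpler to split the quoted and unquoted cases into two alternatives. In the quoted branch, each step consumes either one character that is neither a quote nor a backslash, or a backslash plus whatever follows it, so an escaped quote never ends the value. The helper name `parse_name_attr` and the sample inputs are made up for illustration, and unescaping the captured value afterwards is an assumption about what the form generator would want.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the value of a
# name= attribute, quoted or bare, honouring backslash escapes inside quotes.
# Returns undef when no name= attribute is found.
sub parse_name_attr {
    my ($text) = @_;

    if ($text =~ /
            name=
            (?:
                " ( (?: [^"\\] | \\. )* ) "   # quoted: plain chars, or a backslash plus any char
              |
                ( [^\s\]"]+ )                 # bare: stop at whitespace, bracket, or quote
            )
        /xis)
    {
        my ($quoted, $bare) = ($1, $2);
        return $bare unless defined $quoted;
        $quoted =~ s{\\(.)}{$1}gs;            # assumption: strip the escaping backslashes
        return $quoted;
    }
    return undef;
}

print parse_name_attr(q{[input name="say \"hi\""]}), "\n";   # prints: say "hi"
print parse_name_attr(q{[input name=foo]}), "\n";            # prints: foo
```

Because the two branches inside the quoted alternative can never match the same character, the engine has only one way to consume any given input, which keeps backtracking to a minimum.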

Replies are listed 'Best First'.
Re: Regular expression help: Taking HTML-like attributes with optional quotes
by tlm (Prior) on Jun 28, 2005 at 10:24 UTC
Re: Regular expression help: Taking HTML-like attributes with optional quotes
by rlucas (Scribe) on Jun 28, 2005 at 07:08 UTC
    ... might be one for Parse::RecDescent.

    If you need something performant, P::RD may not be the way to go. But if you've never programmed a parser before, using P::RD will make you a better programmer for the experience.

    Also, Parse::RecDescent is very well documented.

    A final bit of info to consider is that there is a set of regexes that Perl will not play nice with (due to how the regex engine works: it is a backtracking matcher, IIRC). By "not play nice," I mean take billions of CPU cycles for ostensibly simple decisions. I think that dealing with optional escaped quoting with regexes might be putting you into that territory.
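For what it's worth, the backtracking concern can usually be contained for this particular problem. The sketch below is an illustration, not part of the original reply: it assumes Perl 5.10 or later for the possessive quantifiers `++` and `*+`, and the helper name `first_quoted` is hypothetical. Possessive quantifiers never give back what they have consumed, so an unterminated quoted string fails after a single pass instead of being retried in many different ways.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw (still escaped) contents of the first
# double-quoted string in the text, or undef if there is none.
# Assumes Perl 5.10+ for the possessive quantifiers ++ and *+.
sub first_quoted {
    my ($text) = @_;
    return $text =~ /" ( (?: [^"\\]++ | \\. )*+ ) "/xs ? $1 : undef;
}

# A long unterminated string: the match fails without retrying
# alternative ways of splitting up the run of characters.
my $unterminated = '"' . ('a' x 50_000);
print defined first_quoted($unterminated) ? "matched\n" : "no match\n";   # prints: no match
```

Whether a plain regex or a Parse::RecDescent grammar is the better fit probably depends on how much more of the bbcode variant needs parsing beyond this one attribute.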