So how do you include a value that contains a space and ends with a backslash?

You should change \\\2 to \\. and then decide which of three treatments you want:

  1. \x always becomes x
  2. \x stays \x except that \" becomes " and \\ becomes \
  3. \x stays \x except that \" becomes " and \\" becomes \" and \\\" becomes \\" etc.
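
The three treatments above can be sketched as post-processing substitutions on the captured value. This is only an illustration: the helper names are made up, and the quote character is hard-coded to " for simplicity, whereas the original pattern refers to whichever quote was captured.

```perl
use strict;
use warnings;

# Treatment 1: every \x becomes x (the backslash is always consumed).
sub unescape_all {
    my ($s) = @_;
    $s =~ s/\\(.)/$1/gs;
    return $s;
}

# Treatment 2: only \" and \\ are unescaped; any other \x is left alone.
sub unescape_quote_and_backslash {
    my ($s) = @_;
    $s =~ s/\\(["\\])/$1/g;
    return $s;
}

# Treatment 3: drop one backslash from any run of backslashes that sits
# directly before a quote; backslashes elsewhere are left alone.
sub unescape_before_quote {
    my ($s) = @_;
    $s =~ s/\\(\\*")/$1/g;
    return $s;
}

# Single-quoted, so this is the text:  a\tb \"c\" d\\e
my $raw = 'a\tb \"c\" d\\\\e';

print unescape_all($raw), "\n";                  # atb "c" d\e
print unescape_quote_and_backslash($raw), "\n";  # a\tb "c" d\e
print unescape_before_quote($raw), "\n";         # a\tb "c" d\\e
```

Note that only treatment 3 leaves a backslash alone when it is followed by another backslash that does not lead up to a quote.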

But I find a much better method is to not use \ for escaping embedded quote characters if that is the only character you want to escape. Instead, use two adjacent quote characters to represent one embedded quote character.

That is, change \\\2 to \2\2 and then post-process the match to undouble the embedded quote characters.
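
A minimal sketch of the doubled-quote idea, under a few assumptions: parse_doubled is a hypothetical helper, the quote is capture group 1 here (it is group 2 in the original pattern), and "any character that is not the quote" is spelled with a negative lookahead rather than a character class.

```perl
use strict;
use warnings;

# Hypothetical helper: match a quoted string at the start of $text, where an
# embedded quote is written as two adjacent quote characters.
sub parse_doubled {
    my ($text) = @_;
    # \1\1 is a doubled (embedded) quote; (?!\1). is any non-quote character.
    return unless $text =~ /\A(["'`])((?:\1\1|(?!\1).)*)\1/s;
    my ($quote, $body) = ($1, $2);
    $body =~ s/\Q$quote$quote\E/$quote/g;    # undouble the embedded quotes
    return $body;
}

print parse_doubled('"say ""hi"" now"'), "\n";   # say "hi" now
print parse_doubled('"a b\"'), "\n";             # a b\
```

The second call is the case from the opening question: since the backslash has no special meaning here, a value can contain a space and end with a backslash without any extra work.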

One problem with this approach is that if you nest lots of these constructs you end up with:

    q{one="two=""three=""""a b""""""" two=abc}

but that isn't much worse than the alternative of

    q{one="two=\"three=\\\"a b\\\"\"" two=abc}

and allowing multiple quote characters (like you have) is the real solution to such problems:

    q{one="two='three=`a b`'" two=abc}

and avoiding a single escape character is why I prefer my approach.

Update: I wouldn't use a non-greedy match. I'd also be more strict so the regex engine doesn't have any option about matching things other than the way I want it to. So in your original code [^\2] should be [^\\\2] (though I recall [^\2] not working when I tested it so perhaps this means that your code won't work on older versions of Perl).

You don't want the regex engine to decide to look at 'I\'m' and match \ against [^\2] and then have the middle ' terminate the string too early. Right now this probably won't happen due to subtle rules (I assume, based on your testing -- the rules are subtle enough that I'd have guessed that the regex would go the other route) but this leeway means that the regex can backtrack when a closing quote is missing and match a different quote in the manner I describe. You don't want to allow this.

You should also allow empty strings (so change +? to just *). And I'd use [^\\\2]+ in hopes of being more efficient, but such concerns should be considered last.
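
To make the backtracking concern concrete, here is a rough comparison of a lax pattern and a stricter one. Both are my own approximations rather than the code from the thread: the quote is capture group 1 instead of 2, and (?!\1)[^\\] stands in for [^\\\2], since as far as I know Perl does not treat \2 as a backreference inside a bracketed character class. quoted_body is a hypothetical helper.

```perl
use strict;
use warnings;

# Lax: non-greedy, and "." is allowed to match a lone backslash.
my $lax    = qr/(["'`])((?:\\\1|.)+?)\1/s;
# Strict: a backslash can only be consumed together with the character it
# escapes, and * (not +?) lets empty strings match.
my $strict = qr/(["'`])((?:\\.|(?!\1)[^\\])*)\1/s;

# Return the text between the quotes, or undef if nothing matched.
sub quoted_body {
    my ($re, $input) = @_;
    return $input =~ $re ? $2 : undef;
}

for my $input (q{'I\'m'}, q{'I\'m}) {    # second input lacks its closing quote
    for my $pair ([lax => $lax], [strict => $strict]) {
        my ($name, $re) = @$pair;
        my $body = quoted_body($re, $input);
        printf "%-6s %-8s => %s\n", $name, $input,
            defined $body ? "[$body]" : "no match";
    }
}
```

With the closing quote missing, the lax pattern backtracks and ends the string at the escaped quote, giving a value of I\, while the strict pattern refuses to match at all.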

Update2: I notice you use \t in your values but I don't see you dealing with that anywhere. Is that supposed to stay \t or become a tab? Or is that just to test that other backslashes don't get eaten? For that matter, I don't see where you turn \' into ' so...

And no need to backslash the quotes in a character class so you can use ["'`] instead (though it doesn't hurt either).

You might want to look at Regexp::Common to compare how it does some of these things. Unfortunately, reading the code of that module is rather difficult. Luckily, you can just print out the regexes it gives back to you instead. (:

                - tye

In reply to Re: Regex capturing either quoted strings or bare words (final backslash) by tye
in thread Regex capturing either quoted strings or bare words by gmax
