
Re: Regex capturing either quoted strings or bare words (final backslash)

by tye (Sage)
on Jan 13, 2003 at 18:53 UTC ( [id://226557] )


in reply to Regex capturing either quoted strings or bare words

So how do you include a value that contains a space and ends with a backslash?

You should change \\\2 to \\. and then decide which of three treatments you want:

  1. \x always becomes x
  2. \x stays \x except that \" becomes " and \\ becomes \
  3. \x stays \x except that \" becomes " and \\" becomes \" and \\\" becomes \\" etc.
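For concreteness, here is a minimal sketch of the three treatments as post-processing steps on the captured value. The sub names are made up for illustration, and the double quote is hard-coded as the quote character; a real version would presumably use whichever quote was captured.

```perl
use strict;
use warnings;

# Treatment 1: \x always becomes x, for any character x.
sub unescape_all {
    my ($s) = @_;
    $s =~ s/\\(.)/$1/gs;
    return $s;
}

# Treatment 2: only \" and \\ are unescaped; any other \x stays \x.
sub unescape_quote_and_backslash {
    my ($s) = @_;
    $s =~ s/\\(["\\])/$1/g;
    return $s;
}

# Treatment 3: a run of backslashes directly in front of a quote loses one
# backslash; backslashes anywhere else are left alone.
sub unescape_before_quote {
    my ($s) = @_;
    $s =~ s/\\(\\*")/$1/g;
    return $s;
}

# q{} turns each \\ into one backslash, so $raw holds:  a\tb\"c\\d
my $raw = q{a\tb\"c\\\\d};
print unescape_all($raw), "\n";                  # atb"c\d
print unescape_quote_and_backslash($raw), "\n";  # a\tb"c\d
print unescape_before_quote($raw), "\n";         # a\tb"c\\d
```

Note that under treatment 3 a doubled backslash that is not in front of a quote stays doubled, which is one reason to pick a treatment deliberately rather than by accident.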

But I find a much better method is to not use \ for escaping embedded quote characters if that is the only character you want to escape. Instead, use two adjacent quote characters to represent one embedded quote character.

That is, change \\\2 to \2\2 and then post-process the match to undouble the embedded quote characters.
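A rough sketch of what that could look like, assuming the input is a run of key=value pairs where the value is either quoted or a bare word (the original regex is not reproduced here, so the pattern and the sub name parse_pairs are my own guesses at the shape). The atomic group (?>...) keeps the engine from giving back half of a doubled quote when the closing quote is missing.

```perl
use strict;
use warnings;

# Hypothetical parser: key=value pairs, value quoted with ", ' or ` (an
# embedded quote is written twice) or a bare word with no whitespace.
sub parse_pairs {
    my ($text) = @_;
    my %pairs;
    while ($text =~ /\G\s*(\w+)=(?:(["'`])((?>(?:\2\2|(?!\2).)*))\2|([^\s"'`]\S*))/gs) {
        my ($key, $quote, $quoted, $bare) = ($1, $2, $3, $4);
        if (defined $quote) {
            # Post-process: undouble the embedded quote characters.
            $quoted =~ s/\Q$quote$quote\E/$quote/g;
            $pairs{$key} = $quoted;
        }
        else {
            $pairs{$key} = $bare;
        }
    }
    return %pairs;
}

# q{} turns \\ into one backslash, so the last value is  a b\  in quotes.
my %pairs = parse_pairs(q{one="say ""hi""" two=abc three='a b\\'});
print "$_ => [$pairs{$_}]\n" for sort keys %pairs;
```

A value that contains a space and ends with a backslash needs no special handling here, because the backslash is just another character.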

One problem with this approach is if you end up nesting lots of these constructs you'll end up with:

    q{one="two=""three=""""a b""""""" two=abc}

but that isn't much worse than the alternative of

    q{one="two=\"three=\\\"a b\\\"\"" two=abc}

and allowing multiple quote characters (like you have) is the real solution to such problems

    q{one="two='three=`a b`'" two=abc}

and avoiding a single escape character is why I prefer my approach.
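To see where the piles of quotes come from, here is a small sketch of the writing side. quote_value is a hypothetical helper, not something from the original code: it wraps a value in the given quote character and doubles any embedded copies of it.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap $value in $quote, doubling any embedded $quote.
sub quote_value {
    my ($value, $quote) = @_;
    $value =~ s/\Q$quote\E/$quote$quote/g;
    return $quote . $value . $quote;
}

# Same quote character at every level: the quotes double at each nesting step.
my $same = 'one='
    . quote_value('two=' . quote_value('three=' . quote_value('a b', '"'), '"'), '"');
print "$same\n";

# A different quote character per level: nothing ever needs doubling.
my $mixed = 'one='
    . quote_value('two=' . quote_value('three=' . quote_value('a b', '`'), "'"), '"');
print "$mixed\n";
```

The first line printed is the heavily doubled form and the second is the mixed-quote form, which stays readable however deep the nesting goes (as long as you don't run out of quote characters).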

Update: I wouldn't use a non-greedy match. I'd also be more strict so the regex engine doesn't have any option about matching things other than the way I want it to. So in your original code [^\2] should be [^\\\2] (though I recall [^\2] not working when I tested it so perhaps this means that your code won't work on older versions of Perl).
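On [^\2] not working: as far as I know, Perl never treats \2 as a backreference inside a bracketed character class; it is read there as an octal escape, i.e. chr(2). The same would apply to [^\\\2]. A small sketch showing the difference, with a negative lookahead standing in for "any character except the captured quote" (that substitution is my assumption about the intent, and the test strings are made up):

```perl
use strict;
use warnings;

# Inside [...] the \2 is an octal escape for chr(2), not "the captured quote",
# so [^\2] happily matches the very quote character it was meant to exclude.
my $class_matches_quote = ("k=''" =~ /^(\w+)=(["'`])[^\2]\z/) ? 1 : 0;

# A negative lookahead does consult the capture, so (?!\2). really means
# "one character that is not the captured quote".
my $lookahead_matches_quote = ("k=''" =~ /^(\w+)=(["'`])(?!\2).\z/s) ? 1 : 0;
my $lookahead_matches_other = ("k='x" =~ /^(\w+)=(["'`])(?!\2).\z/s) ? 1 : 0;

print "class on quote: $class_matches_quote\n";
print "lookahead on quote: $lookahead_matches_quote\n";
print "lookahead on other character: $lookahead_matches_other\n";
```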

You don't want the regex engine to decide to look at 'I\'m' and match \ against [^\2] and then have the middle ' terminate the string too early. Right now this probably won't happen due to subtle rules (I assume, based on your testing -- the rules are subtle enough that I'd have guessed that the regex would go the other route) but this leeway means that the regex can backtrack when a closing quote is missing and match a different quote in the manner I describe. You don't want to allow this.

You should also allow empty strings (so change +? to just *). And I'd use [^\2]+ in hopes of being more efficient, but such concerns should be considered last.
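The early-termination problem and the empty-string case can both be seen in a small comparison. This is only a sketch: $loose is my approximation of the original pattern with the quote as group 1, and $strict uses \\. plus a lookahead where the text above says [^\\\2], since a backreference is not honoured inside a character class.

```perl
use strict;
use warnings;

# Loose: a backslash can match either as part of \\\1 or as an ordinary
# character, because [^\1] inside a class only excludes chr(1).
my $loose  = qr/(["'`])((?:\\\1|[^\1])+?)\1/s;

# Strict: a backslash can only start an escape pair, and an ordinary character
# may be neither a backslash nor the opening quote. * allows empty strings.
my $strict = qr/(["'`])((?:\\.|(?!\1)[^\\])*)\1/s;

# Hypothetical helper: return the captured value, or undef if nothing matched.
sub value_of {
    my ($re, $text) = @_;
    return $text =~ $re ? $2 : undef;
}

# q{} leaves \' alone, so these are  'I\'m'  (complete),  'I\'m  (no closing
# quote) and  ''  (empty value).
for my $text (q{'I\'m'}, q{'I\'m}, q{''}) {
    for my $pair ([loose => $loose], [strict => $strict]) {
        my ($name, $re) = @$pair;
        my $value = value_of($re, $text);
        printf "%-7s %-6s => %s\n", $text, $name,
            defined $value ? "[$value]" : 'no match';
    }
}
```

With the closing quote missing, the loose pattern backtracks and lets the escaped quote end the string, while the strict one simply fails to match.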

Update2: I notice you use \t in your values but I don't see you dealing with that anywhere. Is that supposed to stay \t or become a tab? Or is that just to test that other backslashes don't get eaten? For that matter, I don't see where you turn \' into ' so...

And no need to backslash the quotes in a character class so you can use ["'`] instead (though it doesn't hurt either).

You might want to look at Regexp::Common to compare how it does some of these things. Unfortunately, reading the code of that module is rather difficult. Luckily, you can just print out the regexes it gives back to you instead. (:

                - tye

Re: Re: Regex capturing either quoted strings or bare words (final backslash)
by gmax (Abbot) on Jan 13, 2003 at 19:30 UTC
    Many thanks for your analysis. You have pinpointed many of the risks that I overlooked in my tests.
    I didn't think about a string ending with a backslash. I will follow your suggestion about doubling the quotes. It is unlikely to be needed that often, though. With three different quotes available, users should be able to pick the most appropriate symbol to avoid clashes.
    "\t" was just a test; it wasn't supposed to be expanded (at least in this script).
     _  _ _  _  
    (_|| | |(_|><
     _|   
    
