bplegend has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have some strings, each containing one or more pairs of single or double quotes. Inside the quoted text there may be escaped quotes. My goal is to mask everything between the single or double quotes. My regular expression works fine in most cases, but it fails when there is an escaped quote. I tried to remove the \' first using `s/\x5c\x27//g`, but it didn't work. If it worked, the test #2 result would look the same as test #1.
$line = sprintf qq{Test #1 'Show me Waynes world','Jennys Basketball shoes'\n};
$line =~ s/(['"]).+?(['"])/$1SSS$2/g;
print $line;
$line = sprintf qq{Test #2 'Show me Wayne\'s world','Jenny\'s Basketball shoes'\n};
$line =~ s/(['"]).+?(['"])/$1SSS$2/g;
print $line;
__END__
Test #1 'SSS','SSS'
Test #2 'SSS's world'SSS'Jenny'SSS'

Replies are listed 'Best First'.
Re: skip over an escaped single quote
by GrandFather (Saint) on Jan 30, 2011 at 01:59 UTC

    A technique I often use for this sort of problem is to have the regex work its way over the problematic section one character at a time and use lookaround assertions to handle the quoting. Consider:

    use warnings;
    use strict;

    my $line = qq{Test #1 'Show me Waynes world','Jennys Basketball shoes'\n};
    print "< $line";
    $line =~ s/(['"]).+?(['"])/$1SSS$2/g;
    print "> $line";

    $line = qq{Test #2 'Show me Wayne\\'s world','Jenny\\'s Basketball shoes'\n};
    print "< $line";
    $line =~ s/(['"]) (?: (?! (?<!\\)\1 ) .)+ \1/$1SSS$1/gx;
    print "> $line";

    Prints:

    < Test #1 'Show me Waynes world','Jennys Basketball shoes'
    > Test #1 'SSS','SSS'
    < Test #2 'Show me Wayne\'s world','Jenny\'s Basketball shoes'
    > Test #2 'SSS','SSS'
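The character-at-a-time idea can be easier to follow when the same pattern is spread out with comments. The following is only an annotated sketch of the substitution above; the helper name `mask_char_by_char` is made up for illustration and is not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical helper: mask the contents of each quoted run in $text.
sub mask_char_by_char {
    my ($text) = @_;
    $text =~ s{
        (['"])              # opening quote, remembered as \1
        (?:
            (?!             # refuse to continue if the next character is
                (?<!\\) \1  # the same quote with no backslash before it
            )
            .               # otherwise consume exactly one character
        )+
        \1                  # the matching closing quote
    }{${1}SSS$1}gx;
    return $text;
}

# q{} keeps the backslashes, so the string really contains \'
print mask_char_by_char(q{Test #2 'Show me Wayne\'s world','Jenny\'s Basketball shoes'}), "\n";
# Test #2 'SSS','SSS'
```

Because the lookbehind only inspects one preceding character, this form treats any quote that follows a backslash as escaped, including one that follows an escaped backslash.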

    Note that I removed the redundant sprintfs, showed before and after versions of the test strings, and fixed the quote issue in the second test string.
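The "quote issue" is worth seeing in isolation, since it also explains why the original `s/\x5c\x27//g` cleanup appeared to do nothing. The snippet below is a small sketch with made-up variable names; it only relies on Perl's standard quoting rules for `qq{}` and `q{}`.

```perl
use strict;
use warnings;

# qq{} follows double-quote rules: \' collapses to a bare ' (the backslash is lost).
my $lost = qq{Wayne\'s};

# Doubling the backslash in qq{} leaves one literal backslash in the string.
my $kept = qq{Wayne\\'s};

# q{} follows single-quote rules: only \\ and an escaped delimiter are special,
# so \' is stored as a backslash followed by a quote.
my $single = q{Wayne\'s};

# The original cleanup s/\x5c\x27//g looks for backslash + quote.
my $found_in_lost = $lost =~ /\x5c\x27/ ? 1 : 0;
my $found_in_kept = $kept =~ /\x5c\x27/ ? 1 : 0;

print "qq with \\'  : $lost\n";      # Wayne's
print "qq with \\\\' : $kept\n";     # Wayne\'s
print "q  with \\'  : $single\n";    # Wayne\'s
print "backslash+quote found: $found_in_lost vs $found_in_kept\n";   # 0 vs 1
```

In the original test string there was never a backslash to remove, so the regex saw an ordinary unbalanced quote.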

    True laziness is hard work
Re: skip over an escaped single quote
by eyepopslikeamosquito (Archbishop) on Jan 30, 2011 at 01:49 UTC

    As a matter of technique, you should print the value of $line before you change it to make sure it is what you think it is. In your second test, you need to escape the \ after Wayne. The following code should get you started:

    $line = sprintf qq{Test #1 'Show me Waynes world','Jennys Basketball shoes'\n};
    print $line;   # <- added this line
    $line =~ s/(')[^'\\]*(?:\\.[^'\\]*)*(')/$1SSS$2/g;
    print $line;
    $line = sprintf qq{Test #2 'Show me Wayne\\'s world','Jenny\\'s Basketball shoes'\n};   # <- changed this line (two extra \ added)
    print $line;   # <- added this line
    $line =~ s/(')[^'\\]*(?:\\.[^'\\]*)*(')/$1SSS$2/g;
    print $line;

    How to match quoted strings is discussed in many places, notably in Friedl's Mastering Regular Expressions O'Reilly book (the regex I used is derived from one in Friedl's book).

    Update: Matching quoted strings is also discussed in perlre (in the "Quantifiers" section), where they suggest /"(?:[^"\\]++|\\.)*+"/ (perl 5.10 and above); see the discussion in perlre for more details. See also this stack overflow question and $RE{quoted} in Regexp::Common.
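As a rough illustration of how the perlre pattern might be applied to the masking problem in this thread, here is a sketch that pairs it with a single-quote twin. The single-quote variant, the helper name `mask_quoted`, and the `/s` flag are assumptions added for the example; only the double-quote pattern comes from perlre. Possessive quantifiers need perl 5.10 or later.

```perl
use strict;
use warnings;

# Double-quote pattern as suggested in perlre, plus an assumed single-quote twin.
my $dq = qr/"(?:[^"\\]++|\\.)*+"/s;
my $sq = qr/'(?:[^'\\]++|\\.)*+'/s;

# Hypothetical helper: mask each quoted string in one left-to-right pass,
# keeping whichever quote character opened it.
sub mask_quoted {
    my ($text) = @_;
    $text =~ s{($dq|$sq)}{
        my $q = substr $1, 0, 1;   # the quote character that opened the match
        "${q}SSS${q}"
    }ge;
    return $text;
}

print mask_quoted(q{Test #2 'Show me Wayne\'s world','Jenny\'s Basketball shoes'}), "\n";
# Test #2 'SSS','SSS'
print mask_quoted(q{say "it's" and 'a "b"'}), "\n";
# say "SSS" and 'SSS'
```

Doing both quote styles in one alternation, rather than two separate passes, keeps a quote of one kind inside a string of the other kind from being treated as a delimiter.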

Re: skip over an escaped single quote
by toolic (Bishop) on Jan 30, 2011 at 01:48 UTC
    If you can construct your line a little differently, you can make your regex a whole lot uglier by using negative look-behind assertions (perlre):
    use strict;
    use warnings;

    my $line = qq{Test #1 'Show me Waynes world','Jennys Basketball shoes'\n};
    $line =~ s/(?<!\\)(['"]).+?(?<!\\)(['"])/$1SSS$2/g;
    print $line;

    $line = q{Test #2 'Show me Wayne\'s world','Jenny\'s Basketball shoes'} . "\n";
    $line =~ s/(?<!\\)(['"]).+?(?<!\\)(['"])/$1SSS$2/g;
    print $line;

    __END__
    Test #1 'SSS','SSS'
    Test #2 'SSS','SSS'

    I also removed the superfluous sprintfs.

    Update: YAPE::Regex::Explain to the rescue:

Re: skip over an escaped single quote
by wind (Priest) on Jan 30, 2011 at 08:28 UTC

    Take a look at perlre - Regular Expressions, Subsection Quantifiers, Second to last paragraph.

    The most efficient way to match a single or double quoted string is to use an independent subexpression to avoid backtracking. The example below also handles the case where the backslash itself is escaped.

    use strict;

    my $single_quote_re = qr{ ' (?: (?> [^\\']+ ) | \\ . )* ' }sx;

    # Normal String
    my $line = qq{Test #1 'Show me Waynes world','Jennys Basketball shoes'\n};
    print $line;
    $line =~ s/$single_quote_re/SSS/g;
    print $line;

    # Escaped single quotes embedded
    $line = qq{Test #2 'Show me Wayne\\'s world','Jenny\\'s Basketball shoes'\n};
    print $line;
    $line =~ s/$single_quote_re/SSS/g;
    print $line;

    # Literal backslash before closing single quote
    $line = qq{Test #2 'Show me Waynes world\\\\','Jenny\\'s Basketball shoes'\n};
    print $line;
    $line =~ s/$single_quote_re/SSS/g;
    print $line;

    __END__

    - Miller
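To make the escaped-backslash case concrete, the sketch below runs the pattern from this reply next to a lookbehind-based pattern on the same input. The helper `mask_with`, the single-quote-only lookbehind pattern, and the `'SSS'` replacement (which keeps the quotes) are assumptions made for the comparison, not code from the thread.

```perl
use strict;
use warnings;

# Pattern from the reply above: a backslash always consumes the next character.
my $single_quote_re = qr{ ' (?: (?> [^\\']+ ) | \\ . )* ' }sx;

# Assumed comparison pattern: a quote counts only if no backslash precedes it.
my $lookbehind_re = qr{ (?<!\\) ' .+? (?<!\\) ' }sx;

# Hypothetical helper: replace every match of $re with 'SSS'.
sub mask_with {
    my ($re, $text) = @_;
    $text =~ s/$re/'SSS'/g;
    return $text;
}

# String content: 'Waynes world\\','Jenny\'s shoes'
# The first string ends in an escaped backslash, so its closing quote is real.
my $input = q{'Waynes world\\\\','Jenny\'s shoes'};

print mask_with($single_quote_re, $input), "\n";   # 'SSS','SSS'
print mask_with($lookbehind_re,   $input), "\n";   # 'SSS'Jenny\'s shoes'
```

The lookbehind version mistakes the closing quote after `\\` for an escaped quote, runs on into the next string, and leaves the rest unmasked.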

Re: skip over an escaped single quote
by bplegend (Novice) on Jan 30, 2011 at 02:07 UTC
    Thank you. Let me read more on perlre.
Re: skip over an escaped single quote
by JavaFan (Canon) on Jan 30, 2011 at 17:57 UTC
    That's a classic.
    s/(?|(')[^'\\]*(?:\\.[^'\\]*)*'| (")[^"\\]*(?:\\.[^"\\]*)*")/${1}SSS$1/gxs;

      I just learned the (?|pattern) "branch reset" pattern from this post. It'll come in very handy—save me shenanigans—I'm sure. Thanks, ++JavaFan!
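For anyone else meeting branch reset for the first time, here is a small sketch of what it changes. The sample strings and the helper name `mask_branch_reset` are invented for illustration; the substitution itself is the one from the post above, only spread over two lines. Branch reset needs perl 5.10 or later.

```perl
use strict;
use warnings;

# Inside (?| ... ) every alternative numbers its groups from the same start,
# so whichever branch matches, the captured text lands in $1.
my @found;
for my $text (q{say 'hi'}, q{say "hi"}) {
    push @found, $1 if $text =~ /(?|'([^']*)'|"([^"]*)")/;
}
print "@found\n";   # hi hi

# Without branch reset the second alternative captures into $2 instead.
my ($first, $second) = q{say "hi"} =~ /'([^']*)'|"([^"]*)"/;
print defined $first ? "first set" : "first undef", ", second: $second\n";
# first undef, second: hi

# Hypothetical wrapper around the substitution from the post above.
sub mask_branch_reset {
    my ($text) = @_;
    $text =~ s/(?| (') [^'\\]* (?: \\. [^'\\]* )* '
                 | (") [^"\\]* (?: \\. [^"\\]* )* " )/${1}SSS$1/gxs;
    return $text;
}
print mask_branch_reset(q{a 'x\'y' b "p\"q"}), "\n";   # a 'SSS' b "SSS"
```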