ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to do the following:

my $test_string = "foo l?est sé?jou ? foo ";
$test_string =~ s/([a-zA-Z0-9]+)\?([a-zA-Z0-9]+)/$1'$2/sig;
Basically, we need to replace ? with ' , but ONLY when it has NOT got a space *before* or *after* it. It needs to allow for foreign characters too (as this is a French site :))

Is this possible? (I would imagine it is - but I'm still learning the more advanced regex stuff ;))

TIA

Andy

Replies are listed 'Best First'.
Re: Simple regex question
by kennethk (Abbot) on Sep 15, 2010 at 14:42 UTC
    You can accomplish this fairly easily using negative Look Around Assertions.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $test_string = "foo l?est sé?jou ? foo ";
    $test_string =~ s/(?<!\s)\?(?!\s)/'/sig;
    print "$test_string\n";

    The match fails when the character before the question mark is whitespace (the (?<!...) assertion) or when the character after it is whitespace (the (?!...) assertion). Note that a question mark at the very start or very end of the string will still be substituted, because there is no whitespace character on that side. These are zero-width assertions, so you don't have to worry about capturing. See perlre for details.
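To make the edge cases above concrete, here is a minimal sketch that runs the same substitution over a few short strings. The helper name `fix_apostrophes` and the sample strings are illustrative assumptions, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wraps the look-around
# substitution so individual cases can be tried one at a time.
sub fix_apostrophes {
    my ($text) = @_;
    # Replace "?" unless whitespace sits directly before or after it.
    $text =~ s/(?<!\s)\?(?!\s)/'/g;
    return $text;
}

# Sample strings chosen for illustration only.
for my $sample ("l?est", "foo ? bar", "why? because", "?start", "end?") {
    printf "%-16s => [%s]\n", "[$sample]", fix_apostrophes($sample);
}
```

With these inputs, "l?est", "?start" and "end?" are rewritten, while "foo ? bar" and "why? because" are left alone. If a question mark at the end of the string should be kept, the look-ahead would need to be changed (for example to require a following non-whitespace character).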

      Perfect, works a charm. Will have to read up on that page - looks like it has a lot of cool features in that I wasn't aware of :)

      Thanks again

      Andy
Re: Simple regex question
by Your Mother (Archbishop) on Sep 15, 2010 at 16:39 UTC

    Seems like you got some good answers, but this is an XY problem. You are not fixing the real issue, which seems to be that your site is clobbering ’ (U+2019) and probably anything else outside Latin-1. You should fix that. Monkeying around with data in a lossy way ends up causing more pain than it saves.

    moo@cow[789]~>perl -MEncode -le 'print encode_utf8(chr(8217) . " or \x{2019}")'
    ’ or ’
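As a rough illustration of how such clobbering can happen, the sketch below decodes UTF-8 bytes into characters and then encodes them two ways. It assumes the input arrives as UTF-8 and that some step encodes it to Latin-1; the thread does not confirm that this is what the site actually does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: the bytes of "l’est" as UTF-8 (U+2019 is E2 80 99).
my $bytes = "l\xE2\x80\x99est";

# Decode once on the way in: the apostrophe becomes a single character.
my $text = decode('UTF-8', $bytes);
printf "characters: %d, apostrophe: U+%04X\n",
    length($text), ord(substr($text, 1, 1));

# Encoding to Latin-1 is lossy: U+2019 has no Latin-1 code point, so
# Encode's default behaviour puts a substitution character in its place.
my $latin1 = encode('ISO-8859-1', $text);
print "latin-1: $latin1\n";

# Encoding back to UTF-8 reproduces the original bytes.
my $utf8 = encode('UTF-8', $text);
print "UTF-8 round trip ", ($utf8 eq $bytes ? "ok" : "FAILED"), "\n";
```

If this is the cause, keeping the data in UTF-8 end to end avoids the stray question marks in the first place, and no regex repair is needed.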
Re: Simple regex question
by JavaFan (Canon) on Sep 15, 2010 at 16:23 UTC
    Without using advanced regexp stuff:
    s/(\s\?|\?\s)|\?/$1 || "'"/eg; # Look ma, no look behind/ahead
    But that may replace a question mark at the end of a string (or beginning). If you don't want that:
    1 while s/(\S)\?(\S)/$1'$2/;
    I am assuming that with "space", you mean any whitespace. If you really mean just a space (and not tabs, newlines, etc), replace \s with a space, and \S with [^ ].
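The sketch below puts the two forms side by side and also shows why the second one is wrapped in a loop instead of using /g. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative input: two adjacent candidates plus one "?" between spaces.
my $input = "a?b?c ? d";

# Single pass with /g: the "b" is consumed by the first match, so the
# second "?" has no \S left in front of it and is skipped.
(my $single = $input) =~ s/(\S)\?(\S)/$1'$2/g;

# Loop form from the post: rescan from the start until nothing changes.
my $looped = $input;
1 while $looped =~ s/(\S)\?(\S)/$1'$2/;

# Alternation form: keep a "?" that touches whitespace, replace any other.
(my $alternation = $input) =~ s/(\s\?|\?\s)|\?/$1 || "'"/eg;

print "single /g:   $single\n";
print "1 while:     $looped\n";
print "alternation: $alternation\n";

# The two forms differ for a "?" at the very end of the string.
(my $end_alt = "end?") =~ s/(\s\?|\?\s)|\?/$1 || "'"/eg;
my $end_loop = "end?";
1 while $end_loop =~ s/(\S)\?(\S)/$1'$2/;
print "end of string: alternation=$end_alt loop=$end_loop\n";
```

The single /g pass leaves the second question mark in "a?b?c" untouched, which is the case the `1 while` loop handles. On "end?" the alternation form replaces the question mark and the loop form keeps it, matching the caveat in the post.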

      I haven't checked, but look-around assertions have certainly been around for many years, perhaps more than a decade. Can they really still be considered to be 'advanced'?

      Also, the (?<!\?) and (?!\?) assertions seem to me to perfectly express the notions 'not preceded by...' and 'not followed by...', respectively. Are they not preferable to the somewhat convoluted logic of your example code? (Admittedly, this may be largely a matter of taste.)

        I haven't checked, but look-around assertions have certainly been around for many years, perhaps more than a decade. Can they really still be considered to be 'advanced'?
        How long a construct has been part of a language doesn't determine whether it's advanced or not. They are described in the section:
           Extended Patterns
               Perl also defines a consistent extension syntax for features not found
               in standard tools like awk and lex.  The syntax is a pair of
               parentheses with a question mark as the first thing within the
               parentheses.  The character after the question mark indicates the
               extension.
        
               The stability of these extensions varies widely.  Some have been part
               of the core language for many years.  Others are experimental and may
               change without warning or be completely removed.  Check the
               documentation on an individual feature to verify its current status.
        
               A question mark was chosen for this and for the minimal-matching
               construct because 1) question marks are rare in older regular
               expressions, and 2) whenever you see one, you should stop and
               "question" exactly what is going on.  That’s psychology...
        
        I can very well imagine that some people call this an advanced structure.
        Also, the (?<!\?) and (?!\?) assertions seem to me to perfectly express the notions 'not preceded by...' and 'not followed by...', respectively.
        That's fine. And if the OP is fine with that, he'll use it. If he doesn't like it (or anyone else who stumbles upon this thread), I've offered him an alternative.

        There's more than one way to skin a cat.

Re: Simple regex question
by umasuresh (Hermit) on Sep 15, 2010 at 15:34 UTC
Re: Simple regex question
by planetscape (Chancellor) on Sep 15, 2010 at 20:01 UTC