/dev/luser has asked for the wisdom of the Perl Monks concerning the following question:

Greetings

I have an infile, for example:

<snip>
this is a "test of some regexp" blah blah
this is another "test of some regexp" foo bar
</snip>

...and I would like to replace all of the spaces in the
quoted sections with underscores. Hence the result would
look like this:

<snip>
this is a "test_of_some_regexp" blah blah
this is another "test_of_some_regexp" foo bar
</snip>

I am afraid this is beyond my limited regexp abilities, so
your help would be greatly appreciated. :-)

Re: regexp - replace spaces in quoted string
by ambrus (Abbot) on Sep 06, 2005 at 10:23 UTC

      The regex to match quoted strings is discussed in detail in Jeffrey Friedl's excellent book: Mastering Regular Expressions. The regex to do this (and many more) is also available in the superb Regexp::Common CPAN module.

      Modifying ambrus's solution with Friedl's optimized regex to match a quoted string (which also allows backslash-escaped characters, such as \", inside the string) gives:

      s/("[^"\\]*(?:\\.[^"\\]*)*")/($x=$1)=~y: :_:, $x/ge;
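A minimal strict-safe sketch of the same substitution follows. It assumes you want it as a reusable sub; the name `underscore_quoted` is hypothetical. Declaring the temporary with `my` inside the `/e` replacement avoids the global `$x`, which `use strict` would reject. The third sample line shows what the unrolled-loop pattern buys over `".*?"`: the escaped quotes stay inside one quoted string instead of ending the match early.

```perl
use strict;
use warnings;

# Replace spaces with underscores only inside double-quoted substrings.
# The pattern is Friedl's "unrolled loop" for a double-quoted string:
#   "                  opening quote
#   [^"\\]*            a run of characters that are neither " nor \
#   (?:\\.[^"\\]*)*    any number of (backslash + any char, then another run)
#   "                  closing quote
sub underscore_quoted {    # hypothetical name
    my ($line) = @_;
    $line =~ s{("[^"\\]*(?:\\.[^"\\]*)*")}
              {(my $quoted = $1) =~ tr/ /_/; $quoted}ge;
    return $line;
}

my @sample = (
    'this is a "test of some regexp" blah blah',
    'this is another "test of some regexp" foo bar',
    'an escaped quote: "say \"hi there\" now" stays in one string',
);
print underscore_quoted($_), "\n" for @sample;
```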

Re: regexp - replace spaces in quoted string
by Anonymous Monk on Sep 06, 2005 at 10:15 UTC
    Not sure if this can be done using a single regexp, but here's a simple possible solution...

    use strict;
    use warnings;

    my ($quote_on, $output);
    $_ = q/this is a "test of some regexp" blah blah this is another "test of some regexp" foo bar/;
    for (split //) {
        $quote_on = ($quote_on ? 0 : 1) if (/\"/);
        s/\s/_/g if $quote_on;
        $output .= $_;
    }
    print "output: $output\n";
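The same character-by-character idea could be sketched as a sub that runs over the infile line by line. The sub name and the command-line handling are assumptions and are not part of the reply above. This version uses a lexical loop variable instead of `$_` and flips the flag with `!`.

```perl
use strict;
use warnings;

# Character-by-character approach wrapped in a sub (hypothetical name).
# $in_quote flips every time a double quote is seen; while it is true,
# each whitespace character is replaced with an underscore.
sub underscore_quoted_loop {
    my ($line) = @_;
    my $in_quote = 0;
    my $output   = '';
    for my $char (split //, $line) {
        $in_quote = !$in_quote if $char eq '"';
        $char =~ s/\s/_/ if $in_quote;
        $output .= $char;
    }
    return $output;
}

# Process the file named on the command line, one line at a time;
# with no arguments, fall back to the sample lines from the question.
if (@ARGV) {
    while (my $line = <>) {
        print underscore_quoted_loop($line);
    }
}
else {
    print underscore_quoted_loop($_), "\n"
        for 'this is a "test of some regexp" blah blah',
            'this is another "test of some regexp" foo bar';
}
```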
Re: regexp - replace spaces in quoted string
by /dev/luser (Acolyte) on Sep 06, 2005 at 10:35 UTC
    My thanks to you both. To help with my regexp education, would you mind expanding a little on the syntax?

      YAPE::Regex::Explain will help you:

      The regular expression:

      (?-imsx:s/(".*?")/($x=$1)=~y: :_:, $x/ge)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        s/                       's/'
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          "                        '"'
      ----------------------------------------------------------------------
          .*?                      any character except \n (0 or more times
                                   (matching the least amount possible))
      ----------------------------------------------------------------------
          "                        '"'
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
        /                        '/'
      ----------------------------------------------------------------------
        (                        group and capture to \2:
      ----------------------------------------------------------------------
          $                        before an optional \n, and the end of the
                                   string
      ----------------------------------------------------------------------
          x=                       'x='
      ----------------------------------------------------------------------
          $                        before an optional \n, and the end of the
                                   string
      ----------------------------------------------------------------------
          1                        '1'
      ----------------------------------------------------------------------
        )                        end of \2
      ----------------------------------------------------------------------
        =~y: :_:,                '=~y: :_:, '
      ----------------------------------------------------------------------
        $                        before an optional \n, and the end of the
                                 string
      ----------------------------------------------------------------------
        x/ge                     'x/ge'
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      HTH,


      Update: See below for corrections. (Thanks, japhy!)

      planetscape
        s/(".*?")/($x=$1)=~y: :_:, $x/ge)
        [CUT]
        =~y: :_:,                '=~y: :_:, '
        ----------------------------------------------------------------------
        $                        before an optional \n, and the end of the string
        ----------------------------------------------------------------------
        x/ge                     'x/ge'
        Hm... Is that a good explanation??
        I'll try, in my poor English:
        s/(".*?")/($x=$1)=~y: :_:, $x/ge;
          ^^^^^^^ ^^^^^^^  ^^^^^^  ^^ ^^
          1       2        3       4  5
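        1) (".*?") captures a quoted string, quotes included, into $1; the non-greedy .*? stops at the first closing ".
        2) So the whole quoted string is now in $1. $1 is read-only, so copy it to $x, and for the moment forget everything else and work only on $x. This is possible thanks to /e, which "wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring".
        3) In $x, replace every ' ' with '_' (y/// is a synonym for tr///; here : is used as the delimiter).
        4) The comma operator discards the result of y/// (a count) and returns $x, so the modified $x becomes the replacement part of s/from/to/.
        5) g: work globally (every quoted string on the line); e: evaluate the replacement string as Perl code.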
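Steps 1 to 5 can be spelled out in a sketch like the following. It should behave the same as the one-liner, but it declares `$x` with `my` so it also runs under `use strict`. The sub name is hypothetical.

```perl
use strict;
use warnings;

# s/(".*?")/($x=$1)=~y: :_:, $x/ge spelled out one step at a time.
# Step 1: (".*?") captures a quoted string, quotes included; the
#         non-greedy .*? stops at the first closing quote.
# Step 5: g repeats the substitution for every match on the line;
#         e runs the replacement part as Perl code.
sub underscore_quoted_steps {    # hypothetical name
    my ($line) = @_;
    $line =~ s{(".*?")}{
        my $x = $1;    # step 2: $1 is read-only, so copy it into $x
        $x =~ y/ /_/;  # step 3: replace every space with an underscore
        $x;            # step 4: the last value evaluated is the replacement
    }ge;
    return $line;
}

print underscore_quoted_steps('this is a "test of some regexp" blah blah'), "\n";
```

On Perl 5.14 or later, `tr///r` returns a modified copy and leaves its operand alone. This means it works directly on the read-only `$1`, and the temporary can be dropped: `s/(".*?")/$1 =~ tr{ }{_}r/ge`.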

        greets
        Uksza
      To explain the syntax of the first response:
      # declarations
      my ($quote_on, $output);

      # the data goes into the default scalar $_
      $_ = q/this is a "test of some regexp" blah blah this is another "test of some regexp" foo bar/;

      # split // (empty pattern) breaks $_ into single characters, and
      # the for loop aliases $_ to each character in turn
      for (split //) {
          # $quote_on is a flag: it switches on at an opening quote
          # and off again at the closing quote
          $quote_on = ($quote_on ? 0 : 1) if (/\"/);

          # while inside a quoted section, replace whitespace with an underscore
          s/\s/_/g if $quote_on;

          # append the (possibly modified) character to $output
          $output .= $_;
      }
      I didn't write the original post, but I was thinking about doing it a different way. Still, this one seems
      perfectly adequate and, more importantly, more maintainable than a seemingly more elegant one-liner. Just IMHO.
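For comparison, here is one possible different way. It is not necessarily the one the poster had in mind. The idea is to split each line on `"` and edit only the odd-numbered fields, which are the ones between quotes. This is a sketch that assumes quotes are balanced and never escaped; the sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical alternative: split on double quotes. Fields at odd
# indexes (1, 3, 5, ...) are the text between a pair of quotes.
sub underscore_quoted_split {
    my ($line) = @_;
    my @fields = split /"/, $line, -1;   # -1 keeps trailing empty fields
    for my $i (grep { $_ % 2 } 0 .. $#fields) {
        $fields[$i] =~ tr/ /_/;
    }
    return join '"', @fields;
}

print underscore_quoted_split($_), "\n"
    for 'this is a "test of some regexp" blah blah',
        'this is another "test of some regexp" foo bar';
```

The `-1` limit matters here. Without it, `split` drops trailing empty fields, so a line ending in a closing quote would lose that quote when the fields are joined back together.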
Re: regexp - replace spaces in quoted string
by /dev/luser (Acolyte) on Sep 06, 2005 at 11:18 UTC
    spot on - thanks for your explanations!