eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

A workmate asked me the best way to convert all " quoted strings that span more than one line into a string with the newlines and surrounding whitespace compressed to a single space. Note: he does not need to worry about escaped " within the " strings.

To clarify, this input data:

    "boom" hello
    "" bill
    hello "  " bill
    "baz
      hello
     jock" "boom2" abc "baz2
     hello2
    jock2 "

should produce this output:

    "boom" hello
    "" bill
    hello "  " bill
    "baz hello jock" "boom2" abc "baz2 hello2 jock2 "

I suggested this code:

    use strict;
    use warnings;

    my $s = <<"GROK";
    "boom" hello
    "" bill
    hello "  " bill
    "baz \t
      hello
     jock" "boom2" abc "baz2
     hello2 \t
    jock2 "
    GROK

    $s =~ s{"([^"]*)"}
           {
             if ($1 =~ tr/\n//) {
               my $x = $1;
               $x =~ s/[ \t]*\n[ \t]*/ /g;
               '"' . $x . '"';
             } else {
               '"' . $1 . '"';
             }
           }eg;

Though this code does appear to work, improvements or advice are welcome. Also, there may be diabolical test data that breaks my code that I have missed. Admittedly the spec is a bit vague, but if you see some test data that breaks the code above, please let me know.

Update: Added extra line hello "  " bill to the test data to clarify the requirements. Thanks GrandFather. Also added extra space after jock2 to further clarify.
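
To make the clarified requirements concrete, here is a minimal sketch that wraps the substitution above in a sub and runs it on a few edge cases. The sub name collapse_multiline_quotes and the sample inputs are illustrative assumptions, not part of the original test data. The unbalanced-quote case falls outside the stated spec and is included only to show what this particular regex happens to do.

```perl
use strict;
use warnings;

# Hypothetical wrapper (the name is an assumption) around the substitution above.
sub collapse_multiline_quotes {
    my ($s) = @_;
    $s =~ s{"([^"]*)"}{
        my $x = $1;
        # Only strings that contain a newline are rewritten.
        $x =~ s/[ \t]*\n[ \t]*/ /g if $x =~ tr/\n//;
        '"' . $x . '"';
    }eg;
    return $s;
}

# Illustrative inputs (assumptions, not from the original post).
my @cases = (
    qq{hello "  " bill\n},               # single-line string: left alone
    qq{"baz \t\n  hello\n jock" tail\n}, # multi-line string: collapsed
    qq{no closing "quote\n here\n},      # lone quote: nothing matches
);

for my $in (@cases) {
    my $out = collapse_multiline_quotes($in);
    (my $shown = $out) =~ s/\n/\\n/g;    # make newlines visible
    print "$shown\n";
}
```

With a single unmatched quote the pattern never matches, so the input passes through unchanged. Whether that is the desired behaviour is a question for the workmate's spec.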

Replies are listed 'Best First'.
Re: Changing quoted strings spanning more than one line ($/)
by tye (Sage) on Sep 19, 2007 at 03:24 UTC
    $/= '"';
    while( <> ) {
        s/\s+/ /g  if  0 == $. % 2  &&  /\n/;
        print;
    }

    Update: Changed "if  0 == ( 1 & $. );" because the bit-op was a bit silly and to leave single-line strings uncollapsed.

    - tye        

      Collapses

      hello "  " bill

      to

      hello " " bill

      which may or may not be important to OP's workmate.


      DWIM is Perl's answer to Gödel

        Yes. The requirement is to not change existing strings that do not span more than one line (I updated the root node with this extra line of test data to clarify the requirements).
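
For anyone wanting to try the record-separator idea without a file, the following is a minimal sketch that runs the updated loop against an in-memory filehandle. The sub name collapse_by_record and the sample input are assumptions made for illustration. It relies on $/ being set to a double quote, so that every even-numbered record is the inside of a quoted string (plus its closing quote).

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): applies the $/ approach to a string.
sub collapse_by_record {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
    local $/ = '"';                 # each read stops after the next double quote
    my $out = '';
    while (my $rec = <$fh>) {
        # Even-numbered records are string bodies; touch only multi-line ones.
        $rec =~ s/\s+/ /g if $. % 2 == 0 && $rec =~ /\n/;
        $out .= $rec;
    }
    close $fh;
    return $out;
}

print collapse_by_record(qq{a "x\n  y" b "  " c\n});
```

Note that s/\s+/ /g squeezes every run of whitespace inside a multi-line string, not just the whitespace around the newlines, so a string such as "a  b\nc" comes out as "a b c". The root node's substitution would leave the two spaces between a and b alone.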

Re: Changing quoted strings spanning more than one line
by GrandFather (Saint) on Sep 19, 2007 at 02:43 UTC

    No more testing than your sample, but the following could be considered a little cleaner:

    use strict;
    use warnings;

    my @chunks = split '"', do{local $/; <DATA>}, -1;
    s/(?<=.)\s*\n\s*(?=.)/ /g for @chunks;
    print join '"', @chunks;

    __DATA__
    "boom" hello
    "" bill
    "baz
      hello
     jock" "boom2" abc "baz2
     hello2
    jock2 "

    Prints:

    "boom" hello
    "" bill
    "baz hello jock" "boom2" abc "baz2 hello2 jock2 "


      Like BrowserUk's solution below, this one similarly fails with the new test data (see root node update above). To clarify, if you run it with:

          hello "" bill
          hello "  " bill

      it produces:

          hello "" bill hello "  " bill

      when it should not alter the input data in this case.

Re: Changing quoted strings spanning more than one line
by BrowserUk (Patriarch) on Sep 19, 2007 at 04:52 UTC

    This is a bit simpler and produces the correct result from your sample.

    $_ = do{ local $/; <DATA> };
    s[("[^\n"]+\n[^"]+")][
        (my $x = $1) =~ s[\s+][ ]g; $x
    ]ge;
    print;

    __DATA__
    "boom" hello
    "" bill
    "baz
      hello
     jock" "boom2" abc "baz2
     hello2
    jock2 "

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Unfortunately, it fails with the new test data (see root node update above). To clarify, if you run it with:

          hello "" bill
          hello "  " bill

      it produces:

          hello "" bill hello "  " bill

      when it should not alter the input data in this case.

        Tad more complicated:

        $_ = do{ local $/; <DATA> };
        s[("[^\n"]*?")|("[^\n"]+?\n[^"]+?")][   ##"
            $1 || do{ (my $x = $2) =~ s[\s+][ ]g; $x }
        ]ge;
        print;

        __DATA__
        "boom" hello
        "" bill
        hello "  " bill
        "baz
          hello
         jock" "boom2" abc "baz2
         hello2
        jock2 "

Re: Changing quoted strings spanning more than one line
by shmem (Chancellor) on Sep 19, 2007 at 12:05 UTC
    I'd probably say
    $s =~ s!"([^"]*)"!local $_ = $1; s/\s+/ /g if /\n/; "\"$_\""!seg;

    which is basically your solution golfed down a bit (but still not too obfuscated).

    I wonder how the $1 =~ tr/\n// in your solution doesn't die with "modification of a readonly value attempted"?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      I wonder how the $1 =~ tr/\n// in your solution doesn't die with "modification of a readonly value attempted"?
      This special form of tr with an empty replacement list is used to count the number of characters. Example from perlop:
      $cnt = tr/0-9//; # count the digits in $_
      In perlfaq4, they give an example indicating that this idiomatic use of tr is the canonical Perl way to count the number of characters in a string. Admittedly, apart from these examples, I can't find an explicit statement in the docs that this special form of tr does not attempt to modify the string.
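
      A minimal sketch of that counting idiom applied to a capture variable follows. The helper name count_newlines_in_capture and the sample strings are assumptions for illustration. It is consistent with perlop's note that an empty REPLACEMENTLIST replicates the SEARCHLIST, so each character maps to itself and nothing is written back.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): counts newlines in the
# first double-quoted string found in $text.
sub count_newlines_in_capture {
    my ($text) = @_;
    return 0 unless $text =~ /"([^"]*)"/;
    # Empty replacement list and no /d: tr only counts, so the
    # read-only $1 is never written to.
    my $count = ($1 =~ tr/\n//);
    return $count;
}

print count_newlines_in_capture(qq{"a\nb\nc"}), "\n";   # prints 2
print count_newlines_in_capture(qq{"abc"}), "\n";       # prints 0
```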

        Um, it's so long ago that I read the faq... thanks for the clarification.

        --shmem


      I wonder how the $1 =~ tr/\n// in your solution doesn't die with "modification of a readonly value attempted"?

      Have a peek at this:

          local $, = ", ";
          local $\ = "\n";
          for (1..2) {
              print map { ++$_ } 0..4;
          }

      Output:

          1, 2, 3, 4, 5
          2, 3, 4, 5, 6

        That's cute, but doesn't help explaining why $1 =~ tr/\n// doesn't attempt to modify $1, which is read-only.

        Interesting, though.

            local $, = ", ";
            local $\ = "\n";
            for ($foo,$bar) {
                print map { ++$_ } 0..4;
            }

        Output:

            1, 2, 3, 4, 5
            2, 3, 4, 5, 6

        The outer $_ isn't related to map's $_. But it seems like map's $_ isn't being reset each time through the loop. What gives? Bug or some (undocumented) feature?

        --shmem
