A workmate asked me the best way to convert every double-quoted (") string that spans more than one line into a single-line string, with each newline and its surrounding whitespace compressed to a single space. Note: he does not need to worry about escaped " characters inside the quoted strings.

To clarify, this input data:

"boom"
hello "" bill
hello "  " bill
"baz
   hello
jock"
"boom2" abc "baz2
  hello2
jock2 "
should produce this output:

"boom"
hello "" bill
hello "  " bill
"baz hello jock"
"boom2" abc "baz2 hello2 jock2 "

I suggested this code:

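use strict;
use warnings;

# Test data: \t is interpolated to a tab inside the <<"GROK" heredoc
my $s = <<"GROK";
"boom"
hello "" bill
hello "  " bill
"baz \t
   hello
jock"
"boom2" abc "baz2
  hello2 \t
jock2 "
GROK

# For every "..." string, join its lines only if it contains a newline
$s =~ s{"([^"]*)"}
{
    if ($1 =~ tr/\n//) {
        my $x = $1;                   # copy, because $1 is read-only
        $x =~ s/[ \t]*\n[ \t]*/ /g;   # newline plus adjacent blanks becomes one space
        '"' . $x . '"';
    } else {
        '"' . $1 . '"';
    }
}eg;

print $s;   # show the result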
Though this code does appear to work, improvements or advice are welcome. Also, there may be diabolical test data that breaks my code that I have missed. Admittedly the spec is a bit vague, but if you see some test data that breaks the code above, please let me know.
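As one way to hunt for such breaking data, here is a minimal sketch that wraps the same substitution in a sub and runs it against a few inputs. The name compress_multiline and the four test strings are made up for illustration; they are not part of the original question. The last two cases show behaviour that the vague spec may or may not want: a blank line inside a quoted string leaves two spaces, because each newline is replaced separately, and a stray " earlier in the data shifts the pairing, so the wrong span gets joined.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitution as above, wrapped in a sub.
# The if/else is dropped here because the inner substitution changes
# nothing when the quoted string contains no newline.
sub compress_multiline {
    my ($s) = @_;
    $s =~ s{"([^"]*)"}{
        my $x = $1;
        $x =~ s/[ \t]*\n[ \t]*/ /g;
        '"' . $x . '"';
    }eg;
    return $s;
}

# Each case: [ description, input, result the substitution produces ]
my @cases = (
    [ 'multi-line string',  qq{"baz \t\n   hello\njock"\n}, qq{"baz hello jock"\n} ],
    [ 'single-line string', qq{hello "  " bill\n},          qq{hello "  " bill\n}  ],
    [ 'blank line inside',  qq{"a\n\n b"\n},                qq{"a  b"\n}           ],
    [ 'stray quote first',  qq{5" tall\n"a\nb"\n},          qq{5" tall "a\nb"\n}   ],
);

for my $case (@cases) {
    my ($name, $in, $listed) = @$case;
    my $got = compress_multiline($in);
    printf "%-20s %s\n", $name, ($got eq $listed ? 'matches' : 'differs');
}
```

If a blank line inside a string should also collapse to one space, changing the inner pattern to \s*\n\s* is one option, with the caveat that \s also swallows carriage returns and other whitespace. The stray-quote case cannot be fixed by the regex alone; it depends on the data really having balanced quotes.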

Update: Added extra line hello "  " bill to the test data to clarify the requirements. Thanks GrandFather. Also added extra space after jock2 to further clarify.


In reply to Changing quoted strings spanning more than one line by eyepopslikeamosquito
