in reply to global whitespace delete

You can do this, but the difficulty lies in how you parse the quoted strings and how embedded quotes are escaped. Some formats escape an embedded quote with a backslash, others by doubling it:
" \" "
" "" "
Some DB scripting languages have a truly horrific way of doing it, but the basics are the same. In regex terms, you are looking for an opening quote, zero or more characters that are either an escaped quote or a non-quote, and the terminating quote. You can easily change the escaped-quote part to suit your fancy.

Here is my rather unruly specimen:
s/((?:"(?:\\"|[^"])*?")|(?:'(?:\\'|[^'])*?'))|(\s+)/$2?" ":$1/ge;
Here is what it did to my test data:
A language by "any other \"name\"", would it smell as sweet? A language by "any other \"name\"", would it smell as sweet?
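For anyone who wants to poke at it, here is a minimal sketch of the same substitution spelled out with /x and run against a made-up string. The sub name collapse_ws and the sample text are my own inventions for illustration; the pattern is intended to behave the same as the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is mine): collapse runs of whitespace
# outside quoted strings and leave quoted strings untouched.
sub collapse_ws {
    my ($text) = @_;
    $text =~ s{
        (                                  # group 1: a complete quoted string
            (?: " (?: \\" | [^"] )*? " )   # double-quoted, escaped quote allowed
          |
            (?: ' (?: \\' | [^'] )*? ' )   # single-quoted, escaped quote allowed
        )
      |
        (\s+)                              # group 2: whitespace outside quotes
    }{
        $2 ? " " : $1
    }gex;
    return $text;
}

# Made-up input with extra spaces and a tab, inside and outside the quotes.
my $in = qq{A   language by "any  other \\"name\\"",   would it\tsmell as sweet?};
print collapse_ws($in), "\n";
# prints: A language by "any  other \"name\"", would it smell as sweet?
```

The double space inside the quotes survives because the quoted alternative is tried first and is handed back unchanged through group 1.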
If you aren't concerned about escaped quotes (HTML has no such thing, really), you could use a simplified version of the same:
s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?" ":$1/ge;
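As a rough illustration of the simplified version, here it is applied to a made-up HTML fragment. The sample markup and the variable names are assumptions of mine, not anything from the original question.

```perl
use strict;
use warnings;

# Made-up sample: an HTML fragment with sloppy spacing.
my $html = qq{<a   href="my  page.html"\ttitle='two  words'>  link   text </a>};

# Simplified pattern from above: no backslash-escape handling inside quotes.
(my $out = $html) =~ s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?" ":$1/ge;

print "$out\n";
# prints: <a href="my  page.html" title='two  words'> link text </a>
```

One thing to watch for: the pattern has no idea what is markup and what is text, so a lone apostrophe in running text (as in "don't") would be taken as the start of a quoted string.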

Re: Re: global whitespace delete
by physi (Friar) on Jul 31, 2001 at 13:21 UTC
    That's really fantastic, many thanks. I have modified it to:
    s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?$2 ne"\n"?" ":"\n":$1/ge;
    so that the "\n" at the end of the line isn't changed to ' '.
    And now I will read more about ?: and try to understand it. :-)
    BTW, it was not homework, as BrentDax suspects; it's just a single line in a conversion script for comfiche jobs.
    Thanks
    ----------------------------------- --the good, the bad and the physi-- -----------------------------------
      I'm not sure where BrentDax got the homework idea. A little quick to judge, is all, I suppose.

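      Your comparison is peculiar: it only preserves a newline when the whitespace run is exactly "\n", so trailing spaces followed by a newline still collapse to a single space. You might want to specify a set instead of \s+, such as: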
      s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;
      The set of space and tab is probably more efficient than asking for more than you want and then discarding the extras. \s matches space, tab, newline, carriage return, and form feed. Since you have no use for newline, just don't ask for it.
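To make the difference between the two variants concrete, here is a small side-by-side sketch. The two-line sample string and the variable names are made up for illustration; both substitutions are copied from the posts above.

```perl
use strict;
use warnings;

# Made-up two-line sample; the first line ends in trailing spaces.
my $in = qq{one   "two  three"  \nfour\t\tfive\n};

# Variant with \s+ and a comparison against "\n".
(my $cmp = $in) =~ s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?$2 ne"\n"?" ":"\n":$1/ge;

# Variant with an explicit set of space and tab.
(my $set = $in) =~ s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;

print "comparison: [$cmp]\n";
print "set:        [$set]\n";
```

With the comparison, the trailing spaces and the newline are matched as one run, so the two lines get joined with a single space. With the set, the newline is never matched; the trailing spaces shrink to one space and the line break stays where it was.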