physi has asked for the wisdom of the Perl Monks concerning the following question:

I currently have a problem with substituting whitespace.
On a single line, I want to reduce every run of more than one whitespace character to exactly one space.
 s/\s+/ /g does this, but there is a little trap :)
The substitution should not be made if the whitespace is inside a '...' or "..." block.
Can anybody help with that, or is this not possible in a single substitution?
Any help is welcome.
----------------------------------- --the good, the bad and the physi-- -----------------------------------

Re: Global Whitespace Delete
by tadman (Prior) on Jul 31, 2001 at 11:41 UTC
    You can do this, but the hard part is how you parse the quotes and how an embedded quote is escaped inside a quoted string. Formats differ in how they encode such a quote, for example:
    " \" "   (escaped with a backslash)
    " "" "   (escaped by doubling)
    Some DB scripting languages have a truly horrific way of doing it, but the basics are the same. In terms of a regex, you are looking for an opening quote, then zero or more characters that are either non-quotes or escaped quotes, and then the closing quote. You can easily change the escaped-quote part to suit your format.
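    As an illustration of swapping out the escaped-quote part, here is a minimal sketch for formats that escape an embedded quote by doubling it (" "" "), as CSV fields do. The helper name squeeze_doubled is hypothetical, and only double-quoted strings are handled:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse whitespace runs outside "..." strings, where an
# embedded quote is written as two quotes ("" inside "...").
sub squeeze_doubled {
    my ($text) = @_;
    # Capture group 1: opening quote, then any mix of doubled quotes ("") and
    # non-quote characters, then the closing quote. Otherwise match \s+.
    $text =~ s/("(?:""|[^"])*")|\s+/defined $1 ? $1 : ' '/ge;
    return $text;
}

print squeeze_doubled(q{say  "a  ""b""  c"   end}), "\n";
# prints: say "a  ""b""  c" end
```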

    Here is my rather unruly specimen:
    s/((?:"(?:\\"|[^"])*?")|(?:'(?:\\'|[^'])*?'))|(\s+)/$2?" ":$1/ge;
    Here is what it did to my test data:
    A language by "any other \"name\"", would it smell as sweet?
    A language by "any other \"name\"", would it smell as sweet?
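    One caveat with the specimen above: because [^"] also matches a backslash, a string that ends in an escaped backslash, as in "C:\\"   "x  y", can run on to the next quote, and the whitespace inside "x  y" then gets squeezed. Below is a minimal sketch of the same idea laid out with /x, assuming backslash is the only escape character. The helper name squeeze_outside_quotes is hypothetical. Here (?:\\.|[^"\\])* lets a backslash escape any character, including another backslash:

```perl
use strict;
use warnings;

# Hypothetical helper: same technique as the one-liner (keep quoted strings,
# squeeze everything else), written with /x so each part can be commented.
# Assumption: backslash is the only escape character inside quotes.
sub squeeze_outside_quotes {
    my ($text) = @_;
    $text =~ s{
        (                                 # group 1: a quoted string, kept as is
            " (?: \\. | [^"\\] )* "       # double-quoted; \\. is any escaped char
          | ' (?: \\. | [^'\\] )* '       # single-quoted, same escaping rule
        )
      | \s+                               # any other whitespace run
    }{ defined $1 ? $1 : ' ' }gsex;
    return $text;
}

print squeeze_outside_quotes(q{A  language by   "any  other \"name\"",  would it}), "\n";
# prints: A language by "any  other \"name\"", would it

# q{} turns \\\\ into \\, so the input is: path "C:\\"   "x  y"
print squeeze_outside_quotes(q{path "C:\\\\"   "x  y"}), "\n";
# prints: path "C:\\" "x  y"
```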
    If you aren't concerned about escaped quotes (HTML has no such thing, really), you could use a simplified version of the same:
    s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?" ":$1/ge;
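    Note that neither version can tell an apostrophe from an opening single quote. A short demonstration of the simplified substitution on ordinary prose, wrapped in a hypothetical helper named squeeze_simple:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the simplified substitution from the reply above.
sub squeeze_simple {
    my ($text) = @_;
    $text =~ s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?" ":$1/ge;
    return $text;
}

# The apostrophes in "don't" and "it's" are taken as a '...' pair, so the
# whitespace between them is left untouched.
print squeeze_simple(q{don't   panic,  it's   fine}), "\n";
# prints: don't   panic,  it's fine
```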
      That's really fantastic, many thanks. I've modified it to:
      s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2?$2 ne"\n"?" ":"\n":$1/ge;
      so that the "\n" at the end of the line isn't changed to ' '.
      And now I will read more about ?: and try to understand it. :-)
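      The two ?: in these substitutions are unrelated. In the pattern, (?:...) is a non-capturing group: it groups alternatives without using up a $1, $2 slot. In the replacement, which /e evaluates as Perl code, ?: is the conditional operator, so the modified replacement reads $2 ? ($2 ne "\n" ? " " : "\n") : $1. A sketch with that expression spelled out (the helper name squeeze_keep_newline is hypothetical), including one case the comparison does not cover:

```perl
use strict;
use warnings;

# Hypothetical helper: physi's substitution with the nested conditional
# parenthesized. It is equivalent to  $2?$2 ne"\n"?" ":"\n":$1  because
# ne binds tighter than ?: and ?: groups right to left.
sub squeeze_keep_newline {
    my ($line) = @_;
    $line =~ s/((?:"[^"]*?")|(?:'[^']*?'))|(\s+)/$2 ? ($2 ne "\n" ? " " : "\n") : $1/ge;
    return $line;
}

# A lone trailing newline is its own \s+ run, so it is kept.
print squeeze_keep_newline(qq{a   "b   c"  d\n});
# Trailing blanks and the newline form ONE \s+ run, which is not equal
# to "\n" and becomes " ", so the newline is lost here.
print squeeze_keep_newline(qq{a   "b   c"  d  \n}), "\n";
```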
      BTW, this isn't homework, as BrentDax suspects; it's just a single line in a conversion script for COM fiche jobs.
      Thanks
        I'm not sure where BrentDax got the homework idea. A little quick to judge, is all, I suppose.

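        Your $2 ne "\n" comparison is peculiar: \s+ can match trailing blanks and the newline as one run, and that run is not equal to "\n", so the newline still becomes a space. You might want to specify a set instead of \s+, such as: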
        s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;
        Matching just space and tab is probably more efficient than asking for more than you want and then discarding the extras. By default, \s matches space, tab, newline, carriage return and form feed, among others. Since you have no use for the newline, just don't ask for it.
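        A short check of the [ \t]+ version, wrapped in a hypothetical helper named squeeze_blanks: because the set excludes newline, trailing blanks shrink to one space and the newline survives. On Perl 5.10 or later, \h (horizontal whitespace) is an alternative to [ \t]; it also covers some non-ASCII blank characters.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the [ \t]+ substitution from the reply above.
sub squeeze_blanks {
    my ($line) = @_;
    $line =~ s/((?:"[^"]*?")|(?:'[^']*?'))|([ \t]+)/$2?" ":$1/ge;
    return $line;
}

# Space/tab runs become one space; the quoted run and the newline are
# untouched, so the trailing blanks shrink to one space before "\n".
print squeeze_blanks(qq{a \t "b   c"  d  \n});
```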
Re: global whitespace delete
by bwana147 (Pilgrim) on Jul 31, 2001 at 12:25 UTC

    How about Text::ParseWords?

    I haven't tested it, but you could use this module to split your strings on whitespace while ignoring whitespace inside quotes, then join the words back together:

    use Text::ParseWords;
    my @words = quotewords('\s+', 1, $text);   # 1 = keep the quotes
    $text = join ' ' => @words;
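    A slightly more defensive sketch of the same approach, using a hypothetical helper named squeeze_unquoted. With a true second argument, quotewords() keeps the quote characters and backslashes in each word, so quoted strings come back intact. Two details below are based on a reading of the module's source and should be treated as assumptions: an unbalanced quote (for example a lone apostrophe, as in don't) makes quotewords() return an empty list, and leading or trailing whitespace produces an empty or undefined field. The sketch leaves such lines unchanged and drops the empty fields, which also trims the line:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: split on unquoted whitespace, rejoin with one space.
sub squeeze_unquoted {
    my ($text) = @_;
    # Second argument 1 = keep quotes and backslashes in the returned words.
    my @words = quotewords('\s+', 1, $text);
    # Assumption (from the module source): an unbalanced quote yields an
    # empty list, so fall back to the original text instead of losing it.
    return $text unless @words;
    # Leading/trailing whitespace yields '' or undef fields; drop them.
    return join ' ', grep { defined($_) && length($_) } @words;
}

print squeeze_unquoted(q{a   "b    c"   'd  e'  f}), "\n";
# prints: a "b    c" 'd  e' f
print squeeze_unquoted(q{don't   panic}), "\n";
# printed unchanged, because of the unbalanced apostrophe
```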

    --bwana147

    Update: finally, I tested it and it works!
