in reply to LWP on Windows: whitespace removed from HTML

It may be that whitespace is being preserved. Unix programs tend to use "\n" as the end-of-line character, while DOS/Windows programs more frequently use "\r\n". When you "type" a file with Unix line endings in a DOS/Windows command window, all the text may appear on the same line. If it looks fine when you open the document in WordPad, that's likely to be your problem.
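One way to check which end-of-line style a document really uses is to count the sequences directly rather than trusting an editor. This is only a minimal sketch: the sub name count_line_endings and the sample string are made up for illustration, and it works on a string you already have in memory.

```perl
use strict;
use warnings;

# Count each style of line ending in a string.
# (count_line_endings is a hypothetical helper, not part of LWP.)
sub count_line_endings {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;       # DOS/Windows: "\r\n"
    my $lf   = () = $text =~ /(?<!\r)\n/g;  # Unix: "\n" not preceded by "\r"
    my $cr   = () = $text =~ /\r(?!\n)/g;   # bare "\r" not followed by "\n"
    return { crlf => $crlf, lf => $lf, cr => $cr };
}

# Made-up sample mixing both styles.
my $sample = "<html>\n  <body>\r\n  </body>\n</html>\n";
my $counts = count_line_endings($sample);
printf "CRLF: %d, LF: %d, bare CR: %d\n", @{$counts}{qw(crlf lf cr)};
# prints "CRLF: 1, LF: 3, bare CR: 0"
```

If you read the text from a file on Windows, open it with the ":raw" layer (or call binmode on the handle) first; otherwise Perl's text-mode translation turns "\r\n" into "\n" before you get to count anything.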

...roboticus

Update: Fixed grammar.

Re^2: LWP on Windows: whitespace removed from HTML
by Anonymous Monk on May 05, 2008 at 21:15 UTC
    Thanks, but I've tried viewing the output in text editors that recognize the UNIX end-of-line character, like TextPad and WordPad, and the result is the same - the HTML appears mainly on one line. Also, it's not just end-of-line characters that are removed; whitespace used for padding at the beginning of lines is removed as well.

      I assume you did a View|Source on both the DOS and Leopard browsers and that they both show HTML formatted as you expect. The last arrows in my quiver[1] are:

      1) Change the agent identity LWP uses? Perhaps the server system just happens to serve up a different version of the document for agents other than the browsers you tried.

      2) Try using a network scanner (Ethereal or some such) to view the packets as they come across the network, to check whether it is really the software stack that is munching your whitespace.
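For suggestion 1, a rough sketch of changing the identity LWP sends might look like the following. The agent string is a made-up placeholder (substitute whatever your browser actually sends), and the URL is taken from the command line since none was given in the thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical browser-like identity; replace with your browser's real string.
my $browser_agent = 'Mozilla/5.0 (compatible; ExampleBrowser/1.0)';

my $ua = LWP::UserAgent->new;
$ua->agent($browser_agent);   # sent as the User-Agent request header

# The URL comes from the command line; nothing is fetched without one.
if (my $url = shift @ARGV) {
    my $response = $ua->get($url);
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;

    # content() gives the body without any charset decoding applied.
    my $html = $response->content;

    binmode STDOUT;   # keep Windows from translating "\n" to "\r\n" on output
    print $html;
}
```

If the output is formatted differently with the browser-like agent than with LWP's default one, the server is probably choosing what to send based on the User-Agent header.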

      ...roboticus

      [1]: I don't do web/HTML stuff, so my quiver is rather sparse.

      My guess, then, is that there aren't any carriage returns in the HTML, i.e. the server isn't generating them. If you give us an example URL we can verify that.
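Until there is a URL to test against, one way to see for yourself what whitespace the response contains is to make it visible before printing. This is just a sketch: show_whitespace is a hypothetical helper, and the sample string stands in for whatever $response->content returns from LWP.

```perl
use strict;
use warnings;

# Render whitespace visibly so it can be inspected in any editor or console.
# (show_whitespace is a hypothetical helper name.)
sub show_whitespace {
    my ($text) = @_;
    my %visible = (
        "\r" => '\r',       # carriage return shown as the two characters \r
        "\n" => "\\n\n",    # line feed shown as \n, followed by a real newline
        "\t" => '\t',       # tab shown as \t
        ' '  => '.',        # space shown as a dot (ambiguous with literal dots)
    );
    $text =~ s/([\r\n\t ])/$visible{$1}/g;
    return $text;
}

# Made-up sample; in practice pass the body fetched with LWP.
print show_whitespace("<ul>\r\n  <li>one</li>\n</ul>\n");
```

Any line that ends in \r\n came with a carriage return, any line that ends in a plain \n did not, and leading dots show whether the indentation survived the trip.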