aspen has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to use LWP to retrieve a web page. When I retrieve this page via Mozilla and select View Source, I see the following for part of the page:

Received: from unknown (66.218.66.218)<BR>
 by m6.grp.scd.yahoo.com with QMQP; 17 Sep 2002 09:47:41 -0000<BR>
Received: from unknown (HELO hotmail.com) (207.68.162.118)<BR>
 by mta3.grp.scd.yahoo.com with SMTP; 17 Sep 2002 09:47:41 -0000<BR>
Received: from mail pickup service by hotmail.com with Microsoft SMTPS +VC;<BR>
 Tue, 17 Sep 2002 02:47:41 -0700<BR>

When I retrieve the same page via LWP I seem to be receiving the following in the HTTP::Response->content:

Received: from unknown (66.218.66.218)<BR>
by m6.grp.scd.yahoo.com with QMQP; 17 Sep 2002 09:47:41 -0000<BR>
Received: from unknown (HELO hotmail.com) (207.68.162.118)<BR>
by mta3.grp.scd.yahoo.com with SMTP; 17 Sep 2002 09:47:41 -0000<BR>
Received: from mail pickup service by hotmail.com with Microsoft SMTPS +VC;<BR>
Tue, 17 Sep 2002 02:47:41 -0700<BR>

Basically the same content but with the leading spaces missing. These spaces are important as the content is supposed to be a valid SMTP (RFC 2822) message, in which leading spaces matter.
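
To illustrate why the leading whitespace matters, here is a minimal sketch of RFC 2822 header unfolding. The helper name `unfold_headers` is made up for this example, and the sample text is a shortened copy of the first Received header above; it is not code from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: unfold RFC 2822 header lines. A line that starts
# with a space or tab continues the previous header field; any other
# line starts a new field.
sub unfold_headers {
    my ($text) = @_;
    my @fields;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^[ \t]/ && @fields) {
            $fields[-1] .= $line;    # continuation: append to previous field
        }
        else {
            push @fields, $line;     # start of a new header field
        }
    }
    return @fields;
}

my $folded = "Received: from unknown (66.218.66.218)\n"
           . " by m6.grp.scd.yahoo.com with QMQP; 17 Sep 2002 09:47:41 -0000\n";

# Simulate the problem described above: same text, leading whitespace lost.
(my $stripped = $folded) =~ s/^[ \t]+//mg;

my @with_ws    = unfold_headers($folded);
my @without_ws = unfold_headers($stripped);

print scalar(@with_ws),    "\n";    # prints 1: one Received field
print scalar(@without_ws), "\n";    # prints 2: the continuation now looks like a new line
```

Without the leading space, the "by ..." line can no longer be recognised as a continuation of the Received field, so the text stops being a well-formed header block.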

Has anyone experienced a similar problem with LWP? Any suggested solutions?

Andy

@_="the journeyman larry disciple keeps learning\n"=~/(.)/gs, print(map$_[$_-77],unpack(q=c*=,q@QSdM[]uRMNV^[ni_\[N]eki^y@))

Re: LWP not returning leading spaces in web page
by fokat (Deacon) on Feb 01, 2003 at 04:36 UTC

    Since you did not post any code with your question, I have to assume your code has some problem. I did the following:

    #!/usr/bin/perl
    use LWP;
    use strict;
    use warnings;

    my $ua = new LWP::UserAgent;
    my $r  = $ua->get("http://www.perlmonks.org/index.pl?node_id=231795");

    if ($r->is_success) {
        if ($r->content =~ m/^\s+\S/m) {
            print "I saw some trailing spaces...\n";
        }
        else {
            print "Oppsie! I did not see any trailing spaces...\n";
        }
    }
    else {
        die "Failed miserably\n";
    }

    ... and guess what:

    bash-2.05a$ perl lwp
    I saw some trailing spaces...
    bash-2.05a$ perl -MLWP -e 'print $LWP::VERSION, "\n";'
    5.65
    bash-2.05a$ 
    

    As you can see, on my machine / perl / LWP the leading spaces come through all right (the script's messages say "trailing", but the regex tests for leading whitespace). So please go ahead and post your code so that we can take a deeper look at what's wrong.

    Best regards

    -lem, but some call me fokat

      This is helping to narrow down the problem. I made two small changes to your code and now I can replicate the problem.

      #!/usr/bin/perl -d
      use LWP;
      use strict;
      use warnings;

      my $ua = new LWP::UserAgent;
      my $r  = $ua->get("http://groups.yahoo.com");

      if ($r->is_success) {
          if ($r->content =~ m/^\s+\S$/m) {
              print "I saw some trailing spaces...\n";
          }
          else {
              print "Oppsie! I did not see any trailing spaces...\n";
          }
      }
      else {
          die "Failed miserably\n";
      }

      I updated the regex to (I believe) properly look for individual lines with leading spaces and some text. Then I tried a different web page. The problem now appears.

      If you actually print out the content ($r->content), you'll see that LWP doesn't return any of the leading spaces that Mozilla's View Source does show!
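
One way to check this without relying on the regex is to make the leading whitespace visible. The following is only a sketch: `lines_with_leading_ws` is a hypothetical helper, and `$sample` is a made-up stand-in for what `$r->content` would hold in the real script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $content that begin with a
# space or tab, with the leading whitespace rendered visibly
# (each space becomes '.', each tab becomes a literal '\t').
sub lines_with_leading_ws {
    my ($content) = @_;
    my @hits;
    for my $line (split /\r?\n/, $content) {
        next unless $line =~ /^([ \t]+)(\S.*)$/;
        my ($ws, $rest) = ($1, $2);
        $ws =~ s/ /./g;
        $ws =~ s/\t/\\t/g;
        push @hits, $ws . $rest;
    }
    return @hits;
}

# Stand-in for $r->content (assumption: two spaces before "by").
my $sample = "Received: from unknown<BR>\n  by m6.grp.scd.yahoo.com<BR>\n";

print "$_\n" for lines_with_leading_ws($sample);
# prints: ..by m6.grp.scd.yahoo.com<BR>
```

If this prints nothing for the real response body, the leading whitespace is genuinely absent from what LWP received, rather than being lost when the content is printed.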

      I'd really like to understand what's happening here!

      Andy

      @_="the journeyman larry disciple keeps learning\n"=~/(.)/gs, print(map$_[$_-77],unpack(q=c*=,q@QSdM[]uRMNV^[ni_\[N]eki^y@))

        "I updated the regex to (I believe) properly look for individual lines with leading spaces and some text."

            if ($r->content =~ m/^\s+\S$/m)

        Your regex doesn't do what you claim it does. This regex looks for

        1. beginning of line
        2. at least one whitespace
        3. one non whitespace character
        4. end of line

        That would be quite a strange line: nothing but leading whitespace and a single non-whitespace character. fokat's original regex correctly looks for a line that begins with whitespace followed by a non-whitespace character.
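
The difference between the two patterns can be seen in a small sketch. The variable names and the two sample strings are made up for illustration; only the two regexes come from the posts above.

```perl
use strict;
use warnings;

my $leading_re = qr/^\s+\S/m;     # fokat's original pattern
my $strict_re  = qr/^\s+\S$/m;    # the modified pattern with the trailing $

# A typical indented line, and a line with only one non-whitespace character.
my $typical  = "Received: from unknown\n  by m6.grp.scd.yahoo.com\n";
my $one_char = "Received: from unknown\n  x\n";

for my $text ($typical, $one_char) {
    printf "leading: %s  strict: %s\n",
        ($text =~ $leading_re ? "match" : "no match"),
        ($text =~ $strict_re  ? "match" : "no match");
}
# prints:
# leading: match  strict: no match
# leading: match  strict: match
```

One caveat: `\s` also matches a newline, so `^\s+\S` will also match at an empty line that is followed by text. If only spaces and tabs should count as leading whitespace, `^[ \t]+\S` is a stricter alternative.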

        -- Hofmator