aspen has asked for the wisdom of the Perl Monks concerning the following question:
I am resubmitting this LWP question. Thanks to some help from fokat, I can now provide a short, complete program that reproduces the problem.
With some websites, LWP appears not to return the leading spaces at the start of many lines. I need to retrieve those spaces when using LWP.
Note that LWP does return the leading spaces for many (most?) webpages; groups.yahoo.com is just one well-known site where the spaces seem to be lost.
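To check which sites are affected, a short sketch can count how many lines of each fetched page still begin with a space or tab. It is an illustrative diagnostic, not a fix. The subroutine name `count_indented_lines` is hypothetical, and the script only touches the network when URLs are given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines that begin with at least one space or tab.
# count_indented_lines is a hypothetical helper, not part of LWP.
sub count_indented_lines {
    my ($text) = @_;
    # List assignment in scalar context yields the number of matches.
    my $count = () = $text =~ /^[ \t]+/mg;
    return $count;
}

# Fetch each URL named on the command line and report how many of its
# lines keep leading whitespace. Nothing is fetched when @ARGV is empty.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    for my $url (@ARGV) {
        my $r = $ua->get($url);
        unless ($r->is_success) {
            warn "$url: ", $r->status_line, "\n";
            next;
        }
        printf "%-40s %d indented lines\n", $url,
            count_indented_lines($r->content);
    }
}
```

For example, `perl count_indent.pl http://groups.yahoo.com http://www.example.com/` (the script name is arbitrary). A count of zero for a page whose browser source is visibly indented would point at the same symptom.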
To replicate this issue, perform these steps:
Any help diagnosing this will be greatly appreciated.
The following program performs the comparison. It expects Mozilla's View Source output to have been saved as groups.yahoo.com.html in the current directory:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

print "\n\nUsing LWP to grab http://groups.yahoo.com.\n" .
      "Printing first 500 characters.\n";
my $r = $ua->get("http://groups.yahoo.com");
die "GET failed: ", $r->status_line, "\n" unless $r->is_success;

# Make whitespace visible: each space becomes '.', each LF gets a <<LF>> marker
${$r->content_ref} =~ s/ /./g;
${$r->content_ref} =~ s/\cJ/<<LF>>\cJ/g;
print substr($r->content, 0, 500), "\n";

print "\n\nUsing previously-saved Mozilla groups.yahoo.com source.\n" .
      "Print first 500 characters.\n";
open my $fh, '<', "groups.yahoo.com.html"
    or die "Cannot open groups.yahoo.com.html: $!\n";
my $s = do { local $/; <$fh> };    # slurp the whole file
close $fh;

$s =~ s/ /./g;
$s =~ s/\cJ/<<LF>>\cJ/g;
print substr($s, 0, 500), "\n";
For those that just want to see the results, this is what is printed when I run the above program:
Using LWP to grab http://groups.yahoo.com.
Printing first 500 characters.
<<LF>>
<HTML><<LF>>
<HEAD><<LF>>
<META.http-equiv="PICS-Label".content='(PICS-1.1."http://www.icra.org/ratingsv02.html".l.gen.true.for."http://groups.yahoo.com".r.(nz.0.vz.0.lz.0.oz.0.ca.1))'><<LF>>
<META.content="free.email.groups,.mailing.lists,.communities,.majordomo,.e-mail,.bounce.handling,.mlm.software,.listserv,.Yahoo!.Groups,.newletters,.announcement,.email.lists,.list.hosting".name=keywords><<LF>>
<META.content="Yahoo!.Groups.-.Free,.easy.email.groups".name=description><<LF>>
<TITLE><<L

Using previously-saved Mozilla groups.yahoo.com source.
Print first 500 characters.
<<LF>>
<<LF>>
<HTML><<LF>>
<<LF>>
<HEAD><<LF>>
<<LF>>
........<<LF>>
........<META.http-equiv="PICS-Label".content='(PICS-1.1."http://www.icra.org/ratingsv02.html".l.gen.true.for."http://groups.yahoo.com".r.(nz.0.vz.0.lz.0.oz.0.ca.1))'><<LF>>
..<<LF>>
......<META.content="free.email.groups,.mailing.lists,.communities,.majordomo,.e-mail,.bounce.handling,.mlm.software,.listserv,.Yahoo!.Groups,.newletters,.announcement,.email.lists,.list.hosting".name=keywords><<LF>>
....<META.content="Yahoo!.Group
Very strange.
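One hypothesis worth ruling out, though nothing above confirms it, is that the server returns different markup depending on the User-Agent header: LWP identifies itself as `libwww-perl/...` by default, while Mozilla sends its own string. The sketch below fetches a URL once with LWP's default agent and once with a browser-like agent, then prints the start of each response with the same `.` and `<<LF>>` markers used in the program above. The agent string and the `mark_whitespace` name are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Make spaces and line feeds visible, as in the original program:
# each space becomes '.', each LF is preceded by the marker '<<LF>>'.
sub mark_whitespace {
    my ($text) = @_;
    (my $shown = $text) =~ s/ /./g;
    $shown =~ s/\cJ/<<LF>>\cJ/g;
    return $shown;
}

# Hypothetical browser-like agent string; substitute whatever your
# Mozilla build actually sends.
my @agents = (
    undef,    # keep LWP's default libwww-perl agent
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
);

# Network access happens only when a URL is given on the command line.
if (my $url = shift @ARGV) {
    require LWP::UserAgent;
    for my $agent (@agents) {
        my $ua = LWP::UserAgent->new;
        $ua->agent($agent) if defined $agent;
        my $r = $ua->get($url);
        print "\n== Agent: ", $ua->agent, " (", $r->status_line, ")\n";
        print substr(mark_whitespace($r->content), 0, 300), "\n";
    }
}
```

Run it as `perl agent_check.pl http://groups.yahoo.com` (the script name is arbitrary). If the leading dots appear only in the second response, that would suggest the difference comes from the server rather than from LWP.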
Andy
@_="the journeyman larry disciple keeps learning\n"=~/(.)/gs, print(map$_[$_-77],unpack(q=c*=,q@QSdM[]uRMNV^[ni_\[N]eki^y@))