coltman has asked for the wisdom of the Perl Monks concerning the following question:
I ran into a strange problem when I tried to download the web page "http://securities.stanford.edu/1008/UTIIQ96/" using LWP::Simple::get() and save $content to a text file.
The strange thing is that if I open the text file in an editor (e.g., UltraEdit), it looks perfectly normal:
<HTML><HEAD><TITLE>Unitech Industries, Inc. - Securities Class Action</TITLE>
However, if I use "print $content" during the download, the log shows something different:
< H T M L > < H E A D > < T I T L E > U n i t e c h I n d u s t r i e s , I n c . - S e c u r i t i e s C l a s s A c t i o n < / T I T L E >
It looks as if a space has been added after every character.
When I try to use a regex to extract information, the space issue haunts me, because Perl always reads the text file as if it had the extra spaces!
I would appreciate it if someone could give me a hint about the cause of the problem and a solution.
Thank you!

Replies are listed 'Best First'.

Re: LWP problem
by kyle (Abbot) on Sep 08, 2008 at 15:50 UTC

Re: LWP problem
by betterworld (Curate) on Sep 08, 2008 at 15:40 UTC

Re: LWP problem
by deus.lemmus (Initiate) on Sep 10, 2008 at 13:31 UTC