Re: Using GET in a loop
by EdwardG (Vicar) on Sep 17, 2004 at 09:09 UTC
|
If MODE is '>>', the file is opened for appending, again being
created if necessary.
Try this -
open FILE, "> C:/perl/output.txt"; # <-- notice the single '>'
| [reply] [d/l] [select] |
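A minimal sketch of the difference between the two modes, written against a temporary file rather than the path in the post. The helper names write_line and read_lines and the use of File::Temp are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path in the given mode, write one line, close.
sub write_line {
    my ($mode, $path, $line) = @_;
    open my $fh, $mode, $path or die "Cannot open $path: $!";
    print {$fh} "$line\n";
    close $fh or die "Cannot close $path: $!";
}

# Hypothetical helper: read the file back as a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

my (undef, $path) = tempfile(UNLINK => 1);

write_line('>',  $path, 'first');    # '>' truncates before writing
write_line('>>', $path, 'second');   # '>>' keeps what is already there
my @after_append = read_lines($path);

write_line('>',  $path, 'third');    # truncates again
my @after_truncate = read_lines($path);

print "after append:   @after_append\n";
print "after truncate: @after_truncate\n";
```

With '>' each run of the loop would leave only the last page in the file, which is probably not what the original poster wants either.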
I want to append the file, but with new information.
$content gets appended, that is the problem. Every time the loop is executed, $content gets bigger and bigger as the newly retrieved website is added to the previously retrieved websites. It still does so when I "empty" $content ($content= " ";) every time before retrieving a new website and use a local $content variable.
Thus, my problem is not with the output of the data into the file but with the input.
Thank you for your rapid reaction!
| [reply] |
Re: Using GET in a loop
by Zaxo (Archbishop) on Sep 17, 2004 at 09:19 UTC
|
Isn't your program failing from undefined 'Open' and 'Print'? The perl builtins are all lower-case.
Anyhow, what else is failing? You explicitly place the results all in the same file by trying to open to append. You might as well open the output file before the loop and close afterwards. Also, your url looks fishy. Is that just to hide the address you're scraping? Lexical $content should get rid of previous $content just fine.
| [reply] |
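A rough sketch of the "open before the loop, close afterwards" suggestion, using a lexical filehandle and a temporary file. The @results list is placeholder data standing in for whatever each iteration produces; it is not taken from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Placeholder data, for illustration only.
my @results = ('first result', 'second result', 'third result');

my (undef, $path) = tempfile(UNLINK => 1);

# Open once, before the loop.
open my $out, '>>', $path or die "Cannot open $path: $!";

for my $result (@results) {
    print {$out} "$result\n";
}

# Close once, after the loop; a failed close can mean lost writes.
close $out or die "Cannot close $path: $!";

# Read the file back to see what was written.
open my $in, '<', $path or die "Cannot open $path: $!";
my @written = <$in>;
close $in;
print "wrote ", scalar(@written), " lines\n";
```

This saves one open/close pair per iteration and keeps the error handling in one place.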
I am using strict and warnings. Makes no difference.
Cheers!
| [reply] |
Re: Using GET in a loop
by davidj (Priest) on Sep 17, 2004 at 09:47 UTC
|
Obviously, since it doesn't even compile, the code you posted is not the code you are using but a stripped-down version for this question. I would suggest that you post the code you are actually using. You could replace the URLs you are fetching with dummies, but the rest of the code should be posted as is.
davidj
| [reply] |
Here is the compilable code. I thought it would be easier to focus on the problem directly. Sorry about any inconvenience caused.
#! C:/programme/perl
use LWP::Simple;
use LWP::UserAgent;
use HTML::Stripper;
use warnings;
use strict;
our $stripper = HTML::Stripper->new( skip_cdata => 1, strip_ws => 1 );
our $ID;
our @ID=(161060, 160920, 160999, 160899);
our $count=1;
foreach $ID (@ID) {
my $content;
my $content_full;
my $url="http://europa.eu.int/prelex/detail_dossier_real.cfm?CL=en&DosId="."$ID";
$content_full=" ";
$content_full=get($url);
$content=$stripper->strip_html($content_full);
our $i_type=index($content, " COM ");
our $d_type=substr($content, $i_type+1,3);
our $d_year=substr($content, $i_type+6,4);
our $d_number=substr($content, $i_type+12,3);
our $proposal="$d_type "."\($d_year\)"." $d_number";
print "Proposal\: $proposal \n";
open DB, ">> C:/programme/perl/test/prelex.dta"
or die "Problem: $!";
flock (DB, 2);
print DB "$proposal\n";
close DB;
}
| [reply] [d/l] |
By "focusing on the problem", you managed to focus away the part of the code with the bug.
Now, with your assumptions tempered, it is obvious that the problem lies in the re-use of the $stripper object.
Easiest solution: make a new $stripper in each iteration.
foreach $ID (@ID) {
$stripper = HTML::Stripper->new( ... );
...
}
And in case it isn't clear, this has nothing to do with get().
| [reply] [d/l] |
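To illustrate why a reused object can make the output grow, here is a self-contained sketch with a toy class. ToyStripper is hypothetical and is not HTML::Stripper; it only mimics the behaviour this reply suspects, namely an object that keeps text from earlier calls.

```perl
use strict;
use warnings;

# Toy stand-in for a parser object that never clears its output buffer.
# This is NOT HTML::Stripper; it only imitates the suspected behaviour.
package ToyStripper;

sub new {
    my ($class) = @_;
    return bless { buffer => '' }, $class;
}

sub strip {
    my ($self, $text) = @_;
    (my $plain = $text) =~ s/<[^>]*>//g;   # crude tag removal
    $self->{buffer} .= $plain;             # state carries over between calls
    return $self->{buffer};
}

package main;

my @pages = ('<p>one</p>', '<p>two</p>', '<p>three</p>');

# One shared object: every result also contains the earlier pages.
my $shared = ToyStripper->new;
my @reused = map { $shared->strip($_) } @pages;

# A fresh object per page: every result contains only its own page.
my @fresh = map { ToyStripper->new->strip($_) } @pages;

print "reused: @reused\n";
print "fresh:  @fresh\n";
```

If the real module behaves like the toy one, creating the object inside the loop gives each page a clean buffer, at the cost of one constructor call per iteration.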
Well, now the next question: is it $content_full or $content that is getting appended to instead of replaced? That is, is it the result of LWP::Simple's get function or HTML::Stripper's strip_html function that is not working correctly?
davidj
| [reply] [d/l] [select] |
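One way to answer that question is to log the length of both strings on every iteration. The sketch below does this with stubbed-out functions: probe_lengths, $fake_fetch and $fake_strip are hypothetical names, and the stubs only imitate a fetch that behaves and a strip step that accumulates. In the real script the two code references would wrap get() and strip_html().

```perl
use strict;
use warnings;

# Record the raw and stripped lengths for each ID so that the one that
# keeps growing stands out. $fetch and $strip are code references.
sub probe_lengths {
    my ($fetch, $strip, @ids) = @_;
    my @report;
    for my $id (@ids) {
        my $raw      = $fetch->($id);
        my $stripped = $strip->($raw);
        push @report, [ $id, length($raw), length($stripped) ];
    }
    return @report;
}

# Stubs for illustration only: a fetch that returns a small page, and a
# strip step that deliberately appends to a hidden buffer.
my $fake_fetch = sub { my ($id) = @_; return "<p>page $id</p>" };
my $hidden     = '';
my $fake_strip = sub {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    $hidden .= $text;
    return $hidden;
};

my @report = probe_lengths($fake_fetch, $fake_strip, 1, 2, 3);
printf "id=%s raw=%d stripped=%d\n", @$_ for @report;
```

If the raw length stays roughly constant while the stripped length climbs, the fetch is fine and the stripping step is the one holding on to old text.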
Re: Using GET in a loop
by ccn (Vicar) on Sep 17, 2004 at 09:09 UTC
|
use LWP::Simple;
foreach $ID (@ID) {
$content=" ";
$url="http://...docid=$ID";
# this is completely NEW value
$content=get($url);
# here you APPEND new value to the previous values
open FILE, ">> C:/perl/output.txt";
print FILE $content;
}
| [reply] [d/l] |
This is exactly what I expected to happen, but it doesn't.
Each time the (global or local) variable $content gets a new value assigned (via GET), the content of the variable is not replaced by the new value (content of the newly retrieved website) but it is appended.
This is the case even when I explicitly "empty" it ($content= " "). Apparently, GET uses an internal variable to store the information retrieved, which is appended. Thus, each time the loop is executed $content simply gets bigger and bigger as this internal variable adds the content of newly retrieved webpages to all previously retrieved ones.
Thank you for your response!
| [reply] |