in reply to Re: Using GET in a loop
in thread Using GET in a loop

Here is the compilable code. I thought it would be easier to focus on the problem directly. Sorry about any inconvenience caused.

#! C:/programme/perl
use LWP::Simple;
use LWP::UserAgent;
use HTML::Stripper;
use warnings;
use strict;

our $stripper = HTML::Stripper->new( skip_cdata => 1, strip_ws => 1 );
our $ID;
our @ID=(161060, 160920, 160999, 160899);
our $count=1;

foreach $ID (@ID) {
    my $content;
    my $content_full;
    my $url="http://europa.eu.int/prelex/detail_dossier_real.cfm?CL=en&DosId="."$ID";
    $content_full=" ";
    $content_full=get($url);
    $content=$stripper->strip_html($content_full);
    our $i_type=index($content, " COM ");
    our $d_type=substr($content, $i_type+1,3);
    our $d_year=substr($content, $i_type+6,4);
    our $d_number=substr($content, $i_type+12,3);
    our $proposal="$d_type "."\($d_year\)"." $d_number";
    print "Proposal\: $proposal \n";
    open DB, ">> C:/programme/perl/test/prelex.dta" or die "Problem: $!";
    flock (DB, 2);
    print DB "$proposal\n";
    close DB;
}

Re^3: Using GET in a loop
by EdwardG (Vicar) on Sep 17, 2004 at 10:22 UTC

    By "focusing on the problem", you managed to focus away the part of the code with the bug.

    Now, with your assumptions tempered, it is obvious that the problem lies in the re-use of the $stripper object.

    Easiest solution: make a new $stripper in each iteration.

    foreach $ID (@ID) { $stripper = HTML::Stripper->new( ... ); ... }

    And in case it isn't clear, this has nothing to do with get().
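The effect of reusing one object versus creating a fresh one per iteration can be sketched offline. The AccumulatingStripper package below is a hypothetical stand-in that deliberately keeps its buffer between calls; it is not the real HTML::Stripper, and the sample pages are made up. If the stripped text accumulates like this, index() would keep finding the first page's " COM " marker, which would fit the symptom described.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a stripper that keeps its output buffer between
# calls. It only imitates the behaviour diagnosed in this thread and is NOT
# the real HTML::Stripper.
package AccumulatingStripper;

sub new {
    my ($class) = @_;
    return bless { buffer => '' }, $class;
}

sub strip_html {
    my ($self, $html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;   # crude tag removal, illustration only
    $self->{buffer} .= $text;             # the bug: appended, never reset
    return $self->{buffer};
}

package main;

# Made-up sample pages standing in for the fetched documents.
my @pages = ('<p>COM (2000) 111</p>', '<p>COM (2001) 222</p>');

# One shared object: the second result still contains the first page.
my $shared = AccumulatingStripper->new;
my @reused = map { $shared->strip_html($_) } @pages;

# Fresh object per iteration: each result holds only its own page.
my @fresh = map { AccumulatingStripper->new->strip_html($_) } @pages;

print "reused: $_\n" for @reused;
print "fresh:  $_\n" for @fresh;
```

In the original script the equivalent change would be to move the HTML::Stripper->new( ... ) call inside the foreach loop, as suggested above.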

     

      That is the solution! Thank you!!!

      Sometimes you do not see the forest because of all the trees...

      The problem is that HTML::Stripper uses HTML::Parser (which accumulates content) in strange ways and the documentation is not very explicit about it.
Re^3: Using GET in a loop
by davidj (Priest) on Sep 17, 2004 at 10:22 UTC
    Well, now the next question: is it $content_full or $content that is getting appended to instead of replaced? That is, is it the result of LWP::Simple's get function or HTML::Stripper's strip_html function that is not working correctly?
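One way to answer that question is to log the length of both variables on every pass through the loop and see which one grows. The sketch below runs offline: fetch() and strip() are hypothetical stand-ins for LWP::Simple's get() and $stripper->strip_html(), and the leak in strip() is simulated on purpose so the pattern is visible.

```perl
use strict;
use warnings;

# Hypothetical stand-ins so the sketch runs without network access or CPAN
# modules. In the real script these would be get() and strip_html().
my %fake_site = (1 => '<b>one</b>', 2 => '<b>two</b>', 3 => '<b>three</b>');
sub fetch { return $fake_site{ $_[0] } }

my $leaky_buffer = '';
sub strip {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;   # crude tag removal, illustration only
    $leaky_buffer .= $text;               # simulated accumulation across calls
    return $leaky_buffer;
}

my (@raw_len, @stripped_len);
for my $id (1 .. 3) {
    my $content_full = fetch($id);
    my $content      = strip($content_full);

    # Record both lengths; the stage whose length keeps climbing is the suspect.
    push @raw_len,      length $content_full;
    push @stripped_len, length $content;
    printf "id=%d raw=%d stripped=%d\n", $id, $raw_len[-1], $stripped_len[-1];
}
```

Here the raw lengths track the individual pages while the stripped length keeps climbing, which would point at the stripping step rather than the fetch.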

    davidj

      I think it is the $content_full variable. I printed $content_full into a file last week, and it got bigger and bigger as new content was appended.

      There is a hint at this problem in the LWP::UserAgent documentation. It states that there should be a new object for each request, presumably because an internal variable in get() is appended to rather than replaced with each new request.