foreach loop not retrieving all data.

marcoss has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm extracting info from a website using HTML::TreeBuilder::XPath. With the help of the monks I've been able to do this for other websites with almost no complications... until now. Basically, the foreach loop is not looping through the whole table to extract the info from each node; it only retrieves the first result. I have tried this many ways, but the result is always the same no matter which path I use for the node (even copying the whole XPath from the browser by right-clicking on the element). This is the code. If you run it and look at the page source, you'll see what I mean.

#!/usr/bin/perl -w
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use Data::Dumper;
use strict;
my $debug=1;
my $base='http://www.msccrociere.it/it_it';
my $url='/Partenza-Crociere/Trova-La-Tua-Crociera.aspx?Reg=CAR&DateF=201211&ddl=n&p=1&';
my $page = get($base.$url) or die $!;
my $p = HTML::TreeBuilder::XPath->new_from_content( $page );
#binmode( STDOUT, ':utf8');

my @trips= $p->findnodes( '//table[@id="tblFYCXML_Itin"]');

        foreach my $trip (@trips){

                my $destination = $trip->findvalue('.//h2[@class="FYCmaneDestXML"]');
                my $shipname = $trip->findvalue('.//div[@class="cContentLeft"]/a/h3');

                print "$destination\n";
                print "$shipname\n";

        }

I know I'm making a newbie mistake somewhere. Like I said, I've tried many different things before asking here. I hope you can give me a hand. Thanks a lot!

Re: foreach my $question (@perlmonks){} by tobyink (Canon) on Jun 19, 2012 at 09:32 UTC
I'll give you a clue. Your problem is here: `my @trips= $p->findnodes( '//table[@id="tblFYCXML_Itin"]');`

How many tables with `id="tblFYCXML_Itin"` do you expect the page to contain?

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
Re^2: foreach my $question (@perlmonks){} by marcoss (Novice) on Jun 19, 2012 at 10:10 UTC
Hi, in that page there's only one table with that ID, so I'm expecting the `foreach` loop to show me every `h2`, or every `div[@class="something"]/a`, or whatever else I need to extract, such as departure dates, ship names, prices, duration, etc. I still can't see where the mistake is. Perhaps one more clue? Thanks!
Re^3: foreach my $question (@perlmonks){} by muba (Priest) on Jun 19, 2012 at 10:27 UTC
But that's not what the code says... Walk through it with me.

    my @trips= $p->findnodes( '//table[@id="tblFYCXML_Itin"]');
    # So there's exactly one table with that id.
    # So @trips now contains exactly one node, that node being that one table.
    # You still with me?
    # If not, try it:
    print "There is/are ", scalar(@trips), " nodes in \@trips.\n";

Okay. And then:

    foreach my $trip (@trips){

You see it? Look at that line again. See it now? Look again until you do. For each element of @trips, an array which we just established has exactly one element, you want to do something. And you get a result that looks like the loop runs exactly one time. Hmm, boggles the mind, don't it :)

If, at this point, you still need another clue: try finding the nodes that you actually want to loop over, and loop over them, instead of looping over something that you know occurs only once.
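To make that last hint concrete, here is a minimal sketch of looping over the repeating nodes rather than over the single table. The thread never shows the page's markup, so the HTML below is made up: it assumes each trip sits in its own `tr` inside the table, and the destination and ship names are placeholders. The sub name `extract_trips` is likewise hypothetical. Only the table id, the class names, and the relative XPath expressions come from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# ASSUMPTION: invented sample markup with one <tr> per trip.
# The real page may group trips differently (e.g. in divs or nested tables).
my $html = <<'HTML';
<html><body>
<table id="tblFYCXML_Itin">
<tr><td>
  <h2 class="FYCmaneDestXML">Destination A</h2>
  <div class="cContentLeft"><a href="#"><h3>Ship A</h3></a></div>
</td></tr>
<tr><td>
  <h2 class="FYCmaneDestXML">Destination B</h2>
  <div class="cContentLeft"><a href="#"><h3>Ship B</h3></a></div>
</td></tr>
</table>
</body></html>
HTML

# Returns one [destination, shipname] pair per row found in the table.
sub extract_trips {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($content);

    # Select the repeating rows, not the one table that contains them.
    my @rows = $tree->findnodes('//table[@id="tblFYCXML_Itin"]//tr');

    my @trips;
    for my $row (@rows) {
        # Relative paths (leading ".") are evaluated against the current row.
        my $destination = $row->findvalue('.//h2[@class="FYCmaneDestXML"]');
        my $shipname    = $row->findvalue('.//div[@class="cContentLeft"]/a/h3');
        push @trips, [ "$destination", "$shipname" ];
    }

    $tree->delete;    # release the parse tree once the values are copied out
    return @trips;
}

print "$_->[0]\n$_->[1]\n" for extract_trips($html);
```

On the real page the XPath passed to `findnodes` would need to match whatever element actually wraps a single trip; printing `scalar(@rows)` is a quick way to check that it selects more than one node.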
Re^4: foreach my $question (@perlmonks){} by marcoss (Novice) on Jun 19, 2012 at 10:54 UTC
Re^3: foreach my $question (@perlmonks){} by Anonymous Monk on Jun 19, 2012 at 10:32 UTC
Well, there's your problem! This is essentially your code:

    my @tables = (
        {
            'h2'    => [ .. ],
            'div/a' => [ .. ],
        },
    );
    for my $table ( @tables ){
        my $oneh2  = $table->{'h2'}->[0];
        my $onediv = $table->{'div/a'}->[0];
    }

You're asking why this doesn't look for multiple h2s or divs. Do you see why it doesn't?
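The same analogy can be run as plain Perl with no HTML involved. This is only a sketch: the array contents are invented placeholder values, and `all_pairs` is a hypothetical helper name. It shows that the outer loop still runs once, and that the values only all appear when something iterates over the inner arrays instead of taking element `[0]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: placeholder data standing in for one table that
# holds several h2 values and several div/a values.
my @tables = (
    {
        'h2'    => [ 'Destination A', 'Destination B' ],
        'div/a' => [ 'Ship A',        'Ship B' ],
    },
);

# Collects every h2 / div-a pair instead of only the first of each.
sub all_pairs {
    my @tables = @_;
    my @pairs;
    for my $table (@tables) {          # runs once: there is one table
        my $h2s  = $table->{'h2'};
        my $divs = $table->{'div/a'};
        for my $i ( 0 .. $#{$h2s} ) {  # runs once per entry inside it
            push @pairs, [ $h2s->[$i], $divs->[$i] ];
        }
    }
    return @pairs;
}

print join( ' / ', @$_ ), "\n" for all_pairs(@tables);
```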
Re^4: foreach my $question (@perlmonks){} by marcoss (Novice) on Jun 19, 2012 at 11:00 UTC
Re^5: foreach my $question (@perlmonks){} by Anonymous Monk on Jun 19, 2012 at 11:09 UTC