perlmad has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I have a problem with nested div tags.

My code:

#!/usr/bin/perl
use strict;
use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);
my @dnldLinks;
my @month_year;
my @date;
my @data;
my $index;
my $contract;

while ( my $div = $parser->get_tag('div') ) {
    if($div->is_start_tag('div')){
        if($div->[1]{class} =~ 'Cell month-year') {
            my $time = $parser->get_trimmed_text;
            push(@month_year,$time);
        }
        if($div->[1]{class} =~ 'Cell Release Date') {
            my $time = $parser->get_trimmed_text;
            push(@date,$time);
            $index=$time;
        }
        if($div->[1]{class} =~ 'Mortgage Contract Rate') {
            my $time = $parser->get_trimmed_text;
            push(@date,$time);
            $contract=$time;
        }
        print "date : $index, data : $contract\n";
    }
}
#use Data::Dumper;
#print Dumper \@dnldLinks;
__DATA__
<div class='historicalChartTable'>
  <div class="Row Jun-2015">
    <div class="Cell month-year "> Jun-2015 </div>
    <div class="Cell Release Date "> 2015-07-30 </div>
    <div class="Cell National Mortgage Contract Rate "> 3.850 </div>
  </div>
  class="Row May-2015">
    <div class="Cell month-year even"> May-2015 </div>
    <div class="Cell Release Date even"> 2015-06-25 </div>
    <div class="Cell National Mortgage Contract Rate even"> 3.750 </div>
  </div>
</div>

Getting Output:

date : , data :
date : , data :
date : , data :
date : 2015-07-30, data :
date : 2015-07-30, data : 3.850
date : 2015-07-30, data : 3.850
date : 2015-06-25, data : 3.850
date : 2015-06-25, data : 3.750

My code processes every div tag one by one, not recursively. Kindly help me extract the data from the nested div tags recursively.

Expected output:

date : 2015-07-30, data : 3.850
date : 2015-06-25, data : 3.750

Re: Nested div tag
by haukex (Archbishop) on Jul 20, 2016 at 14:30 UTC

    Hi perlmad,

    If you move the print line one line up, i.e. into the third nested if, your output is as you expect.
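
    A minimal sketch of that fix, under a few assumptions: the HTML is passed as a string instead of the DATA handle, the sample is a well-formed copy of the rows from the question, and the @pairs array is a hypothetical helper that just collects the results. The print now sits inside the branch that handles the rate cell, so it runs once per row.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Assumption: a well-formed copy of the sample table from the question.
my $html = <<'HTML';
<div class="historicalChartTable">
  <div class="Row Jun-2015">
    <div class="Cell month-year "> Jun-2015 </div>
    <div class="Cell Release Date "> 2015-07-30 </div>
    <div class="Cell National Mortgage Contract Rate "> 3.850 </div>
  </div>
  <div class="Row May-2015">
    <div class="Cell month-year even"> May-2015 </div>
    <div class="Cell Release Date even"> 2015-06-25 </div>
    <div class="Cell National Mortgage Contract Rate even"> 3.750 </div>
  </div>
</div>
HTML

my $parser = HTML::TokeParser::Simple->new( string => $html );

my @pairs;    # hypothetical helper: one [date, rate] pair per row
my $date;     # most recent release date seen

while ( my $div = $parser->get_tag('div') ) {
    # Same attribute access as in the original post; default to ''
    # so that a div without a class does not trigger a warning.
    my $class = $div->[1]{class} // '';

    if ( $class =~ /Release Date/ ) {
        $date = $parser->get_trimmed_text;
    }
    elsif ( $class =~ /Mortgage Contract Rate/ ) {
        my $rate = $parser->get_trimmed_text;
        push @pairs, [ $date, $rate ];
        # Print only here, once the rate cell of a row has been read.
        print "date : $date, data : $rate\n";
    }
}
```

    With the sample above this should print one "date : ..., data : ..." line per row, which is the expected output from the question.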

    BTW, there isn't really anything recursive about your code, you just have three different ifs for three different classes, which are run against all divs. Assuming you only need to select divs based on their classes that's fine, but if you also happen to need to select them based on their position in the tree you'll need different (possibly recursive) code for that.

    The above fix will also get rid of all the warnings that your code would have produced had they been enabled before the fix. Since they can be quite useful in finding potential problems, you should leave warnings turned on and only disable specific ones if you know what you're doing.

    Also, the way you're matching CSS classes is a potential place for your code to break, since the order of class names might change. If your HTML is static and never changes, your code should work, but otherwise you might want to look into using proper CSS selectors. For example, an alternative way to parse HTML is Mojo::DOM, which supports several CSS selectors. There were two threads about it not too long ago, "How to obtain text in Mojo::DOM ?" and "Mojo::DOM find tag after another tag".
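
    As a rough illustration of the CSS selector approach, here is a sketch with Mojo::DOM. It assumes well-formed markup (in the sample as posted, the May row is missing its opening "<div", so its cells would not sit under a div with class Row), and @pairs is again a hypothetical helper. Class selectors such as div.Release.Date match regardless of the order in which the class names appear.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Assumption: a well-formed copy of the sample table from the question.
my $html = <<'HTML';
<div class="historicalChartTable">
  <div class="Row Jun-2015">
    <div class="Cell month-year "> Jun-2015 </div>
    <div class="Cell Release Date "> 2015-07-30 </div>
    <div class="Cell National Mortgage Contract Rate "> 3.850 </div>
  </div>
  <div class="Row May-2015">
    <div class="Cell month-year even"> May-2015 </div>
    <div class="Cell Release Date even"> 2015-06-25 </div>
    <div class="Cell National Mortgage Contract Rate even"> 3.750 </div>
  </div>
</div>
HTML

my $dom = Mojo::DOM->new($html);

my @pairs;    # hypothetical helper: one [date, rate] pair per row

# Visit each row, then look up the two cells inside that row only.
for my $row ( $dom->find('div.Row')->each ) {
    my $date_cell = $row->at('div.Release.Date')           or next;
    my $rate_cell = $row->at('div.Mortgage.Contract.Rate') or next;

    # Trim surrounding whitespace ourselves rather than rely on text().
    my ( $date, $rate ) = map {
        my $text = $_->text;
        $text =~ s/^\s+|\s+$//g;
        $text;
    } $date_cell, $rate_cell;

    push @pairs, [ $date, $rate ];
    print "date : $date, data : $rate\n";
}
```

    Because each lookup is scoped to a single row, a date and a rate can never be paired across rows, which is what went wrong in the original output.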

    Hope this helps,
    -- Hauke D

    Update: Clarification.

Re: Nested div tag
by Anonymous Monk on Jul 20, 2016 at 20:33 UTC

    I have a problem with nested div tags.

    Nope, your problem is you're not using a tree/DOM/xpath ... htmltreexpather.pl/HTML::TreeBuilder::XPath or xpather.pl/XML::LibXML

    If you did your code could become

        my @rows = $tree->findnodes(q{ //div[ @class =~ /row/ ] });
        for my $row ( @rows ){
            my @cells = $row->findnodes(q{ //div[ @class =~ /cell/ ] });
            for my $cell ( @cells ){
                Dance( $cell );
            }
        }
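
    A self-contained sketch along those lines, using HTML::TreeBuilder::XPath with the standard XPath contains() function. The assumptions here: the markup is a well-formed copy of the sample, class names are matched with their original capitalisation ("Row", not "row"), and the inner queries start with "./" so that they are relative to the current row instead of searching the whole document. The @pairs array is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Assumption: a well-formed copy of the sample table from the question.
my $html = <<'HTML';
<div class="historicalChartTable">
  <div class="Row Jun-2015">
    <div class="Cell month-year "> Jun-2015 </div>
    <div class="Cell Release Date "> 2015-07-30 </div>
    <div class="Cell National Mortgage Contract Rate "> 3.850 </div>
  </div>
  <div class="Row May-2015">
    <div class="Cell month-year even"> May-2015 </div>
    <div class="Cell Release Date even"> 2015-06-25 </div>
    <div class="Cell National Mortgage Contract Rate even"> 3.750 </div>
  </div>
</div>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my @pairs;    # hypothetical helper: one [date, rate] pair per row

# contains() is case-sensitive, so 'Row' must match the markup exactly.
my @rows = $tree->findnodes(q{//div[contains(@class, 'Row')]});

for my $row (@rows) {
    # './div' restricts the search to direct children of this row.
    my $date = $row->findvalue(q{./div[contains(@class, 'Release Date')]});
    my $rate = $row->findvalue(q{./div[contains(@class, 'Mortgage Contract Rate')]});

    # Strip any whitespace left around the cell text.
    s/^\s+|\s+$//g for $date, $rate;

    push @pairs, [ $date, $rate ];
    print "date : $date, data : $rate\n";
}

$tree->delete;    # free the tree once done
```

    If @rows comes back empty with a pattern like /row/, one plausible cause is the capitalisation of the class name in the markup; another is that $tree was never built from the HTML.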

      Thanks for your interest, but when I ran the file, nothing was assigned to @rows, i.e. $#rows is -1.

        Thanks for your interest, but when I ran the file, nothing was assigned to @rows, i.e. $#rows is -1.

        What?