Re: WWW::Mechanize::Chrome VERY slow on xpath obtaining TDs of a TR

Thank you all, as always, for your valuable input and ideas! Ye monks are a smart bunch.

As much as I'd love to help debug W::M::Chrome, I have a short deadline, so I went with LanX's idea: use xpath to get the table node and its HTML content, then parse that in Perl land. I decided to use HTML::Tree, which is simple and tried and tested.

For anyone having a similar issue, here is the code I wrote for this (it assumes the table has thead, th, and tbody elements, YMMV):


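use feature 'signatures';                  # parse_table uses a sub signature
no warnings 'experimental::signatures';    # signatures are experimental before Perl 5.36
use HTML::TreeBuilder;

my @nodes = $mech->xpath('//table');
my @data = @{ parse_table($nodes[0]) };    # parse_table returns an array ref of row hashes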

sub parse_table ($table_node){
    my $root = HTML::TreeBuilder->new_from_content($table_node->get_attribute('outerHTML'));
    my @tparts = $root->find_by_tag_name('table')->content_list;
    my @colnames = ( );
    my @data;
    foreach my $tpart (@tparts){
        if($tpart->tag eq 'thead'){
            my @rows = $tpart->content_list;
            foreach my $row (@rows) {
                if($row->tag eq 'tr'){
                    my @cells = $row->content_list;
                    # assumes no TH is empty (see below safeguard for data cells)
                    foreach (@cells) {
                        push @colnames, $_->content->[0];
                    }
                }
            }
        }
        elsif($tpart->tag eq 'tbody'){
            my @rows = $tpart->content_list;
            foreach my $row (@rows) {
                my %row_data = ();
                if($row->tag eq 'tr'){
                    my @cells = $row->content_list;
                    foreach my $cell (0..$#cells) {
                        # HTML::Element's content method returns undef, not an empty array ref, for a childless element
                        if($cells[$cell]->content && scalar(@{$cells[$cell]->content})){
                            $row_data{ $colnames[$cell] } = $cells[$cell]->content->[0];
                        }
                        else{
                            $row_data{ $colnames[$cell] } = '';
                        }
                    }
                    # only store rows that really were TRs
                    push @data, \%row_data;
                }
            }
        }
    }
    return \@data;
}
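
For what it's worth, the same idea can probably be written more compactly with HTML::Element's look_down and as_text methods. This is only a sketch under some assumptions: the HTML string below is a made-up sample standing in for the outerHTML fetched from the browser, table_to_rows is a name I picked for illustration, and it expects a single table with no nested tables and one header row.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample standing in for the outerHTML string from the browser.
my $html = <<'HTML';
<table>
  <thead><tr><th>name</th><th>qty</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>3</td></tr>
    <tr><td>pear</td><td></td></tr>
  </tbody>
</table>
HTML

# Illustrative helper: returns an array ref of hash refs keyed by header text.
sub table_to_rows {
    my ($table_html) = @_;
    my $root = HTML::TreeBuilder->new_from_content($table_html);

    # as_text flattens a cell to its text, so an empty cell becomes ''.
    my @colnames = map { $_->as_text } $root->look_down(_tag => 'th');

    my @rows;
    for my $tr ($root->look_down(_tag => 'tr')) {
        my @cells = $tr->look_down(_tag => 'td');
        next unless @cells;    # the header row has only TH cells

        my %row;
        # Pad short rows with '' so every column name gets a value.
        @row{@colnames} = map { defined $_ ? $_->as_text : '' }
                          @cells[0 .. $#colnames];
        push @rows, \%row;
    }

    $root->delete;    # free the parse tree
    return \@rows;
}

my $rows = table_to_rows($html);
printf "%s: %s\n", $_->{name}, $_->{qty} for @$rows;
```

Using as_text avoids the undef check on content, at the cost of flattening any markup inside a cell to plain text.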

Thanks again, y'all!
--
Alex
