Thank you all, as always, for your valuable input and ideas! Ye monks are a smart bunch.
As much as I'd love to help debug W::M::Chrome, I'm on a short deadline, so I went with LanX's idea: use XPath to get the table node and its HTML content, then parse that in Perl land. I chose HTML::Tree, which is simple and tried and true.
For anyone having a similar issue, here is the code I wrote for this (it assumes the table has thead, th, and tbody, YMMV):
use feature 'signatures';    # sub signature below needs Perl 5.20+ (experimental before 5.36)
use HTML::TreeBuilder;

my @nodes = $mech->xpath('//table');
my @data  = parse_table($nodes[0]);

sub parse_table ($table_node) {
    my $root = HTML::TreeBuilder->new_from_content($table_node->get_attribute('outerHTML'));
    my @tparts = $root->find_by_tag_name('table')->content_list;
    my @colnames;
    my @data;
    foreach my $tpart (@tparts) {
        next unless ref $tpart;    # skip stray text nodes
        if ($tpart->tag eq 'thead') {
            foreach my $row ($tpart->content_list) {
                next unless ref($row) && $row->tag eq 'tr';
                # assumes no TH is empty (see below safeguard for data cells)
                push @colnames, $_->content->[0] foreach $row->content_list;
            }
        }
        elsif ($tpart->tag eq 'tbody') {
            foreach my $row ($tpart->content_list) {
                next unless ref($row) && $row->tag eq 'tr';
                my %row_data;
                my @cells = $row->content_list;
                foreach my $cell (0 .. $#cells) {
                    # HTML::Element's content method weirdness:
                    # it can return undef instead of an empty arrayref
                    my $content = $cells[$cell]->content;
                    $row_data{ $colnames[$cell] } = ($content && @$content) ? $content->[0] : '';
                }
                push @data, \%row_data;
            }
        }
    }
    return @data;    # list of hashrefs, one per body row, keyed by column name
}
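Once the rows are in hand as hashrefs keyed by column name, dumping them is straightforward. Here is a minimal sketch of one way to do it, using core Perl only. The sample rows, the column list, and the rows_to_tsv helper are all made up for illustration; they are not part of the code above. Since hash keys are unordered, the sketch assumes you keep the column order in a separate array.

```perl
use strict;
use warnings;

# Hypothetical sample rows, shaped like the parser's output:
# one hashref per table row, keyed by column name.
my @rows = (
    { Name => 'Alice', Age => '30', City => '' },
    { Name => 'Bob',   Age => '25', City => 'Paris' },
);

# Hash keys are unordered, so the column order is kept separately.
my @columns = qw(Name Age City);

# Hypothetical helper: turn the rows into tab-separated text,
# with a header line built from the column names.
sub rows_to_tsv {
    my ($columns, $rows) = @_;
    my @lines = join "\t", @$columns;
    for my $row (@$rows) {
        # Missing or undefined cells become empty strings.
        push @lines, join "\t", map { $row->{$_} // '' } @$columns;
    }
    return join("\n", @lines) . "\n";
}

print rows_to_tsv(\@columns, \@rows);
```

If the column order matters for your table, you could collect it while reading the header row and pass it along with the rows.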
Thanks again y'all !
--
Alex