sub parseResPage
{
    my ( $rawHTML ) = @_;
    my $tree = HTML::TreeBuilder->new_from_content( $rawHTML );
    my @tables = $tree->look_down( '_tag', 'table' );      # We want the second table
    my @tableRows = $tables[1]->look_down( '_tag', 'tr' ); # First row is headings, then the data
    my $headRow = shift @tableRows;
    my @headings;
    my $res_hash;
    my @cells = $headRow->look_down( '_tag', 'td' );
    push @headings, $_->as_text() foreach @cells;
    foreach my $mainRow ( @tableRows )
    {
        my @cells = $mainRow->look_down( '_tag', 'td' );
        my $iface = $cells[0]->as_text();
        for ( my $i = 0; $i < scalar @cells; $i++ )
        {
            $res_hash->{$iface}{ $headings[$i] } = $cells[$i]->as_text();
        }
    }
    # Explicitly free the memory consumed by the tree.
    $tree->delete();
    return $res_hash;
}
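For reference, here is a minimal sketch of how the sub above might be driven. The sample HTML, the interface names, and the column headings are all invented for illustration; it just assumes the page has a decoy first table followed by the data table, as the sub expects:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;

# Hypothetical page: first table is ignored, second holds the data.
my $rawHTML = <<'HTML';
<html><body>
<table><tr><td>decoy</td></tr></table>
<table>
  <tr><td>Interface</td><td>Status</td></tr>
  <tr><td>eth0</td><td>up</td></tr>
  <tr><td>eth1</td><td>down</td></tr>
</table>
</body></html>
HTML

my $res_hash = parseResPage( $rawHTML );
print Dumper( $res_hash );
```

With input shaped like this, `$res_hash->{eth0}{Status}` would hold `'up'`. Note that the hash is keyed on the first cell of each row, so if two rows share an interface name the later one silently overwrites the earlier.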
Tip: If you are not already familiar with the Perl command-line debugger, now is the time to learn. When I am working with HTML::TreeBuilder code, my usual approach is to write a script that just loads the tree and sets a breakpoint afterwards, then run $tree->look_down() commands interactively until I find a combination that gives me what I am looking for. I then paste that back into my editor and use it in my script.
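A throwaway harness for that workflow might look something like this (the file name is invented; `$DB::single = 1` is the standard way to hard-code a breakpoint for the perl debugger):

```perl
#!/usr/bin/perl
# explore.pl -- run with: perl -d explore.pl page.html
use strict;
use warnings;
use HTML::TreeBuilder;

my $file = shift // 'page.html';
my $tree = HTML::TreeBuilder->new_from_file( $file );

$DB::single = 1;   # debugger stops here when run under perl -d
# At the DB<1> prompt, try things like:
#   x $tree->look_down('_tag', 'table')
#   x $tree->look_down('_tag', 'td', sub { $_[0]->as_text =~ /eth/ })
1;
```

Once a look_down() expression returns the elements you want, copy it into your real script.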
I suspect that a script using HTML::TreeBuilder will probably end up slower than your simple grep-based script. HTML::TreeBuilder is well-optimised Perl written by some clever people, but it contains a lot of code to handle malformed HTML and other corner cases, so it will be slower than a simple regular-expression-based script. Why are you so concerned about speed anyway? How much time have you spent writing these scripts already?