HTML::TreeBuilder is a useful module, but I had to spend some time combing the manual pages before I could use it. This post may save you that trouble.

Attached is Perl code that does some rudimentary munging of the nested tables in the HTML pasted below it.

This is the start of some code I am writing to handle decision trees in Perl.

#!/usr/local/bin/perl
use Data::Dumper;
use HTML::TreeBuilder;
use strict;

die "must input filename" unless @ARGV;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new;    # empty tree
    $tree->parse_file($file_name);
    print "Hey, here's a dump of the parse tree of $file_name:\n";
    # $tree->dump;    # a method we inherit from HTML::Element

    # Name the first five <table> elements found in the file.
    my %table;
    (
        $table{root},
        $table{cond},
        $table{'cond-alternatives'},
        $table{action},
        $table{'action-entries'}
    ) = $tree->find_by_tag_name('table');

    # For each table, collect its <td> elements.
    my %td;
    map { $td{$_} = [ $table{$_}->find_by_tag_name('td') ] } (keys %table);

    # For each table, collect the content of every cell.
    my %x;
    map {
        my $field = $_;
        map { push @{$x{$field}}, $_->content_array_ref } @{$td{$_}}
    } (keys %td);

    printf "cond has %s", Dumper $x{cond};

    # Now that we're done with it, we must destroy it.
    $tree = $tree->delete;
}
Condition
 
Under 50.00
Pays by check
Pays by credit card
Unknown customer
Condition Alternatives
- y - y - n - n
y y n n y y n n
- - - - - - - -
y n y n y n y n
Action
 
Ring up sale
Check from local database
Call supervisor
Check credit card database
Action Entries
1
1
1 1 1
1 1 1

2001-03-04 Edit by Corion : Fixed up pasted HTML

Re: Sample HTML::TreeBuilder Usage
by Ido (Hermit) on Mar 05, 2001 at 02:52 UTC
    You use map twice there without using its return value. When you don't need the returned list, why not use a regular foreach loop?
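    For what it's worth, here is a minimal sketch of how the second pair of maps might look with foreach. It is not the original author's code: %td below is a hypothetical stand-in that holds plain strings rather than HTML::Element objects, so it runs without HTML::TreeBuilder installed. In the original you would push $cell->content_array_ref instead of $cell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for %td in the original post: each key maps to
# an array ref of cells. Plain strings are used here instead of
# HTML::Element objects so the sketch needs no CPAN modules.
my %td = (
    cond   => [ 'Under 50.00',  'Pays by check' ],
    action => [ 'Ring up sale', 'Call supervisor' ],
);

# Nested foreach loops replace the nested maps. The loops are run only
# for their side effect (filling %x), which foreach says plainly.
my %x;
foreach my $field (sort keys %td) {
    foreach my $cell (@{ $td{$field} }) {
        # In the original code this would be $cell->content_array_ref.
        push @{ $x{$field} }, $cell;
    }
}

# Show what was collected for each field.
print "$_: @{ $x{$_} }\n" foreach sort keys %x;
```

    Named loop variables also remove the need for the "my $field = $_" workaround, since the inner loop no longer clobbers the outer $_.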