john.tm has asked for the wisdom of the Perl Monks concerning the following question:

I have some HTML files and I wish to extract two tables from each file. Is it possible to extract from both tables in one sweep? The column headers are slightly different. This script works, but it looks a bit long-winded. Is there any way I can have 'Schedule Name | Node Name' as the header for the last column and get both tables in one go? The tables are at depth/count 2.1 and 2.2.

#!/usr/bin/perl
use strict;
use warnings;
#use diagnostics;
use HTML::TableExtract;
use Text::Table;

##my $sched = qr/Schedule Name|Node Name/;
my $html = "c:\\Testin.htm";
my $out  = "c:\\Testout.csv";
open( my $ofh, ">", $out ) or die "oops";

my $headers = [ 'Status', 'Results', 'Schedule Name' ];
my $table_extract = HTML::TableExtract->new(headers => $headers);
my $table_output  = Text::Table->new();
$table_extract->parse_file($html);
my ($table) = $table_extract->tables or die "no emails to process\n";
foreach my $row ($table->rows) {
    $table_output->load($row);
    print " ", join(',', grep defined, @$row), "\n";
    print $ofh " ", join(',', grep defined, @$row), "\n";
}

$headers = [ 'Status', 'Results', 'Node Name' ];
$table_extract = HTML::TableExtract->new(headers => $headers);
$table_output  = Text::Table->new();
$table_extract->parse_file($html);
($table) = $table_extract->tables;
foreach my $row ($table->rows) {
    $table_output->load($row);
    print " ", join(',', grep defined, @$row), "\n";
    print $ofh " ", join(',', grep defined, @$row), "\n";
}

Re: perl html table extract to get data from two tables.
by nlwhittle (Beadle) on Jan 03, 2015 at 22:06 UTC
    You're on the right track with your question. The mostly duplicated code that handles the two tables should probably be moved into a subroutine, which your main script can then call once per table, passing the header array reference as an argument:
    # Assumes $html and $ofh are declared (with my) earlier in the file, as in the question.
    sub get_table {
        my $headers = shift;
        my $table_extract = HTML::TableExtract->new(headers => $headers);
        my $table_output  = Text::Table->new();
        $table_extract->parse_file($html);
        # Skip quietly if no table with these headers was found.
        my ($table) = $table_extract->tables or return;
        foreach my $row ($table->rows) {
            $table_output->load($row);
            print " ", join(',', grep defined, @$row), "\n";
            print $ofh " ", join(',', grep defined, @$row), "\n";
        }
    }

    # calling the sub for each table
    get_table( [ 'Status', 'Results', 'Schedule Name' ] );
    get_table( [ 'Status', 'Results', 'Node Name' ] );
    --Nick
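The question also asks whether both tables can be pulled out in a single pass. One possible approach, offered as a sketch rather than a tested answer: give HTML::TableExtract one header list whose last entry is a regular expression matching either column name (the commented-out `qr/Schedule Name|Node Name/` in the question points the same way), parse the HTML once, and loop over every table returned by `tables` instead of keeping only the first. This assumes HTML::TableExtract matches header entries as regular expressions, as its documentation describes; the helper name `extract_rows` is hypothetical, and it takes the HTML as a string rather than a file name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: parse the HTML once and return one CSV-style line
# per data row, from every table whose header row matches.
sub extract_rows {
    my ($html) = @_;

    # The last header is a regex, so it matches either 'Schedule Name'
    # or 'Node Name' (assumes header entries are matched as regexes).
    my $te = HTML::TableExtract->new(
        headers => [ 'Status', 'Results', qr/Schedule Name|Node Name/ ],
    );
    $te->parse($html);
    $te->eof;    # flush any text HTML::Parser is still buffering

    my @lines;
    # tables() returns every matching table, not just the first one.
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # Replace undefined cells with empty strings so columns stay aligned.
            push @lines, join ',', map { defined $_ ? $_ : '' } @$row;
        }
    }
    return @lines;
}

# Usage: perl extract.pl c:\Testin.htm
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $html = do { local $/; <$in> };
    close $in;
    print "$_\n" for extract_rows($html);
}
```

To write the CSV file as well, print each returned line to `$ofh` too. Note that a plain `join(',', ...)` does not quote fields that themselves contain commas.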