output HTML table

kykyxixi has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new here and seeking the Perl experts' wisdom. I want to write a <table> from an HTML file to a plain text file while keeping the table layout as it would be displayed in a web browser. What is the best way to approach this? I searched CPAN first and found a few modules that go in the opposite direction (i.e., taking tabular data as input and producing an HTML table), but I couldn't find what I want.

20060811 Janitored by Corion: Fixed <table> tag from screwing up the page


Replies are listed 'Best First'.
Re: output HTML table by gellyfish (Monsignor) on Aug 11, 2006 at 11:50 UTC
I've always found:

    my $text = `lynx -dump $htmlfile`;

to work quite well.

/J\
Re^2: output HTML table by lima1 (Curate) on Aug 11, 2006 at 12:21 UTC
or

    my $text = `links -dump $htmlfile`;

(in my experience slightly better). A third alternative:

    my $text = `html2text $htmlfile`;

Converting an HTML file to a text file is not a trivial task.
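Interpolating $htmlfile into backticks sends it through the shell, so a file name containing spaces or shell metacharacters can misbehave. Below is a minimal sketch of the same idea using the list form of a piped open, which bypasses the shell. The helper name dump_with is hypothetical, and the sketch assumes a Unix-like system with the chosen tool (lynx, links, or html2text) installed and on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with its arguments and return its
# standard output as one string. The list form of open() avoids the shell,
# so the file name reaches the tool unchanged.
sub dump_with {
    my ($cmd, @args) = @_;
    open(my $fh, '-|', $cmd, @args)
        or die "Cannot run $cmd: $!";
    local $/;                          # slurp mode: read everything at once
    my $text = <$fh>;
    close($fh)
        or die "$cmd failed (status $?)";
    return defined $text ? $text : '';
}

# Usage, assuming lynx is installed:
# my $text = dump_with('lynx', '-dump', $htmlfile);
```

The same helper works for the other two tools by swapping the command and its arguments.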
Re: output HTML table by Hofmator (Curate) on Aug 11, 2006 at 11:55 UTC
Maybe using Text::Table does the trick ...

-- Hofmator
Re: output HTML table by madbombX (Hermit) on Aug 11, 2006 at 11:37 UTC
Have you considered using Perl's built-in formatting mechanism, perlform? That seems to match your goal for the output. For the input, you could build a regex to pull the cells out of the HTML file. If the <td>'s are all on their own lines, the regex can pick them out, and you can push them onto an array and manage your data that way.

Eric
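A rough sketch of the regex idea, under exactly the assumption stated above: every <tr>, </tr>, and complete <td>...</td> sits on its own line. The helper name rows_from_html is hypothetical. Nested tags, cells spread over several lines, and colspan are not handled; for arbitrary HTML, an HTML parser module is the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: collect cell text row by row from HTML in which
# each <tr>, </tr> and <td>...</td> (or <th>...</th>) is on its own line.
# Returns a list of array refs, one per table row.
sub rows_from_html {
    my ($html) = @_;
    my (@rows, $current);
    for my $line (split /\n/, $html) {
        if ($line =~ /<tr\b/i) {
            $current = [];                       # start a new row
        }
        elsif ($line =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}i) {
            push @$current, $1 if $current;      # keep the cell text
        }
        elsif ($line =~ m{</tr>}i) {
            push @rows, $current if $current;    # row is complete
            undef $current;
        }
    }
    return @rows;
}
```

Each returned row is an array reference of cell strings, which can then be handed to whatever formatter produces the text output.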
Re^2: output HTML table by kykyxixi (Initiate) on Aug 11, 2006 at 11:46 UTC
Eric: Thanks for the quick response. perlform would work for me if my table format were fixed. However, the column widths need to be adjusted for each table in the different HTML files, and I don't know how to treat a width as a variable in perlform. I also don't know how to handle cells that span several columns in the HTML.
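One way to get variable column widths without perlform is to measure the cells first and then pass each width to sprintf through the `*` placeholder. This is only a sketch: the helper name format_table and the " | " separator are illustrative choices, every cell is assumed to be a plain single-line string, and cells spanning several columns are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: render rows (array refs of cell strings) as an
# aligned plain-text table, sizing each column to its widest cell.
sub format_table {
    my (@rows) = @_;

    # First pass: find the widest cell in every column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }

    # Second pass: left-justify each cell to its column width.
    my $out = '';
    for my $row (@rows) {
        my @cells;
        for my $i (0 .. $#width) {
            my $cell = defined $row->[$i] ? $row->[$i] : '';
            push @cells, sprintf('%-*s', $width[$i], $cell);
        }
        my $line = join ' | ', @cells;
        $line =~ s/\s+$//;             # drop trailing padding
        $out .= "$line\n";
    }
    return $out;
}

print format_table(
    [ 'Name',  'Qty' ],
    [ 'apple', '3'   ],
    [ 'kiwi',  '12'  ],
);
```

Because the widths are computed per call, the same routine adapts to each table in each HTML file.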
Re^3: output HTML table by madbombX (Hermit) on Aug 11, 2006 at 12:03 UTC
Perhaps some of the newer functionality in Perl6::Form would help you then. I know it can handle some variable-width column data on the output side.

Eric