kykyxixi has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new here and seeking the Perl experts' wisdom. I want to write a <table> from an HTML file to a plain text file while keeping the table layout as it would be displayed in a web browser. What is the best way to approach this? I searched CPAN first and found a few modules that go in the opposite direction (i.e., taking tabular data as input and producing an HTML table), but couldn't find what I want.

20060811 Janitored by Corion: Fixed <table> tag from screwing up the page

Re: output HTML table
by gellyfish (Monsignor) on Aug 11, 2006 at 11:50 UTC

    I've always found:

    my $text = `lynx -dump $htmlfile`;
    to work quite well.

    /J\

      or
      my $text = `links -dump $htmlfile`;
      (slightly better, in my experience). A third alternative:
      my $text = `html2text $htmlfile`;
      Converting an HTML file to a text file is not a trivial task.
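
The backtick commands above hand $htmlfile to the shell, so a file name containing spaces or shell metacharacters could break the command or be misinterpreted. A minimal sketch of one way around that, assuming the external program (lynx, links or html2text) is installed and on the PATH; the helper name dump_text is hypothetical and not part of any module. It uses the list form of a piped open, which runs the command without a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (no shell involved)
# and return everything it writes to standard output.
sub dump_text {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                      # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status $?)";
    return defined $text ? $text : '';
}

# Usage, assuming lynx is installed and $htmlfile names a local file:
#   my $text = dump_text('lynx', '-dump', $htmlfile);
```

The same helper works for the other two programs by swapping the command list, for example dump_text('links', '-dump', $htmlfile).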
Re: output HTML table
by Hofmator (Curate) on Aug 11, 2006 at 11:55 UTC
    Maybe using Text::Table does the trick ...

    -- Hofmator

Re: output HTML table
by madbombX (Hermit) on Aug 11, 2006 at 11:37 UTC
    Have you considered using Perl's built-in formatting mechanism (see perlform)? That seems to me to be what you are after for the output.

    For the input, you could write a regex against the HTML file. If the <td> elements are each on their own line, the regex can pick out their contents, and you can push them onto an array and manage your data from there.

    Eric
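
The regex idea in this reply can be sketched roughly as follows. This is only an illustration under the stated assumption that every cell sits on a line of its own and every row ends with a closing </tr> tag; it is not a general HTML parser and will not cope with nested tables or cells split across lines. The helper name rows_from_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect table cells into rows, assuming each
# <td>...</td> (or <th>...</th>) is on its own line and each row is
# closed by a </tr> tag.
sub rows_from_lines {
    my @lines = @_;
    my (@rows, @current);
    for my $line (@lines) {
        if ($line =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}i) {
            my $cell = $1;
            $cell =~ s/<[^>]+>//g;       # drop inline markup such as <b>
            $cell =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
            push @current, $cell;
        }
        if ($line =~ m{</tr>}i) {
            push @rows, [@current] if @current;   # store a copy of the row
            @current = ();
        }
    }
    return @rows;
}

# Usage, assuming $fh is a filehandle opened on the HTML file:
#   my @rows = rows_from_lines(<$fh>);
```

Each element of the returned list is an array reference holding the cell texts of one row, which is the "push onto an array" structure described above.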

      Eric: Thanks for the quick response. perlform would work for me if my table format were fixed. However, the column widths differ from table to table across the HTML files, and I don't know how to make the width a variable in perlform. I also don't know how to handle cells that span several columns in the HTML.
        Perhaps some of the newer functionality in Perl6::Form would help you then. I know it can handle some variable width column data on the output side.

        Eric
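
On the variable column width question raised above, one option that needs neither perlform nor an extra module is to measure the cells first and let sprintf take the width as an argument. The following is a minimal sketch under the assumption that the table is already available as a list of rows (array references of cell strings); the helper name format_rows is hypothetical, and cells that span several columns are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: render rows (array refs of cell strings) as
# left-aligned plain text, sizing each column to its widest cell.
sub format_rows {
    my @rows = @_;
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            my $len = length $row->[$i];
            $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
        }
    }
    my @out;
    for my $row (@rows) {
        # "%-*s" reads the field width from the argument list
        my $line = join '  ',
            map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row;
        $line =~ s/\s+$//;           # no trailing padding after the last cell
        push @out, $line;
    }
    return join("\n", @out) . "\n";
}

print format_rows(['Name', 'Qty'], ['apple', '3'], ['kiwi', '12']);
```

Because the widths are computed per call, the same routine can be reused for tables from different HTML files without hard-coding a layout.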