I found a really nice program that converts simple HTML tables into ASCII (e.g., |-separated) tables. This is very useful because I am using the data for scientific work and want to be able to parse it easily.

I successfully turned the example program into a module that takes my HTML and outputs ASCII. Neat! The problem is that I don't really want to print the ASCII; I want to store it (as a string or an array, it doesn't matter). For some odd reason, if I change the 'print DumpTable(...' line in the convert sub into something like:

my $test = DumpTable(...

I don't get a text string. I've tried this lots of different ways, and I'm afraid I'm in a bit over my head with this style of Perl (I really, really hate the "operators" style, mostly because I don't understand what the heck is going on, where things are actually stored, etc.). So, as a side note: now that I've gone through the beginner Perl book, where do I go to learn about this kind of Perl coding?

I've attached the module file below:

package htmltoascii;

use strict;
use HTML::TreeBuilder;
use Text::ASCIITable;
use List::Util qw(max);

sub convert {
    my $html = shift;
    my $t = HTML::TreeBuilder->new();
    $t->parse($html);
    $t->eof;
    print DumpTable( $_ ), $/, $/ for $t->find_by_tag_name('table') ;
}

sub DumpTable {
    my $ht = shift;
    die "$ht is not a table" unless $ht->tag eq 'table';
    my $tt = Text::ASCIITable::->new;
    my @co;
    my @da;
    my $da = [];
    for my $ro ( @{ $ht->content() } ) {
        if( $ro->tag eq 'tr' ) {
            push @da, $da if @$da;
            $da = [];
            for my $ce ( @{ $ro->content() } ) {
                if( $ce->tag eq 'td' ) {
                    if( $ce->look_down( '_tag', 'table' ) ) {
                        my $string = '';
                        for my $i ( @{ $ce->content() } ) {
                            if( not ref $i ) {
                                $string .= $i;
                            } elsif( $i->tag eq 'table' ) {
                                $string .= "\n";
                                $string .= DumpTable($i);
                                $string .= "\n";
                            } else {
                                $string .= $i->as_text;
                            }
                        }
                        push @$da, $string;
                    } else {
                        push @$da, $ce->as_text;
                    }
                } elsif( $ce->tag eq 'th' ) {
                    push @co, $ce->as_text;
                }
            }
        }
    }
    push @da, $da if @$da;
    if(@co) {
        $tt->setCols(@co);
    } else {
        use List::Util qw(max);
        my $max = 1 + max( 0, map { $#$_ } @da );
        $tt->setCols( (' ') x $max );
        $tt->setOptions( hide_HeadRow => 1 );
        $tt->setOptions( hide_HeadLine => 1 );
    }
    $tt->addRow($_) for @da;
    $tt->setOptions( 'drawRowLine', 1) if $ht->attr('border');
    # return $tt->draw();
    return $tt->draw(
        [ '.=', '=.', '-', '-' ],   # .=-----------=.
        [ '|',  '|',  '|' ],        # | info | info |
        [ '|-', '-|', '=', '=' ],   # |-===========-|
        [ '|',  '|',  '|' ],        # | info | info |
        [ "'=", "='", '-', '-' ],   # '=-----------='
        [ '|=', '=|', '-', '*' ]    # rowseperator
    );
}

1;
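
A minimal sketch of one way to keep the strings instead of printing them. It rests on two assumptions. First, that the empty result comes from declaring `my $test` in a statement that also carries the trailing `for` modifier; perlsyn describes the behaviour of `my` under a statement modifier as undefined. Second, that DumpTable returns a plain string, which the `return $tt->draw(...)` line suggests. To keep the sketch runnable without the HTML modules, `render_table` and `convert_to_strings` are hypothetical stand-ins for DumpTable and convert, and the nested arrays stand in for the table elements.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DumpTable: takes one "table" (an array of
# rows) and returns its text as a single string.
sub render_table {
    my ($rows) = @_;
    return join( "\n", map { join( '|', @$_ ) } @$rows ) . "\n";
}

# Hypothetical stand-in for convert: collect every returned string in a
# list with map, rather than printing inside a statement-modifier loop.
sub convert_to_strings {
    my @tables = @_;
    my @ascii  = map { render_table($_) } @tables;
    return @ascii;
}

# Two toy tables: one with two rows, one with a single cell.
my @out = convert_to_strings(
    [ [ 'a', 'b' ], [ 1, 2 ] ],
    [ ['x'] ],
);

print $_, "\n" for @out;
```

Applied to the module, the same idea would presumably mean replacing the print line in convert with something like `return map { DumpTable($_) } $t->find_by_tag_name('table');`, so the caller receives one string per table. That line is untested against the real module.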

In reply to odd text object problem - how to store as string? by emmiesix