emmiesix has asked for the wisdom of the Perl Monks concerning the following question:
I found a really nice program that converts simple HTML tables into ASCII (e.g., |-separated) tables. This is very useful because I am using the data for scientific work and want to be able to parse it easily.
I successfully turned the program example into a module which takes my HTML and outputs ASCII. Neat! Now the problem is that I don't really want to print the ASCII; I want to store it (either string or array, doesn't matter). For some odd reason, if I change that 'print DumpTable(...' line in the convert sub into something like:
my $test = DumpTable(...
I don't get a text string. I've tried this lots of different ways and I'm afraid I'm in a bit over my head with this style of Perl (I really, really hate the "operators" style... mostly because I don't understand what the heck is going on, where things are actually stored, etc). So, as a side note: now that I've gone through the beginner Perl book, where do I go to learn about this kind of Perl coding?
I've attached the module file below:
    package htmltoascii;

    use strict;
    use HTML::TreeBuilder;
    use Text::ASCIITable;
    use List::Util qw(max);

    sub convert {
        my $html = shift;
        my $t = HTML::TreeBuilder->new();
        $t->parse($html);
        $t->eof;
        print DumpTable( $_ ), $/, $/ for $t->find_by_tag_name('table') ;
    }

    sub DumpTable {
        my $ht = shift;
        die "$ht is not a table" unless $ht->tag eq 'table';
        my $tt = Text::ASCIITable::->new;
        my @co;
        my @da;
        my $da = [];
        for my $ro ( @{ $ht->content() } ) {
            if( $ro->tag eq 'tr' ) {
                push @da, $da if @$da;
                $da = [];
                for my $ce ( @{ $ro->content() } ) {
                    if( $ce->tag eq 'td' ) {
                        if( $ce->look_down( '_tag', 'table' ) ) {
                            my $string = '';
                            for my $i ( @{ $ce->content() } ) {
                                if( not ref $i ) {
                                    $string .= $i;
                                } elsif( $i->tag eq 'table' ) {
                                    $string .= "\n";
                                    $string .= DumpTable($i);
                                    $string .= "\n";
                                } else {
                                    $string .= $i->as_text;
                                }
                            }
                            push @$da, $string;
                        } else {
                            push @$da, $ce->as_text;
                        }
                    } elsif( $ce->tag eq 'th' ) {
                        push @co, $ce->as_text;
                    }
                }
            }
        }
        push @da, $da if @$da;
        if(@co) {
            $tt->setCols(@co);
        } else {
            use List::Util qw(max);
            my $max = 1 + max( 0, map { $#$_ } @da );
            $tt->setCols( (' ') x $max );
            $tt->setOptions( hide_HeadRow => 1 );
            $tt->setOptions( hide_HeadLine => 1 );
        }
        $tt->addRow($_) for @da;
        $tt->setOptions( 'drawRowLine', 1) if $ht->attr('border');
        # return $tt->draw();
        return $tt->draw( [ '.=', '=.', '-', '-' ],   # .=-----------=.
                          [ '|', '|', '|' ],          # | info | info |
                          [ '|-', '-|', '=', '=' ],   # |-===========-|
                          [ '|', '|', '|' ],          # | info | info |
                          [ "'=", "='", '-', '-' ],   # '=-----------='
                          [ '|=', '=|', '-', '*' ]    # rowseperator
                        );
    }

    1;
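A possible explanation, offered as a sketch rather than a diagnosis: if the print line was changed to something like `my $test = DumpTable( $_ ), $/, $/ for ...`, two things get in the way. Assignment binds more tightly than the comma operator, so only the first item is assigned, and perlsyn documents a `my` declaration combined with a statement modifier as undefined behaviour. Even in a plain loop, a single scalar would be overwritten on every pass. One common way around this is to collect the results with `map`. The example below uses only core Perl; `render` is a hypothetical stand-in for `DumpTable`, and `@items` stands in for the list returned by `find_by_tag_name('table')`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DumpTable(): returns one string per item.
sub render {
    my $item = shift;
    return "[$item]";
}

# Stand-in for the list of table elements found in the HTML tree.
my @items = qw(a b c);

# Collect one rendered string per item instead of printing each one.
my @tables = map { render($_) } @items;

# If a single string is wanted, join the pieces with a blank line between them.
my $all = join "\n\n", @tables;

print "$all\n";
```

In the module itself, the same idea would presumably mean ending `convert` with `return map { DumpTable($_) } $t->find_by_tag_name('table');` and letting the caller decide whether to print or store the result.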
Replies are listed 'Best First'.
Re: odd text object problem - how to store as string?
by ikegami (Patriarch) on Jun 24, 2011 at 21:30 UTC
by emmiesix (Novice) on Jun 24, 2011 at 21:47 UTC
by ikegami (Patriarch) on Jun 24, 2011 at 22:40 UTC
by emmiesix (Novice) on Jun 25, 2011 at 00:15 UTC
by ~~David~~ (Hermit) on Jun 24, 2011 at 22:34 UTC