in reply to Re: HTML::TableExtract problem handling merged cells across rows (OBO rowspan colspan)
in thread HTML::TableExtract problem handling merged cells across rows

Thank you for the response. You obviously spent some time looking at this.

Unfortunately, I have little or no control over the incoming data. It comes from my vendor's reporting system (actually my vendor's vendor's system), so I can't really change it.

After reading your post, I investigated further. MS-Excel and I.E. (v8.0) both render the table data the way that the report writer intended. Everything is in the "correct and legible" column.

But Firefox (v31.4) and Chrome (v40), and apparently Perl as well, all render the table data differently from MS-Excel and I.E. To the reader, that result would be "incorrect and not legible".

I use "correct" and "incorrect" loosely, since I'm not intimately familiar with the HTML standards.
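One possible explanation for the split, offered as an assumption rather than anything established in this thread: the file contains quoted-printable escapes, so the raw markup reads `rowspan=3D"2"` instead of `rowspan="2"`. A parser that does not decode those escapes sees the unquoted attribute value `3D"2"`, and HTML's rule for parsing a non-negative integer keeps only the leading digits, so the cell spans 3 rows instead of 2. That would fit the Excel/IE behaviour if those two decode the escapes before parsing (also a guess). The `span_value` helper below is hypothetical and only a simplified sketch of the digit-reading rule, not the full HTML algorithm:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a simplified version of how an HTML parser turns a
# rowspan/colspan attribute value into a number. Leading whitespace is
# skipped, leading digits are read, and everything after them is ignored.
# No leading digits (or a zero) falls back to 1 here; this is a
# simplification, since the real spec treats rowspan="0" specially.
sub span_value {
    my ($raw) = @_;
    return 1 unless defined $raw && $raw =~ /\A[ \t\n\f\r]*([0-9]+)/;
    my $n = $1 + 0;
    return $n > 0 ? $n : 1;
}

# '3D"2"' is what the parser sees as the value of rowspan=3D"2"
# when the quoted-printable "=3D" is left undecoded.
for my $raw ('2', '3D"2"', '3D"4"', 'abc') {
    printf "%-8s -> %d\n", $raw, span_value($raw);
}
```

If this is what is happening, every escaped span silently becomes 3, which would push later cells into the wrong columns.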

Since the report comes to me as a tagged table in a text file, is there a better way to parse this data than HTML::TableExtract?


Re^3: HTML::TableExtract problem handling merged cells across rows (OBO rowspan colspan)
by poj (Abbot) on Feb 27, 2015 at 16:46 UTC

    Not sure if this helps, but I 'cleaned up' the tags with this regex to strip out the 3Ds:

    s/(class|rowspan|style|colspan)=3D/$1=/g;

    #!perl
    use strict;
    use warnings;
    use HTML::TableExtract;

    my $infile = 'test.htm';
    open my $in,  '<', $infile     or die "Cannot open $infile: $!";
    open my $out, '>', 'clean.htm' or die "Cannot open clean.htm: $!";

    my $html = '';
    while (<$in>) {
        # Turn attr=3D"..." back into attr="..." for these attributes
        s/(class|rowspan|style|colspan)=3D/$1=/g;
        print $out $_;    # keep a cleaned copy for inspection
        $html .= $_;
    }
    close $in;
    close $out;

    my @col = ('Column_1', 'Asset Tag', 'Washed Number',
               'Asset Name', 'Cust Code', 'Primary IP Address');
    my $te = HTML::TableExtract->new(headers => [@col], keep_headers => 1);
    $te->parse($html);

    foreach my $ts ($te->tables) {
        # Print the first five rows; undefined (empty) cells print as ''
        for my $i (0 .. 4) {
            print "\nLine $i ",
                join(', ', map { defined $_ ? $_ : '' } $ts->row($i));
        }
        print "\n";
    }
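If the report really is quoted-printable encoded (the `=3D` escapes suggest it, but that is an assumption), the core module MIME::QuotedPrint can decode the whole thing instead of patching four attribute names. `decode_qp` undoes every `=XX` escape and also joins "soft" line breaks (a line ending in `=`), which a line-by-line regex cannot do. The sample fragment below is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Made-up sample: "=3D" encodes "=", and the "=" at the end of the
# first line is a soft line break that should disappear on decoding.
my $encoded = qq{<td rowspan=3D"2" class=3D"data">Asset=\nTag</td>\n};

# Decode all escapes at once, not just the four attributes the regex targets
my $html = decode_qp($encoded);

print $html;
```

In the script, the idea would be to slurp the whole file first (for example `my $raw = do { local $/; <$in> };`) and call `decode_qp($raw)` once, since a soft break can split a tag across two physical lines.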
    poj
Re^3: HTML::TableExtract problem handling merged cells across rows (OBO rowspan colspan)
by Anonymous Monk on Feb 28, 2015 at 00:59 UTC