Hi, I am trying to parse an HTML file with HTML::TableExtract. The main aim is to capture the final row (the one that contains "TOTAL") into an array reference. Why doesn't my code below do the job?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;
use HTML::TableExtract;

my $temp_file = do {
    open my $in, '<', 'myfile.html' or carp "Can't open in $!\n";
    local $/ = undef;
    <$in>;
};

#--------------------------------------------------
#  Extract Element of HTML Table
#--------------------------------------------------
#print Dumper $temp_file ;

( my $id ) = $temp_file =~ /([\w]+\.[\w\d]+)/ms;
print "$id\n";

my $te = HTML::TableExtract->new(
    headers => [
        'Data set','nTP','nFP',
        'nFN','nTN','sTP',
        'sFP','sFN',' ','nSn',
        'nPPV','nSp','nPC',
        'nCC','sSn','sPPV',
        'sASP',
    ]
);

$te->parse($temp_file);
my @all_table_content = $te->tables;

# Here to extract the 'last' row
my @total = @{ $all_table_content[0]->[-1] };

print Dumper \@all_table_content ;
The HTML file (myfile.html) that I want to parse to obtain the TOTAL result looks like this:
<html>
<head>
<title> scrPage </title>
</head>
<!--  -->
<!-- jsp:setProperty name="manager" property="*" /-->
<body bgcolor="#ffffff">
<h1> Assessment Score </h1>
<b> Here is your confirmation ID: SP.A91389F67D1C79B4157818A8EDF2A6C2 </b>
<br>
<form method="get" action="http://wingless.cs.washington.edu:8080/assessment/servlet">
<input type="hidden" value="submission/SP.A91389F67D1C79B4157818A8EDF2A6C2" name="filenameID"/>
<input type="hidden" name="pageType" value="visualizationForm"/>
<br>
<INPUT TYPE=submit name="action" value="Visualize It">
<input type=submit name="action" value="Get Excel Spreadsheet"/>
<a href=http://bio.cs.washington.edu/assessment/statistics.html>statistics explanation
</form>
<Table border = 3>
<tr><th>Data set<td>nTP<td>nFP<td>nFN<td>nTN<td>sTP<td>sFP<td>sFN<td> <td>nSn<td>nPPV<td>nSp<td>nPC<td>nCC<td>sSn<td>sPPV<td>sASP<tr><th>dm01g<td>0<td>80<td>125<td>5795<td>0<td>8<td>7<td> <td>0<td>0<td>0.986383<td>0<td>-0.0169565<td>0<td>0<td>0
<tr><th>
<tr><th>Fly
<td>0<td>80<td>125<td>5795<td>0<td>8<td>7<td> <td>0<td>0<td>0.986383<td>0<td>-0.0169565<td>0<td>0<td>0
<tr><th>Human
<td>0<td>0<td>0<td>0<td>0<td>0<td>0<td> <td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN
<tr><th>Mouse
<td>0<td>0<td>0<td>0<td>0<td>0<td>0<td> <td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN
<tr><th>Yeast
<td>0<td>0<td>0<td>0<td>0<td>0<td>0<td> <td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN<td>NaN
<tr><th>Total
<td>0<td>80<td>125<td>5795<td>0<td>8<td>7<td> <td>0<td>0<td>0.986383<td>0<td>-0.0169565<td>0<td>0<td>0
</table>
</body>
</html>

Regards,
Edward

In reply to Problem Parsing with HTML::TableExtract by monkfan
