in reply to How to restore the colors of a webpage when extracting data

G'day ghosh123,

From looking at the documentation, it would appear there are three basic steps you'll need to follow:

  1. Use HTML::TableExtract to get the table cells as HTML::Element objects.
  2. Use HTML::Element to get the attributes of each cell.
  3. Use the optional $format parameter of Spreadsheet::WriteExcel's write($row, $column, $token, $format) method to apply the format you retrieved from the HTML table. (See CELL FORMATTING for details.)
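
The three steps above can be sketched roughly as follows. Treat this as a minimal sketch under a few assumptions, not a finished solution: it assumes HTML::TableExtract is loaded in tree mode so that rows() hands back HTML::Element objects (as its documentation describes), that each cell's colour lives in a <font color="..."> tag inside the cell, and that the colour name is one Spreadsheet::WriteExcel accepts. The sample markup, the font_color_of() helper, the %known_color list and the table.xls filename are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTML::TableExtract qw(tree);    # tree mode: cells are HTML::Element objects
use Spreadsheet::WriteExcel;

# Hypothetical sample page; substitute the HTML you actually fetched.
my $html = <<'END_HTML';
<html><head><title>Example table</title></head>
<body>
<table border="1">
  <tr>
    <td><font color="green">ID</font></td>
    <td><font color="blue">NAME</font></td>
    <td><font color="RED">DOB</font></td>
  </tr>
  <tr><td>1</td><td>XYZ</td><td>1-1-2000</td></tr>
  <tr><td>2</td><td>PQR</td><td>1-11-2000</td></tr>
</table>
</body></html>
END_HTML

# Colour names assumed to be accepted by Spreadsheet::WriteExcel formats.
my %known_color = map { $_ => 1 } qw(
    black blue brown cyan gray green lime magenta
    navy orange pink purple red silver white yellow
);

# Hypothetical helper: return the lower-cased colour of the first <font>
# found in a cell, or undef if there is none or the name is not recognised.
sub font_color_of {
    my ($cell) = @_;
    my $font = $cell->look_down(_tag => 'font') or return;
    my $color = lc($font->attr('color') // '');
    return $known_color{$color} ? $color : undef;
}

# Step 1: parse the page and fetch the first table.
my $te = HTML::TableExtract->new;
$te->parse($html);
$te->eof;

my $table = $te->first_table_found
    or die "No table found; check that the HTML actually reached parse()\n";

my $workbook = Spreadsheet::WriteExcel->new('table.xls')
    or die "Cannot create table.xls: $!\n";
my $worksheet = $workbook->add_worksheet;

my %format_for;    # one format object per colour, created on demand

my $row_index = 0;
for my $row ($table->rows) {
    my $col_index = 0;
    for my $cell (@$row) {
        if (defined $cell) {    # cells covered by rowspan/colspan are undef
            my $text = $cell->as_text;
            $text =~ s/^\s+|\s+$//g;

            # Step 2: read the colour attribute from the cell's element.
            my $color = font_color_of($cell);

            # Step 3: pass a matching format as write()'s fourth argument.
            if (defined $color) {
                $format_for{$color} //= $workbook->add_format(color => $color);
                $worksheet->write($row_index, $col_index, $text,
                                  $format_for{$color});
            }
            else {
                $worksheet->write($row_index, $col_index, $text);
            }
        }
        $col_index++;
    }
    $row_index++;
}

$workbook->close;
```

Note the explicit check on first_table_found: if no table was parsed, the script stops with a message rather than failing later on an undefined $table. If your page sets colours some other way (bgcolor attributes, inline style, CSS classes), font_color_of() is the only part that should need changing.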

[Aside: I'd recommend you look at using more meaningful variable names. You appear to be assigning cell data to a variable you've called $col. That variable would suggest column, not cell, data. As such, that's a potential source of confusion and errors either now or in the future.]

-- Ken

Re^2: How to restore the colors of a webpage when extracting data
by ghosh123 (Monk) on Jul 17, 2013 at 08:35 UTC

    Hi Ken

    Thanks for your reply.

    But I am not quite able to figure out how to use HTML::TableExtract to get the table cells as HTML::Element objects.

    I have given a very simple HTML table below. Using the code I have already posted above, can you please show me how I can use HTML::Element to get the attributes?

    Assume that the page contains the following table :

    <html>
    <title>Example table </title>
    <head> MyTable</head><br><br>
    <body>
    <table border border ="1">
      <tr>
        <td><FONT COLOR = "green">ID</td>
        <td> <FONT COLOR = "blue">NAME</td>
        <td> <FONT COLOR = "RED">DOB</td>
      </tr>
      <tr>
        <td>1</td>
        <td>XYZ</td>
        <td>1-1-2000</td>
      </tr>
      <tr>
        <td>2</td>
        <td>PQR</td>
        <td>1-11-2000</td>
      </tr>
    </table>
    </body>
    </html>
      "But I am not quite able to figure out how to use HTML::TableExtract to get the table cells as HTML::Element objects."

      Really? It's mentioned repeatedly throughout the HTML::TableExtract documentation (a link I provided yesterday). Within that page, search for the string "HTML::Element" and read every section it appears in (which, by a very brief visual inspection, appears to be all of them except AUTHOR and COPYRIGHT).

      -- Ken

        Well, I was following the instructions on that HTML::TableExtract page, but the $te->first_table_found call did not return anything. As a result, the $table variable was undefined and the rest of the code did not run.
        It would be helpful if you could point out my mistakes.