in thread duplicate table with HTML::TreeBuilder look_down method

Thanks, everyone. Kcott, grep would work, but I am hoping to use the filters in look_down. Ken, the TD appears twice in my output, and I believe you hit the nail on the head. I am still working on my look_down filters. If I drop _tag => "td", I lose the tr because it wasn't in the filter. What is the correct syntax to nest multiple tags and classes in look_down filters? MY CODE:
my $h = HTML::TreeBuilder->new;
$h->parse_file($tsmin);
my @warnings = $h->look_down(
    _tag  => "td",
    class => qr/Alt(Warning|Error)/
);
foreach my $warning (@warnings) {
    my @filtered = $warning->as_HTML( );
    say "dump of my @filtered";
    say $fh2 @filtered;
}
Standard Input:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="TSM Reporting">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>TSM Operational Reporting</title>
</head>
<DIV class=HeaderBar>Daily Report TSM 24 hour Report for TSM1TSG generated at 2015-05-12 09:00:26 on DIRECTOR covering 2015-05-11 09:00:26 to 2015-05-12 09:00:25 </DIV>
<body>
<table border="0" width="100%%">
<DIV class=FooterBar>Server name: <a href="http://TSM1T.example.com:1880"> TSM1T</a>, platform: Linux/ppc64, version: 6.3.4.200, date/time: 05/12/2015 09:00:01</DIV>
<tr><td width="100%"><p>
<DIV class=HeaderBar>Client Schedules</DIV>
<TABLE class=HeaderFrame height=100 cellSpacing=0 cols=3 cellPadding=0 width="100%" border=0 align="left">
<TR vAlign=top height=100>
<TD vAlign=top width="100%" height="100">
<DIV style="overflow: auto; width: "100%"; height: 200; valign: top">
<TABLE cellSpacing=0 cols=4 cellPadding=0 width="100%" border=0 height="100">
<TR height=25 nowrap>
  <TD class=HeaderTitleNoVLine height="14" width="10">&nbsp;</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Status</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Results</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Schedule Start</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Actual Start</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Schedule Name</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Node Name</TD>
  <TD class=HeaderTitle noWrap align=left height="14">Domain Name</TD></TR>
<TR class=AltLight height=22>
  <TD class=AltLightNoVline align=middle height="17" width="10"> </TD>
  <TD class=AltLight align=left height="17">Completed</TD>
  <TD class=AltLight align=left height="17">Successful</TD>
  <TD class=AltLight align=left height="17">2015-05-11-17.00</TD>
  <TD class=AltLight align=left height="17">2015-05-11-17.10</TD>
  <TD class=AltLight align=left height="17">DAILYBACKUP_5PM</TD>
  <TD class=AltLight align=left height="17">ServerA</TD>
  <TD class=AltLight align=left height="17">ST10_DOMAIN</TD></TR>
<TR class=AltWarning height=22>
  <TD class=AltWarningNoVline align=middle height="17" width="10"> </TD>
  <TD class=AltWarning align=left height="17">Missed</TD>
  <TD class=AltWarning align=left height="17"></TD>
  <TD class=AltWarning align=left height="17">2015-05-11-18.00</TD>
  <TD class=AltWarning align=left height="17"></TD>
  <TD class=AltWarning align=left height="17">DAILYBACKUP_6PM</TD>
  <TD class=AltWarning align=left height="17">ServerB</TD>
  <TD class=AltWarning align=left height="17">ST10_DOMAIN</TD></TR>
<TR class=AltWarning height=22>
  <TD class=AltWarningNoVline align=middle height="17" width="10"> </TD>
  <TD class=AltWarning align=left height="17">Missed</TD>
  <TD class=AltWarning align=left height="17"></TD>
  <TD class=AltWarning align=left height="17">2015-05-11-18.00</TD>
  <TD class=AltWarning align=left height="17"></TD>
  <TD class=AltWarning align=left height="17">NJDLYBACKUP_6PM</TD>
  <TD class=AltWarning align=left height="17">ServerC</TD>
  <TD class=AltWarning align=left height="17">ST13_DOMAIN</TD></TR>
<TR class=AltDark height=22>
  <TD class=AltDarkNoVline align=middle height="17" width="10"> </TD>
  <TD class=AltDark align=left height="17">QATSWAS85</TD>
  <TD class=AltDark align=left height="17">37899</TD>
  <TD class=AltDark align=left height="17">104,113</TD>
  <TD class=AltDark align=left height="17">617</TD>
  <TD class=AltDark align=left height="17">0</TD>
  <TD class=AltDark align=left height="17">0</TD>
  <TD class=AltDark align=left height="17">0</TD>
  <TD class=AltDark align=left height="17">25</TD>
  <TD class=AltDark align=left height="17">13</TD>
  <TD class=AltDark align=left nowrap height="17">251.30 MB</TD>
  <TD class=AltDark align=left height="17">00:00:58</TD>
  <TD class=AltDark align=left height="17">4,378.98</TD>
  <TD class=AltDark align=left height="17">0%</TD>
</TR>
<TR class=AltLight height=22>
  <TD class=AltLightNoVline align=middle height="17" width="10"> </TD>
  <TD class=AltLight align=left height="17">ServerD</TD>
  <TD class=AltLight align=left height="17">38048</TD>
  <TD class=AltLight align=left height="17">31,461</TD>
  <TD class=AltLight align=left height="17">51</TD>
  <TD class=AltLight align=left height="17">0</TD>
  <TD class=AltLight align=left height="17">0</TD>
  <TD class=AltLight align=left height="17">0</TD>
  <TD class=AltLight align=left height="17">2</TD>
  <TD class=AltLight align=left height="17">2</TD>
  <TD class=AltLight align=left nowrap height="17">24.14 MB</TD>
  <TD class=AltLight align=left height="17">00:00:12</TD>
  <TD class=AltLight align=left height="17">1,946.00</TD>
  <TD class=AltLight align=left height="17">0%</TD>
</TR>
</TABLE>
</DIV></TD>
</TR></TABLE>
</td>
</tr>
<tr><td width="100%"><p>
MY OUTPUT:
<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>
<td align="left" class="AltWarning" height="17">Missed</td>
<td align="left" class="AltWarning" height="17"></td>
<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>
<td align="left" class="AltWarning" height="17"></td>
<td align="left" class="AltWarning" height="17">DAILYBACKUP_6PM</td>
<td align="left" class="AltWarning" height="17">ServerB</td>
<td align="left" class="AltWarning" height="17">ST10_DOMAIN</td>
<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>
<td align="left" class="AltWarning" height="17">Missed</td>
<td align="left" class="AltWarning" height="17"></td>
<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>
<td align="left" class="AltWarning" height="17"></td>
<td align="left" class="AltWarning" height="17">NJDLYBACKUP_6PM</td>
<td align="left" class="AltWarning" height="17">ServerC</td>
<td align="left" class="AltWarning" height="17">ST13_DOMAIN</td>

Re^3: duplicate table with HTML::TreeBuilder look_down method
by jeffa (Bishop) on May 14, 2015 at 17:22 UTC

    I see what you are trying to do now. You want the first set of <td> elements to be separated from the second set (and any others that might happen to match the search term), correct? There are quite a few ways to do that. This one captures all matching <tr> and <td> elements, then uses each <tr> element as the signal to start a new anonymous array reference for the <td> elements that follow it:
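
A minimal sketch of the approach described above, not necessarily the code as originally posted. It assumes HTML::TreeBuilder is installed; the cut-down sample HTML, the helper name group_cells_by_row, and the variable names are hypothetical stand-ins. The look_down criteria are ANDed together, so a regex on _tag is one way to accept more than one tag in a single call.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;

# Hypothetical, cut-down sample of the report (the real input is much larger).
my $html = <<'HTML';
<table>
<tr class=AltLight><td class=AltLight>Completed</td><td class=AltLight>ServerA</td></tr>
<tr class=AltWarning><td class=AltWarningNoVline></td><td class=AltWarning>Missed</td><td class=AltWarning>ServerB</td></tr>
<tr class=AltWarning><td class=AltWarningNoVline></td><td class=AltWarning>Missed</td><td class=AltWarning>ServerC</td></tr>
</table>
HTML

# Hypothetical helper: returns a list of array references, one per
# matching <tr>, each holding the HTML of that row's matching <td>s.
sub group_cells_by_row {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # Both criteria must hold: the tag is tr or td AND the class matches.
    # look_down returns matches in document order, so each <tr> comes
    # before the <td> elements it contains.
    my @elements = $tree->look_down(
        _tag  => qr/^t[rd]$/,
        class => qr/Alt(Warning|Error)/,
    );

    my @tds;
    for my $el (@elements) {
        if ($el->tag eq 'tr') {
            push @tds, [];                      # a new row starts here
        }
        else {
            push @tds, [] unless @tds;          # guard: <td> with no matching <tr>
            push @{ $tds[-1] }, $el->as_HTML;   # add the cell to the current row
        }
    }

    $tree->delete;    # release the parse tree
    return @tds;
}

my @tds = group_cells_by_row($html);
print Dumper(\@tds);
```

An alternative with the same effect would be to call look_down for the matching <tr> elements first and then call look_down again on each row for its <td> elements.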

    Output:

    $VAR1 = [
      [
        '<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>',
        '<td align="left" class="AltWarning" height="17">Missed</td>',
        '<td align="left" class="AltWarning" height="17"></td>',
        '<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>',
        '<td align="left" class="AltWarning" height="17"></td>',
        '<td align="left" class="AltWarning" height="17">DAILYBACKUP_6PM</td>',
        '<td align="left" class="AltWarning" height="17">ServerB</td>',
        '<td align="left" class="AltWarning" height="17">ST10_DOMAIN</td>'
      ],
      [
        '<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>',
        '<td align="left" class="AltWarning" height="17">Missed</td>',
        '<td align="left" class="AltWarning" height="17"></td>',
        '<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>',
        '<td align="left" class="AltWarning" height="17"></td>',
        '<td align="left" class="AltWarning" height="17">NJDLYBACKUP_6PM</td>',
        '<td align="left" class="AltWarning" height="17">ServerC</td>',
        '<td align="left" class="AltWarning" height="17">ST13_DOMAIN</td>'
      ]
    ];

    There is lots of room for improvement in the code that I wrote, but hopefully this works for you or at least helps you realize your goal.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Thanks! I am trying to loop through the array ref, but I am not able to. Am I missing something? My code:
      foreach my $td (\@tds) { say $td; }
      My output:
      ARRAY(0x2ae4b40)
        "I am trying to loop through the array ref, but I am not able to. Am I missing something?"

        Your problem is looping through an arrayref!

        An arrayref is a single scalar. In this case: "ARRAY(0x2ae4b40)".

        You probably want to loop through the actual array (@tds), not the arrayref (\@tds).

        This will output each element of @tds:

        for my $td (@tds) { say $td; }

        -- Ken
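
If @tds holds one array reference per row, as the Dumper output earlier in the thread suggests, then each $td in a single loop over @tds is itself an array reference, and an inner loop is needed to reach the cell strings. A small sketch under that assumption; the sample data and the helper flatten_rows are made up for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for @tds: an array whose elements are
# array references, one per table row.
my @tds = (
    [ '<td>Missed</td>', '<td>ServerB</td>' ],
    [ '<td>Missed</td>', '<td>ServerC</td>' ],
);

# Hypothetical helper: return every cell string from a list of
# row array references, in order.
sub flatten_rows {
    my @rows = @_;
    return map { @$_ } @rows;
}

for my $row (@tds) {        # $row is an array reference
    for my $td (@$row) {    # @$row dereferences it to the list of cells
        say $td;
    }
}
```

Looping over \@tds instead passes a one-element list (the reference itself) to foreach, which is why only a single ARRAY(0x...) line is printed.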