 
PerlMonks  

Re: duplicate table with HTML::TreeBuilder look_down method

by codiac (Beadle)
on May 14, 2015 at 10:48 UTC ( #1126642=note )


in reply to duplicate table with HTML::TreeBuilder look_down method

It's duplicated because the tr has the same class as the tds. The first node matched is the tr itself, whose HTML contains all of its tds, and then the tds themselves make up the rest of the matched nodes. Just add an extra parameter (_tag => 'td') to the look_down call so it only searches for td elements.
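A minimal sketch of the duplication and the fix, using a hypothetical two-row fragment in place of the full report (it assumes HTML::TreeBuilder from the HTML-Tree distribution on CPAN is installed):

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;   # CPAN, HTML-Tree distribution

# Hypothetical cut-down stand-in for the report:
# the <tr> carries the same class as its <td>s.
my $html = <<'HTML';
<table>
<tr class="AltWarning"><td class="AltWarning">Missed</td><td class="AltWarning">ServerB</td></tr>
<tr class="AltLight"><td class="AltLight">Completed</td><td class="AltLight">ServerA</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Class alone matches the <tr> as well as each <td> inside it
# (look_down walks the tree in pre-order, so the <tr> comes first).
my @tags = map { $_->tag } $tree->look_down(class => qr/AltWarning/);

# Adding _tag => 'td' limits the matches to the cells.
my @texts = map { $_->as_text }
            $tree->look_down(_tag => 'td', class => qr/AltWarning/);

$tree->delete;   # release the parse tree when done

say "class only: @tags";    # class only: tr td td
say "td only:    @texts";   # td only:    Missed ServerB
```

Printing `as_HTML` for every class-only match would emit each cell twice: once inside the `<tr>` and once on its own.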

Re^2: duplicate table with HTML::TreeBuilder look_down method
by mazdajai (Novice) on May 14, 2015 at 16:43 UTC
    Thanks everyone. kcott, grep would work, but I am hoping to use the filters in look_down. Ken, the TD appears twice in my output and I believe you hit the nail on the head. I am still working on my look_down filters. If I drop _tag => "td", I lose the tr because it isn't in the filter. What is the correct syntax to combine multiple tags and classes in look_down filters? MY CODE:
    my $h = HTML::TreeBuilder->new;
    $h->parse_file($tsmin);
    my @warnings = $h->look_down(
        _tag  => "td",
        class => qr/Alt(Warning|Error)/
    );
    foreach my $warning (@warnings) {
        my @filtered = $warning->as_HTML( );
        say "dump of my @filtered";
        say $fh2 @filtered;
    }
    Standard Input:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta name="GENERATOR" content="TSM Reporting">
    <meta name="ProgId" content="FrontPage.Editor.Document">
    <title>TSM Operational Reporting</title>
    </head>
    <DIV class=HeaderBar>Daily Report TSM 24 hour Report for TSM1TSG generated at 2015-05-12 09:00:26 on DIRECTOR covering 2015-05-11 09:00:26 to 2015-05-12 09:00:25 </DIV>
    <body>
    <table border="0" width="100%%">
    <DIV class=FooterBar>Server name: <a href="http://TSM1T.example.com:1880"> TSM1T</a>, platform: Linux/ppc64, version: 6.3.4.200, date/time: 05/12/2015 09:00:01</DIV>
    <tr><td width="100%"><p>
    <DIV class=HeaderBar>Client Schedules</DIV>
    <TABLE class=HeaderFrame height=100 cellSpacing=0 cols=3 cellPadding=0 width="100%" border=0 align="left">
    <TR vAlign=top height=100>
    <TD vAlign=top width="100%" height="100">
    <DIV style="overflow: auto; width: "100%"; height: 200; valign: top">
    <TABLE cellSpacing=0 cols=4 cellPadding=0 width="100%" border=0 height="100">
    <TR height=25 nowrap> <TD class=HeaderTitleNoVLine height="14" width="10">&nbsp;</TD> <TD class=HeaderTitle noWrap align=left height="14">Status</TD> <TD class=HeaderTitle noWrap align=left height="14">Results</TD> <TD class=HeaderTitle noWrap align=left height="14">Schedule Start</TD> <TD class=HeaderTitle noWrap align=left height="14">Actual Start</TD> <TD class=HeaderTitle noWrap align=left height="14">Schedule Name</TD> <TD class=HeaderTitle noWrap align=left height="14">Node Name</TD> <TD class=HeaderTitle noWrap align=left height="14">Domain Name</TD></TR>
    <TR class=AltLight height=22> <TD class=AltLightNoVline align=middle height="17" width="10"> </TD> <TD class=AltLight align=left height="17">Completed</TD> <TD class=AltLight align=left height="17">Successful</TD> <TD class=AltLight align=left height="17">2015-05-11-17.00</TD> <TD class=AltLight align=left height="17">2015-05-11-17.10</TD> <TD class=AltLight align=left height="17">DAILYBACKUP_5PM</TD> <TD class=AltLight align=left height="17">ServerA</TD> <TD class=AltLight align=left height="17">ST10_DOMAIN</TD></TR>
    <TR class=AltWarning height=22> <TD class=AltWarningNoVline align=middle height="17" width="10"> </TD> <TD class=AltWarning align=left height="17">Missed</TD> <TD class=AltWarning align=left height="17"></TD> <TD class=AltWarning align=left height="17">2015-05-11-18.00</TD> <TD class=AltWarning align=left height="17"></TD> <TD class=AltWarning align=left height="17">DAILYBACKUP_6PM</TD> <TD class=AltWarning align=left height="17">ServerB</TD> <TD class=AltWarning align=left height="17">ST10_DOMAIN</TD></TR>
    <TR class=AltWarning height=22> <TD class=AltWarningNoVline align=middle height="17" width="10"> </TD> <TD class=AltWarning align=left height="17">Missed</TD> <TD class=AltWarning align=left height="17"></TD> <TD class=AltWarning align=left height="17">2015-05-11-18.00</TD> <TD class=AltWarning align=left height="17"></TD> <TD class=AltWarning align=left height="17">NJDLYBACKUP_6PM</TD> <TD class=AltWarning align=left height="17">ServerC</TD> <TD class=AltWarning align=left height="17">ST13_DOMAIN</TD></TR>
    <TR class=AltDark height=22> <TD class=AltDarkNoVline align=middle height="17" width="10"> </TD> <TD class=AltDark align=left height="17">QATSWAS85</TD> <TD class=AltDark align=left height="17">37899</TD> <TD class=AltDark align=left height="17">104,113</TD> <TD class=AltDark align=left height="17">617</TD> <TD class=AltDark align=left height="17">0</TD> <TD class=AltDark align=left height="17">0</TD> <TD class=AltDark align=left height="17">0</TD> <TD class=AltDark align=left height="17">25</TD> <TD class=AltDark align=left height="17">13</TD> <TD class=AltDark align=left nowrap height="17">251.30 MB</TD> <TD class=AltDark align=left height="17">00:00:58</TD> <TD class=AltDark align=left height="17">4,378.98</TD> <TD class=AltDark align=left height="17">0%</TD> </TR>
    <TR class=AltLight height=22> <TD class=AltLightNoVline align=middle height="17" width="10"> </TD> <TD class=AltLight align=left height="17">ServerD</TD> <TD class=AltLight align=left height="17">38048</TD> <TD class=AltLight align=left height="17">31,461</TD> <TD class=AltLight align=left height="17">51</TD> <TD class=AltLight align=left height="17">0</TD> <TD class=AltLight align=left height="17">0</TD> <TD class=AltLight align=left height="17">0</TD> <TD class=AltLight align=left height="17">2</TD> <TD class=AltLight align=left height="17">2</TD> <TD class=AltLight align=left nowrap height="17">24.14 MB</TD> <TD class=AltLight align=left height="17">00:00:12</TD> <TD class=AltLight align=left height="17">1,946.00</TD> <TD class=AltLight align=left height="17">0%</TD> </TR>
    </TABLE> </DIV></TD> </TR></TABLE> </td> </tr> <tr><td width="100%"><p>
    MY OUTPUT:
    <td align="middle" class="AltWarningNoVline" height="17" width="10"></td>
    <td align="left" class="AltWarning" height="17">Missed</td>
    <td align="left" class="AltWarning" height="17"></td>
    <td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>
    <td align="left" class="AltWarning" height="17"></td>
    <td align="left" class="AltWarning" height="17">DAILYBACKUP_6PM</td>
    <td align="left" class="AltWarning" height="17">ServerB</td>
    <td align="left" class="AltWarning" height="17">ST10_DOMAIN</td>
    <td align="middle" class="AltWarningNoVline" height="17" width="10"></td>
    <td align="left" class="AltWarning" height="17">Missed</td>
    <td align="left" class="AltWarning" height="17"></td>
    <td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>
    <td align="left" class="AltWarning" height="17"></td>
    <td align="left" class="AltWarning" height="17">NJDLYBACKUP_6PM</td>
    <td align="left" class="AltWarning" height="17">ServerC</td>
    <td align="left" class="AltWarning" height="17">ST13_DOMAIN</td>

      I see what you are trying to do now. You want the first set of <td> elements to be separated from the second set (and from any others that happen to match the search term), correct? There are quite a few ways to do that. This one captures all <tr> and <td> elements and then uses each <tr> element as the signal to put the following <td> elements into a new anonymous array reference:
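A minimal sketch of the approach just described (a reconstruction for illustration, not the original code from this reply). It assumes HTML::TreeBuilder is installed and uses a hypothetical cut-down stand-in for the report. It also bears on the "multiple tags" question above: `_tag` accepts a `qr//` regex just as `class` does, so one look_down call can match both tags.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Data::Dumper;

# Hypothetical stand-in for the report: one normal row, two warning rows.
my $html = <<'HTML';
<table>
<tr class="AltLight"><td class="AltLight">Completed</td><td class="AltLight">ServerA</td></tr>
<tr class="AltWarning"><td class="AltWarningNoVline"></td><td class="AltWarning">Missed</td><td class="AltWarning">ServerB</td></tr>
<tr class="AltWarning"><td class="AltWarningNoVline"></td><td class="AltWarning">Missed</td><td class="AltWarning">ServerC</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# One call finds the matching <tr>s and <td>s, in document (pre-)order,
# so each <tr> is returned just before its own cells.
my @nodes = $tree->look_down(
    _tag  => qr/^t[rd]$/,
    class => qr/Alt(?:Warning|Error)/,
);

my (@rows, $current_tr);
for my $node (@nodes) {
    if ($node->tag eq 'tr') {
        $current_tr = $node;       # a matching <tr> opens a new group ...
        push @rows, [];
    }
    elsif ($current_tr && $node->parent == $current_tr) {
        push @{ $rows[-1] }, $node->as_HTML;   # ... and its <td>s fill it
    }
}
$tree->delete;   # release the parse tree

$Data::Dumper::Indent = 1;
print Dumper(\@rows);
```

For the real report, `HTML::TreeBuilder->new_from_file($path)` can replace `new_from_content`. The parent check is a design choice: it keeps a matching <td> whose own <tr> did not match from being appended to the previous group.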

      Output:

      $VAR1 = [
                [
                  '<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>',
                  '<td align="left" class="AltWarning" height="17">Missed</td>',
                  '<td align="left" class="AltWarning" height="17"></td>',
                  '<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>',
                  '<td align="left" class="AltWarning" height="17"></td>',
                  '<td align="left" class="AltWarning" height="17">DAILYBACKUP_6PM</td>',
                  '<td align="left" class="AltWarning" height="17">ServerB</td>',
                  '<td align="left" class="AltWarning" height="17">ST10_DOMAIN</td>'
                ],
                [
                  '<td align="middle" class="AltWarningNoVline" height="17" width="10"></td>',
                  '<td align="left" class="AltWarning" height="17">Missed</td>',
                  '<td align="left" class="AltWarning" height="17"></td>',
                  '<td align="left" class="AltWarning" height="17">2015-05-11-18.00</td>',
                  '<td align="left" class="AltWarning" height="17"></td>',
                  '<td align="left" class="AltWarning" height="17">NJDLYBACKUP_6PM</td>',
                  '<td align="left" class="AltWarning" height="17">ServerC</td>',
                  '<td align="left" class="AltWarning" height="17">ST13_DOMAIN</td>'
                ]
              ];

      There is lots of room for improvement in the code that I wrote, but hopefully this works for you or at least helps you reach your goal.

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        Thanks! I am trying to loop through the array ref but I am not able to. Am I missing something? My code:
        foreach my $td (\@tds) { say $td; }
        My output:
        ARRAY(0x2ae4b40)
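`\@tds` takes a reference to the whole array, so the loop runs once over that single reference and `say` prints its address. A minimal sketch of dereferencing instead, assuming `@tds` holds one array reference per row like the `$VAR1` dump above (the cell values here are hypothetical stand-ins for the `<td>` strings):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for the grouped data:
# @tds holds one array reference per warning row.
my @tds = (
    [ 'Missed', '2015-05-11-18.00', 'DAILYBACKUP_6PM', 'ServerB' ],
    [ 'Missed', '2015-05-11-18.00', 'NJDLYBACKUP_6PM', 'ServerC' ],
);

# (\@tds) is a one-element list holding a reference to the whole array,
# so `foreach my $td (\@tds)` runs once and prints ARRAY(0x...).

# Loop over the array itself; each element is an array ref to dereference.
foreach my $row (@tds) {
    foreach my $td (@$row) {      # @$row is shorthand for @{ $row }
        say $td;
    }
    say '--';                     # separator between rows
}

# Or index directly: the third cell of the second row.
say $tds[1][2];                   # NJDLYBACKUP_6PM
```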
