vxp has asked for the wisdom of the Perl Monks concerning the following question:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html lang="en-US"><head><title>No title</title>
<link rev="made" href="mailto:noreply%40domain.com">
<link rel="stylesheet" type="text/css" href="/css/eman.css">
<script language="javascript" src="/js/eman.js" type="text/javascript"></script>
</head><body onload="eman_form_focus(); " bgcolor="#FFFFFF">
<!-- HOSTNAME: hostname.domain.com -->
<form action=View.pcgi>
<table border=0 bgcolor='#cccc99' width='100%'><tr><td align=left><font size=+1> WLC download configuration jobs<//font></td><td align=right><input type="submit" name="selector.showb" value="Use Selection Bar"></td></tr></table><hr width='100%' noshade><table width='625' cellspacing=0 cellmarging=0 cellpadding=10 border=1>
<table class="datatbl" border="1">
<tr align="left" valign="top">
<th class="colhdrinactive" style="text-align : left; "><a class="colsortlink" href="http://hostname.domain.com/OPDATA/Config/View.pcgi?table_sid=bbe93377ed6087e8fa79f7a135af7b2a&table_seq=2&sortby=0&startrow=0&DEVICE=WLC&JOB_TYPE=download&TITLE=1&JOB_STATUS=any" title="Job Description">Job Description</a></th>
<th class="colhdrinactive" style="text-align : left; "><a class="colsortlink" href="http://hostname.domain.com/OPDATA/Config/View.pcgi?table_sid=bbe93377ed6087e8fa79f7a135af7b2a&table_seq=2&sortby=1&startrow=0&DEVICE=WLC&JOB_TYPE=download&TITLE=1&JOB_STATUS=any" title="Job Owner">Job Owner</a></th>
<th class="colhdrinactive" style="text-align : left; "><a class="colsortlink" href="http://hostname.domain.com/OPDATA/Config/View.pcgi?table_sid=bbe93377ed6087e8fa79f7a135af7b2a&table_seq=2&sortby=2&startrow=0&DEVICE=WLC&JOB_TYPE=download&TITLE=1&JOB_STATUS=any" title="Job Status">Job Status</a></th>
<th class="colhdrinactive" style="text-align : left; "><a class="colsortlink" href="http://hostname.domain.com/OPDATA/Config/View.pcgi?table_sid=bbe93377ed6087e8fa79f7a135af7b2a&table_seq=2&sortby=3&startrow=0&DEVICE=WLC&JOB_TYPE=download&TITLE=1&JOB_STATUS=any" title="Timestamp">Timestamp</a></th>
</tr>
<tr class="row1" valign="top" align="left">
<td><a href='Modify.pcgi?bottom=1&SESSION_ID=41f647b1a8c1e6ad9f8bd25672459223'>WLC Download Summary</a></td>
<td>eman</td>
<td>running</td>
<td>19:13:19 21/Jun/2009 EDT</td>
</tr>
<tr class="row2" valign="top" align="left">
<td><a href='Modify.pcgi?bottom=1&SESSION_ID=b533b920ee57d39133edf75c234e8ffc'>WLC Download Summary</a></td>
<td>eman</td>
<td>running</td>
<td>19:55:45 20/Jun/2009 EDT</td>
</tr>
<tr class="row1" valign="top" align="left">
<td><a href='Modify.pcgi?bottom=1&SESSION_ID=0ce53c9933be10114e6da3b90940f458'>WLC Download Summary</a></td>
<td>eman</td>
<td>running</td>
<td>19:51:41 19/Jun/2009 EDT</td>
</tr>
... and some more stuff just like above.
The task at hand is to get the _first_ "SESSION_ID" value from that HTML. In the example above, that'd be "SESSION_ID=41f647b1a8c1e6ad9f8bd25672459223".
I've looked at various CPAN modules, such as HTML::TreeBuilder, but I'm not sure how to pick out that particular field, seeing how there's nothing unique about it other than it always being the first entry in that HTML input.
Any input/suggestions appreciated!
PS. Essentially, I guess, I'm looking for the Perl equivalent of the following command (cleaner that way :) ):
grep SESSION val.html | awk -F'=' '{ print $4 }' | awk -F"'" '{ print $1 }' | head -1
[root@mybox ~]# grep SESSION val.html | awk -F'=' '{ print $4 }' | awk -F"'" '{ print $1 }' | head -1
41f647b1a8c1e6ad9f8bd25672459223
[root@mybox ~]#
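
For what it's worth, the shell pipeline above can probably be covered by a single regex match in core Perl, since a non-global match stops at the first hit in document order. This is only a sketch under a couple of assumptions: the session ID is always a run of hex digits (as in the sample rows), and the helper name `first_session_id` and the inline sample string are made up for illustration rather than taken from the original page or from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first SESSION_ID value in a chunk of
# HTML, or undef if none is present.
# Assumption: the ID is a run of hex digits, as in the sample rows.
sub first_session_id {
    my ($html) = @_;
    # Without /g the regex engine stops at the first match it finds.
    return $html =~ /SESSION_ID=([0-9a-fA-F]+)/ ? $1 : undef;
}

# Stand-in for the contents of val.html (two rows from the sample above).
my $html = <<'HTML';
<td><a href='Modify.pcgi?bottom=1&SESSION_ID=41f647b1a8c1e6ad9f8bd25672459223'>WLC Download Summary</a></td>
<td><a href='Modify.pcgi?bottom=1&SESSION_ID=b533b920ee57d39133edf75c234e8ffc'>WLC Download Summary</a></td>
HTML

my $id = first_session_id($html);
print defined $id ? "$id\n" : "no SESSION_ID found\n";
```

Run as-is, this prints 41f647b1a8c1e6ad9f8bd25672459223, the same value the grep/awk pipeline returns. To use it on the real file, read val.html into `$html` instead of the inline sample. As a one-liner, something like `perl -ne 'if (/SESSION_ID=([0-9a-f]+)/) { print "$1\n"; last }' val.html` should behave the same way. A regex is enough here only because "first occurrence in the file" is the whole requirement; if the page layout changes, a parser-based approach with a module such as HTML::TreeBuilder would likely hold up better.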
Re: Parsing HTML to get a value from a specific table row
by locked_user sundialsvc4 (Abbot) on Jun 22, 2009 at 15:47 UTC

Re: Parsing HTML to get a value from a specific table row
by metaperl (Curate) on Jun 22, 2009 at 15:34 UTC
by vxp (Pilgrim) on Jun 22, 2009 at 20:30 UTC

Re: Parsing HTML to get a value from a specific table row
by Anonymous Monk on Jun 22, 2009 at 15:27 UTC