Please pardon me if this question has been well-hashed elsewhere (no pun intended), but I have been digging into this for a few days and have searched everywhere I could think of for the answer, to no significant avail. (I'm not even sure that means what I thought it did. :-)) I am using a script to query a website that contains data on clients. I send a POST with %form_data using LWP and store the response in $res like so:
$res = $ua->post($my_url, \%form_data);
The website I am posting to returns data about my customer, which I intend to use to decide what to POST next, receive more data, and so on. I have done this in the past with another website, where the response was a cleanly formatted XML list of variables and values that I could use directly:
$response = $xml->XMLin($res->content);
print "CallerLastName is:", $response->{CallerLastName}, "\n";
print "ReturnStatus is:", $response->{ReturnStatus}, "\n";
print "CallerPhoneNumber is:", $response->{CallerPhoneNumber}, "\n";
In this case, however, I am getting a very convoluted HTML table with numerous sub-tables, and it's a real mess. The data I need is "hidden" inside the table, within named tags:
<html>
<head>
  <title>Login Results</title>
</head>
<body>
  <h3>Database Results</h3>
  <table>
    <tr>
      <td>Return Status:&#160;<ReturnStatus>Done</ReturnStatus></td>
    </tr>
  </table>
  <table>
    <tr>
      <td>StateInfo:&#160;<StateInfo>0014378</StateInfo></td>
    </tr>
    <tr>
      <td>Multiple Sub Account:&#160;<MultipleSubAccount>N</MultipleSubAccount></td>
    </tr>
    <tr>
      <td>Subscription Count:&#160;<UserCount>1</UserCount></td>
    </tr>
    <tr>
      <td>Default Account:&#160;<DefaultAccount>114879</DefaultAccount></td>
    </tr>
    <tr>
      <td>Caller Phone Number:&#160;<CallerPhoneNumber>8005551212</CallerPhoneNumber></td>
    </tr>
    <tr>
      <td>Caller House Number:&#160;<CallerHouseNumber> 123</CallerHouseNumber></td>
    </tr>
    <tr>
      <td>Apartment Num:&#160;<ApartmentNum> </ApartmentNum></td>
    </tr>
    <tr>
      <td>Caller Salutation:&#160;<CallerSalutation></CallerSalutation></td>
    </tr>
    <tr>
      <td>Caller First Name:&#160;<CallerFirstName>JOHN</CallerFirstName></td>
    </tr>
    <tr>
      <td>Caller Last Name:&#160;<CallerLastName>SMITH</CallerLastName></td>
    </tr>
    <tr>
      <td>Salutation:&#160;<Salutation></Salutation></td>
    </tr>
    <tr>
      <td>FirstName:&#160;<FirstName>JOHN</FirstName></td>
    </tr>
    <tr>
      <td>MiddleInitial:&#160;<MiddleInitial></MiddleInitial></td>
    </tr>
    <tr>
      <td>LastName:&#160;<LastName>SMITH</LastName></td>
    </tr>
..........SNIP..........
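Because each value sits in its own named tag, one way to get the single-level result you had before is to ignore the table layout and collect every leaf tag into a flat hash. The sketch below does that with a plain core-Perl regex; extract_named_values is a hypothetical helper, not part of any module. It assumes the named tags carry no attributes and hold only plain text, as in the sample above. It will also pick up ordinary leaf tags such as <title> and <h3>, and if a tag name appears twice the last value wins. Regexes are brittle against arbitrary HTML, so treat this as a quick fix for pages shaped like this one.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every <Name>value</Name> pair whose body
# contains no further markup into one flat hash. Assumes the tags have no
# attributes and hold plain text only.
sub extract_named_values {
    my ($html) = @_;
    my %field;
    while ( $html =~ m{<(\w+)>([^<]*)</\1>}g ) {
        my ( $name, $value ) = ( $1, $2 );
        $value =~ s/^\s+|\s+$//g;    # trim padding such as " 123"
        $field{$name} = $value;
    }
    return \%field;
}

# Usage with a fragment of the page shown above.
my $html = '<td>Return Status:&#160;<ReturnStatus>Done</ReturnStatus></td>'
         . '<td>Caller House Number:&#160;<CallerHouseNumber> 123</CallerHouseNumber></td>'
         . '<td>Caller Last Name:&#160;<CallerLastName>SMITH</CallerLastName></td>';
my $field = extract_named_values($html);
print "CallerLastName is:", $field->{CallerLastName}, "\n";
```

In the real script you would call it as extract_named_values($res->decoded_content) after checking $res->is_success, and then read any field the same way you did with the flat XML.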

I am trying to pull out data such as the account number or name. What I am finding is that if I run it through XMLin like so:

$response = XMLin($res->content);

I get output like this:

$VAR1 = {
  'body' => {
    'table' => [
      {
        'tr' => {
          'td' => {
            'ReturnStatus' => 'Done',
            'content' => "Return Status:\x{a0}"
          }
        }
      },
      {
        'tr' => [
          {
            'td' => {
              'StateInfo' => '0014378',
              'content' => "StateInfo:\x{a0}"
            }
          },
          {
            'td' => {
              'MultipleSubAccount' => 'N',
              'content' => "Multiple Sub Account:\x{a0}"
            }
          },
          {
            'td' => {
              'UserCount' => '1',
              'content' => "User Count:\x{a0}"
            }
          },
          {
            'td' => {
              'content' => "Default Account:\x{a0}",
              'DefaultAccount' => '114879'
            }
          },
          {
..........SNIP..........
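If you would rather keep XMLin, the nesting can be flattened generically, so the same code works no matter how a page arranges its tables. A minimal sketch, assuming XML::Simple's default of storing mixed text under the key 'content' (here that holds the visible labels, so it is skipped); flatten_fields is a hypothetical helper. By default XML::Simple turns an empty element such as <CallerSalutation></CallerSalutation> into an empty hash, which this sketch silently skips; passing SuppressEmpty => '' to XMLin should bring those back as empty strings. Repeated tags that XMLin folds into an array of plain strings are also dropped here.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a nested XML::Simple-style structure (hashes and
# arrays) and copy every plain scalar into a single-level hash. The 'content'
# key holds the labels such as "Return Status:\x{a0}", so it is skipped.
sub flatten_fields {
    my ( $node, $out ) = @_;
    $out //= {};
    if ( ref $node eq 'HASH' ) {
        for my $key ( keys %$node ) {
            my $val = $node->{$key};
            if ( ref $val ) {
                flatten_fields( $val, $out );    # descend into body/table/tr/td
            }
            elsif ( $key ne 'content' ) {
                $out->{$key} = $val;             # a named value, keep it
            }
        }
    }
    elsif ( ref $node eq 'ARRAY' ) {
        flatten_fields( $_, $out ) for @$node;
    }
    return $out;
}

# Usage on a hand-built copy of the start of the dump above.
my $response = {
    body => {
        table => [
            { tr => { td => { ReturnStatus => 'Done', content => "Return Status:\x{a0}" } } },
            { tr => [
                { td => { StateInfo      => '0014378', content => "StateInfo:\x{a0}" } },
                { td => { DefaultAccount => '114879',  content => "Default Account:\x{a0}" } },
            ] },
        ],
    },
};
my $flat = flatten_fields($response);
print "DefaultAccount is:", $flat->{DefaultAccount}, "\n";
```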

Looking through this mess, it looks like I could probably walk it as a nested structure of hashes and arrays, but I was hoping for something easier.

Right now I am referencing data like so:

print "content is:", $response->{body}->{table}->[0]->{tr}->{td}->{content}, "\n";
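In the dump, 'table' is an array reference, so the path needs an index such as ->{table}->[0]. Likewise, 'tr' is a hash when a table has a single row but an array when it has several; that is XML::Simple's default folding, and it is a large part of why each page seems to need its own code. A minimal sketch that normalises both cases before looping; as_list and each_cell are hypothetical helpers. Alternatively, XMLin's ForceArray option (for example ForceArray => [ 'table', 'tr' ]) makes those elements arrays every time.

```perl
use strict;
use warnings;

# Hypothetical helper: XML::Simple gives a hashref for a single child and an
# arrayref for repeated children, so turn either into a plain list.
sub as_list {
    my ($x) = @_;
    return ()  unless defined $x;
    return @$x if ref $x eq 'ARRAY';
    return ($x);
}

# Hypothetical helper: call $callback->(name, value) for every named value,
# visiting tables, rows and cells in document order.
sub each_cell {
    my ( $response, $callback ) = @_;
    for my $table ( as_list( $response->{body}{table} ) ) {
        for my $row ( as_list( $table->{tr} ) ) {
            for my $cell ( as_list( $row->{td} ) ) {
                for my $name ( grep { $_ ne 'content' } sort keys %$cell ) {
                    $callback->( $name, $cell->{$name} );
                }
            }
        }
    }
}

my $response = { body => { table => [
    { tr => { td => { ReturnStatus => 'Done', content => "Return Status:\x{a0}" } } },
    { tr => [ { td => { StateInfo => '0014378', content => "StateInfo:\x{a0}" } } ] },
] } };

# Direct access to the first cell: note the [0] on 'table'.
print "content is:", $response->{body}{table}[0]{tr}{td}{content}, "\n";
each_cell( $response, sub { print "$_[0] is:$_[1]\n" } );
```

Passing a different callback, for example one that stores each pair into a hash, reuses the same walk for every page.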

But this seems like a real mess, and it looks like I am going to have to write a separate subroutine for every page I post to, since they are all formatted slightly differently. I was hoping to do it the way I have in the past, where I get everything into a single-level hash.

Hopefully I haven't confused anyone too much. Am I on the right track, or is there an easier way to grab this data? Thanks for any input, and please redirect me if I posted in the wrong place.


In reply to Parse HTML Code for hidden values by benchtoplabs
