benchtoplabs has asked for the wisdom of the Perl Monks concerning the following question:
The website I am posting to is returning data about my customer, which I intend to use to decide what to "post" next, receiving more data, and so on. I have done this in the past with another website, where I was able to return and use the data as a cleanly formatted XML list of variables and values:

    $res = $ua->post($my_url, \%form_data);
    $response = $xml->XMLin($res->content);
    print "CallerLastName is:", $response->{CallerLastName}, "\n";
    print "ReturnStatus is:", $response->{ReturnStatus}, "\n";
    print "CallerPhoneNumber is:", $response->{CallerPhoneNumber}, "\n";

In this case, I am getting a very convoluted HTML table with numerous sub-tables, and it's a real mess. The data I need is "hidden" within the table, inside named tags:
<html>
<head>
<title>Login Results</title>
</head>
<body>
<h3>Database Results</h3>
<table>
<tr> <td>Return Status: <ReturnStatus>Done</ReturnStatus></td> </tr>
</table>
<table>
<tr> <td>StateInfo: <StateInfo>0014378</StateInfo></td> </tr>
<tr> <td>Multiple Sub Account: <MultipleSubAccount>N</MultipleSubAccount></td> </tr>
<tr> <td>Subscription Count: <UserCount>1</UserCount></td> </tr>
<tr> <td>Default Account: <DefaultAccount>114879</DefaultAccount></td> </tr>
<tr> <td>Caller Phone Number: <CallerPhoneNumber>8005551212</CallerPhoneNumber></td> </tr>
<tr> <td>Caller House Number: <CallerHouseNumber> 123</CallerHouseNumber></td> </tr>
<tr> <td>Apartment Num: <ApartmentNum> </ApartmentNum></td> </tr>
<tr> <td>Caller Salutation: <CallerSalutation></CallerSalutation></td> </tr>
<tr> <td>Caller First Name: <CallerFirstName>JOHN</CallerFirstName></td> </tr>
<tr> <td>Caller Last Name: <CallerLastName>SMITH</CallerLastName></td> </tr>
<tr> <td>Salutation: <Salutation></Salutation></td> </tr>
<tr> <td>FirstName: <FirstName>JOHN</FirstName></td> </tr>
<tr> <td>MiddleInitial: <MiddleInitial></MiddleInitial></td> </tr>
<tr> <td>LastName: <LastName>SMITH</LastName></td> </tr>
..........SNIP..........
I am trying to pull out data such as their account number or name. What I am finding is that if I bring it through XMLin like so:

    $response = XMLin($res->content);

I get output like this:
$VAR1 = {
  'body' => {
    'table' => [
      {
        'tr' => {
          'td' => {
            'ReturnStatus' => 'Done',
            'content' => "Return Status:\x{a0}"
          }
        }
      },
      {
        'tr' => [
          {
            'td' => {
              'StateInfo' => '0014378',
              'content' => "StateInfo:\x{a0}"
            }
          },
          {
            'td' => {
              'MultipleSubAccount' => 'N',
              'content' => "Multiple Sub Account:\x{a0}"
            }
          },
          {
            'td' => {
              'UserCount' => '1',
              'content' => "User Count:\x{a0}"
            }
          },
          {
            'td' => {
              'content' => "Default Account:\x{a0}",
              'DefaultAccount' => '114879'
            }
          },
          {
..........SNIP..........
Looking through this mess, it looks like I can probably use it as a 3-dimensional array, but I was hoping for something easier.
Right now I am referencing data like so:
print "content is:", $response->{body}->{table}->{tr}->{td}->{content}, "\n";
But this seems like a real mess, and it looks like I am going to have to write a separate subroutine for every different page I post to, as they are all formatted slightly differently. I was hoping to do it the way I have in the past, where I can get the data into a single-level XML array.
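To make the "single level" idea concrete, here is a minimal sketch of one way to collapse a nested structure like the dump above into a flat hash. It is only an illustration under assumptions: `flatten_fields` is a hypothetical helper name, the sample data is hand-built to mirror the first few entries of the dump, and it assumes each data tag name appears only once per page (a later duplicate would overwrite an earlier one).

```perl
use strict;
use warnings;

# Hypothetical helper: walk a nested hash/array structure (such as the
# one XMLin returns) and copy every plain scalar value into one flat
# hash, skipping the 'content' keys that hold the visible label text.
sub flatten_fields {
    my ($node, $flat) = @_;
    $flat ||= {};
    if (ref $node eq 'HASH') {
        for my $key (keys %$node) {
            my $value = $node->{$key};
            if (ref $value) {
                flatten_fields($value, $flat);    # descend into tr/td levels
            }
            elsif ($key ne 'content') {
                $flat->{$key} = $value;           # a named data tag
            }
        }
    }
    elsif (ref $node eq 'ARRAY') {
        flatten_fields($_, $flat) for @$node;     # several tables or rows
    }
    return $flat;
}

# Hand-built sample mirroring the start of the dump above (an assumption,
# not real output from the site).
my $response = {
    body => {
        table => [
            { tr => { td => { ReturnStatus => 'Done', content => "Return Status:\x{a0}" } } },
            { tr => [
                { td => { StateInfo      => '0014378', content => "StateInfo:\x{a0}" } },
                { td => { DefaultAccount => '114879',  content => "Default Account:\x{a0}" } },
            ] },
        ],
    },
};

# Start at the tables so that head/title and h3 are not picked up.
my $fields = flatten_fields($response->{body}{table});
print "$_ = $fields->{$_}\n" for sort keys %$fields;
```

Because the walk does not care how many tables or rows a page has, the same routine could in principle be reused for pages that are laid out slightly differently, as long as the data tags keep their names.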
Hopefully I haven't confused anyone too much, but am I on the right track, or is there an easier way to grab this data? Thanks for any input, and please redirect me if I posted wrong.
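For comparison, a rough sketch of skipping XMLin entirely and pulling the named tags straight out of the raw page with a regular expression. This is an assumption-laden illustration, not a robust parser: `extract_fields` is a hypothetical name, and the pattern assumes the data tags start with a capital letter, carry no attributes, and are never nested, which holds for the sample above but is not guaranteed for other pages. A real HTML parser would be the safer choice if the markup varies.

```perl
use strict;
use warnings;

# Hypothetical helper: collect <Name>value</Name> pairs from the raw page.
# Assumes data tags start with a capital letter, have no attributes, and
# contain no nested tags; ordinary lowercase HTML tags are ignored.
sub extract_fields {
    my ($html) = @_;
    my %fields;
    while ($html =~ m{<([A-Z][A-Za-z0-9]*)>([^<]*)</\1>}g) {
        my ($name, $value) = ($1, $2);
        $value =~ s/^\s+|\s+$//g;    # trim padding such as "   123"
        $fields{$name} = $value;
    }
    return \%fields;
}

# Shortened sample based on the page shown above.
my $html = <<'HTML';
<table> <tr> <td>Return Status: <ReturnStatus>Done</ReturnStatus></td> </tr> </table>
<table> <tr> <td>Caller House Number: <CallerHouseNumber>   123</CallerHouseNumber></td> </tr>
<tr> <td>Caller Salutation: <CallerSalutation></CallerSalutation></td> </tr>
<tr> <td>Caller Last Name: <CallerLastName>SMITH</CallerLastName></td> </tr> </table>
HTML

my $fields = extract_fields($html);
print "CallerLastName is: $fields->{CallerLastName}\n";
print "ReturnStatus is: $fields->{ReturnStatus}\n";
```

In the real script the argument would be `$res->content` rather than the inline sample, and the resulting hash can be read the same way as the flat XML response from the earlier site.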
Re: Parse HTML Code for hidden values
by ikegami (Patriarch) on Aug 06, 2010 at 00:47 UTC
by benchtoplabs (Initiate) on Aug 27, 2010 at 14:39 UTC
by choroba (Cardinal) on Aug 30, 2010 at 14:19 UTC
by ikegami (Patriarch) on Sep 05, 2010 at 04:21 UTC