chronicdose has asked for the wisdom of the Perl Monks concerning the following question:
Hi guys, I am a new Perl programmer. I started this summer working for a company, writing web crawlers that parse data from various sites. When I first ran into websites with JavaScript I found various workarounds, basically doing what the JS did without using the JS. My current issue is with an ASP.NET website. I was reading up on various tools I could use and I began to work with HTML::TreeBuilderX::ASP_NET. Other modules that I have been using are WWW::Mechanize, LWP::UserAgent and HTML::TokeParser. The doPostBack JS methods were too complicated for me to understand well enough to simply replicate their actions. The problem I am currently running into is that there are two separate kinds of link used on the site. The first is an <input type=image ..>, and it is simple enough to grab the content behind that link:
    my @inputs = $mech->find_all_inputs(
        type       => 'image',
        name_regex => qr/$pattern1/,
    );
    # The pattern is simply a name that is unique to all the buttons I want to access
    foreach my $i (@inputs) {
        my $temp = $i->name();
        $mech->click_button(name => $temp);
        $tempContent = $mech->content;
        &getDetails($tempContent);       # another function, using TokeParser to grab info from the page linked by the image's content
        $goToMoreDetails = $mech->uri(); # variable to grab the current URL, for future use
        $mech->back();                   # returning to original page
    }
This code works fine. The problem is that I need to go to the next page, which has a new list of <input type="image"...> links. The "button" that does this is a hyperlink with a doPostBack, using an img (not an INPUT of type IMAGE) as the clickable link.
    <a title="Next Page" href="javascript:__doPostBack('ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26','')"><img title="Next Page" class="image2" src="/Images/Icons/next_16.gif" alt="Next Page" style="border-width:0px;" /></a>
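For context, ASP.NET WebForms pages normally implement `__doPostBack(target, argument)` by copying its two arguments into the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submitting the page's form as a POST. The following is a minimal sketch, assuming this site follows that standard pattern, of pulling the two values out of the href with a regex. `parse_postback` is a hypothetical helper name, not part of any module mentioned here.

```perl
use strict;
use warnings;

# The href taken from the "Next Page" link shown above.
my $href = q{javascript:__doPostBack('ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26','')};

# Hypothetical helper: returns (target, argument), or an empty list
# if the link is not a __doPostBack call.
sub parse_postback {
    my ($link) = @_;

    # Capture the two single-quoted arguments of __doPostBack(...).
    my ( $target, $argument ) =
        $link =~ /__doPostBack\('([^']*)','([^']*)'\)/
        or return;

    return ( $target, $argument );
}

my ( $target, $argument ) = parse_postback($href);
print "__EVENTTARGET   = $target\n";
print "__EVENTARGUMENT = '$argument'\n";
```

These are the two values a postback request would be expected to carry, alongside the page's other hidden fields such as `__VIEWSTATE`.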
Using the HTML::TreeBuilderX::ASP_NET module I wrote the following code to handle this.
    my $resp = $mech->response();
    my $root = HTML::TreeBuilder->new_from_content( $resp->content );

    # The next part is to grab the link element. It is a hack job: I wasn't able to
    # get both tag -> a and title eq 'Next Page' in one line, which would be cleaner.
    my @a_tags = $root->look_down( '_tag', 'a' );
    foreach my $atag (@a_tags) {
        my $temp = $atag->as_HTML;
        if ( $temp =~ 'title="Next Page"' ) {
            $a = $atag;
        }
    }

    # This is code from the CPAN website for the module.
    # It was noted to use an ->httpResponse, which doesn't exist.
    # Since the response is the result of the request I have replaced it with
    my $aspnet = HTML::TreeBuilderX::ASP_NET->new({
        element => $a,
        baseURL => $mech->uri,    ## takes into account posting redirects
    });
    my $response = $mech->request( $aspnet->httpRequest );
    my $content  = $response->content;
    print $content;    # I wanted to see if I got the proper html content
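On the "hack job" noted in the comments: `look_down` in HTML::Element accepts several attribute/value pairs at once and only returns elements that satisfy all of them, so the tag and the title can be matched in one call. A small sketch follows, using a made-up two-link HTML fragment as a stand-in for the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up fragment standing in for the real page content.
my $html = q{<html><body>}
         . q{<a title="Previous Page" href="#prev">prev</a>}
         . q{<a title="Next Page" href="#next">next</a>}
         . q{</body></html>};

my $root = HTML::TreeBuilder->new_from_content($html);

# All criteria must match; in scalar context look_down returns
# the first matching element, or undef if there is none.
my $next_link = $root->look_down( _tag => 'a', title => 'Next Page' );

my $found_href = $next_link ? $next_link->attr('href') : undef;
print "Next link href: $found_href\n" if defined $found_href;

$root->delete;    # free the tree once the needed values are copied out
```

This also avoids assigning to the global `$a`, which Perl reserves for `sort` blocks.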
For some reason, the httpRequest code above only grabs the page I was already on, without going to the next page. So what I tried next was sending the concatenated string created by the ASP_NET module, like this:
    my $content = $mech->get($aspnet->httpRequest->as_string);
    print $content;
Passing the URL as a string is how I would normally use get, i.e. $mech->get("http://www.google.ca"); however, THIS is what results in the error: the "string" is too large for the GET request. Is there any way I can extend the GET request's maximum length so I can pass in the entire string, or is there something simple I am missing here to get the next page's content? Thanks in advance to anyone who looks at this. Liam
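A note on where the oversized "string" likely comes from: `as_string` on an HTTP::Request serialises the whole message (request line, headers and body), not just the URL, so handing it to `get` makes the entire form payload part of the requested URI. The sketch below illustrates the size difference with a request built by HTTP::Request::Common. The URL and the `__VIEWSTATE` value are placeholders, and the field names assume a standard ASP.NET postback rather than anything confirmed about this site.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Placeholder URL and form values, for illustration only.
my $url     = 'http://www.example.com/Notices.aspx';
my $request = POST $url, [
    __EVENTTARGET   => 'ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26',
    __EVENTARGUMENT => '',
    __VIEWSTATE     => 'x' x 5000,    # stand-in for the page's large hidden state field
];

# The URL alone is short; the serialised request carries the full body.
my $uri_length        = length( $request->uri->as_string );
my $serialised        = $request->as_string;
my $serialised_length = length($serialised);

print "Method:            ", $request->method, "\n";
print "URL length:        $uri_length\n";
print "as_string length:  $serialised_length\n";
```

A request object like this is meant to be passed to `$mech->request(...)` as an object. This only accounts for the "Request URI Too Large" error; it does not explain why the earlier `$mech->request` attempt returned the same page.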
Re: "Request URI Too Large (The size of the required header is too large...."
by Anonymous Monk on Jun 28, 2011 at 14:54 UTC
by chronicdose (Initiate) on Jun 28, 2011 at 15:51 UTC
by Corion (Patriarch) on Jun 28, 2011 at 16:13 UTC