Hi guys, I'm a new Perl programmer. I started this summer working for a company writing web crawlers that parse data from various sites. When I first ran into websites with JavaScript I found various workarounds, basically doing what the JS did without using the JS. My current issue is with an ASP.NET website. After reading up on the tools available, I began working with HTML::TreeBuilderX::ASP_NET. Other modules I have been using are WWW::Mechanize, LWP::UserAgent, and HTML::TokeParser. The doPostBack JS methods were too complicated for me to understand well enough to simply replicate their actions. The problem I am currently running into is that there are two separate kinds of links used on the site. The first is an <input type="image" ...>, and it is simple enough to grab the content behind that link:
    my @inputs = $mech->find_all_inputs(
        type       => 'image',
        name_regex => qr/$pattern1/,  # the pattern is simply a name that is
                                      # unique to all the buttons I want to access
    );
    foreach my $i (@inputs) {
        my $temp = $i->name();
        $mech->click_button( name => $temp );
        $tempContent = $mech->content;
        getDetails($tempContent);        # another function, using TokeParser to grab
                                         # info from the page the image links to
        $goToMoreDetails = $mech->uri(); # grab the current URL, for future use
        $mech->back();                   # return to the original page
    }
This code works fine. The problem is that I need to go to the next page, which has a new list of <input type="image" ...> links. The "button" that does this is a hyperlink with a doPostBack, using an <img> (not an <input type="image">) as the clickable link:
    <a title="Next Page" href="javascript:__doPostBack('ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26','')"><img title="Next Page" class="image2" src="/Images/Icons/next_16.gif" alt="Next Page" style="border-width:0px;" /></a>
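For what it's worth, the two arguments that __doPostBack receives can be pulled out of that href with a plain regex. A small sketch, using the href above (the variable names are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The href from the "Next Page" link above; q{} keeps the $ signs literal.
my $href = q{javascript:__doPostBack('ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26','')};

# __doPostBack(eventTarget, eventArgument) -- capture both arguments.
my ( $event_target, $event_argument ) =
    $href =~ /__doPostBack\('([^']*)','([^']*)'\)/;

print "$event_target\n";
```

These two values are what an ASP.NET page normally posts back as the hidden __EVENTTARGET and __EVENTARGUMENT form fields.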
Using the HTML::TreeBuilderX::ASP_NET module, I wrote the following code to handle this.
    my $resp = $mech->response();
    my $root = HTML::TreeBuilder->new_from_content( $resp->content );

    # The next part grabs the link element. It is a hack job: I wasn't able
    # to get both _tag => 'a' and title eq 'Next Page' into one call, which
    # would be cleaner.
    my @a_tags = $root->look_down( '_tag', 'a' );
    foreach my $atag (@a_tags) {
        my $temp = $atag->as_HTML;
        if ( $temp =~ /title="Next Page"/ ) {
            $a = $atag;
        }
    }

    # This is code from the CPAN documentation for the module. It says to use
    # ->httpResponse, which doesn't exist; since the response is the result of
    # the request, I have replaced it with ->httpRequest.
    my $aspnet = HTML::TreeBuilderX::ASP_NET->new({
        element => $a,
        baseURL => $mech->uri,  # takes into account posting redirects
    });
    my $response = $mech->request( $aspnet->httpRequest );
    my $content  = $response->content;
    print $content;  # to see whether I got the proper HTML content
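As an aside, the hack above may be avoidable: if I'm reading the HTML::Element docs right, look_down accepts several criteria at once, so the whole loop should collapse to a single call. An untested sketch, with a toy document standing in for the real page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;  # same module as in the code above

# Toy stand-in for the real page content.
my $root = HTML::TreeBuilder->new_from_content(
    q{<a title="Next Page" href="#">next</a><a title="Prev Page" href="#">prev</a>}
);

# look_down takes tag and attribute criteria together:
my $a = $root->look_down( '_tag' => 'a', 'title' => 'Next Page' );

print $a->as_HTML, "\n";
$root->delete;  # free the tree when done
```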
For some reason this code only grabs the current page I was on, without going to the next page. So what I tried was actually sending the concatenated string created by the ASP_NET module, like this:
    my $content = $mech->get( $aspnet->httpRequest->as_string );
    print $content;
Passing the URL as a string is how I would normally use get, i.e. $mech->get("http://www.google.ca"); however, THIS is what produces the error: the "string" is too large for the GET request. Is there any way I can extend the GET request's maximum length so I can pass in the entire string, or is there something simple I am missing here to get the next page's content? Thanks in advance to anyone who looks at this. Liam
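One idea I've been toying with, in case it helps frame the question: the postback normally travels as a POST body, not in the URL, which may be why stuffing everything into a GET blows the header limit. An untested sketch of a manual postback with WWW::Mechanize (the URL is a placeholder, and the form number and field names are assumed from the HTML above):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get('http://www.example.com/the-listing-page.aspx');  # placeholder URL

# ASP.NET pages normally have one big <form>; its hidden fields
# (__VIEWSTATE etc.) are readonly in HTML::Form, so unlock the two we set.
my $form = $mech->form_number(1);
$form->find_input('__EVENTTARGET')->readonly(0);
$form->find_input('__EVENTARGUMENT')->readonly(0);

# The target name copied from the doPostBack href above.
$mech->field( '__EVENTTARGET',
    'ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26' );
$mech->field( '__EVENTARGUMENT', '' );

$mech->submit;          # POSTs __VIEWSTATE and friends automatically
print $mech->content;   # hopefully the next page's HTML
```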
In reply to "Request URI Too Large (The size of the required header is too large...." by chronicdose