Hi guys, I am a new Perl programmer; I started this summer working for a company writing web crawlers that parse data from various sites. When I first ran into websites with JavaScript I found various workarounds, basically doing what the JS did without using the JS. My current issue is with an ASP.NET website. I was reading up on various tools I could use and began to work with HTML::TreeBuilderX::ASP_NET. Other modules I have been using are WWW::Mechanize, LWP::UserAgent, and HTML::TokeParser. The doPostBack JS methods were too complicated for me to understand well enough to simply replicate their actions. The problem I am currently running into is that there are two separate kinds of links used on the site. The first is an <input type=image ..>, and it is simple enough to grab the content behind that link.

my @inputs = $mech->find_all_inputs(
    type       => 'image',
    name_regex => qr/$pattern1/,
);
# The pattern is simply a name that is unique to all the buttons I want to access.
foreach my $i (@inputs) {
    my $temp = $i->name();
    $mech->click_button(name => $temp);
    $tempContent = $mech->content;
    &getDetails($tempContent);        # another function, using TokeParser to grab info from the page linked by the image
    $goToMoreDetails = $mech->uri();  # variable to grab the current URL, for future use
    $mech->back();                    # returning to the original page
}

This code works fine. The problem is that I need to go to the next page, which has a new list of <input type="image"...> links. The "button" that does this is a hyperlink with a doPostBack, using an img (not an INPUT of type IMAGE) as the clickable link.

<a title="Next Page" href="javascript:__doPostBack(&#39;ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26&#39;,&#39;&#39;)"><img title="Next Page" class="image2" src="/Images/Icons/next_16.gif" alt="Next Page" style="border-width:0px;" /></a>
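From what I have read, a `__doPostBack(target, argument)` call typically copies its two arguments into hidden form fields named `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form. I have not confirmed that this site follows that pattern, so treat the field names as an assumption. The sketch below only pulls the two arguments out of the href shown above; `parse_postback_href` is a hypothetical helper of my own, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): extract the two
# arguments of a javascript:__doPostBack('target','argument') href.
sub parse_postback_href {
    my ($href) = @_;

    # Decode the HTML entities that can appear in the raw attribute value.
    $href =~ s/&#39;/'/g;
    $href =~ s/&amp;/&/g;

    if ( $href =~ /__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/ ) {
        return ( $1, $2 );
    }
    return;    # empty list: not a postback link
}

# The href from the "Next Page" link above (q{} does not interpolate the $ signs).
my $href = q{javascript:__doPostBack(&#39;ctl00$ContentBody$CtrlNotice$grdItems$ctl00$ctl03$ctl01$ctl26&#39;,&#39;&#39;)};

my ( $target, $argument ) = parse_postback_href($href);

# Assumption: these are the hidden fields a postback fills in before submitting.
my %postback_fields = (
    __EVENTTARGET   => $target,
    __EVENTARGUMENT => $argument,
);

print "$_ = '$postback_fields{$_}'\n" for sort keys %postback_fields;
```

If the assumption holds, these two values, together with the form's other hidden fields, are presumably what would have to be posted back to reach the next page.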

Using the HTML::TreeBuilderX::ASP_NET module I wrote the following code to handle this.

my $resp = $mech->response();
my $root = HTML::TreeBuilder->new_from_content( $resp->content );

# The next part is to grab the link element. It is a hack job; I wasn't able to
# get both tag -> a and title eq 'Next Page' in one line, which would be cleaner.
my @a_tags = $root->look_down( '_tag', 'a' );
foreach my $atag (@a_tags) {
    my $temp = $atag->as_HTML;
    if ( $temp =~ 'title="Next Page"' ) {
        $a = $atag;
    }
}

# This is code from the CPAN website for the module.
# It was noted to use an ->httpResponse, which doesn't exist.
# Since the response is the result of the request I have replaced it with
my $aspnet = HTML::TreeBuilderX::ASP_NET->new({
    element => $a,
    baseURL => $mech->uri,    ## takes into account posting redirects
});
my $response = $mech->request( $aspnet->httpRequest );
my $content  = $response->content;
print $content;    # I wanted to see if I got the proper html content

For some reason this code only grabs the page I was already on, without going to the next page. So I tried actually sending the concatenated string created by the ASP_NET module, like this:

my $content = $mech->get( $aspnet->httpRequest->as_string );
print $content;

Passing the URL as a string is how I would normally use get, i.e. $mech->get("http://www.google.ca"); however, THIS is what results in the error: the "string" is too large for the GET request. Is there any way I can extend the GET request's maximum length so I can pass in the entire string, or is there something simple I am missing here to get the next page's content? Thanks in advance to anyone who looks at this. Liam


In reply to "Request URI Too Large (The size of the required header is too large...." by chronicdose
