amitsq has asked for the wisdom of the Perl Monks concerning the following question:

hi,

I am completely new to proxies and web programming and somewhat out of my depth, so please bear with me. I am trying to use the LWP module to access an HTTPS web page, fill in its form, submit it, and retrieve the result page. I can get the content of the first page with the form, but the response to the POST is always empty. While trying to recreate the request, I noticed the following line in the request header:

Proxy-Authorization:"Negotiate YIIGzQYGKwYBBQUCoIIGwTCCBr2gMDAuBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICHgYKKwYBBAGCNwICCqKCBoc [shortened] "

There is no need to log in to the website, so I have no username or password. I tried my best to find out more, but I have no idea how to obtain the Negotiate key. I use ActivePerl on Windows, and the LWP-Authen-Negotiate module is not available for Windows. Are there any other solutions?

$param ='simpleSearchSearchForm=simpleSearchSearchForm&simpleSearchSearchForm%3Aj_idt379=ALLTXT&simpleSearchSearchForm%3AfpSearch=brushless+motor&simpleSearchSearchForm%3AcommandSimpleFPSearch=Search&simpleSearchSearchForm%3Aj_idt447=workaround&$viewState';
$request = HTTP::Request->new('POST', 'https://patentscope.wipo.int/search/en/search.jsf');
$request->header('Content-Type' => 'application/x-www-form-urlencoded');
$request->header('Referer' => "https://patentscope.wipo.int/search/en/search.jsf");
$request->header('Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$request->header('Connection' => 'keep-alive');
$request->header('Cookie' => uc $jsessionID.'; ABIW=balancer.cms41; wipo_language=en; BSWA=balancer.bswa2');
$request->content($param);
#Proxy-Authorization: Negotiate ?
$response = $ua->request($request);
$page = $response->decoded_content(); #empty
print $page;

Replies are listed 'Best First'.
Re: Authen::Negotiate Wep-Programming
by 1nickt (Canon) on May 23, 2017 at 18:07 UTC

    You have not shown how you generate the value of $viewState, which is pertinent here. However, it makes no difference, because you define $param in single quotes, so $viewState is never interpolated anyway.
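    The quoting point can be checked in isolation. This is a minimal sketch; the $viewState value is a made-up placeholder, and the parameter string is shortened for readability.

```perl
use strict;
use warnings;

# Hypothetical placeholder; the original script scrapes this from the form page.
my $viewState = 'javax.faces.ViewState=123%3A456';

# Single quotes: no interpolation, so the literal text "$viewState" is sent.
my $single = 'fpSearch=brushless+motor&$viewState';

# Double quotes: the variable's value is substituted into the string.
my $double = "fpSearch=brushless+motor&$viewState";

print "$single\n";
print "$double\n";
```

    With the single-quoted form the server never receives a javax.faces.ViewState parameter at all, only a stray field literally named $viewState.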

    You also are using the most inconvenient way of building and making your request. Consider using at least HTTP::Request::Common, if not something simpler such as HTTP::Tiny.
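    As a rough sketch of the simpler route, HTTP::Tiny can build the form body from name/value pairs so nothing has to be escaped by hand. The field names below are the decoded forms of those in the original $param string, and the ViewState value is the one from the curl command further down, which has most likely expired. Nothing is sent here; the script only prints the encoded body.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new;

# Name/value pairs in plain (unescaped) form; an array ref keeps pairs together.
my @form = (
    'simpleSearchSearchForm'                       => 'simpleSearchSearchForm',
    'simpleSearchSearchForm:j_idt379'              => 'ALLTXT',
    'simpleSearchSearchForm:fpSearch'              => 'brushless motor',
    'simpleSearchSearchForm:commandSimpleFPSearch' => 'Search',
    'simpleSearchSearchForm:j_idt447'              => 'workaround',
    # Value taken from the curl example in this thread; probably stale by now.
    'javax.faces.ViewState' => '7923733300114075152:8489940171963107341',
);

# www_form_urlencode escapes ":" as %3A and a space as "+".
my $body = $http->www_form_urlencode(\@form);
print "$body\n";
```

    To actually submit the form, $http->post_form($url, \@form) encodes and sends it in one call. Note that HTTP::Tiny needs IO::Socket::SSL installed for https URLs and does not keep cookies unless it is given a cookie jar, so this alone may not be enough for this site.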

    Sometimes sites are just so anti-scraping that you can't get there with LWP. This seems to be one of them. I don't think your problem is your proxy. I couldn't retrieve the page via LWP either. I am able to get search results using cURL with the same args: try running this from your command line and see what you get:

    curl -L 'https://patentscope.wipo.int/search/en/search.jsf' \
      -H 'Host: patentscope.wipo.int' \
      -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:53.0) Gecko/20100101 Firefox/53.0' \
      -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
      -H 'Accept-Language: en-US,en;q=0.5' --compressed \
      -H 'Referer: https://patentscope.wipo.int/search/en/search.jsf' \
      -H 'Content-Type: application/x-www-form-urlencoded' \
      -H 'Cookie: JSESSIONID=CD31D066DF8710F9FE9B5C9C5397A977.wapp1nC' \
      -H 'Connection: keep-alive' \
      -H 'Upgrade-Insecure-Requests: 1' \
      --data 'simpleSearchSearchForm=simpleSearchSearchForm&simpleSearchSearchForm%3Aj_idt379=FP&simpleSearchSearchForm%3AfpSearch=brushless+motor&simpleSearchSearchForm%3AcommandSimpleFPSearch=Search&simpleSearchSearchForm%3Aj_idt447=workaround&javax.faces.ViewState=7923733300114075152%3A8489940171963107341' \
      | grep 'Tokyo Parts'

    I don't know how long the tokens will be good for. That command will probably stop working after some time.

    You may have to move up to WWW::Mechanize (which handles cookies) or even Selenium::Remote::Driver (which handles JavaScript).

    Hope this helps!

    update:: add mention of Mech, thanks Corion...


    The way forward always starts with a minimal test.
Re: Authen::Negotiate Wep-Programming
by Anonymous Monk on May 23, 2017 at 13:50 UTC
Re: Authen::Negotiate Wep-Programming
by 1nickt (Canon) on May 23, 2017 at 13:09 UTC

    Please provide an SSCCE. It is impossible to help with what you have shown.

    You should be able to provide a script of < 20 lines that attempts the POST request. Remove all the other code in your program and reduce to just the HTTP transaction. Then, post that here if it doesn't do what you want.

    Also, before you even do that, please read Posting on PerlMonks.


    The way forward always starts with a minimal test.
      OK, edited. I don't know whether the code posted above was sufficient, so in addition I am posting the full code here:
      #!d:\perl\bin\perl.exe
      use warnings;
      use strict;
      use CGI qw(:standard);
      use CGI::Carp 'fatalsToBrowser';
      use LWP;
      use HTTP::Request::Common;
      use HTTP::Cookies;
      use HTTP::Headers;
      use LWP::Debug qw(+);
      use URI::Escape qw(uri_escape_utf8 uri_escape uri_unescape);
      use Encode;

      print "Content-type:text/html;charset=UTF-8\n\n" ;

      our $page;
      my $jsessionID;
      my $viewState;

      sub extractLinks(){
          $page =~ s/\"\/search(\/.[^ ]+)+\"/https:\/\/patentscope\.wipo\.int\/search$1\" /g;
      }

      sub extractJSessionID(){
          my $jsessionID="";
          if($page =~ /(jsessionid=[A-Z0-9]*\.wapp[0-9]n[A-Z])/si){$jsessionID = $1;}
          return $jsessionID;
      }

      sub extractViewStateID(){
          my $ViewState="";
          if($page =~ /javax.faces.ViewState:0\" value=\"(-?[0-9:]*)/si){
              $ViewState = "javax.faces.ViewState=".$1;}
          $ViewState = join( "%3A", split(":", $ViewState) );
          return $ViewState;
      }

      my $proxy = 'http://localhost:5865';
      my $ua = new LWP::UserAgent(keep_alive => 0, ssl_opts => { verify_hostname => 0 });
      $ua->proxy(['https', 'http', 'ftp'], $proxy);
      $ENV{HTTPS_PROXY} = $proxy;
      $ENV{'PERL_LWP_SSL_CA_PATH'} = "D:\\CA_certs\\input\\certs";
      my $cookie_jar = new HTTP::Cookies();
      $ua->cookie_jar($cookie_jar);
      $ua->agent('Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)');

      #download init page for extracting jsession id
      my $response = $ua->get("https://patentscope.wipo.int/search/en/search.jsf");
      die "Couldn't get $url", $response->status_line unless $response->is_success;
      print $response->status_line;
      $page = $response -> decoded_content;
      extractLinks();
      $jsessionID = extractJSessionID();
      $viewState = extractViewStateID();
      print $page;

      my $param ='simpleSearchSearchForm=simpleSearchSearchForm&simpleSearchSearchForm%3Aj_idt379=ALLTXT&simpleSearchSearchForm%3AfpSearch=brushless+motor&simpleSearchSearchForm%3AcommandSimpleFPSearch=Search&simpleSearchSearchForm%3Aj_idt447=workaround&$viewState';

      #--------- problem part :
      my $request = HTTP::Request->new('POST', 'https://patentscope.wipo.int/search/en/search.jsf');
      $request->header('Content-Type' => 'application/x-www-form-urlencoded');
      $request->header('Referer' => "https://patentscope.wipo.int/search/en/search.jsf");
      $request->header('Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
      $request->header('Connection' => 'keep-alive');
      $request->header('Cookie' => uc $jsessionID.'\; ABIW=balancer.cms41\; wipo_language=en\; BSWA=balancer.bswa2');
      $request->content($param);
      $response = $ua->request($request);
      $page = $response->decoded_content();
      print $page;
        my $proxy = 'http://localhost:5865';
        This line makes me think that we still don't have enough information to help you. What kind of proxy are you running on localhost, and why?