in reply to Getting XML from DOI address

As you can see, it stops getting the XML at some point :/. Can someone tell me what is wrong, or what I'm doing wrong? Like I said, the DOI that redirected to ScienceDirect worked pretty well.

You're not reading the English words you get :) I get: "ScienceDirect does not support the use of the crawler software. If you have any questions please contact your helpdesk."

That is kinda self-explanatory.
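If you want the script to notice that message instead of silently writing it out as "XML", check the status line and the body before treating the response as XML. A minimal sketch, assuming plain LWP::UserAgent; the DOI and the matched phrase are only examples taken from this thread:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $url = 'http://dx.doi.org/10.1016/j.nuclphysa.2015.05.005';   # example DOI

    my $res = $ua->get($url);    # LWP follows the DOI redirect for you

    print "Final URL:   ", $res->request->uri, "\n";
    print "Status line: ", $res->status_line,  "\n";

    my $body = $res->decoded_content // '';
    if ( !$res->is_success or $body =~ /does not support the use of.*crawler/i ) {
        die "The publisher is refusing the request - nothing to parse here.\n";
    }
    # only now is $body worth saving or parsing as XML/HTML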

Re^2: Getting XML from DOI address
by hooel (Initiate) on Jul 16, 2015 at 00:10 UTC

    Ok, so this is the whole code:

    #!"E:\xamp\perl\bin\perl.exe" -T use 5.010; use CGI; use strict; use warnings; use LWP::UserAgent; use LWP::Simple; use HTML::TreeBuilder; my $q = CGI->new(); my $userag = LWP::UserAgent->new(timeout=>30); #it's illegal :( + p.s. to make what they say, delete argument from "new" +"agent => 'MyApp/0.1'" say $q->header(), $q->start_html(); my $addressdio = ""; #getting the address from form for my $param ($q->param()) { my $safe_param = $q->escapeHTML($param); say "<p><strong>$safe_param</strong>: "; for my $value ($q->param($param)) { say $q->escapeHTML($value); $addressdio = $q->escapeHTML($value); } say '</p>'; } #the whole process of getting xml from site and giving it into + the string variable #my $dioaddress = "http://dx.doi.org/10.1016/j.nuclphysa.2015. +05.005"; my $reqforxml = new HTTP::Request GET => $addressdio; my $res = $userag->request($reqforxml); my $content = $res->content; # my $content = get("$addressdio"); # my $html = get("http://dx.doi.org/10.1007/s00601-015-1012-x") # or die "Couldn't fetch the Perl Cookbook's page."; # print "$html"; #opening the file to write string with xml open my $overwrite, '>', 'overwrite.xml' or die "error trying +to overwrite: $!"; #writing string with xml to the file say $overwrite "$content"; #A little system just to get the title #($title) = $content =~ <h1 class="svTitle" id="ti0010">(.*?)< +/h1>; #print "Title of article: $title"; #my $tree = HTML::TreeBuilder->new; #$tree ->parse_file("overwrite.txt"); #foreach my $h1 ($tree->find('h1')){ #print $h1->as_text, "<br />"; #} close $overwrite; say "<h1>And here's the site:</h1>"; print "$content"; #string with our site say $q->end_html();
    I got the XML from ScienceDirect thanks to this: agent => 'MyApp/0.1', but even when I pass this parameter to agent while connecting to Link Springer, the situation is the same as I explained in the first post.
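    For reference, the agent string just goes into the constructor (or can be set afterwards); a minimal sketch, with 'MyApp/0.1' being only the example name used above:

        use LWP::UserAgent;

        # identify the script instead of the default "libwww-perl/x.xx" agent string;
        # some sites refuse the default, others refuse anything that is not a browser
        my $userag = LWP::UserAgent->new(
            agent   => 'MyApp/0.1',
            timeout => 30,
        );

        # the same thing after construction:
        # $userag->agent('MyApp/0.1');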

      I got the XML from ScienceDirect thanks to this: agent => 'MyApp/0.1', but even when I pass this parameter to agent while connecting to Link Springer, the situation is the same as I explained in the first post.

      Yes, and then what happened?

      It's like ordering a drink from a bartender while handing over some pesos. The bartender's only response is "we don't take pesos."

      The website is telling you "I don't like that".

      I'm of the opinion that if a website does that, and you can't figure out a way around it -- well, you should listen to the website.

        OK, you're totally right. That's why I decided not to use ScienceDirect but Link Springer, and now I think the problem is different.

        I was late editing one of my comments. Actually, I tried the DOIs of two articles on Link Springer (without "agent => 'MyApp/0.1'").

        http://dx.doi.org/10.1007/s00601-015-1012-x and http://dx.doi.org/10.1007/BF02579652

        The first one doesn't work, like I said, and the second one works perfectly. They both redirect to Link Springer, and in both cases I didn't use any workaround - just a simple GET for the XML.
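        To see what actually differs between those two DOIs, it can help to print the final URL and status line for each; a small diagnostic sketch along the lines of the code above (nothing here beyond the two links is from the original script):

            use strict;
            use warnings;
            use LWP::UserAgent;

            my $ua = LWP::UserAgent->new( timeout => 30 );   # default agent, no 'MyApp/0.1'

            for my $doi ( 'http://dx.doi.org/10.1007/s00601-015-1012-x',
                          'http://dx.doi.org/10.1007/BF02579652' ) {
                my $res = $ua->get($doi);
                printf "%s\n  -> final URL: %s\n  -> status:    %s\n  -> %d bytes\n\n",
                    $doi, $res->request->uri, $res->status_line,
                    length( $res->decoded_content // '' );
            }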