in reply to Getting XML from DOI address

As you can see, it stops getting the XML at some point :/. Can someone tell me what is wrong, or what I'm doing wrong? Like I said, the DOI that redirected to ScienceDirect worked pretty well.

You're not reading the English words you get :) I get: "ScienceDirect does not support the use of the crawler software. If you have any questions please contact your helpdesk."

That is kinda self-explanatory.
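If you want the script to notice that message instead of silently writing it out as "XML", check the status line and the body before treating the response as XML. A minimal sketch, assuming plain LWP::UserAgent; the DOI and the matched phrase are only examples taken from this thread:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $url = 'http://dx.doi.org/10.1016/j.nuclphysa.2015.05.005';   # example DOI

    my $res = $ua->get($url);    # LWP follows the DOI redirect for you

    print "Final URL:   ", $res->request->uri, "\n";
    print "Status line: ", $res->status_line,  "\n";

    my $body = $res->decoded_content // '';
    if ( !$res->is_success or $body =~ /does not support the use of.*crawler/i ) {
        die "The publisher is refusing the request - nothing to parse here.\n";
    }
    # only now is $body worth saving or parsing as XML/HTML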

Re^2: Getting XML from DOI address
by hooel (Initiate) on Jul 16, 2015 at 00:10 UTC

    Ok, so this is the whole code:

    #!"E:\xamp\perl\bin\perl.exe" -T use 5.010; use CGI; use strict; use warnings; use LWP::UserAgent; use LWP::Simple; use HTML::TreeBuilder; my $q = CGI->new(); my $userag = LWP::UserAgent->new(timeout=>30); #it's illegal :( + p.s. to make what they say, delete argument from "new" +"agent => 'MyApp/0.1'" say $q->header(), $q->start_html(); my $addressdio = ""; #getting the address from form for my $param ($q->param()) { my $safe_param = $q->escapeHTML($param); say "<p><strong>$safe_param</strong>: "; for my $value ($q->param($param)) { say $q->escapeHTML($value); $addressdio = $q->escapeHTML($value); } say '</p>'; } #the whole process of getting xml from site and giving it into + the string variable #my $dioaddress = "http://dx.doi.org/10.1016/j.nuclphysa.2015. +05.005"; my $reqforxml = new HTTP::Request GET => $addressdio; my $res = $userag->request($reqforxml); my $content = $res->content; # my $content = get("$addressdio"); # my $html = get("http://dx.doi.org/10.1007/s00601-015-1012-x") # or die "Couldn't fetch the Perl Cookbook's page."; # print "$html"; #opening the file to write string with xml open my $overwrite, '>', 'overwrite.xml' or die "error trying +to overwrite: $!"; #writing string with xml to the file say $overwrite "$content"; #A little system just to get the title #($title) = $content =~ <h1 class="svTitle" id="ti0010">(.*?)< +/h1>; #print "Title of article: $title"; #my $tree = HTML::TreeBuilder->new; #$tree ->parse_file("overwrite.txt"); #foreach my $h1 ($tree->find('h1')){ #print $h1->as_text, "<br />"; #} close $overwrite; say "<h1>And here's the site:</h1>"; print "$content"; #string with our site say $q->end_html();
    I got the XML from ScienceDirect thanks to this: agent => 'MyApp/0.1', but even when I pass this parameter to agent while connecting to Link Springer, the situation is the same as I explained in the first post.
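    For reference, the agent string just goes into the constructor (or can be set afterwards); a minimal sketch, with 'MyApp/0.1' being only the example name used above:

        use LWP::UserAgent;

        # identify the script instead of the default "libwww-perl/x.xx" agent string;
        # some sites refuse the default, others refuse anything that is not a browser
        my $userag = LWP::UserAgent->new(
            agent   => 'MyApp/0.1',
            timeout => 30,
        );

        # the same thing after construction:
        # $userag->agent('MyApp/0.1');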

      I got the XML from ScienceDirect thanks to this: agent => 'MyApp/0.1', but even when I pass this parameter to agent while connecting to Link Springer, the situation is the same as I explained in the first post.

      Yes, and then what happened?

      It's like ordering a drink from a bartender while handing over some pesos. The bartender's only response is "we don't take pesos."

      The website is telling you "I don't like that".

      I'm of the opinion that if a website does that, and you can't figure out a way around it -- well, you should listen to the website.

        OK, you're totally right. That's why I decided not to use ScienceDirect but Link Springer, and now I think the problem is different.

        I was late editing one of my comments. Actually, I tried the DOIs of two articles on Link Springer (without "agent => 'MyApp/0.1'").

        http://dx.doi.org/10.1007/s00601-015-1012-x and http://dx.doi.org/10.1007/BF02579652

        The first one doesn't work, like I said, and the second one works perfectly. They both redirect to Link Springer, and in both cases I didn't use any workaround - just a simple GET for the XML.
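        To see what actually differs between those two DOIs, it can help to print the final URL and status line for each; a small diagnostic sketch along the lines of the code above (nothing here beyond the two links is from the original script):

            use strict;
            use warnings;
            use LWP::UserAgent;

            my $ua = LWP::UserAgent->new( timeout => 30 );   # default agent, no 'MyApp/0.1'

            for my $doi ( 'http://dx.doi.org/10.1007/s00601-015-1012-x',
                          'http://dx.doi.org/10.1007/BF02579652' ) {
                my $res = $ua->get($doi);
                printf "%s\n  -> final URL: %s\n  -> status:    %s\n  -> %d bytes\n\n",
                    $doi, $res->request->uri, $res->status_line,
                    length( $res->decoded_content // '' );
            }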