I have been trying to harvest metadata over the OAI-PMH protocol, using a Perl-based harvester called 'Harvey', but I am stuck on the issue below.
My last resumptionToken is "resumptionToken=2019-01-22T00:51:30Z!2037-01-01T00:00:00Z!!oai_dc!6045389!7351939!oai:union.ndltd.org:IBICT/aoi:localhost:jspui/2251". Where in the Perl code should I add it so that the harvest continues from that point? In the Perl code below, I see the variable resumptionToken in four places (lines 12, 21, 52, 57).
$| = 1;
my $baseURL = 'http://union.ndltd.org/OAI-PMH/';
my $filename = 'aaaaafna';
my $resumptionToken = '2019-01-22T00:51:30Z!2037-01-01T00:00:00Z!!oai_dc!6045389!7351939!oai:union.ndltd.org:IBICT/aoi:localhost:jspui/2251';
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
# before running this script, execute:
# export http_proxy=http://localhost:<port>/ where <port> is your cntlm port
$ua->env_proxy();
do {
# first request asks for oai_dc records; later requests send only the resumptionToken
my $reqURL = $baseURL.'?verb=ListRecords&'.(($resumptionToken eq '')?'metadataPrefix=oai_dc':'resumptionToken='.$resumptionToken);
# my $reqURL = $baseURL.'?verb=Identify';
my $req = HTTP::Request->new( GET => $reqURL );
print "Harvesting $reqURL\n";
my $state = 0;
my $res;
while ($state == 0)
{
$res = $ua->request($req);
if ($res->code == 503)
{
my $sleep = $res->header ('Retry-After');
# use '!' rather than 'not': 'not' binds more loosely than '||' and would negate the whole condition
if (!defined ($sleep) || ($sleep < 0) || ($sleep > 86400))
{ $state = 1;}
else
{
print "Sleeping for $sleep seconds\n";
sleep ($sleep);
}
}
else
{ $state = 1; }
}
my $content = $res->content;
my $records = (split (/<metadata>/, $content))-1;
print "Saving response with $records records to $filename.xml\n";
open (my $fh, '>', "$filename.xml") or die "Cannot write $filename.xml: $!"; print $fh $content; close ($fh);
$filename++;
$resumptionToken = '';
if ($content =~ /<resumptionToken[^>]*>([^<]+)<\/resumptionToken>/)
{
$resumptionToken = $1;
}
} while ($resumptionToken ne '');
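For reference, here is a minimal sketch of how I understand the request URL to be chosen, pulled out of the loop above so it can be run on its own. The helper names `escape_token` and `build_request_url` are hypothetical and not part of the original script. The sketch also percent-encodes the token, since the OAI-PMH specification asks for special characters in request arguments to be URL-encoded; whether this particular server insists on that is an assumption on my part. Note that the value assigned is the token itself, without the leading "resumptionToken=".

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): percent-encode every
# character except the RFC 3986 "unreserved" set.
sub escape_token {
    my ($token) = @_;
    $token =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $token;
}

# Hypothetical helper: build the ListRecords URL. An empty token means
# "start a fresh harvest"; a non-empty token means "resume", in which
# case resumptionToken is the only argument sent besides the verb.
sub build_request_url {
    my ($base_url, $resumption_token) = @_;
    my $args = $resumption_token eq ''
        ? 'metadataPrefix=oai_dc'
        : 'resumptionToken=' . escape_token($resumption_token);
    return $base_url . '?verb=ListRecords&' . $args;
}

my $base  = 'http://union.ndltd.org/OAI-PMH/';
my $token = '2019-01-22T00:51:30Z!2037-01-01T00:00:00Z!!oai_dc!6045389!7351939!oai:union.ndltd.org:IBICT/aoi:localhost:jspui/2251';

print build_request_url($base, ''), "\n";      # fresh harvest
print build_request_url($base, $token), "\n";  # resumed harvest
```

If this reading is right, the only place the saved token needs to go is the initial `my $resumptionToken = '...';` assignment at the top of the script; every later occurrence is overwritten from the server's response.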