PerlMonks  

US Library of Congress perl module

by eg (Friar)
on Dec 10, 2000 at 13:24 UTC
Category: Web Stuff
Author/Contact Info eg
Description:

A Perl module to access the Library of Congress' book database.

Unlike Amazon or Barnes and Noble, you can't just look up a book in the Library of Congress database with an ISBN; you first need to initialize a session with their web server. That's all this module does: it initializes a session for you and returns either the URL of a book's web page or a reference to a hash containing that book's data (author, title, etc.).
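The two-step flow (open a session, then search with its ID) can be sketched offline. The helpers below are hypothetical and not part of LOC.pm: `session_id_from_html` applies the same SESSION_ID regex that `z3950_url` uses to the INIT page, and `search_url` assembles the same query parameters, with percent-encoding added as an assumption (the module itself does not escape the ISBN). The HTML fragment in the demo is made up; no network access is involved.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the session ID from the page returned by the
# ACTION=INIT request, using the same regex as z3950_url.
sub session_id_from_html {
    my ($html) = @_;
    my ($sid) = $html =~ /NAME="SESSION_ID"\s+VALUE="(\d+)"/i;
    return $sid;    # undef when the page carries no session ID
}

# Percent-encode every byte outside the RFC 3986 "unreserved" set.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the search URL from an ISBN and a session ID,
# with the same parameters z3950_url sends.
sub search_url {
    my ($isbn, $sid) = @_;
    return undef unless defined $isbn && defined $sid;
    my @params = (
        ESNAME     => 'F',
        ACTION     => 'SEARCH',
        DBNAME     => 'VOYAGER',
        TERM_1     => $isbn,
        USE_1      => 7,
        SESSION_ID => $sid,
    );
    my @pairs;
    while ( my ($key, $value) = splice(@params, 0, 2) ) {
        push @pairs, "$key=" . uri_escape_simple($value);
    }
    return 'http://lcweb.loc.gov/cgi-bin/zgate?' . join('&', @pairs);
}

# Demo on a made-up fragment of the INIT page.
my $init_page = '<INPUT TYPE="hidden" NAME="SESSION_ID" VALUE="123456">';
my $sid       = session_id_from_html($init_page);
print search_url('0596000278', $sid), "\n";
```

Keeping the parameters in an ordered list (rather than a hash) preserves the same parameter order as the module's hand-built string.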

package LOC;

use strict;
use CGI;  
use IO::Socket;

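sub z3950_data {
        my $isbn = shift();

        # The search URL embeds a fresh session ID; give up if none was issued.
        my $url = z3950_url( $isbn ) or return undef;

        local $/ = undef;    # slurp the whole response in one read
        my $sock = IO::Socket::INET->new(
                Proto    => 'tcp',
                PeerAddr => 'lcweb.loc.gov',
                PeerPort => 'http(80)', ) or return undef;

        print $sock "GET $url HTTP/1.0\x0d\x0a",
                    "Host: lcweb.loc.gov\x0d\x0a\x0d\x0a";

        # The record comes back as "Field: value" lines inside <pre>...</pre>.
        my ($raw_data) = <$sock> =~ /<pre>(.*?)<\/pre>/is;
        return undef unless ( defined($raw_data) );

        my %data = ();
        my $last = 'UNKNOWN';
        foreach my $line ( split(/\n/, $raw_data ) ) {
                $line =~ s/\s+/ /g;    # collapse whitespace, including any \r
                if ( my ($key, $value) = $line =~ /^([^:]+): (.*)/ ) {
                   # A "Field: value" line starts a new field.
                   $data{$key} .= $value;
                   $last = $key;
                }
                else {
                   # Anything else continues the previous field.
                   $data{$last} .= $line;
                }
        }

        return \%data;
}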

sub z3950_html {
        my $isbn = shift();

        my $data = z3950_data( $isbn );
        if ( !defined($data) ) {
                return "<font color='#990000'>z3950_html: Can't get LOC data</font>";
        }

        # Build one string: a comma-separated list returned in scalar
        # context would yield only its last element, "</pre>\n".
        return "<pre>" . join("\n", map { "$_ -> $$data{$_}" } keys( %$data ))
                . "</pre>\n";
}

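sub z3950_url {
        my $isbn = shift();
        my $sid = undef;

        local $/ = undef;    # slurp the whole response in one read
        my $sock = IO::Socket::INET->new(
                Proto    => 'tcp',
                PeerAddr => 'lcweb.loc.gov',
                PeerPort => 'http(80)', ) or return undef;

        # Step 1: ask the zgate gateway to open a session; the page it sends
        # back carries the new session ID in a SESSION_ID form field.
        print $sock "GET /cgi-bin/zgate?ACTION=INIT\&FORM_HOST_PORT=".
                "/prod/www/data/z3950/locils.html,z3950.loc.gov,7090".
                " HTTP/1.0\x0d\x0a".
                "Host: lcweb.loc.gov\x0d\x0a\x0d\x0a";

        ($sid) = <$sock> =~ /NAME="SESSION_ID"\s+VALUE="(\d+)"/i;

        return undef unless ( defined($sid) );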

        return 
                "http://lcweb.loc.gov/cgi-bin/zgate?".
                "ESNAME=F&".
                "ACTION=SEARCH&".
                "DBNAME=VOYAGER&".
                # "MAXRECORDS=20&".
                # "RECSYNTAX=1.2.840.10003.5.10&".
                # "REINIT=" . CGI::escape("/cgi-bin/zgate?ACTION=INIT&FORM_HOST_PORT=/prod/www/data/z3950/locils.html,z3950.loc.gov,7090") . "&" .
                "TERM_1=$isbn&".
                "USE_1=7&".    # Bib-1 use attribute 7: search by ISBN
                "SESSION_ID=$sid".
                "";
}

1;

=pod

=head1 TITLE

LOC.pm - an interface to the Library of Congress' book database

=head1 SYNOPSIS

To redirect from a web page:

        use CGI;
        use LOC;

        my $cgi = CGI->new;
        my $isbn = $cgi->param('isbn');
        print $cgi->redirect( LOC::z3950_url($isbn) );

To get the data for a certain book:

        use LOC;

        my $data = LOC::z3950_data( $isbn );
        foreach my $key ( keys(%$data) ) {
                print "$key: $$data{$key}\n";
        }

=head1 DESCRIPTION

The Library of Congress' web interface to their book database is screwy.
You can't just take the ISBN and plug it into a simple URL.  No, you
need to initialize a session first, and then plug the ISBN into a simple
URL.  Oh well.  So this module initializes a session and hands you the
right URL to redirect to.  Or, you can just grab the data from the LOC and
present it in whatever form you want.

=head1 FUNCTIONS

=over 4

=item \%hash z3950_data( $isbn ) 

Given an ISBN, return a reference to a hash with the data downloaded from
the Library of Congress.  The keys to the hash are the data field names.
Returns undef if the record could not be retrieved.

=item $html z3950_html( $isbn )

Dump out the data from z3950_data as HTML.  Sort of.  It's just plain
text with <pre> tags around it.

=item $url z3950_url( $isbn )

Return the URL of the LOC page for this ISBN, or undef if no session ID
could be obtained.

=back

=cut
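The parsing loop in z3950_data can be pulled out and tried on saved text. This is a hypothetical `parse_record` helper, not part of LOC.pm, and it adds two assumptions of its own: blank lines are skipped, and only a line starting in column one can open a new field, so an indented continuation line that happens to contain ": " is not mistaken for a key. The sample record is invented for illustration, not real LOC output.

```perl
use strict;
use warnings;

# Hypothetical helper: the "Field: value" parsing loop from z3950_data,
# taking the text found between <pre> and </pre>.
sub parse_record {
    my ($raw) = @_;
    my %data;
    my $last = 'UNKNOWN';    # bucket for text that precedes the first field
    for my $line ( split /\r?\n/, $raw ) {
        next unless $line =~ /\S/;    # assumption: blank lines carry no data
        $line =~ s/\s+/ /g;           # collapse whitespace, as z3950_data does
        if ( my ($key, $value) = $line =~ /^(\S[^:]*): (.*)/ ) {
            # Assumption: a new field starts in column one.
            $data{$key} .= $value;
            $last = $key;
        }
        else {
            # Indented (or otherwise unkeyed) text continues the last field.
            $data{$last} .= $line;
        }
    }
    return \%data;
}

# Demo on an invented record; the second line is an indented continuation.
my $sample = "Title: An example title /\n    by A. Author.\nISBN: 1234567890\n";
my $record = parse_record($sample);
print "$_: $record->{$_}\n" for sort keys %$record;
```

Because whitespace is collapsed before appending, an indented continuation contributes exactly one leading space, which is what separates it from the text it continues.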
Re: US Library of Congress perl module
by Anonymous Monk on Nov 10, 2010 at 20:15 UTC
    Thank you for this! Perl hasn't changed much in 10 years. This still works with two changes. First, the website "lcweb.loc.gov" is now "www.loc.gov". Second, in the return statement of the html subroutine you need periods (.) instead of commas (,) to concatenate the join with the "pre" strings.
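A small sketch of why the periods matter. Perl's `return` evaluates its expression in the caller's context, so when z3950_html is called in scalar context the comma operator discards everything but the last item. The two function names below are hypothetical stand-ins for the original return statement.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the original z3950_html return statement.
sub wrap_with_commas {
    my ($body) = @_;
    return "<pre>", $body, "</pre>\n";    # a three-element list
}

# The same statement with the suggested fix: one concatenated string.
sub wrap_with_dots {
    my ($body) = @_;
    return "<pre>" . $body . "</pre>\n";
}

my $broken = wrap_with_commas("data");    # scalar context: only "</pre>\n" survives
my $fixed  = wrap_with_dots("data");      # "<pre>data</pre>\n"
my @parts  = wrap_with_commas("data");    # list context: all three pieces come back

print "commas, scalar: $broken";
print "dots,   scalar: $fixed";
print "commas, list:   ", join('', @parts);
```

The comma version only looks right when the caller happens to use list context, e.g. `print z3950_html($isbn);`; assigning the result to a scalar silently drops the data.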
