in reply to Reading an XML page

I haven't tried your script, so I'm guessing, but what happens if you comment out the print <$sock>; line just above the while loop?

I suspect that the print is 'draining' the socket: print puts <$sock> in list context, so it reads every line up to end-of-file. By the time you reach the while loop there is nothing left to read, so it hangs waiting for input that never arrives?
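The 'draining' idea can be checked offline with a small sketch. Because print supplies list context, <$sock> returns every remaining line, so the handle is at end-of-file before the while loop starts. An in-memory filehandle stands in for the socket here (an assumption so the demo runs without a network); with a real socket, the list-context read only returns once the server closes its side of the connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the socket: an in-memory filehandle (assumption for the demo).
my $response = "<Asin>1565924193</Asin>\n<ProductName>CGI Programming with Perl</ProductName>\n";
open(my $sock, '<', \$response) or die "Cannot open in-memory handle: $!";

# print imposes list context, so <$sock> reads and returns every line up to EOF.
print <$sock>;

# The handle is now at end-of-file ...
my $at_eof = eof($sock) ? 1 : 0;

# ... so the while loop that follows finds nothing to read.
my $seen_in_loop = 0;
while (my $line = <$sock>) {
    $seen_in_loop++;
}
print "lines seen by the while loop: $seen_in_loop\n";
close($sock);
```

Running it prints the two XML lines followed by "lines seen by the while loop: 0".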


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: Re: Reading an XML page
by Guildencrantz (Sexton) on May 28, 2003 at 04:41 UTC
    If only it were that simple. I actually posted an incorrect version of this script, my apologies: it still contained a number of print statements I had added while trying to find the problem. The script I should have posted follows:

    #!/usr/bin/perl -wT
    use strict;
    use IO::Socket;
    use HTML::Entities;
    require './libraryCommon.pl';

    unless (@ARGV) {
        die "Exiting without parameter. (HINT: You need to pass the ISBN)\n";
    }

    unless (testISBN($ARGV[0])) {
        my $host = "xml.amazon.com";
        my $port = 80;
        my $title = "";
        my $date = "";
        my $manufacturer = "";
        my @author;
        my $count = 0;
        my $isbn = $ARGV[0];
        # HTTP/1.0 request: lines end in CRLF and a blank line ends the request
        my $getBook = "GET http://$host/onca/xml2?t=webservices-20&dev-t=D3N1ICFCFE4DHV"
                    . "&AsinSearch=$isbn&type=lite&f=xml HTTP/1.0\r\n\r\n";
        my $sock = IO::Socket::INET->new(PeerAddr => $host,
                                         PeerPort => $port,
                                         Proto    => 'tcp')
            or die "Couldn't connect to $host";
        $sock->autoflush(1);
        print $sock $getBook;

        # Read the response line by line and pick out the fields we want
        while (<$sock>) {
            decode_entities($_);
            if (m%<ProductName>(.*?)</ProductName>%) {
                $title = $1;
            } elsif (m%<Author>(.*?)</Author>%) {
                $author[$count] = $1;
                $count++;
            } elsif (m%<ReleaseDate>.*, (\d\d\d\d)</ReleaseDate>%) {
                $date = $1;
            } elsif (m%<Manufacturer>(.*?)<\/Manufacturer>%) {
                $manufacturer = $1;
            }
        }
        close($sock);

        print $title, "\n";
        foreach my $author (@author) { print $author, "\n" }
        print $date, "\n";
        print $manufacturer, "\n";
    } else {
        die "Exiting due to the fact that you have supplied an invalid ISBN.";
    }

      Not sure if it will help you, but LWP::Simple seems to work ok.

      D:\Perl\test>perl58 -mLWP::Simple=getprint -e" getprint 'http://xml.amazon.com/onca/xml2?t=webservices-20&dev-t=D3N1ICFCFE4DHV&AsinSearch=1565924193&type=lite&f=xml'"
      <?xml version="1.0" encoding="UTF-8"?><ProductInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://xml.amazon.com/schemas2/dev-lite.xsd">
        <Details url="http://www.amazon.com/exec/obidos/redirect?tag=webservices-20%26creative=D3N1ICFCFE4DHV%26camp=2025%26link_code=xm2%26path=ASIN/1565924193">
          <Asin>1565924193</Asin>
          <ProductName>CGI Programming with Perl</ProductName>
          <Catalog>Book</Catalog>
          <Authors>
            <Author>Gunther Birznieks</Author>
            <Author>Scott Guelich</Author>
            <Author>Shishir Gundavaram</Author>
          </Authors>
          <ReleaseDate>15 January, 2000</ReleaseDate>
          <Manufacturer>O'Reilly &amp; Associates</Manufacturer>
          <ImageUrlSmall>http://images.amazon.com/images/P/1565924193.01.THUMBZZZ.jpg</ImageUrlSmall>
          <ImageUrlMedium>http://images.amazon.com/images/P/1565924193.01.MZZZZZZZ.jpg</ImageUrlMedium>
          <ImageUrlLarge>http://images.amazon.com/images/P/1565924193.01.LZZZZZZZ.jpg</ImageUrlLarge>
          <ListPrice>$34.95</ListPrice>
          <OurPrice>$24.47</OurPrice>
          <UsedPrice>$13.99</UsedPrice>
        </Details>
      </ProductInfo>

      I tried the latest version of the script you posted, and it too hung on the read from the socket. I can't see why either.




        Thanks a lot for the help. I really don't know what happened with the script: yesterday it worked, today it didn't. Very strange. It almost seems as if Amazon's server was keeping the connection alive.
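If the server really was holding the connection open, while (<$sock>) would block indefinitely: readline only returns undef at end-of-file, and end-of-file only arrives when the server closes its side. Below is a minimal sketch of a read that gives up after a period of silence, using IO::Select and sysread. The helper name read_with_timeout and the one-second timeout are illustrative, and a Unix-domain socketpair stands in for the Amazon connection (so this assumes a Unix-like system). As far as I know, the Timeout option of IO::Socket::INET bounds connecting rather than later reads, so it would not help here. In the script you would pass $sock instead of $client, and stop using <$sock> on that handle, since buffered readline and unbuffered sysread do not mix well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Read from $fh until the peer closes the connection (EOF) or until no data
# arrives for $timeout seconds. Returns the data plus a flag: 1 if EOF was
# seen, 0 if we gave up because the connection stayed open and silent.
sub read_with_timeout {
    my ($fh, $timeout) = @_;
    my $sel  = IO::Select->new($fh);
    my $data = '';
    while ($sel->can_read($timeout)) {
        my $n = sysread($fh, my $chunk, 4096);
        die "sysread failed: $!" unless defined $n;
        return ($data, 1) if $n == 0;    # 0 bytes from a readable socket means EOF
        $data .= $chunk;
    }
    return ($data, 0);                   # timed out: peer never closed
}

# Simulate a server that sends its response but keeps the connection open.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
$server->autoflush(1);
print {$server} "<ProductName>CGI Programming with Perl</ProductName>\n";

my ($body, $got_eof) = read_with_timeout($client, 1);
print "received: $body";
print $got_eof ? "server closed the connection\n"
               : "timed out; connection still open\n";
```

Because the simulated server never closes, the call returns the data after roughly one second of silence instead of hanging; once the server side is closed, the same helper reports EOF immediately.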

        Well, if LWP works, that's what I'll rewrite the script to use. Again, I appreciate the input.
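A minimal sketch of what the LWP-based rewrite might look like, assuming LWP::Simple is installed and the xml.amazon.com URL from the thread still responds (it may well not). Since LWP::Simple::get returns the whole document as one string (getprint, used in the one-liner, prints it to STDOUT instead), the field regexes can run over the full text rather than line by line. The helper names parse_product, fetch_book and decode_basic_entities are hypothetical, the testISBN check from libraryCommon.pl is left out, and the entity decoding is a deliberately minimal stand-in for HTML::Entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal entity decoding (assumption: only the five predefined XML entities
# appear; the original script used HTML::Entities::decode_entities instead).
sub decode_basic_entities {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&quot;/"/g;
    $s =~ s/&apos;/'/g;
    $s =~ s/&amp;/&/g;    # last, so "&amp;lt;" decodes to "&lt;", not "<"
    return $s;
}

# Extract the fields the original script printed from the whole XML document.
# Same regex approach as the socket version; a real XML parser would be sturdier.
sub parse_product {
    my ($xml) = @_;
    my %book = (title => '', date => '', manufacturer => '', authors => []);
    $book{title} = decode_basic_entities($1)
        if $xml =~ m{<ProductName>(.*?)</ProductName>}s;
    $book{date} = $1
        if $xml =~ m{<ReleaseDate>[^<]*, (\d{4})</ReleaseDate>};
    $book{manufacturer} = decode_basic_entities($1)
        if $xml =~ m{<Manufacturer>(.*?)</Manufacturer>}s;
    # m//g in list context returns every <Author> capture in document order.
    $book{authors} = [ map { decode_basic_entities($_) }
                       $xml =~ m{<Author>(.*?)</Author>}gs ];
    return \%book;
}

# Fetch the document with LWP::Simple::get and parse it. LWP::Simple is
# loaded at run time so that parse_product can be used without it.
sub fetch_book {
    my ($isbn) = @_;
    require LWP::Simple;
    my $url = "http://xml.amazon.com/onca/xml2?t=webservices-20"
            . "&dev-t=D3N1ICFCFE4DHV&AsinSearch=$isbn&type=lite&f=xml";
    my $xml = LWP::Simple::get($url);
    die "Couldn't fetch $url\n" unless defined $xml;
    return parse_product($xml);
}

if (@ARGV) {
    my $book = fetch_book($ARGV[0]);
    print "$_\n" for $book->{title}, @{ $book->{authors} },
                     $book->{date}, $book->{manufacturer};
}
```

Run it with the ISBN as the only argument (for example 1565924193); with no argument it just defines the subs, which makes parse_product easy to test against a saved copy of the XML shown earlier in the thread.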

        ~~Guildencrantz