PerlMonks  

Source Grabber

by Andrew_Levenson (Hermit)
on Mar 29, 2006 at 23:48 UTC ( [id://540080]=perlquestion )

Andrew_Levenson has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing twin programs: one to store the source code of a list of websites, and the other to use the stored info to tell me whether or not each website has been updated since it was last stored. I can't get the storing program to work, though, and can't figure out why.
use strict;
use warnings;
use LWP::Simple;

my @list;
my $source;
my $i;
my $file="C:/Documents and Settings/Dad/Desktop/url checker.txt";

open FILE, "<$file";
while(<FILE>){
    push @list, $_;
}
close FILE;

my $length=length(@list);
for($i=0; $i<=$length; $i++){
    my $j=$i+1;
    $source=get($list[$i]);
    open LOG, ">C:/Documents and Settings/Dad/Desktop/Websites/Website + Source[$j].txt";
    print LOG $source;
    close LOG;
}
I get the error "can't print on closed filehandle..." Thanks in advance.

-edit
Now it works fine, except that it only logs the sources of 3 websites out of a list of 13.
use strict;
use warnings;
use LWP::Simple;

my @list;
my $source;
my $i;
my $file="C:/Documents and Settings/Dad/Desktop/url checker.txt";

open FILE, "<$file" || die "Can't open the list of urls! \n";
while(<FILE>){
    push @list, $_;
}
close FILE;

my $length=length(@list);
for($i=0; $i<=$length; $i++){
    my $j=$i+1;
    $source=get($list[$i]);
    open LOG, ">C:/Documents and Settings/Dad/Desktop/Websites/Website + Source $j.txt"
        || die "Can't open website sources.txt: $!\n";
    print LOG $source;
    close LOG;
}
Any ideas?

Replies are listed 'Best First'.
Re: Source Grabber
by diotalevi (Canon) on Mar 30, 2006 at 01:59 UTC

    You wrote my $length=length(@list); and this is completely wrong. You should have written my $length = @list; instead. Here's what you got: length() puts its argument in scalar context and evaluates it as a string. An array in scalar context returns the number of elements it contains. A number that's used like a string is converted to a string. length() returns the number of characters in its string. So if you had ten elements in the list, you were doing the equivalent of length("10") which is 2.
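A minimal sketch of the effect, assuming a hypothetical list of 13 URLs like the one mentioned in the question (the URLs themselves are made up). It suggests why the loop only ran three times: length(@list) measures the string "13", not the array.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 13 URLs read from the file.
my @list = map { "http://example.com/page$_" } 1 .. 13;

my $count = @list;              # array in scalar context: number of elements
my $chars;
{
    no warnings;                # newer perls warn about length() on an array
    $chars = length(@list);     # same as length("13")
}

# The loop from the question, reduced to counting its passes.
my $iterations = 0;
for (my $i = 0; $i <= $chars; $i++) {
    $iterations++;
}

print "elements: $count, length(\@list): $chars, loop passes: $iterations\n";
# prints "elements: 13, length(@list): 2, loop passes: 3"
```

Note that the loop condition uses <=, so even with the correct count it would run one pass too many; $i < $count (or a foreach over the array) avoids that.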

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      That was it. I changed exactly what you said and it works perfectly now.
      Thanks tons.

      BTW, the corresponding function to length() is scalar() for a list (which forces scalar context, as already stated above).

      In case not already clear, this was meant for Andrew_Levenson.

        scalar() for a list

        No. scalar() imposes scalar context for whatever expression is its argument regardless of whether that expression is a list, array, scalar, function call, etc.
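A short sketch of that point, using made-up data and a hypothetical helper named context(). It applies scalar() to an array, to a built-in called with a list of arguments, and to a function call, and each one is simply evaluated in scalar context.

```perl
use strict;
use warnings;

my @urls = ('a', 'b', 'c');     # made-up data

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? 'list' : 'scalar' }

my $n        = scalar(@urls);                   # array: element count
my $reversed = scalar(reverse('ab', 'cd'));     # reverse: joins, then reverses the string
my @seen     = (context(), scalar(context()));  # same call, two contexts

print "$n $reversed @seen\n";
# prints "3 dcba list scalar"
```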

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Source Grabber
by Cody Pendant (Prior) on Mar 30, 2006 at 00:42 UTC
    Don't know why you're having problems, but that's because we can't see your @list.

    You're trying to open a file with a name found in $list[$j] and it may have illegal characters in it for a start, like a slash. So if the first three succeed, what's in the fourth?

    Looking at your whole $i and $j business gets me worried. $i is your incrementing variable, and $j is the name of the file you want to write?

    But if you're only incrementing $i by one every time, aren't you going to attempt to get a website at a url which is really a filename next time? Like, if your file contains "http://www.yahoo.com" then "Yahoo", you'd get "http://www.yahoo.com" and write it to a file called "Yahoo.txt" on the first run of the loop, but then the next time you're going to try and get "Yahoo" as a URL, which won't work.

    Maybe you're doing some kind of poor-man's hash kind of thing? The key is in $list[$i] and the value is in $list[$j]? In which case, just use a hash!

    And, even more to the point, this is a very inefficient way to test if a website has changed, if you're really just going for "either it's changed or it hasn't". Just store the length of the content you get from LWP. If it's different, the site has changed.
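One caveat: comparing lengths misses any edit that leaves the byte count unchanged. As a sketch of an alternative (not something proposed in this thread), a digest of the content can be stored instead. The two page bodies below are made-up strings; Digest::MD5 ships with Perl.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Made-up page bodies: same length, different content.
my $old = "<p>Price: 10</p>";
my $new = "<p>Price: 20</p>";

my $same_length = length($old) == length($new);
my $same_digest = md5_hex($old) eq md5_hex($new);

print "length says:  ", ($same_length ? "unchanged" : "changed"), "\n";
print "digest says:  ", ($same_digest ? "unchanged" : "changed"), "\n";
```

In a real checker, the strings would come from get($url), and the stored digest from the previous run would be compared against the fresh one.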



    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: Source Grabber
by Cody Pendant (Prior) on Mar 30, 2006 at 01:07 UTC
    I think this is more Perlish and neater:
    use strict;
    use warnings;
    use LWP::Simple;

    while (<DATA>) {
        # the 2 on the end stops it splitting on the space in "NY Times"
        my ( $url, $name ) = split( '\s+', $_, 2 );
        chomp($name);
        print "Getting $name from URL $url\n";
        my $source = get($url) || die "Can't get website $name from $url: $!";
        open( LOG, ">$name.txt" ) || die "Can't write to file $name.txt: $!\n";
        print LOG $source;
        close LOG;
    }
    __DATA__
    http://www.yahoo.com/ Yahoo
    http://www.cnn.com/ CNN
    http://www.google.com/ Google
    http://www.nytimes.com/ NY Times
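The effect of the limit argument to split can be seen in isolation. This sketch reuses the "NY Times" line from the data above and compares a split with and without the limit; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $line = "http://www.nytimes.com/ NY Times\n";
chomp $line;

my @unlimited      = split /\s+/, $line;       # splits on every run of whitespace
my ($url, $name)   = split /\s+/, $line, 2;    # stops after the first split

print scalar(@unlimited), " fields without a limit\n";
print "url=$url name=$name\n";
```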


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
Re: Source Grabber
by gri6507 (Deacon) on Mar 29, 2006 at 23:59 UTC
    I don't see anything obvious right away, but you can modify your code to give more information. Normally, when you open a file, you want to make sure it was successful:
    open(LOG, ">$mylogfile") || die "Can't open $mylogfile: $!\n";
      Okay, I understand that, it makes sense. Just one question. What is $! and what does it do?
        From perldoc perlvar:

        "$! If used numerically, yields the current value of the C "errno" variable...If used as a string, yields the corresponding system error string."

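A small sketch of the two faces of $!, assuming a path that does not exist (the path is hypothetical). The value is copied right after the failed open, since later system calls can overwrite it.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

my $path = '/no/such/dir/hypothetical.txt';   # assumed not to exist

my ($errno, $message) = (0, '');
if (!open(my $fh, '<', $path)) {
    # Same variable, two views: number in numeric context, text in string context.
    ($errno, $message) = ($! + 0, "$!");
}

print "errno=$errno message=$message\n";
print "that is ENOENT\n" if $errno == ENOENT;
```

The exact message text depends on the operating system and locale, so only the numeric value is compared here.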
Re: Source Grabber
by wazoox (Prior) on Mar 30, 2006 at 11:02 UTC

    A couple of pieces of advice:

    • for($i=0; $i<=$length; $i++){ isn't very perlish; you could use foreach my $url (@list) { instead.
    • You'd better use open(...) or die "...". With parentheses around open's arguments it doesn't matter, but without them || binds more tightly than the comma, so the die attaches to the last argument instead of to open, and a failed open goes unnoticed.
    • Actually, you don't really need to loop twice, once to read the file and once more to get the URLs; you could easily do it all in the same loop.
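The precedence point can be checked without touching any real files. This is a sketch assuming a path that does not exist (the path is hypothetical); each open is wrapped in eval so that the script can report whether die fired.

```perl
use strict;
use warnings;

my $missing = '/no/such/dir/hypothetical.txt';   # assumed not to exist

# Parsed as: open(my $fh, '<', ($missing || die ...)); the die is never reached.
my $died_with_pipes = !eval {
    open my $fh, '<', $missing || die "never reached\n";
    1;
};

# Parsed as: open(my $fh, '<', $missing) or die ...; the failed open triggers die.
my $died_with_or = !eval {
    open my $fh, '<', $missing or die "open failed: $!\n";
    1;
};

printf "|| form died: %s, or form died: %s\n",
    ($died_with_pipes ? 'yes' : 'no'),
    ($died_with_or    ? 'yes' : 'no');
# prints "|| form died: no, or form died: yes"
```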

    So we could have:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    my $counter=1;
    my $file="C:/Documents and Settings/Dad/Desktop/url checker.txt";

    open FILE, "<$file" or die "Can't open the list of urls! \n";
    while(<FILE>){
        my $source=get($_);
        open LOG, ">C:/Documents and Settings/Dad/Desktop/Websites/Website + Source $counter.txt"
            or die "Can't open website sources.txt: $!\n";
        print LOG $source;
        close LOG;
        $counter++;
    }
    close FILE;
    close LOG;
