Re^5: How to Save Fetched Web Files as "path/$string.xml"

Always use strict;. If that reports errors you don't understand, reduce the code to a simple example that shows the error and ask here about it. There are very few situations where you are ever likely to need to turn strictures off, and even for most of those there are likely to be better ways to achieve what you want.

Avoid referring to code on your scratch pad - it's transient, but the node that refers to it will be around for a while. Include the code with your node.

In your scratch pad code you have:

print "\t\tfetching game: $game\n";
open(FILEHANDLE, ">","$outputdir/players.xml")
    or die "could not open file $game/players.xml: $|\n";    
print FILEHANDLE "$game\n";

which confuses me because the error message names a different file ($game/players.xml) than the one you pass to open ($outputdir/players.xml). I'd be inclined to do something like:

my $filename = "$outputdir/players.xml";
print "\t\tfetching game: $game\n";
open(FILEHANDLE, ">", $filename)
    or die "could not open file $filename: $!\n";    
print FILEHANDLE "$game\n";

BTW, $! is the special variable that contains the last OS error. The $| in your die message is the output autoflush flag, not an error value.
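The same idea can be sketched with a lexical filehandle instead of the bareword FILEHANDLE, which is a substitution on my part rather than anything in the original script. The temporary directory and the sample $game value are assumptions made only so the snippet runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumed stand-ins for values the original script computes elsewhere.
my $outputdir = tempdir(CLEANUP => 1);
my $game      = 'gid_2007_08_06_quiaaa_yucaaa_1';

# Build the name once so open and the error message cannot disagree.
my $filename = "$outputdir/players.xml";
print "\t\tfetching game: $game\n";

# Three-argument open on a lexical handle; $! carries the OS error text.
open(my $fh, '>', $filename)
    or die "could not open file $filename: $!\n";
print {$fh} "$game\n";
close($fh)
    or die "could not close file $filename: $!\n";
```

Keeping the path in one variable means the message always reports the file that open actually tried.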

DWIM is Perl's answer to Gödel


Replies are listed 'Best First'.

Re^6: How to Save Fetched Web Files as "path/$string.xml"
by nase (Novice) on Aug 23, 2007 at 06:53 UTC

Thanks again for your patience with this. I believe that I have successfully made the code strict with the help of others here. I just want to reiterate what I am trying to achieve:

In reference to the LWP module-based code excerpt below, I am trying to pull the files represented by my $plyrurl (line 24) below and save them as "$outputdir/$game_$players", represented by my $filename (line 31), so a file might save as "$outputdir/gid_2007_08_06_quiaaa_yucaaa_1_112039.xml". One point of confusion for me is that the original code author pulls content from two pages on his way to the destination page (lines 9 and 22). This type of code is called a "spider", so maybe the code needs to pull data from each page on its path to the destination page? I just think it would be easier to find a solution with simpler code if anything can be taken out!

That said, I tried to adapt your original reply using the batters example to this code but am getting lost with the character class abbreviations. Because of that, I am trying to concatenate the strings instead. The current code here is returning a "could not open file ./xxx_players/gid_2007_08_06_quiaaa_yucaaa_1/_112039.xml No such file or directory" error. Shouldn't the last part not have the slash there before '_112039'?

my $sourceurl = "http://gd2.mlb.com/components/game/aaa";
my $outputdir = "./xxx_players";
my $dayurl = "$sourceurl/year_$year/month_$mon/day_$mday/";
    print "\t$dayurl\n";
    
my $response = $browser->get($dayurl); 
    die "Couldn't get $dayurl: ", $response->status_line,   
        "\n" unless $response->is_success;
my $html = $response->content;
my @games = @_;
    while($html =~ m/<a href=\"(gid_\w+\/)\"/g ){  
        push @games, $1;}

# the loop that downloads data
my $game; 
foreach $game (@games) {
    my $gameurl = "$dayurl/$game";
    $response = $browser->get($gameurl);
        die "Couldn't get $gameurl: ",                
            $response->status_line, "\n" unless         
                $response->is_success;
    my $gamehtml = $response->content;
    if($gamehtml =~ m/<a href=\"players\.xml\"/ ) {
        my $plyrurl = "$dayurl/$game/players.xml";
            $response = $browser->get($plyrurl);
        die "Couldn't get $plyrurl: ",  
                    $response->status_line, "\n"
                unless $response->is_success;
        my $plyrhtml = $response->content;
        my $players = 'players.xml';
        my $filename = "$outputdir/$game" . "$players";
        print "\t\tfetching game: ${game}_$players\n";
        open(FILEHANDLE, ">","$filename")
                or die "could not open file $filename $!\n";
        print FILEHANDLE "$game" . "$filename";
        close FILEHANDLE;
    } else 
       {my $players = 'players.xml';
       print "warning: no player list for $game . $players\n";
    }
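
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the capture group in the link regex includes the trailing "/", so $game ends in a slash and the concatenated name points into a directory that does not exist. The sample $html string and the $gameid and @filenames names below are hypothetical, used only to make the snippet self-contained:

```perl
use strict;
use warnings;

# Assumed sample of the day index page; the real page comes from $browser->get.
my $html      = '<a href="gid_2007_08_06_quiaaa_yucaaa_1/">game</a>';
my $outputdir = './xxx_players';
my $players   = 'players.xml';

my @games;
while ($html =~ m/<a href="(gid_\w+\/)"/g) {
    push @games, $1;    # the captured text still ends in "/"
}

my @filenames;
for my $game (@games) {
    # Copy the game id, then drop the trailing slash from the copy only.
    (my $gameid = $game) =~ s{/\z}{};
    my $filename = "$outputdir/${gameid}_$players";
    push @filenames, $filename;
    print "$game -> $filename\n";
}
```

The URLs can keep using $game with its slash, while the file name uses the stripped copy. The output directory still has to exist before open is called. Note too that the excerpt prints $game and $filename to the file, where writing $plyrhtml is presumably what is intended.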
