in reply to Re^5: How to Save Fetched Web Files as "path/$string.xml"
in thread How to Save Fetched Web Files as "path/$string.xml"
Thanks again for your patience with this. I believe I have now made the code run under strict, with the help of others here. I just want to reiterate what I am trying to achieve:
In reference to the LWP-based code excerpt below, I am trying to fetch the file at each URL held in my $plyrurl and save it under the name held in my $filename, which should look like "$outputdir/${game}_$players" (so a file might be saved as "$outputdir/gid_2007_08_06_quiaaa_yucaaa_1_112039.xml"). One point of confusion for me is that the original author fetches two pages on the way to the destination page (the day index at $dayurl and each game's page at $gameurl). This type of code is called a "spider", so maybe it needs to pull data from each page on its path to the destination page? I just think it would be easier to find a solution with simpler code, if anything can be taken out!
That said, I tried to adapt your original reply (the batters example) to this code, but I am getting lost with the character class abbreviations. Because of that, I am trying to concatenate the strings instead. The current code fails with a "could not open file ./xxx_players/gid_2007_08_06_quiaaa_yucaaa_1/_112039.xml No such file or directory" error. Shouldn't the last part be without the slash before '_112039'?
    my $sourceurl = "http://gd2.mlb.com/components/game/aaa";
    my $outputdir = "./xxx_players";
    my $dayurl = "$sourceurl/year_$year/month_$mon/day_$mday/";
    print "\t$dayurl\n";
    my $response = $browser->get($dayurl);
    die "Couldn't get $dayurl: ", $response->status_line, "\n"
        unless $response->is_success;
    my $html = $response->content;
    my @games = @_;
    while ($html =~ m/<a href=\"(gid_\w+\/)\"/g) {
        push @games, $1;
    }
    # the loop that downloads data
    my $game;
    foreach $game (@games) {
        my $gameurl = "$dayurl/$game";
        $response = $browser->get($gameurl);
        die "Couldn't get $gameurl: ", $response->status_line, "\n"
            unless $response->is_success;
        my $gamehtml = $response->content;
        if ($gamehtml =~ m/<a href=\"players\.xml\"/) {
            my $plyrurl = "$dayurl/$game/players.xml";
            $response = $browser->get($plyrurl);
            die "Couldn't get $plyrurl: ", $response->status_line, "\n"
                unless $response->is_success;
            my $plyrhtml = $response->content;
            my $players = 'players.xml';
            my $filename = "$outputdir/$game" . "$players";
            print "\t\tfetching game: ${game}_$players\n";
            open(FILEHANDLE, ">", "$filename")
                or die "could not open file $filename $!\n";
            print FILEHANDLE "$game" . "$filename";
            close FILEHANDLE;
        } else {
            my $players = 'players.xml';
            print "warning: no player list for $game . $players\n";
        }
    }
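For what it's worth, here is a minimal sketch of one way the file name could be built by concatenation. It assumes the stray slash comes from the capture group `(gid_\w+\/)`, which keeps the trailing "/" as part of $game. The helpers `build_filename` and `save_content` are hypothetical names made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): join the output
# directory, the captured game id, and a suffix into one flat file name.
sub build_filename {
    my ($outputdir, $game, $suffix) = @_;
    # Copy $game and drop a single trailing slash, if there is one.
    (my $gid = $game) =~ s{/\z}{};
    return "$outputdir/${gid}_$suffix";
}

# Hypothetical helper: write content that was already fetched to $filename.
# The target directory must already exist; open() does not create it.
sub save_content {
    my ($filename, $content) = @_;
    open(my $fh, '>', $filename)
        or die "could not open file $filename: $!\n";
    print {$fh} $content;
    close($fh) or die "could not close file $filename: $!\n";
    return;
}

my $outputdir = './xxx_players';
my $game      = 'gid_2007_08_06_quiaaa_yucaaa_1/';   # as captured, slash included
my $filename  = build_filename($outputdir, $game, 'players.xml');
print "$filename\n";   # ./xxx_players/gid_2007_08_06_quiaaa_yucaaa_1_players.xml
```

In the download loop this would presumably be called as save_content($filename, $plyrhtml), so that the fetched XML is written out rather than the game id and file name. Without the slash, the path no longer points into a "gid_..._1/" subdirectory that does not exist, which looks like the likely source of the "No such file or directory" message.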