Here's my attempt. It includes a short random delay between downloads to avoid overloading their server.

UPDATE: This does not work completely. I found that some of the models have their pictures broken into 2 sets, one with a _behind added to the filename. My script skips those, and I guess you would need to scrape the links off of the various thumbnail pages to get all of them. But this will get you at least 300 photos, if that ain't enough..... you've got a problem :-)

UPDATE2: I fixed the script to switch filenames, so now it gets them all.

300 wasn't enough for me. :-)

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;

my $a  = int rand(9);
my $a1 = int rand(9);
#my $agent = "Mozilla/1.$a.$a1 (compatible; MSIE; NT 6.0 )";
my $agent = "Mozilla/1.7.11 (compatible; MSIE; NT 6.0 )";

my $ua = LWP::UserAgent->new(
    env_proxy  => 0,
    timeout    => 50,
    keep_alive => 1,
    agent      => $agent,
);

my $URL = 'http://i.a.cnn.net/si/features/2007_swimsuit/images/photos/';

while (<DATA>) {
    chomp $_;
    my ( $file, $start, $end ) = split ' ', $_;
    my $sstart = sprintf( '%.2d', $start );
    my $bcount = 1;    # for filename switch

    # There is a total number of photos, but some models
    # change filename by adding _behind and resetting the count to 01.
    # This checks for the normal filename, then switches to the
    # _behind filenames if needed.
    for my $x ( $sstart .. $end ) {
        my $expected_length;
        my $bytes_received = 0;
        my $geturl   = $URL . $file . "_$x.jpg";
        my $filename = $file . "_$x.jpg";

        my $result = $ua->head($geturl);
        if ( $result->is_success ) {
            print "correct filename\n";
        }
        else {
            print "changed filename\n";
            my $sbcount = sprintf( '%.2d', $bcount );
            $geturl   = $URL . $file . "_behind_$sbcount.jpg";
            $filename = $file . "_behind_$sbcount.jpg";
            $bcount++;
        }

        $result = $ua->head($geturl);
        if ( $result->is_success ) {
            open( IN, ">$filename" ) or warn $!;
            binmode(IN);
            my $response = $ua->request(
                HTTP::Request->new( GET => $geturl ),
                sub {
                    my ( $chunk, $res ) = @_;
                    $bytes_received += length($chunk);
                    unless ( defined $expected_length ) {
                        $expected_length = $res->content_length || 0;
                    }
                    if ($expected_length) {
                        printf STDERR "%d%% - ",
                            100 * $bytes_received / $expected_length;
                    }
                    print STDERR "$bytes_received bytes received $filename\r";
                    print IN $chunk;
                }
            );
            print $response->status_line, "\n";
            close IN;
        }
        else {
            print "$filename ", $result->status_line, "\n";
        }
        sleep( 1 + rand(5) );
    }
}
__DATA__
07_brazil_group 1 13
07_aaraujo 1 24
07_abarros 1 21
07_beyonce 1 37
07_irina 1 39
07_bdecker 1 41
07_ydiazrahi 1 21
07_sebanks 1 51
07_jhenderson 1 26
07_mmiller 1 47
07_fmotta 1 21
07_anakashima 1 18
07_roliveira 1 18
07_oonweagba 1 33
07_tpraver 1 36
07_brefaeli 1 42
07_dsarahyba 1 21
07_ftavares 1 24
07_ytoscanini 1 40
07_av 1 67
07_vvarekova 1 25
07_jwhite 1 29
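
By the way, the counts in DATA are hard-coded. If you'd rather not maintain them by hand, HEAD requests can discover the count for each model. Here's a rough, untested sketch along those lines; it uses the same photo URL base as the script above, assumes the photos are numbered _01.jpg, _02.jpg, ... with no gaps, and ignores the _behind wrinkle entirely:

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;

# Untested sketch: probe with HEAD requests until the first miss to
# discover how many photos a model has, instead of hard-coding counts.
my $ua  = LWP::UserAgent->new( timeout => 50 );
my $URL = 'http://i.a.cnn.net/si/features/2007_swimsuit/images/photos/';

sub count_photos {
    my ($file) = @_;
    my $n = 0;
    $n++ while $ua->head( sprintf '%s%s_%.2d.jpg', $URL, $file, $n + 1 )->is_success;
    return $n;
}

print "07_irina: ", count_photos('07_irina'), " photos\n";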

Re: 2007-swimsuit-w-lwp
by cog (Parson) on Feb 15, 2007 at 16:39 UTC
    this will get you at least 300 photos, if that ain't enough..... you've got a problem

    I fixed the script ... to get them all. 300 wasn't enough for me

    Am I right to assume you have a problem? O:-)

      Yeah, I'm a member of the Church of Earthly Indulgence. I can't go to heaven until I get all this out of my system. :-)

Re: 2007-swimsuit-w-lwp
by merlyn (Sage) on Feb 15, 2007 at 17:29 UTC
    Mine gets 830, although it misses the musicians (which seemed pointless and distracting) as well as the videos. I stared at the video scraping for about 15 minutes and went blind. So I'll not be doing that.

      If you assume people are lazy, and that they reuse the same file names for lots of things, it's easy to spot the pattern in less than 15 minutes.

      There don't seem to be any truly downloadable formats of the videos, but here's a start at getting the URLs for the text containers that identify where the streams are for a WMV-capable player...

      #!/usr/bin/perl -l
      use strict;
      use warnings;
      use LWP::Simple;

      my $base    = "http://sportsillustrated.cnn.com";
      my $path    = "swimsuit/2007/02/03";
      my $vidbase = "http://wmscnn.stream.aol.com.edgestreams.net/cnnsi";

      # Fetch the video index page, follow each video page link, and pull
      # the code name out of its include.js reference; the stream URL
      # follows from that code.
      my $vid_index = get "$base/features/2007_swimsuit/video/";
      while ( $vid_index =~ m{(/features/2007_swimsuit/video/.*?\.html)}g ) {
          my $page = get "$base/$1";
          if ( $page and $page =~ m{/video/$path/(.*?)/include\.js} ) {
              my $code = $1;
              print "$vidbase/$path/$code/video.ws.wmv";
          }
      }
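
      From there, a WMV-capable player should be able to open each printed URL directly. If you wanted to try saving the streams to disk, something along these lines might work; a completely untested sketch that just shells out to mplayer, whose -dumpstream option is the usual trick for streams like these:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Untested sketch: read the URLs printed by the script above from
      # STDIN and ask mplayer to dump each stream to a local .wmv file.
      # No promises that these particular streams allow dumping.
      while ( my $url = <STDIN> ) {
          chomp $url;
          my ($code) = $url =~ m{/([^/]+)/video\.ws\.wmv$};
          $code ||= 'video';
          system( 'mplayer', '-dumpstream', '-dumpfile', "$code.wmv", $url ) == 0
              or warn "mplayer failed on $url\n";
      }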
      The only reason I posted mine was to spur you to post a better one. You are a hard man to whip. :-)
