Here's my attempt. It includes a short random delay between downloads to avoid overloading their server.

UPDATE: This does not work completely. I found that some of the models have their pictures broken into 2 sets, one with a _behind added to the filename. My script skips those, and I guess you would need to scrape the links off of the various thumbnail pages to get all of them. But this will get you at least 300 photos, if that ain't enough..... you've got a problem :-)

UPDATE2: I fixed the script to switch filenames, so now it gets them all.

300 wasn't enough for me. :-)

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;

my $a  = int rand(9);
my $a1 = int rand(9);
#my $agent = "Mozilla/1.$a.$a1 (compatible; MSIE; NT 6.0 )";
my $agent = "Mozilla/1.7.11 (compatible; MSIE; NT 6.0 )";

my $ua = LWP::UserAgent->new(
    env_proxy  => 0,
    timeout    => 50,
    keep_alive => 1,
    agent      => $agent,
);

my $URL = 'http://i.a.cnn.net/si/features/2007_swimsuit/images/photos/';

while (<DATA>) {
    chomp $_;
    my ( $file, $start, $end ) = split ' ', $_;
    my $sstart = sprintf( '%.2d', $start );
    my $bcount = 1;    # for filename switch

    # There is a total number of photos, but some models
    # change filename by adding _behind and resetting the count to 01.
    # This checks for the normal filename, then switches to the
    # _behind filenames if needed.
    for my $x ( $sstart .. $end ) {
        my $expected_length;
        my $bytes_received = 0;
        my $geturl   = $URL . $file . "_$x.jpg";
        my $filename = $file . "_$x.jpg";

        my $result = $ua->head($geturl);
        if ( $result->is_success ) {
            print "correct filename\n";
        }
        else {
            print "changed filename\n";
            my $sbcount = sprintf( '%.2d', $bcount );
            $geturl   = $URL . $file . "_behind_$sbcount.jpg";
            $filename = $file . "_behind_$sbcount.jpg";
            $bcount++;
        }

        $result = $ua->head($geturl);
        if ( $result->is_success ) {
            open( IN, ">$filename" ) or warn $!;
            binmode(IN);
            my $response = $ua->request(
                HTTP::Request->new( GET => $geturl ),
                sub {
                    my ( $chunk, $res ) = @_;
                    $bytes_received += length($chunk);
                    unless ( defined $expected_length ) {
                        $expected_length = $res->content_length || 0;
                    }
                    if ($expected_length) {
                        printf STDERR "%d%% - ",
                            100 * $bytes_received / $expected_length;
                    }
                    print STDERR "$bytes_received bytes received $filename\r";
                    print IN $chunk;
                }
            );
            print $response->status_line, "\n";
            close IN;
        }
        else {
            print "$filename ", $result->status_line, "\n";
        }
        sleep( 1 + rand(5) );
    }
}
__DATA__
07_brazil_group 1 13
07_aaraujo 1 24
07_abarros 1 21
07_beyonce 1 37
07_irina 1 39
07_bdecker 1 41
07_ydiazrahi 1 21
07_sebanks 1 51
07_jhenderson 1 26
07_mmiller 1 47
07_fmotta 1 21
07_anakashima 1 18
07_roliveira 1 18
07_oonweagba 1 33
07_tpraver 1 36
07_brefaeli 1 42
07_dsarahyba 1 21
07_ftavares 1 24
07_ytoscanini 1 40
07_av 1 67
07_vvarekova 1 25
07_jwhite 1 29
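
By the way, the counts in DATA are hard-coded. If you'd rather not maintain them by hand, HEAD requests can discover the count for each model. Here's a rough, untested sketch along those lines; it uses the same photo URL base as the script above, assumes the photos are numbered _01.jpg, _02.jpg, ... with no gaps, and ignores the _behind wrinkle entirely:

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;

# Untested sketch: probe with HEAD requests until the first miss to
# discover how many photos a model has, instead of hard-coding counts.
my $ua  = LWP::UserAgent->new( timeout => 50 );
my $URL = 'http://i.a.cnn.net/si/features/2007_swimsuit/images/photos/';

sub count_photos {
    my ($file) = @_;
    my $n = 0;
    $n++ while $ua->head( sprintf '%s%s_%.2d.jpg', $URL, $file, $n + 1 )->is_success;
    return $n;
}

print "07_irina: ", count_photos('07_irina'), " photos\n";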

Re: 2007-swimsuit-w-lwp
by cog (Parson) on Feb 15, 2007 at 16:39 UTC
    this will get you at least 300 photos, if that ain't enough..... you've got a problem

    I fixed the script ... to get them all. 300 wasn't enough for me

    Am I right to assume you have a problem? O:-)

      Yeah, I'm a member of the Church of Earthly Indulgence. I can't go to heaven until I get all this out of my system. :-)

Re: 2007-swimsuit-w-lwp
by merlyn (Sage) on Feb 15, 2007 at 17:29 UTC
    Mine gets 830, although it misses the musicians (which seemed pointless and distracting) as well as the videos. I stared at the video scraping for about 15 minutes and went blind. So I'll not be doing that.

      If you assume people are lazy, and that they reuse the same file names for lots of things, it's easy to spot the pattern in less than 15 minutes.

      There don't seem to be any truly downloadable formats of the videos, but here's a start at getting the URLs for the text containers that identify where the streams are for a WMV-capable player...

      #!/usr/bin/perl -l
      use strict;
      use warnings;
      use LWP::Simple;

      my $base    = "http://sportsillustrated.cnn.com";
      my $path    = "swimsuit/2007/02/03";
      my $vidbase = "http://wmscnn.stream.aol.com.edgestreams.net/cnnsi";

      # Fetch the video index page, follow each video page link, and pull
      # the code name out of its include.js reference; the stream URL
      # follows from that code.
      my $vid_index = get "$base/features/2007_swimsuit/video/";
      while ( $vid_index =~ m{(/features/2007_swimsuit/video/.*?\.html)}g ) {
          my $page = get "$base/$1";
          if ( $page and $page =~ m{/video/$path/(.*?)/include\.js} ) {
              my $code = $1;
              print "$vidbase/$path/$code/video.ws.wmv";
          }
      }
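
      From there, a WMV-capable player should be able to open each printed URL directly. If you wanted to try saving the streams to disk, something along these lines might work; a completely untested sketch that just shells out to mplayer, whose -dumpstream option is the usual trick for streams like these:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Untested sketch: read the URLs printed by the script above from
      # STDIN and ask mplayer to dump each stream to a local .wmv file.
      # No promises that these particular streams allow dumping.
      while ( my $url = <STDIN> ) {
          chomp $url;
          my ($code) = $url =~ m{/([^/]+)/video\.ws\.wmv$};
          $code ||= 'video';
          system( 'mplayer', '-dumpstream', '-dumpfile', "$code.wmv", $url ) == 0
              or warn "mplayer failed on $url\n";
      }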
      The only reason I posted mine was to spur you to post a better one. You are a hard man to whip. :-)
