Here's my spider version for the current website, fetching everything from 1996 to 2004 (for now) and the ultimate and 50th edition images:
Amazingly enough, this also fetches the "private" images that you should only be able to get if you're registered. Apparently, although the HTML pages are protected with their login, the images themselves are not, and the image thumbnails give away the full image names. Cool.

```perl
#!/usr/bin/perl
use strict;
$|++;

use LWP::Simple;

-d "RESULTS" or mkdir "RESULTS", 0755 or die "cannot mkdir RESULTS: $!";

my $all_model_index = get "http://sportsillustrated.cnn.com/swimsuit/collection/";

while ($all_model_index =~ /(\/swimsuit\/collection\/models\/[-\w]+\.html)/g) {
  my $model_index = get "http://sportsillustrated.cnn.com/$1";

  while ($model_index =~ /\"(http:\/\/i\.cnn\.net\/si\/pr\/subs\/swimsuit\/images\/)([-\w]+)t\.jpg\"/g) {
    my $url = "$1$2.jpg";
    my $file = "RESULTS/$2.jpg";

    print "$url => $file: ";
    if (-e $file) {
      print "skip\n";
    } else {
      print mirror($url, $file), "\n";
    }
  }
}
```
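The thumbnail trick is easier to see in isolation. Here is a minimal sketch, assuming (as the spider's regex does) that a thumbnail is just the full image name with a trailing `t` before `.jpg`. The helper name `full_image_url` and the file name `04_example_01t.jpg` are made up for illustration and do not come from the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a thumbnail URL into the full-size image URL
# by dropping the trailing "t" that precedes ".jpg".
# Returns nothing if the URL does not look like a thumbnail.
sub full_image_url {
  my ($thumb_url) = @_;
  return unless $thumb_url =~ m{^(.*/)([-\w]+)t\.jpg$};
  return "$1$2.jpg";
}

# Made-up thumbnail name, used only to show the transformation.
my $thumb = "http://i.cnn.net/si/pr/subs/swimsuit/images/04_example_01t.jpg";
print full_image_url($thumb), "\n";
```

Since `[-\w]+` is greedy, it first swallows the final `t` and then backs off by one character so that `t\.jpg` can match, which is why only the last `t` is removed.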
And once you get the results, you can symlink them by person with this:
```perl
#!/usr/bin/perl
use strict;
$|++;

-d "SORTED" or mkdir "SORTED" or die "mkdir SORTED: $!";

for (glob "RESULTS/*") {
  my($basename, $person) = /RESULTS\/(.*?_(.*?)_[\db]+\.jpg)$/ or die "$_";
  my $dir = "SORTED/$person";
  -d $dir or mkdir $dir or die "mkdir $dir: $!";
  my $target = $basename;
  for ($target) {
    s/^9/199/ or s/^0/200/; # patch up years
    $_ = "$dir/$_";
  }
  -e $target or symlink "../../$_", $target or die "ln -s ../../$_ $target: $!";
}
```
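To check what the sorter will do before creating any symlinks, the name mapping can be pulled out into a pure function. This is only a sketch: `sorted_path` is a hypothetical helper, and the sample file names are invented to fit the `YY_person_NN.jpg` shape that the regex above implies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compute the SORTED/ path for a file in RESULTS/,
# using the same regex and year patch-up as the sorter script.
# Returns nothing if the name does not fit the expected pattern.
sub sorted_path {
  my ($path) = @_;
  my ($basename, $person) = $path =~ m{RESULTS/(.*?_(.*?)_[\db]+\.jpg)$}
    or return;
  my $name = $basename;
  $name =~ s/^9/199/ or $name =~ s/^0/200/;   # 9x -> 199x, 0x -> 200x
  return "SORTED/$person/$name";
}

# Invented file names, for illustration only.
for my $file ("RESULTS/04_example_01.jpg", "RESULTS/96_example_12b.jpg") {
  print "$file => ", sorted_path($file), "\n";
}
```

Nothing here touches the filesystem, so it is safe to run against a listing of `RESULTS/` to preview the layout.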
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
In reply to Re: Swimsuits2004 by merlyn, in thread Swimsuits2004 by zentara