in reply to 2007-swimsuit-w-lwp

Mine gets 830, although it misses the musicians (which seemed pointless and distracting) as well as the videos. I stared at the video scraping for about 15 minutes and went blind. So I'll not be doing that.

Re^2: 2007-swimsuit-w-lwp
by hossman (Prior) on Feb 16, 2007 at 08:53 UTC

    If you assume people are lazy and reuse the same file names for lots of things, it's easy to spot the pattern in less than 15 minutes.

    There doesn't seem to be any truly downloadable format of the videos, but here's a start at getting the URLs of the text containers that identify where the streams are for a WMV-capable player:

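    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use LWP::Simple;

    my $base    = "http://sportsillustrated.cnn.com";
    my $path    = "swimsuit/2007/02/03";
    my $vidbase = "http://wmscnn.stream.aol.com.edgestreams.net/cnnsi";

    # Fetch the video index page; stop early if the request fails.
    my $vid_index = get "$base/features/2007_swimsuit/video/"
        or die "Couldn't fetch the video index\n";

    # Each video page references an include.js file; the directory it
    # sits in is the code that names the stream.
    while ($vid_index =~ m{(/features/2007_swimsuit/video/.*?\.html)}g) {
        my $page = get "$base$1";    # $1 already starts with a slash
        if ($page and $page =~ m{/video/$path/(.*?)/include\.js}) {
            my $code = $1;
            print "$vidbase/$path/$code/video.ws.wmv";
        }
    }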
Re^2: 2007-swimsuit-w-lwp
by zentara (Cardinal) on Feb 15, 2007 at 17:54 UTC
    The only reason I posted mine was to spur you to post a better one. You are a hard man to whip. :-)

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum