
Both of the replies above are good, and I would try them first. However, I decided to toy with your question and created the following script. If nothing else, it might give you something to play with.

A few notes regarding the code:

Pass in your favorite CPAN mirror as the first command-line argument, or change the default value of $cpan_site_root, to be nice to the primary CPAN repository.
The script uses several modules not in the core, such as Sort::Versions for sorting files by version number.
The script downloads the find-ls.gz file from the indices directory and processes it to determine the newest version of each module.
~~Filenames from the find-ls.gz file are handled manually, rather than by a module, so that is one potential improvement that could be made.~~ The script doesn't make allowances for symlinks, however. (See UPDATE below.)
$directory_match lets you process only some directories (in the case below, only the modules/ subtree). $file_match lets you fetch only files matching specific patterns, such as those ending in .tar.gz or .tgz.
$target_path is the directory to store the files under.
You can set $agent_id as you like.
The script uses the mirror() method of LWP::UserAgent, so a file is not downloaded again if your local copy is already up to date. The script does not, however, remove an old version you may have when a newer one is out.
While it seems to work for me, I can't guarantee it, but you already knew that.
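The filtering and grouping described in the notes above can be tried out on their own. Below is a minimal sketch that pulls that logic into a subroutine; group_by_dist and the sample paths are hypothetical, and it assumes each input is already the filename field of a find-ls line. Like the script, it treats everything after the last hyphen as the version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Same patterns as the script's $directory_match and $file_match.
my $directory_match = qr!modules/!;
my $file_match      = qr!(\.tar\.gz|\.tgz)!;

# group_by_dist: hypothetical helper. Takes a list of paths and returns
# a hashref of directory => { module name => [ filenames ] }.
sub group_by_dist {
    my (@paths) = @_;
    my %files;
    for my $filepath (@paths) {
        next unless $filepath =~ /^$directory_match/;
        next unless $filepath =~ /$file_match$/;
        my ( undef, $path, $file ) = File::Spec->splitpath($filepath);

        # Drop the last '-'-separated piece (version plus extension).
        my @parts = split /-/, $file;
        pop @parts;
        my $module = join '-', @parts;
        push @{ $files{$path}{$module} }, $file;
    }
    return \%files;
}

my $grouped = group_by_dist(
    'modules/by-module/Foo/Foo-Bar-1.01.tar.gz',
    'modules/by-module/Foo/Foo-Bar-1.02.tgz',
    'modules/by-module/Foo/Foo-Bar-1.02.readme',    # skipped: wrong suffix
    'authors/id/X/XY/XYZ/Baz-0.1.tar.gz',           # skipped: wrong subtree
);

for my $dir ( sort keys %$grouped ) {
    for my $module ( sort keys %{ $grouped->{$dir} } ) {
        print "$dir $module: @{ $grouped->{$dir}{$module} }\n";
    }
}
```

Splitting at the last hyphen handles names such as libwww-perl-5.76.tar.gz, but a version string that itself contains a hyphen would be split in the wrong place.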

#!/usr/bin/perl -w

use strict;

use Compress::Zlib;
use File::Glob ':glob';
use File::Spec;
use HTTP::Request;
use LWP;
use LWP::Simple;
use LWP::UserAgent;
use Sort::Versions;

my $DEBUG            = 0;
my $agent_id         = 'CPANChecker.pl/0.1 ';
# my $cpan_site_root   = $ARGV[0] || 'ftp://ftp.cpan.org/pub/CPAN';
my $cpan_site_root   = $ARGV[0] || 'http://www.perl.com/CPAN';
my $directory_match  = qr!modules/!;
my $find_ls_filename = 'indices/find-ls.gz';
my $file_match       = qr!(\.tar\.gz|\.tgz)!;
my $target_path      = '~/reference/cpan';
$target_path = bsd_glob( $target_path, GLOB_TILDE | GLOB_ERR );

my $ua = LWP::UserAgent->new;
$ua->agent($agent_id);

my $req =
  HTTP::Request->new(
    GET => join( '/', $cpan_site_root, $find_ls_filename ) );
$req->header( Accept => "text/html, */*;q=0.1" );

my $res = $ua->request($req);
if ( $res->is_success ) {
    my $find_ls_file = $res->content;
    my (%files);
    my @filelist = grep( /$directory_match/,
        split( /\n/, Compress::Zlib::memGunzip($find_ls_file) )
    );
    print( scalar(@filelist), "\n" ) if ($DEBUG);
    foreach ( 0 .. $#filelist ) {
        $filelist[$_] =~ s/\r//g;
        my @parts = split( /\s+/, $filelist[$_] );
        my $filepath = $parts[8];
        next unless $filepath =~ /^$directory_match/;
        next unless $filepath =~ /$file_match$/;
        {
            # Split the path into directory and filename, then drop the
            # last '-'-separated piece (version plus extension) to get
            # the module name.
            my ( undef, $path, $file ) = File::Spec->splitpath($filepath);
            @parts = split( /-/, $file );
            pop(@parts);
            my $module = join( '-', @parts );
            push( @{ $files{$path}{$module} }, $file );
        }
    }

    foreach my $k ( sort( keys(%files) ) ) {
        {
            # Create any missing directories under $target_path,
            # one path component at a time.
            my $pathname = $target_path;
            my @parts = split( /\//, $k );
            foreach my $p (@parts) {
                $pathname = File::Spec->catdir( $pathname, $p );
                if ( !-e $pathname ) {
                    print "Creating $pathname...\n";
                    mkdir($pathname) or die("Error creating $pathname: $!\n");
                }
            }
        }
        print( $k, "\n" );
        foreach my $m ( sort( keys( %{ $files{$k} } ) ) ) {
            # Sort newest first, so $parts[0] is the latest version.
            my @parts = sort { versioncmp( $b, $a ) } @{ $files{$k}{$m} };
            print(
                "key: $k\n",
                "module: $m\n",
                "\t", File::Spec->catfile( $target_path, $k, $parts[0] ), "\n"
            ) if ($DEBUG);

            # mirror() sends If-Modified-Since based on the local copy,
            # so unchanged files are not downloaded again (HTTP 304).
            # $k already ends in '/' (from splitpath), so the filename is
            # appended directly to avoid a doubled slash in the URL.
            my $url   = $cpan_site_root . '/' . $k . $parts[0];
            my $local = File::Spec->catfile( $target_path, $k, $parts[0] );
            $res = $ua->mirror( $url, $local );

            # is_success/is_error are HTTP::Response methods; a 304
            # (already up to date) is neither, so nothing is printed.
            if ( $res->is_success ) {
                print( "...", $parts[0], "\n" );
            }
            elsif ( $res->is_error ) {
                print( "Error in retrieving $url : ",
                    $res->status_line, "\n" );
            }
        }
    }
}
else {
    print( "Error in retrieving $cpan_site_root/$find_ls_filename : ",
        $res->status_line, "\n" );
}
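The directory-creation loop in the script walks $k one component at a time and calls mkdir for each missing level. As an alternative (not what the script does), the core File::Path module's make_path creates all missing intermediate directories in one call. A minimal sketch; ensure_target_dir is a hypothetical helper, and a temporary directory stands in for $target_path so it can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# ensure_target_dir: hypothetical helper. Creates $target_path/$subdir,
# including any missing intermediate directories, and returns the path.
sub ensure_target_dir {
    my ( $target_path, $subdir ) = @_;
    my $dir = File::Spec->catdir( $target_path, $subdir );

    # make_path collects problems in $err instead of dying.
    make_path( $dir, { error => \my $err } );
    die "Error creating $dir\n" if $err && @$err;
    return $dir;
}

my $root = tempdir( CLEANUP => 1 );
my $dir  = ensure_target_dir( $root, 'modules/by-module/Foo/' );
print "Created $dir\n" if -d $dir;
```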

Update: 06 Mar 2004: Thanks to leira's suggestions, modified the code above to use the File::Spec module for handling filenames. There may still be room for improvement, but it's getting there.
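For reference, here is what File::Spec's splitpath and splitdir return for a path of the kind found in find-ls.gz. This is a small sketch with a made-up sample path; the output in the comments assumes a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# A path as it might appear in find-ls.gz (sample value).
my $filepath = 'modules/by-module/Foo/Foo-Bar-1.02.tar.gz';

# splitpath returns (volume, directories, file). On Unix the volume is
# empty and the directory part keeps its trailing slash.
my ( $volume, $dir, $file ) = File::Spec->splitpath($filepath);

# splitdir breaks the directory part into components; the trailing
# slash yields an empty last element, which is filtered out here.
my @components = grep { length } File::Spec->splitdir($dir);

print "dir: $dir\n";                  # dir: modules/by-module/Foo/
print "file: $file\n";                # file: Foo-Bar-1.02.tar.gz
print "components: @components\n";    # components: modules by-module Foo
```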

Update: 06 Mar 2004: Fixed $target_path to handle relative locations (with thanks to castaway for suggesting a module that could handle ~).
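A sketch of the ~ handling that expands only the leading ~ with bsd_glob and then appends the remaining components with File::Spec->catdir; expand_home is a hypothetical helper. My understanding is that bsd_glob without GLOB_NOCHECK returns only paths that actually exist, so expanding just ~ avoids depending on the full target directory being there already.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_TILDE GLOB_ERR);
use File::Spec;

# expand_home: hypothetical helper. Expands "~" to the home directory,
# then appends the given path components.
sub expand_home {
    my (@rest) = @_;
    my ($home) = bsd_glob( '~', GLOB_TILDE | GLOB_ERR );
    die "Could not expand ~\n" unless defined $home;
    return File::Spec->catdir( $home, @rest );
}

my $target_path = expand_home( 'reference', 'cpan' );
print "$target_path\n";
```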

Update: 15 Mar 2004: Fixed an apparent error in the mirror statement when using relative paths. Changed several places where paths were assembled manually to use File::Spec->catfile().
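To illustrate the difference between assembling the local filename and the URL: File::Spec->catfile follows the local operating system's conventions, while a URL always uses '/'. A minimal sketch with sample values standing in for $target_path, a %files key and the chosen filename; the outputs in the comments assume a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Sample values (hypothetical) standing in for the script's variables.
my $cpan_site_root = 'http://www.perl.com/CPAN';
my $target_path    = '/tmp/cpan';
my $k              = 'modules/by-module/Foo/';   # splitpath keeps the trailing slash
my $file           = 'Foo-Bar-1.02.tar.gz';

# catfile canonicalizes the directory part, so the trailing slash on $k
# does not produce a doubled separator.
my $local = File::Spec->catfile( $target_path, $k, $file );

# URLs always use '/', so they are joined by hand rather than with
# File::Spec, which would use the local separator.
my $url = $cpan_site_root . '/' . $k . $file;

print "$local\n";   # /tmp/cpan/modules/by-module/Foo/Foo-Bar-1.02.tar.gz
print "$url\n";     # http://www.perl.com/CPAN/modules/by-module/Foo/Foo-Bar-1.02.tar.gz
```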

Update: 02 May 2004: Changed $cpan_site_root's default value to point to the CPAN Multiplexer (http://www.perl.com/CPAN/).

In reply to Re: code to grab the latest version of a module from CPAN? by atcroft in thread code to grab the latest version of a module from CPAN? by perrin
