glasswalk3r has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,

Are you aware of any technique to reduce the size of a CPAN::Mini repository?

I checked my local mirror and it is around 4.9 GB on an OpenBSD 6 box with FFS.

Looking at my ID under the authors directory, I see tarballs there that are not even listed on my PAUSE account anymore (this might be a mirror synchronization issue). Anyway, I guess that for my purposes I could keep only the latest available distribution for everybody.

A quick check of the CPAN::Mini and minicpan documentation didn't show anything that would help with that.

Is there any trick to take care of it?

Thanks!

UPDATED

I tried this:

use warnings;
use strict;
use DBI;
use Set::Tiny;
use CPAN;

# Load the CPAN.pm configuration and refresh its index
CPAN::HandleConfig->load;
CPAN::Shell::setup_output;
CPAN::Index->force_reload;

my $dbfile = '/home/vagrant/.cpan/cpandb.sql';
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "");
my $query_distros = $dbh->prepare(
    q{select A.dist_file from dists A join auths B on A.auth_id = B.auth_id where B.cpanid = ?}
);
my $query_authors = $dbh->prepare(q{select cpanid from auths});
$query_authors->execute();
my $removed = 0;

while (my $row = $query_authors->fetchrow_arrayref()) {
    # Distribution files the index still lists for this author
    my $distros = get_distro_files($dbh, $query_distros, $row->[0]);
    my $path = '/minicpan/authors/id/' . substr($row->[0], 0, 1) . '/'
        . substr($row->[0], 0, 2) . '/' . $row->[0];
    next unless (-d $path);
    opendir(DIR, $path) or die "Cannot read $path: $!";
    my @files = readdir(DIR);
    closedir(DIR);

    foreach my $distro_file (@files) {
        next if $distro_file eq 'CHECKSUMS';
        my $to_remove = $path . '/' . $distro_file;
        # The -f test also skips '.' and '..', in whatever order readdir returns them
        next unless (-f $to_remove);
        unless ($distros->has($distro_file)) {
            print "$to_remove can be removed\n";
            unlink($to_remove) or warn "could not remove $to_remove: $!";
            $removed++;
        }
    }
}

$dbh->disconnect();
print "Total removed: $removed\n";

# Returns the set of distribution file names indexed for one author
sub get_distro_files {
    my ($dbh, $sth, $author) = @_;
    $sth->bind_param(1, $author);
    $sth->execute();
    my @distros;
    while (my $row = $sth->fetchrow_arrayref()) {
        push(@distros, $row->[0]);
    }
    return Set::Tiny->new(@distros);
}

I still need to validate that it didn't break anything...

Anyway, I was able to reduce it from the initial 4.9 GB to 3.0 GB...
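One way to check that a cleanup like this did not break the mirror is to confirm that every distribution named in the package index is still on disk. The following is only a minimal sketch under assumptions: the helper name missing_dists is made up for illustration, the index is assumed to live at modules/02packages.details.txt.gz under the mirror root, and the mirror root /minicpan is taken from the script above.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: return the distribution paths named in a
# 02packages index that are missing below "$mirror/authors/id".
sub missing_dists {
    my ($mirror, $index_file) = @_;
    my $in = IO::Uncompress::Gunzip->new($index_file)
        or die "Cannot read $index_file: $GunzipError";

    # The index starts with a header block that ends at the first blank line.
    while (defined(my $line = $in->getline)) {
        last if $line =~ /^\s*$/;
    }

    my %seen;
    my @missing;
    while (defined(my $line = $in->getline)) {
        # Each entry reads: package name, version, distribution path.
        my (undef, undef, $dist) = split ' ', $line;
        next if !defined $dist or $seen{$dist}++;
        push @missing, $dist unless -f "$mirror/authors/id/$dist";
    }
    $in->close;
    return @missing;
}

# Only runs when the mirror path used in this thread exists locally.
my $mirror = '/minicpan';
my $index  = "$mirror/modules/02packages.details.txt.gz";
if (-f $index) {
    print "missing: $_\n" for missing_dists($mirror, $index);
}
```

If this prints nothing, every indexed distribution is still present, so a client installing by package name should not hit a missing tarball.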

Alceu Rodrigues de Freitas Junior
---------------------------------
"You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill

Re: CPAN::Mini on a diet
by marto (Cardinal) on Mar 22, 2017 at 10:40 UTC

    I'm working on something as a background task, way down my priority list, to make CPAN::Mini easier to configure and manage. In the meantime, check out CPAN::Mini::NoLargeFiles and CPAN::Mini::LatestDistVersion, and make extensive use of the filtering options CPAN::Mini has: for example, filter out ACME*; if you don't work on Windows, don't mirror Win32*; and experiment with ditching App*.
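To make the filtering advice concrete, here is a small sketch of how such rules behave. CPAN::Mini accepts a module_filters list of regex or code rules, and a module matching any rule is not mirrored. The helper is_filtered is a made-up stand-in so the example runs without CPAN::Mini installed, and reading "ACME*" as the Acme:: namespace is an assumption.

```perl
use strict;
use warnings;

# Illustrative filter set; adjust the namespaces to taste.
my @module_filters = (
    qr/^Acme::/i,    # joke modules
    qr/^Win32/,      # Windows-only code
    qr/^App::/,      # applications
);

# Hypothetical stand-in for the rule check: true when a package name
# matches any regex rule, or any code rule returns true for it.
sub is_filtered {
    my ($name, @rules) = @_;
    for my $rule (@rules) {
        return 1 if ref($rule) eq 'CODE' ? $rule->($name) : $name =~ $rule;
    }
    return 0;
}

for my $module (qw(Acme::Bleach Win32::OLE App::cpanminus DBI Moose)) {
    printf "%-16s %s\n", $module,
        is_filtered($module, @module_filters) ? 'skip' : 'mirror';
}

# With CPAN::Mini installed, the same list would be handed over like this
# (remote and local are the values from the .minicpanrc in this thread):
# CPAN::Mini->update_mirror(
#     remote         => 'http://mirror.nbtelecom.com.br/CPAN',
#     local          => '/minicpan',
#     module_filters => \@module_filters,
# );
```

Filtering by module name is coarse: a distribution is only skipped when its indexed packages match, so it is worth testing a rule set on a list of names before running a full mirror update.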

      Thanks for the tips, I'll take a look, especially at CPAN::Mini::LatestDistVersion. I used my script for cleanup, but as soon as I run minicpan again to update the mirror, those old versions get downloaded again (which is OK, since this is the expected behavior).

      Maybe CPAN::Mini should include the features of CPAN::Mini::LatestDistVersion as well.

      Alceu Rodrigues de Freitas Junior

        " but as soon as I run another minicpan to update the mirror, I got download those old versions (which is OK, since this is the expected behavior)."

        For clarity, are you running minicpan -c CPAN::Mini::LatestDistVersion when updating to use the CPAN::Mini::LatestDistVersion class?
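For readers wondering what "latest version only" means in practice, here is a rough sketch of the selection logic. It is not the implementation of CPAN::Mini::LatestDistVersion; parse_dist and outdated are made-up names, and the file name pattern is a simplification (CPAN::DistnameInfo handles the many irregular names found on CPAN).

```perl
use strict;
use warnings;
use version;

# Split "Name-1.23.tar.gz" into (name, version); returns an empty list
# for anything that does not look like a release archive.
sub parse_dist {
    my ($file) = @_;
    return $file =~ /^(.+)-v?(\d[\d._]*)\.(?:tar\.gz|tgz|zip|tar\.bz2)$/
        ? ($1, $2)
        : ();
}

# Given file names from one author directory, return the names that are
# not the newest release of their distribution.
sub outdated {
    my @files = @_;
    my %newest;    # dist name => [version object, file name]
    my @old;
    for my $file (@files) {
        my ($name, $ver) = parse_dist($file) or next;
        my $v = eval { version->parse($ver) };
        next unless defined $v;    # skip versions that cannot be parsed
        my $best = $newest{$name};
        if (!$best) {
            $newest{$name} = [$v, $file];
        }
        elsif ($v > $best->[0]) {
            push @old, $best->[1];
            $newest{$name} = [$v, $file];
        }
        else {
            push @old, $file;
        }
    }
    return sort @old;
}

print "$_\n" for outdated(qw(
    Foo-Bar-0.01.tar.gz
    Foo-Bar-0.02.tar.gz
    Foo-Bar-0.10.tar.gz
    Baz-1.2.3.tar.gz
    CHECKSUMS
));
```

With the sample names above, only the two older Foo-Bar archives are reported. Note that "newest by version number" is not always the same as "the release the index points to", which is why cross-checking against the index, as the original script does, is the safer criterion for deletion.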

Re: CPAN::Mini on a diet
by Anonymous Monk on Mar 21, 2017 at 23:30 UTC

    Hi,

    How long have you had this minicpan repository? Do you have skip_cleanup turned on?

    What is your ID?

    Check the indexes (02packages.details.txt.gz or 03modlist.data.gz); they can get wonky sometimes (CPAN not indexing?).

      Answering your questions in order:

      1. About a week, no more than that.
      2. Nope, below is the config I'm using.
      3. ARFREITAS
      4. Comparing the values in the SQLite DB with the files on the minicpan mirror, the indexes look OK.
      -bash-4.3$ cat .minicpanrc
      local: /minicpan
      remote: http://mirror.nbtelecom.com.br/CPAN
      exact_mirror: 1
      also_mirror: indices/find-ls.gz

      Alceu Rodrigues de Freitas Junior