http://qs1969.pair.com?node_id=11137164


in reply to Re^2: grepping CPAN?
in thread grepping CPAN?

You really have to filter this now; there's just so much bloat. It's a far cry from fitting it onto a CD-ROM :)

Re^4: grepping CPAN?
by LanX (Saint) on Oct 01, 2021 at 13:35 UTC
    Wasn't there a recent call for authors to delete old versions of their modules, or am I misremembering?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      You remember correctly. However, some authors are unreachable and/or apathetic and the number of dists constantly increases as well.


      🦛

      That's not the issue: minicpan, unless you're doing something weird, should only pull down the latest releases required to build distributions. Over the years people have uploaded many modules, some of them very large ones in the App:: namespace (including vast bundles of other software). Unless you configure it to ignore bloat then you won't avoid this, and even then I've come across legitimate modules that depend on Acme modules (for 'test' data).
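      One way to express such a bloat filter, sketched in Perl: CPAN::Mini (the module behind the minicpan script) documents a `path_filters` option for `update_mirror` that takes regexes or code refs (a match means "don't mirror this dist"), plus a `skip_perl` flag. What counts as bloat here (every `Acme-*` and `App-*` dist) is purely an assumption, and the helper name `is_bloat` and the remote URL are placeholders. As noted above, skipping Acme can break legitimate dependencies.

```perl
use strict;
use warnings;

# Hypothetical "bloat" rules (assumptions, not a standard list):
# skip every Acme-* and App-* distribution. Beware: some legitimate
# dists depend on Acme modules, so this can break installs.
my @bloat_patterns = (
    qr{/Acme-},    # e.g. X/XX/XXXX/Acme-Foo-1.00.tar.gz
    qr{/App-},     # App-* dists, some of which bundle other software
);

# CPAN::Mini passes the dist path (e.g. "R/RJ/RJBS/CPAN-Mini-1.111016.tar.gz")
# to a code-ref path filter; a true result means "skip this dist".
sub is_bloat {
    my ($path) = @_;
    my $hits = grep { $path =~ $_ } @bloat_patterns;   # count of matching rules
    return $hits ? 1 : 0;
}

# Only touch the network when a local mirror directory is given:
#   perl minicpan-filtered.pl /path/to/minicpan
if ( my ($local) = @ARGV ) {
    require CPAN::Mini;
    CPAN::Mini->update_mirror(
        remote       => 'https://www.cpan.org/',   # placeholder mirror URL
        local        => $local,
        skip_perl    => 1,                # don't mirror perl releases themselves
        path_filters => [ \&is_bloat ],   # skip anything is_bloat() flags
    );
}
```

      Keeping the rules behind a single code ref means you can also call `is_bloat()` on a path by hand to preview whether a given dist would be skipped before running a full mirror.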

        How do you define bloat in a filterable way?

        (Did I miss a bloat flag in the meta files? ;)

        FWIW: for the purpose of this thread, downloading only plain text such as Perl code should be fine (or just excluding any binaries).

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        > Unless you configure it to ignore bloat then you won't avoid this

        To parse all Perl & POD source locally, I'd need to pull all text and ignore binaries and other "bloat" (yet to be defined) to save disk space.

        But this won't reduce the network load, since AFAIK the filtering only happens after the full dist tgz has been downloaded. °

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        °) Well, skipping the extraction of certain files from the tgz might still speed things up a little, though.
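        On the footnote: a minimal sketch, using the core module Archive::Tar, of reading a dist tarball while keeping only text members. `Archive::Tar->iter` with a `filter` regex streams the archive entry by entry instead of loading it all into memory. The extension whitelist standing in for "Perl & POD text" (`.pm`, `.pl`, `.PL`, `.pod`, `.t`) and the helper name `text_members` are assumptions. As said above, the full tgz still has to be downloaded and decompressed, so this saves memory and disk space, not network load.

```perl
use strict;
use warnings;
use Archive::Tar;

# Assumed definition of "Perl & POD text": modules, scripts, POD,
# tests, and Makefile.PL/Build.PL. Adjust the whitelist to taste.
my $text_re = qr/\.(?:pm|pl|PL|pod|t)\z/;

# Stream one dist tarball and return the Archive::Tar::File objects
# whose names match $text_re. Archive::Tar still decompresses and reads
# past the skipped entries; it just doesn't hand them back to us.
sub text_members {
    my ($tgz) = @_;
    my $next = Archive::Tar->iter( $tgz, 1, { filter => $text_re } )
        or die "Cannot open $tgz: " . Archive::Tar->error . "\n";
    my @kept;
    while ( my $f = $next->() ) {
        push @kept, $f if $f->is_file;   # ignore directories, links, etc.
    }
    return @kept;
}

# Usage: perl text-members.pl Foo-Bar-1.00.tar.gz ...
for my $tgz (@ARGV) {
    print $_->full_path, "\n" for text_members($tgz);
}
```

        Each returned object's `get_content` method gives the file's text, so the kept members can be grepped directly without ever writing the binaries to disk.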