PerlMonks  

Advice on building a CPAN mirror

by rah (Monk)
on Jun 28, 2002 at 02:27 UTC

rah has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

Can anyone offer any advice, assistance, or even an RTFM about how to create a CPAN mirror? I'm looking for the practical stuff, beyond the FAQs on CPAN itself. My concern was the initial load. That concern turned out to be justified. Using the FTP mirror script pointed to from the CPAN FAQ, I was able to start pulling the archive. However, my FTP sessions would invariably die before a very significant portion was transferred.

Later I tried rsync (which has gotten me into hot water with our security group, who didn't want to open the firewall to a protocol they weren't familiar with). This clearly was a better, more efficient approach. But the initial load is huge. The MOTD displayed when connecting by rsync to the main CPAN archive even warns against trying to pull the entire archive.

My question is: have any brother monks set up a CPAN mirror, and if so, what advice could they offer on the best approach for the initial load? The archive seems to be organized primarily by author. Thus far, I have been stepping through subsets of the alphabet to transfer more manageably sized pieces. Is this the best approach? Suggestions?

TIA,
-rah

Replies are listed 'Best First'.
•Re: Advice on building a CPAN mirror
by merlyn (Sage) on Jun 28, 2002 at 03:01 UTC
    I'd recommend that wherever you suck it from, you get permission. I got permission to poll daily using rsync from the master site for a private archive on stonehenge.com, so that I can mirror it down to my laptop on a regular basis. We coordinated it so that it'd be during a traditionally low-usage time period.

    And I'd definitely recommend rsync over FTP. Rsync just plain rocks. I'd be happy to talk to your admins to let them know that rsync servers are no less secure than FTP servers.

    -- Randal L. Schwartz, Perl hacker

Re: Advice on building a CPAN mirror
by crazyinsomniac (Prior) on Jun 28, 2002 at 10:17 UTC
Re: Advice on building a CPAN mirror
by ehdonhon (Curate) on Jun 28, 2002 at 13:39 UTC

    I actually set up a CPAN mirror earlier this year.

    Unless you find that you can get significantly better network performance on a closer mirror, I recommend just going with funet.fi. From what I've gathered in my communications with them, they don't seem to mind, and it really is the most reliable. (You should still ask first.) I started with a local mirror, but then switched over to funet when that other mirror dropped off the face of the planet without a trace.

    As for the MOTD, you need to understand that the ftp.funet.fi rsync server is hosting more than just CPAN. That MOTD is saying don't try to download their entire site. If all you are doing is rsync'ing the CPAN repository, that is OK.

    I definitely suggest rsync over FTP. If your security people don't know what rsync is, work hard to extend them a clue. As of this morning the repository size is 1.2 gigs. My last rsync (running twice a day) used less than 9 megs of traffic and took about 4 minutes for all of CPAN (your speed results will vary depending on your bandwidth).

    Other advice:

  • If you are setting up a cron job to update your mirror, make sure you mail yourself the results so that you can monitor everything. I'm glad I did this or I would have never known that the mirror I was maintaining went stale when the other mirror disappeared.
  • Make sure when you publish the mirror, you make it available in /pub/CPAN and not in your document root. This will make things easier on you later if you ever decide you want to participate in load balancing.
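
    The first tip above (mail yourself the results of each update) could be sketched roughly as follows. This is only an illustration, not the setup described in this reply: the function names, the addresses, and the use of a local SMTP server are all assumptions. Note that cron can also mail a job's output by itself when MAILTO is set in the crontab.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_and_report(cmd, sender, recipient):
    """Run cmd, capture its output, and wrap the result in an email message."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "OK" if result.returncode == 0 else f"FAILED ({result.returncode})"
    msg = EmailMessage()
    msg["Subject"] = f"CPAN mirror update: {status}"
    msg["From"] = sender
    msg["To"] = recipient
    # Include both streams so rsync errors show up in the mail body.
    msg.set_content(result.stdout + result.stderr)
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (assumed to listen on smtp_host)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


# Hypothetical use from a cron-driven script (module name and paths assumed):
#   report = run_and_report(
#       ["rsync", "-av", "--delete", "ftp.funet.fi::CPAN/", "/pub/CPAN/"],
#       "mirror@example.com", "admin@example.com")
#   send_report(report)
```

    Putting the exit status in the subject line makes a failed run stand out without opening the mail.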

    If you have any other questions, feel free to msg me, or email me: cpan@pair.com. :)

Re: Advice on building a CPAN mirror
by rah (Monk) on Jun 29, 2002 at 00:28 UTC
    Thanks all for the tips.

    I combined the best of most of them and got the job done. I contacted funet.fi (although I only got the auto-reply before I started pulling data). I used rsync and got around my firewall issues by getting around my firewall, plugging my laptop into a switch between our router and our ISP. Sshhhh! Don't tell the security guys.

    After a somewhat painful lesson on how rsync works, I got all the data downloaded fairly quickly. (The other lesson I learned is to check out your mirror sources closely. The FTP site I had pulled my preliminary load from was stale by 3 weeks!) I did pull the data down gradually, walking through the alphabetical authors/id dirs, then pulled the whole archive to make sure I had everything and all the links were intact. They were, for the most part. Twenty minutes and that step was done!
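
    For anyone repeating the staged load described above, here is a minimal sketch that only builds the rsync command lines: one per author letter, then a full pass. The module name ftp.funet.fi::CPAN and the local path /pub/CPAN are assumptions, not details confirmed in this thread, so check them against the mirror you have permission to use.

```python
import string

# Assumptions (not from the thread): the rsync module name and the local path.
REMOTE = "ftp.funet.fi::CPAN"
LOCAL = "/pub/CPAN"


def initial_load_commands(remote=REMOTE, local=LOCAL):
    """Return rsync argument lists: one per author letter, then a full pass."""
    commands = []
    for letter in string.ascii_uppercase:
        # Trailing slashes: copy the contents of each letter directory.
        src = f"{remote}/authors/id/{letter}/"
        dst = f"{local}/authors/id/{letter}/"
        commands.append(["rsync", "-av", src, dst])
    # The final pass covers everything outside authors/id and deletes
    # local files that no longer exist upstream.
    commands.append(["rsync", "-av", "--delete", f"{remote}/", f"{local}/"])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in initial_load_commands():
        print(" ".join(cmd))
```

    The local authors/id directory has to exist before the per-letter commands run, since rsync creates only the last component of a destination path by default.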

    Once I had the archive complete on the laptop, I rsync-ed my Linux box to the laptop. Then I set about the task of making it available to my other hosts. That's where I learned what I did wrong with rsync, and had I RTFM more carefully, I should have caught that mistake. Don't forget the trailing slash on the destination directory! I set up anonymous FTP on the Linux box and changed the default_site in CPAN.pm on my test box. By end of day I was able to use my mirror to upgrade CPAN on my test host. Cool.
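
    On the trailing-slash lesson: the post blames the destination path, but the rule spelled out in the rsync manual concerns the source. A trailing slash on the source copies the directory's contents, and no trailing slash copies the directory itself, adding one level at the destination. The toy function below only models that documented rule for a recursive copy. It does not call rsync, and its name and the example paths are made up for illustration.

```python
import posixpath


def landing_path(source, dest, relative_file):
    """Where a recursive rsync would place relative_file found under `source`.

    Models the manual's rule: a trailing slash on the source means "copy the
    contents"; no trailing slash means "copy the directory itself".
    """
    dest = dest.rstrip("/") or "/"
    if source.endswith("/"):
        return posixpath.join(dest, relative_file)
    # Without the slash, the source directory's own name is recreated.
    top = posixpath.basename(source)
    return posixpath.join(dest, top, relative_file)


if __name__ == "__main__":
    print(landing_path("/data/CPAN/", "/pub/CPAN", "README"))
    print(landing_path("/data/CPAN", "/pub/CPAN", "README"))
```

    The second call shows the usual surprise: the mirror ends up one directory deeper than intended.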
