
Re: Google like scalability using perl?

by toma (Vicar)
on Oct 11, 2009 at 18:21 UTC ([id://800559])

in reply to Google like scalability using perl?

Another way to do this is to run Apache with mod_perl on several machines and split the large hash between them. With mod_perl, both the hash and the code stay in memory between requests.
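
One way to decide which machine holds which part of the hash is to hash each key and take the result modulo the number of machines. The following is only a minimal sketch under that assumption: the node names and the helper names `node_for_key` and `split_hash` are made up for illustration, and it uses core modules only.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical node list; replace with the real mod_perl hosts.
my @nodes = qw(node0.example node1.example node2.example);

# Map a key to one node by hashing it, so every client
# agrees on where a given key lives.
sub node_for_key {
    my ($key) = @_;
    my $n = unpack('N', md5($key));   # first 4 digest bytes as an unsigned 32-bit integer
    return $nodes[ $n % @nodes ];
}

# Split one large in-memory hash into one sub-hash per node.
sub split_hash {
    my ($big) = @_;
    my %shards = map { $_ => {} } @nodes;
    while (my ($k, $v) = each %$big) {
        $shards{ node_for_key($k) }{$k} = $v;
    }
    return \%shards;
}
```

A lookup then calls `node_for_key($key)` first and sends the request to that machine. Note that with plain modulo, changing the number of nodes moves most keys to a different node.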

You might also try memcached, as suggested above. The combination of memcached and mod_perl performs better than I expected.

For a large batch job farmed out to many nodes, the trick is to use something like Amazon's Simple Queue Service, which keeps the nodes from doing overlapping work. I don't know of a Perl module that implements this, so if your code has this feature, it would make a great CPAN module by itself.
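
To illustrate what "keeps the work from overlapping" means, here is a rough sketch that is not SQS at all: a directory of job files where a worker claims a job by renaming it. It assumes all workers see the same local filesystem, where `rename` is atomic; the function name `claim_job` and the directory layout are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Try to claim one job from $queue_dir by renaming it into $claimed_dir.
# rename() is atomic on a single local filesystem, so only one worker
# can win any given job file.
sub claim_job {
    my ($queue_dir, $claimed_dir, $worker_id) = @_;
    opendir(my $dh, $queue_dir) or die "opendir $queue_dir: $!";
    my @jobs = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    for my $job (@jobs) {
        my $from = File::Spec->catfile($queue_dir, $job);
        my $to   = File::Spec->catfile($claimed_dir, "$job.$worker_id");
        return $to if rename $from, $to;   # if this fails, another worker took the job
    }
    return;   # nothing left to claim
}
```

Each worker loops on `claim_job` until it returns nothing. A hosted queue adds what this sketch lacks, such as handing a job back out when the worker that claimed it dies.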

It should work perfectly the first time! - toma

Re^2: Google like scalability using perl?
by dpavlin (Friar) on Oct 14, 2009 at 22:26 UTC

    This is a great solution if you can afford this kind of deployment. I'm a really happy mod_perl user, but this time I didn't have any machines to deploy on at all :-)

    All I had were 12 machines dedicated to being web kiosks. This is why I'm trying to avoid disk activity. Apart from short bursts of CPU activity, which can be controlled by shard size, I won't affect normal usage of the machines, which are dual core anyway, so I can restrict myself to one core if that becomes a problem.

    In fact, I went so far as to require only core perl modules, so I depend only on perl (which is standard on Debian installs anyway, since the package manager uses it) and ssh (for which I use dropbear).

    I also noticed that deploying a new version and restarting is somewhat of a challenge if you do it by hand, so Sack can push code updates to the nodes (using cpio over ssh, since I don't really have scp or rsync) and re-exec itself.
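
    A minimal sketch of that kind of push and re-exec, which is not Sack's actual code: the helper names `push_command` and `re_exec`, the host name, and the paths are all hypothetical, and the cpio flags are just one plausible choice.

    ```perl
    use strict;
    use warnings;

    # Build a shell pipeline that copies the listed files to a node
    # using cpio over ssh. No shell quoting is done here, so this
    # assumes simple file names without spaces.
    sub push_command {
        my ($host, $remote_dir, @files) = @_;
        my $list = join ' ', @files;
        return "printf '%s\\n' $list | cpio -o -H newc"
             . " | ssh $host 'cd $remote_dir && cpio -i -d -u'";
    }

    # Replace the running process with a fresh copy of the same script,
    # using the same perl binary and the same arguments.
    sub re_exec {
        exec($^X, $0, @ARGV) or die "re-exec failed: $!";
    }
    ```

    The string from something like `push_command('kiosk1', '/tmp/sack', 'sack.pl')` would be run with `system`, and each node would call `re_exec` once the new code has arrived.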

    Which was nice, but re-exec requires me to re-feed data to each node on restart. Re-feeding would also be a nice way to get recovery for nodes, or some kind of load migration (if one set of machines becomes busy, I should be able to move shards to other machines or increase the shard size on idle machines). None of that exists yet, unfortunately.

    I have looked into messaging solutions, but my preference is to keep the queue local (the nodes are part of an intranet) and, although they do have Internet connectivity, I would love for traffic not to leave the intranet.

    I would generally prefer something like RestMS (blame CouchDB's REST influence on me) or even CouchDB with RabbitMQ before trying to develop for some external service.

