All your observations are totally correct, and I would suggest that everyone re-read your reply before deciding on an architecture for a system like this one.

I had a different end-goal: make a map over a Perl structure scale across commodity low-end desktop machines. To be honest, I was hoping to be able to do it on a single machine, in-memory, but it didn't work out for me.

The dataset isn't huge: 540 MB, with 121,887 records at about 4.5 KB per record.

<code>
dpavlin@klin:~$ ls -alh /data/isi/full.txt
-rw-r--r-- 1 dpavlin dpavlin 546M 2009-10-05 14:15 /data/isi/full.txt
</code>
The structure is quite simple, with repeatable field names:
<code>
PT J
AU RABIN, DL
   KALIMO, E
   MABRY, JH
TI WORLD-HEALTH-ORGANIZATION INTERNATIONAL COLLABORATIVE STUDY OF
   MEDICAL-CARE UTILIZATION - SUMMARY OF METHODOLOGICAL STUDIES AND
   PRELIMINARY FINDINGS
SO SOCIAL SCIENCE & MEDICINE
LA English
DT Article
C1 GEORGETOWN UNIV,SCH MED,DEPT COMMUNITY MED & INT HLTH,WASHINGTON,DC 20007.
   SOCIAL INSUR INST,RES INST SOCIAL SECUR,HELSINKI,FINLAND.
   UNIV VERMONT,COLL MED,DEPT COMMUNITY MED,BURLINGTON,VT.
CR 1970, WHO INT COLLABORATIV
   1972, MILBANK MEMORIAL F 2, V50
   BICE TW, 1969, CROSS NATIONAL MEASU
   BICE TW, 1971, SOC SCI MED, V5, P283
   BICE TW, 1972, MILBANK MEM FUND Q, V50, P57
   KALIMO E, 1970, BRIT J PREVENTIVE SO, V24, P229
   KALIMO E, 1972, 1 WHO ICS MCU OCC RE
   KALIMO E, 1972, MED CARE, V10, P95
   KOHN R, IN PRESS
   LOGAN RFL, 1972, MILBANK MEMORIAL F 2, V50, P45
   RABIN DL, 1972, MILBANK MEMORIAL F 2, V50, P19
   SCHACH E, 1972, MILBANK MEM FD Q, V50, P65
   VUKMANOVIC C, 1972, MILBANK MEMORIAL F 2, V50, P5
   WHITE KL, 1967, NEW ENGL J MED, V277, P516
   WHITE KL, 1969, INT COMPARISONS MEDI
   WHITE KL, 1972, MILBANK MEMORIAL F 2, V50, P31
NR 16
TC 2
PU PERGAMON-ELSEVIER SCIENCE LTD
PI OXFORD
PA THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD, ENGLAND OX5 1GB
SN 0277-9536
J9 SOC SCI MED
JI Soc. Sci. Med.
PY 1974
VL 8
IS 5
BP 255
EP 262
PG 8
SC Public, Environmental & Occupational Health; Social Sciences, Biomedical
GA T5180
UT ISI:A1974T518000003
ER
</code>
My basic goal was to write something like this for a query:
<code>
# $rec - input record
# $out - generated data
foreach ( @{ $rec->{C1} } ) {
    next unless m{,\s?([^,]+)\.$};    # grab last comma-separated part before the trailing dot
    my $country = $1;
    $country =~ s{^.+USA$}{USA};
    $country =~ s{^\w\w\s\d{5}$}{USA};    # US state + ZIP
    $country =~ s{^\w\w$}{USA};           # bare US state
    $out->{'C1_country+'}->{ uc $country }++;
}
</code>
I wanted to run this query as fast as possible on all of my data (which is what pushed me away from using disk for processing), so the simplest solution I could come up with was to load it all into a Perl hash.
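A minimal sketch of the kind of loader I mean, assuming the tag-per-line format shown above (the continuation-line handling here is a guess, not the original parser):
<code>
use strict;
use warnings;

my @records;
my ( $rec, $tag );

open my $fh, '<', '/data/isi/full.txt' or die $!;
while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^PT / ) {                      # "PT" starts a new record
        $rec = {};
    }
    if ( $line =~ /^([A-Z0-9]{2}) (.+)/ ) {       # "TAG value" line
        $tag = $1;
        push @{ $rec->{$tag} }, $2;
    }
    elsif ( $line =~ /^\s+(\S.*)/ and $tag ) {    # continuation of the previous tag
        push @{ $rec->{$tag} }, $1;
    }
    elsif ( $line =~ /^ER/ ) {                    # "ER" ends the record
        push @records, $rec if $rec;
        ( $rec, $tag ) = ( undef, undef );
    }
}
close $fh;
</code>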

This worked well for most fields, until I discovered that I had 4.5 million entries for CR when running the following code:

<code>
$out->{'CR+'}->{ $_ }++ foreach @{ $rec->{CR} };
</code>
Perl hash structures are nice, and the code is clean and concise, but the memory usage for the results pushed me into swap (confirming my anti-disk bias). Having said that, I have used swap very effectively as a first line of on-disk offload before, but it doesn't mix well with a result set that has a random access pattern (disks are typically SATA drives: fast for linear reads, but slow for almost anything else).

To work around this problem, I implemented a conversion of the long field names in CR (err... longer than the 4 bytes of an int, ignoring Perl's decorations) into simple integers, which enabled me to fit 4.5 million CR records onto a single machine.

I decided to take an MD5 hash of each key, keep the md5-to-integer mapping in memory, and replace the (memory-hungry) key with the integer while preserving the value. This way, I pushed the full key names to disk in the form of an int -> full_name mapping.
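Roughly, the interning looks something like this (a simplified sketch using Digest::MD5; the %int_to_name hash here is just an in-memory stand-in for the on-disk mapping):
<code>
use Digest::MD5 qw(md5);

my %md5_to_int;      # in memory: 16-byte binary digest -> small integer
my %int_to_name;     # stand-in for the on-disk int -> full_name mapping
my $next_id = 0;

sub intern_key {
    my ($name) = @_;
    my $digest = md5($name);              # 16 bytes instead of the long CR string
    unless ( exists $md5_to_int{$digest} ) {
        $md5_to_int{$digest}   = $next_id;
        $int_to_name{$next_id} = $name;    # full name only needed for reverse lookup
        $next_id++;
    }
    return $md5_to_int{$digest};
}

# counting then becomes:
# $out->{'CR+'}->{ intern_key($_) }++ foreach @{ $rec->{CR} };
</code>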

The on-disk storage went through two iterations; in the first one I used BerkeleyDB. It saved so much memory that the whole database file could fit in /dev/shm :-)
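Something along these lines, with the int -> full_name mapping tied to a BerkeleyDB file (a sketch; the file name and flags are illustrative, not the original setup):
<code>
use BerkeleyDB;

my %int_to_name;
tie %int_to_name, 'BerkeleyDB::Hash',
    -Filename => '/dev/shm/cr_names.db',
    -Flags    => DB_CREATE
    or die "cannot tie: $BerkeleyDB::Error";

# writes go straight to the database file, reads come back from it
$int_to_name{42} = 'WHITE KL, 1967, NEW ENGL J MED, V277, P516';
</code>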

But storing fields on disk came with a huge performance penalty for my query time. It took longer than 3 seconds to complete a query, and I wasn't prepared to admit defeat. To speed it up, the logical step was to shard the data across machines. Even better, I could control the worst possible query time by changing the shard size for each node, adjusting for different systems.
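The split itself is nothing fancy; conceptually it is just cutting the record list into per-node chunks, with the chunk size tuned per machine (a sketch with made-up node names and sizes):
<code>
# shard size per node, tuned to each machine (made-up numbers)
my %shard_size = (
    node1 => 50_000,    # faster box, bigger shard
    node2 => 40_000,
    node3 => 30_000,
);

my @queue = @records;
my %shard;
for my $node ( sort keys %shard_size ) {
    $shard{$node} = [ splice @queue, 0, $shard_size{$node} ];
}
</code>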

To implement communication between nodes, I first wrote a fancy protocol, but then decided to just ship Storable objects directly over the socket. Well, not quite directly, since I'm using ssh with compression to speed up the network transfer. Since the traffic is mostly bulk (send data to a node, or receive results), this improves performance on my 100 Mb/s network by about 30%.
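Shipping a shard out and reading the partial results back then looks roughly like this (a sketch; the host name and worker script are placeholders, not the original code):
<code>
use IPC::Open2;
use Storable qw(nstore_fd fd_retrieve);

# spawn a worker on the remote node over compressed ssh
my $pid = open2( my $from_node, my $to_node,
                 'ssh', '-C', 'node1', 'perl', 'worker.pl' );
binmode $to_node;
binmode $from_node;

nstore_fd( { records => $shard{node1} }, $to_node );    # send the shard
close $to_node;

my $result = fd_retrieve($from_node);                    # partial counts back
waitpid $pid, 0;
</code>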

The whole idea is to allow fast, throw-away calculations on data that comes from semi-formatted text files (e.g. Apache logs), so your note about limited lifespan is very much right; that was a design decision. With this model, I can start on a single machine until I fill up memory (or the query becomes too slow) and then spread the data across other machines, either until the whole dataset is available or to scale out for query speed.


2share!2flame...

In reply to Re^4: Google like scalability using perl? by dpavlin
in thread Google like scalability using perl? by dpavlin
