in reply to Reg: Performance

Sounds like an ideal job for Tie::File::AsHash?

Anyway, you really want to avoid reading DUMP_B millions of times, once for every line in DUMP_A. It should be possible to read both files just once, build a hash for each keyed on the ID field, then iterate over the hash for DUMP_A and look up the corresponding entry in the hash for DUMP_B. That's why I suggested the module above (I haven't actually tried it, shame on me): it should let you treat each dump file like a hash and split on whatever character you want.
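Untested, but here is a minimal sketch of that read-once approach with plain hashes (assuming colon-separated records with the ID in the first field; adjust the split pattern and filenames to the real dump layout):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Slurp a dump file into a hash keyed on its first field.
    sub load_dump {
        my ($file) = @_;
        open my $fh, '<', $file or die "Can't open $file: $!";
        my %by_id;
        while ( my $line = <$fh> ) {
            chomp $line;
            my ( $id, @rest ) = split /:/, $line;
            $by_id{$id} = \@rest;
        }
        close $fh;
        return \%by_id;
    }

    my $dump_a = load_dump('DUMP_A');
    my $dump_b = load_dump('DUMP_B');

    # Single pass over DUMP_A's keys, constant-time lookups into DUMP_B.
    for my $id ( keys %$dump_a ) {
        if ( exists $dump_b->{$id} ) {
            # compare @{ $dump_a->{$id} } with @{ $dump_b->{$id} } here
        }
        else {
            print "$id only in DUMP_A\n";
        }
    }

That turns the repeated rescanning into two linear reads plus hash lookups.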

Re^2: Reg: Performance
by use perl::always (Initiate) on Oct 28, 2010 at 09:02 UTC

    Greetings,

    I realize that it is your intention to use perl for this task, and while I haven't seen either dump_a || dump_b

    I can't help but wonder if

    cat | sed | sort | uniq

    might not be of great help here.

    I run a _huge_ RBL with _millions_ of IP addresses.

    I constantly need to parse logs, and add/remove results from the block lists.

    While I began my strategy using perl scripts, I ultimately found that

    cat | sed | sort | uniq

    would accomplish the task in seconds as opposed to minutes/hours.
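
    For example, something along these lines (the log path and sed pattern here are made up, just to show the shape of the pipeline):

        cat /var/log/maillog | sed -n 's/.*\[\([0-9.]*\)\].*/\1/p' | sort | uniq -c | sort -rn

    That pulls the bracketed IPs out of each line, then sort | uniq -c | sort -rn counts the duplicates and ranks them by frequency.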

    Perhaps it's my perl skills. But I just thought it was worth mentioning.

    HTH

    --Chris

    Shameless self promotion follows
    PerlWatch