in reply to Performance Question

That's a tough question to answer without knowing a few more variables.

Jumping ahead to a solution, I would probably slice the monster file into pieces (there are lots of ways to do that), then process a couple of pieces in parallel. To test that, I would take a 1 GB slice of the file, pretend that's the big file, and try various piece counts.
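One way to do the slicing without cutting a line in half is to pick n roughly equal byte offsets and nudge each one forward to just past the next newline. Here's a minimal sketch in C, assuming a seekable file opened in binary mode whose size fits in a `long`. `piece_bounds` is a hypothetical helper name, not something from an existing library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: choose n + 1 byte offsets that split the file into
 * n pieces of roughly equal size, each boundary falling just after a '\n'
 * so no line is cut in two. Piece i is the range [bounds[i], bounds[i+1]).
 * Returns 0 on success, -1 on an I/O error. */
int piece_bounds(FILE *fp, int n, long *bounds)
{
    assert(n > 0);
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0)
        return -1;

    bounds[0] = 0;
    for (int i = 1; i < n; i++) {
        long target = (size / n) * i;        /* nominal, unaligned split point */
        if (target < bounds[i - 1])          /* a long line pushed us past it */
            target = bounds[i - 1];
        if (target == 0) {
            bounds[i] = 0;
            continue;
        }
        /* Start at the byte before target: if it is already a '\n',
         * target itself is a line start and we stop right there. */
        if (fseek(fp, target - 1, SEEK_SET) != 0)
            return -1;
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;                                /* skip to end of current line */
        if (ferror(fp))
            return -1;
        long pos = ftell(fp);                /* just past the '\n', or EOF */
        if (pos < 0)
            return -1;
        bounds[i] = pos;
    }
    bounds[n] = size;
    return 0;
}
```

Each worker (for example a child created with `fork()`) can then `fseek()` to its start offset and stop at its end offset. How many workers to run at once is exactly what the 1 GB test slice is for.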

Failing that, write a program in C (something I've done many times) that reads the file 64K at a time (or whatever chunk size your system handles best), then processes the lines individually. The processed lines go into a 64K output buffer, and when it fills up, you write it to the output file. Piece of cake. :) And you should get better performance doing it in C than in Perl.
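Here's a rough sketch of that chunked approach, assuming every line fits in a fixed-size carry buffer. `filter_lines`, `transform_line`, `MAX_LINE`, and the upper-casing transform are all hypothetical placeholders; the real per-line work would go in `transform_line`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CHUNK    (64 * 1024)    /* 64K reads and writes; tune for your system */
#define MAX_LINE (1024 * 1024)  /* hypothetical cap on a single line's length */

/* Processed lines accumulate here; one fwrite() per full buffer. */
struct outbuf {
    FILE  *fp;
    size_t len;
    char   data[CHUNK];
};

static int out_flush(struct outbuf *ob)
{
    if (ob->len > 0 && fwrite(ob->data, 1, ob->len, ob->fp) != ob->len)
        return -1;
    ob->len = 0;
    return 0;
}

/* Copy n bytes into the output buffer, writing it out each time it fills. */
static int out_write(struct outbuf *ob, const char *p, size_t n)
{
    while (n > 0) {
        if (ob->len == CHUNK && out_flush(ob) != 0)
            return -1;
        size_t k = CHUNK - ob->len;
        if (k > n)
            k = n;
        memcpy(ob->data + ob->len, p, k);
        ob->len += k;
        p += k;
        n -= k;
    }
    return 0;
}

/* Placeholder for the real per-line work: upper-cases ASCII letters. */
static void transform_line(char *line, size_t len)
{
    for (size_t i = 0; i < len; i++)
        line[i] = (char)toupper((unsigned char)line[i]);
}

/* Read `in` CHUNK bytes at a time, transform each complete line
 * (including its '\n'), and write the results to `out` in CHUNK-sized
 * blocks. Returns 0 on success, -1 on I/O error or an over-long line. */
int filter_lines(FILE *in, FILE *out)
{
    static char inbuf[CHUNK];
    static char carry[MAX_LINE];   /* partial line left from the previous chunk */
    static struct outbuf ob;
    size_t carrylen = 0, got;

    assert(in != NULL && out != NULL);
    ob.fp = out;
    ob.len = 0;
    while ((got = fread(inbuf, 1, CHUNK, in)) > 0) {
        char *p = inbuf, *end = inbuf + got, *nl;
        while ((nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            size_t n = (size_t)(nl - p) + 1;
            char *line = p;                     /* usually processed in place */
            if (carrylen > 0) {                 /* line started in an earlier chunk */
                if (carrylen + n > MAX_LINE)
                    return -1;
                memcpy(carry + carrylen, p, n);
                line = carry;
                n += carrylen;
                carrylen = 0;
            }
            transform_line(line, n);
            if (out_write(&ob, line, n) != 0)
                return -1;
            p = nl + 1;
        }
        size_t tail = (size_t)(end - p);        /* incomplete last line */
        if (carrylen + tail > MAX_LINE)
            return -1;
        memcpy(carry + carrylen, p, tail);
        carrylen += tail;
    }
    if (ferror(in))
        return -1;
    if (carrylen > 0) {                         /* final line had no '\n' */
        transform_line(carry, carrylen);
        if (out_write(&ob, carry, carrylen) != 0)
            return -1;
    }
    return out_flush(&ob);
}
```

Most lines are transformed in place inside the input buffer; only a line that straddles two reads gets copied into `carry`. The static buffers keep the big arrays off the stack but make `filter_lines` non-reentrant, so call it from one thread at a time.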

--t. alex

"Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny

Re: Re: Performance Question
by BUU (Prior) on May 08, 2002 at 13:57 UTC
    Would you really get a sizable performance increase by using C instead of Perl to manipulate and print text? (Honestly wondering.)
      Depends on how good a C programmer you are. If you're reasonably good, yes: probably a factor of two to four if the transforms are simple, and possibly more, depending on the I/O subsystem. (It doesn't matter that your C program could run 50 times faster than the Perl one if you've already maxed out your I/O channel going twice as fast. You'll just twiddle your thumbs more.)
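      To put rough numbers on that parenthetical: assume reading and writing overlap with the computation, so whichever is slower sets the pace and the run time is about the larger of the two. Call the Perl version's processing time t_cpu and the time the I/O channel needs t_io (my labels, not measured figures). Making the processing k times faster then gives a speedup of

```latex
S(k) = \frac{\max(t_{\text{cpu}},\, t_{\text{io}})}{\max(t_{\text{cpu}}/k,\, t_{\text{io}})},
\qquad
t_{\text{io}} = \tfrac{1}{2}\,t_{\text{cpu}},\; k = 50
\;\Rightarrow\;
S = \frac{t_{\text{cpu}}}{\max(t_{\text{cpu}}/50,\; t_{\text{cpu}}/2)} = 2.
```

      Once the channel is saturated, the speedup is capped at t_cpu / t_io no matter how large k gets.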

      On the other hand, it may take 5 to 10 times as long to write and debug the program, and maintaining and debugging it will be a major pain relative to Perl.

      A valid question. My guess is yes, but that assumes the custom C program is tuned for the system it runs on. It also depends on whether this is a one-time job or a weekly/monthly thing, as my initial post said. For a one-time job, definitely go with Perl. For a weekly job, it's worth the investment to write a well-tuned, optimized C program.

      --t. alex