in reply to Re: reading compressed data
in thread reading compressed data

Also, aside from keeping everything 'inside perl', do you know whether there are likely to be any performance gains to be had from using the first approach you mention? I'm working with a rather large dataset (5.3 GB of text in gzip format), so any speed-up would be nice. I didn't even realize that I can use standard pipeline options in my open() statements! Thanks!

Replies are listed 'Best First'.
Re^3: reading compressed data
by graff (Chancellor) on Dec 13, 2006 at 03:47 UTC
    I haven't done / don't recall seeing any benchmarks comparing PerlIO::gzip against the pipeline open, and I wouldn't hazard a guess that one of them is likely to be significantly faster than the other.
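    For reference, a minimal sketch of the two approaches being compared (the filename and the per-line work are placeholders, not from the thread):

    ```perl
    use strict;
    use warnings;
    use PerlIO::gzip;   # provides the :gzip i/o layer

    my $file = 'data.txt.gz';   # hypothetical input file

    # Approach 1: pipeline open -- an external gzip process decompresses,
    # and perl reads its stdout. List form avoids shell quoting issues.
    open(my $pipe, '-|', 'gzip', '-dc', $file)
        or die "can't open gzip pipe on $file: $!";
    while (my $line = <$pipe>) {
        # process each decompressed line here
    }
    close $pipe;

    # Approach 2: PerlIO::gzip -- decompression happens inside perl,
    # as an i/o layer on the filehandle.
    open(my $fh, '<:gzip', $file)
        or die "can't open $file with :gzip layer: $!";
    while (my $line = <$fh>) {
        # same per-line processing here
    }
    close $fh;
    ```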

    If it's just a one-shot pass over 5.3 GB, don't sweat it and use whichever one strikes you as more fun. But if this will be an ongoing, oft-repeated process working on lots of data, it might be worth your while to set up a simple test to see if there might be a speed difference.

    In that case, I'd advise against test scripts that only do the i/o. Contrast two versions of the script such that both do everything that needs to be done, and they differ only in the i/o method. If one is faster than the other, you'll get a clear idea of how important the difference is in the context of everything else the script does.
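    A timing comparison along those lines might be sketched with the core Benchmark module -- here `process_line` stands in for whatever real work the script does, and the filename is hypothetical:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);
    use PerlIO::gzip;

    my $file = 'data.txt.gz';   # hypothetical gzip'd input

    sub process_line {
        # stand-in for the script's actual per-line work;
        # keep it identical in both versions
    }

    # Each sub does the full job; only the i/o method differs.
    timethese(3, {
        pipeline => sub {
            open(my $fh, '-|', 'gzip', '-dc', $file)
                or die "pipe open failed: $!";
            process_line($_) while <$fh>;
            close $fh;
        },
        perlio => sub {
            open(my $fh, '<:gzip', $file)
                or die "layer open failed: $!";
            process_line($_) while <$fh>;
            close $fh;
        },
    });
    ```

    With a 5.3 GB dataset, even one repetition per method (rather than 3) may be enough to show whether the difference matters.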

    UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip. (see PerlIO::gzip or PerlIO::via::gzip).