Benchmark: timing 30 iterations of REGEXP, SPLIT...
    REGEXP: 10 wallclock secs ( 9.73 usr +  0.25 sys =  9.98 CPU)
     SPLIT: 47 wallclock secs (47.20 usr +  0.28 sys = 47.48 CPU)
The script didn't pass muster with "-w" while I was trying to print matching lines to "NUL", because the file handle had never been opened. I changed it to simply count matching lines, and "-w" is now happy. (Note to self, something else for later study: how to print only to "NUL" without complaint from "-w".)
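On that note to self, one way it might be done is to open a real file handle on the null device before printing to it. This is only a minimal sketch, assuming the core File::Spec module is available (its devnull method returns the null device name for the current platform, such as "nul" on Windows). The handle name $null_fh and the sample line are illustrative, not part of the benchmark script.

```perl
use strict;
use warnings;
use File::Spec;

# File::Spec->devnull returns the null device name for this platform
my $null = File::Spec->devnull;

# Open the handle explicitly; printing to an unopened handle is what "-w" complains about
open my $null_fh, '>', $null or die "\n $null: $!\n";

# The output is discarded; print still returns true on success
my $printed = print {$null_fh} "a,b,c,MAPI,e\n";

close $null_fh or die "\n $null: $!\n";
```

With the handle opened this way, the matching lines could be printed inside the loop instead of counted, which would keep the print cost in the timing.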
The source (CSV) file is 13,576 lines long (1,703,397 bytes). Each record has 12 fields, and the average record length is 124 characters. The task is to print only the lines whose fourth field is "MAPI" (ignoring case).
use strict;
use Benchmark;

timethese( 30, {
    REGEXP => 'UsingRegExp',
    SPLIT  => 'UsingSplit'
} );

sub UsingRegExp {
    my $file = 'r:\csv\test.csv';
    my $field;
    my $count = 0;

    open FH, $file or die "\n $file: $!\n";
    while ( <FH> ) {
        # WANT 4TH FIELD. (NOTE: SOME FIELDS _MIGHT_ BE EMPTY.)
        ($field) = /^[^,]+,[^,]*,[^,]*,\s*([^,]+)\s*,/;
        $count++ if lc($field) eq "mapi";    # IGNORE CASE
    }
    close FH or die "\n $file: $!\n";
}

sub UsingSplit {
    my $file = 'r:\csv\test.csv';
    my @record;
    my $count = 0;

    open FH, $file or die "\n $file: $!\n";
    while ( <FH> ) {
        @record = split /\s*,\s*/;
        $count++ if lc($record[3]) eq "mapi";    # IGNORE CASE
    }
    close FH or die "\n $file: $!\n";
}
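To see what the two approaches actually pull out of a record, here is a small sketch that runs the same two patterns over a few in-memory lines. The sample records are made up for illustration (they are not from test.csv), and the helper names are hypothetical. The third helper is an assumed variation, not something the benchmark above measured: it passes a LIMIT of 5 to split so that splitting stops once the fourth field has been separated.

```perl
use strict;
use warnings;

# Hypothetical sample records (not from the real test.csv), 12 fields each
my @lines = (
    "a,b,c,MAPI,e,f,g,h,i,j,k,l\n",
    "a,,c, mapi,e,f,g,h,i,j,k,l\n",    # empty 2nd field, leading space on 4th
    "a,b,c,SMTP,e,f,g,h,i,j,k,l\n",
);

# Same pattern as UsingRegExp: skip three fields, capture the fourth
sub fourth_by_regexp {
    my ($field) = $_[0] =~ /^[^,]+,[^,]*,[^,]*,\s*([^,]+)\s*,/;
    return $field;
}

# Same pattern as UsingSplit: every field in the record is split out
sub fourth_by_split {
    my @record = split /\s*,\s*/, $_[0];
    return $record[3];
}

# Assumed variation: LIMIT of 5 leaves fields 5..12 unsplit in the last element
sub fourth_by_limited_split {
    my @record = split /\s*,\s*/, $_[0], 5;
    return $record[3];
}

my ( $n_regexp, $n_split, $n_limited ) = ( 0, 0, 0 );
for my $line (@lines) {
    $n_regexp++  if lc( fourth_by_regexp($line) )        eq "mapi";
    $n_split++   if lc( fourth_by_split($line) )         eq "mapi";
    $n_limited++ if lc( fourth_by_limited_split($line) ) eq "mapi";
}

print "regexp: $n_regexp, split: $n_split, limited: $n_limited\n";
```

On these three lines all three counts come out as 2. One difference worth knowing about: a space before the comma (for example "mapi ,") stays inside the regexp's capture, because [^,]+ also matches spaces, while split trims it. On input like that the regexp count and the split count can disagree.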
In reply to RE: RE: From one beginner to others . . . by greenhorn
in thread From one beginner to others . . . by greenhorn