My dataset is around 4 MB. My laptop is old: 64 MB RAM, 200 MHz CPU, running Linux. The biggest single difference, though, is that my data set is a copy of the script catted against itself until I got up to around 4 MB. So most lines match some error.

    bash-2.01$ perl scanall.pl scantst
    Benchmark: timing 5 iterations of BIG_REGEX, CODE_REGEX, MANY_REGEXES, NO_REGEX...
    BIG_REGEX: 138 wallclock secs (133.78 usr + 0.96 sys = 134.74 CPU)
    CODE_REGEX: 532 wallclock secs (517.61 usr + 2.68 sys = 520.29 CPU)
    MANY_REGEXES: 644 wallclock secs (626.63 usr + 2.96 sys = 629.59 CPU)
    NO_REGEX: 176 wallclock secs (170.91 usr + 1.41 sys = 172.32 CPU)
    bash-2.01$ p56 scanall.pl scantst
    Benchmark: timing 5 iterations of BIG_REGEX, CODE_REGEX, MANY_REGEXES, NO_REGEX...
    BIG_REGEX: 167 wallclock secs (160.56 usr + 2.00 sys = 162.56 CPU) @ 0.03/s (n=5)
    CODE_REGEX: 171 wallclock secs (166.17 usr + 1.49 sys = 167.66 CPU) @ 0.03/s (n=5)
    MANY_REGEXES: 241 wallclock secs (232.62 usr + 2.10 sys = 234.72 CPU) @ 0.02/s (n=5)
    NO_REGEX: 175 wallclock secs (169.77 usr + 1.34 sys = 171.11 CPU) @ 0.03/s (n=5)
    bash-2.01$
Therefore your numbers mainly test how quickly each approach can discard a line, while mine test how quickly it can recognize that there is an error and locate it. My regex, being so complex, gives the optimizer virtually no chance to throw lines away early. It also hits Ilya's new tests for some very slow REs, and so it slowed down under p56 while the other tests sped up quite considerably. (Basically the big RE is being tested for signs of excessive backtracking. The RE is designed specifically to avoid backtracking, but it still loses time on the testing.)
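For anyone who wants to try the comparison on their own data, here is a minimal sketch of this kind of benchmark. It is not scanall.pl: the pattern list, the synthetic lines, and the sub names count_big and count_many are all made up for illustration, and only the BIG_REGEX and MANY_REGEXES labels are borrowed from the output above. Changing the share of matching lines in @lines should show the discard-versus-locate difference.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical error patterns; the real scanner's list is not shown here.
my @patterns = ('ERROR', 'WARNING', 'uninitialized', 'Invalid');

# One combined regex: all patterns joined with alternation.
my $big    = join '|', @patterns;
my $big_re = qr/$big/;

# One precompiled regex per pattern.
my @many_re = map { qr/$_/ } @patterns;

# Synthetic data (an assumption): even-numbered lines match, odd ones do not.
my @lines = map { $_ % 2 ? "NOTE: step $_ ok" : "ERROR: step $_ failed" } 1 .. 2000;

# Count matching lines with the single combined regex.
sub count_big {
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if $line =~ $big_re;
    }
    return $hits;
}

# Count matching lines by trying each small regex in turn.
sub count_many {
    my $hits = 0;
    LINE: for my $line (@lines) {
        for my $re (@many_re) {
            if ($line =~ $re) { $hits++; next LINE; }
        }
    }
    return $hits;
}

# Both strategies should agree before their timings are compared.
die "strategies disagree" unless count_big() == count_many();

timethese(50, {
    BIG_REGEX    => \&count_big,
    MANY_REGEXES => \&count_many,
});
```

The absolute numbers will depend heavily on the perl version and on how many lines match, which is the point of the comparison above.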
In reply to RE (tilly) 6: SAS log scanner
by tilly
in thread SAS log scanner
by nop