Again, many thanks for the assistance... and methinks I need to knuckle down, get organized and study some regex *properly*!
Anyway, here are the two functions I'm comparing (at the bottom of this posting): my original 'TestMCa' and a new sub based on kcott's suggestion, 'TestMCb'.
I know there's a lot of repetition in these sorts of subs. I work out the '1st level' of the old $master string every time I call the sub, which could be done differently if I were really worried about it, and I'll probably rework that at some point. For the moment, though, both of these subs work OK.
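One way to avoid re-deriving the '1st level' on every call is to build the pattern once and reuse it through a closure. This is only a sketch, assuming case-insensitive matching on dot-separated components and a $level of 1 or more; the factory name `make_cat_matcher` is hypothetical, not something from either sub.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pattern for the first $level components
# of $test_cat once, and return a sub that tests a record against it.
sub make_cat_matcher {
    my ($test_cat, $level) = @_;
    my @parts = grep { length($_) } split /\./, $test_cat;
    $level = @parts if $level > @parts;    # can't match more levels than exist
    my $prefix = join '.', @parts[0 .. $level - 1];
    # \Q...\E quotes the dots; (?:\.|\z) stops 'comp' matching 'computer'
    my $re = qr/^\Q$prefix\E(?:\.|\z)/i;
    return sub { return $_[0] =~ $re ? 1 : 0 };
}

my $is_comp = make_cat_matcher('comp.lang.perl', 1);
print $is_comp->('comp.os.linux'), "\n";    # 1
print $is_comp->('computer.misc'), "\n";    # 0
```

The pattern is compiled once per category/level pair, so a loop over 700 records only pays for the match itself.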
So I've been trying them both in my 'real' code, but you could just as easily check them with some code that reads a file of Usenet groups, as I mentioned before, where the whole record is the 'category'.
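For a quick stand-alone check along those lines, something like the following reads one group name per record and counts the matches. It's a sketch only: an in-memory filehandle stands in for the real file, and the group names and the `count_matches` helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count records whose first $level components match
# those of $test_cat (case-insensitive). Assumes $level <= number of components.
sub count_matches {
    my ($fh, $test_cat, $level) = @_;
    my @parts  = split /\./, $test_cat;
    my $prefix = join '.', @parts[0 .. $level - 1];
    my $re     = qr/^\Q$prefix\E(?:\.|\z)/i;
    my $count  = 0;
    while (my $rec = <$fh>) {
        chomp $rec;                  # the whole record is the category
        $count++ if $rec =~ $re;
    }
    return $count;
}

# Stand-in for a file of Usenet group names, one per line
my $data = "comp.lang.perl.misc\ncomp.os.linux\ncomputer.misc\nrec.arts.books\n";
open my $fh, '<', \$data or die "open: $!";
print count_matches($fh, 'comp.lang.perl', 1), "\n";    # 2
close $fh;
```

Swapping a real `open my $fh, '<', $filename` in for the in-memory handle is all it takes to run it over an actual list of groups.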
I did some testing with each sub, matching categories on the '1st level' only ('one word') and processing about 700 records. I ran the program several times with each sub to even out any variation from caching, program RAM space, etc. The results follow:
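| Sub | Run 1 | Run 2 | Run 3 | Run 4 |
|---|---|---|---|---|
| TestMCa, elapsed (s) | 0.0775 | 0.0767 | 0.0751 | 0.0758 |
| TestMCb, elapsed (s) | 0.0658 | 0.0643 | 0.0641 | 0.0659 |
| Saving | 15% | 16% | 15% | 13% |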
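For anyone checking the arithmetic, the saving for each run is (TestMCa - TestMCb) / TestMCa. A short sketch that recomputes it from the timings above:

```perl
use strict;
use warnings;

# Elapsed times (seconds) from the four runs
my @mca = (0.0775, 0.0767, 0.0751, 0.0758);
my @mcb = (0.0658, 0.0643, 0.0641, 0.0659);

# Saving = (old - new) / old, rounded to a whole percentage
my @saving = map { sprintf '%.0f', 100 * ($mca[$_] - $mcb[$_]) / $mca[$_] } 0 .. $#mca;
print join(' ', map { $_ . '%' } @saving), "\n";    # 15% 16% 15% 13%
```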
I used the Time::HiRes module simply to measure the elapsed time to run each sub, so the results are in seconds.
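The timing itself needs very little code. Here's a minimal sketch with Time::HiRes's `gettimeofday` and `tv_interval`; the loop body is a placeholder standing in for whichever sub is being measured, and `@records` is made-up data.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my @records = map { 'comp.group' . $_ } 1 .. 700;    # made-up stand-in data

my $t0   = [gettimeofday];               # start time as [seconds, microseconds]
my $hits = 0;
for my $rec (@records) {
    $hits++ if $rec =~ /^comp(?:\.|\z)/i;   # placeholder for TestMCa/TestMCb
}
my $elapsed = tv_interval($t0);          # floating-point seconds since $t0
printf "ELAPSED: %.4f (%d hits)\n", $elapsed, $hits;
```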
As I so often see in Perl, you can write something pretty awful or something pretty clever and the performance won't be very different, though I guess the difference can become significant if you're doing an operation a few million times. In my application the difference is practically nothing, and I'll only run the program a handful of times for some analysis, so it's really a moot point, I guess, that I even worried about the rubbishy sub... *hmmm*
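When a per-call difference only shows up over millions of calls, the core Benchmark module is handier than hand-rolled timing, because it runs each candidate repeatedly and reports relative rates. A sketch with two stand-in snippets (not the real subs): one rebuilds a level-1 pattern on every call, the other reuses a precompiled one. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $test_cat = 'comp.lang.perl';
my $rec      = 'comp.os.linux';
my $pre      = qr/^comp(?:\.|\z)/i;    # built once, outside the timed code

# -1: run each snippet for at least 1 CPU second; 'none' suppresses the report
my $results = timethese(-1, {
    # rebuild the level-1 pattern from $test_cat on every call
    rebuild => sub {
        my ($first) = split /\./, $test_cat, 2;
        my $re = '^' . $first . '(?:[.]|$)';
        return $rec =~ /$re/i;
    },
    # reuse the precompiled pattern
    reuse => sub { return $rec =~ $pre },
}, 'none');

cmpthese($results);    # prints a rate table comparing the two
```

The absolute numbers will vary by machine; the useful part is the percentage column `cmpthese` prints.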
Still, it's all good to learn... and it gives me another boot to get onto working with regex 'seriously' :)
Oh... kcott, a couple of notes for you. The whole point of the exercise was to get the code to determine the strings to match; hence the 'level' argument to the subs. I take your point about 'use strict' and 'use warnings', and I admit I'm slack about that. And true enough about the "C-like feel" of my coding, but my home node might provide some insight into where that comes from :)
Thanks again, everyone, for all your help. I appreciate it a lot.
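```perl
use strict;
use warnings;

# Returns 1 if $rec_cat matches $test_cat over the first $level
# dot-separated components (case-insensitive), else 0.
sub TestMCa {
    my ($test_cat, $rec_cat, $level) = @_;
    my ($num_m, $buf, $i, $retstat, $master);
    my (@m);

    $test_cat .= ".";
    $rec_cat  .= ".";
    $master = $test_cat;
    @m = split(/\./, $master);
    $num_m = @m;

    # Rebuild the first $level components, each followed by a dot
    $buf = "";
    for ($i = 0; $i < $level; $i++) {
        if ($i >= $num_m) { last; }
        $buf .= $m[$i] . ".";
    }

    # \Q...\E makes the dots in $buf match literally
    # (an unescaped '.' matches any character, so 'comp.' would match 'compu')
    if ($rec_cat =~ /^\Q$buf\E/i) { $retstat = 1; } else { $retstat = 0; }
    return($retstat);
} # end TestMCa

sub TestMCb {
    my ($test_cat, $rec_cat, $level) = @_;
    my $retstat;    # must be declared under 'use strict'

    # Join the first $level components with literal dots
    my $re = '^' . join('\.' => (split /\./, $test_cat, $level + 1)[0 .. $level - 1]);
    # Pattern ending in '\.' (fewer components than $level): require a non-dot next;
    # otherwise require a dot or end of string, so 'comp' doesn't match 'computer'
    $re .= $re =~ /\.$/ ? '[^.]' : '(?:[.]|$)';

    if ($rec_cat =~ /$re/i) { $retstat = 1; } else { $retstat = 0; }
    return($retstat);
} # end TestMCb
```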
In reply to "Re: I think regex Should Help Here... but How!?" by ozboomer, in thread "I think regex Should Help Here... but How!?" by ozboomer