in reply to I think regex Should Help Here... but How!?

Again, many thanks for the assistance... and methinks I need to knuckle down and get organized to study some regex *properly*!

Anyway, here are the two functions I'm comparing (at the bottom of this post): my original 'TestMCa', and a new sub based on kcott's suggestion, 'TestMCb'.

I know these subs repeat a lot of work: each call re-derives the '1st level' prefix from the $master string, which could be done once up front if I were really worried about it. I'll probably rework that at some point, but for the moment both subs work OK.
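One way to avoid re-deriving the prefix on every call is to build the regex once and hand back a closure that reuses it. This is only a sketch: `make_matcher` is a hypothetical helper, not part of either posted sub, and it assumes `$level >= 1`. It also escapes the dots with `\Q...\E` and anchors the end of the prefix, so that level 1 of "comp.lang.perl.misc" matches "comp.os.linux" but not "computer.misc".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build the prefix
# regex for a test category once, then reuse it for every record.
# Assumes $level >= 1.
sub make_matcher {
    my ($test_cat, $level) = @_;
    my @parts = split /\./, $test_cat;
    $level = @parts if $level > @parts;       # cap at the available levels
    my $prefix = join '.', @parts[0 .. $level - 1];
    # \Q...\E makes the dots literal; (?:\.|$) stops "comp" matching "computer"
    my $re = qr/^\Q$prefix\E(?:\.|$)/i;
    return sub { $_[0] =~ $re ? 1 : 0 };
}

my $is_comp = make_matcher('comp.lang.perl.misc', 1);
print $is_comp->('comp.os.linux') ? "match\n" : "no match\n";    # match
print $is_comp->('computer.misc') ? "match\n" : "no match\n";    # no match
```

The regex is compiled once per test category, so the per-record cost is a single match.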

So, I've been trying them both in my 'real' code, but you could just as easily test them with a script that reads a file of Usenet group names, as I mentioned before, treating each whole record as the 'category'.
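A minimal sketch of such a test script, assuming one group name per line. To keep it self-contained, it reads from an in-memory filehandle (`open my $fh, '<', \$data`); with a real file you would open a path instead. The group names, the test category 'comp.lang.c' and the level are made-up sample values.

```perl
use strict;
use warnings;

# Sample data standing in for a file of Usenet group names, one per
# line; each whole record is treated as the category.
my $data = <<'END';
comp.lang.perl.misc
comp.os.linux.misc
computer.hardware
rec.arts.books

sci.math
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my ($want, $level) = ('comp.lang.c', 1);   # hypothetical test category
my $prefix = join '.', (split /\./, $want)[0 .. $level - 1];
my $re = qr/^\Q$prefix\E(?:\.|$)/i;

my $matches = 0;
while (my $rec_cat = <$fh>) {
    chomp $rec_cat;
    next unless length $rec_cat;           # skip blank lines
    $matches++ if $rec_cat =~ $re;
}
close $fh;

print "$matches records match '$prefix' at level $level\n";
```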

I did some testing with each sub, matching categories on the '1st level' only ('one word') and processing about 700 records. I ran the program a few times with each option to smooth out variation from caching, program memory, and so on. The results follow:

                       Run 1   Run 2   Run 3   Run 4
    TestMCa (s):  0.0775  0.0767  0.0751  0.0758
    TestMCb (s):  0.0658  0.0643  0.0641  0.0659
    Saving:          15%     16%     15%     13%

I used the Time::HiRes module simply to check the elapsed time for each sub, so the results are in seconds; 'Saving' is TestMCb's reduction in elapsed time relative to TestMCa.
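A sketch of that kind of measurement using Time::HiRes's `gettimeofday` and `tv_interval`. The sub being timed (`level1_match`) and the 700 synthetic records are stand-ins, not the original code or data. The `saving_pct` helper shows how the 'Saving' row works out: (a − b) / a × 100, for example (0.0775 − 0.0658) / 0.0775 ≈ 15%.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for one of the subs being timed.
sub level1_match {
    my ($test_cat, $rec_cat) = @_;
    my ($first) = split /\./, $test_cat;
    return $rec_cat =~ /^\Q$first\E(?:\.|$)/i ? 1 : 0;
}

my @records = map { "comp.group$_.misc" } 1 .. 700;   # synthetic test data

my $t0 = [gettimeofday];
my $hits = 0;
$hits += level1_match('comp.lang.perl', $_) for @records;
my $elapsed = tv_interval($t0);                        # seconds, as a float

printf "%d hits in %.4f s\n", $hits, $elapsed;

# Percentage saving of the second time over the first ("Saving" row).
sub saving_pct {
    my ($a_time, $b_time) = @_;
    return 100 * ($a_time - $b_time) / $a_time;
}
printf "%.0f%%\n", saving_pct(0.0775, 0.0658);        # 15%
```

A single run like this is sensitive to noise, which is why repeating it a few times (as in the table) is worthwhile.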

As I so often see in Perl, you can write something pretty awful and something pretty clever, and the performance will not be very different... though I guess the difference can be significant if you're doing an operation a few million times. In my application the difference is practically nothing, and I'll only run the program a handful of times for some analysis, so it was really a moot point to have even worried about the rubbishy sub... *hmmm*
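For the "few million times" case, the core Benchmark module is a steadier tool than timing one run by hand, because it repeats each candidate many times. This sketch compares a variant that re-derives the prefix on every call (loosely mimicking TestMCa's approach) with one that reuses a precompiled regex. Both subs are hypothetical simplifications, not the posted code, and the rates it prints will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Variant A: re-derive the prefix on every call.
sub rebuild_each_call {
    my ($test_cat, $rec_cat, $level) = @_;
    my @m   = split /\./, $test_cat;
    my $buf = join '.', @m[0 .. $level - 1];    # assumes $level <= number of parts
    return $rec_cat =~ /^\Q$buf\E(?:\.|$)/i ? 1 : 0;
}

# Variant B: compile the level-1 regex once and reuse it.
my $pre = qr/^comp(?:\.|$)/i;
sub precompiled { return $_[0] =~ $pre ? 1 : 0 }

my $rec = 'comp.lang.perl.misc';

# A negative count means "run each for at least that many CPU seconds";
# cmpthese prints a rate table and also returns it as an array reference.
my $rows = cmpthese(-1, {
    rebuild     => sub { rebuild_each_call('comp.lang.c', $rec, 1) },
    precompiled => sub { precompiled($rec) },
});
```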

Still, it's all good to learn... and it gives me another boot to get onto working with regex 'seriously' :)

Oh... kcott, a couple of notes for you. The whole point of the exercise was to have the code determine the strings to match; hence the 'level' argument to the subs. I take your point about 'use strict' and 'use warnings', and I admit I'm slack about that. ...and true enough about the "C-like feel" in my coding -- but my home node might provide some insight into where that comes from :)
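As an illustration of what 'use strict' buys here: the posted TestMCb assigns to `$retstat` without declaring it, which strict rejects at compile time. This sketch compiles two cut-down subs through a string `eval` so the error is captured rather than fatal; `demo_undeclared` and `demo_declared` are made-up names for the illustration.

```perl
use strict;
use warnings;

# Same ending as the posted TestMCb, with $retstat never declared.
my $bad = eval q{
    use strict;
    sub demo_undeclared { if ($_[0]) { $retstat = 1 } else { $retstat = 0 } return $retstat }
    1;
};
my $bad_err = $@;    # save the compile error before the next eval clears $@

# The same logic with 'my $retstat' compiles cleanly.
my $good = eval q{
    use strict;
    sub demo_declared { my $retstat; if ($_[0]) { $retstat = 1 } else { $retstat = 0 } return $retstat }
    1;
};

print $bad  ? "undeclared: compiled\n" : "undeclared: $bad_err";
print $good ? "declared: compiled\n"   : "declared: $@";
```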

Thanks again, everyone, for all your help. I appreciate it a lot.

    sub TestMCa {
        my ($test_cat, $rec_cat, $level) = @_;
        my ($num_m, $buf, $i, $retstat, $master);
        my (@m);

        $test_cat .= ".";
        $rec_cat  .= ".";
        $master = $test_cat;
        @m = split(/\./, $master);
        $num_m = @m;
        $buf = "";
        for ($i = 0; $i < $level; $i++) {
            if ($i >= $num_m) { last; }
            $buf .= $m[$i] . ".";
        }
        # \Q...\E makes the dots in $buf literal (an unescaped '.' matches any character)
        if ($rec_cat =~ /^\Q$buf\E/i) { $retstat = 1; } else { $retstat = 0; }
        return($retstat);
    } # end TestMCa

    sub TestMCb {
        my ($test_cat, $rec_cat, $level) = @_;
        my $retstat;    # declared so the sub compiles under 'use strict'
        my $re = '^' . join('\.' => (split /\./, $test_cat, $level + 1)[0 .. $level - 1]);
        $re .= $re =~ /\.$/ ? '[^.]' : '(?:[.]|$)';
        if ($rec_cat =~ /$re/i) { $retstat = 1; } else { $retstat = 0; }
        return($retstat);
    } # end TestMCb
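Why the `\Q...\E` matters in TestMCa: `$buf` is built from the category text, so its dots reach the regex engine as "any character" unless escaped. This small check uses made-up sample strings (with the trailing '.' that TestMCa appends) to show the level-1 prefix 'comp.' accidentally matching a 'computer' record.

```perl
use strict;
use warnings;

# In a regex an unescaped '.' matches any character, so a prefix such as
# 'comp.' built from the category string also matches 'computer.'.
my $buf = 'comp.';
my $rec = 'computer.misc.';

my $raw     = $rec =~ /^$buf/i     ? 1 : 0;   # 1: '.' matched the 'u'
my $escaped = $rec =~ /^\Q$buf\E/i ? 1 : 0;   # 0: '.' is now literal

print "raw=$raw escaped=$escaped\n";            # raw=1 escaped=0
```

TestMCb does not have this problem, because it joins the parts with a literal `'\.'`.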

Re^2: I think regex Should Help Here... but How!?
by kcott (Archbishop) on Feb 28, 2014 at 12:40 UTC

    I've just revisited this thread and noticed your new post. (As it was a reply to your OP, I didn't see it earlier.)

    "...and true enough about the "C-like feel" in my coding -- but my home node might provide some insight into where that comes from :)"

    I check home nodes before replying: it often gives a hint as to how to frame the answer. And, yes, I did see "Programming in C since 1989, ..." as well as the Location :-)

    -- Ken