Again, many thanks for the assistance... and methinks I need to knuckle down and get organized to study some regex *properly*!

Anyway, here are the two functions I'm comparing (at the bottom of this posting); my original 'TestMCa' and a new sub based on kcott's suggestion, 'TestMCb'.

I know there's a lot of repetition in these sorts of subs (I work out the '1st level' of the $master string again every time I call the sub, which could be done differently if I were really worried about it)... and I'll probably rework that at some point. For the moment, though, both of these subs work OK.
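For what it's worth, here's a rough sketch of what "done differently" might look like: build the level-limited pattern once with qr// and reuse it for every record. The name make_cat_matcher and the sample group names are made up for illustration, and this is not a drop-in replacement for either sub (for one thing, \Q...\E makes the dots match literally).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compile the prefix pattern once.
sub make_cat_matcher {
    my ($test_cat, $level) = @_;
    my @parts = split /\./, $test_cat;
    $#parts = $level - 1 if @parts > $level;    # keep at most $level components
    my $prefix = join '.', @parts;
    # \Q...\E quotes the dots; the prefix must be followed by '.' or end of string
    return qr/^\Q$prefix\E(?:\.|$)/i;
}

# Build the matcher once, then apply it to each record.
my $re = make_cat_matcher('comp.lang.perl.misc', 1);
for my $rec_cat (qw(comp.lang.c Comp.os.linux computers.misc alt.test)) {
    printf "%-16s %s\n", $rec_cat, ($rec_cat =~ $re ? 'match' : 'no match');
}
```

With a level of 1 the first two sample groups should match and the last two should not, since 'computers' is not the same first-level word as 'comp'.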

So, I've been trying them both in my 'real' code... but you could just as easily check them out with some code that reads a file of Usenet groups, as I mentioned before, where the whole record is the 'category'.

I did some testing with each sub... matching categories on the '1st level' only ('one word')... and processing about 700 records. I ran the program a few times with each option to smooth out any variance from caching, memory use, etc. The results follow:

    TestMCa: ELAPSED: 0.0775  0.0767  0.0751  0.0758 
    TestMCb: ELAPSED: 0.0658  0.0643  0.0641  0.0659 
              Saving:    15%     16%     15%     13%

I used the Time::HiRes module to measure the elapsed time for each sub, so the results are in seconds.
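For anyone wanting to try the same sort of measurement, a minimal sketch of timing with Time::HiRes might look like this. The sub test_match and the generated records are stand-ins I've made up here, not my real data or either of the subs below.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical stand-in for TestMCa / TestMCb (first-level match only)
sub test_match {
    my ($test_cat, $rec_cat) = @_;
    return $rec_cat =~ /^\Q$test_cat\E(?:\.|$)/i ? 1 : 0;
}

# Made-up records: 700 in total, half of them under 'comp'
my @records = map { ("comp.lang.perl.$_", "alt.test.$_") } 1 .. 350;

my $t0   = [gettimeofday];          # start time as [seconds, microseconds]
my $hits = 0;
$hits += test_match('comp', $_) for @records;
my $elapsed = tv_interval($t0);     # floating-point seconds since $t0

printf "ELAPSED: %.4f (%d of %d matched)\n", $elapsed, $hits, scalar @records;
```

The figure will jump around from run to run, which is why it's worth repeating the run a few times before comparing.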

As I so often see in Perl, you can write something pretty awful and something pretty clever and the performance will not be very different... though I guess the difference can be significant if you're doing an operation a few million times. In my application the difference is practically nothing, and I'll only run the program a handful of times for some analysis, so it was really a moot point to have worried about the rubbishy sub at all... *hmmm*
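If the "few million times" case ever did matter, the core Benchmark module is probably a better tool than hand-rolled timing. A sketch only: by_split and by_regex are two hypothetical ways of testing the first-level word, not the subs from this post, and the records are generated rather than real.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up records, about the size of my test file
my @records = map { ("comp.lang.perl.$_", "alt.test.$_") } 1 .. 350;

# Hypothetical variant 1: split off the first component and compare it
sub by_split {
    my ($want, $rec) = @_;
    return lc((split /\./, $rec, 2)[0]) eq lc($want) ? 1 : 0;
}

# Hypothetical variant 2: anchored, case-insensitive regex
sub by_regex {
    my ($want, $rec) = @_;
    return $rec =~ /^\Q$want\E(?:\.|$)/i ? 1 : 0;
}

# A negative count means "run each for at least that many CPU seconds"
cmpthese(-1, {
    by_split => sub { by_split('comp', $_) for @records },
    by_regex => sub { by_regex('comp', $_) for @records },
});
```

cmpthese prints a small table of rates and relative percentages, which makes a saving like the one above easier to judge than eyeballing raw elapsed times.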

Still, it's all good to learn... and it gives me another boot to get onto working with regex 'seriously' :)

Oh... kcott - a couple of notes for you... It was the whole point of the exercise to get the code to determine the strings to match; hence, providing a 'level' argument to the subs. I take your point about 'use strict' and 'use warnings' ... and I admit I'm slack about that. ...and true enough about the "C-like feel" in my coding -- but my home node might provide some insight into where that comes from :)

Thanks again, everyone, for all your help. I appreciate it a lot.

    sub TestMCa
    {
       my ($test_cat, $rec_cat, $level) = @_;
       my ($num_m, $buf, $i, $retstat, $master);
       my (@m);

       $test_cat .= ".";
       $rec_cat .= ".";

       $master = $test_cat;
       @m = split(/\./, $master);
       $num_m = @m;

       $buf = "";
       for ($i = 0; $i < $level; $i++) {
          if ($i >= $num_m) { last; }
          $buf .= $m[$i] . ".";
       }

       if ($rec_cat =~ /^$buf/i) {
          $retstat = 1;
       } else {
          $retstat = 0;
       }

       return($retstat);

    }   # end TestMCa

    sub TestMCb
    {
       my ($test_cat, $rec_cat, $level) = @_;

       my $re = '^' . join('\.' => (split /\./, $test_cat, $level + 1)[0 .. $level - 1]);
       $re .= $re =~ /\.$/ ? '[^.]' : '(?:[.]|$)';

       if ($rec_cat =~ /$re/i) {
          $retstat = 1;
       } else {
          $retstat = 0;
       }

       return($retstat);

    }   # end TestMCb

