in reply to Re: How to count substitutions on an array
in thread How to count substitutions on an array

Marshall,

 

Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array? Consider the following solution concept, which is what I had before. I don't think it is nearly as fast, though I didn't time it precisely.

    foreach $replace (@substitutionlist) {
        ($oldline, $newline) = split(/\t/, $replace);
        $newline =~ s/\s/_SPACE_/g;
        # ... more code here
        foreach $line (@array) {
            $count += s/\b$oldline\b/$spliced/eg for @array;
            # ... more code here
        }
    }
    s/_SPACE_/ /g for @array;

Consider that @substitutionlist has over 10,000 lines, and @array has over 30,000, with many lines requiring multiple substitution replacements on a single line. Is the foreach setup outlined above really just as efficient?

Re^3: How to count substitutions on an array
by AnomalousMonk (Archbishop) on Aug 13, 2016 at 14:29 UTC
    $count += s/\b$oldline\b/$spliced/eg for @array;

    A small point: I don't understand why you're using the  /e modifier in this substitution. It "works" (i.e., produces the same result) just as well with it as without, but if you're concerned about speed, I don't see how firing up the interpreter for each and every substitution is going to help. Is  $spliced just a placeholder for significant code that you don't want to show?
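
As a small sketch of that point (the sample strings and the values of $oldline and $spliced are made up for illustration), the substitution below is run once with /e and once without. When the replacement is just an interpolated variable, both forms should produce the same text and the same count:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original script.
my @with_e    = ('the cat sat', 'cat and cat');
my @without_e = @with_e;
my ($oldline, $spliced) = ('cat', 'dog');

my ($count_e, $count_plain) = (0, 0);

# With /e the replacement is evaluated as a Perl expression on every match.
$count_e += s/\b$oldline\b/$spliced/eg for @with_e;

# Without /e the replacement is treated as an interpolated string.
$count_plain += s/\b$oldline\b/$spliced/g for @without_e;

print "with /e:    $count_e substitutions: @with_e\n";
print "without /e: $count_plain substitutions: @without_e\n";
```

The /e modifier only earns its keep when the replacement is real code, such as a function call or an arithmetic expression.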


    Give a man a fish:  <%-{-{-{-<

Re^3: How to count substitutions on an array
by AnomalousMonk (Archbishop) on Aug 13, 2016 at 18:23 UTC
    foreach $line (@array) {
         $count += s/\b$oldline\b/$spliced/eg for @array;
         # ... more code here
    }

    Another point to consider with this block of code is that the
        $count += s/\b$oldline\b/$spliced/eg for @array;
    statement is executed for each and every element of the
        foreach $line (@array) { ... }
    loop, but unless something tricky is going on in the  # ... more code here section, every execution of
        $count += s/.../.../eg for @array;
    after the first will have nothing to do: every substitution will have already been made on the first execution, i.e., with the processing of the very first element of the outer loop.
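
To make that concrete, here is a minimal sketch with invented sample data that records how many substitutions each pass of the outer loop performs. It assumes nothing else touches @array inside the loop and that the replacement text does not itself match the pattern:

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration only.
my @array = ('cat', 'cat cat', 'bird');
my ($oldline, $spliced) = ('cat', 'dog');

my @per_pass;    # substitutions performed during each outer iteration
foreach my $line (@array) {
    my $count = 0;
    # The inner statement walks the whole array on every outer iteration.
    $count += s/\b$oldline\b/$spliced/g for @array;
    push @per_pass, $count;
}

print "substitutions per outer pass: @per_pass\n";
```

Only the first pass finds anything to replace; the remaining passes rescan every element for nothing.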


    Give a man a fish:  <%-{-{-{-<

      You have pointed out well that I drafted that sample code too quickly. I have used similar code in other projects, but in this project I had in mind to do the substitution across the entire array, and that caused the (Freudian?) slip. I did not mean for the nested loop to act on the entire array with each substitution. It should have looked more like:

          @array = @array2;
          @array2 = ();
          foreach $line (@array) {
              $count = $line =~ s/\b$oldline\b/$spliced/eg;
              $totalcount += $count;
              push @array2, $line;
          }

      I had assumed that the variable on the replacement side would need to be evaluated, but it appears you are correct and the "e" is unnecessary. Removing that has sped up the script by about 3%. I learned something. Thank you!
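
For what it is worth, a foreach loop variable is an alias to each array element, so the copy into @array2 may not be needed at all. A minimal sketch, with made-up data and without the /e, assuming nothing else depends on @array2:

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration only.
my @array = ('cat nap', 'no match', 'cat and cat');
my ($oldline, $spliced) = ('cat', 'dog');

my $totalcount = 0;
foreach my $line (@array) {
    # $line aliases the array element, so the substitution edits @array in place.
    my $count = $line =~ s/\b$oldline\b/$spliced/g;
    $totalcount += $count;
}

print "total substitutions: $totalcount\n";
print "$_\n" for @array;
```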

Re^3: How to count substitutions on an array
by Marshall (Canon) on Aug 14, 2016 at 14:55 UTC
    The OP (Original Post) showed 2 lines of code, neither of which tallied the number of substitutions properly. My examples showed how to tally that for an individual line in very explicit terms, as I thought that was the "problem".

    Both of the example lines in your OP contain a "foreach" loop. "for" is just a shorthand for "foreach". "map" is a kind of foreach loop. My post and Limbic~Region's are similar in advice. "A foreach is fine" means that you save nothing by "disguising" the foreach by writing it on a single line. The "foreach" is still there.
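
A quick sketch of that equivalence, using invented sample data: the same substitution is written as a foreach block, as a statement-modifier "for", and with map. All three visit every element once, and all three should arrive at the same count:

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration only.
my @source = ('a b a', 'b b', 'a');

# 1. Explicit foreach block.
my @block = @source;
my $count_block = 0;
foreach my $line (@block) {
    $count_block += ($line =~ s/\ba\b/X/g);
}

# 2. Statement-modifier form: the same loop written on one line.
my @modifier = @source;
my $count_modifier = 0;
$count_modifier += s/\ba\b/X/g for @modifier;

# 3. map: still an iteration over every element, returning each count.
my @mapped = @source;
my $count_map = 0;
$count_map += $_ for map { s/\ba\b/X/g } @mapped;

print "block: $count_block, modifier: $count_modifier, map: $count_map\n";
```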

    "Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array?" Limbic~Region's "for" statement IS a "foreach" loop.

    The OP didn't mention anything about running 10K regexes on 30K lines! It does sound from subsequent posts as though you have a solution that meets your needs in terms of performance. I don't really understand your application, but if this is some type of word-for-word substitution, a hash-based approach would be faster. But that is moot if you are happy with what you have.
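
One way such a hash-based approach might look, as a rough sketch only: the hash name %replacement_for and the sample data are hypothetical, and it assumes every search term is a plain word. The search terms become hash keys, a single alternation is built from them, and each line is scanned once instead of once per term:

```perl
use strict;
use warnings;

# Hypothetical old => new pairs; in the real script these would come
# from the tab-separated substitution list.
my %replacement_for = (cat => 'dog', red => 'blue');
my @array = ('the cat sat', 'red cat, red hat', 'nothing here');

# Build one alternation from all keys, longest first, with metacharacters escaped.
my $pattern = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %replacement_for;
my $regex = qr/\b($pattern)\b/;

my $totalcount = 0;
for my $line (@array) {
    # One pass per line; the hash lookup supplies the replacement text.
    $totalcount += ($line =~ s/$regex/$replacement_for{$1}/g);
}

print "total substitutions: $totalcount\n";
print "$_\n" for @array;
```

Note that this is not identical in behavior to applying the substitutions one after another: text that has already been replaced is not rescanned, so one rule cannot feed into another.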