in reply to How to count substitutions on an array

Consider this to count the number of substitutions:
#!/usr/bin/perl
use warnings;
use strict;

my $x = "xyzzyblahxyzzymoreBLahxyzzyasdfouyXYZZY";
print "Input = $x\n";
my $count = $x =~ s/xyzzy//gi;
print "count = $count\n";
print "result = $x\n\n";

$x = "blahxyzzy";
print "Input = $x\n";
$count = $x =~ s/xyZzy//gi;
print "count = $count\n";
print "result = $x\n";
__END__
PRINTS:
Input = xyzzyblahxyzzymoreBLahxyzzyasdfouyXYZZY
count = 4
result = blahmoreBLahasdfouy

Input = blahxyzzy
count = 1
result = blah
A foreach loop over each array element is fine.
Do not mistake fewer source lines for more efficiency.
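
A minimal sketch of the foreach approach, using made-up data (the names @lines and $total are illustrative, not from the original question). The per-element return value of s///g is simply summed, and because foreach aliases each element, the array is modified in place:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample data, not from the original question.
my @lines = ("xyzzy one xyzzy", "nothing here", "XYZZY two");

my $total = 0;
foreach my $line (@lines) {
    # s///g returns the number of substitutions made, or a false
    # value when nothing matched; "|| 0" makes the zero explicit.
    my $n = ($line =~ s/xyzzy//gi) || 0;
    $total += $n;
}

print "total = $total\n";       # total = 3
print "[$_]\n" for @lines;      # elements were changed in place
```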

Re^2: How to count substitutions on an array
by Anonymous Monk on Aug 13, 2016 at 05:09 UTC

    Marshall,

     

    Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array? Consider the following solution concept, which is what I had before, that I don't think is nearly so fast (though I didn't time it precisely).

    foreach $replace (@substitutionlist) {
        ($oldline, $newline) = split(/\t/, $replace);
        $newline =~ s/\s/_SPACE_/g;
        # ... more code here
        foreach $line (@array) {
            $count += s/\b$oldline\b/$spliced/eg for @array;
            # ... more code here
        }
    }
    s/_SPACE_/ /g for @array;

    Consider that @substitutionlist has over 10,000 lines, and @array has over 30,000, with many lines requiring multiple substitution replacements on a single line. Is the foreach setup outlined above really just as efficient?

      $count += s/\b$oldline\b/$spliced/eg for @array;

      A small point: I don't understand why you're using the /e modifier in this substitution. It "works" (i.e., produces the same result) just as well without it as with it, but if you're concerned about speed, I don't see how evaluating the replacement as a Perl expression for each and every substitution is going to help. Is $spliced just a placeholder for significant code that you don't want to show?
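
      To illustrate with a made-up string (the values of $oldline and $spliced here are assumptions, not the poster's data): when the replacement is a plain variable, it is interpolated like a double-quoted string without /e and evaluated as an expression with /e, so both forms give the same text and the same count.

      ```perl
      use warnings;
      use strict;

      my ($oldline, $spliced) = ("cat", "small_SPACE_dog");   # hypothetical values

      my $plain  = "cat and cat, not concat";
      my $with_e = $plain;

      my $n1 = ($plain  =~ s/\b$oldline\b/$spliced/g);    # interpolation only
      my $n2 = ($with_e =~ s/\b$oldline\b/$spliced/eg);   # evaluated as an expression

      print "same result\n" if $plain eq $with_e;
      print "counts: $n1 and $n2\n";                      # counts: 2 and 2
      ```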


      Give a man a fish:  <%-{-{-{-<

      foreach $line (@array) {
           $count += s/\b$oldline\b/$spliced/eg for @array;
           # ... more code here
      }

      Another point to consider with this block of code is that the
          $count += s/\b$oldline\b/$spliced/eg for @array;
      statement is executed for each and every element of the
          foreach $line (@array) { ... }
      loop, but unless something tricky is going on in the # ... more code here section, every execution of
          $count += s/.../.../eg for @array;
      after the first will have nothing to do: every substitution will have already been made on the first execution, i.e., with the processing of the very first element of the outer loop.
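
      A quick way to see this, using made-up data (the contents of @array and the $oldline/$spliced pair are illustrative assumptions), is to record how many substitutions each pass of the outer loop makes. This holds as long as the replacement text does not itself contain the search term:

      ```perl
      use warnings;
      use strict;

      my @array = ("foo bar", "bar foo foo", "baz");       # hypothetical data
      my ($oldline, $spliced) = ("foo", "qux");            # hypothetical pair

      my @per_pass;
      foreach my $line (@array) {
          my $count = 0;
          # Same shape as the statement under discussion (without /e):
          # it walks the whole array on every outer iteration.
          $count += s/\b$oldline\b/$spliced/g for @array;
          push @per_pass, $count;
      }

      print "substitutions per outer pass: @per_pass\n";   # 3 0 0
      ```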



        You have pointed out well that I drafted that sample code too quickly. I have used similar code in other projects, but in this project I had in mind to do the substitution across the entire array, and that caused the (Freudian?) slip. I did not mean to structure the nested loop to act on the entire array with each substitution. It should have looked more like:

        @array = @array2;
        @array2 = ();
        foreach $line (@array) {
            $count = $line =~ s/\b$oldline\b/$spliced/eg;
            $totalcount += $count;
            push @array2, $line;
        }

        I had assumed that the variable on the replacement side would need to be evaluated, but it appears you are correct and the "e" is unnecessary. Removing that has sped up the script by about 3%. I learned something. Thank you!

      The OP (Original Post) showed 2 lines of code, neither of which tallied the number of substitutions properly. My examples showed how to tally that for an individual line in very explicit terms, as I thought that was the "problem".

      Both of your OP's example lines contain a "foreach" loop. "for" is just a shorthand for "foreach". "map" is a kind of foreach loop. My post and Limbic~Region's are similar in advice. "A foreach is fine" means that you save nothing by "disguising" the foreach by writing it on a single line. The "foreach" is still there.
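
      As a small illustration (sample data and variable names are made up), the three spellings below each visit every element once and arrive at the same count:

      ```perl
      use warnings;
      use strict;

      my @src = ("a-b-c", "no dashes", "x-y");    # hypothetical data

      # 1. Explicit foreach block.
      my @one = @src;
      my $c1  = 0;
      foreach my $item (@one) {
          $c1 += ($item =~ s/-/+/g) || 0;
      }

      # 2. Statement-modifier "for": the same loop, written on one line.
      my @two = @src;
      my $c2  = 0;
      $c2 += s/-/+/g for @two;

      # 3. map: still one iteration per element ($_ aliases the element).
      my @three = @src;
      my $c3    = 0;
      $c3 += $_ for map { s/-/+/g || 0 } @three;

      print "$c1 $c2 $c3\n";    # 3 3 3
      ```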

      "Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array?"

      Limbic~Region's "for" statement IS a "foreach loop".

      The OP didn't mention anything about running 10K regexes on 30K lines! It sounds from subsequent posts, though, as if you have a solution that meets your needs in terms of performance. I don't really understand your application, but if this is some type of word-for-word substitution situation, a hash-based approach would be faster. But that is moot if you are happy with what you have.
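
      A rough sketch of what a hash-based approach could look like, assuming every search term is a single \w+ word (an assumption; the thread doesn't say) and using made-up data. The names %replacement_for, @array and $count are illustrative only. Each line is scanned once and each word is looked up in the hash, instead of running one regex per substitution pair:

      ```perl
      use warnings;
      use strict;

      # Hypothetical substitution pairs: old word => replacement text.
      my %replacement_for = (
          colour => "color",
          centre => "center",
          grey   => "gray",
      );

      # Made-up input lines.
      my @array = ("the colour grey", "town centre", "greyhound colours");

      my $count = 0;
      for my $line (@array) {
          # /e is genuinely needed here: the replacement is code that does
          # the hash lookup and bumps the counter only on a real replacement.
          $line =~ s{\b(\w+)\b}{
              exists $replacement_for{$1}
                  ? do { $count++; $replacement_for{$1} }
                  : $1
          }ge;
      }

      print "count = $count\n";     # count = 3
      print "$_\n" for @array;
      ```

      With this shape the work per line no longer grows with the number of substitution pairs. Multi-word search terms would not fit the single-word lookup and would need a different way of extracting keys.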