Re^2: How to count substitutions on an array

Marshall,

Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array? Consider the following solution concept, which is what I had before, that I don't think is nearly so fast (though I didn't time it precisely).

foreach $replace (@substitutionlist) {

     ($oldline, $newline) = split(/\t/, $replace);
     $newline =~ s/\s/_SPACE_/g;
    #….more code here

     foreach $line (@array) {
          $count += s/\b$oldline\b/$spliced/eg for @array;
          #….more code here
     }
}
s/_SPACE_/ /g for @array;

Consider that @substitutionlist has over 10,000 lines, and @array has over 30,000, with many lines requiring multiple substitution replacements on a single line. Is the foreach setup outlined above really just as efficient?

Re^3: How to count substitutions on an array by AnomalousMonk (Archbishop) on Aug 13, 2016 at 14:29 UTC
`$count += s/\b$oldline\b/$spliced/eg for @array;` A small point: I don't understand why you're using the `/e` modifier in this substitution. It "works" (i.e., produces the same result) just as well with it as without, but if you're concerned about speed, I don't see how firing up the interpreter for each and every substitution is going to help. Is `$spliced` just a placeholder for significant code that you don't want to show? Give a man a fish: `<%-{-{-{-<`
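To illustrate the point about `/e`, here is a minimal sketch that runs the same substitution with and without the modifier. The sample strings and the values of `$oldline` and `$spliced` are made up for the demonstration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only for this demonstration.
my $oldline = 'cat';
my $spliced = 'dog';

my @with_e    = ('cat and cat', 'concat', 'a cat');
my @without_e = @with_e;

my ($count_e, $count_plain) = (0, 0);

# With /e the replacement is evaluated as Perl code; here that code is just a variable.
$count_e     += s/\b$oldline\b/$spliced/eg for @with_e;

# Without /e the replacement is a double-quoted string, so $spliced still interpolates.
$count_plain += s/\b$oldline\b/$spliced/g for @without_e;

print "with /e: $count_e, without /e: $count_plain\n";
print "arrays are identical\n" if "@with_e" eq "@without_e";
```

The `/e` modifier only earns its keep when the replacement is a real expression, such as a function call or arithmetic on the captured text.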
Re^3: How to count substitutions on an array by AnomalousMonk (Archbishop) on Aug 13, 2016 at 18:23 UTC
     foreach $line (@array) {
          $count += s/\b$oldline\b/$spliced/eg for @array;
          # ... more code here
     }

Another point to consider with this block of code is that the `$count += s/\b$oldline\b/$spliced/eg for @array;` statement is executed for each and every element of the `foreach $line (@array) { ... }` loop. Unless something tricky is going on in the `# ... more code here` section, every execution of `$count += s/.../.../eg for @array;` after the first will have nothing to do: every substitution will already have been made on the first execution, i.e., during the processing of the very first element of the outer loop. Give a man a fish: `<%-{-{-{-<`
Re^4: How to count substitutions on an array by Anonymous Monk on Aug 13, 2016 at 19:11 UTC
You are right that I drafted that sample code too quickly. I have used similar code in other projects, but in this project I had in mind to do the substitution for the entire array, and that caused the (Freudian?) slip. I did not mean to structure the nested loop to act on the entire array with each substitution. It should have looked more like:

     @array = @array2;
     @array2 = ();
     foreach $line (@array) {
          $count = $line =~ s/\b$oldline\b/$spliced/eg;
          $totalcount += $count;
          push @array2, $line;
     }

I had assumed that the variable on the replacement side would need to be evaluated, but it appears you are correct and the "e" is unnecessary. Removing it has sped up the script by about 3%. I learned something. Thank you!
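For reference, the corrected loop could be written as a single pass per replacement pair, roughly as below. This is only a sketch under assumptions: the sample data is invented, the `_SPACE_` handling from the original post is left out, and `\Q...\E` is added on the guess that `$oldline` is literal text rather than a regex.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real lists are far larger.
my @substitutionlist = ("old phrase\tnew phrase", "foo\tbar");
my @array = ('an old phrase and foo', 'foo foo', 'nothing here');

my $totalcount = 0;
for my $replace (@substitutionlist) {
    my ($oldline, $newline) = split /\t/, $replace;

    # Assumption: $oldline is literal text, so \Q...\E escapes any metacharacters.
    # qr// compiles the pattern once per pair instead of once per line.
    my $re = qr/\b\Q$oldline\E\b/;

    for my $line (@array) {
        # $line is an alias to the array element, so the edit happens in place.
        # s///g returns the number of substitutions, or false when there were none.
        my $count = ($line =~ s/$re/$newline/g) || 0;
        $totalcount += $count;
    }
}

print "$totalcount substitutions\n";
print "$_\n" for @array;
```

Because `foreach` aliases each element, the second array and the `push` are not needed just to keep the modified lines.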
Re^3: How to count substitutions on an array by Marshall (Canon) on Aug 14, 2016 at 14:55 UTC
The OP (Original Post) showed 2 lines of code, neither of which tallied the number of substitutions properly. My examples showed how to tally that for an individual line in very explicit terms, as I thought that was the "problem".

Both of the OP's example lines contain a "foreach" loop. "for" is just a shorthand for "foreach". "map" is a kind of foreach loop. My post and Limbic~Region's are similar in advice. "A foreach is fine" means that you save nothing by "disguising" the foreach by writing it on a single line. The "foreach" is still there.

"Are you saying that a foreach loop would be just as fast as what Limbic-Region was suggesting for an array?" Limbic~Region's "for" statement IS a "foreach loop".

The OP didn't mention anything about running 10K regexes on 30K lines! It does sound from subsequent posts as though you now have a solution that meets your needs in terms of performance. I don't really understand your application, but if this is some type of word-for-word substitution situation, a hash-based approach would be faster. But that is moot if you are happy with what you have.
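One way the hash-based idea might look is sketched below. It assumes a plain word-for-word table, and the names `%replacement` and `$alternation` and the sample data are invented for illustration. Note that it is not equivalent to applying the substitutions one after another: each word is replaced at most once, so the output of one replacement is never fed into another.

```perl
use strict;
use warnings;

# Hypothetical word-for-word replacement table and input lines.
my %replacement = (colour => 'color', grey => 'gray', lorry => 'truck');
my @array = ('a grey lorry', 'colour me grey', 'no change');

# Build one alternation from all keys, longest first so that a longer key
# is tried before any shorter key that is a prefix of it.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %replacement;
my $re = qr/\b($alternation)\b/;

my $totalcount = 0;
for my $line (@array) {
    # Each line is scanned once; the hash lookup supplies the replacement.
    # No /e is needed because $replacement{$1} interpolates like any variable.
    $totalcount += ($line =~ s/$re/$replacement{$1}/g) || 0;
}

print "$totalcount substitutions\n";
print "$_\n" for @array;
```

With this layout each of the 30,000 lines is scanned once, rather than once per entry in the 10,000-line substitution list. Whether that wins in practice depends on the data and is worth timing.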