I've rewritten this script dozens of times, but I can't seem to spot my mistake
There's no SSCCE (Short, Self-Contained, Correct Example) in there. Creating one shows the mistake quite clearly:
    use strict;
    use warnings;
    use Test::More tests => 2;

    my %mycorpus = (
        a => "<time datetime=2017-09-03T23:17:53Z> blah blah ##soft and softly is as softly does## bar",
        b => "<time datetime=2017-09-03T23:17:53Z> blah blah ##Not so SOFT now, eh?## foo",
        c => "<time datetime=2017-09-04T23:17:53Z> blah ##Mr. Soft in the soft-play area## baz"
    );

    my %counts;
    foreach my $filename (sort keys %mycorpus) {
        my $date;
        my $hashtags = '';
        if ($mycorpus{$filename} =~ /(?<==)(\d{4}-\d{2}-\d{2})(?=T)/g) {
            $date = $1;
        }
        if ($mycorpus{$filename} =~ /(?<=##)(.*)(?=##)/g) {
            $hashtags = $1;
        }
        if ($hashtags =~ /\bsoft/gi) {
            $counts{$date}++;
        }
    }

    is ($counts{'2017-09-03'}, 4, "2017-09-03 tally correct");
    is ($counts{'2017-09-04'}, 2, "2017-09-04 tally correct");
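Your update of $counts{$date} is wrong: $counts{$date}++ adds only one per file, irrespective of how many matches there are in that file's hashtag section, because the match in the if condition is evaluated in scalar context and only reports whether there is a match at all. Here's the fixed version, which counts the matches in list context and adds that number instead: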
    use strict;
    use warnings;
    use Test::More tests => 2;

    my %mycorpus = (
        a => "<time datetime=2017-09-03T23:17:53Z> blah blah ##soft and softly is as softly does## bar",
        b => "<time datetime=2017-09-03T23:17:53Z> blah blah ##Not so SOFT now, eh?## foo",
        c => "<time datetime=2017-09-04T23:17:53Z> blah ##Mr. Soft in the soft-play area## baz"
    );

    my %counts;
    foreach my $filename (sort keys %mycorpus) {
        my $date;
        my $hashtags = '';
        if ($mycorpus{$filename} =~ /(?<==)(\d{4}-\d{2}-\d{2})(?=T)/g) {
            $date = $1;
        }
        if ($mycorpus{$filename} =~ /(?<=##)(.*)(?=##)/g) {
            $hashtags = $1;
        }
        if (my $matches =()= $hashtags =~ /\bsoft/gi) {
            $counts{$date} += $matches;
        }
    }

    is ($counts{'2017-09-03'}, 4, "2017-09-03 tally correct");
    is ($counts{'2017-09-04'}, 2, "2017-09-04 tally correct");
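If the =()= construct in the fix looks opaque, here is a minimal sketch of it in isolation. The helper name count_matches is hypothetical (it is not part of the script above), and the sketch assumes the pattern has no capture groups:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: count every
# non-overlapping match of $re in $text.
sub count_matches {
    my ($text, $re) = @_;
    # Assigning to the empty list () puts the match in list context, so
    # //g returns all matches; a list assignment evaluated in scalar
    # context then yields the number of elements on its right-hand side.
    my $count = () = $text =~ /$re/g;
    return $count;
}

my $tags = 'soft and softly is as softly does';

# List context via the helper: every match is counted.
my $total = count_matches($tags, qr/\bsoft/i);

# Scalar context: the match only says whether there is a match.
my $found = $tags =~ /\bsoft/i ? 1 : 0;

print "found=$found total=$total\n";   # prints "found=1 total=3"
```

Note that with capture groups in the pattern, a list-context //g match returns the captured substrings rather than the whole matches, so the count would be the number of captures instead.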
In reply to Re: Counting instances of a string in certain sections of files within a hash
by hippo
in thread Counting instances of a string in certain sections of files within a hash
by Maire