The output I get from running this latest code is different from yours. I don't think your output is right at all - look at the dates. Here's what I get:
1..2
not ok 1 - 2017-09-04 tally correct
#   Failed test '2017-09-04 tally correct'
#   at 1202625.pl line 53.
#          got: '1'
#     expected: '4'
not ok 2 - 2017-09-30 tally correct
#   Failed test '2017-09-30 tally correct'
#   at 1202625.pl line 54.
#          got: undef
#     expected: '2'
# Looks like you failed 2 tests of 2.
Given that it worked fine in your examples, I think it is likely that I'm making a basic mistake or didn't convey something important about my data in my original post.
It's the latter. In this new data set you have multiple instances of the double-hash-delimited strings in each hash value. Your code only checks the first such string in each value, hence the numbers I see in my output here.
TIMTOWTDI on how to fix this, but the simplest is a loop. This will work fine with your existing regular expressions, but I've also cleaned those up to illustrate how they can be simplified.
use strict;
use warnings;
use Test::More tests => 2;

my %mycorpus = (
    a => "<p><time datetime=2017-09-04T05:23:39Z>04/09/17 06:23:39</time>
    Irrelevant text that may feature the word soft, softest, or softly.
    ar##*whispers softly* don\'t##
    ##very soft##
    ##the softest even##
    <97> 164 notes",
    b => "p><time datetime=2017-09-30T18:20:56Z>30/09/17 19:20:56</time>
    Irrelevant text that may feature the word soft, softest, or softly.
    4r##skam##
    rr##isak valtersen##
    rr##even bech nęsheim##
    dr##god##
    r##they're so soft##
    sr##my heart is bursting##
    ##This is the softest##
    <97> 379 notes
    Irrelevant text that may feature the word soft, softest, or softly.",
    c => "<p><time datetime=2017-09-04T05:27:03Z>04/09/17 06:27:03</time>
    ##SKSNSKXBXKXND##
    r##I LOVE THESE##
    ##such soft boyfriend<99>##
    ##you're my sunshine##
    <97> 180 notes
    Irrelevant text that may feature the word soft, softest, or softly."
);

my %counts;
foreach my $filename (sort keys %mycorpus) {
    my $date;
    my $hashtags = '';
    if ($mycorpus{$filename} =~ /(\d{4}-\d{2}-\d{2})T/g){
        $date = $1;
    }
    while ($mycorpus{$filename} =~ /##(.*)##/g){
        $hashtags = $1;
        if (my $matches =()= $hashtags =~ /\bsoft/gi){
            $counts{$date} += $matches;
        }
    }
}

is ($counts{'2017-09-04'}, 4, "2017-09-04 tally correct");
is ($counts{'2017-09-30'}, 2, "2017-09-30 tally correct");
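One caveat, in case it applies to your real files: the greedy `.*` above relies on each tag sitting on its own line. If several tags could ever share a line, a non-greedy `.*?` keeps the loop visiting them one at a time. Here is a minimal sketch of that variant; the sub name `count_soft_in_tags` and the sample string are made up for illustration and are not from your data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): tally /\bsoft/i hits found
# inside every ##...## span of a single string.
sub count_soft_in_tags {
    my ($text) = @_;
    my $total = 0;

    # Non-greedy .*? stops at the nearest closing ##, so each tag is
    # visited separately even when several tags share one line.
    while ($text =~ /##(.*?)##/g) {
        my $tag = $1;
        # The =()= idiom forces list context and yields the match count.
        my $hits = () = $tag =~ /\bsoft/gi;
        $total += $hits;
    }
    return $total;
}

# Made-up sample: "soft" also appears between the tags and must be ignored.
my $sample = "##very soft## some soft filler ##the softest even## ##no match##";
print count_soft_in_tags($sample), "\n";    # prints 2
```

With a greedy `.*` the same sample would be treated as one long tag running from the first `##` to the last, and the untagged "soft" in the filler text would be counted as well, giving 3.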
In reply to Re^5: Counting instances of a string in certain sections of files within a hash
by hippo
in thread Counting instances of a string in certain sections of files within a hash
by Maire