The output I get from running this latest code is different from yours. I don't think your output is right at all - look at the dates. Here's what I get:

1..2
not ok 1 - 2017-09-04 tally correct
#   Failed test '2017-09-04 tally correct'
#   at 1202625.pl line 53.
#          got: '1'
#     expected: '4'
not ok 2 - 2017-09-30 tally correct
#   Failed test '2017-09-30 tally correct'
#   at 1202625.pl line 54.
#          got: undef
#     expected: '2'
# Looks like you failed 2 tests of 2.

Given that it worked fine in your examples, I think it is likely that I'm making a basic mistake or didn't convey something important about my data in my original post.

It's the latter. In this new data set you have multiple instances of the double-hash-delimited strings in each hash value. Your code is only checking for the first such one in each value, hence the numbers I see in my output here.
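To see the difference in isolation, here is a minimal sketch. The sample string and the array names are made up for illustration and are not from your data. A match tested once in an if only ever sees the first delimited string, whereas the same match with /g in a while visits each one in turn.

```perl
use strict;
use warnings;

# Hypothetical sample value holding three double-hash-delimited strings
my $text = "##very soft##\n##the softest even##\n##not relevant##";

# An if tests the match once, so only the first tagged string is seen
my @first_only;
if ($text =~ /##(.*)##/) {
    push @first_only, $1;
}

# A while with /g resumes after the previous match until none remain
my @all;
while ($text =~ /##(.*)##/g) {
    push @all, $1;
}

print scalar(@first_only), " vs ", scalar(@all), "\n";
```

Because . does not match a newline by default, each tagged line is matched separately even though .* is greedy.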

TIMTOWTDI when it comes to fixing this, but the simplest way is a loop. A loop works fine with your existing regular expressions, but I've also cleaned those up to illustrate how they can be simplified.

use strict;
use warnings;
use Test::More tests => 2;

my %mycorpus = (
          a => "<p><time datetime=2017-09-04T05:23:39Z>04/09/17 06:23:39</time>
                    Irrelevant text that may feature the word soft, softest, or softly.
ar##*whispers softly* don\'t##
##very soft##
##the softest even##
 <97> 164 notes",

            b => "p><time datetime=2017-09-30T18:20:56Z>30/09/17 19:20:56</time>
            Irrelevant text that may feature the word soft, softest, or softly.
4r##skam##
rr##isak valtersen##
rr##even bech næsheim##
dr##god##
r##they're so soft##
sr##my heart is bursting##
##This is the softest##
 <97> 379 notes
            Irrelevant text that may feature the word soft, softest, or softly.",

            c => "<p><time datetime=2017-09-04T05:27:03Z>04/09/17 06:27:03</time>
##SKSNSKXBXKXND##
r##I LOVE THESE##
##such soft boyfriend<99>##
##you're my sunshine##
 <97> 180 notes
 Irrelevant text that may feature the word soft, softest, or softly."
);

my %counts;
foreach my $filename (sort keys %mycorpus) {
        my $date;
        my $hashtags = '';

        # Only one date is wanted per value, so /g is not needed here
        if ($mycorpus{$filename} =~ /(\d{4}-\d{2}-\d{2})T/){
            $date = $1;
        }

        while ($mycorpus{$filename} =~ /##(.*)##/g){
            $hashtags = $1;

            if (my $matches =()= $hashtags =~ /\bsoft/gi){
                $counts{$date} += $matches;
            }
        }
}


is ($counts{'2017-09-04'}, 4, "2017-09-04 tally correct");
is ($counts{'2017-09-30'}, 2, "2017-09-30 tally correct");
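As one more way to do it, and only a sketch: a /g match in list context returns every capture at once, so the tagged strings can be gathered first and then counted. The sample string and the names @hashtags and $count are my own assumptions here, not something from your script.

```perl
use strict;
use warnings;

# Hypothetical sample value, not taken from the thread's data
my $text = "<time datetime=2017-09-04T05:23:39Z>\n"
         . "soft words outside the tags\n"
         . "##very soft##\n"
         . "##the softest even##\n";

# In list context a /g match returns all the captured strings
my @hashtags = $text =~ /##(.*?)##/g;

my $count = 0;
for my $tag (@hashtags) {
    # The =()= idiom forces list context and yields the number of matches
    my $matches = () = $tag =~ /\bsoft/gi;
    $count += $matches;
}

print "$count\n";
```

The "soft" on the untagged line is ignored because only text captured between the double hashes is ever searched.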

In reply to Re^5: Counting instances of a string in certain sections of files within a hash by hippo
in thread Counting instances of a string in certain sections of files within a hash by Maire
