Thank you, I really appreciate your help with this!

I made a mistake in the code that I showed in UPDATE 2 above. The second part of the script was (I think) attempting to print out the regex matches before all of the data had been "dumped" out of the hash and into alldata.txt. When I ran the second part of the code separately from the first, it successfully matched all of the data it was supposed to, which suggests (I think) that the regex is not the problem here, either. Sorry for wasting your time with that: I should have double-checked that my code was right before posting!
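For anyone who runs into the same thing, here is a minimal sketch of the ordering that avoids it, assuming the cause really was buffered output that had not yet reached the file. The hash contents are hypothetical stand-ins, and a temporary file takes the place of alldata.txt:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real hash of file contents.
my %mycorpus = (a => "one soft line\n", b => "nothing here\n");

# A temporary file stands in for alldata.txt.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} $mycorpus{$_} for sort keys %mycorpus;

# Closing the handle flushes Perl's output buffer. Without this, a read
# issued straight away may see an empty or partial file.
close $out or die "Cannot close $path: $!";

# Only now reopen the file for reading and run the match.
open my $in, '<', $path or die "Cannot open $path: $!";
my @matches = grep { /\bsoft/i } <$in>;
close $in;

print scalar(@matches), " matching line(s)\n";
```

The point is only the order of operations: write, close, then read. Splitting the script into two separate runs has the same effect, because the first run closes its handles when it exits.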

I am, however, still having trouble getting the main code to count the instances of "soft" per day. I'm using the corrections that you very kindly made to my original script. The only things that I've changed are that I've substituted your examples for my own data and I've taken the lookarounds out of the regex, following your and kcott's advice:

    use strict;
    use warnings;
    use Test::More tests => 2;

    my %mycorpus = (
        a => "<p><time datetime=2017-09-04T05:23:39Z>04/09/17 06:23:39</time> Irrelevant text that may feature the word soft, softest, or softly. ar##*whispers softly* don\'t## ##very soft## ##the softest even## — 164 notes",
        b => "p><time datetime=2017-09-30T18:20:56Z>30/09/17 19:20:56</time> Irrelevant text that may feature the word soft, softest, or softly. 4r##skam## rr##isak valtersen## rr##even bech næsheim## dr##god## r##they're so soft## sr##my heart is bursting## ##This is the softest## — 379 notes Irrelevant text that may feature the word soft, softest, or softly.",
        c => "<p><time datetime=2017-09-04T05:27:03Z>04/09/17 06:27:03</time> ##SKSNSKXBXKXND## r##I LOVE THESE## ##such soft boyfriend™## ##you're my sunshine## — 180 notes Irrelevant text that may feature the word soft, softest, or softly."
    );

    my %counts;
    foreach my $filename (sort keys %mycorpus) {
        my $date;
        my $hashtags = '';
        if ($mycorpus{$filename} =~ /(?<==)(\d{4}-\d{2}-\d{2})(?=T)/g) {
            $date = $1;
        }
        if ($mycorpus{$filename} =~ /[#][#](.*)[#][#]/g){
            $hashtags = $1;
        }
        if (my $matches =()= $hashtags =~ /\bsoft/gi){
            $counts{$date} += $matches;
        }
    }

    is ($counts{'2017-09-04'}, 4, "2017-09-04 tally correct");
    is ($counts{'2017-09-30'}, 2, "2017-09-30 tally correct");

This script produces the following output:
    1..2
    not ok 1 - 2017-09-03 tally correct
    #   Failed test '2017-09-03 tally correct'
    #   at C:\Users\li\test18.pl line 52.
    #          got: undef
    #     expected: '4'
    not ok 2 - 2017-09-04 tally correct
    #   Failed test '2017-09-04 tally correct'
    #   at C:\Users\li\test18.pl line 53.
    #          got: '1'
    #     expected: '2'
    # Looks like you failed 2 tests of 2.

If it makes any difference, I think that it is only the very first instance of "soft" (in the line "ar##*whispers softly* don\'t##") that it actually captures.

Given that it worked fine in your examples, I think it is likely that I'm making a basic mistake or didn't convey something important about my data in my original post.
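One detail about the data that might matter, though it is only a guess: if the real files have line breaks between the tags, then `.` in `/[#][#](.*)[#][#]/` will not cross a newline, and a match tested in scalar context stops after the first hit, which would fit seeing only the first "soft". Below is a minimal sketch that instead collects every `##...##` segment separately. The sample record (including its line breaks) and the helper name `count_soft_per_day` are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample record; the line breaks between tags are an
# assumption about what the real files might look like.
my %corpus = (
    a => "<p><time datetime=2017-09-04T05:23:39Z>04/09/17 06:23:39</time>\n"
       . "Irrelevant text that may feature the word soft.\n"
       . "##*whispers softly* don't##\n"
       . "##very soft##\n"
       . "##the softest even##\n"
       . "164 notes\n",
);

# Hypothetical helper: tally "soft..." words inside ##...## tags, per date.
sub count_soft_per_day {
    my ($corpus) = @_;
    my %counts;
    for my $name (sort keys %$corpus) {
        my $text = $corpus->{$name};

        # Skip any record without a recognisable date.
        my ($date) = $text =~ /datetime=(\d{4}-\d{2}-\d{2})T/ or next;

        # List context plus /g returns every ##...## segment; the
        # non-greedy .*? stops at the nearest closing ##, and /s lets
        # . match newlines as well.
        my @tags = $text =~ /##(.*?)##/gs;

        for my $tag (@tags) {
            my $n = () = $tag =~ /\bsoft/gi;
            $counts{$date} += $n;
        }
    }
    return %counts;
}

my %counts = count_soft_per_day(\%corpus);
print "$_: $counts{$_}\n" for sort keys %counts;
```

With this sample the tally for 2017-09-04 is 3 (softly, soft, softest), and the "soft" in the irrelevant text outside the tags is not counted. Whether this matches the real data depends on how the tags are actually laid out in the files.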


In reply to Re^4: Counting instances of a string in certain sections of files within a hash by Maire
in thread Counting instances of a string in certain sections of files within a hash by Maire
