Thank you, I really appreciate your help with this!
I made a mistake in the code that I showed in UPDATE 2 above. The second part of the script was (I think) attempting to print out the regex matches before all of the data had been "dumped" out of the hash and into alldata.txt. When I ran the second part of the code separately from the first, it successfully matched all of the data it was supposed to, which suggests (I think) that the regex is not the problem here, either. Sorry for wasting your time with that: I should have double-checked that my code was right before posting!
I am, however, still having trouble getting the main code to count the instances of "soft" per day. I'm using the corrections that you very kindly made to my original script. The only things that I've changed are that I've replaced your examples with my own data and that I've taken the lookarounds out of the regex, following your and kcott's advice:
    use strict;
    use warnings;
    use Test::More tests => 2;

    my %mycorpus = (
        a => "<p><time datetime=2017-09-04T05:23:39Z>04/09/17 06:23:39</time> Irrelevant text that may feature the word soft, softest, or softly. ar##*whispers softly* don\'t## ##very soft## ##the softest even## — 164 notes",
        b => "p><time datetime=2017-09-30T18:20:56Z>30/09/17 19:20:56</time> Irrelevant text that may feature the word soft, softest, or softly. 4r##skam## rr##isak valtersen## rr##even bech næsheim## dr##god## r##they're so soft## sr##my heart is bursting## ##This is the softest## — 379 notes Irrelevant text that may feature the word soft, softest, or softly.",
        c => "<p><time datetime=2017-09-04T05:27:03Z>04/09/17 06:27:03</time> ##SKSNSKXBXKXND## r##I LOVE THESE## ##such soft boyfriend™## ##you're my sunshine## — 180 notes Irrelevant text that may feature the word soft, softest, or softly."
    );

    my %counts;
    foreach my $filename (sort keys %mycorpus) {
        my $date;
        my $hashtags = '';
        if ($mycorpus{$filename} =~ /(?<==)(\d{4}-\d{2}-\d{2})(?=T)/g) {
            $date = $1;
        }
        if ($mycorpus{$filename} =~ /[#][#](.*)[#][#]/g) {
            $hashtags = $1;
        }
        if (my $matches =()= $hashtags =~ /\bsoft/gi) {
            $counts{$date} += $matches;
        }
    }

    is ($counts{'2017-09-04'}, 4, "2017-09-04 tally correct");
    is ($counts{'2017-09-30'}, 2, "2017-09-30 tally correct");

This script produces the following output:
    1..2
    not ok 1 - 2017-09-03 tally correct
    #   Failed test '2017-09-03 tally correct'
    #   at C:\Users\li\test18.pl line 52.
    #          got: undef
    #     expected: '4'
    not ok 2 - 2017-09-04 tally correct
    #   Failed test '2017-09-04 tally correct'
    #   at C:\Users\li\test18.pl line 53.
    #          got: '1'
    #     expected: '2'
    # Looks like you failed 2 tests of 2.
If it makes any difference, I think that it is only the very first instance of "soft" (in the line "ar##*whispers softly* don\'t##") that the script actually captures.
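To check that guess, I put together a minimal sketch. It is not my real script: the sample string and the variable names are made up, and I'm assuming that the hashtag sections in my real files sit on separate lines, which may not be true of every file. It compares a single `if (... =~ /##(.*)##/g)` test, as in my script, with the same kind of match used in list context:

```perl
use strict;
use warnings;

# Hypothetical sample: three hashtag sections on separate lines
# (an assumption about how my real data is laid out).
my $text = "##*whispers softly* don't##\n##very soft##\n##the softest even##";

# Scalar context, as in my script: the match is attempted only once,
# and "." does not cross the newline, so only the first section is captured.
my $first = '';
if ($text =~ /##(.*)##/g) {
    $first = $1;
}
my $scalar_count = () = $first =~ /\bsoft/gi;

# The scalar /g match above leaves pos($text) set, so reset it
# before matching the same string again from the start.
pos($text) = undef;

# List context: every ##...## section is returned (non-greedy capture).
my @sections = $text =~ /##(.*?)##/g;
my $list_count = () = "@sections" =~ /\bsoft/gi;

print "scalar context: $scalar_count\n";   # prints "scalar context: 1"
print "list context:   $list_count\n";     # prints "list context:   3"
```

If my real data behaves like this sample, that would explain why only the first "soft" is counted, but I'm not certain that this is the whole story.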
Given that it worked fine in your examples, I think it is likely that I'm making a basic mistake or didn't convey something important about my data in my original post.
In reply to Re^4: Counting instances of a string in certain sections of files within a hash
by Maire
in thread Counting instances of a string in certain sections of files within a hash
by Maire