in reply to Re: using File::Find to grep for text
in thread using File::Find to grep for text

There is no need for File::Find

That part is becoming clear. Now that I've seen how you treated this, I used your treatment to get closer to what I want to do here. The part I'm changing out begins with this comment:

    ## see what we got
    my $phrase = "round is 4";
    my @sought = qw ( 2.miST 3.texTech 1.va 5.auburn);
    print "OUTFILES*******************\n";
    foreach my $filename ( glob("./my_data/$first_second/*") ) {
      open my $fh, '<', $filename or die "can't open $filename! $!";
      while ( my $line = <$fh> ) {
        # plain match; /g is not needed to test a single line
        if ( $line =~ m/$phrase/ ) {
          say "filename is $filename";
        }
      }
      close $fh;
    }

So, I'd like to assemble stats on "who made the final four?" As the official ncaa link shows, only 4 teams are left, corresponding to round 4 of the tourney. They are enumerated in the @sought array that I assign in the new script.

Again, I find myself looking for some elbow grease in dealing with these data:

    round is 4
    3.lsu
    3.texTech
    2.tn
    1.nc
    finals are
    3.lsu
    1.nc
    tournament winner is
    1.nc
    ----------------
    OUTFILES*******************
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.1.txt
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.13.txt
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.17.txt
    ...
    $

So, I would like this to match on 3.texTech and report that it matched on one correct team for the final four. The order will always be the same in that 2.miST will be only on the line that follows "round is 4", 3.texTech on the next, 1.va the next, and 5.auburn on the line before the line that starts with "tournament". If I have 15 trials, which one got the most correct in the final four?
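
To make the goal concrete, here is a minimal sketch of the per-file check described above. It assumes each team sits on its own line right after "round is 4" and that the slots are compared in order. The sub name count_final_four_hits and the in-memory sample are made up for illustration; the sample reuses the data shown above in place of a real trial file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @sought = qw( 2.miST 3.texTech 1.va 5.auburn );

# Hypothetical helper: count how many of the lines following
# "round is 4" equal the expected team for that slot.
# Returns undef when the handle contains no "round is 4" line.
sub count_final_four_hits {
    my ( $fh, @expected ) = @_;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^round is 4/;
        my $hits = 0;
        for my $want (@expected) {
            my $team = <$fh>;
            last unless defined $team;    # file ended early
            chomp $team;
            $hits++ if $team eq $want;
        }
        return $hits;
    }
    return;
}

# Stand-in for one trial file, built from the sample output above.
my $sample = join "\n",
    'round is 4', '3.lsu', '3.texTech', '2.tn', '1.nc',
    'finals are', '3.lsu', '1.nc',
    'tournament winner is', '1.nc', '';

open my $fh, '<', \$sample or die "can't open in-memory sample: $!";
my $hits = count_final_four_hits( $fh, @sought );
close $fh;

print "correct final four picks: $hits\n";
```

For this sample only 3.texTech lands in its slot, so the count is 1. To compare 15 trials, the same sub could be called once per file inside the glob loop, storing each count in a hash keyed by filename and keeping the largest.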

Why this UTF-8 stuff?

No good reason other than that I'm used to it. (I don't know what to cut out without having the wheels fall off.) Thanks for your comments,

Re^3: using File::Find to grep for text
by Marshall (Canon) on Apr 04, 2019 at 08:37 UTC
    Still not quite sure about this, but consider this for further improvement:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dump qw(pp);
    $|=1;   #turn off stdout buffering

    foreach my $directory (glob "my_data/*")
    {
        print "Directory: $directory\n";
        my %best4;
        foreach my $file (glob "$directory/*.txt")
        {
            open(my $fh, "<", "$file") or die "Can't open < $file: $!";
            while (<$fh>)
            {
                statsRound4($fh,\%best4) if /^round is 4/;
            }
            close $fh;
        }
        dumpBest4(\%best4);
        #pp \%best4;   #uncomment this to see what it does - good tool
    }

    sub statsRound4   #add next 4 team lines to stat table
                      #all of these guys made it to Round 4
    {
        my ($fh, $hash_ref) = @_;
        for (1..4)
        {
            my $team_round4 = <$fh>;
            chomp $team_round4;
            $hash_ref->{$team_round4}++;
        }
    }

    sub dumpBest4   #print highest keys/values sorted by descending value
    {
        my $hash_ref = shift;
        my @top_teams = sort{ my $myA = $hash_ref->{$a};
                              my $myB = $hash_ref->{$b};
                              $myB <=> $myA
                            } keys %$hash_ref;
        foreach my $team (@top_teams[0..3])
        {
            print "$team\t $hash_ref->{$team}\n";
        }
    }
    __END__
    Directory: my_data/03-04-2019-22-09-24
    1.va         12
    1.duke       12
    1.gonzaga    8
    1.nc         4
    Directory: my_data/03-04-2019-22-14-09
    1.duke       9
    1.gonzaga    9
    1.nc         8
    1.va         7
    Directory: my_data/03-04-2019-22-14-55
    1.gonzaga    10
    1.va         7
    1.duke       7
    1.nc         7
    Directory: my_data/03-04-2019-22-16-53
    1.va         8
    1.nc         8
    1.gonzaga    8
    2.ky         5
    Directory: my_data/03-04-2019-22-20-00
    1.gonzaga    9
    1.va         9
    1.duke       8
    1.nc         7
    Directory: my_data/03-04-2019-22-21-01
    1.nc         7
    1.gonzaga    7
    3.lsu        6
    1.duke       6
    Directory: my_data/03-04-2019-22-24-54
    1.duke       9
    1.va         7
    1.nc         7
    1.gonzaga    7
    Directory: my_data/03-04-2019-22-26-59
    1.va         11
    1.duke       9
    1.nc         8
    3.texTech    5
    Directory: my_data/03-04-2019-22-28-51
    1.duke       12
    2.ky         6
    1.va         6
    1.gonzaga    5
      Still not quite sure about this, but consider this for further improvement:

      Thanks, Marshall, this puts me on the right track. When I uncomment the pp line so as to see %best4, I see what I'd want to look at to compare data. I think my number-one seeds have too much weight on them now. It wouldn't be hard to make the probabilities closer to a coin flip.

      In order to follow up on this, I'd have to do some comparing with other years. So far, I've only come upon one other bracket: wayback machine for 2006 bracket

      I might have more time for this, but I'm still hooping it up, trying to reclaim my former glory.

      Directory: my_data/06-04-2019-14-41-34
      1.gonzaga    10
      1.va         8
      1.nc         7
      2.ky         6
      {
        "1.duke"      => 6,
        "1.gonzaga"   => 10,
        "1.nc"        => 7,
        "1.va"        => 8,
        "2.ky"        => 6,
        "2.mi"        => 2,
        "2.miST"      => 4,
        "2.tn"        => 1,
        "3.houston"   => 1,
        "3.lsu"       => 3,
        "3.purdue"    => 2,
        "3.texTech"   => 1,
        "4.flaST"     => 1,
        "4.ksST"      => 1,
        "4.vaTech"    => 1,
        "5.marquette" => 1,
        "5.msST"      => 1,
        "6.iowaST"    => 1,
        "6.nova"      => 1,
        "7.cincy"     => 1,
        "9.ok"        => 1,
      }
      $
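
One thing the dump makes visible: 1.duke and 2.ky are tied at 6, and only 2.ky shows up in the top four, because a sort on the count alone leaves the order of tied keys up to the hash. A possible tweak, sketched here under my own assumptions, breaks ties by team name and copes with fewer than four keys. The sub name top_teams is hypothetical and is not part of the script above; the hash is a subset of the dump.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n team names, highest count first,
# with ties broken alphabetically so the result is repeatable.
sub top_teams {
    my ( $counts, $n ) = @_;
    my @sorted = sort {
        $counts->{$b} <=> $counts->{$a}    # descending by count
            or $a cmp $b                   # then ascending by name
    } keys %$counts;
    $n = @sorted if $n > @sorted;          # avoid undef slots
    return @sorted[ 0 .. $n - 1 ];
}

# Subset of the %best4 dump shown above.
my %best4 = (
    '1.duke'    => 6,
    '1.gonzaga' => 10,
    '1.nc'      => 7,
    '1.va'      => 8,
    '2.ky'      => 6,
    '2.miST'    => 4,
    '3.lsu'     => 3,
);

for my $team ( top_teams( \%best4, 4 ) ) {
    print "$team\t$best4{$team}\n";
}
```

With this ordering the fourth slot goes to 1.duke on every run, since "1.duke" sorts before "2.ky". Whether alphabetical order is the right tie-break is a judgment call; printing every team tied at the cutoff would be another option.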

      Auburn versus Virginia just began, so we have it queueing, and I will live in a world bereft of sports news until I get to see it. DVR and perl are great things that happened in the digital world....