in reply to using File::Find to grep for text

I don't understand how you can claim that this works:

## see what we got
use File::Find;

# Get $dirname from first command-line argument
my $dirname = $parent;
find( \&do_process, $dirname );
my ( $a, $b );

sub do_process {
    if ( -r $_ ) {
        my $file_name = $_;
        open( my $fh, '<', $file_name );    # Use three-arg open!
        while (<$fh>) {
            chomp();
            if (/\btournament winner is 1.gonzaga\b/i) {
                $a = "$file_name: $_";
            }
            if (/\btournament winner is 2.miSTb/i) {
                $b = "$file_name: $_";
            }
        }
    }
}

Please see: perldoc File::Find

File::Find traverses the tree downward from each directory in a list, calling do_process() for every name it encounters. Note that links and directories are just special kinds of files as far as the file system is concerned, so do_process() is called for them too. The -r test only checks readability, which directories normally pass as well, so it filters out very little here. Perhaps you meant -f? For example:

find( \&do_process, ($dirname) );

sub do_process {
    return unless -f $_;                  # $_ is the name within the current directory
    my $file_name = $File::Find::name;    # full path; nothing is passed in @_
    ....
}

Setting $a or $b repeatedly makes no sense: each new match overwrites the previous one, so only the last match survives. You need a more complex data structure, such as an array or a hash of arrays, to save all of the values. BTW: I would not use $a or $b as ordinary scalars in a user script. These names have special meanings to Perl, in sort() and other places.

I would not do anything complicated at all in a do_process() subroutine. Minimize the chance of a "blow up". By default File::Find does a chdir into each directory as it traverses the tree. If you "blow up" at some random point, you are left in a different directory from the one you started in, and that can cause various problems.
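To make the advice above concrete, here is a minimal sketch, not the original poster's program. It assumes each result file carries a single line of the form "tournament winner is TEAM" (inferred from the regexes in the quoted code), and the demo tree, the write_file() helper and the %winner_files hash are all hypothetical names made up for illustration. The File::Find callback only collects plain-file names; the file reading happens afterwards, outside the traversal.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical demo tree so the sketch runs anywhere.
my $dirname = tempdir( CLEANUP => 1 );
mkdir "$dirname/trial1" or die "Can't mkdir $dirname/trial1: $!";
write_file( "$dirname/trial1/a.txt", "round is 4\ntournament winner is 1.gonzaga\n" );
write_file( "$dirname/trial1/b.txt", "tournament winner is 2.miST\n" );

my @files;    # full paths of plain files only

# Keep the callback tiny: record names now, do the real work later.
# $_ is the name inside the current directory, $File::Find::name the full path.
find( sub { push @files, $File::Find::name if -f $_ }, $dirname );

my %winner_files;    # winner => [ files that named that winner ]
foreach my $file ( sort @files ) {
    open( my $fh, '<', $file ) or die "Can't open $file for read: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^tournament winner is (\S+)/ ) {
            push @{ $winner_files{$1} }, $file;
        }
    }
    close $fh;
}

foreach my $winner ( sort keys %winner_files ) {
    my $count = @{ $winner_files{$winner} };
    print "$winner: $count file(s)\n";
}

# Hypothetical helper used only to build the demo tree.
sub write_file {
    my ( $path, $text ) = @_;
    open( my $out, '>', $path ) or die "Can't write $path: $!";
    print {$out} $text;
    close $out;
}
```

Because every match is pushed onto an array inside the hash, nothing is overwritten, and neither $a nor $b is touched.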

Read the FAQ and then try again showing a simplified example.

This is good:
open( my $fh, '<', $file_name );    # Use three-arg open!
This is better:
open( my $fh, '<', $file_name ) or die "Can't open $file_name for read $!";

Update:
I would make this change:
Don't use an O/S specific command where there is no need for it. I'm running Windows and your program bombed because of this. Also, system() will launch another process (an "expensive" cpu thing) where there is no need for that either (slows things down).

say "-------system out---------";
#system("cat $out_file");
open my $fh, '<', $out_file or die "can't print $out_file! $!";
print while (<$fh>);
close $fh;
say "----------------";

I did get your script to run. It creates files underneath a "my_data" directory.
There is no need for File::Find here. You have a single directory of directories, so two nested glob loops are enough. For example:

print "OUTFILES*******************\n";
foreach my $directory (glob "my_data/*")
{
    print "Directory: $directory\n";
    foreach my $file (glob "$directory/*")
    {
        print " File=$file\n";
    }
}
=prints
OUTFILES*******************
Directory: my_data/03-04-2019-22-09-24
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.1.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.13.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.17.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.21.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.25.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.29.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.33.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.37.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.41.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.45.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.49.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.5.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.53.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.57.txt
 File=my_data/03-04-2019-22-09-24/03-04-2019-22-09-24.9.txt
Directory: my_data/03-04-2019-22-14-09
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.1.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.13.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.17.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.21.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.25.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.29.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.33.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.37.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.41.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.45.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.49.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.5.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.53.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.57.txt
 File=my_data/03-04-2019-22-14-09/03-04-2019-22-14-09.9.txt
Directory: my_data/03-04-2019-22-14-55
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.1.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.13.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.17.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.21.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.25.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.29.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.33.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.37.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.41.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.45.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.49.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.5.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.53.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.57.txt
 File=my_data/03-04-2019-22-14-55/03-04-2019-22-14-55.9.txt
Directory: my_data/03-04-2019-22-16-53
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.1.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.13.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.17.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.21.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.25.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.29.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.33.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.37.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.41.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.45.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.49.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.5.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.53.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.57.txt
 File=my_data/03-04-2019-22-16-53/03-04-2019-22-16-53.9.txt
Directory: my_data/03-04-2019-22-20-00
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.1.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.13.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.17.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.21.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.25.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.29.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.33.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.37.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.41.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.45.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.49.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.5.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.53.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.57.txt
 File=my_data/03-04-2019-22-20-00/03-04-2019-22-20-00.9.txt
Directory: my_data/03-04-2019-22-21-01
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.1.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.13.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.17.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.21.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.25.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.29.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.33.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.37.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.41.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.45.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.49.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.5.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.53.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.57.txt
 File=my_data/03-04-2019-22-21-01/03-04-2019-22-21-01.9.txt
Directory: my_data/03-04-2019-22-24-54
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.1.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.13.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.17.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.21.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.25.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.29.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.33.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.37.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.41.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.45.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.49.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.5.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.53.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.57.txt
 File=my_data/03-04-2019-22-24-54/03-04-2019-22-24-54.9.txt
Directory: my_data/03-04-2019-22-26-59
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.1.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.13.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.17.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.21.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.25.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.29.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.33.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.37.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.41.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.45.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.49.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.5.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.53.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.57.txt
 File=my_data/03-04-2019-22-26-59/03-04-2019-22-26-59.9.txt
Directory: my_data/03-04-2019-22-28-51
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.1.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.13.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.17.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.21.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.25.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.29.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.33.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.37.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.41.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.45.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.49.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.5.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.53.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.57.txt
 File=my_data/03-04-2019-22-28-51/03-04-2019-22-28-51.9.txt
=cut
PS: Why this UTF-8 stuff? I don't see the need for that complication.

Re^2: using File::Find to grep for text
by Aldebaran (Curate) on Apr 04, 2019 at 06:48 UTC
    There is no need for File::Find

    That part is becoming clear. Now that I've seen how you treated this, I used your treatment to get closer to what I want to do here. The part I'm changing out begins with this comment:

    ## see what we got
    my $phrase = "round is 4";
    my @sought = qw ( 2.miST 3.texTech 1.va 5.auburn);
    print "OUTFILES*******************\n";
    foreach my $filename ( glob("./my_data/$first_second/*") ) {
        open my $fh, '<', $filename or die "can't print $filename! $!";
        while ( my $line = <$fh> ) {
            if ( $line =~ m/$phrase/g ) {
                say "filename is $filename";
            }
        }
        close $fh;
    }

    So, I'd like to assemble stats on "who made the final four?" As we see here at official ncaa link, there are only 4 teams left, corresponding to round 4 of the tourney. Indeed, they are enumerated in the @sought variable that I assign in the new script.

    Again, I find myself looking for some elbow grease in dealing with these data:

    round is 4
    3.lsu
    3.texTech
    2.tn
    1.nc
    finals are 3.lsu 1.nc
    tournament winner is 1.nc
    ----------------
    OUTFILES*******************
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.1.txt
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.13.txt
    filename is ./my_data/03-04-2019-23-07-21/03-04-2019-23-07-21.17.txt
    ...
    $

    So, I would like this to match on 3.texTech and report that it matched on one correct team for the final four. The order will always be the same in that 2.miST will be only on the line that follows "round is 4", 3.texTech on the next, 1.va the next, and 5.auburn on the line before the line that starts with "tournament". If I have 15 trials, which one got the most correct in the final four?
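One possible way to score a single trial is sketched below. It assumes the four team lines sit directly under the "round is 4" line, one team per line with no trailing whitespace, and it compares them against the teams from @sought without regard to order. The function name score_final_four() and the in-memory sample are made up for illustration.

```perl
use strict;
use warnings;

# The actual final four, taken from @sought in the script above.
my %sought = map { $_ => 1 } qw( 2.miST 3.texTech 1.va 5.auburn );

# Count how many of the four lines after "round is 4" name a sought team.
# Returns undef if the input has no "round is 4" line.
sub score_final_four {
    my ($fh) = @_;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^round is 4/;
        my $hits = 0;
        for ( 1 .. 4 ) {
            my $team = <$fh>;
            last unless defined $team;    # file ended early
            chomp $team;
            $hits++ if $sought{$team};
        }
        return $hits;
    }
    return;
}

# Demo on the round-4 lines of the sample output, via an in-memory filehandle.
my $sample = "round is 4\n3.lsu\n3.texTech\n2.tn\n1.nc\n";
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!";
my $score = score_final_four($fh);
close $fh;
print "correct final-four picks: $score\n";    # prints 1 (only 3.texTech matches)
```

To answer "which of the 15 trials got the most correct?", the same function could be called once per file inside the glob loop while remembering the file name with the highest score.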

    Why this UTF-8 stuff?

    No good reason other than that I'm used to it. (I don't know what to cut out without having the wheels fall off.) Thanks for your comments,

      Still not quite sure about this, but consider this for further improvement:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dump qw(pp);
      $|=1; #turn off stdout buffering

      foreach my $directory (glob "my_data/*")
      {
          print "Directory: $directory\n";
          my %best4;
          foreach my $file (glob "$directory/*.txt")
          {
              open(my $fh, "<", "$file") or die "Can't open < $file: $!";
              while (<$fh>)
              {
                  statsRound4($fh,\%best4) if /^round is 4/;
              }
              close $fh;
          }
          dumpBest4(\%best4);
          #pp \%best4;  #uncomment this to see what it does - good tool
      }

      sub statsRound4   #add next 4 team lines to stat table
                        #all of these guys made it to Round 4
      {
          my ($fh, $hash_ref) = @_;
          for (1..4)
          {
              my $team_round4 = <$fh>;
              chomp $team_round4;
              $hash_ref->{$team_round4}++;
          }
      }

      sub dumpBest4  #print highest keys/values sorted by descending value
      {
          my $hash_ref = shift;
          my @top_teams = sort{ my $myA = $hash_ref->{$a};
                                my $myB = $hash_ref->{$b};
                                $myB <=> $myA
                              } keys %$hash_ref;
          foreach my $team (@top_teams[0..3])
          {
              print "$team\t $hash_ref->{$team}\n";
          }
      }
      __END__
      Directory: my_data/03-04-2019-22-09-24
      1.va         12
      1.duke       12
      1.gonzaga    8
      1.nc         4
      Directory: my_data/03-04-2019-22-14-09
      1.duke       9
      1.gonzaga    9
      1.nc         8
      1.va         7
      Directory: my_data/03-04-2019-22-14-55
      1.gonzaga    10
      1.va         7
      1.duke       7
      1.nc         7
      Directory: my_data/03-04-2019-22-16-53
      1.va         8
      1.nc         8
      1.gonzaga    8
      2.ky         5
      Directory: my_data/03-04-2019-22-20-00
      1.gonzaga    9
      1.va         9
      1.duke       8
      1.nc         7
      Directory: my_data/03-04-2019-22-21-01
      1.nc         7
      1.gonzaga    7
      3.lsu        6
      1.duke       6
      Directory: my_data/03-04-2019-22-24-54
      1.duke       9
      1.va         7
      1.nc         7
      1.gonzaga    7
      Directory: my_data/03-04-2019-22-26-59
      1.va         11
      1.duke       9
      1.nc         8
      3.texTech    5
      Directory: my_data/03-04-2019-22-28-51
      1.duke       12
      2.ky         6
      1.va         6
      1.gonzaga    5
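One small caveat about dumpBest4(): the slice @top_teams[0..3] yields undefined entries, and warnings, whenever a directory produces fewer than four distinct teams. A hedged variant is sketched below. The name top_n() and the alphabetical tie-break are additions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Return up to $n keys of a count hash, highest count first.
# Ties are broken alphabetically so the order is repeatable.
sub top_n {
    my ( $hash_ref, $n ) = @_;
    my @sorted = sort { $hash_ref->{$b} <=> $hash_ref->{$a} or $a cmp $b }
      keys %$hash_ref;
    $n = @sorted if $n > @sorted;    # fewer than $n teams: return them all
    return @sorted[ 0 .. $n - 1 ];
}

# Only three teams here, so asking for four must not produce undef entries.
my %best4 = ( '1.va' => 12, '1.duke' => 12, '1.gonzaga' => 8 );
print "$_\t$best4{$_}\n" for top_n( \%best4, 4 );
```

Using $a and $b inside the sort block is fine, since that is exactly what they are reserved for.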
        Still not quite sure about this, but consider this for further improvement:

        Thanks, Marshall, this puts me on the right track. When I uncomment the pp line so as to see %best4, I see what I'd want to look at to compare data. I think my number ones have too much weight on them now. It wouldn't be hard to make the probabilities closer to a coin flip.

        In order to follow up on this, I'd have to do some comparing with other years. So far, I've only come upon one other bracket: wayback machine for 2006 bracket

        I might have more time for this, but I'm still hooping it up, trying to reclaim my former glory.

        Directory: my_data/06-04-2019-14-41-34
        1.gonzaga    10
        1.va         8
        1.nc         7
        2.ky         6
        {
          "1.duke"      => 6,
          "1.gonzaga"   => 10,
          "1.nc"        => 7,
          "1.va"        => 8,
          "2.ky"        => 6,
          "2.mi"        => 2,
          "2.miST"      => 4,
          "2.tn"        => 1,
          "3.houston"   => 1,
          "3.lsu"       => 3,
          "3.purdue"    => 2,
          "3.texTech"   => 1,
          "4.flaST"     => 1,
          "4.ksST"      => 1,
          "4.vaTech"    => 1,
          "5.marquette" => 1,
          "5.msST"      => 1,
          "6.iowaST"    => 1,
          "6.nova"      => 1,
          "7.cincy"     => 1,
          "9.ok"        => 1,
        }
        $

        Auburn versus Virginia just began, so we have it queueing, and I will live in a world bereft of sports news until I get to see it. DVR and perl are great things that happened in the digital world....