Griegomas has asked for the wisdom of the Perl Monks concerning the following question:

Thanks for all the help guys, I really appreciate it. I have gotten the script to work almost entirely except for one problem; it seems to be interpreting the . at the beginning of each extension as any character. This results in an extension like .h finding all files that end with h. I believe the solution is to use glob pattern matching but I am unsure how to incorporate that into my find statements. Here is my updated (and better indented) code:

#!/usr/bin/perl
use warnings;
use File::Find;

sub countThem {
    $dir = '.';
    $filecount = 0;
    $ext1 = $_[2];
    find(sub { $filecount++ if $File::Find::name =~ /$ext1$/ }, $dir);
    $filebytes = 0;
    if ($filecount > 0) {
        #`find . -name "*$_[2]" -print`;
        my @f;
        find(sub {
            return unless /$ext1$/;
            push @f, $File::Find::name;
        }, $dir);
        chomp(@f);
        foreach $a (@f) {
            $fbytes = `cat $a | wc -c`;
            $filebytes = $filebytes + $fbytes;
        }
    }
    $_[0] = $filecount;
    $_[1] = $filebytes;
}

foreach $ext (@ARGV) {
    $tmpfilecount = 0;
    $tmpfilebytes = 0;
    countThem($tmpfilecount, $tmpfilebytes, $ext);
    if ($tmpfilecount > 0) {
        print STDOUT ("EXTENSION $ext, FILE COUNT: $tmpfilecount, FILE CHARS: $tmpfilebytes\n");
    }
}

any ideas?


Replies are listed 'Best First'.
Re: Help needed with Perl script designed to find files by extension and count the number of chars
by toolic (Bishop) on Apr 30, 2015 at 12:35 UTC

    Tip #2 from the Basic debugging checklist: print. The reason you get no output is that $tmpfilecount is 0. $tmpfilecount is 0 because you set it to 0 in your foreach loop, then you never change its value. The only thing you do with the variable is pass it as an input to your countThem sub.

      Hello toolic,

      The only thing you do with the variable is pass it as an input to your countThem sub.

      Yes, but the variable $tmpfilecount is a package global (no my in sight, as you note), and within sub countThem the variable $_[0] is an alias to the variable passed as the first argument. Therefore the line:

      $_[0]=$filecount;

      within the sub actually should change $tmpfilecount to the value of $filecount. Proof of concept:

      0:42 >perl -wE "sub f { ++$_[0] } $x = 42; f($x); say $x;"
      43

      0:43 >

      So the problem is that $filecount is never incremented.
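The aliasing behaviour described above can be contrasted with the more common copy-first idiom. This is only a minimal sketch; the sub names `bump_alias` and `bump_copy` are hypothetical and do not come from the original script.

```perl
use strict;
use warnings;

# Writes through the alias: $_[0] is the caller's own variable.
sub bump_alias { $_[0]++; return }

# Copies the argument first, so the caller's variable is left untouched.
sub bump_copy { my ($n) = @_; $n++; return $n }

my $x = 42;
bump_alias($x);            # modifies $x in place

my $y = 42;
my $z = bump_copy($y);     # $y keeps its value; the result comes back in $z

print "x=$x y=$y z=$z\n";  # prints "x=43 y=42 z=43"
```

Returning values, as `bump_copy` does, is usually easier to follow than writing results back through `@_`, although both approaches work.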

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Help needed with Perl script designed to find files by extension and count the number of chars
by Athanasius (Archbishop) on Apr 30, 2015 at 15:18 UTC

    Hello Griegomas, and welcome to the Monastery!

    I have a hunch it is the "" around ext1 in my find statements...

    Yes, exactly. Suppose the value of $ext1 is txt. Then the regular expression /\"$ext1"$/ will match a filename such as example."txt" but not a filename such as example.txt.

    ...but the first $ was causing problems without it and I wasn't sure how else to resolve the issue.

    There shouldn’t be any problem:

    1:11 >perl -wE "my $ext = 'txt'; my $file = 'example.txt'; say 'matches' if $file =~ /$ext$/;"
    matches

    1:12 >

    Regular expressions interpolate (except when single quotes are used as the delimiters) — see perlop#Quote-and-Quote-like-Operators. Please explain the problem you were seeing (give details).
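To illustrate the single-quote exception mentioned above, here is a small sketch comparing the two delimiter styles. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $ext  = 'txt';
my $file = 'example.txt';

# Ordinary delimiters: $ext is interpolated, so the pattern becomes /txt$/.
my $interpolated = $file =~ /$ext$/ ? 1 : 0;

# Single-quote delimiters: nothing is interpolated. Each "$" is just the
# end-of-line anchor, so the pattern cannot match this filename.
my $not_interpolated = $file =~ m'$ext$' ? 1 : 0;

print "interpolated: $interpolated, single-quoted: $not_interpolated\n";
```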

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks Athanasius, that makes sense to me. This is the error I was seeing:

      Name "main::ext1" used only once: possible typo at ./perl4.pl line 8.

Re: Help needed with Perl script designed to find files by extension and count the number of chars
by jeffa (Bishop) on Apr 30, 2015 at 17:06 UTC

    Others have answered what was wrong; I wanted to show you a better way to solve the problem:

    use strict;
    use warnings;
    use File::Find::Rule;

    my @found = File::Find::Rule
        ->file()
        ->name( map "*$_", @ARGV )
        ->in( '.' )
    ;

    print scalar( @found ), $/;
    I covered the first requirement, all you need is a loop to process the char count from the files found. But ... why do you need this information? Wouldn't du -h provide the information you need without having to develop another solution?
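The loop mentioned above might look roughly like the following sketch. It assumes that "char count" means size in bytes (which is what `wc -c` reports), and `total_bytes` is a hypothetical helper name. To stay self-contained it takes its file list from `@ARGV` rather than from File::Find::Rule.

```perl
use strict;
use warnings;

# Hypothetical helper: total size in bytes of the given paths.
sub total_bytes {
    my @paths = @_;
    my $total = 0;
    for my $path (@paths) {
        my $size = -s $path;               # undef if the file cannot be stat'ed
        $total += $size if defined $size;  # an empty file adds 0
    }
    return $total;
}

# Stand-in for the @found list built by File::Find::Rule in the reply above.
my @found = grep { -f } @ARGV;
printf "%d files, %d bytes\n", scalar @found, total_bytes(@found);
```

Using the `-s` file test avoids starting a `cat | wc -c` pipeline for every file.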

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      The memory efficient version

      #!/usr/bin/perl --
      use strict;
      use warnings;
      use File::Find::Rule qw/ find rule /;

      my $count = 0;
      my $size  = 0;

      rule(
          file =>
          name => [ map {"*$_"} @ARGV ],
          exec => sub {
              ## my( $shortname, $path, $fullname ) = @_;
              $size += -s _;    ## or use $_
              $count++;
              return !!0;       ## means discard filename
          },
      )->in('.');
Re: Help needed with Perl script designed to find files by extension and count the number of chars
by Athanasius (Archbishop) on May 01, 2015 at 02:24 UTC

    Hello again Griegomas,

    I see you have updated your post to incorporate some of the advice given (but still no strict). In itself, that’s good. However, your re-write of the OP has the unfortunate effect of removing context from the following replies, which makes it hard for monks to follow the thread. For small changes, you can use <strike>...</strike> tags, but for large-scale changes such as this, it would be better to put the update (clearly marked as such) at the foot of the original code. Better still would be to start a new node, since your update includes a new question.

    This results in an extension like .h finding all files that end with h.

    That’s because the . has a special meaning within a regular expression. To remove that special meaning — in other words, to get whatever text is in $ARGV[0] to be treated literally — you need to enclose the text within the escape sequence \Q ... \E:

    12:04 >perl -we "my @files = qw(yes.h noh); for (@files) { printf qq[%s: %s\n], $_, /$ARGV[0]$/ ? 'match' : 'no match'; }" .h
    yes.h: match
    noh: match

    12:07 >perl -we "my @files = qw(yes.h noh); for (@files) { printf qq[%s: %s\n], $_, /\Q$ARGV[0]\E$/ ? 'match' : 'no match'; }" .h
    yes.h: match
    noh: no match

    12:07 >

    See More on characters, strings, and character classes in perlretut.
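Putting the `\Q ... \E` fix together with the core File::Find module, the whole task could be sketched as follows. This is one possible arrangement under stated assumptions, not the original poster's final script: `count_ext` is a hypothetical name, only plain files are counted, and sizes are taken in bytes with `-s` instead of `cat | wc -c`.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns (file count, total bytes) for plain files
# under $dir whose names end with the literal text in $ext.
sub count_ext {
    my ($dir, $ext) = @_;
    my ($count, $bytes) = (0, 0);
    find(
        sub {
            return unless -f $_;          # plain files only
            return unless /\Q$ext\E$/;    # \Q..\E makes "." in $ext literal
            $count++;
            $bytes += -s _;               # reuse the stat done by -f
        },
        $dir
    );
    return ($count, $bytes);
}

for my $ext (@ARGV) {
    my ($count, $bytes) = count_ext('.', $ext);
    print "EXTENSION $ext, FILE COUNT: $count, FILE CHARS: $bytes\n"
        if $count > 0;
}
```

Run as, for example, `perl script.pl .h .c`. Because the sub returns its results, the caller no longer depends on `@_` aliasing, and a single directory walk per extension collects both the count and the size.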

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Help needed with Perl script designed to find files by extension and count the number of chars
by Laurent_R (Canon) on Apr 30, 2015 at 17:44 UTC
    Hi Griegomas, this does not relate directly to your question, but I would really suggest that you indent your code much more consistently. The way the code is laid out right now is almost unreadable, or at least very difficult to follow, both for us poor monks and for yourself. I can guarantee that you will save a lot of debugging time by taking the little time required to format your code properly.

    If needed you could use a code prettifier, such as Perltidy, a Perl script which indents and reformats Perl scripts to make them easier to read.

    Je suis Charlie.