PerlMonks  

grep and find file weirdness

by reasonablekeith (Deacon)
on Jun 20, 2007 at 15:49 UTC ( #622288=perlquestion )

reasonablekeith has asked for the wisdom of the Perl Monks concerning the following question:

I guess the answer will end up being 'obvious', but I'm banging my head against the wall here...

I have a simple script which should list all the files in a given directory, then read them one by one, checking for DOS format. The list of matching files is then printed out. My problem is that, after the grep, the @files array ends up containing lines from the files I'm reading, when it originally contained the paths to the files.

I'm guessing this is because (inside the grep) $_ is an alias to the array value, so the file read is automagically smashing this, and the change is making it back into my array.

That would be a pretty bad gotcha, and one I'd be surprised I'd not run across until now. If that is indeed the case, could someone please suggest an alternative/ways to avoid this?

Anywho, here's the distilled example...

#!/usr/bin/perl
use strict;
use File::Find;

my $directory = "/tmp/rja/find_test";

my @files = ();
find(sub { $File::Find::name =~ m/${directory}(.*)$/; push @files, $1}, $directory);

@files = grep { is_dos_format("$directory/$_") } @files;

foreach my $file (@files) {
    print "FAILED FILE - $file\n";
}

#==============================================================
sub is_dos_format {
    my $file_path = shift;

    open RJA, $file_path or die("Couldn't open file for reading ($file_path): $!");
    while (<RJA>) {
        if ($_ =~ m/\r\n/) {
            return 1;
        }
    }
    return;
}
NB: This has a byte size on my PC of 666, make of that what you will :S
---
my name's not Keith, and I'm not reasonable.

Replies are listed 'Best First'.
Re: grep and find file weirdness
by japhy (Canon) on Jun 20, 2007 at 15:58 UTC
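    Yes, is_dos_format() is smashing $_ in your grep block: while (<RJA>) assigns each line to the global $_, which grep has aliased to the current element of @files. The easy fix is to either local()ize $_ in the function or use a different variable in your while loop: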
    sub is_dos_format {
        local $_;
        ...
    }
    # or
    sub is_dos_format {
        ...
        while (my $line = <RJA>) { ... }
    }
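A minimal sketch of the clobbering and of both fixes, for anyone who wants to see it in isolation. The sub names, the sample content and the in-memory filehandle are assumptions made up for the demonstration, not part of the original script; newlines are assumed to pass through unchanged, as they do on a *n*x box.

```perl
use strict;
use warnings;

# Hypothetical file content, read through an in-memory filehandle.
my $content = "first line\r\nsecond line\r\n";

sub open_sample {
    open my $fh, '<', \$content or die "Can't open in-memory file: $!";
    return $fh;
}

# Buggy: while (<$fh>) assigns to the global $_, which grep has
# aliased to the current array element.
sub check_clobbering {
    my $fh = open_sample();
    while (<$fh>) { return 1 if /\r\n/ }
    return;
}

# Fix 1: localize $_ for the duration of the call.
sub check_localized {
    local $_;
    my $fh = open_sample();
    while (<$fh>) { return 1 if /\r\n/ }
    return;
}

# Fix 2: read into a lexical variable and leave $_ alone.
sub check_lexical {
    my $fh = open_sample();
    while (my $line = <$fh>) { return 1 if $line =~ /\r\n/ }
    return;
}

my @names = qw(a.txt b.txt);

# Each grep gets its own copy, because the buggy version
# modifies the array it is grepping over.
my @copy_a = @names;
my @clobbered = grep { check_clobbering() } @copy_a;
my @copy_b = @names;
my @localized = grep { check_localized() } @copy_b;
my @copy_c = @names;
my @lexical = grep { check_lexical() } @copy_c;

for my $result ([ 'while (<$fh>)', \@clobbered ],
                [ 'local $_',      \@localized ],
                [ 'my $line',      \@lexical   ]) {
    my ($label, $list) = @$result;
    my $state = $list->[0] eq 'a.txt' ? 'names intact' : 'names clobbered';
    printf "%-14s %s\n", $label, $state;
}
```

Only the first variant should report clobbered names; the other two leave the grepped list alone.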

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: grep and find file weirdness
by blazar (Canon) on Jun 20, 2007 at 19:40 UTC
    Anywho, here's the distilled example...

    In addition to the other good suggestions you got (I hope not to repeat too many of them!) here are a few others:

    my @files = ();

    No need for the initialization and it doesn't add to clarity IMHO.

    find(sub { $File::Find::name =~ m/${directory}(.*)$/; push @files, $1}, $directory);
    @files = grep { is_dos_format("$directory/$_") } @files;

    Rather than collecting filenames in @files and processing them with grep later, why not do the check in the first place, as the files are being found? If there are many files, your program should be more responsive. Also, I don't understand the logic of stripping the base directory only to reinsert it later...

    foreach my $file (@files) { print "FAILED FILE - $file\n"; }

    Oh, and why another loop too?!? (grep is one in disguise: you're doing it twice when there's no logical reason to.)

    if ($_ =~ m/\r\n/) {

    The whole point of the $_ pronoun is that it is the topicalizer: you either use an explicit variable name and the binding operator, or just the match on its own (here, simply /\r\n/).
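To make the two styles concrete, here is a small sketch. The @lines data and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: one Unix line, one DOS line.
my @lines = ("unix line\n", "dos line\r\n");

# Implicit: a bare match applies to $_, no binding operator needed.
my @implicit = grep { /\r\n/ } @lines;

# Explicit: a named variable plus the binding operator =~.
my @explicit;
for my $line (@lines) {
    push @explicit, $line if $line =~ /\r\n/;
}

print scalar(@implicit), " match(es) via \$_, ",
      scalar(@explicit), " via \$line\n";
```

Both forms select the same line; writing $_ =~ /\r\n/ just mixes the two.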

    All in all I believe your code may be rewritten like:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use File::Find;

    my $directory = '/tmp/rja/find_test';

    find { no_chdir => 1,
           wanted   => sub {
               return unless -f;
               open my $fh, $_ or die "Can't open `$_': $!\n";
               my $file=$_;
               /\r\n/ and print "FAILED FILE - $file\n" and return
                   while <$fh>;
           } }, $directory;

    __END__
Re: grep and find file weirdness
by EvanCarroll (Chaplain) on Jun 20, 2007 at 16:19 UTC
    Here is some advice
    • Use File::Spec->catfile for building the directory locations and pathnames
    • Don't use the bareword (constant) filehandle form of open; use the three-argument open ( my $fh, $mode, $path ) form
    • I think reading in the whole file is probably better for whatever you want, on a modern system
    • Don't grep for a file_path if the call requires it to be there
    try something like:
    #!/usr/bin/perl -l
    use strict;
    use File::Find;
    use File::Spec;

    use constant DIRECTORY => '/tmp/rja/find_test';

    my @files;
    find( \&findsub, __PACKAGE__->DIRECTORY );

    @files = grep is_dos_format($_), @files;

    foreach my $file (@files) {
        print "FAILED FILE - $file";
    }

    sub findsub {
        push @files, $File::Find::name;
    }

    sub is_dos_format {
        my $abs_path = shift;

        open ( my $fh, '<', $abs_path ) or die "open $abs_path: $!" ;
        if ( grep m/\r\n/s, <$fh> ) {
            return 1;
        }
        return undef;
    }


    Evan Carroll
    www.EvanCarroll.com
      I think reading in the whole file is probably better for whatever you want, on a modern system

      I don't think so. It may not make a big difference on a modern system, but under certain circumstances it may, even there. After all, he wants to quit searching as soon as /\r\n/ matches: what benefit would he get from reading on instead?
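A rough sketch of the early-exit point, assuming a made-up in-memory "file" whose very first line is CRLF-terminated. The sub name first_crlf and the line counter are only there for illustration.

```perl
use strict;
use warnings;

# Returns (found, lines_read); stops reading at the first CRLF line.
sub first_crlf {
    my ($fh) = @_;
    my $read = 0;
    while (my $line = <$fh>) {
        $read++;
        return (1, $read) if $line =~ /\r\n/;
    }
    return (0, $read);
}

# Hypothetical data: one DOS line followed by 1000 Unix lines.
my $data = "dos line\r\n" . ("unix line\n" x 1000);
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my ($found, $read) = first_crlf($fh);
print "found=$found after reading $read line(s)\n";
```

Reading line by line touches only the first line here, whereas slurping the whole file into a list would pull in all 1001 lines before the match is even attempted.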

Re: grep and find file weirdness
by graff (Chancellor) on Jun 21, 2007 at 04:23 UTC
    Since you seem to be on a *n*x machine, seems like this would be a lot simpler (assuming a bash shell):
    $ find /tmp/rja/find_test -type f -print0 | perl -0 -ne 'chomp; $t=do{local $/; open(T,"<",$_);<T>}; print "$_ is completely CRLF\n" unless ( $t =~ /(?<!\r)\n/ )'

    But then, I've never been a fan of File::Find, ever. It's just too weird and too slow.

    BTW, are you looking for files that are completely CRLF, as I assumed in my snippet, or rather for files that have any CRLF in them? If the latter, the last part of the snippet would be  if ( $t =~ /\r\n/ ) update: and I'd change "is completely" to "has".
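The difference between the two tests can be sketched as below. The sub names and the sample strings are assumptions for illustration; the two regexes are the ones from the snippet above.

```perl
use strict;
use warnings;

# True if the text contains at least one CRLF pair.
sub has_any_crlf { my ($text) = @_; return $text =~ /\r\n/ ? 1 : 0 }

# True if no newline lacks a preceding CR
# (vacuously true for text with no newlines at all).
sub is_all_crlf { my ($text) = @_; return $text =~ /(?<!\r)\n/ ? 0 : 1 }

# Hypothetical sample contents.
my @samples = (
    [ dos   => "a\r\nb\r\n" ],
    [ unix  => "a\nb\n"     ],
    [ mixed => "a\r\nb\n"   ],
);

for my $sample (@samples) {
    my ($label, $text) = @$sample;
    printf "%-5s any=%d all=%d\n",
        $label, has_any_crlf($text), is_all_crlf($text);
}
```

Only the mixed sample separates the two tests: it has a CRLF in it, but is not completely CRLF.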
