http://qs1969.pair.com?node_id=597051

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks, I am getting a list of files in an array and I want to search for a pattern inside these files without opening them. I am grepping the list, but it does not search inside the files.
my @files = grep (/^source*/, @arr1);
"source" is the pattern I need to find inside the files, and if a file matches I need to put it in another array for post-processing. Thanks

Replies are listed 'Best First'.
Re: search for a pattern in file without opening the file
by davorg (Chancellor) on Jan 29, 2007 at 10:00 UTC

    It's impossible to access the contents of a file without opening it. Sometimes you don't need to _explicitly_ open the files as a library or the operating system will do that for you, but if you're accessing the contents of a file in any way, then that file _is_ being opened by someone.

    Looking at your code, it seems you've been confused by the fact that Perl's grep function doesn't work the same way as the Unix grep command. The Unix command works on a list of filenames (Unix opens the files and searches them for you) but the Perl function works on a list of strings. The Perl function is actually far more powerful and flexible than the Unix command. In fact, using it to emulate the Unix command is probably the one thing that it's a bit clunky at :-)
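
    To make the difference concrete, here is a minimal sketch. The file names are made up for illustration; the point is only that Perl's grep tests the strings in the list and never touches the files they name.

```perl
use strict;
use warnings;

# Hypothetical file names, for illustration only.
my @names = ('source.c', 'source_old.c', 'README', 'resource.txt');

# grep tests each string in the list (here, the names themselves);
# no file is opened or read.
my @matching_names = grep { /^source/ } @names;

print "$_\n" for @matching_names;
```

    This selects the names that begin with "source", whatever the files contain.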

    So your best solution is to do it the obvious way. Open each file and read it to see if it contains the text you're looking for. You could probably do something clever with @ARGV and <>, but I don't think you'd gain much.
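
    For the curious, the @ARGV and <> idea might look roughly like the sketch below. The sub name files_matching is hypothetical, and the files are still opened, only by Perl's <> operator rather than by an explicit open.

```perl
use strict;
use warnings;

# Return the names whose contents match $re, letting <> open the files.
# files_matching is a hypothetical helper name, not part of any module.
sub files_matching {
    my ($re, @names) = @_;
    return unless @names;       # with an empty @ARGV, <> would read STDIN

    my @found;
    local @ARGV = @names;       # <> reads the files named in @ARGV, in order
    while (my $line = <>) {
        if ($line =~ $re) {
            push @found, $ARGV; # $ARGV holds the name of the current file
            close ARGV;         # skip the rest of this file; <> moves to the next
        }
    }
    return @found;
}
```

    It would be called as my @files = files_matching(qr/^source/, @arr1); and a file that cannot be opened produces a warning from <> rather than a fatal error.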

    Update: Actually, I probably wouldn't use grep in this case. The trouble with grep is that it always checks the whole list. And with a large list where your search term appears near the beginning, that can be inefficient. I'd do something like this:

    sub contains {
        my ($file, $search) = @_;
        open my $fh, '<', $file or die $!;
        while (<$fh>) {
            return 1 if /$search/;
        }
        return;
    }

    my @files = grep { contains($_, $search_str) } @arr1;
Re: search for a pattern in file without opening the file
by virtualsue (Vicar) on Jan 29, 2007 at 10:07 UTC
    I'm not saying this is the answer to your question, but perhaps you should have a look at File::Grep.

    File::Grep mimics the functionality of the grep function in perl, but applying it to files instead of a list. This is similar in nature to the UNIX grep command, but more powerful as the pattern can be any legal perl function.

    For those who were quick with the tedious wisecracks -- perhaps the OP meant that he didn't want to have to do the file open in his own code.

Re: search for a pattern in file without opening the file
by Corion (Patriarch) on Jan 29, 2007 at 10:01 UTC

    Well, you need to search through the files' contents instead of searching through the files' names. For example by using the external grep utility or by actually opening the file, reading its contents and looking for the stuff you want. There is no way to find out about the contents of a file without opening it and reading it.

    use strict;

    sub contains_pattern {
        my ($file, $pattern) = @_;
        open my $fh, "<", $file
            or die "Couldn't read '$file': $!";
        grep { /$pattern/ } <$fh>;
    };

    my @files = grep { contains_pattern $_, qr/^source*/ } @arr1;

    I also notice that you use /^source*/ as your search pattern. This most likely doesn't do what you want unless you want to match all strings starting with sourc (no e).
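
    A small sketch of the difference, using sample strings picked purely for illustration: the * applies only to the final e, so /^source*/ means "sourc" followed by zero or more "e".

```perl
use strict;
use warnings;

# Sample strings, chosen for illustration.
my @samples = ('sourc', 'source', 'sourceee', 'sourcXYZ', 'my source');

for my $s (@samples) {
    my $star  = $s =~ /^source*/ ? 'match' : 'no match';  # "sourc" + zero or more "e"
    my $plain = $s =~ /^source/  ? 'match' : 'no match';  # literal "source" at the start
    printf "%-10s  /^source*/: %-8s  /^source/: %s\n", $s, $star, $plain;
}
```

    If the intent was "lines starting with source", /^source/ is enough; a regex needs no trailing wildcard to allow more text after the match.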

    Update: Added the part about opening the file

Re: search for a pattern in file without opening the file
by tinita (Parson) on Jan 29, 2007 at 09:57 UTC
    i want to search for a pattern inside these files without opening these files.
    well... you have to use the module Crystal::Ball, which is very good at guessing what's in a file.

    seriously, how do you want to know the contents of a file without opening it?

      Shame on you tinita, -- for such bad advice...

      He'd be much better off using Quantum::Schrodingers_Cat. The major plus side of using this module is that you can just choose the result you want. He may not even need to run the script, which has obvious efficiency advantages. :p

      ---
      my name's not Keith, and I'm not reasonable.

        Hmmm, Quantum::Schrodingers_Cat sounds like a Damian module. This computer is alive or dead depending on the XOR of this entire file - a new variant to the halting problem.

        How about another application: "This file/email attachment contains a virus" without opening it.

        --

        Oh Lord, won’t you burn me a Knoppix CD ?
        My friends all rate Windows, I must disagree.
        Your powers of persuasion will set them all free,
        So oh Lord, won’t you burn me a Knoppix CD ?
        (Misquoting Janis Joplin)

      In the spirit of "thinking outside the box"...*cough*...

      You would need to:
      1) know what type of file-system your files reside on.
      2) Open the device (note: not opening the file!)
      3) interpret the file system structure to find the blocks corresponding to your file on disk
      4) retrieve the file blocks corresponding to the file and perform your search on the data in memory.
      ---
      Of course you can search multiple files on the same disk, using the same device handle, so of course you save yourself the overhead of multiple "open" calls.

      Note: this likely would not work if the file was a remote (networked) file.
      Note 2: This would likely be quite painful, but if you're into that sort of thing...:-)

      As others have said, within the definition of your problem is an "inherent" open to be able to look into the contents of the file. What are you trying to do that you want to avoid opening the file in perl?

      FWIW, I'd change Ted's answer, above, to use "pcregrep" instead of "grep" to maintain use of perl-compatible regular expressions :-).

      p.s. -- sometimes you just have to put on your hat of "literalness" :-)

Re: search for a pattern in file without opening the file
by Tanktalus (Canon) on Jan 29, 2007 at 09:58 UTC

    Unfortunately, perl is completely missing the 'Telepathy::File' module, last I checked, and will thus need to actually open files to read them, just like every other computer program would.

    What do you need to put in another array for post processing? The filename? Maybe something like this:

    my @files = grep {
        open my $fh, '<', $_ or die "Failed to open $_: $!";
        my $rc = 0;
        my $l;
        while (!$rc && defined($l = <$fh>)) {
            $rc++ if $l =~ /^source/;
        }
        $rc
    } @arr1;
    (Untested.) Maybe. Hard to tell without more info from you.

Re: search for a pattern in file without opening the file
by Melly (Chaplain) on Jan 29, 2007 at 10:23 UTC

    Can you rephrase the question? i.e. "I want to check for matches in a file without using an 'open' on the file"..

    Is something like

    while(<STDIN>){ print $_; }
    any help? (which can be run like: perl_script < a_file.txt and will iterate over the lines in a_file.txt)

    <update>It also occurs to me that you might be asking "how do I grep a file that I don't have read-access to?" - in which case you are up the proverbial creek..</update>

    map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
    Tom Melly, pm@tomandlu.co.uk
Re: search for a pattern in file without opening the file
by tcf03 (Deacon) on Jan 29, 2007 at 14:44 UTC
    If you don't want to open the files in your code you could use a system call.
    my $success = ( system("grep $searchpat $file >/dev/null") ) ? 0 : 1;
    as long as it's a simple pattern.
    Ted
    --
    "That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
      --Ralph Waldo Emerson
      If the grep finds a match it returns 0 or 1?

        Yes, as explained in the Exit Status section of the man page.
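
        To spell that out: grep exits with 0 when it finds a match, 1 when it finds none, and a value greater than 1 on error, which is why the one-liner above inverts the result of system. Below is a rough sketch of the same idea using the list form of system, which bypasses the shell so the pattern needs no quoting. The sub name file_matches is hypothetical, and it assumes a POSIX grep on the PATH. Note that grep uses its own basic regular expressions by default, not Perl's.

```perl
use strict;
use warnings;

# Returns 1 if $file has a line matching $pattern, 0 if it has none,
# and undef if grep could not be run or reported an error.
# file_matches is a hypothetical helper name.
sub file_matches {
    my ($pattern, $file) = @_;

    # List-form system() bypasses the shell, so $pattern and $file are
    # passed to grep unchanged. -q suppresses output; -e marks the next
    # argument as the pattern; -- ends the options.
    system('grep', '-q', '-e', $pattern, '--', $file);

    return undef if $? == -1 || ($? & 127);  # failed to start, or killed by a signal

    my $status = $? >> 8;                    # grep's own exit status
    return $status == 0 ? 1
         : $status == 1 ? 0
         :                undef;             # 2 or more: grep hit an error
}
```

        It could then be used as my @files = grep { file_matches('^source', $_) } @arr1; where files that grep cannot read count as non-matching.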

Re: search for a pattern in file without opening the file
by GrandFather (Saint) on Jan 29, 2007 at 09:59 UTC

    That is rather similar to the problem I have with a sealed black box. I want to know what is in it, but I'm not allowed to open it. I do know the address information on the outside of the box and how large it is - stuff like that, but I've no idea what is inside it. Perhaps if you solve your problem (using a clairvoyant perhaps?) you could tell me how you did it and I can use the same technique for my problem?

    Actually, solve that problem and we can sell the solution to kids who want to find out what is in their birthday presents!

    In the meantime you may be interested in reading Perl's grep documentation. It is not a *nix grep (which does "open the files" btw).


    DWIM is Perl's answer to Gödel
Re: search for a pattern in file without opening the file
by codeacrobat (Chaplain) on Jan 30, 2007 at 07:18 UTC
    Let perl do the dirty work and use the implicit loop options -n ( see perlrun ).

    Check out Tim Maher's Minimal Perl if you are a unixy person and want perl scripts to work similar to unix tools.

    #!/usr/bin/perl -n
    BEGIN { @ARGV = qw(the files you want to process); }
    push @files, $ARGV and close ARGV if /^pattern/;
    END { print "matching files @files\n" }
Re: search for a pattern in file without opening the file (mem efficient)
by ikegami (Patriarch) on Jan 30, 2007 at 22:51 UTC

    Ignoring "without opening these files", what do you want returned: the name of the files with a matching pattern, or the matching lines from the files?

    Version that outputs the name of the files with a matching pattern:

    use File::Glob qw( bsd_glob );

    # Inputs
    my @files = bsd_glob('*');
    my $re    = qr/^source*/;

    my @matching_files;
    foreach my $file_name (@files) {
        # Declare $fh outside the if: a "my" inside the condition would
        # go out of scope before the while loop below.
        my $fh;
        if (!open($fh, '<', $file_name)) {
            warn("Unable to open file \"$file_name\": $!\n");
            next;
        }

        while (<$fh>) {
            if (/$re/) {
                push(@matching_files, $file_name);
                last;
            }
        }
    }

    # Output
    print("$_\n") foreach @matching_files;

    Version that outputs the lines that match the pattern:

    use File::Glob qw( bsd_glob );

    # Inputs
    my @files = bsd_glob('*');
    my $re    = qr/^source*/;

    my @matching_lines;
    foreach my $file_name (@files) {
        # Declare $fh outside the if so it is still in scope for the while loop.
        my $fh;
        if (!open($fh, '<', $file_name)) {
            warn("Unable to open file \"$file_name\": $!\n");
            next;
        }

        while (<$fh>) {
            if (/$re/) {
                push(@matching_lines, $_);
            }
        }
    }

    # Output
    print foreach @matching_lines;

    A combination:

    use File::Glob qw( bsd_glob );

    # Inputs
    my @files = bsd_glob('*');
    my $re    = qr/^source*/;

    my @matches;
    foreach my $file_name (@files) {
        # Declare $fh outside the if so it is still in scope for the while loop.
        my $fh;
        if (!open($fh, '<', $file_name)) {
            warn("Unable to open file \"$file_name\": $!\n");
            next;
        }

        while (<$fh>) {
            if (/$re/) {
                # Save the file name, the line number and the line content.
                push(@matches, [ $file_name, $., $_ ]);
            }
        }
    }

    # Output
    print($_->[0], ',', $_->[1], ': ', $_->[2]) foreach @matches;

    These solutions are memory efficient.
    (Returning the results as an iterator would be even more so!)

      Here's a version that only keeps one line in memory at a time (as opposed to every match). It is done elegantly (from the perspective of the function's user) using an iterator.

      use strict;
      use warnings;

      use File::Glob qw( bsd_glob );

      sub search_files {
          my ($re, @files) = @_;

          my $fh;
          my $file_name;

          return sub {
              START:
              if (not defined $file_name) {
                  return () if not @files;

                  $file_name = shift(@files);
                  if (!open($fh, '<', $file_name)) {
                      warn("Unable to open file \"$file_name\": $!\n");
                      undef $file_name;
                      goto START;
                  }
              }

              while (defined(my $line = <$fh>)) {
                  if ($line =~ /$re/) {
                      return ( $file_name, $., $line );
                  }
              }

              undef $fh;
              undef $file_name;
              goto START;
          };
      }

      {
          my @files = bsd_glob('*');
          my $re    = qr/^source*/;

          my $iter = search_files($re, @files);
          while (my ($file_name, $line_num, $line) = $iter->()) {
              print("$file_name,$line_num: $line");
          }
      }