PerlMonks  

Search List

by pglenski (Beadle)
on Dec 20, 2005 at 16:07 UTC ( [id://518102]=perlquestion )

pglenski has asked for the wisdom of the Perl Monks concerning the following question:

I want to search a file for literal text (no wildcard searching) using a search list. This list will contain over 1,000 entries. Here's more or less what I want (of course this doesn't work):
    chomp(@searchlist = `ls`);
    open(IN,"test.txt");
    while (<IN>) {
        chomp;
        if ($_ =~ @searchlist) {print "Match Found"}
    }
Any ideas? Thanks!

Replies are listed 'Best First'.
Re: Search List
by holli (Abbot) on Dec 20, 2005 at 16:14 UTC
    You could use Regex::Assemble to build a big regex from the list. But I would recommend using a hash:
    %searchlist = map { $_ => 1 } @searchlist;
    open(IN,"test.txt");
    while (<IN>) {
        chomp;
        if ( $searchlist{$_} ) {print "Match Found"}
    }


    holli, /regexed monk/
      Your suggestion would find any matching item on a line by itself. It is not clear from the OP whether the file to be searched is laid out that way. If (as I suspect) it's paragraphs, then it would need to be broken down into words to be checked one-by-one, or they'd need to use your Regex::Assemble solution.

      Caution: Contents may have been coded under pressure.
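To illustrate the word-by-word check described above, here is a minimal sketch. The helper name words_in_list and the sample search list are made up for the example, and it assumes that "word" means a whitespace-separated token, which the OP never confirmed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the search-list
# entries that appear as whitespace-separated words in $line.
sub words_in_list {
    my ($line, $searchlist) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    return grep { exists $searchlist->{$_} } split ' ', $line;
}

# Stand-in search list; in the thread it would come from `ls`.
my %searchlist = map { $_ => 1 } qw(test.txt notes.pl);

my @hits = words_in_list("please open notes.pl and test.txt today", \%searchlist);
print "Match found: $_\n" for @hits;
```

A token with trailing punctuation (for example "test.txt,") would not match under this assumption; if that matters, the regex approaches elsewhere in the thread are probably the better fit.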
      Aside from the note by Roy Johnson, I would change
      if ( $searchlist{$_} ) {print "Match Found"}
      to
      if ( exists $searchlist{$_} ) {print "Match Found"}
      since your test would add an entry to the hash for each word that isn't there.
      Update: This is wrong, see reply.
        That's wrong. Just looking up a hash key does not autovivify that key. Consider
        my %h = ();
        print "found" if $h{somekey};
        print %h;
        which prints nothing. The key "somekey" is not autovivified.


        holli, /regexed monk/
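For contrast, a small sketch of where autovivification does kick in: reading through a missing key into a nested level creates the outer key, while a plain top-level read does not. The key names here are invented for the example.

```perl
use strict;
use warnings;

my %h;

# A plain read of a missing top-level key leaves the hash empty.
my $found = $h{somekey} ? 1 : 0;
my $after_plain = keys %h;      # number of keys after the plain lookup

# Dereferencing through a missing key creates that outer key
# (it now holds an empty hash reference).
my $nested = $h{outer}{inner} ? 1 : 0;
my $after_nested = keys %h;     # number of keys after the nested lookup

print "keys after plain lookup:  $after_plain\n";
print "keys after nested lookup: $after_nested\n";
```

So for the flat %searchlist in this thread, both the plain truth test and the exists test leave the hash untouched.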
Re: Search List
by sachmet (Scribe) on Dec 20, 2005 at 16:27 UTC
    If your test.txt file is long, it may make sense to pre-compile the entries with qr// for faster searching. Also, wrap each entry in \Q and \E so that regex metacharacters in the file names (such as . or *) are matched literally rather than interpreted.

    Also, please check return codes on system calls such as open().

    Consider something like the following:
    my @patterns;
    foreach my $pattern (`ls`) {
        chomp $pattern;
        push @patterns, qr/\Q$pattern\E/;
    }
    open(IN,"test.txt") or die "Can't open test.txt: $!\n";
    while(my $line = <IN>) {
        chomp $line;
        foreach my $pattern (@patterns) {
            print "Match found\n" if $line =~ $pattern;
        }
    }
      Or even
      print "mf" if grep { $line =~ $_ } @patterns;
      instead of
      foreach my $pattern (@patterns) { print "Match found\n" if $line =~ $pattern; }


      holli, /regexed monk/
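Another variation on the same idea, as a rough sketch: instead of looping over 1000+ compiled patterns per line, join the escaped names into one alternation and test each line once. This uses only core Perl (quotemeta and join) rather than Regex::Assemble; the helper name build_matcher and the sample names are hypothetical, and whether it is actually faster depends on your Perl version and data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build one regex that
# matches any of the given strings literally.
sub build_matcher {
    my @names = @_;
    # With no names, return a pattern that can never match.
    return qr/(?!)/ unless @names;
    # Longest first, so a longer name is tried before a name that is its prefix.
    my @ordered = sort { length($b) <=> length($a) } @names;
    my $alternation = join '|', map { quotemeta($_) } @ordered;
    return qr/(?:$alternation)/;
}

my $matcher = build_matcher('a.txt', 'a+b.log');
for my $line ('see a.txt here', 'see aXtxt here', 'a+b.log rotated') {
    print "Match found: $line\n" if $line =~ $matcher;
}
```

The second sample line is there to show that the escaped dot no longer matches an arbitrary character.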
Re: Search List
by ikegami (Patriarch) on Dec 20, 2005 at 16:43 UTC

    Additionally, why use ls?

    opendir(my $dh, '.') or die("Unable to list directory: $!\n");
    my @searchlist = readdir($dh);

    or

    opendir(my $dh, '.') or die("Unable to list directory: $!\n");
    while (defined(my $file = readdir($dh))) {
        ...
    }

    Combined with sachmet's solution:

    opendir(my $dh, '.') or die("Unable to list directory: $!\n");
    # Skip the . and .. entries; keep everything else.
    my @patterns = map { qr/\Q$_\E/ } grep { !/^\.\.?$/ } readdir($dh);
    open(my $fh_in, 'test.txt') or die("Unable to open input file test.txt: $!\n");
    while (my $line = <$fh_in>) {
        chomp $line;
        foreach my $pattern (@patterns) {
            print("Match found\n") if $line =~ $pattern;
        }
    }

    Combined with holli's solution:

    opendir(my $dh, '.') or die("Unable to list directory: $!\n");
    # Skip the . and .. entries; keep everything else.
    my %searchlist = map { $_ => 1 } grep { !/^\.\.?$/ } readdir($dh);
    open(my $fh_in, 'test.txt') or die("Unable to open input file test.txt: $!\n");
    while (my $line = <$fh_in>) {
        chomp $line;
        print("Match found\n") if $searchlist{$line};
    }

    Update: Fixed copy & paste error map { qr/\Q$pattern\E/ } => map { qr/\Q$_\E/ }.

      One subtle difference between your use of readdir and the OP's use of ls is that the latter will not list dot-files (e.g., .profile) by default. Conveniently, glob('*') mimics that behavior and shortens the necessary code:
      my %searchlist = map {$_ => 1} glob('*');

      Caution: Contents may have been coded under pressure.

        Aye. The non-glob alternative would be to change the filter that skips only . and ..
        grep { !/^\.\.?$/ }
        to one that skips every dot-file
        grep { !/^\./ }
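A quick way to check the dot-file filtering without touching your real directory, as a sketch: it builds a scratch directory with File::Temp (a core module), and the helper name visible_entries and the two file names are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): list a directory's
# entries, skipping everything whose name starts with a dot.
sub visible_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Unable to list directory $dir: $!\n";
    my @names = grep { !/^\./ } readdir($dh);
    closedir($dh);
    my @sorted = sort @names;
    return @sorted;
}

# Scratch directory holding one visible file and one dot-file.
my $dir = tempdir(CLEANUP => 1);
for my $name ('visible.txt', '.hidden') {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!\n";
    close($fh);
}

print "$_\n" for visible_entries($dir);
```

Because the filter rejects any leading dot, it also drops . and .. without needing a separate test for them.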

Re: Search List
by l3v3l (Monk) on Dec 20, 2005 at 18:00 UTC
    just a quick stab at this one :
    perl -e 'chomp(@l=(<*>));open(FH,"<test.txt") or die;while(<FH>){for $w(@l){/\b$w\b/ ? print: next}}'
    does this do all that you want? or is there a chance that you would need to manipulate/search through more than one line at a time?
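A spelled-out sketch of the same word-boundary idea, with one change that is not in the one-liner: each name is wrapped in \Q...\E so that the dot in something like test.txt only matches a literal dot. The helper name line_mentions and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the names that
# occur in $line as literal text delimited by word boundaries.
sub line_mentions {
    my ($line, @names) = @_;
    return grep { $line =~ /\b\Q$_\E\b/ } @names;
}

my @names = ('test.txt', 'other.pl');
for my $line ('open test.txt now', 'open testXtxt now') {
    my @hits = line_mentions($line, @names);
    print "Match found (@hits): $line\n" if @hits;
}
```

Note that \b needs a word character at each edge of the name, so a name that begins or ends with punctuation (a dot-file such as .profile, say) may not match the way you expect.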

Node Type: perlquestion [id://518102]
Approved by holli