newbie2perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm new to Perl and just joined this site in hopes of having some questions answered. What I am trying to do is very simple, but I can't quite seem to get the hang of it. I want to ask the user to type in a filename, find the file, open it, search the file for a string and if the string is found store it in an array. Right now my program will do all of these things, however the match that's occurring is greedy, so it's finding all instances of each searchstring in the file. I only want it to store the string once in the array if it finds it, not multiple times for each time that it finds the string in the file. Below is an example of the code I'm using, perhaps someone can guide me as to how best to modify this so that it will stop searching once it has found the string once:
use strict;

my ($Filename, @array, $searchstring, @stringsfound, @file);

print "Please enter the name of the file:";
$Filename = <STDIN>;
chomp($Filename);
$Filename = "C:\\Perl\\".$Filename;

open IN, $Filename or die "Could not open file $!";
@file = <IN>;
close(IN);

@array = ('file', 'this', 'dog', 'forward');

foreach (@file){
    foreach $searchstring (@array) {
        if (m/(.*?)$searchstring/i) {
            push @stringsfound, $searchstring;
        }
    }
}
print @stringsfound;
Thanks in advance for any help on this, look forward to hearing what you have to say!

Replies are listed 'Best First'.
Re: Match a string only once in a file
by tirwhan (Abbot) on Mar 08, 2005 at 00:11 UTC
    Slurp the file into a scalar and then do a single match on it per value you're looking for. Like so:
    local $/;
    my $file = <IN>;
    close IN;
    @array=('file', 'this', 'dog', 'forward');
    foreach $searchstring(@array) {
        if ($file=~m/$searchstring/i) {
            push @stringsfound,$searchstring;
        }
    }
    The foreach block can also be written as
    @stringsfound=grep {$file=~/$_/i} @array;
    I changed your regular expression, omitting the (.*?) since it didn't seem to serve any useful purpose (ordinarily it would capture any characters before $searchstring into $1, but you're not using $1 for anything).
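    For anyone who wants to try the slurp-and-grep idea without a file on disk, here is a minimal sketch. The sample text, the in-memory filehandle, and the \Q...\E quoting are assumptions added for illustration and are not part of the code above; \Q...\E only matters if a search string might contain regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the contents of the user's file.
my $text = "This is the first line.\nThe dog ran.\nAnother dog, another line.\n";

# Read from an in-memory filehandle so the sketch runs without a real file.
open my $in, '<', \$text or die "Could not open in-memory file: $!";
my $file = do { local $/; <$in> };   # $/ is undefined only inside this block
close $in;

my @array = ('file', 'this', 'dog', 'forward');

# \Q...\E quotes regex metacharacters, so each term is matched literally.
# grep keeps each term at most once, however often it occurs in the text.
my @stringsfound = grep { $file =~ /\Q$_\E/i } @array;

print "@stringsfound\n";   # prints "this dog"
```

    Wrapping "local $/" in a do block keeps the change to the input record separator from leaking into the rest of the program.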
Re: Match a string only once in a file
by shemp (Deacon) on Mar 07, 2005 at 23:41 UTC
    Keep track of which ones you've found in a hash.
    # assume %matches is declared before the loops
    ...
    if ( /$searchstring/i && ! exists($matches{$searchstring}) ) {
        push @stringsfound, $searchstring;
        $matches{$searchstring} = 1;
    }
    ...
    If you don't care about the order in which the strings were found, you could eliminate the array entirely by initializing the %matches hash with the strings as keys and false as each value, then setting a value to true when you find its string.
    Then change my above logic a little.
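    One way that hash-only variant might look, as a rough sketch: the sample lines, the lexical loop variables, and the \Q...\E quoting are assumptions made for illustration, not part of the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the lines read from the file.
my @file = ("No file here? Yes, a file.\n", "The dog and the other dog.\n");

# Every search string starts out as "not found" (0).
my %matches = map { $_ => 0 } ('file', 'this', 'dog', 'forward');

for my $line (@file) {
    for my $searchstring (keys %matches) {
        next if $matches{$searchstring};   # already found, skip the regex
        $matches{$searchstring} = 1 if $line =~ /\Q$searchstring\E/i;
    }
}

# Hash order is unspecified, so sort for stable output.
my @stringsfound = sort grep { $matches{$_} } keys %matches;
print "@stringsfound\n";   # prints "dog file"
```

    The trade-off is the one already mentioned: the hash does not remember the order in which the strings turned up.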
Re: Match a string only once in a file
by tall_man (Parson) on Mar 08, 2005 at 00:40 UTC
    Here is another way that reads one line at a time and stops early when all strings are found:
    # The first part is the same as the original.
    open IN, $Filename or die "Could not open file $!";

    @array = ('file', 'this', 'dog', 'forward');
    my %guard;

    READLOOP:
    while (<IN>) {
        foreach $searchstring (@array) {
            if (m/$searchstring/i) {
                if (! exists $guard{$searchstring}) {
                    push @stringsfound, $searchstring;
                    last READLOOP if @stringsfound == @array;
                    $guard{$searchstring} = 1;
                }
            }
        }
    }
    close IN;

    # a somewhat nicer output
    print join(" ",@stringsfound),"\n";
      It tidies up nicely and does less searching if you modify the foreach loop:
      foreach $searchstring (grep {!exists $guard{$_}} @array) {
          if (m/$searchstring/i) {
              push @stringsfound, $searchstring;
              $guard{$searchstring} = undef;
          }
      }

      Caution: Contents may have been coded under pressure.
Re: Match a string only once in a file
by Roy Johnson (Monsignor) on Mar 08, 2005 at 01:05 UTC
    You are searching each line of the file for each of the searchstrings, capturing (and then discarding) the portion of the line preceding the searchstring, and then pushing the searchstring.

    Your match is not greedy. You're just looking for a match on every line of the file, for every search string, so a string is pushed once for each line it appears on. Get rid of the parenthesized portion of your pattern, and use a hash to keep track of what you've already matched.


    Caution: Contents may have been coded under pressure.
Re: Match a string only once in a file
by Zaxo (Archbishop) on Mar 08, 2005 at 02:32 UTC

    There is also index. It returns -1 if the string is not found.

    local $/;
    my $contents = <IN>;
    close IN;
    my @found;
    for (@array) {
        next if -1 == index $contents, $_;
        push @found, $_;
    }
    or
    my @found = grep {-1 != index $contents, $_} @array;
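    Note that index compares case-sensitively, whereas the original pattern used /i. If case-insensitive matching is wanted, one possible adjustment is to lowercase both sides first. This is a sketch under that assumption; the sample contents and the $lc_contents variable are made up for illustration.

```perl
use strict;
use warnings;

my $contents = "This FILE mentions a Dog.\n";   # hypothetical file contents
my @array    = ('file', 'this', 'dog', 'forward');

# Lowercase the text once, then compare against lowercased terms.
my $lc_contents = lc $contents;
my @found = grep { index($lc_contents, lc $_) != -1 } @array;

print "@found\n";   # prints "file this dog"
```

    Because index looks for a literal substring, no quoting of regex metacharacters is needed here.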

    After Compline,
    Zaxo

Re: Match a string only once in a file
by Roy Johnson (Monsignor) on Mar 08, 2005 at 01:49 UTC
    The obligatory sick version:
    my @array = ('file', 'this', 'dog', 'forward');
    my %guard;
    open IN, $Filename or die "$!: $Filename\n";
    print join(' ',
        map {
            my $line = $_;
            grep {!exists $guard{$_} and $line =~/$_/ and $guard{$_}=1} @array;
        } <IN>), "\n";
    close IN;

    Caution: Contents may have been coded under pressure.
Re: Match a string only once in a file
by hsinclai (Deacon) on Mar 07, 2005 at 23:59 UTC
    Or..
    my @array = ('file', 'this', 'dog', 'forward' );
    my $searchstring = join '|',@array;

    foreach ( @file ) {
        if ( /($searchstring)/i ) {
            push @stringsfound,$1 unless grep { $_ eq $1 } @stringsfound;
        }
    }
      What if "this dog" appears? The pattern will match "this" and move on to the next line. You would need to do a global (/g) match on the line, and then sift through what you found.

      One interesting way to use the superterm approach is to remove each term you find from it as you go:

      my @file = <DATA>;
      my @array = ('file', 'this', 'dog', 'forward');
      my $superterm = join '|', @array;
      my @stringsfound;
      my %guard;

      for (@file) {
          while ($superterm ne '' and /($superterm)/gi) {
              print "Found <$1>\n"; # To watch it work
              push @stringsfound, $1;
              $guard{$1} = undef;
              $superterm = join '|', grep {!exists $guard{$_}} @array;
          }
      }
      print map "$_\n", @stringsfound;
      __DATA__
      No one could find this dog dog dog a young boy came forward to claim this file but his file was a mile long

      Caution: Contents may have been coded under pressure.
        Really cool idea - very instructional - thanks!
Re: Match a string only once in a file
by webengr (Pilgrim) on Mar 08, 2005 at 00:16 UTC
    If you want to keep the code you have, try adding "last;" after the push.


    PCS
      That doesn't work at all. I think you must have misread the problem statement.
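      To see why, here is a small sketch of the original loops with "last;" added after the push. The two sample lines are made up for illustration, and the pattern is simplified to drop the unused capture.

```perl
use strict;
use warnings;

# Hypothetical sample lines; both contain "this" and "dog".
my @file  = ("this dog\n", "this dog again\n");
my @array = ('file', 'this', 'dog', 'forward');
my @stringsfound;

foreach (@file) {
    foreach my $searchstring (@array) {
        if (m/$searchstring/i) {
            push @stringsfound, $searchstring;
            last;   # leaves only the inner loop, so the next line is still searched
        }
    }
}

print "@stringsfound\n";   # prints "this this"
```

      With this input, "this" is still stored once per line, and "dog" is never stored, because the inner loop exits as soon as any string matches on a line.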