mercuryshipz has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
#use strict;
use warnings;

sub search_pattern {
    my $file_name = $_[0];
    my $search    = $_[1];
    open(LOGFILE, $_[0]) or die("Error: cannot open file '$_[0]'\n");
    while (<LOGFILE>) {
        if ( $_ =~ /$search/ ) {
            my $val = $`;        #Matches Everything after pattern
            $val =~ s/^\s+//;    #remove leading spaces
            $val =~ s/\s+$//;    #remove trailing spaces
            $val =~ s/\D//g;     #Just has the digits. All other characters are filtered.
            #print "$val\n";
            print "\nFirst Occurence:$val \n";
            my $line = $.;
            print "Line number:$line\n";
            #$temp = $line;
            #print "$temp";
            #print "$.";
            last;
        }
    }
}

my $file_n   = "test.txt";
my $search_p = "This is phrase 2";
&search_pattern($file_n, $search_p);

OUTPUT
------

First Occurence:90
Line number:3

Hi guys,

My code above searches for the first occurrence of a phrase and returns the line number.
What if I want to search for multiple phrases in a single file and return the line numbers?

For example:
test.txt

1.This is phrase 1
2.This is phrase 3
3.This is phrase 2
4.This is phrase 3

any text in between

5.This is phrase 5

any text in between

6.This is phrase 4
7.This is phrase 1
8.This is phrase 2

9.This is phrase 3
10.This is phrase 6
11.This is phrase 7
12.This is phrase 1
13.This is phrase 2
14.This is phrase 3
15.This is phrase 4
16.This is phrase 8
17.This is phrase 1

.................
.................

I have added the line numbers for reference only; in reality they are not present in the file.

The arguments passed are the file name and the search phrases (e.g. This is phrase 1, This is phrase 2, This is phrase 3, This is phrase 5...).

The number of arguments passed may vary, but the first argument is always the file name.

First it will check for phrase 1, and then from that line number for phrase 2 (even though in our case phrase 3 comes before phrase 2, i.e. at line 2, we need phrase 2 first and then phrase 3), and then for phrase 3, and return phrase 3.

Let's say we have 4 arguments including the file name...

if (phrase 1 exists)
    from that line number search for phrase 2
    from that line number search for phrase 3
    return phrase 3

if (phrase 1 doesn't exist)
    search for phrase 2
    if (phrase 2 also doesn't exist)
        search for phrase 2 and return phrase 3
    if (phrase 3 also doesn't exist)
        return phrase 2
        else phrase 1
    if none of them exists display a message


The reason I am keeping track of the line number is so that the file is not parsed again from line 1.

If I know the number of arguments passed, that's fine.

But what if I don't know the number of arguments present?

Can anybody suggest how to proceed with this problem?


Replies are listed 'Best First'.
Re: search pattern and arrays
by apl (Monsignor) on Jan 23, 2008 at 18:43 UTC
    Load the list of phrases into my %Phrases;

    Then, instead of doing

    if ( $_ =~ /$search/ )
    use
    if ( defined $Phrases{$_} )

    By the way, $. is the current input line-number; you don't have to keep track yourself.

    Revised (for clarity): A variable prefaced with a percent sign is a hash, the equivalent of an array using a string (rather than a number) as an index. The defined function says "Is there an element of the hash having the specified index?".
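
A minimal sketch of the hash lookup described above, under two assumptions that are not in the reply: each line is chomped first, and a line counts as a hit only when the whole line equals a phrase exactly (a hash lookup is an exact-key test, not a substring or regex match). The sketch uses exists, which tests whether a key is present; defined tests whether the stored value is defined, so the two agree only when every phrase is stored with a defined value. The sub name find_exact_lines and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line numbers of lines that exactly
# equal one of the wanted phrases.
sub find_exact_lines {
    my ( $fh, @phrases ) = @_;

    # Store each phrase as a hash key; the value 1 is just a marker.
    my %Phrases = map { $_ => 1 } @phrases;

    my @hits;
    while ( my $line = <$fh> ) {
        chomp $line;    # otherwise the trailing newline defeats the lookup
        push @hits, $. if exists $Phrases{$line};
    }
    return @hits;
}

# Sample data held in an in-memory filehandle (an assumption for the demo).
my $text = "This is phrase 1\nThis is phrase 3\nThis is phrase 2\n";
open my $fh, '<', \$text or die "open: $!\n";
my @found = find_exact_lines( $fh, 'This is phrase 2', 'This is phrase 1' );
close $fh or die "close: $!\n";

print "Matched lines: @found\n";
```

Because the lookup is exact, a phrase such as "total cpu sold :" would not match a line like "total cpu sold : 57"; for that case a regex match per phrase is still needed.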
Re: search pattern and arrays
by johngg (Canon) on Jan 23, 2008 at 20:31 UTC
    I think this is what you are after, although I may be misreading your post. If I understand correctly, you search for your phrases in a certain order and you want to find the first occurrence of each phrase, ignoring any phrases that are out of sequence, e.g. a "phrase2" that is before the first "phrase1" is ignored.

    To do this I read the lines into an array. If your file is very large this might not be feasible. The adjustment to $lineNo is because arrays are zero-based and I'm assuming you number your lines from 1.

    use strict;
    use warnings;

    use List::Util q{first};
    use Data::Dumper;

    open my $inFH, q{<}, \ <<EOD or die qq{open: $!\n};
    1:gash line
    2:phrase4
    3:akjdakj
    4:fwefkwe
    5:phrase5
    6:phrase1
    7:adsfwfw
    8:phrase3
    9:phrase5
    10:jkjd wsekjw wiu
    11:phrase2
    12:wewefwefwf
    13:another line
    14:dsjwjk
    15:adsfwfw
    16:phrase3
    17:another line
    18:adsfwfw
    19:phrase5
    20:phrase6
    21:ertgerher
    EOD

    my @lines = <$inFH>;
    close $inFH or die qq{close: $!\n};

    my @phrases = map { qq{phrase$_} } 1 .. 6;
    my $cumulativeOffset = 0;

    foreach my $phrase ( @phrases )
    {
        my $rxPhrase = qr{$phrase};
        my $lineNo = first { $lines[ $_ ] =~ $rxPhrase } 0 .. $#lines;
        unless ( defined $lineNo )
        {
            print qq{$phrase: not found in sequence\n};
            next;
        }
        $lineNo ++;
        $cumulativeOffset += $lineNo;
        print qq{$phrase: $cumulativeOffset\n};
        splice @lines, 0, $lineNo;
    }

    The output.

    phrase1: 6
    phrase2: 11
    phrase3: 16
    phrase4: not found in sequence
    phrase5: 19
    phrase6: 20

    I hope I have guessed right and this is of use.

    Cheers,

    JohnGG

      That's great, John... thanks a lot.
      You almost got my question,
      but here are a few elaborations on my post:

      1. The file should not be inside the program; it is read from a given location, as in my program.

      2. The file doesn't contain the line numbers.

      3. We don't know the exact number of arguments that will be passed (in your program, 1..6).

      4. Phrase 1, Phrase 2... was just an example; it could be anything. For example:

      total Laptops produced: 60
      total mice produced: 40
      total cpu sold : 57
      total printers produced: 98
      total monitors produced: 10

      .......................

      Phrases like these could be present any number of times in the file, but our function should search in the order the phrases are passed.

      Actually, the phrase passed will be something like "total cpu sold :", but it should return the whole line, because the value 57 is important.

      To give you a clear idea,
      the file "test.txt" contains:

      total Laptops produced: 60
      total cpu sold : 57
      total mice produced: 40
      total cpu sold : 45
      total Laptops produced: 68
      total mice produced: 48
      total cpu sold : 51
      total printers produced: 19
      total monitors produced: 149

      -------

      For example, given this:


      $a= "total Laptops produced:";
      $b ="total mice produced:";
      $c = "total cpu produced:";

      &search_phrase($filename, $a, $b, $c);

      this function should return 45.

      I thank you again for your support...
        To take your points in order:

        1. I put the file inside the script to keep everything together. Another way of doing this would be to place the data at the end of the script after a __END__ or __DATA__ tag and read the DATA filehandle that the interpreter opens for you. However, I wanted to show you how to use the three-argument form of open which is considered best practice these days. Just substitute your variable containing your file to be read for the \ <<EOD ...

        2. I just put the line numbers in the data to show that the script was giving the "right" answers. Having them there did not affect how the script ran.

        3. Put something like my ( $file, @phrases ) = @ARGV; at the top of your script so that you don't have to worry how many phrases are being sought.

        4. If you are calling your script from the command line with a file and a series of phrases then I imagine you will enclose each phrase in single-quotes. To avoid the problem where the phrase might contain regex metacharacters change the line compiling the regex to my $rxPhrase = qr{\Q$phrase\E};
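
The four points above can be pulled together in one sketch. It is not code from the thread: the sub name search_in_sequence is hypothetical, the file is streamed line by line instead of being read into an array, and a phrase that is not found is skipped by rewinding with seek, which assumes a seekable filehandle (a regular file or the in-memory handle used here). The third phrase is assumed to be "total cpu sold :", as it appears in the sample file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): look for each phrase in turn,
# starting just after the previous hit. Returns one [phrase, line number,
# line text] entry per phrase found. A phrase that is not found is skipped
# and the search position is restored, so $fh must be seekable.
sub search_in_sequence {
    my ( $fh, @phrases ) = @_;
    my @found;
    my $lineNo = 0;

    foreach my $phrase (@phrases) {
        my $rxPhrase  = qr{\Q$phrase\E};    # treat the phrase as literal text
        my $startPos  = tell $fh;
        my $startLine = $lineNo;
        my $hit;

        while ( my $line = <$fh> ) {
            $lineNo++;
            next unless $line =~ $rxPhrase;
            chomp $line;
            $hit = [ $phrase, $lineNo, $line ];
            last;
        }

        if ($hit) {
            push @found, $hit;
        }
        else {
            # Not found after the previous hit: rewind and try the next phrase.
            seek( $fh, $startPos, 0 ) or die qq{seek: $!\n};
            $lineNo = $startLine;
        }
    }
    return @found;
}

# Demo data copied from the follow-up post, read through an in-memory
# filehandle. For a real file, something like this would replace it:
#   my ( $file, @phrases ) = @ARGV;
#   open my $fh, q{<}, $file or die qq{open: $!\n};
my $data = <<'EOD';
total Laptops produced: 60
total cpu sold : 57
total mice produced: 40
total cpu sold : 45
total Laptops produced: 68
total mice produced: 48
total cpu sold : 51
total printers produced: 19
total monitors produced: 149
EOD

open my $fh, q{<}, \$data or die qq{open: $!\n};
my @found = search_in_sequence(
    $fh,
    'total Laptops produced:',
    'total mice produced:',
    'total cpu sold :',
);
close $fh or die qq{close: $!\n};

foreach my $hit (@found) {
    my ( $phrase, $lineNo, $line ) = @$hit;
    print qq{line $lineNo: $line\n};
}

# The number at the end of the last line found (assumed to be the value wanted).
my $value;
($value) = $found[-1][2] =~ /(\d+)\s*$/ if @found;
print qq{value: $value\n} if defined $value;
```

With this data the hits are lines 1, 3 and 4, and the value taken from the last hit is 45. Returning every hit rather than only the last one leaves the caller free to fall back to an earlier phrase when a later one is missing.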

        I'm not sure how you arrive at an answer of 45; did you mean to say $c = "total cpu sold:"?

        Cheers,

        JohnGG

Re: search pattern and arrays
by toolic (Bishop) on Jan 23, 2008 at 19:09 UTC
    When I run your code, I do not get the output that you get. I get this:
    First Occurence:
    Line number:3

    What is the significance of "90" in your output?

    Another observation: you declare $file_name, but never use it.
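
One possible source of the discrepancy, offered as a guess since the original test.txt is not shown: $` holds the text before the match and $' holds the text after it, so the comment in the posted code describes $' rather than $`. A small sketch with a hypothetical helper and sample line:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line around the first match of a phrase.
sub around_match {
    my ( $line, $phrase ) = @_;
    return unless $line =~ /\Q$phrase\E/;

    # Copy the match variables straight away, before another match resets them.
    my ( $before, $match, $after ) = ( $`, $&, $' );
    return ( $before, $match, $after );
}

my ( $before, $match, $after )
    = around_match( 'total cpu sold : 57', 'cpu sold :' );
print "before='$before' match='$match' after='$after'\n";
```

Here the digits end up in $after, so stripping non-digits from $` would leave an empty string for a line of this shape.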

Re: search pattern and arrays
by poolpi (Hermit) on Jan 25, 2008 at 08:58 UTC
    This code gives you the line numbers and each value for the passed phrases (see output):
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $p = [ q{total Laptops produced:}, q{total mice produced:} ];

    sub search_phrase {
        my $total;
        $total->{ lc( (split)[1] ) } = undef for @{ $_[0] };
        my $i = 1;
        map {
            my ( $item, $count ) = (split)[ 1, 3 ];
            $item = lc $item;
            push @{ $total->{$item} }, $i . q{:} . $count
                if exists $total->{$item};
            $i++;
        } <DATA>;
        return $total;
    }

    print Dumper search_phrase($p);

    __DATA__
    total Laptops produced: 60
    total cpu sold: 57
    total mice produced: 40
    total cpu sold: 45
    total Laptops produced: 68
    total mice produced: 48
    total cpu sold: 51
    total printers produced: 19
    total monitors produced: 149
    Output:
    $VAR1 = {
              'mice' => [
                          '3:40',
                          '6:48'
                        ],
              'laptops' => [
                             '1:60',
                             '5:68'
                           ]
            };

    hth,
    PooLpi