search pattern with digits

mercuryshipz has asked for the wisdom of the Perl Monks concerning the following question:

Re: search pattern with digits by hipowls (Curate) on Feb 14, 2008 at 19:25 UTC
You are looking for two different things, so you need an alternation with two alternatives:

`my $reject_rx = qr{
    total [ ] rows [ ] rejected: [ ] (\d+)
    |
    (\d+) [ ] rows [ ] rejected
}x;

if ( $line =~ /$reject_rx/ ) {
    my $count = defined $1 ? $1 : $2;
    print $count, "\n";
}`

or, using Perl 5.10:

`use 5.010_00;

my $reject_rx = qr{
    (?|    # either match will be in $1
        total [ ] rows [ ] rejected: [ ] (\d+)
        |
        (\d+) [ ] rows [ ] rejected
    )
}x;

if ( my ($count) = $line =~ /$reject_rx/ ) {
    say $count;
}`

The /x modifier means that whitespace in the pattern is ignored and comments can be included. That is why I had to write "[ ]" to match a literal space.
Re^2: search pattern with digits by mercuryshipz (Acolyte) on Feb 14, 2008 at 20:12 UTC
Thanks a lot, hipowls. The search phrase is actually given as an argument, because it could be anything each time, so we can't hard-code it inside the program. Can you explain the [ ] operator? Thanks again.
Re^3: search pattern with digits by hipowls (Curate) on Feb 14, 2008 at 20:35 UTC
Regular expressions can take a /x modifier. It allows embedded comments and lets you format the regular expression for easy reading. To get a literal space you have to either backslash-escape it, "\ ", or put it in a character class, "[ ]". For example, to match "one two" you can write either of these:

`m/ one \ two /x`
`m/ one [ ] two /x`

I use the latter because the space is easier to see with a mark on both sides. I used /x to make it easier for you to see the alternatives. You can remove it, along with the comment (# to end of line) and any whitespace that is not in a character class, and then change "[ ]" to " ".

I am now a little confused about what you are matching. You say you are given the string to match as an argument, but you have two different strings: "total rows rejected: number" and "number rows rejected". If you do put the argument into a regular expression, then you are correct to use \Q \E.
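To make the \Q \E point concrete, here is a minimal sketch of how a phrase supplied at run time could be dropped into a /x pattern that accepts the number on either side. The helper name `number_near_phrase` is made up for illustration, and the sketch assumes the number sits directly before or after the phrase, separated by optional whitespace. It needs Perl 5.10 for the branch reset group.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper (not from the thread): return the number that sits
# directly before or after a literal phrase on a line, or undef if none.
sub number_near_phrase {
    my ( $line, $phrase ) = @_;

    # \Q...\E escapes every non-word character in the phrase, spaces
    # included, so the phrase stays literal even though /x is in effect.
    my $rx = qr{
        (?|                          # branch reset: one capture group either way
            (\d+) \s* \Q$phrase\E    # number before the phrase
          |
            \Q$phrase\E \s* (\d+)    # number after the phrase
        )
    }x;

    my ($number) = $line =~ $rx;
    return $number;
}

say number_near_phrase( 'total rows rejected: 80',   'total rows rejected:' );    # 80
say number_near_phrase( '390 rows rejected for sub', 'rows rejected for sub' );   # 390
```

Because the phrase is quoted, characters such as ":" or "." in it are matched literally rather than treated as regex syntax.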
Re^4: search pattern with digits by mercuryshipz (Acolyte) on Feb 14, 2008 at 21:07 UTC
Re^5: search pattern with digits by hipowls (Curate) on Feb 14, 2008 at 22:22 UTC
Re: search pattern with digits by Narveson (Chaplain) on Feb 14, 2008 at 20:51 UTC
I gather that your search phrase will be determined at run time, and not only the phrase, but also whether the data comes before or after the phrase. How is your program going to be notified of the search phrase? Is there some way it can also be notified of the position of the numerical data?

If you have any control over the earlier stage where the search phrase is identified, ask to have the search phrase specified as a regular expression complete with capturing parentheses.

`use Carp;

sub capture_numbers {
    my ( $search_phrase ) = @_;
    croak "Search phrase $search_phrase lacks capturing parentheses"
        if $search_phrase !~ /\(/;

    # compile once per call; /o would freeze the first phrase ever passed in
    my $rx = qr/$search_phrase/;
    my @results;
    # LOG_FILE is assumed to be already open for reading
    while ( my $line = <LOG_FILE> ) {
        my ( $desired_number ) = $line =~ $rx;
        push @results, $desired_number if defined $desired_number;
    }
    return @results;
}`

This sub accepts either of your two example search phrases with their differing syntax:

`total rows rejected: (\d+)`
`(\d+) rows rejected`
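As a rough illustration of that idea, the sketch below runs both example patterns over a few log lines held in a string. The name `capture_numbers_from` and the choice to pass the filehandle as an argument are assumptions made so the example is self-contained; they are not part of the sub above.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical variant: takes the filehandle as an argument instead of
# relying on a global LOG_FILE handle.
sub capture_numbers_from {
    my ( $fh, $search_phrase ) = @_;
    croak "Search phrase $search_phrase lacks capturing parentheses"
        if $search_phrase !~ /\(/;

    my $rx = qr/$search_phrase/;    # the caller's pattern, compiled once
    my @results;
    while ( my $line = <$fh> ) {
        next unless $line =~ $rx;   # skip lines that do not match
        push @results, $1;
    }
    return @results;
}

# A few lines in the style of the log discussed in this thread.
my $log = <<'END_LOG';
total rows rejected: 80
reject:3
total rejected rows: 100
90 rows rejected for sub
390 rows rejected for sub
END_LOG

for my $phrase ( 'total rows rejected: (\d+)', '(\d+) rows rejected' ) {
    open my $fh, '<', \$log or die "open: $!";    # in-memory filehandle
    my @numbers = capture_numbers_from( $fh, $phrase );
    print "$phrase => @numbers\n";
}
```

Since the caller's text is compiled as a regular expression, this only makes sense when the source of the pattern is trusted.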
Re^2: search pattern with digits by mercuryshipz (Acolyte) on Feb 14, 2008 at 22:31 UTC
I'm posting the code to give a clear idea:

`#!/usr/bin/perl
# use strict;
#use warnings;
use List::Util q{first};

sub search_phrase{
    my @array;
    my ( $inFile, @phrases ) = @_;
    my $lastPhrase = $phrases[ -1 ];
    open my $inFH, q{<}, $inFile or die qq{open: $inFile: $!\n};
    my @lines = <$inFH>;
    close $inFH or die qq{close: $!\n};
    foreach my $phrase ( @phrases ) {
        my $rxPhrase = qr{\Q$phrase\E};
        my $lineNo = first { $lines[ $_ ] =~ $rxPhrase } 0 .. $#lines;
        unless ( defined $lineNo ) { next; }
        print "" if ($lines[ $lineNo ] =~ m{\Q$lastPhrase\E\s(\d)});
        push (@array,$1);
        $lineNo ++;
        splice @lines, 0, $lineNo;
    }
    return (@array);
}

my $file_n = "test.txt";
my $phrase1 = "total rows rejected:";
my $phrase2 = "total rejected recors:";
my $phrase3 = "rows rejected for sub";
my $phrase4 = "total rejected rows:";
my @newarray=search_phrase($file_n, $phrase1, $phrase2, $phrase3, $phrase4);
my $count=($#newarray);
$newarray[$count]=~ s/\s+//g;
if ((($#newarray+1)>=1) && ($newarray[$count]gt 0)) {
    print "$newarray[$count]\n";
}
else {
    print "-1\n";
}`

The log file is searched in the sequence the phrases are given, and the last phrase's value is returned. If the last phrase is not present in the log file, or the phrases do not occur in that sequence, it returns -1.

My problem is this: if you look at the log file, when "rows rejected for sub" is given as the last phrase, the number sits at the beginning of the phrase, and that number must be returned. This program works only if the number comes after the last phrase (the search phrase), not before it. And once again, the sequence of phrases given for the search varies every time according to the log file.

Log file:

`This file is to check the number of occurences of the word reject
total rows rejected: 80
this file just contains the phrase reject.
reject:3
reject 3
reject 4
total rejected rows: 100
total rejected rows: 90
total rejected rows: 60
total rejected rows:40
total rejected rows:40
90 rows rejected for sub
999 rows rejected for sub
100 rows rejected for sub
Reject_Ao
total rejected recors: 60
total rejected rows:49
reject:1
390 rows rejected for sub`

Thanks.
Re^3: search pattern with digits by Narveson (Chaplain) on Feb 15, 2008 at 09:47 UTC
Thanks for supplying the code and the sample log file. I saved the log file as 'test.txt' and ran your code. The subroutine returns an array of three undefined values, one for each of the first three search phrases, after which the text file is exhausted, so nothing (not even an `undef` array entry) is returned for the final search phrase.

The code is way too busy. You don't need to read a file into an array: you can just iterate one line at a time with `while (<$inFH>)`. You almost never need to use an array index.

Finally, if your problem is to extract the number from the matching line wherever the number may be, why don't you just use `/(\d+)/` to extract the number after you have matched the search phrase?

`#!/usr/bin/perl
use strict;
use warnings;

sub search_phrase {
    my ( $inFile, @phrases ) = @_;
    open my $inFH, q{<}, $inFile or die qq{open: $inFile: $!\n};
    my $line;

    PHRASE:
    foreach my $phrase ( @phrases ) {
        my $rxPhrase = qr{\Q$phrase\E};

        # keep reading down the file
        while ($line = <$inFH>) {
            # when one phrase matches, jump to the next
            next PHRASE if $line =~ /$rxPhrase/;
        }

        # end of file, and we haven't matched the last phrase
        return;
    }

    # We have just matched the last phrase.
    # The number we want is somewhere in $line.
    my ($number) = $line =~ /(\d+)/;
    return $number;
}

my $file_n = "test.txt";
my $phrase1 = "total rows rejected:";
my $phrase2 = "total rejected recors:";
my $phrase3 = "rows rejected for sub";
my $result = search_phrase($file_n, $phrase1, $phrase2, $phrase3);

if (defined $result) {
    print "search_phrase subroutine found $result\n"
}
else {
    print "search_phrase subroutine didn't find a number.\n"
}`

With your sample data, this returns

`search_phrase subroutine found 390`
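For anyone who wants to try the sequential walk without creating test.txt, here is a small sketch of the same idea run against an in-memory copy of a few log lines. The name `last_phrase_count` and the -1 return value (taken from the original poster's description of the desired behaviour) are assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: walk the phrases in order down the filehandle and
# return the first number on the line that matched the last phrase.
# Returns -1 if the phrases do not all occur in the given sequence.
sub last_phrase_count {
    my ( $fh, @phrases ) = @_;
    my $line;

    PHRASE:
    for my $phrase (@phrases) {
        while ( defined( $line = <$fh> ) ) {
            # literal substring test, so no regex quoting is needed
            next PHRASE if index( $line, $phrase ) >= 0;
        }
        return -1;    # ran out of input before this phrase appeared
    }

    return -1 unless defined $line;    # no phrases were given
    return $line =~ /(\d+)/ ? $1 : -1;
}

# An abbreviated log in the style of the one posted in this thread.
my $log = <<'END_LOG';
total rows rejected: 80
total rejected rows: 100
90 rows rejected for sub
total rejected recors: 60
390 rows rejected for sub
END_LOG

my @in_order     = ( 'total rows rejected:', 'total rejected recors:', 'rows rejected for sub' );
my @out_of_order = ( 'rows rejected for sub', 'total rows rejected:' );

for my $phrases ( \@in_order, \@out_of_order ) {
    open my $fh, '<', \$log or die "open: $!";
    print last_phrase_count( $fh, @$phrases ), "\n";
}
```

With these five lines the first call prints 390 and the second prints -1, because "total rows rejected:" only appears before the first "rows rejected for sub" line.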
Re^4: search pattern with digits by mercuryshipz (Acolyte) on Feb 15, 2008 at 16:23 UTC