in reply to search pattern with digits

You are looking for two different things, so you need an alternation with two alternatives:

my $reject_rx = qr{
    total [ ] rows [ ] rejected: [ ] (\d+)
    |
    (\d+) [ ] rows [ ] rejected
}x;

if ( $line =~ /$reject_rx/ ) {
    my $count = defined $1 ? $1 : $2;
    print $count, "\n";
}
Or, using Perl 5.10:

use 5.010_00;

my $reject_rx = qr{
    (?|    # either match will be in $1
        total [ ] rows [ ] rejected: [ ] (\d+)
        |
        (\d+) [ ] rows [ ] rejected
    )
}x;

if ( my ($count) = $line =~ /$reject_rx/ ) {
    say $count;
}
The /x modifier means that white space in the pattern is ignored and comments can be put in. That is why I had to write [ ] to match a literal space.

Re^2: search pattern with digits
by mercuryshipz (Acolyte) on Feb 14, 2008 at 20:12 UTC
    thanks a lot hipowls...

    the search phrase is actually given as arguments, cuz it could be anything each time, we can't use that inside the program...
    Can you explain about the [] operator?

    thanks again.

      Regular expressions can have a /x qualifier. It allows embedded comments and for the regular expression to be formatted for easy reading. To get a literal space you have to either

      • backslash escape it: "\ ", or
      • put it in a character class: "[ ]"
      For example, to match "one two" you have
      • m/ one \  two/x
      • m/ one [ ] two /x
      I use the latter as the space is easier to see with a mark both sides.
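
A small sketch of the difference, assuming the sample string "one two" from above (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = "one two";

# Under /x an unescaped space in the pattern is ignored,
# so this pattern is effectively /onetwo/ and does not match.
my $plain   = $text =~ m/ one two /x     ? 1 : 0;

# Backslash-escaped space: matches a literal space.
my $escaped = $text =~ m/ one \  two /x  ? 1 : 0;

# Space inside a character class: also matches a literal space.
my $class   = $text =~ m/ one [ ] two /x ? 1 : 0;

print "plain=$plain escaped=$escaped class=$class\n";
```

Only the escaped and character-class forms should report a match.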

      I used /x to make it easier for you to see the alternations. You can remove it along with the comment (# to end of line), white space not in a character class and then change "[ ]" to " ".
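
As a rough sketch of that transformation, the first pattern might look like this without /x. The two sample lines are made up for illustration and are not taken from a real log:

```perl
use strict;
use warnings;

# Same pattern as the /x version, written compactly:
# no comments, no layout white space, and "[ ]" replaced by " ".
my $reject_rx = qr{total rows rejected: (\d+)|(\d+) rows rejected};

my @counts;
for my $line ( 'total rows rejected: 12', '34 rows rejected' ) {
    if ( $line =~ $reject_rx ) {
        # Only one of the two capture groups is defined after a match.
        push @counts, defined $1 ? $1 : $2;
    }
}

print "@counts\n";
```

The compact form is shorter, but the alternation is harder to spot, which is the reason for using /x in the first place.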

      I am now a little confused about what you are matching. You say you are given the string to match as an argument but you have two different strings "total rows rejected: number" and "number rows rejected". If you do put the argument into a regular expression then you are correct to use \Q \E.
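
If the phrase does come in as an argument, one possible approach is to build the alternation around it with \Q \E. This is only a sketch: count_for_phrase is a hypothetical helper, and the optional colon and the white space between the phrase and the number are assumptions about the log format.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number next to $phrase in $line,
# or nothing if the phrase is not found with a number beside it.
sub count_for_phrase {
    my ( $phrase, $line ) = @_;

    # \Q...\E quotes any regex metacharacters in the phrase.
    # First alternative: phrase, optional colon, then the number.
    # Second alternative: the number, then the phrase.
    if ( $line =~ /\Q$phrase\E:?\s*(\d+)|(\d+)\s+\Q$phrase\E/ ) {
        return defined $1 ? $1 : $2;
    }
    return;
}

print scalar count_for_phrase( 'total rows rejected', 'total rows rejected: 12' ), "\n";
print scalar count_for_phrase( 'rows rejected',       '34 rows rejected' ),        "\n";
```

The number ends up in $1 or $2 depending on which side of the phrase it sits, so the caller does not need to know the position in advance.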

        thanks for the responses...

        lemme make it clear ... the argument(s) which are given is jus the phrase. for example
        total rejected rows

        or
        rejected number of rows

        in the log file which im gonna parse it with the script contains:

        log.txt
        total rejected rows: 1000
        total rejected rows: 1254
        total rejected rows: 1000
        total rejected rows: 1254
        total rejected rows: 1000
        3000 rejected number of rows
        8700 rejected number of rows
        65000 rejected number of rows
        1200 rejected number of rows
        4300 rejected number of rows
        total rejected rows: 1254
        total rejected rows: 1000
        total rejected rows: 1254
        total rejected rows: 1000
        total rejected rows: 1254
        54000 rejected number of rows
        4000 rejected number of rows


        the program works only for "total rejected rows:", ie., if the number (desired result) is at the end of the search phrase, but if it is at the beginning for example, "rejected number of rows" the number present at the beginning is not returned... but always the search phrase has the digits either at the beginning or at the end... if u have more questions plz lemme kno...

        thanks.