
thanks a lot hipowls...

the search phrase is actually given as an argument, because it could be anything each time, so we can't hard-code it inside the program...
Can you explain about the [] operator?

thanks again.

Re^3: search pattern with digits
by hipowls (Curate) on Feb 14, 2008 at 20:35 UTC

    Regular expressions can take a /x modifier. It allows embedded comments and lets the regular expression be laid out for easy reading, because unescaped white space in the pattern is ignored. To get a literal space you have to either

    • backslash escape it: "\ ", or
    • put it in a character class: "[ ]"
    for example to match "one two" you have
    • m/ one \  two/x
    • m/ one [ ] two /x
    I use the latter as the space is easier to see with a mark both sides.

    I used /x to make it easier for you to see the alternations. You can remove it, provided you also remove the comment (# to end of line) and any white space that is not in a character class, and then change "[ ]" to " ".
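
    As a rough illustration of the point about /x and literal spaces, here is a minimal sketch. The variable names are only for illustration; the patterns are the ones discussed above, plus one that forgets to protect the space.

```perl
use strict;
use warnings;

my $text = 'one two';

# Under /x, unescaped white space in the pattern is ignored, so a
# literal space must be escaped or placed in a character class.
my $escaped    = $text =~ m/ one \  two /x  ? 1 : 0;
my $char_class = $text =~ m/ one [ ] two /x ? 1 : 0;

# The same pattern without /x, written with a plain space.
my $plain = $text =~ m/one two/ ? 1 : 0;

# With /x and an unprotected space the pattern is really "onetwo".
my $no_space = $text =~ m/ one two /x ? 1 : 0;

print "escaped=$escaped class=$char_class plain=$plain no_space=$no_space\n";
# prints "escaped=1 class=1 plain=1 no_space=0"
```

    The last case is the one that usually bites: the pattern looks right, but /x has silently dropped the space.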

    I am now a little confused about what you are matching. You say you are given the string to match as an argument but you have two different strings "total rows rejected: number" and "number rows rejected". If you do put the argument into a regular expression then you are correct to use \Q \E.
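
    To show why \Q \E matters when the phrase comes from outside the program, here is a small sketch. The phrase and the log line are made up for the example (the parentheses are there only to show the effect); in the real script the phrase would presumably come from @ARGV.

```perl
use strict;
use warnings;

# Hypothetical phrase and log line, chosen to contain regex metacharacters.
my $phrase = 'total rejected rows (all):';
my $line   = 'total rejected rows (all): 1000';

# Without \Q...\E the parentheses act as a capture group, not literal text,
# so the pattern looks for "total rejected rows all:" and fails.
my $unquoted = $line =~ /$phrase/ ? 1 : 0;

# With \Q...\E every non-word character in the phrase is escaped.
my $quoted = $line =~ /\Q$phrase\E/ ? 1 : 0;

# \Q also escapes the spaces, so the phrase survives the /x modifier.
my ($number) = $line =~ / \Q$phrase\E \s* (\d+) /x;

print "unquoted=$unquoted quoted=$quoted number=$number\n";
# prints "unquoted=0 quoted=1 number=1000"
```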

      thanks for the responses...

      let me make it clear... the argument(s) given are just the phrase, for example
      total rejected rows

      or
      rejected number of rows

      the log file which I'm going to parse with the script contains:

      log.txt
      total rejected rows: 1000
      total rejected rows: 1254
      total rejected rows: 1000
      total rejected rows: 1254
      total rejected rows: 1000
      3000 rejected number of rows
      8700 rejected number of rows
      65000 rejected number of rows
      1200 rejected number of rows
      4300 rejected number of rows
      total rejected rows: 1254
      total rejected rows: 1000
      total rejected rows: 1254
      total rejected rows: 1000
      total rejected rows: 1254
      54000 rejected number of rows
      4000 rejected number of rows


      the program works only for "total rejected rows:", i.e. when the number (the desired result) comes after the search phrase. When the number comes first, as in "rejected number of rows", it is not returned. The digits are always either immediately before or immediately after the search phrase. If you have more questions please let me know...

      thanks.

        1. How is the search string given?
        2. What search string is given?
        3. How does the input data vary?
        4. How will what you search for depend on the input data?
        5. Will you always be capturing the same data?

        For the example you have given where you always want to capture a number after or before "rows rejected"

        qr{(?| \Q$string\E [^\d]* (\d+) | (\d+) [^\d]* \Q$string\E ) }x
        will capture the last number before a given search string or, if there isn't one, the first number after the string. Note that given the line
        records processed 23456, total rows rejected 567
        and the search string "total rows rejected" it will capture 23456, not 567.
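
        Here is a minimal sketch of that regular expression applied to a few sample lines. The sub name number_near is made up for the example, and the phrases are passed in directly rather than read from the command line. Note that the (?| ... ) branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # (?| ... ) branch reset needs Perl 5.10 or later

# Hypothetical helper: returns the number next to $string in $line,
# or undef if the line does not match.
sub number_near {
    my ( $string, $line ) = @_;

    # Branch reset makes both alternatives capture into $1.
    my $re = qr{(?| \Q$string\E [^\d]* (\d+) | (\d+) [^\d]* \Q$string\E ) }x;

    return $line =~ $re ? $1 : undef;
}

my @results = (
    number_near( 'total rejected rows',     'total rejected rows: 1254' ),
    number_near( 'rejected number of rows', '65000 rejected number of rows' ),
    number_near( 'total rows rejected',
        'records processed 23456, total rows rejected 567' ),
);

print join( ', ', @results ), "\n";    # prints "1254, 65000, 23456"
```

        The third call shows the caveat: the match starts at the leftmost possible position, so the number in front of the phrase wins.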

        What is the input when one row is rejected? Is it

        1 rows rejected
        or
        1 row rejected
        Know your data;)