bobafett has asked for the wisdom of the Perl Monks concerning the following question:

I need to grep comma- or tab-separated data, as shown below, and list the lines matching a pattern.
Data in array @new_content_lines:

1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,

Regular expression to check: search for the first four comma- or tab-separated alphanumeric values, then search for three blank comma-separated values, and finally search for the (\d+),0,(\d) pattern in a line (a 0 always exists between the numbers in the last match).

Grep output expected:

1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
I am not able to get this to work; any help is appreciated.
my @grep_output = grep { /^(?:[^,]*,){4},{3 }(\d),0,(\d)/ } @new_content_lines;

Thanks
Bobafett

Re: regular expression help
by ikegami (Patriarch) on Jul 24, 2008 at 23:13 UTC
    Get rid of the space after the 3. It was added by the CB line breaker. By the way, the parens around \d are useless and slow down the match.
    print grep /^(?:[^,]*,){4},{3}\d,0,\d/, <DATA>;
    __DATA__
    1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
    1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
    1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
    1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,
    1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
    1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
    1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
    1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,

    Output:

    1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
    1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
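For readers who keep the data in an array rather than under __DATA__, the same pattern can be dropped into the grep from the question. This is only a sketch: the array name @new_content_lines comes from the original post, and the three sample lines are a shortened copy of the posted data.

```perl
use strict;
use warnings;

# Shortened copy of the sample data from the question.
my @new_content_lines = (
    "1,2,0,First Test,,,,0,0,7,,,,,,,,,,,",
    "1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,",
    "1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,",
);

# Four comma-terminated fields, then three more commas (the empty fields),
# then digit,0,digit. No capturing parens and no space inside {3}.
my @grep_output = grep { /^(?:[^,]*,){4},{3}\d,0,\d/ } @new_content_lines;

print "$_\n" for @grep_output;
```

Because [^,]* can never run across a comma, each repetition of the group consumes exactly one field, so the pattern cannot drift to a later column.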
      Hello Ikegami,

      Thanks for the solution; it works.
      If the data is tab (\t) separated instead of comma separated, should I replace all the commas in the regular expression with \s, as shown?
      print grep /^(?:[^\s]*\s){4}\s{3}\d\s0\s\d/, <DATA>;
      Thanks
      Bobafett
        print grep /^(?:[^\t]*\t){4}\t{3}\d\t0\t\d/, <DATA>;
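A short comparison may help show why \t is the safer choice here than \s. The sketch below assumes the tab-separated data has the same fields as the posted comma-separated sample; the lines are rebuilt with join for illustration and are not from the original post. Since \s also matches the space inside "First Test", the \s version stops counting fields in the wrong place.

```perl
use strict;
use warnings;

# Hypothetical tab-separated versions of two sample lines from the question.
my @lines = (
    join("\t", 1, 2, 0, 'First Test', '', '', '', 0, 0, 7),
    join("\t", 1, 2, 0, 'Starting madvise bss tests', 1, 'buffer', 'G', 1, 1, 0),
);

# Delimiter is exactly a tab: spaces inside a field are left alone.
my @with_tab = grep { /^(?:[^\t]*\t){4}\t{3}\d\t0\t\d/ } @lines;

# Delimiter is any whitespace: the space in "First Test" ends the
# fourth field early, so the three empty fields are never found.
my @with_space = grep { /^(?:[^\s]*\s){4}\s{3}\d\s0\s\d/ } @lines;

printf "tab pattern matched %d line(s)\n", scalar @with_tab;
printf "\\s pattern matched %d line(s)\n", scalar @with_space;
```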
Re: regular expression help
by broomduster (Priest) on Jul 25, 2008 at 01:02 UTC
    Your original question said
    Search first four comma or tab separated alpha numeric values
    implying that your input might have mixed delimiters. If so, and if a given line is either comma-delimited or tab-delimited, then you need an alternation of the ikegami and Cristoforo regexes from above, like so:
    print grep / ^ (?: (?:[^,]*,){4},{3}\d,0,\d | (?:[^\t]*\t){4}\t{3}\d\t0\t\d ) /x, <DATA>;
    If the two delimiters can be mixed on the same line, then:
    print grep /^(?:[^,\t]*[,\t]){4}[,\t]{3}\d[,\t]0[,\t]\d/, <DATA>;

    Updated: No need for capturing in the alternation.
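To see how the two patterns above differ, they can be run against one comma-only line, one tab-only line, and one line that mixes both delimiters. This is a sketch under assumptions: the field values are taken from the posted sample, but the tab and mixed lines are made up for illustration, and the variable names $either and $any are hypothetical.

```perl
use strict;
use warnings;

my @fields = (1, 2, 0, 'First Test', '', '', '', 0, 0, 7);

my %line = (
    comma => join(',',  @fields),
    tab   => join("\t", @fields),
    # Hypothetical line with both delimiters mixed together.
    mixed => "1,2\t0,First Test\t,,\t0,0\t7",
);

# A line is wholly comma-delimited or wholly tab-delimited.
my $either = qr/ ^ (?: (?:[^,]*,){4},{3}\d,0,\d | (?:[^\t]*\t){4}\t{3}\d\t0\t\d ) /x;

# Either delimiter may appear at any position on the line.
my $any = qr/^(?:[^,\t]*[,\t]){4}[,\t]{3}\d[,\t]0[,\t]\d/;

for my $kind (sort keys %line) {
    printf "%-5s either=%s any=%s\n", $kind,
        ($line{$kind} =~ $either ? 'yes' : 'no'),
        ($line{$kind} =~ $any    ? 'yes' : 'no');
}
```

The alternation is the stricter of the two, so it is the better fit if a line with mixed delimiters should be treated as malformed.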

Re: regular expression help
by chrism01 (Friar) on Jul 25, 2008 at 04:31 UTC
    If that's your real data, or at least an accurate representation, it seems to me that the lines you want all have 'Test' (mixed case) in them and the others don't ...
    Which would simplify your matching enormously.
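As a sketch of that simpler approach, assuming the posted sample really is representative: a case-sensitive match on 'Test' picks out the wanted lines, because the other lines only contain lowercase 'tests'. Adding the /i flag would match every line, so case sensitivity matters here.

```perl
use strict;
use warnings;

# Shortened copy of the sample data from the question.
my @new_content_lines = (
    "1,2,0,First Test,,,,0,0,7,,,,,,,,,,,",
    "1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,",
    "1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,",
);

# Case-sensitive: 'Test' matches, 'tests' does not.
my @grep_output = grep { /Test/ } @new_content_lines;

print "$_\n" for @grep_output;
```

This checks nothing about the field layout, so it is only suitable if the word itself is a reliable marker in the real data.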