regular expression help

bobafett has asked for the wisdom of the Perl Monks concerning the following question:

I need to grep comma- or tab-separated data, as shown below, and list the lines matching a pattern.

The data is in the array @new_content_lines:

1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,

Regular expression to check:

Match the first four comma- or tab-separated alphanumeric values, then three blank comma-separated values, and finally the pattern (\d),0,(\d) (a 0 always exists between the two numbers in the last match).

Grep output expected :

1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,

I am not able to get this to work; any help is appreciated.
my @grep_output = grep { /^(?:^,*,){4},{3 }(\d),0,(\d)/ } @new_content_lines;

Thanks
Bobafett

Replies are listed 'Best First'.
Re: regular expression help by ikegami (Patriarch) on Jul 24, 2008 at 23:13 UTC
Get rid of the space after the 3. It was added by the CB line breaker. And by the way, the parens around \d are useless and slow down the match.

print grep /^(?:[^,]*,){4},{3}\d,0,\d/, <DATA>;
__DATA__
1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,2,buffer,G,1,2,0,Y,,,P,G,,,,,,
1,2,0,Starting madvise bss tests,3,buffer,G,1,3,0,Y,,,P,G,,,,,,

Output:

1,2,0,First Test,,,,0,0,7,,,,,,,,,,,
1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,
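For readers who find the one-line pattern hard to scan, here is a minimal sketch of the same regex written with the /x modifier so each part can carry a comment. It assumes the lines are held in an array, as in the original question; the sub name matching_lines is hypothetical and not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match the comma-separated pattern.
sub matching_lines {
    my @lines = @_;
    return grep {
        /^
          (?: [^,]* , ){4}   # four fields, each followed by a comma
          ,{3}               # three empty fields, so three more commas
          \d , 0 , \d        # a digit, a literal 0, then another digit
        /x
    } @lines;
}

my @new_content_lines = (
    "1,2,0,First Test,,,,0,0,7,,,,,,,,,,,",
    "1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,",
    "1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,",
);

print "$_\n" for matching_lines(@new_content_lines);
```

With the three sample lines above, only the "First Test" and "Second Test" lines should be printed, because the fifth field of the madvise line is not empty.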
Re^2: regular expression help by bobafett (Initiate) on Jul 24, 2008 at 23:34 UTC
Hello ikegami,

Thanks for the solution; it works. If the data is tab (\t) separated instead of comma separated, should I replace all the commas in the regular expression with \s, as shown?

print grep /^(?:^\s*\s){4}\s{3}\d\s0\s\d/, <DATA>;

Thanks
Bobafett
Re^3: regular expression help by Cristoforo (Curate) on Jul 24, 2008 at 23:42 UTC
print grep /^(?:[^\t]*\t){4}\t{3}\d\t0\t\d/, <DATA>;
Re: regular expression help by broomduster (Priest) on Jul 25, 2008 at 01:02 UTC
Your original question said "Search first four comma or tab separated alpha numeric values", implying that your input might have mixed delimiters. If so, and if a given line is either comma-delimited or tab-delimited, then you need an alternation of the ikegami and Cristoforo regexes from above, like so:

print grep /
    ^
    (?:
        (?:[^,]*,){4},{3}\d,0,\d
    |
        (?:[^\t]*\t){4}\t{3}\d\t0\t\d
    )
/x, <DATA>;

If the two delimiters can be mixed on the same line, then:

print grep /^(?:[^,\t]*[,\t]){4}[,\t]{3}\d[,\t]0[,\t]\d/, <DATA>;

Updated: No need for capturing in the alternation.
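As an alternative to a single regex, the mixed-delimiter case can also be sketched by splitting each line into fields and testing the fields one at a time. This is a different technique from the one in the replies above, shown only for comparison; the sub name is_wanted_line is hypothetical, and the sample lines are assumptions modelled on the data in the question.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: true if the line has the wanted shape.
sub is_wanted_line {
    my ($line) = @_;
    # Split on either delimiter; the -1 limit keeps trailing empty fields.
    my @f = split /[,\t]/, $line, -1;
    return 0 if @f < 10;
    return 0 if grep { $_ ne '' } @f[4 .. 6];   # fields 5-7 must be empty
    return 0 unless $f[7] =~ /^\d\z/;           # field 8: a single digit
    return 0 unless $f[8] eq '0';               # field 9: a literal 0
    return 0 unless $f[9] =~ /^\d/;             # field 10: starts with a digit
    return 1;
}

my @new_content_lines = (
    "1,2,0,First Test,,,,0,0,7,,,,,,,,,,,",
    "1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,",
    "1\t2\t0\tSecond Test\t\t\t\t0\t0\t7",
);

print "$_\n" for grep { is_wanted_line($_) } @new_content_lines;
```

The field-by-field version is longer, but each condition can be changed on its own if the layout of the data turns out to differ from the sample.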
Re: regular expression help by chrism01 (Friar) on Jul 25, 2008 at 04:31 UTC
If that's your real data, or at least an accurate representation of it, it seems to me that the lines you want all have 'Test' (mixed case) in them and the others don't, which would simplify your matching enormously.
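A minimal sketch of that simpler match, assuming the posted sample really is representative. The sub name test_lines is hypothetical. Note that the other lines contain a lowercase "tests", so the match has to stay case-sensitive (no /i modifier).

```perl
use strict;
use warnings;

# Hypothetical helper: keep only lines containing the capitalised word "Test".
sub test_lines {
    return grep { /\bTest\b/ } @_;
}

my @new_content_lines = (
    "1,2,0,First Test,,,,0,0,7,,,,,,,,,,,",
    "1,2,0,Starting madvise bss tests,1,buffer,G,1,1,0,Y,,,P,G,,,,,,",
    "1,2,0,Second Test,,,,0,0,7,,,,,,,,,,,",
);

print "$_\n" for test_lines(@new_content_lines);
```

This relies entirely on the wording of the fourth field, so it would break if a wanted line ever lacked the word "Test" or an unwanted one contained it.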