Regex text extraction between first instance of pattern X and third instance of pattern Y.

cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Re: Regex text extraction between first instance of pattern X and third instance of pattern Y. by BrowserUk (Patriarch) on Mar 31, 2010 at 05:22 UTC
The "third instance of c" really means "everything not a c, plus a c; plus everything not a c, plus a c; plus everything not a c, up to but excluding the next c":

```perl
"abcbcbc" =~ m[ a ( [^c]* c [^c]* c [^c]* ) c ]x and print "'$1'";  # prints 'bcbcb'

## or

"abcbcbc" =~ m[a((?:[^c]c){2}[^c])c] and print "'$1'";  # prints 'bcbcb'
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "I'd rather go naked than blow up my ass"
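The first pattern generalizes to "the n-th c" by repeating the `[^c]* c` unit n-1 times with a counted quantifier. A minimal sketch, assuming 'a' and 'c' are single literal characters as in the example; `between_a_and_nth_c` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: text after an 'a' and before the $n-th 'c' that
# follows it. Assumes 'a' and 'c' are single literal characters.
sub between_a_and_nth_c {
    my ($str, $n) = @_;
    return undef if $n < 1;
    my $repeat = $n - 1;    # swallow n-1 "non-c run + c" units ...
    # ... then one more non-c run, stopping just before the n-th c
    return $str =~ m[ a ( (?: [^c]* c ){$repeat} [^c]* ) c ]x ? $1 : undef;
}

print between_a_and_nth_c("abcbcbc", 3), "\n";    # bcbcb
```

With `$n = 3` this is exactly `a ( [^c]* c [^c]* c [^c]* ) c` from above.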
Re: Regex text extraction between first instance of pattern X and third instance of pattern Y. by ikegami (Patriarch) on Mar 31, 2010 at 05:14 UTC
You'll have better luck with regexes if you think in terms of what you do want to match, rather than what you don't want to match or what surrounds it.

```perl
my ($match) = $body =~ /(?<=a)([^c]c[^c]c[^c])(?=c)/;
```

The above can also be written as

```perl
my ($match) = $body =~ /(?<=a)([^c](?:c[^c]){2})(?=c)/;
```

Or maybe you want

```perl
my ($match) = $body =~ /(?<=a)([^ac](?:c[^ac]*){2})(?=c)/;
```
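Because `(?<=a)` and `(?=c)` are zero-width, the match itself covers only the wanted text. A small illustration of what that buys: the same pattern works in a substitution without capturing and re-inserting the delimiters (the sample string and the replacement `XXX` are arbitrary):

```perl
use strict;
use warnings;

my $body = "abcbcbc";

# The lookarounds check for 'a' before and 'c' after without consuming
# them, so only the text in between is replaced.
(my $redacted = $body) =~ s/(?<=a)[^c]c[^c]c[^c](?=c)/XXX/;

print "$redacted\n";    # aXXXc
```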
Re: Regex text extraction between first instance of pattern X and third instance of pattern Y. (ass u me) by tye (Sage) on Mar 31, 2010 at 13:57 UTC
I won't assume that "b is a variable" doesn't imply that "c" isn't a regex pattern instead of just a fixed string. For my example, I'll use your use of the term "body" to make a wild guess:

```perl
# Runs inside a sub (hence the return), with the text to search in $body.
my $a= qr{<table[^>]*>};   # pattern X: an opening <table ...> tag
my $c= qr{<tr[^>]*>};      # pattern Y: an opening <tr ...> tag
my $d= qr{</table>};       # must not appear before the 3rd $c
for( $body ) {             # alias $_ to $body so m//g and pos() act on it
    die "No $a" if ! /$a/g;            # first instance of $a
    my $start= pos();                  # extraction starts right after it
    die "No 1st $c" if ! /$c/g;
    die "No 2nd $c" if ! /$c/g;
    die "No 3rd $c" if ! /(?=$c)/g;    # zero-width: pos() stops *before* the 3rd $c
    my $end= pos();
    pos()= $start;
    die "No 3rd $c before $d" if /$d/g && pos() < $end;
    return substr( $_, $start, $end-$start );
}
```

Also, the first two responses don't enforce "first instance of $a", and so may cause problems if the requirements have the temerity to evolve (oh, I also pre-evolved them for you in my example). ;)

- tye

P.S. My real (wild) guess is that you are parsing e-mail bodies.
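The same pos()-driven walk can take the occurrence count as a parameter. A sketch, assuming $x and $y are qr// patterns that never match the empty string, and leaving out the `</table>` guard; `between_first_and_nth` is a hypothetical name. Like the code above, it commits to the first match of $x and gives up rather than retrying from a later one:

```perl
use strict;
use warnings;

# Hypothetical helper: text after the first match of $x and before the
# $n-th match of $y that follows it; undef if either is missing.
sub between_first_and_nth {
    my ($text, $x, $y, $n) = @_;
    return undef if $n < 1;
    for ($text) {                       # alias $_ so m//g and pos() work on it
        /$x/g or return undef;          # first instance of $x only
        my $start = pos();              # extraction starts right after it
        for my $i (1 .. $n - 1) {       # named loop var keeps $_ aliased to $text
            /$y/g or return undef;      # step over the first $n-1 matches of $y
        }
        /(?=$y)/g or return undef;      # zero-width: pos() lands before the n-th $y
        return substr($_, $start, pos() - $start);
    }
}

print between_first_and_nth("abcbcbc", qr/a/, qr/c/, 3), "\n";    # bcbcb
```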
Re: Regex text extraction between first instance of pattern X and third instance of pattern Y. by JavaFan (Canon) on Mar 31, 2010 at 10:43 UTC
Unlike the two previous posters, I won't make the mistake of assuming you are using 'a', 'b' and 'c' here in any role other than as placeholders. In particular, I presume 'c' here to be some string, which can even be longer than a single character.

Sometimes it's easier to do things without a complicated regexp. Why not use index to find the starting position of the third appearance of 'c' first?

```perl
my $index2 = -1;
foreach (1 .. 3) {
    # search again starting one character past the previous hit
    $index2 = index($body, "c", $index2 + 1);
    die "No third 'c'" if $index2 < 0;
}
```

Then use pos() to find where 'a' finished matching (or use index() as well):

```perl
$body =~ /a/g or die "No first 'a'";
my $index1 = pos($body);
# my $index1 = index($body, 'a') + length('a');
```

Then grab what's in between using the offsets:

```perl
my $inbetween = substr($body, $index1, $index2 - $index1);
```

A few things are left as an exercise for the reader:

- What to do if the third occurrence of 'c' appears before the first 'a'.
- Using a match instead of index if 'c' is a pattern instead of a string.
- The above code counts overlapping occurrences. It's easy to adapt if you don't want to count overlaps.
- The code above assumes you want the third occurrence of 'c' from the start of the string. It's easy to adapt to get the third occurrence of 'c' following the 'a'.
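Three of those exercises can be handled together with index() alone: counting starts after the first 'a' (so a 'c' before it is never picked, and the count is relative to the 'a'), and each search resumes past the whole previous occurrence (so overlaps are not counted). A sketch, assuming 'a' and 'c' are fixed strings; `between_index` and its parameter names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: text after the first $a_str and before the $n-th
# non-overlapping $c_str that follows it; undef if either is missing.
sub between_index {
    my ($body, $a_str, $c_str, $n) = @_;
    return undef if $n < 1;
    my $i = index($body, $a_str);
    return undef if $i < 0;                        # no first 'a'
    my $start = $i + length($a_str);              # text starts right after 'a'
    my $pos = $start - length($c_str);            # first search begins at $start
    for (1 .. $n) {
        # resume past the whole previous occurrence: no overlaps
        $pos = index($body, $c_str, $pos + length($c_str));
        return undef if $pos < 0;                  # fewer than $n occurrences
    }
    return substr($body, $start, $pos - $start);
}

print between_index("cccabcbcbc", "a", "c", 3), "\n";    # bcbcb
```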
Re^2: Regex text extraction between first instance of pattern X and third instance of pattern Y. by BrowserUk (Patriarch) on Mar 31, 2010 at 10:56 UTC
> I won't make the mistake of assuming you are using 'a', 'b' and 'c' here

Why is it a mistake to assume that he is doing what he says he is doing, and what his example shows he is doing?

> I presume 'c' here to be some string, which can even be longer than a single character.

And why is it better for you to presume that he means something other than what he says?

I think that your re-interpretation of the question asked is a valid and interesting adjunct to it, but why the baseless snide narrative?
Re^3: Regex text extraction between first instance of pattern X and third instance of /pattern Y/. (snide) by tye (Sage) on Mar 31, 2010 at 14:09 UTC

```perl
/a(.c.c.)c/ # then
```

- tye
Re^3: Regex text extraction between first instance of pattern X and third instance of pattern Y. by roboticus (Chancellor) on Mar 31, 2010 at 14:43 UTC
BrowserUk: I think the "expanded" problem is reasonable given the title, since the title specifies patterns X and Y rather than specific characters. The way I read the OP, (s)he properly simplified it to a trivial case in the example.

...roboticus
Re^4: Regex text extraction between first instance of pattern X and third instance of pattern Y. by BrowserUk (Patriarch) on Mar 31, 2010 at 17:15 UTC
Re^2: Regex text extraction between first instance of pattern X and third instance of pattern Y. by ikegami (Patriarch) on Mar 31, 2010 at 15:34 UTC
If it's not just `c`, my solution still works if you use the equivalent of `[^c]` for arbitrary patterns. But since I don't know what pattern that might be, I stuck to what the OP said rather than making things up.
Re^3: Regex text extraction between first instance of pattern X and third instance of pattern Y. (!$c) by tye (Sage) on Mar 31, 2010 at 15:53 UTC

```perl
(?:(?!$c).)*
```

- tye
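Substituting that construct for `[^c]` turns the earlier regexes into one that works for arbitrary patterns. A sketch, assuming $x and $c are qr// objects and that $c never matches the empty string; a plain match on $x stands in for the lookbehind, since a lookbehind cannot hold an arbitrary (unbounded) pattern. `between_patterns` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: text after a match of $x and before the $n-th
# match of $c following it, with (?:(?!$c).)* playing the role of [^c]*.
sub between_patterns {
    my ($str, $x, $c, $n) = @_;
    return undef if $n < 1;
    my $not_c = qr/(?:(?!$c).)*/s;    # consumes characters, never the start of a $c
    my $k = $n - 1;
    return $str =~ /$x((?:$not_c$c){$k}$not_c)(?=$c)/ ? $1 : undef;
}

print between_patterns("abcbxbcff", qr/a/, qr/b./, 3), "\n";    # bcbx
```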
Re: Regex text extraction between first instance of pattern X and third instance of pattern Y. by repellent (Priest) on Apr 01, 2010 at 04:56 UTC

```perl
sub match_in_between {
    my ($str, $r1, $r2, $n) = @_;
    return undef unless 4 == grep { defined() } ($str, $r1, $r2, $n);
    # capture everything after $r1 up to and including the $n-th $r2;
    # $end holds the text of that last $r2 match
    my ($match, $end) = ($str =~ /$r1((?:.*?($r2)){$n})/);
    return "" unless defined($match);
    # trim the final $r2 off the end of the capture
    return substr($match, 0, rindex($match, $end));
}

print match_in_between("abcbcbc", qr/a/, qr/c/, 3), "\n";      # bcbcb
print match_in_between("cccabcbcbc", qr/a/, qr/c/, 3), "\n";   # bcbcb
print match_in_between("abcbxbcff", qr/a/, qr/b./, 3), "\n";   # bcbx
```
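Instead of searching the captured text again with rindex, the start offsets Perl records in `@-` can be used directly: group 1 starts right after $r1, and group 2, being inside the quantified group, keeps only its last (n-th) iteration. A sketch of that variant; the name `match_in_between_offsets` is hypothetical, and the /s flag is an added assumption so that `.*?` can cross newlines in multi-line text (the version above does not use it):

```perl
use strict;
use warnings;

# Hypothetical variant: slice the original string using @- offsets.
sub match_in_between_offsets {
    my ($str, $r1, $r2, $n) = @_;
    return undef unless 4 == grep { defined } ($str, $r1, $r2, $n);
    # /s (an assumption) lets .*? cross newlines
    $str =~ /$r1((?:.*?($r2)){$n})/s or return "";
    # $-[1]: start of the text after $r1
    # $-[2]: start of the last (n-th) $r2 match
    return substr($str, $-[1], $-[2] - $-[1]);
}

print match_in_between_offsets("abcbcbc", qr/a/, qr/c/, 3), "\n";    # bcbcb
```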