I really don't mind using more than one regex for this one. You're dealing with more than one rule, so there's a nice symmetry; each rule has corresponding code. If you are concerned with it being verbose where you want it to be terse, move the work out to a subroutine.

Anyway, with those ideas, here's my version:

    use strict;
    use warnings;
    use Test::More;

    my @test = (
        [ '1 This (is a test) with good parens'    => 'is a test',                'Match in parens' ],
        [ '2 This is a (test with broken a paren'  => 'test with broken a paren', 'Match after left paren' ],
        [ '3 And this would be one) the other way' => '3 And this would be one',  'Match before right paren' ],
        [ '4 Lastly, no parens'                    => '',                         'No match' ],
    );

    foreach my $test (@test) {
        my $got = match( $test->[0] );
        is( $got, $test->[1], "$test->[2]: <<$got>>" );
    }

    done_testing();

    sub match {
        for (shift) {
            m/ \(([^)]*?)\) /x && return $1;    # Both parens.
            m/ \((.*)$      /x && return $1;    # Left paren.
            m/ ^(.*)\)      /x && return $1;    # Right paren.
            m/ ^[^()]*()$   /x && return $1;    # No parens (no capture).
            return;                             # Unreachable.
        }
    }

Update: As often happens, I just have to go to bed to have an idea disturb me. Here's an improvement (I think) on sub match:

    sub match {
        local $_ = shift;
        m/ \(([^)]*?)\) /x       # Both parens.
          || m/ \((.*)$ /x       # Left paren.
          || m/ ^(.*)\) /x       # Right paren.
          || m/ ^[^()]*()$ /x;   # No parens (no capture).
        return $1 // ();
    }

Here's another version that combines the logic above into a single regex using alternation. I don't necessarily think this is better; I prefer the simplicity of breaking things into smaller regexes.

    sub match {
        shift =~ m/
            (?: [^(]*\((?<C>[^)]*?)\) )    # Both parens.
          | (?: \((?<C>.*)$ )              # Left paren.
          | (?: ^(?<C>.*)\) )              # Right paren.
          | (?: ^[^()]*(?<C>)$ )           # No parens (empty capture).
        /x;
        return $+{C} // ();
    }

By using named captures we avoid the problem where other single-regex solutions result in either $1, or $2, or $3 being populated. That's too much to keep track of, and could be error prone. Instead, we name every capture the same: $+{C}. (Warning: After checking perlre, I'm of the vague and uncertain impression that this could rely on undefined behavior.)
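On that warning: my reading of perlre and perlvar is that when several groups share a name, $+{NAME} refers to the leftmost defined group of that name, while %- keeps one slot per group. Here is a minimal sketch to check this on your own perl (5.10 or later). The sub named_capture_slots and the foo/bar pattern are made up for illustration and are not part of the solution above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the post's solution: match a
# two-branch pattern whose groups are both named C, then report what
# %+ and %- hold for that name.
sub named_capture_slots {
    my ($text) = @_;
    if ( $text =~ m/ (?<C> foo ) | (?<C> bar ) /x ) {
        # %+ gives the leftmost *defined* group named C.
        # %- gives an array ref with one slot per group named C.
        return ( $+{C}, [ @{ $-{C} } ] );
    }
    return;
}

my ( $value, $slots ) = named_capture_slots('bar');
print "value: $value\n";
print "slots: ", join( ', ', map { $_ // 'undef' } @$slots ), "\n";
```

If the second branch matches, the first slot in %- should be undefined and $+{C} should still hold the captured text, which is what sub match relies on.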

Update: Having a little fun with this. Here are two more options with subtle changes from the previous.

The next example eliminates named captures. That would normally present a problem: the numeric match variable that receives the capture could be $1, $2, or $3, depending on which branch matched. choroba avoids this issue by concatenating all possible numeric match variables, but that means possibly interpolating undef, and feels a little dirty (but it is clever). We can avoid that by using $^N, which contains the text of the most recently closed capture group.

    sub match {
        shift =~ m/
            (?: [^(]*\(([^)]*?)\) )    # Both parens.
          | (?: \((.*)$ )              # Left paren.
          | (?: ^(.*)\) )              # Right paren.
          | (?: ^[^()]*()$ )           # No parens (empty capture).
        /x;
        return $^N // ();
    }
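To see why the numbered variables are awkward here, this small sketch reports which group actually captured, alongside $^N. The sub which_group and its cut-down, unanchored three-branch pattern are hypothetical and only meant to show the numbering; it is not a replacement for sub match.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: return the number of the group
# that captured, together with $^N, for a cut-down three-branch pattern.
sub which_group {
    my ($text) = @_;
    if ( $text =~ m/ \( ([^)]*) \) | \( (.*) | (.*) \) /x ) {
        # Only the group in the branch that matched is defined.
        my $n = defined $1 ? 1 : defined $2 ? 2 : 3;
        return ( $n, $^N );
    }
    return;
}

for my $text ( '(both)', '(left only', 'right only)' ) {
    my ( $n, $last ) = which_group($text);
    print "$text => group $n, \$^N = '$last'\n";
}
```

Each input lands in a different numbered group, but $^N holds the captured text in every case, so the caller never has to work out which of $1, $2, or $3 was filled.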

This next one wraps all the alternation branches in the (?|...) branch reset construct. That means that each alternate will use the same $1, which is actually the closest I can come to the multiple-regex solutions I originally presented, but within a single regex.

    sub match {
        shift =~ m/
            (?|
                (?: [^(]*\(([^)]*?)\) )    # Both parens.
              | (?: \((.*)$ )              # Left paren.
              | (?: ^(.*)\) )              # Right paren.
              | (?: ^[^()]*()$ )           # No parens (empty capture).
            )
        /x;
        return $1 // ();
    }
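If the branch reset construct is new to you, this tiny side-by-side sketch shows what it changes. The input 'bar' and the foo/bar patterns are made up for illustration; (?|...) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $text = 'bar';    # hypothetical input

# Plain alternation: groups are numbered left to right across branches,
# so a match in the second branch fills $2 and leaves $1 undefined.
my @plain = $text =~ m/ (foo) | (bar) /x;

# Branch reset: numbering restarts in every branch, so both are group 1.
my @reset = $text =~ m/ (?| (foo) | (bar) ) /x;

print "plain: ", join( ', ', map { $_ // 'undef' } @plain ), "\n";
print "reset: ", join( ', ', map { $_ // 'undef' } @reset ), "\n";
```

The plain pattern returns two groups with the first undefined, while the branch reset pattern returns a single group holding 'bar'. That is why sub match can simply return $1.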

And finally we can remove the non-capturing (?:...) grouping parens, because alternation already has very low precedence:

    sub match {
        shift =~ m/
            (?|
                [^(]*\(([^)]*?)\)    # Both parens.
              | \((.*)$              # Left paren.
              | ^(.*)\)              # Right paren.
              | ^[^()]*()$           # No parens (empty capture).
            )
        /x;
        return $1 // ();
    }

I think that this, being Perl, grants us license to explore in the spirit of There is more than one way to do it. :)


Dave


In reply to Re: Regex to match text in broken parens by davido
in thread Regex to match text in broken parens by Rodster001
