Are you sure you want a match, and that you can only use a single regex? I have bad news... a regular language has no context. However, a regular expression and another tool or handful of tools can easily get you there. Take, for example, the substitution operator, a counter with a loop and some more regexes, or a regex match and a split on the match... Of course, feel free to use Text::Balanced as atcroft suggests or use some other toolset built for the level of the problem you're trying to solve. Regexes will only solve a subproblem of your problem.
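
Since Text::Balanced came up, here is a minimal sketch of how it might be applied to this problem. The helper name outside_parens is made up for illustration. extract_bracketed is the module's own function; called in list context it returns the bracketed text, the remainder, and the skipped prefix. Treat this as one possible approach under those assumptions, not as what atcroft had in mind.

```perl
#!perl
use strict;
use warnings;
use 5.12.0;
use Text::Balanced qw( extract_bracketed );

# outside_parens is a hypothetical helper name, not part of Text::Balanced.
sub outside_parens {
    my ( $text ) = @_;
    my $out = '';
    while ( length $text ) {
        # The prefix pattern '[^(]*' skips everything before the next '('.
        my ( $group, $rest, $prefix )
            = extract_bracketed( $text, '()', '[^(]*' );
        unless ( defined $group ) {
            $out .= $text;    # no balanced group left; keep the rest untouched
            last;
        }
        $out .= $prefix;      # keep what came before the balanced group
        $text = $rest;        # continue after it
    }
    return $out;
}

say outside_parens( 'before (within (nested)) after' );
```

Unlike the regex examples, this sketch leaves unbalanced parentheses, and everything after them, in place rather than guessing what to remove.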

Here's the data file for the following examples.

THIS IS OUTSIDE (THIS IS INSIDE)
(inside) outside
before (within) after
before (within) between (within again) after
b ((nested)) a
before (within (nested)) after
This one hangs (with an unmatched open
This one has () an empty pair
This opens (with one)) and double closes
this is the last (really) one

Now here's the first example, using the substitution operator:

#!perl
use strict;
use warnings;
use 5.12.0;

my $cleanup = 1;

while ( <> ) {
    chomp;
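    s/\(+.*?\)+//g;                        # drop each parenthetical, plus any run of extra closers
    y/ / /s, s/^\s+|\s+$//g if $cleanup;   # squeeze runs of spaces, then trim both ends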
    say;
}

The above code produces the following output by replacing each pair of parentheses, along with anything between them, with the empty string. As written, it eliminates matched pairs and their contents, also eliminates an extra closing parenthesis, and leaves an opened but never closed parenthetical in the output:

THIS IS OUTSIDE
outside
before after
before between after
b a
before after
This one hangs (with an unmatched open
This one has an empty pair
This opens and double closes
this is the last one

Or, if you prefer to preserve whitespace as it was, set $cleanup to 0.
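
One caveat with the pattern above: it pairs an opening run with the first closing run it meets, so nesting such as 'a (b (c) d) e' (a made-up line, not in the data file) is only partly removed. A hedged sketch of one alternative, in the spirit of "a loop and some more regexes", is to strip innermost pairs repeatedly until none remain:

```perl
#!perl
use strict;
use warnings;
use 5.12.0;

# Hypothetical sample line; it is not part of the data file above.
my $line = 'a (b (c) d) e';

# The pattern from the first example stops at the first closing parenthesis.
( my $first_try = $line ) =~ s/\(+.*?\)+//g;

# Remove innermost pairs (no parentheses inside) until nothing changes.
my $stripped = $line;
1 while $stripped =~ s/\([^()]*\)//g;

say $first_try;
say $stripped;
```

The loop version only ever removes balanced pairs, so an unmatched parenthesis of either kind stays in the output.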

This next example produces mostly the same output as the $cleanup = 0 version of the above. It does so by splitting the string into an array of characters and tracking the nesting level of the parentheses. A character is appended to the output string only while the nesting level is 0 (outside of any pair of parentheses). For a hanging open parenthesis that is never closed, output stops just before the open. As written, it also drops text at negative nesting levels (text trailing an extra close unmatched by an open).

#!perl
use strict;
use warnings;
use 5.12.0;

while ( <> ) {
    chomp;
    my $str = '';
    my @parts = split //;
    my $inside = 0;
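    for ( @parts ) {
        # Post-increment and post-decrement return the old value, so `next`
        # runs only when the old depth was nonzero; the depth test below
        # handles the remaining cases.
        /\(/ && $inside++ && next;
        /\)/ && $inside-- && next;
        $str .= $_ unless $inside;    # keep only characters at depth 0
    }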
    say $str;
}

THIS IS OUTSIDE
 outside
before  after
before  between  after
b  a
before  after
This one hangs
This one has  an empty pair
This opens
this is the last  one

Or if you want to feed the match from one regex match into a split on that match...

#!perl
use strict;
use warnings;
use 5.12.0;

while ( my $str = <> ) {
    chomp $str;
    my $extract = join '|', map { "\Q$_\E" } ( $str =~ m/(\(+.*?\)+)/g );
    say join '', split /$extract/, $str;
}

The above works because we know what we want to eliminate, which is a good use for split. In this particular case, we don't have a fixed regex against which to split, but we know how to match what we don't want. This solution captures that unwanted part, quotes it with \Q and \E, joins any multiples with the regex alternation (pipe, or '|'), then uses split and join to reassemble what's left as a single string. Since it reuses the regex from the first example, it removes the same text: matched pairs, their contents, and any extra closing parenthesis that immediately follows a pair.

This is basically emulating the substitution operator. One's intuition may be that since it's a more detailed treatment it'll be faster. However, this is Perl, and the substitution operator is highly optimized. I don't know without benchmarking by how much, but I'm betting the example with the s/// is faster.
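
For anyone who wants to test that bet, here is a rough sketch using the core Benchmark module. The wrapper names via_subst and via_split are made up for illustration, and the single sample line is only one data point; no timings are claimed here.

```perl
#!perl
use strict;
use warnings;
use 5.12.0;
use Benchmark qw( cmpthese );

# Sample line taken from the data file above.
my $line = 'before (within) between (within again) after';

# via_subst and via_split are hypothetical wrapper names for the two approaches.
sub via_subst {
    my ( $str ) = @_;
    $str =~ s/\(+.*?\)+//g;
    return $str;
}

sub via_split {
    my ( $str ) = @_;
    my $extract = join '|', map { quotemeta $_ } ( $str =~ m/(\(+.*?\)+)/g );
    return $str unless length $extract;    # nothing to split on
    return join '', split /$extract/, $str;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'substitution' => sub { via_subst( $line ) },
    'split_join'   => sub { via_split( $line ) },
} );
```

Results will vary with the perl build and the input, so it is worth feeding in lines that look like your real data before drawing conclusions.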

The second example above is fairly easy to tweak to give the sort of error messages you might expect out of a lexer, since it is kind of a degenerate case of one.

#!perl
use strict;
use warnings;
use 5.12.0;

my $inside = 0;
while ( <> ) {
    chomp;
    my $str = '';
    my @parts = split //;
    $inside = 0;
    for ( @parts ) {
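        0 > $inside && $inside++ && warn "WARNING: extra close on line $.\n";
        /\(/ && $inside++ && next;
        /\)/ && $inside-- && next;
        $str .= $_ unless $inside > 0;
    }
    # An extra close at the very end of the line leaves $inside at -1.
    warn "WARNING: extra close on line $.\n" if $inside < 0;
    warn "WARNING: Unclosed parenthetical on line $.\n" if $inside > 0;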
    say $str;
}

THIS IS OUTSIDE
 outside
before  after
before  between  after
b  a
before  after
WARNING: Unclosed parenthetical on line 7
This one hangs
This one has  an empty pair
WARNING: extra close on line 9
This opens ) and double closes
this is the last  one
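
Taking the lexer idea one small step further, the warnings could also say where on the line the problem is. The following is only a sketch: check_parens is a hypothetical helper name, and it reports problems without producing the stripped text.

```perl
#!perl
use strict;
use warnings;
use 5.12.0;

# check_parens is a hypothetical helper; it returns a list of problem messages.
sub check_parens {
    my ( $line ) = @_;
    my ( @open, @problems );
    while ( $line =~ /([()])/g ) {
        my $col = pos( $line );    # 1-based column of the parenthesis just matched
        if    ( $1 eq '(' ) { push @open, $col }
        elsif ( @open )     { pop @open }
        else                { push @problems, "extra close at column $col" }
    }
    # Anything still on the stack was opened and never closed.
    push @problems, "unclosed open at column $_" for @open;
    return @problems;
}

say for check_parens( 'This opens (with one)) and double closes' );
say for check_parens( 'This one hangs (with an unmatched open' );
```

Keeping a stack of open positions instead of a bare counter is what makes it possible to point at the specific unclosed parenthesis.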

In reply to Re: Regex for outside of brackets by mr_mischief
in thread Regex for outside of brackets by theravadamonk
