in reply to Regular Expressions: Call for Examples

Well, I went trudging around and I found these:

The first one came from a client who accidentally ran something like s/\n//g; over a few hundred text files. The files happened to be lists:

1) foo and his friend bar
2) Stuff and more (stuff)
3) More (and30) more (and) more
4) garbage (8)
5) some other things
6) (7lalala)
which turned into:
1) foo and his friend bar2) Stuff and more (stuff)3) More (and30) more (and) more4) garbage (8)5) some other things6) (7lalala)
Anyway, it was my job to fix them. I had a hard time parsing them correctly, but ended up solving the problem with your sexeger technique (reverse the string, run a regex written for the reversed text, then reverse the result back):
$text = reverse $copy;
$text =~ s/ (?<= \) ) (\d+) (?= [^()]* \) ) /$1\n/gx;
$text = reverse $text;
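To check the sexeger version end to end, here is a self-contained script that runs the same substitution on the damaged sample above. The wrapper name fix_with_sexeger is a hypothetical packaging of my own; the regex itself is unchanged. Tracing it by hand, it should restore the original six lines without adding a newline in front of "1)", because in the reversed text nothing follows that label.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the sexeger fix from the post.
sub fix_with_sexeger {
    my ($copy) = @_;

    # Reverse the string, so each label "N)" reads ")N" and the regex can
    # inspect what came *before* the label in the original text.
    my $text = reverse $copy;

    # Digits right after ")" (the label's own paren), followed by non-paren
    # text and then ")". In original order: reading leftwards from the
    # label, a ")" turns up before any "(", so digits inside "(and30)" or
    # "(8)" are left alone. The newline goes after the reversed digits,
    # which puts it before the label once the string is reversed back.
    $text =~ s/ (?<= \) ) (\d+) (?= [^()]* \) ) /$1\n/gx;

    # Back to normal reading order.
    return scalar reverse $text;
}

my $damaged = '1) foo and his friend bar2) Stuff and more (stuff)'
            . '3) More (and30) more (and) more4) garbage (8)'
            . '5) some other things6) (7lalala)';

my $fixed = fix_with_sexeger($damaged);
print "$fixed\n";
```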
A few weeks later, for fun, I was able to solve the problem using a forward regex:
my $bal;  # declared first, so the (??{$bal}) below refers to this variable
$bal = # this is from perlre
    qr/ \( (?: (?> [^()]+ ) | (??{$bal}) )* \) /x;
$text =~ s/ ( (?: (??{$bal}) [^(\d]* )* ) (\d+) (?= \) ) /$1\n$2/xg;

That is exponentially uglier, and proves just how useful sexeger really is.
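For comparison, a self-contained run of the forward version on the same sample. The one deliberate change from the code in the post is declaring $bal before assigning it: with "my $bal = qr/...(??{$bal}).../" the inner $bal would not yet refer to the new lexical. By my reading of the regex, the repeated group may match zero times, so it also fires on the very first "1)" and the output starts with a newline that the sexeger version does not produce.

```perl
use strict;
use warnings;

# Balanced parentheses, after the recursive example in perlre.
# $bal must already be declared when the qr// below is compiled.
my $bal;
$bal = qr/ \( (?: (?> [^()]+ ) | (??{ $bal }) )* \) /x;

my $text = '1) foo and his friend bar2) Stuff and more (stuff)'
         . '3) More (and30) more (and) more4) garbage (8)'
         . '5) some other things6) (7lalala)';

# Group 1 lets the match step over whole parenthesized groups (and the
# non-digit, non-"(" text after each one) before reaching the label.
# Group 2 is the label's digits, which must be followed by ")".
$text =~ s/ ( (?: (??{ $bal }) [^(\d]* )* ) (\d+) (?= \) ) /$1\n$2/xg;

print "$text\n";
```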

Another possibly useful example came up on IRC, where someone needed to perform a crude form of escaping that involved stripping all backslashes between square brackets. This solved his problem:

$text =~ s/ (?<= \[ ) ([^\]]*) (?= \] ) / strip_slash($1) /gex;
sub strip_slash { (my $s = shift) =~ s/\\//g; $s }   # works on a copy, so the caller's $_ is left alone
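On Perl 5.14 or later the helper sub can be dropped entirely: tr///r returns a modified copy and leaves $1 untouched. This is an alternative I'm suggesting, not what was actually used on IRC, and the sample string is made up.

```perl
use strict;
use warnings;

# Made-up input: backslashes inside [...] should go, the others stay.
my $text = 'a\\b [c\\d\\e] f\\g [\\h]';

# Same match as the strip_slash version. /e evaluates the replacement as
# code, and tr{\\}{}dr returns a copy of $1 with every backslash deleted
# (the /r flag needs Perl 5.14+).
$text =~ s/ (?<= \[ ) ([^\]]*) (?= \] ) / $1 =~ tr{\\}{}dr /gex;

print "$text\n";   # a\b [cde] f\g [h]
```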

Finally, there's a bunch of material at the end of Parsing with Perl 6 that you might find useful.

Re: Re: Regular Expressions: Call for Examples
by converter (Priest) on Jul 22, 2002 at 08:39 UTC

    Since I'm not yet the regex master I aspire to be, I can't authoritatively state that this solution is better, but it seems to work.

    If you're working with ordered item labels, you can make your assertion more specific:

    $n = 1; s/((??{$n+1})\))(?{$n++})/\n$1/g;

    The first iteration matches "2)" and replaces it with "\n2)", the second "3)", and so on.
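    A self-contained run of this counter approach on the damaged sample from the parent post. The sample string is copied from there; the substitution is as posted, with use strict and my added. By my trace it needs no special case for "1)", because the first label it looks for is 2.

```perl
use strict;
use warnings;

my $text = '1) foo and his friend bar2) Stuff and more (stuff)'
         . '3) More (and30) more (and) more4) garbage (8)'
         . '5) some other things6) (7lalala)';

my $n = 1;   # the label handled last; the next one to find is $n+1

# (??{ $n + 1 }) turns the counter into the pattern to match,
# (?{ $n++ }) bumps the counter once a label has matched, and /g makes
# each new search start where the previous match ended.
$text =~ s/((??{ $n + 1 })\))(?{ $n++ })/\n$1/g;

print "$text\n";
```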

    conv

    Update: I should know better than to post when I'm tired. Someone just pointed out to me that it would be much neater to do:

    $n = 2; ++$n while s/$n(?=\))/\n$n/

    Thanks, Aristotle, you're right. The while loop substitution isn't equivalent, because it can make replacements in any order, at any position in the string, whereas the /g substitution I originally posted only moves forward through the string.

      Actually, they are not interchangeable: the latter loses the "ordered items" assumption. Observe what they do with this input:

      2) bar 3) asfgh 7) lorem 6) ipsum 1) foo 5) baz 4) blah
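      To reproduce the difference, here are both versions run on the scrambled input. Only the variable names are new. By my trace, the /g version stops after "4)" because no "5)" follows it, while the while loop keeps finding labels anywhere in the string; both also put a newline in front of the leading "2)".

```perl
use strict;
use warnings;

my $input = '2) bar 3) asfgh 7) lorem 6) ipsum 1) foo 5) baz 4) blah';

# The (??{ }) version: s///g only ever searches forward from the last match.
my $forward = $input;
my $n = 1;
$forward =~ s/((??{ $n + 1 })\))(?{ $n++ })/\n$1/g;

# The while-loop version: every s/// starts again at the beginning.
my $anywhere = $input;
my $m = 2;
++$m while $anywhere =~ s/$m(?=\))/\n$m/;

print "forward:\n$forward\n\nwhile loop:\n$anywhere\n";
```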

      I tried fixing that using \G, but didn't come up with anything useful in 5 minutes, and gave up since it would have been a lot more complicated than your first regex, which I believe is just perfect.
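      Not a \G solution, but one way to keep the while-loop shape without losing the ordering: remember where the last label was found and start the next search there, using pos() and the @- array. The sub name split_in_order is hypothetical, and I've only traced it by hand on the two sample inputs from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: while-loop style, but each search starts where the
# previous label was found, so labels are only taken in ascending order.
sub split_in_order {
    my ($text) = @_;
    my $n   = 2;   # first label that should start a new line
    my $pos = 0;   # never look for a label before this offset
    while (1) {
        pos($text) = $pos;                # m//g in scalar context starts here
        last unless $text =~ /$n(?=\))/g;
        my $at = $-[0];                   # offset where the label starts
        substr($text, $at, 0) = "\n";     # insert the newline before it
        $pos = $at + 1 + length($n);      # skip past "\n" and the digits
        $n++;
    }
    return $text;
}

my $scrambled = '2) bar 3) asfgh 7) lorem 6) ipsum 1) foo 5) baz 4) blah';
my $damaged   = '1) foo and his friend bar2) Stuff and more (stuff)'
              . '3) More (and30) more (and) more4) garbage (8)'
              . '5) some other things6) (7lalala)';

print split_in_order($_), "\n---\n" for $scrambled, $damaged;
```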

      japhy: I like the scenario presented here. This is a regex (series) I'd propose you pick up: it's simple in premise, not far from something one might actually have to do one day, and not hard even for a novice to follow, including the subtle differences between the approaches. A perfect teaching example, if you ask me.

      Makeshifts last the longest.