I'd really like to be able to put a bit more logic into my regexes. At the moment, I guess I can do this with the (??{ ... }) assertion.

I recently answered a question about inserting newlines every 80 characters, or at the end of the previous word (if we're in the middle of one). What I wanted my solution to be was something like:

That's not as easy as it seems. In a couple of minutes, I'm going to attempt to write a solution that does this. And then you'll see why I'd like to be able to go to a specific point in the regex, or at least be able to break out of a quantifier.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: goto() in regexes
by chipmunk (Parson) on Aug 27, 2001 at 19:21 UTC
    Here's a simple substitution that inserts a newline every 80 characters, or at the end of the previous word. ('word' here is a sequence of non-whitespace characters, since that's usually what you want for wrapping.)
    s/\G(.{1,80})(?<=\S)\s+/$1\n/gm;
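To see what that substitution does, here is a minimal sketch that applies it with the 80 swapped for a width parameter. `wrap_at` is a hypothetical helper name, and the sample string is the 20-character test text japhy uses elsewhere in this thread; this is an illustration under those assumptions, not chipmunk's own test.

```perl
use strict;
use warnings;

# wrap_at() is a hypothetical helper name. The substitution inside is the
# one quoted above, with the literal 80 replaced by a $width parameter.
sub wrap_at {
    my ($text, $width) = @_;
    # Take up to $width characters ending in a non-space character, then
    # replace the whitespace run that follows with a single newline.
    $text =~ s/\G(.{1,$width})(?<=\S)\s+/$1\n/gm;
    return $text;
}

my $sample = "this is a long string, and I want to insert newlines every 20 chars";
print wrap_at($sample, 20), "\n";
```

Because the pattern only breaks where whitespace follows, text with no trailing whitespace can have its tail split earlier than strictly needed: here "every 20 chars" would fit on one line but is broken after "every 20".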
      I don't think the look-behind assertion and anchor are necessary (the anchor actually causes it to terminate early on certain boundary cases with newlines):
      s/(.{0,79}\S\s+)/$1\n/gm
      ... I could be wrong :)
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
      Oh, ****.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: goto() in regexes
by japhy (Canon) on Aug 27, 2001 at 17:27 UTC
    Here's what I got to work. It's pretty self-explanatory.
    #!/tmp/bleadperl/bin/perl -w

    use re 'eval';

    my $re;
    my $len = 20;

    $_ = "this is a long string, and I want to insert newlines every 20 chars";

    $re = qr{
      (??{ '\b\w{1,' . ($len - ($+[0] - $-[0])) . '}\b' })
      (?(?{ ($+[0] - $-[0]) < $len })
        (??{ '\W{0,' . ($len - ($+[0] - $-[0])) . '}' })
        (?(?{ ($+[0] - $-[0]) < $len })
          (?(?= (??{ '\b\w{1,' . ($len - ($+[0] - $-[0])) . '}\b' }))
            (??{ $re })
          )
        )
      )
    }x;

    s{($re)}{$1\n}xg;
    print;
    What I want to be able to say is something like:
    $re = qr{
      (??{ '\b\w{1,' . ($len - ($+[0] - $-[0])) . '}\b' })
      (?(?{ ($+[0] - $-[0]) < $len })
        (??{ '\W{0,' . ($len - ($+[0] - $-[0])) . '}' })
      )
      (?(?{ ($+[0] - $-[0]) == $len })
        (?&done)
      )
    }x;

    s{($re+)(?%done)}{$1\n}xg;
    That code creates two new regex assertions, (?%LABEL) and (?&LABEL). The first defines a position in the regex, and the second forcibly moves the regex engine to that position. Now, perhaps this is too much power to wield, but I like power. If people don't like this, then I would do something like:
    $re = qr{
      (??{ '\b\w{1,' . ($len - ($+[0] - $-[0])) . '}\b' })
      (?(?{ ($+[0] - $-[0]) < $len })
        (??{ '\W{0,' . ($len - ($+[0] - $-[0])) . '}' })
      )
      (?(?{ ($+[0] - $-[0]) == $len })
        (?;)
      )
    }x;

    s{($re+)}{$1\n}xg;
    Here, the (?;) assertion means "jump out of the enclosing quantifier". This means that we're making a while loop out of the quantifier:
    while (1) {
      do_pattern($re);
      if ($seen_qu_semi) { last }
    }
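The loop reading can be tried out today by moving the quantifier outside the regex. The following is only a sketch under assumptions: `wrap_by_loop` is a hypothetical name, the proposed (?;) is stood in for by an ordinary `last`, and the two small patterns are simplified versions of the word and non-word pieces above, anchored with \G instead of measuring the match with @- and @+.

```perl
use strict;
use warnings;

# wrap_by_loop() is a hypothetical name, not part of perl itself.
sub wrap_by_loop {
    my ($text, $len) = @_;
    my $out = '';
    pos($text) = 0;
    while (pos($text) < length($text)) {
        my $line = '';
        while (1) {
            # Leave the loop once the line is full: the job (?;) would do.
            last if length($line) == $len;
            my $room = $len - length($line);
            # One whole word that still fits in the remaining room, or stop.
            last unless $text =~ /\G(\w{1,$room})\b/gc;
            $line .= $1;
            last if length($line) == $len;
            $room = $len - length($line);
            # Then as many separator characters as still fit (possibly none).
            $line .= $1 if $text =~ /\G(\W{0,$room})/gc;
        }
        if ($line eq '') {
            # No word fits at this position: emit the rest and give up.
            $out .= substr($text, pos($text)) . "\n";
            last;
        }
        $out .= "$line\n";
    }
    return $out;
}

print wrap_by_loop("this is a long string, and I want to insert newlines every 20 chars", 20);
```

Unlike the pure-regex version, the exit condition here is plain Perl, which is roughly the readability that a break-out-of-quantifier assertion is meant to buy inside the pattern. The separators stay at the end of each line rather than being replaced.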
    Again, perhaps this is too much power. But I really like this power. However, perhaps I could simply develop some rules by which to munge the label/goto into an ugly format; or at least the out-of-quant into an ugly format.

    In the words of Tim "The Tool Man" Taylor, "more power! arr arr arr!"


    Update: I found that I could write it like so:

    $re = qr{
      (??{ '\b\w{1,' . ($len - ($+[0] - $-[0])) . '}\b' })
      (?(?{ ($+[0] - $-[0]) < $len })
        (??{ '\W{0,' . ($len - ($+[0] - $-[0])) . '}' })
      )
    }x;

    s{
      (
        (?:
          (?(?{ ($+[0] - $-[0]) == $len })(?!))
          $re
        )+
      )
    }{$1\n}xg;
    That seems easier to read. Plainer. More logical. It's a real while loop. I like it. Perhaps I'll create a filter.
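The guard at the front of that group, a code condition whose yes-branch is (?!), can be shown in isolation. This is a minimal sketch assuming Perl 5.18 or later (so the code block closes over `$limit` correctly); `take_word_prefix` is a hypothetical helper, and it uses pos() inside the code block rather than @- and @+ to measure progress.

```perl
use strict;
use warnings;

# take_word_prefix() is a hypothetical helper: it captures leading word
# characters, but the guard ends the repetition once $limit is reached.
sub take_word_prefix {
    my ($string, $limit) = @_;
    return $string =~ /
        ^(
            (?:
                (?(?{ pos() >= $limit })(?!))   # guard: fail this pass when full
                \w                              # loop body: one word character
            )+
        )
    /x ? $1 : undef;
}

print take_word_prefix("abcdefg", 3), "\n";
```

A plain \w{1,3} would do the same job here; the point is only that the condition is arbitrary Perl code checked before each pass of the quantifier, which is what makes it behave like the test of a while loop.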
