putnik has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have to build a substitution regexp, but this case is too hard for me.
We have data like this:
1stline....pattern1.......pattern1.....
2ndline....pattern1.....pattern2...pattern1...
1a line....pattern1.......pattern1.....
2a line....pattern1........pattern2...pattern1.....
"\n" and "." are just filler and don't matter (a "\n" can appear anywhere, so I can't split the data into lines and process them separately). The line numbers are just for reference; the real data may mix lines of the 1st and 2nd type in random order.

What we need to get:
1stline....pattern1.......pattern1.....
2ndline....subst...
1a line....pattern1.......pattern1.....
2a line....subst.....
I realize I should do it with something like
s/pattern1(not-pattern1)*?pattern2(not-pattern1)*?pattern1/subst/isg
but I got stuck.
What I already tried:

pattern1([^p][^a][^t]...[^1])*?pattern2([^p][^a][^t]...[^1])*?pattern1
pattern1.(pattern1){0}*?pattern2.(pattern1){0}*?pattern1
and so on.
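A short sketch of why the per-character class attempt cannot express "not pattern1": each negated class constrains one position on its own, so the sequence means "characters that differ from the pattern at every single position", not "anything except the whole pattern". The 3-character word 'cat' below is a hypothetical stand-in for pattern1, and the negative-lookahead line is shown only for contrast.

```perl
use strict;
use warnings;

# 'cat' is a hypothetical 3-character stand-in for pattern1.
my %result;
for my $chunk (qw(dog cab cat)) {
    # Per-position negation, in the style of the [^p][^a]... attempt:
    # rejects "cab" just because its first character is "c".
    my $per_char = $chunk =~ /\A[^c][^a][^t]\z/ ? 'yes' : 'no';

    # Whole-word negation: any three characters, as long as they are not "cat".
    my $not_cat = $chunk =~ /\A(?!cat).{3}\z/s ? 'yes' : 'no';

    $result{$chunk} = "$per_char/$not_cat";
    printf "%s: per-char class %s, negative lookahead %s\n", $chunk, $per_char, $not_cat;
}
```

Only "cat" itself should be rejected; the per-character version also rejects "cab".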
Of course, I also tried:
O'Reilly Perl bookshelf
man perlretut
I also tried searching, but (probably because of my English) I didn't find anything helpful.

perl, v5.8.6, Linux.

Please help. Thank you very much.

Re: How can I build regexp with "not" assertion?
by Erez (Priest) on Feb 17, 2008 at 14:20 UTC

    Perl supports negative lookbehind and negative lookahead, documented under EXTENDED CONSTRUCTS in perlreref and Extended Patterns in perlre. Lookbehind is somewhat limited in scope (it must be fixed-width, i.e. no variable-length quantifiers), but lookahead isn't.
    Also, in a regex you might be better off stating what you *are* looking for and ignoring the rest.

    UPDATE: Fixed perldoc links, thanks kyle!
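A minimal sketch of the two constructs mentioned above, on a made-up string (the string and variable names are illustrative only). The unbounded lookbehind is expected to be rejected at compile time; the exact error text differs between Perl versions, so the sketch only checks that compilation fails.

```perl
use strict;
use warnings;

my $text = "foo bar foobar";

# Negative lookahead: "foo" that is NOT immediately followed by "bar".
my @foo_starts;
while ( $text =~ /foo(?!bar)/g ) {
    push @foo_starts, pos($text) - length 'foo';
}

# Negative lookbehind: "bar" that is NOT immediately preceded by "foo".
# The lookbehind body here is fixed-width (exactly three characters).
my @bar_starts;
while ( $text =~ /(?<!foo)bar/g ) {
    push @bar_starts, pos($text) - length 'bar';
}

# A lookbehind with an unbounded quantifier is refused when compiled.
# The pattern is interpolated so the error happens at run time, inside eval.
my $variable = '(?<!fo+)bar';
my $compiled = eval { my $re = qr/$variable/; 1 };

print "foo(?!bar) matched at: @foo_starts\n";
print "(?<!foo)bar matched at: @bar_starts\n";
print 'variable-length lookbehind ', ( $compiled ? 'compiled' : 'was rejected' ), "\n";
```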

    Software speaks in tongues of man.
    Stop saying 'script'. Stop saying 'line-noise'.
    We have nothing to lose but our metaphors.

      Thank you, I'll go and read up on it, and come back if I'm missing something :)
Re: How can I build regexp with "not" assertion?
by parv (Parson) on Feb 17, 2008 at 12:58 UTC

    Given your example, if you are trying to do substitutions on all the lines that look like the 2-numbered lines, just ignore the "not" parts ...

    $line =~ s/(pattern1).+?pattern2.+?\1/substitute/;

    ... The above will work because "pattern2" does not appear (per your example) in any of the 1-numbered lines, so they remain unchanged.

      Thank you, but sorry, it works the wrong way. I tried this and got
      1stline....subst...
      1a line....subst.....
      That is, it substituted everything inside the parentheses here:
      1stline....(pattern1.......pattern1.....
      2ndline....pattern1.....pattern2...pattern1)...
      I CAN'T split it into lines; I must treat all the data as a single string because \n may appear anywhere.
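A minimal sketch reproducing the behaviour described above, using the sample data from the question held as one string. It assumes the /g and /s modifiers were used (the post doesn't say which modifiers were applied). The lazy `.+?` stops as early as it can, but it is still allowed to step over a pattern1 when that is the only way to reach a pattern2, so the first match starts at the first pattern1 on line 1.

```perl
use strict;
use warnings;

# Sample data from the question, kept as one string with embedded newlines.
my $data = "1stline....pattern1.......pattern1.....\n"
         . "2ndline....pattern1.....pattern2...pattern1...\n"
         . "1a line....pattern1.......pattern1.....\n"
         . "2a line....pattern1........pattern2...pattern1.....\n";

# parv's substitution applied to the whole string (/g and /s are assumed).
# Nothing stops .+? from crossing another "pattern1" on its way to "pattern2".
( my $lazy = $data ) =~ s/(pattern1).+?pattern2.+?\1/subst/gs;
print $lazy;
```

This prints the two-line result quoted above, with lines 1 and 2 (and 1a and 2a) merged into single substitutions.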
Re: How can I build regexp with "not" assertion?
by johngg (Canon) on Feb 17, 2008 at 17:47 UTC
    Rather than using a pure regex solution, you can find all of the starting offsets of pattern1 and pattern2 and then use List::Util::first (along with reverse) to find which pairs of pattern1 bracket each pattern2. Once you have that you can use substr to replace the text working from the end of the string backwards so the offsets aren't invalidated as the string changes.

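    use strict;
    use warnings;
    use List::Util q{first};

    my $patt1   = q{ABC};
    my $patt2   = q{XYZ};
    my $replace = q{999999};

    my $string = <<EOD;
    kjdfjdfXYZewfkfABClkjfef
    sahasjABCsjhksfhABCsjsjfs
    oreweouABCkerjeXYZewfkfABClkjfef
    xcvmvbbbvABCdjfABCjsdjfsdf
    jjnnjfABDjfXYZdjdjABCjfdkjfABClsfj
    isosiXYZcsfsjfABChfdhgfABCyeryerXYZffjfs
    EOD

    print $string, q{-} x 25, qq{\n};

    # Starting offsets of every pattern1 and every pattern2.
    my @patt1Posns = ();
    push @patt1Posns, ( pos( $string ) - length $patt1 )
        while $string =~ m{\Q$patt1}g;

    my @patt2Posns = ();
    push @patt2Posns, ( pos( $string ) - length $patt2 )
        while $string =~ m{\Q$patt2}g;

    # For each pattern2, the nearest pattern1 before and after it gives
    # the [ offset, length ] of the text to replace.
    my @substituteSets = ();
    foreach my $patt2Posn ( @patt2Posns )
    {
        # A pattern1 at offset 0 is valid, so test definedness, not truth.
        my $start = first { $_ < $patt2Posn } reverse @patt1Posns;
        my $end   = first { $_ > $patt2Posn } @patt1Posns;
        next unless defined $start and defined $end;

        # Skip a span overlapping the previous one (e.g. two pattern2s
        # between the same pair of pattern1s), as s///g would.
        next if @substituteSets
            and $start < $substituteSets[ -1 ]->[ 0 ] + $substituteSets[ -1 ]->[ 1 ];

        push @substituteSets, [ $start, $end - $start + length $patt1 ];
    }

    # Replace from the end backwards so earlier offsets stay valid.
    for my $raSubstituteSet ( reverse @substituteSets )
    {
        substr $string, $raSubstituteSet->[ 0 ], $raSubstituteSet->[ 1 ], $replace;
    }

    print $string, q{-} x 25, qq{\n};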

    This produces

    kjdfjdfXYZewfkfABClkjfef
    sahasjABCsjhksfhABCsjsjfs
    oreweouABCkerjeXYZewfkfABClkjfef
    xcvmvbbbvABCdjfABCjsdjfsdf
    jjnnjfABDjfXYZdjdjABCjfdkjfABClsfj
    isosiXYZcsfsjfABChfdhgfABCyeryerXYZffjfs
    -------------------------
    kjdfjdfXYZewfkfABClkjfef
    sahasjABCsjhksfhABCsjsjfs
    oreweou999999lkjfef
    xcvmvbbbvABCdjf999999jfdkjf999999hfdhgfABCyeryerXYZffjfs
    -------------------------

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Fixed typo. s/and/as/

Re: How can I build regexp with "not" assertion?
by ikegami (Patriarch) on Feb 17, 2008 at 18:29 UTC
    This will probably work, depending on what pattern1 and pattern2 really are:
    / pattern1 (?:(?!pattern1).)* pattern2 (?:(?!pattern1).)* pattern1 /x
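A minimal sketch applying this pattern to the sample data from the question as a single string. The helper name `replace_bracketed` is hypothetical; it treats pattern1 and pattern2 as literal strings (via `\Q...\E`) and adds /s so that `.` also matches "\n", since a newline may appear anywhere. If the real patterns are regexes rather than literal text, the `\Q...\E` would have to go.

```perl
use strict;
use warnings;

# Sample data from the question, as one string with embedded newlines.
my $data = "1stline....pattern1.......pattern1.....\n"
         . "2ndline....pattern1.....pattern2...pattern1...\n"
         . "1a line....pattern1.......pattern1.....\n"
         . "2a line....pattern1........pattern2...pattern1.....\n";

# Hypothetical helper: replaces every "$p1 ... $p2 ... $p1" span that has no
# other $p1 inside it. $p1 and $p2 are treated as literal strings.
sub replace_bracketed {
    my ( $text, $p1, $p2, $subst ) = @_;
    # (?:(?!\Q$p1\E).)* consumes one character at a time, refusing to step
    # onto a position where $p1 begins; /s lets "." match "\n" as well.
    $text =~ s/ \Q$p1\E (?:(?!\Q$p1\E).)* \Q$p2\E (?:(?!\Q$p1\E).)* \Q$p1\E /$subst/gsx;
    return $text;
}

my $result = replace_bracketed( $data, 'pattern1', 'pattern2', 'subst' );
print $result;
```

Only the 2-type lines are rewritten, which matches the output the question asks for.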
      THANK YOU! THANK YOU! THANK YOU!
      It works great!
      Good luck!
Re: How can I build regexp with "not" assertion?
by nikhil.patil (Sexton) on Feb 18, 2008 at 07:00 UTC
    I think what you need is:
    $line =~ s/(pattern1)((?!pattern1).)+pattern2.+?\1/substitute/gs;
    It worked well on Perl v5.8.8
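A sketch comparing this substitution with the tempered `(?:(?!pattern1).)*` version from the previous reply, both with /gs on the sample data. The second input, "pattern1pattern2pattern1", is a hypothetical edge case: `((?!pattern1).)+` and `.+?` each require at least one character between the patterns, while the `*` in the other version allows none.

```perl
use strict;
use warnings;

# Sample data from the question, as one string with embedded newlines.
my $data = "1stline....pattern1.......pattern1.....\n"
         . "2ndline....pattern1.....pattern2...pattern1...\n"
         . "1a line....pattern1.......pattern1.....\n"
         . "2a line....pattern1........pattern2...pattern1.....\n";

( my $ikegami = $data ) =~ s/pattern1(?:(?!pattern1).)*pattern2(?:(?!pattern1).)*pattern1/subst/gs;
( my $nikhil  = $data ) =~ s/(pattern1)((?!pattern1).)+pattern2.+?\1/subst/gs;
print $ikegami eq $nikhil ? "same result on the sample\n" : "results differ\n";

# Hypothetical edge case: nothing at all between the patterns.
my $tight = "pattern1pattern2pattern1";
( my $tight_ikegami = $tight ) =~ s/pattern1(?:(?!pattern1).)*pattern2(?:(?!pattern1).)*pattern1/subst/gs;
( my $tight_nikhil  = $tight ) =~ s/(pattern1)((?!pattern1).)+pattern2.+?\1/subst/gs;
print "$tight_ikegami\n$tight_nikhil\n";
```

On the sample both agree; whether the edge case matters depends on what pattern1 and pattern2 really are.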
      Thank you, your solution works too :) I haven't tested it thoroughly, but it looks suitable.