in reply to Re: Negating a regexp
in thread Negating a regexp

Not quite. I have a feeling more is not actually more. If so, this may fail. For example, if more is really less,
somestringbilless
doesn't get changed to
somestringHITless.

Re^3: Negating a regexp
by GrandFather (Saint) on Jan 31, 2006 at 23:18 UTC

    Modified regex to accommodate bil:

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        s/(.*ing)((?!bob|fred|bill(?!ess))\w+?)(more|less)/$1HIT$3/;
        print $_;
    }

    __DATA__
    somestringbobmoremore
    somestringfredmoremore
    somestringbillmoremore
    somestringtedmore
    somestringbilless

    Prints:

    somestringbobmoremore
    somestringfredmoremore
    somestringbillmoremore
    somestringHITmore
    somestringHITless
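
    A minimal sketch of why the inner lookahead matters, testing only the lookahead at the start of a string. The sub name is_rejected and the sample words are made up for illustration; the pattern is the lookahead from the substitution above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns 1 when the word starts with one of the
    # rejected names. "bill" only counts as rejected when it is not
    # immediately followed by "ess".
    sub is_rejected {
        my ($word) = @_;
        return $word =~ /^(?!bob|fred|bill(?!ess))/ ? 0 : 1;
    }

    for my $word (qw(billmore billess tedmore)) {
        printf "%-9s %s\n", $word, is_rejected($word) ? 'rejected' : 'allowed';
    }
    ```

    Since "billess" gets past the lookahead, \w+? is then free to take "bil" and leave "less" for the final group.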

    DWIM is Perl's answer to Gödel
      Thanks to everyone for the working solutions. To refine the original question, which I had simplified for the sake of clarity, consider:
      #!/usr/bin/perl -w
      use strict;

      while ( <DATA> ) {
          s/ angle brackets between element pairs / instead make square bracketed /
          print $_ . "\n";
      }

      __DATA__
      <sometag> my data here </sometag>
      <anothertag> further text </anothertag>
      <furthertag> not good as contains <this> in angle brackets </furthertag>
      <byetag> another possibility is <more> <than> one <angle pairs> in here </byetag>
      Since I can no longer specify (more|less) as fixed patterns (as per ikegami's hunch), I'm again struggling to apply what has been demonstrated to my actual data. Required output:
      <sometag> my data here </sometag>
      <anothertag> further text </anothertag>
      <furthertag> not good as contains [this] in angle brackets </furthertag>
      <byetag> another possibility is [more] [than] one [angle pairs] in here </byetag>

        That looks like XML. I'd seriously consider using XML::Twig!

        If you need some help, show us some representative data and what you actually want to extract.



        This violates your "no code" premise, but if your tags are all on one line, this seems reasonably robust and should be fairly efficient.

        #! perl -sw
        use strict;

        while( <DATA> ) {
            s[^(<([^>]+?)>)(.*)(</\2>)]{
                (my $x = $3) =~ tr[<>][[]];
                "$1$x$4";
            }e;
            print;
        }

        __DATA__
        <sometag> my data here </sometag>
        <anothertag> further text </anothertag>
        <furthertag> not good as contains <this> in angle brackets </furthertag>
        <byetag> another possibility is <more> <than> one <angle pairs> in here </byetag>
        <a tag with spaces> and content containing a false </a tag with spaces> and a real </a tag with spaces>

        Produces

        C:\Perl\test>junk2
        <sometag> my data here </sometag>
        <anothertag> further text </anothertag>
        <furthertag> not good as contains [this] in angle brackets </furthertag>
        <byetag> another possibility is [more] [than] one [angle pairs] in here </byetag>
        <a tag with spaces> and content containing a false [/a tag with spaces] and a real </a tag with spaces>

        Of course, it doesn't attempt to deal with attributes, or multi-line elements or nested tags or any of that good stuff.
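
        To illustrate the multi-line limitation, here is a rough sketch that wraps the same substitution in a sub (the name squarify and the sample strings are invented for this example). Without /s, the .* cannot cross a newline, so an element split over two lines is left untouched.

        ```perl
        use strict;
        use warnings;

        # Hypothetical wrapper around the substitution shown above.
        sub squarify {
            my ($line) = @_;
            $line =~ s[^(<([^>]+?)>)(.*)(</\2>)]{
                (my $x = $3) =~ tr[<>][[]];   # turn inner <...> into [...]
                "$1$x$4";
            }e;
            return $line;
        }

        # Element on a single line: the inner tag is rewritten.
        print squarify("<a> x <b> y </a>\n");

        # Element split across two lines: no match, so nothing changes.
        print squarify("<a> x <b>\n y </a>\n");
        ```

        Slurping the input and adding /s would get past the newline, but then the greedy .* and the ^ anchor would need rethinking as well, so treat this purely as a demonstration of the limit.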


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Negating a regexp
by GrandFather (Saint) on Jan 31, 2006 at 22:54 UTC

    It is just what OP asked for. If OP wanted to match more or less then:

    s/(.*ing)((?!bob|fred|bill)\w+?)(more|less)/$1HIT$3/

    does the trick.

    Note that OP says:

    I'm getting stuck specifying 'not' - OK with a character class (i.e. ^some chars) - but not when using parentheses to define multiple 'words'
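
    A small sketch of that difference, with made-up sub names and sample words: a negated character class excludes single characters, whereas a negative lookahead excludes whole alternatives at the current position and consumes nothing.

    ```perl
    use strict;
    use warnings;

    # Negated character class: rejects any word whose first character is b or f.
    sub class_allows {
        my ($w) = @_;
        return $w =~ /^[^bf]\w*$/ ? 1 : 0;
    }

    # Negative lookahead: rejects only words that start with bob, fred or bill.
    sub lookahead_allows {
        my ($w) = @_;
        return $w =~ /^(?!bob|fred|bill)\w+$/ ? 1 : 0;
    }

    for my $w (qw(bob barney ted)) {
        print "$w: class=", class_allows($w),
              " lookahead=", lookahead_allows($w), "\n";
    }
    ```

    Here "barney" is refused by the character class but allowed by the lookahead, which is the 'not these words' behaviour being asked for.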

      Nope, doesn't work.
      #!/usr/bin/perl -w
      use strict;

      while (<DATA>) {
          s/(.*ing)((?!bob|fred|bill)\w+?)(more|less)/$1HIT$3/;
          print $_;
      }

      __DATA__
      somestringbilless
      outputs
      somestringbilless
      and not
      somestringHITless

        Which is actually correct (bill is rejected, remember). Try somestringtedless instead (ted is not rejected - lucky ted).

