in reply to Re^3: Negating a regexp
in thread Negating a regexp

Thanks to everyone for the working solutions provided. I simplified the original post for the sake of clarity, so to refine it further, consider:

#!/usr/bin/perl -w
use strict;

while ( <DATA> ) {
    s/ angle brackets between element pairs / instead make square bracketed /
    print $_ . "\n";
}
__DATA__
<sometag> my data here </sometag>
<anothertag> further text </anothertag>
<furthertag> not good as contains <this> in angle brackets </furthertag>
<byetag> another possibility is <more> <than> one <angle pairs> in here </byetag>

As I can no longer specify (more|less) as fixed patterns (as per ikegami's hunch), I'm again struggling to apply what has been demonstrated to my actual data. Required output:

<sometag> my data here </sometag>
<anothertag> further text </anothertag>
<furthertag> not good as contains [this] in angle brackets </furthertag>
<byetag> another possibility is [more] [than] one [angle pairs] in here </byetag>

Re^5: Negating a regexp
by GrandFather (Saint) on Feb 01, 2006 at 10:34 UTC

    That looks like XML. I'd seriously consider using XML::Twig!

    If you need some help, show us some representative data and what you actually want to extract.


    DWIM is Perl's answer to Gödel
      It is, after a fashion: impossibly malformed XML. The data given above (and the specified output) is now fully representative. This tool is effectively a parser that turns some poor output into poor, but well-formed, XML. Although I'm keen to continue with this approach, does the solution lie in a different approach?

        Actually that is a lot simpler to solve assuming one "element" per line:

        use warnings;
        use strict;

        while (my $line = <DATA>) {
            # Assume one "element" per line
            if ($line =~ /<(\w+)>(.*?)<\/\1>/) {
                (my $midstr = $2) =~ tr/<>/[]/;
                $line = "<$1>$midstr</$1>\n";
            }
            print $line;
        }
        __DATA__
        <sometag> my data here </sometag>
        <anothertag> further text </anothertag>
        <furthertag> not good as contains <this> in angle brackets </furthertag>
        <byetag> another possibility is <more> <than> one <angle pairs> in here </byetag>

        Prints:

        <sometag> my data here </sometag>
        <anothertag> further text </anothertag>
        <furthertag> not good as contains [this] in angle brackets </furthertag>
        <byetag> another possibility is [more] [than] one [angle pairs] in here </byetag>

        DWIM is Perl's answer to Gödel
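
GrandFather's loop above replaces the whole line with the rebuilt element, so any text before the opening tag or after the closing tag would be dropped. That does not matter for the sample data, but a variant that substitutes in place keeps the surrounding text. The following is a minimal sketch under the same one-element-per-line assumption; the helper name `square_inner` is hypothetical and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite only the text between
# the first <tag> and the nearest matching </tag>, leaving the rest of the
# line untouched.
sub square_inner {
    my ($line) = @_;
    $line =~ s{<(\w+)>(.*?)</\1>}{
        # Copy the captures before running tr on the middle part.
        my ($tag, $mid) = ($1, $2);
        $mid =~ tr/<>/[]/;
        "<$tag>$mid</$tag>";
    }e;
    return $line;
}

print square_inner("<furthertag> not good as contains <this> in angle brackets </furthertag>\n");
print square_inner("  <byetag> <more> <than> one </byetag> trailing text\n");
```

As in the original loop, `\w+` means a tag name containing spaces is not matched, and the non-greedy `.*?` stops at the first closing tag with the same name.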
Re^5: Negating a regexp
by BrowserUk (Patriarch) on Feb 01, 2006 at 11:16 UTC

    This violates your "no code" premise, but if your tags are all on one line, this seems reasonably robust and should be fairly efficient.

    #! perl -sw
    use strict;

    while( <DATA> ) {
        s[^(<([^>]+?)>)(.*)(</\2>)]{
            (my $x = $3) =~ tr[<>][[]];
            "$1$x$4";
        }e;
        print;
    }
    __DATA__
    <sometag> my data here </sometag>
    <anothertag> further text </anothertag>
    <furthertag> not good as contains <this> in angle brackets </furthertag>
    <byetag> another possibility is <more> <than> one <angle pairs> in here </byetag>
    <a tag with spaces> and content containing a false </a tag with spaces> and a real </a tag with spaces>

    Produces

    C:\Perl\test>junk2
    <sometag> my data here </sometag>
    <anothertag> further text </anothertag>
    <furthertag> not good as contains [this] in angle brackets </furthertag>
    <byetag> another possibility is [more] [than] one [angle pairs] in here </byetag>
    <a tag with spaces> and content containing a false [/a tag with spaces] and a real </a tag with spaces>

    Of course, it doesn't attempt to deal with attributes, or multi-line elements or nested tags or any of that good stuff.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
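
BrowserUk's caveat above mentions attributes. One way the substitution might be extended to tolerate them is sketched below. This is an illustration only, not part of the original reply: the helper name `square_inner_attrs` is hypothetical, the tag name is assumed to be a run of `\w` characters, and attribute values are assumed never to contain `>`. The trade-off is that tag names with spaces, which the original pattern accepts, no longer match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): like the substitution above,
# but the opening tag may carry attributes after the tag name.
# Assumes a \w+ tag name and no '>' inside attribute values.
sub square_inner_attrs {
    my ($line) = @_;
    $line =~ s{^(<(\w+)(?:\s[^>]*)?>)(.*)(</\2>)}{
        # $1 = whole opening tag, $3 = content, $4 = closing tag
        my ($open, $mid, $close) = ($1, $3, $4);
        $mid =~ tr/<>/[]/;
        "$open$mid$close";
    }e;
    return $line;
}

print square_inner_attrs(qq{<item id="7"> see <this> and <that> </item>\n});
```

The greedy `.*` is kept from the original, so the content runs to the last closing tag on the line and an earlier "false" closing tag is bracketed along with everything else. Multi-line elements and nested tags are still not handled.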
      Thanks. Considering all the responses, I think attempting to do this without a small amount of code, i.e. as a single 'replace' expression, is not the best approach. Thanks for this solution.