Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I want to balance tags in text that contains parens (escape chars). The code below does not yield the correct output for me. If I remove the parens around 1989, I get the correct output. Can anyone suggest how to get the output without removing the parens?
use Regexp::Common;
use Cwd;
use Regexp::Common::balanced;

my $x = q(<extract> dfjklasjfdk jdflkasdjflkasd (1989) fdjsaflkajdf kkjdslkfjasdk <extract> dfsdlkfjdsa fdfadsfsad</extract> sajflksadjfklasd fkasdjfsadf Close</extract> <line> df adfjlkadf</line><extract> dfjalkdf akdjfklasdjflkadsjfasdclose1</extract>);
my $e;
my @ele = ('extract','line');
foreach $e (@ele) {
    while ($x=~/$RE{balanced}{-begin=>"<$e"}{-end=>"<\/$e>"}{-keep}/gs) {
        my $begin = "$1";
        $x =~ s/($begin)/&$e($1,$e)/egsi;
        print "$x\n";
    }
}

sub extract {
    my $a = $_[0];
    my $tag = $_[1];
    $a =~ s/<\/$tag>/<\/$tagæ>/gi;
    my @a=map(!/^(<e)/?("\n<e>".$_."<\/e>"):"\n".$_, split(/\n/,$a));
    return "@a";
}

sub line {
    my $a = $_[0];
    my $tag = $_[1];
    $a =~ s/<\/$tag>/<\/1$tag>/gi;
    my @a=map(!/^(<li)/?("\n<li>".$_."<\/li>"):"\n".$_, split(/\n/,$a));
    return "@a";
}
Thanks in advance for your suggestions.
Regards, --B

Re: Tag balanced with escape characters
by idsfa (Vicar) on Jul 27, 2005 at 14:17 UTC

    You are trying to parse *ML with a regex. JUST SAY NO. Use an appropriate parsing module instead (SGML::Parser, HTML::Parser, XML::Parser, etc.). Because *ML can be nested, can contain conflicting tags, and is generally not regular, you should use a tool designed to pull it apart.
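
    To illustrate the nesting point, here is a minimal sketch using core Perl only. The sample string and the helper name outer_element are made up for this example. It shows a non-greedy pattern pairing an opening tag with the wrong closing tag, next to a simple depth counter that pairs them correctly. It is a demonstration of the problem, not a replacement for a real parser.

```perl
use strict;
use warnings;

# Hypothetical sample input: one <extract> nested inside another.
my $text = '<extract>outer <extract>inner</extract> tail</extract>';

# A non-greedy pattern pairs the first opening tag with the FIRST closing
# tag, which actually belongs to the inner element.
my ($naive) = $text =~ m{(<extract>.*?</extract>)}s;

# Hypothetical helper: return the first outermost <$tag>...</$tag> element
# by counting nesting depth, or nothing if the input is unbalanced.
sub outer_element {
    my ($s, $tag) = @_;
    my ($depth, $start) = (0, undef);
    while ($s =~ m{<(/?)\Q$tag\E>}g) {
        if ($1 eq '') {                       # opening tag
            $start = $-[0] if $depth == 0;    # remember where the outer one begins
            $depth++;
        }
        else {                                # closing tag
            next if $depth == 0;              # stray closing tag; skip it
            $depth--;
            return substr($s, $start, $+[0] - $start) if $depth == 0;
        }
    }
    return;                                   # never got back to depth 0
}

my $balanced = outer_element($text, 'extract');

print "naive:    $naive\n";
print "balanced: $balanced\n";
```

    Even the depth counter ignores attributes, comments, and CDATA, which is why a dedicated parsing module is the safer choice for real input.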

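    That said, one thing to check is the parens you are using as the beginning and ending delimiters on your q(). Perl lets balanced parens nest inside q(), so (1989) by itself is read intact, but an unbalanced paren in the data would end the string early. Try switching delimiters to something not in your input text. Maybe q{}?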

    Updated: see xml parsers: do I need one?, How to use Regular Expressions with HTML, is XML too hard?, and No xml module please ... for more insight.


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon
Re: Tag balanced with escape characters
by tphyahoo (Vicar) on Jul 27, 2005 at 08:43 UTC
    I don't get the deal with the escape character. What is the fourth character in this... $tagæ thing? What are you trying to do?
      I am trying to change the closing tag to something else, so that the opening and closing tag pattern will not match again in the while loop.
      I tried using "\Q\E", but I still didn't get the required output.
      Regards, --B
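
      For reference, since "\Q\E" came up: a small sketch of what it does to parentheses in an interpolated variable. The strings here are made up for illustration. It shows only the quoting behavior and is not a confirmed fix for the script above.

```perl
use strict;
use warnings;

# Hypothetical strings for illustration.
my $text  = 'abc (1989) def';
my $begin = '(1989)';

# Unquoted: the parens in $begin act as a capture group, so the pattern
# matches only "1989" and the literal parens in the text are left behind.
(my $plain = $text) =~ s/$begin/[X]/;

# Quoted: \Q...\E escapes the parens, so they must match literally.
(my $quoted = $text) =~ s/\Q$begin\E/[X]/;

print "plain:  $plain\n";   # abc ([X]) def
print "quoted: $quoted\n";  # abc [X] def
```

      Note that \Q must wrap the variable itself, as in \Q$begin\E. An empty "\Q\E" placed elsewhere in the pattern quotes nothing.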
        I still don't really get it.

        Can you repost your question without the funny special character, or is the special character a part of your problem?

        If you need that special character, then it might be helpful to do a regex using hex codes. E.g., " " =~ /\x20/, where 20 is the hex code for space, if I recall correctly. (\x takes at most two hex digits unless you write it with braces, as in \x{20}.) Figure out the hex code for that funny character, and use the hex code instead in your regex.

        This will help if your problem is that the text editor you use to view or type the code interprets the data differently from perl.
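
        A minimal sketch of the hex-code idea, assuming the data is Latin-1 bytes, where æ is byte 0xE6. That encoding is an assumption: in a UTF-8 file the same character is two bytes, and the pattern would need to change. The sample closing tag is made up for illustration.

```perl
use strict;
use warnings;

# Assumption: Latin-1 data, so the character is the single byte 0xE6.
my $tag  = 'extract';
my $text = "</extract\xE6>";   # hypothetical renamed closing tag

# The space example: \x20 is the hex escape for a space.
my $space_ok = (" " =~ /\x20/) ? 1 : 0;

# Match the special character by hex code instead of typing it literally.
my $ae_ok = ($text =~ m{</\Q$tag\E\xE6>}) ? 1 : 0;

# One way to find the hex code of a single-byte character.
my $hex = sprintf '%02X', ord("\xE6");

print "space: $space_ok, ae: $ae_ok, hex: $hex\n";
```

        Writing the character as \xE6 in the pattern keeps the source file pure ASCII, so the editor's encoding no longer affects what perl sees.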