Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $tag;
    while (<DATA>) {
        #print "$_";
        s/\s*<\s*(\/?)\s*(\w+)\s*>\s*/$1?"\n\n":"\n\n\{$2\}\n\n"/eg;
        print "After substitution: $_\n";
        chomp;
        s/[\cA-\cZ]//g;   # To remove control characters
        #print "Again printing the \$_ : $_\n";
        s/^[\\|<]$//g;    # To delete the character like \ and < at the end of the line
        s/[\\|<]$//;      # To delete the character like \ and < at the beginning of the line
        s/^\s+//g;        # To remove multiple spaces at the beginning of the line
        s/\s+$//g;        # To remove spaces at the end of the line
        if(/^{(.*)}$/) {  # match {METATAG} line
            $tag = $1;
            #print "The tag is $tag\n";
        }
    }
    __DATA__
    {SOURCETAG}
    0904230634
    {DATE}
    090424
    {EDITION}
    1
    {HEADLINE}
    heredero del famoso deportista mexicano, lucha por enaltecer la vida y obra del autor de sus dM-mas
    {SOURCE}
    Por Gisela Orozco 312.527.8461/ Chicago\
    <LINE> Por Gisela Orozco< TTL>312.527.8461/ Chicago</BYTTL>

<LINE> and < TTL> are replaced with {LINE} and {TTL}, but they are not picked up by the {METATAG} line match.
When I print with print "After substitution: $_\n"; the "After substitution:" prefix does not appear in front of {LINE} and {TTL}.

Replies are listed 'Best First'.
Re: regular expression ignores two lines
by Bloodnok (Vicar) on Aug 20, 2009 at 09:13 UTC
    Matching braces are special - unescaped, they are used to signify a repeat count for the previous expression - try escaping them, i.e. rewrite
    if(/^{(.*)}$/) {
    to read
    if (/^\{(.*)\}$/) {
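
To make the point about braces concrete, here is a minimal sketch of the difference between a brace quantifier and an escaped literal brace. The sub names are made up for illustration and are not from the original post. Note that in the original pattern the { is not followed by a number, so Perl most likely treated it as a literal already, which would be consistent with escaping making no difference in the follow-up.

```perl
use strict;
use warnings;

# Unescaped: {2} is a quantifier, so this means "two a's in a row".
sub has_double_a      { my ($s) = @_; return $s =~ /a{2}/   ? 1 : 0 }

# Escaped: \{ and \} are literal braces, so this means the text "a{2}".
sub has_literal_a_two { my ($s) = @_; return $s =~ /a\{2\}/ ? 1 : 0 }

# Try both patterns against both candidate strings.
for my $s ('aa', 'a{2}') {
    print "$s: quantifier=", has_double_a($s),
          " literal=", has_literal_a_two($s), "\n";
}
```

The string aa satisfies only the quantifier form, and the string a{2} satisfies only the escaped form.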
    A user level that continues to overstate my experience :-))
      Even after escaping the braces, {LINE} and {TTL} are still not picked up:
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $tag;
      while (<DATA>) {
          s/\s*<\s*(\/?)\s*(\w+)\s*>\s*/$1?"\n\n":"\n\n\{$2\}\n\n"/ge;
          chomp;
          s/[\cA-\cZ]//g;   # To remove control characters
          #print "Again printing the \$_ : $_\n";
          s/^[\\|<]$//g;    # To delete the character like \ and < at the end of the line
          s/[\\|<]$//;      # To delete the character like \ and < at the beginning of the line
          s/^\s+//g;        # To remove multiple spaces at the beginning of the line
          s/\s+$//g;        # To remove spaces at the end of the line
          if (/^\{(.*)\}$/) {
              $tag = $1;
              print "The tag is $tag\n";
          }
      }
      __DATA__
      {SOURCETAG}
      0904230634
      {DATE}
      090424
      {EDITION}
      1
      {HEADLINE}
      heredero del famoso deportista mexicano, lucha por enaltecer la vida y obra del autor de sus dM-mas
      {SOURCE}
      Por Gisela Orozco 312.527.8461/ Chicago\
      <LINE> Por Gisela Orozco< TTL>312.527.8461/ Chicago</TTL>

      The tag is SOURCETAG
      The tag is DATE
      The tag is EDITION
      The tag is HEADLINE
      The tag is SOURCE

        That's because your regex is anchored to the start and end of the string, so it'll only find a tag that's on a line by itself:

        if (/^\{(.*)\}$/) {
            $tag = $1;
            print "The tag is $tag\n";
        }

        Your {TTL} and {LINE} tags are in the middle of the line, so they won't get picked up by the regex.
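
To see why those tags end up in the middle of the string, it may help to print what the substitution actually produces. The following is a rough sketch, not the poster's code: substitute_tags and show_newlines are hypothetical helper names, and the input is the last line of the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: run the thread's tag substitution on one input line.
sub substitute_tags {
    my ($line) = @_;
    $line =~ s/\s*<\s*(\/?)\s*(\w+)\s*>\s*/$1 ? "\n\n" : "\n\n\{$2\}\n\n"/eg;
    return $line;
}

# Hypothetical helper: make embedded newlines visible as a literal \n.
sub show_newlines {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return $s;
}

my $out = substitute_tags("<LINE> Por Gisela Orozco< TTL>312.527.8461/ Chicago</TTL>\n");
print show_newlines($out), "\n";

# The original test, anchored to the start and end of the whole string.
print "anchored match: ", ($out =~ /^\{(.*)\}$/ ? "yes" : "no"), "\n";
```

The substitution inserts newlines into $_ itself, so a single read from DATA now holds several logical lines. Without the /m modifier, ^ and $ refer to the whole string, not to each embedded line, so the anchored test reports "no".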

        You could try something like this instead:

        while (/\{(.*?)\}/g) {
            $tag = $1;
            print "The tag is $tag\n";
        }

        Output:

        The tag is SOURCETAG
        The tag is DATE
        The tag is EDITION
        The tag is HEADLINE
        The tag is SOURCE
        The tag is LINE
        The tag is TTL
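
One possible alternative, offered only as a sketch: keep the anchors but add the /m modifier, so that ^ and $ match at each embedded newline. The two approaches are not equivalent, as the comparison below tries to show. The sub names and the {sic} marker in the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Unanchored, non-greedy: finds every {...} anywhere in the string.
sub tags_anywhere {
    my ($s) = @_;
    my @tags = $s =~ /\{(.*?)\}/g;
    return @tags;
}

# Anchored with /m: finds only a {...} that fills a whole embedded line.
sub tags_own_line {
    my ($s) = @_;
    my @tags = $s =~ /^\{(.*)\}$/mg;
    return @tags;
}

# Roughly what $_ looks like after the substitution and trimming,
# with an invented {sic} added in running text.
my $text = "{LINE}\n\nPor Gisela {sic} Orozco\n\n{TTL}\n\n312.527.8461/ Chicago";

print "anywhere: @{[ tags_anywhere($text) ]}\n";
print "own line: @{[ tags_own_line($text) ]}\n";
```

The unanchored version also reports sic, while the /m version reports only LINE and TTL. Which one is right depends on whether braces can legitimately appear inside the body text of this format.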
        I'm no longer sure what it is that you're asking - the metatags would now appear to be printed - which seemed, to me, to be the problem in the first place.

        Or maybe the problem, as stated, is either an XY or NWITIAF (Not What I Thought I Asked For) problem, and the question ought to have been why the values of the tags are not being printed - in which case you ought to read one or more of the numerous postings on multi-line matches...

        A user level that continues to overstate my experience :-))
        If I understand correctly, the problem is that 'LINE' and 'TTL' never get through the if condition; in short, the lines 'The tag is LINE' and 'The tag is TTL' are not printed. If that is true, then revisit your condition '/^\{(.*)\}$/'. It says that the line should start with '{' and end with '}', which is not the case here, so the condition fails. To solve it, try the following instead of the 'if':
        while(/{(.*?)}/g) {
        Here is the output you get.
        The tag is SOURCETAG
        The tag is DATE
        The tag is EDITION
        The tag is HEADLINE
        The tag is SOURCE
        The tag is LINE
        The tag is TTL
        Update: whoops, my suggestion looks like a copy of Crackers2's :)
        Regards,
        Ashish
        Why don't you tell us the name of the format?