in reply to regular expression ignores two lines

Matching braces are special: unescaped, they are used to signify a repeat count for the previous expression. Try escaping them, i.e. rewrite
if(/^{(.*)}$/) {
to read
if (/^\{(.*)\}$/) {
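To make the difference concrete, here is a minimal sketch. The helper name extract_tag and the sample strings are made up for illustration and do not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the braces when the whole
# line is a single {TAG}, or undef otherwise.
sub extract_tag {
    my ($line) = @_;
    return $line =~ /^\{(.*)\}$/ ? $1 : undef;
}

# As a quantifier, braces give a repeat count: a{3} means exactly three "a"s.
print "a{3} matches 'aaa'\n" if 'aaa' =~ /^a{3}$/;

# Escaped, the braces are ordinary characters to match.
print "The tag is ", extract_tag('{DATE}'), "\n";
```

Escaping the braces keeps the intent unambiguous whether or not the regex engine would have treated them as a quantifier in that position.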
A user level that continues to overstate my experience :-))

Re^2: regular expression ignores two lines
by Anonymous Monk on Aug 20, 2009 at 09:24 UTC
    Even after escaping the,
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $tag;
    while (<DATA>) {
        s/\s*<\s*(\/?)\s*(\w+)\s*>\s*/$1?"\n\n":"\n\n\{$2\}\n\n"/ge;
        chomp;
        s/[\cA-\cZ]//g;   # To remove control characters
        #print "Again printing the \$_ : $_\n";
        s/^[\\|<]$//g;    # To delete the character like \ and < at the end of the line
        s/[\\|<]$//;      # To delete the character like \ and < at the beginning of the line
        s/^\s+//g;        # To remove multiple spaces at the beginning of the line
        s/\s+$//g;        # To remove spaces at the end of the line
        if (/^\{(.*)\}$/) {
            $tag = $1;
            print "The tag is $tag\n";
        }
    }
    __DATA__
    {SOURCETAG}
    0904230634
    {DATE}
    090424
    {EDITION}
    1
    {HEADLINE}
    heredero del famoso deportista mexicano, lucha por enaltecer la vida y obra del autor de sus dM-mas
    {SOURCE}
    Por Gisela Orozco 312.527.8461/ Chicago\
    <LINE> Por Gisela Orozco< TTL>312.527.8461/ Chicago</TTL>

    The tag is SOURCETAG
    The tag is DATE
    The tag is EDITION
    The tag is HEADLINE
    The tag is SOURCE

      That's because your regex is anchored to the start and end of the string, so it only finds a tag that sits on a line by itself:

      if (/^\{(.*)\}$/) {
          $tag = $1;
          print "The tag is $tag\n";
      }

      Your {TTL} and {LINE} tags are in the middle of the line, so they won't get picked up by the regex.
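A small sketch of what happens to such a line may help. The helper names convert_tags and find_tags are hypothetical; the substitution is copied from the posted script, and the input line is taken from its __DATA__ section.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the posted script:
# <TAG> becomes {TAG} surrounded by blank lines, </TAG> becomes blank lines.
sub convert_tags {
    my ($line) = @_;
    (my $converted = $line) =~ s/\s*<\s*(\/?)\s*(\w+)\s*>\s*/$1?"\n\n":"\n\n\{$2\}\n\n"/ge;
    return $converted;
}

# Hypothetical helper: every {TAG} anywhere in the string (list context).
sub find_tags {
    my ($text) = @_;
    return $text =~ /\{(.*?)\}/g;
}

my $converted = convert_tags('Por Gisela Orozco<TTL>312.527.8461/ Chicago</TTL>');

# The anchored pattern needs "{" as the very first character, but the string
# still starts with "Por Gisela Orozco", so the match fails.
print "anchored match: ", ($converted =~ /^\{(.*)\}$/ ? "yes" : "no"), "\n";

# The unanchored global match finds the tag wherever it sits.
print "The tag is $_\n" for find_tags($converted);
```

After the substitution, $_ holds several lines at once, so a pattern anchored with ^ and $ (without /m) is tested against the whole string rather than against the {TTL} line alone.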

      You could try something like this instead:

      while (/\{(.*?)\}/g) {
          $tag = $1;
          print "The tag is $tag\n";
      }

      Output:

      The tag is SOURCETAG
      The tag is DATE
      The tag is EDITION
      The tag is HEADLINE
      The tag is SOURCE
      The tag is LINE
      The tag is TTL

      I'm no longer sure what it is that you're asking - the metatags now appear to be printed, which seemed to me to be the problem in the first place.

      Or maybe the problem, as stated, is either an XY or an NWITIAF (Not What I Thought I Asked For) problem, and the question ought to have been why the values of the tags are not being printed - in which case you ought to read one or more of the numerous postings on multi-line matches...
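If the values really are what is wanted, a rough sketch along these lines might be a starting point. It assumes each {TAG} is followed by its value, possibly spread over several lines, and the helper name tag_values and the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [tag, value] pairs from text in which each
# {TAG} is followed by its value, possibly spread over several lines.
sub tag_values {
    my ($text) = @_;
    my @pairs;
    # [^{]* runs across newlines and stops at the next opening brace.
    while ($text =~ /\{(\w+)\}\s*([^{]*)/g) {
        my ($tag, $value) = ($1, $2);
        $value =~ s/\s+$//;    # trim trailing whitespace and newlines
        push @pairs, [$tag, $value];
    }
    return @pairs;
}

my $sample = "{DATE}\n090424\n{EDITION}\n1\n";
printf "%s => %s\n", @$_ for tag_values($sample);
```

This works on a whole string rather than line by line, so the input would have to be read in one go or accumulated first. A value that itself contains a "{" would be cut short.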

      A user level that continues to overstate my experience :-))
      If I understand correctly, the problem is that 'LINE' and 'TTL' never get through the if condition; in short, the lines 'The tag is LINE' and 'The tag is TTL' are not printed. If that is true, then revisit your condition '/^\{(.*)\}$/': it says that the line must start with '{' and end with '}', which is not the case here, so the condition fails. To solve it, try the following instead of the 'if':
      while(/{(.*?)}/g) {
      Here is the output you get.
      The tag is SOURCETAG
      The tag is DATE
      The tag is EDITION
      The tag is HEADLINE
      The tag is SOURCE
      The tag is LINE
      The tag is TTL
      Update: whoops, my suggestion looks like a copy of Crackers2's :)
      Regards,
      Ashish
      Why don't you tell us the name of the format?