satishchandra has asked for the wisdom of the Perl Monks concerning the following question:

hi,

For example, if this is my line: <head><to>tove</to></head> and I already have the position of each matched pattern, how can I print only tove using substr()?

while($line = <MYFILE>){
    if($line =~ m/<.[^>]*>/g){
        $new = substr($line,0,10);
        print "THE NEW LINE is : $new\n";
    }
}

Replies are listed 'Best First'.
Re: use of substr() with file
by Khen1950fx (Canon) on Feb 18, 2011 at 20:25 UTC
    You had an uninitialized value in your regex. Hmm... here's an easier way:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $line = '<head><to>tove</to></head>';
    # '<head><to>' is 10 characters long, so the text starts at offset 10
    print substr($line, 10, 4), "\n";
Re: use of substr() with file
by umasuresh (Hermit) on Feb 18, 2011 at 20:26 UTC
    If I understand your question correctly:
    use strict;
    use warnings;

    while(<DATA>) {
        my $line = $_;
        if($line =~ m/<.*>([^>]+)<.*>/g) {
            my $new = substr($line,0,10);
            print "THE NEW LINE is : $new\n";
            print "capture only : $1\n";
        }
    }
    __DATA__
    <head><to>tove</to></head>
      if(//g) ⇒ bug. Get rid of the "g".
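
To illustrate why the "g" is risky in an if(), here is a minimal sketch (the sub name repeated_tests is made up for this example). In scalar context a //g match remembers pos() on the string, so the next test against the same variable resumes from there instead of from the start. In the loop above each $line is freshly assigned, so as far as I can tell the bug stays latent there, but it bites as soon as one string is tested more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: test the same string $times times, with or without /g,
# and record whether each attempt matched (1) or not (0).
sub repeated_tests {
    my ($text, $use_g, $times) = @_;
    my @results;
    for (1 .. $times) {
        # With /g in scalar context the search starts at pos($text);
        # a failed match resets pos($text) to the start of the string.
        my $matched = $use_g ? scalar($text =~ /<to>/g)
                             : scalar($text =~ /<to>/);
        push @results, $matched ? 1 : 0;
    }
    return @results;
}

my $line = '<head><to>tove</to></head>';
print join(' ', repeated_tests($line, 1, 4)), "\n";   # prints: 1 0 1 0
print join(' ', repeated_tests($line, 0, 4)), "\n";   # prints: 1 1 1 1
```

Without the "g", every test starts from the beginning of the string, which is what an if() normally wants.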

      Good, but just a skosh short of a full dozen -- for lack of generality:

      #!/usr/bin/perl
      use strict;
      use warnings;
      # 888985

      while(<DATA>) {
          my $line = $_;
          if($line =~ m/<.*>([^>]+)<.*>/) {
              my $new = substr($line,0,10);
              print "THE NEW LINE is : $new\n";
              print "capture only : $1\n";    # better to get the capture into a named var
          }                                   # as soon as it's captured.
      }
      __DATA__
      <html>
      <head><to>tove</to></head>
      <body>
      <p>now is the time<br />to deal with a line break.</p>
      </body>
      </html>

      Output

      THE NEW LINE is : <head><to>
      capture only : tove
      THE NEW LINE is : <p>now is 
      capture only : to deal with a line break.

      Using substr($line,0,10); is going to fail anytime the data doesn't conform and, IMO, seems to demand that the programmer have detailed knowledge of the content (in which case, why bother?).
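
If substr() is a requirement, one way around the hard-coded offsets is to let the regex engine supply them. A rough sketch, assuming the wanted text sits between a known opening and closing tag on one line (the sub name text_of_tag is invented for illustration): after a successful match, the built-in arrays @- and @+ hold the start and end offsets of each capture group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between <$tag> and </$tag> using
# substr() with offsets taken from the match, or nothing if there is no match.
sub text_of_tag {
    my ($line, $tag) = @_;
    # \Q...\E quotes the tag name; [^<]* stops at the next tag.
    return unless $line =~ /<\Q$tag\E>([^<]*)<\/\Q$tag\E>/;
    # $-[1] is where capture group 1 starts, $+[1] is where it ends.
    return substr($line, $-[1], $+[1] - $-[1]);
}

print text_of_tag('<head><to>tove</to></head>', 'to'), "\n";   # prints: tove
```

The substr() call returns the same string as $1 here; the point is only that the offsets come from the data rather than from the programmer's knowledge of it.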

      Omitting substr on the theory that OP really wants the captured data doesn't quite work either: multiple tags (here, the <br />, I think) can get swallowed up in the greediness of <.*> et seq.
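
A possible way to sidestep the greediness, sketched under the assumption that each line is simple, well-formed markup with no stray < or > inside the text (text_chunks is a made-up name): match the text with a negated character class so it cannot cross a tag boundary, and use //g in list context to collect every piece. This is not a real parser; for anything beyond toy input a proper parsing module would be the more robust choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every run of text that sits between two tags.
sub text_chunks {
    my ($line) = @_;
    # [^<>]+ can never swallow a tag; in list context //g returns all captures.
    my @chunks = $line =~ />([^<>]+)</g;
    # Drop chunks that are only whitespace (e.g. indentation between tags).
    return grep { /\S/ } @chunks;
}

print join(' | ', text_chunks('<head><to>tove</to></head>')), "\n";
# prints: tove
print join(' | ', text_chunks('<p>now is the time<br />to deal with a line break.</p>')), "\n";
# prints: now is the time | to deal with a line break.
```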

      Update/Clarification: Re pushing multiple lines and tags into this question, see OP's PRIOR posts on what appears to be a single project.