Hi,

I have a problem with the following script. It needs to grab every airplane name that sits inside a tag pair whose closing tag ends with a number reference (for example </M200>). I know the number reference should have been in the opening tag, but can anyone please check whether there is a way to make this code work?

A million thanks, Monks.

$in = "captionOutTagged.xml";
open (IN, $in) or die "can't open the infile $in \n";
#####
while (not eof (IN)){
    $line = <IN>;
    chomp $line;
    #print "$line\n\n";
    # airplanes models: have a digit at the end of second tag
    if ( @terms = $line =~ /\<M\>(.*?)\<\/M\d+?\>/gix ) {
        # print "=**$1**=\n";
    }
    # no number then avionics general terms
    elsif (@avionics = $line =~ /\<M\>(.+?)<\/'M'\>/gix) {
        #print " LINE: $line\n";
        #print "$1\n";
    }
    ##########
    foreach $term (@terms){
        print "$term\n";
    }
    foreach $avionic (@avionics){
        print "$avionic\n";
    }
    #########
} # end

America's first <M>swept-wing</M>, <M>multiengine jet</M> <M>bomber</M> was the <M>B-47 Stratojet</M200>, and the first <M>swept-wing fighter</M> was the <M>F-86 Sabre Jet</M201>. Both used new swept-wing data found in Germany after <M>World War II</M> and sent back to the United States by American scientists. This photograph, from <D>1951</D>, was taken the first time the two flew together over <PL>Kansas</PL>.

Current output:

swept-wing</M>, <M>multiengine jet</M> <M>bomber</M> was the <M>B-47 Stratojet

swept-wing fighter</M> was the <M>F-86 Sabre Jet

Desired output:

B-47 Stratojet

F-86 Sabre Jet
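
A possible way to get from the current output to the desired one, offered as a sketch rather than a definitive fix: the non-greedy `.*?` still starts at the leftmost `<M>` and keeps expanding until it reaches the first numbered closing tag, so it runs straight across the unnumbered tags in between. Replacing it with `[^<]*` keeps each match inside a single tag pair. This assumes the tagged names never contain a literal `<`; the sample line is copied (shortened) from the data above, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample line copied (shortened) from the input data in the post.
my $line = q{America's first <M>swept-wing</M>, <M>multiengine jet</M> <M>bomber</M> was the <M>B-47 Stratojet</M200>, and the first <M>swept-wing fighter</M> was the <M>F-86 Sabre Jet</M201>.};

# [^<]* cannot cross a tag boundary, so a match can only succeed
# when <M> and the numbered closing tag belong to the same pair.
my @terms = $line =~ m{<M>([^<]*)</M\d+>}g;

# Same idea for the general terms, whose closing tag has no number.
my @avionics = $line =~ m{<M>([^<]*)</M>}g;

print "$_\n" for @terms;
# prints:
# B-47 Stratojet
# F-86 Sabre Jet
```

Using `m{...}` as the delimiter avoids having to escape the `/` in the closing tag. Each list is captured independently here, so both are filled for every line instead of one or the other.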


In reply to Regexp Problem with greedy by Isanchez
