nganesh has asked for the wisdom of the Perl Monks concerning the following question:

<lt=c24>PRL IJB PRL CHA <refauth><fname>K.</fname> <middlename>M.</middlename> <surname>Cuomo</surname></refauth> and <refauth><fname>A.</fname> <middlename>V.</middlename> <surname>Oppenheim</surname></refauth>, <jtitle>Phys. Rev. Lett.</jtitle> <B><volume>71</volume>,</B> <pages>65</pages> (<date>1993</date>); <refauth><fname>L.</fname> <surname>Kocarev</surname></refauth>, <refauth><fname>K.</fname> <middlename>S.</middlename> <surname>Halle</surname></refauth>, <refauth><fname>K.</fname> <surname>Eckert</surname></refauth>, <refauth><fname>L.</fname> <middlename>O.</middlename> <surname>Chua</surname></refauth>, and <refauth><fname>U.</fname> <surname>Parlitz</surname></refauth>, <jtitle>Int. J. Bifurcation Chaos Appl. Sci. Eng.</jtitle> <B><volume>2</volume>,</B> <pages>973</pages> (<date>1992</date>); <refauth><fname>L.</fname> <surname>Kocarev</surname></refauth> and <refauth><fname>U.</fname> <surname>Parlitz</surname></refauth>, <jtitle>Phys. Rev. Lett.</jtitle> <B><volume>74</volume>,</B> <pages>5028</pages> (<date>1995</date>); <refauth><fname>N.</fname> <middlename>F.</middlename> <surname>Rulkov</surname></refauth>, <jtitle>Chaos</jtitle> <B><volume>6</volume>,</B> <pages>262</pages> (<date>1996</date>).</lt>
How can I extract the pairs of information below from the line above? The number of pairs per line may range from 0 to 5.
PRL <jtitle>Phys. Rev. Lett.</jtitle>
IJB <jtitle>Int. J. Bifurcation Chaos Appl. Sci. Eng.</jtitle>
PRL <jtitle>Phys. Rev. Lett.</jtitle>
CHA <jtitle>Chaos</jtitle>
Regards,
Ganesh

20050624 Edit by ysth: code, br tags

Replies are listed 'Best First'.
Re: Backtracking
by tlm (Prior) on Jun 24, 2005 at 11:31 UTC

    I will guess that by "backtrack" what you mean is "extract". If so, then look into XML::Parser or XML::Twig.

    the lowliest monk

Re: Backtracking
by fmerges (Chaplain) on Jun 24, 2005 at 10:14 UTC
    Hi
    I don't really understand the problem, but if you want to pull out these tags you could loop over the lines, or parse them with a suitable parser, and store the values in a hash. You could use something like:
    $hash = { jtitle => ['a', 'b'], ... }
    Regards,
    :)
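The hash-of-lists idea above can be sketched as follows. This is only an illustration, not code from the thread: the sample line is a shortened stand-in for the question's data, and the choice of tags (`jtitle`, `surname`, `date`) is an assumption made for the example.

```perl
use strict;
use warnings;

# Shortened sample in the shape of the question's data (an assumption).
my $line = '<lt=c24>PRL CHA <surname>Cuomo</surname>, '
         . '<jtitle>Phys. Rev. Lett.</jtitle> (<date>1993</date>); '
         . '<surname>Rulkov</surname>, <jtitle>Chaos</jtitle> (<date>1996</date>).</lt>';

# Collect the text of selected tags into a hash of array references,
# keyed by tag name. The tag list here is only an illustration.
my $hash = {};
while ($line =~ m{<(jtitle|surname|date)>(.*?)</\1>}gs) {
    push @{ $hash->{$1} }, $2;
}

# Print each tag name with its collected values.
for my $tag (sort keys %$hash) {
    print "$tag: ", join(' | ', @{ $hash->{$tag} }), "\n";
}
```

Each key then holds its values in document order, so `$hash->{jtitle}[0]` is the first journal title on the line.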
Re: Backtracking
by Animator (Hermit) on Jun 24, 2005 at 14:42 UTC

    Here is another way (which works by removing the text it extracted):

    $out = "";
    while ($data =~ s#(<lt=c24>)(\w{3} ?)(.*?)(<jtitle>.*?</jtitle>)#$1$3#is) {
        $out .= $2 . " " . $4 . "\n";
    }

    Written as a one-liner this gives:

    perl -p0 -e '$x="";while (s#(<lt=c24>)(\w{3} ?)(.*?)(<jtitle>.*?</jtitle>)#$1$3#is) { $x.=$2." ".$4."\n"; } } $_ = $x; {' data
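If the input has to stay intact, the same pairing can probably be done without the destructive substitution. The sketch below is not from the thread: `extract_pairs` is a hypothetical helper, the sample string is a shortened stand-in for the question's data, and it assumes the codes are three-character words directly after `<lt=c24>` and that codes and titles correspond by position.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [code, title] pairs
# without modifying the input string.
sub extract_pairs {
    my ($data) = @_;
    # Codes are the three-character words that directly follow <lt=c24>.
    my ($prefix) = $data =~ /<lt=c24>((?:\w{3}\s+)*)/i;
    my @codes  = split ' ', $prefix // '';
    # Titles are all <jtitle> elements, in order of appearance.
    my @titles = $data =~ m{(<jtitle>.*?</jtitle>)}gis;
    # Pair by position and stop at the shorter list.
    my $last = $#codes < $#titles ? $#codes : $#titles;
    return map { [ $codes[$_], $titles[$_] ] } 0 .. $last;
}

# Shortened sample in the shape of the question's data (an assumption).
my $sample = '<lt=c24>PRL CHA <surname>Cuomo</surname>, '
           . '<jtitle>Phys. Rev. Lett.</jtitle> (<date>1993</date>); '
           . '<surname>Rulkov</surname>, <jtitle>Chaos</jtitle> (<date>1996</date>).</lt>';

my @pairs = extract_pairs($sample);
print "$_->[0] $_->[1]\n" for @pairs;
```

A line with no codes or no titles yields an empty list, which covers the "0 pairs" case from the question.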

Re: Backtracking
by sapnac (Beadle) on Jun 24, 2005 at 14:33 UTC
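    I bet there are many better ways.
    Here is a quick way:
    use strict;
    use warnings;

    # Read the input one record per line.
    open(my $spooler, '<', 'test.html') or die "Can't open test.html: $!\n";
    while (my $spool = <$spooler>) {
        # Split on "<" so that each field starts with a tag name.
        my @value = split /<*>*</, $spool;
        my (@val1, @val2);
        for my $field (@value) {
            if ($field =~ /lt=c24/i) {
                # The field after <lt=c24> holds the journal codes.
                (my $codes = $field) =~ s/lt=c24>//g;
                @val1 = split ' ', $codes;
            }
            # Keep "jtitle>Title" fields; the tags are rebuilt on output.
            push @val2, $field if $field =~ /^jtitle>/i;
        }
        # Print the code/title pairs found on this line.
        for my $n (0 .. $#val2) {
            print(($val1[$n] // '') . " <" . $val2[$n] . "</jtitle>\n");
        }
    }
    close($spooler);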
    Hope it helps!