in reply to Backtracking

Here is another way. It works by removing each piece of text as it is extracted, so the next pass of the loop matches the next piece:

$out = "";
while ($data =~ s#(<lt=c24>)(\w{3} ?)(.*?)(<jtitle>.*?</jtitle>)#$1$3#is) {
    $out .= $2 . " " . $4 . "\n";
}

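Written as a one-liner, this gives the following. The extra } and { close the loop that -p wraps around the code and open a new bare block, so $_ = $x runs once after the input has been processed and the implicit print then outputs $x: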

perl -p0 -e '$x="";while (s#(<lt=c24>)(\w{3} ?)(.*?)(<jtitle>.*?</jtitle>)#$1$3#is) { $x.=$2." ".$4."\n"; } } $_ = $x; {' data
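
To see why the loop terminates, here is a small sketch of the loop version run on made-up input. The sub name extract_pairs, the sample string, and the <x> elements are all invented for illustration; the real data from the thread is not shown here, so treat this as an assumption about its shape rather than a tested solution for it.

```perl
use strict;
use warnings;

# Same substitution loop as above, wrapped in a sub (hypothetical name).
sub extract_pairs {
    my ($data) = @_;
    my $out = "";
    # Each successful s### deletes the code ($2) and the title ($4) it just
    # captured, so the next pass has to find a different pair.
    while ($data =~ s#(<lt=c24>)(\w{3} ?)(.*?)(<jtitle>.*?</jtitle>)#$1$3#is) {
        $out .= $2 . " " . $4 . "\n";
    }
    return $out;
}

# Made-up input: the <x> elements are invented for illustration only.
my $sample = "<lt=c24>ABC <x>foo</x><jtitle>First</jtitle>\n"
           . "<lt=c24>DEF <x>bar</x><jtitle>Second</jtitle>\n";

print extract_pairs($sample);
```

On this sample the first pass takes ABC and the first title. On the second pass the leftmost <lt=c24> is followed by <x>, which \w{3} cannot match, so the regex moves on to the second record. Note that $2 keeps its trailing space, so each output line has two spaces between the code and the title. If the text left behind after <lt=c24> began with three word characters, they would be picked up as the next code, so this approach depends on the shape of the data.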