I'm trying to extract strings from a data file where the information to extract sits between tags.
If the strings contain no newlines (\n), the regular expression works, but it fails when a string spans multiple lines.
The data has the following structure (only <tag2> can have multiple lines):
<tag1>[abcd]</tag1>Blah, blah, blah?: <tag2>Yes</tag2>
<tag1>[efgh]</tag1>Yadah, yadah?: <tag2>1) Foo;
2) Bar;
3) Quux;
</tag2>
<tag1>[ijkl]</tag1>Blah?: <tag2>Yes</tag2>
<tag1>[mnop]</tag1>Blah, bleh?: <tag2>Yes, I will.
If this and that</tag2>
What I want to extract are the strings inside the <tag1> and <tag2> tags.
The code:
#!/usr/local/bin/perl -w

open (DATA, "data.txt") or die "Error";
undef $/;   # slurp mode
$body=<DATA>;
close DATA;

while ( $body =~ /<tag1>\[(\w+)\]<\/tag1>.*<tag2>(.*)<\/tag2>/g ) {
    print "[$1] => [$2]\n";
}
It should print:
[abcd] => [Yes]
[efgh] => [1) Foo; 2) Bar; 3) Quux;]
[ijkl] => [Yes]
[mnop] => [Yes, I will. If this and that]
Instead it only prints [abcd] and [ijkl]. The regular expression does not match when the string in <tag2> spans multiple lines.
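The failure can be reproduced without the data file. The following is only a minimal sketch: the two-record string in $body is a made-up stand-in for data.txt, and @seen is a helper added just to list which records matched. The regular expression is the same one as in the code above.

```perl
#!/usr/local/bin/perl -w

# Hypothetical two-record sample standing in for data.txt (not the real data).
# The second record has a newline inside <tag2>.
my $body = "<tag1>[abcd]</tag1>Q1?: <tag2>Yes</tag2>\n"
         . "<tag1>[efgh]</tag1>Q2?: <tag2>1) Foo;\n2) Bar;\n</tag2>\n";

# Illustrative helper: collects the <tag1> keys that matched.
my @seen;

# Same pattern as in the original code, unchanged.
while ( $body =~ /<tag1>\[(\w+)\]<\/tag1>.*<tag2>(.*)<\/tag2>/g ) {
    push @seen, $1;
    print "[$1] => [$2]\n";
}

# Only the single-line record is reported; [efgh] is skipped.
print "matched: @seen\n";
```

With this sample only the [abcd] record is printed, which is the same behaviour I see with the real file.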
I tried several combinations with the s and g modifiers and with .*? but can't get anything to work.
I think the problem is in the part made bold:
I'm overlooking something but don't know what... :(