Max_NL has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to extract strings from a data file where the information to be extracted sits between tags.
The regular expression works when a string is on a single line, but it fails when the string spans multiple lines (contains \n).
The data has the following structure (only <tag2> can have multiple lines):
<tag1>[abcd]</tag1>Blah, blah, blah?: <tag2>Yes</tag2>
<tag1>[efgh]</tag1>Yadah, yadah?: <tag2>1) Foo; 2) Bar; 3) Quux; </tag2>
<tag1>[ijkl]</tag1>Blah?: <tag2>Yes</tag2>
<tag1>[mnop]</tag1>Blah, bleh?: <tag2>Yes, I will. If this and that</tag2>
What I want to extract are the strings inside the <tag1> and <tag2> tags.
The code:
#!/usr/local/bin/perl -w
open (DATA, "data.txt") or die "Error";
undef $/; # slurp mode
$body=<DATA>;
close DATA;
while ( $body =~ /<tag1>\[(\w+)\]<\/tag1>.*<tag2>(.*)<\/tag2>/g ) {
    print "[$1] => [$2]\n";
}
It should print:
[abcd] => [Yes]
[efgh] => [1) Foo; 2) Bar; 3) Quux;]
[ijkl] => [Yes]
[mnop] => [Yes, I will. If this and that]
Instead it only prints [abcd] and [ijkl]; the regular expression does not match when the string in <tag2> spans multiple lines.
I tried several combinations of the /s and /g modifiers and .*? but can't get anything to work.
I think the problem is in the part made bold:
I'm overlooking something but don't know what... :(
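For what it's worth, here is a minimal sketch of one combination that should behave as described above. Without /s, the dot does not match \n, so a record whose </tag2> is on a later line can never match. With /s but a greedy .*, the first match runs on to the last </tag2> in the slurped file. The sketch uses /s together with the non-greedy .*? in both places. The sample data is inlined, the line breaks inside <tag2> are an assumption, and extract_pairs is a hypothetical helper name; the whitespace clean-up is optional.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data mirroring the structure in the question. The exact line
# breaks inside <tag2> are an assumption.
my $body = <<'END';
<tag1>[abcd]</tag1>Blah, blah, blah?: <tag2>Yes</tag2>
<tag1>[efgh]</tag1>Yadah, yadah?: <tag2>1) Foo;
2) Bar;
3) Quux;
</tag2>
<tag1>[ijkl]</tag1>Blah?: <tag2>Yes</tag2>
<tag1>[mnop]</tag1>Blah, bleh?: <tag2>Yes, I will.
If this and that</tag2>
END

# Hypothetical helper: returns a list of [key, value] array refs.
sub extract_pairs {
    my ($text) = @_;
    my @pairs;
    # /s lets . match newlines; .*? is non-greedy, so each match stops at
    # the nearest <tag2> and </tag2> rather than the last ones in the text.
    while ( $text =~ m{<tag1>\[(\w+)\]</tag1>.*?<tag2>(.*?)</tag2>}gs ) {
        my ( $key, $value ) = ( $1, $2 );
        $value =~ s/\s+/ /g;         # collapse newlines and whitespace runs
        $value =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        push @pairs, [ $key, $value ];
    }
    return @pairs;
}

print "[$_->[0]] => [$_->[1]]\n" for extract_pairs($body);
```

One caveat: if a <tag1> record had no <tag2> of its own, the .*? would run on into the next record, so this only holds while every <tag1> is followed by its own <tag2>.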
Re: Regular expression for grabbing strings with multiple lines between tags
by Ratazong (Monsignor) on Apr 05, 2016 at 09:33 UTC

Re: Regular expression for grabbing strings with multiple lines between tags
by Eily (Monsignor) on Apr 05, 2016 at 09:46 UTC
by Max_NL (Novice) on Apr 05, 2016 at 11:17 UTC

Re: Regular expression for grabbing strings with multiple lines between tags
by choroba (Cardinal) on Apr 05, 2016 at 09:52 UTC
by Max_NL (Novice) on Apr 05, 2016 at 11:21 UTC

Re: Regular expression for grabbing strings with multiple lines between tags
by Max_NL (Novice) on Apr 05, 2016 at 09:53 UTC