The regexp needs the /g switch or the code will loop forever, and you use $1 but don't have capturing groups.
I don't know the specifics as to how /g works. All I know is that in other scripts I've written it stops at the first instance. Considering that I have multiple <RemoteHost> sections, I don't think it will work, and thus I'm assuming that is what I want. I haven't set up $1 captures yet because it should only have one thing being fed to it: the file. I'll work on error handling for invalid input later.
Hi Boyd.Ako,
I don't know the specifics as to how /g works.
/g on an m// match is documented in perlop, in this case: "In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match." Since the regexp is being used in a while loop, without the /g modifier it would simply match the first occurrence every time and the loop would never end. With /g, the match advances through the string.
Alternatively: "The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. ... In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression." Without /g, it would only find the first match.
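To see the scalar-context behaviour in isolation, here is a minimal sketch that records pos() after each successful match. The sample string and the sub name match_positions are made up for illustration and are not part of the code under discussion:

```perl
use warnings;
use strict;

# Hypothetical helper, for illustration only: returns one
# [matched text, pos after match] pair per run of digits.
sub match_positions {
    my ($str) = @_;
    my @seen;
    # With /g in scalar context, each iteration resumes the search
    # at pos($str), i.e. just past the end of the previous match.
    # Without /g this loop would match the first run of digits forever.
    while ($str =~ m{(\d+)}g) {
        push @seen, [ $1, pos($str) ];
    }
    return @seen;
}

printf "matched '%s', pos is now %d\n", @$_
    for match_positions("a1b22c333");
```

Once the match finally fails, pos() is reset, which is why the while loop ends cleanly and a later match on the same string starts from the beginning again.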
When you add the /g and the capturing group to codiac's code it works:
use warnings;
use strict;
my $text = <<'END';
<y><ReportHost>one</ReportHost>
<x>test</x></y>
<ReportHost>
two
</ReportHost>
<ReportHost><z>thr</z>ee</ReportHost>
END
# m//g in scalar context
while ($text=~ m{<ReportHost[^>]*>((?:(?!</ReportHost>).)*)</ReportHost>}sg) {
print "a: \"$1\"\n";
}
# m//g in list context
my @m = $text=~ m{<ReportHost[^>]*>((?:(?!</ReportHost>).)*)</ReportHost>}sg;
print "b: \"$_\"\n" for @m;
__END__
a: "one"
a: "
two
"
a: "<z>thr</z>ee"
b: "one"
b: "
two
"
b: "<z>thr</z>ee"
Hope this helps, -- Hauke D