The main reason you're trying to remove as much whitespace as possible (newlines in particular) is probably so that your XML tags don't get broken across lines. That matters because you're parsing XML tags with regular expressions. The whole issue, and the data contortion that comes with it, goes away if you use a real XML parser. XML::Simple is one of the easiest parsers to use for simple tasks, but there are others.
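To make the fragility concrete, here is a small core-Perl sketch. The <item id="..."> markup and the extract_id helper are hypothetical, invented for illustration. A pattern written for a one-line tag stops matching as soon as the same tag is broken across lines, even though any XML parser treats both fragments identically. That is the problem the whitespace stripping is trying to paper over.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled pattern: it assumes the start tag sits on one line,
# with exactly one space between the element name and the attribute.
sub extract_id {
    my ($xml) = @_;
    return $xml =~ m{<item id="(\d+)">} ? $1 : undef;
}

# Two fragments an XML parser sees as the same element;
# the second merely has a line break inside the start tag.
my $one_line   = '<item id="42">widget</item>';
my $line_break = qq{<item\n    id="42">widget</item>};

for my $xml ($one_line, $line_break) {
    my $id = extract_id($xml);
    if (defined $id) {
        print "matched id $id\n";
    }
    else {
        print "no match\n";
    }
}
```

This prints "matched id 42" for the first fragment and "no match" for the second.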
The /g modifier is necessary if you stick with the regexp solution, but the /i modifier only affects characters that have an upper and lower case form. Whitespace has no case, so /i is unnecessary here: at best it does nothing, and at worst it costs a little performance (probably not enough to care about). The point is not to apply modifiers without considering what they actually do.
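A quick sketch of the difference, using a made-up snippet of markup: without /g only the first run of whitespace is removed, and adding /i changes nothing because \s has no case. Be aware that stripping all whitespace this way also removes spaces inside text content, which is one more reason the regexp approach is risky.

```perl
use strict;
use warnings;

# Made-up sample input, for illustration only.
my $text = "<a>\n  <b>text</b>\n</a>\n";

# Without /g, only the first whitespace run is removed.
(my $first_only = $text) =~ s/\s+//;

# With /g, every whitespace run is removed.
(my $all = $text) =~ s/\s+//g;

# /i adds nothing: whitespace has no upper/lower case forms.
(my $with_i = $text) =~ s/\s+//gi;

print "$all\n";
print $all eq $with_i ? "same result with /i\n" : "different result with /i\n";
```

Here $first_only still contains the later newlines, while $all and $with_i are both "<a><b>text</b></a>".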
The three-argument version of open is considered a safer programming practice, and so is using lexical filehandles instead of global bareword (typeglob) filehandles. For example: open my $infile, '<', $filename or die "Couldn't open the input file $filename: $!\n"; Which reminds me: get in the habit of writing meaningful messages in your die statements, including the file name and $!. They make debugging much easier.
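A self-contained sketch of that idiom follows. The file name input.xml is hypothetical, and the script first writes a small sample file so there is something to read.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $filename = 'input.xml';

# Create a small sample file so the example runs on its own.
open my $outfile, '>', $filename
    or die "Couldn't open the output file $filename: $!\n";
print {$outfile} "<root>\n  <item>one</item>\n</root>\n";
close $outfile or die "Couldn't close $filename: $!\n";

# Three-argument open: the mode '<' is kept separate from the file name,
# so a name that starts with '>' or '|' can't change what open does.
# The lexical $infile is closed automatically when it goes out of scope.
open my $infile, '<', $filename
    or die "Couldn't open the input file $filename: $!\n";
my @lines = <$infile>;
close $infile;

print scalar(@lines), " lines read from $filename\n";
```

The low-precedence or is deliberate: with || the die would bind to $filename instead of to the result of open, so a failed open would go unnoticed.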
The advantage of something like XML::Simple is that you don't have to invent a fragile and probably flawed regexp approach to parsing something that is quite difficult to parse correctly. XML::Simple reads the XML file into a nested hash structure. If you're trying to match several things at once, ask yourself: can I get what I'm after by walking a hash instead? I think the answer is probably yes. If a hash-based representation of your XML isn't helpful, XML::Twig gives you a tree-based representation instead. One of those two strategies ought to satisfy most basic needs. If you have to dig deeper, XML::Parser gives you a lower-level hook into the parsing mechanics, but I doubt you need to go that far.
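Here is a minimal sketch of the hash-based approach, assuming XML::Simple (not a core module) and one of its parser backends are installed. The inventory document is made up for illustration; in practice you would pass XMLin a file name. Note that the line breaks inside the document are irrelevant once it is parsed.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical sample document, for illustration only.
my $xml = <<'XML';
<inventory>
  <item id="1">
    <name>widget</name>
    <qty>3</qty>
  </item>
  <item id="2">
    <name>gadget</name>
    <qty>5</qty>
  </item>
</inventory>
XML

# ForceArray => ['item'] keeps <item> a list even if only one is present;
# KeyAttr => [] stops XML::Simple folding that list into a hash keyed on 'id'.
my $data = XMLin($xml, ForceArray => ['item'], KeyAttr => []);

# Each <item> is now a hash reference; attributes and child elements are keys.
for my $item (@{ $data->{item} }) {
    print "$item->{id}: $item->{name} x $item->{qty}\n";
}
```

The ForceArray and KeyAttr options are worth setting explicitly, since XML::Simple's defaults change the shape of the resulting hash depending on how many elements appear and which attributes they carry.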
Hope this helps...