Re^2: Help required in Text manipulation

Thanks for your comments, Dave.
Yes, you may be correct. What I need here is something that works across multiple lines at the same time,
so I removed all the whitespace, tabs, and newlines to get the complete content on a single line.
I probably should not use ..gi; thanks, I will update those.
Regarding open(B, ">A.xml") or die("Sorry!");
I was just trying to create an intermediate file without whitespace, tabs, and newlines.
If the message was not clear, I apologize!
Using XML::Simple and XML::Twig, can we match multiple elements OR attributes at a time?
Thanks


Replies are listed 'Best First'.

Re^3: Help required in Text manipulation
by davido (Cardinal) on Apr 12, 2011 at 09:24 UTC

The primary reason you're trying to remove as much whitespace as possible (including and in particular newlines) is probably so that your XML tags don't get line-broken. And this is probably important because you're parsing XML tags using regular expressions. That entire issue and resulting data contortion is avoidable by using a real XML parser. XML::Simple is one of the easiest parsers to use for simple tasks, but there are others.
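As a minimal sketch of that point (not the original poster's data): the XML below is a hypothetical sample whose start tag is deliberately broken across lines, and it assumes XML::Simple is installed from CPAN. The parser reads the attributes without any whitespace stripping beforehand.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);   # CPAN module; assumed to be installed

# Hypothetical sample: the <item> start tag is deliberately split across lines.
my $xml = <<'XML';
<order>
  <item
      sku="A-100"
      qty="2">Widget</item>
</order>
XML

# Under :strict, ForceArray and KeyAttr must both be given explicitly.
my $order = XMLin($xml, ForceArray => ['item'], KeyAttr => []);

# Attributes become hash keys; the element text lands under 'content'.
my $item = $order->{item}[0];
print "sku=$item->{sku} qty=$item->{qty} text=$item->{content}\n";
```

The line breaks inside the tag never reach your code, so there is nothing to flatten first.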

The /g modifier is necessary if you stick with the regexp solution, but the /i modifier only applies to characters that have some notion of upper/lower case. Space doesn't have such a context, and so the /i modifier is unnecessary, and in fact does impact performance (though probably not enough to care about). The point is to not wield modifiers unnecessarily without considering what they're being used for.
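A small sketch of the difference, using a made-up input string (the variable names are only for illustration): /i changes nothing when the pattern matches only whitespace, while dropping /g changes the result.

```perl
use strict;
use warnings;

# Hypothetical input spanning several lines, with tabs and spaces.
my $text = "<a>\n\t<b> x </b>\n</a>\n";

(my $with_i    = $text) =~ s/\s+//gi;   # /i is a no-op here: whitespace has no case
(my $without_i = $text) =~ s/\s+//g;    # /g alone removes every run of whitespace
(my $no_g      = $text) =~ s/\s+//;     # without /g only the first run is removed

print "with and without /i agree\n" if $with_i eq $without_i;
print "all whitespace removed: $without_i\n";
print "first run only:\n$no_g";
```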

The three-argument version of open is considered to be a safer programming practice. So is the use of lexical filehandles as opposed to global typeglob filehandles. For example: `open my $infile, '<', $filename or die "Couldn't open the input file $filename: $!\n";` ...which reminds me, you should get in the habit of using meaningful messages in die statements. That will aid in debugging.
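Put together, writing and re-reading an intermediate file might look roughly like this. The file name and contents are hypothetical, and File::Temp is used only so the sketch writes into a scratch directory rather than anywhere real.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file name inside a scratch directory that is removed on exit.
my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/intermediate.xml";

# Three-argument open with a lexical filehandle. The die message names the
# file and includes $!, the operating system's reason for the failure.
open my $out, '>', $filename
    or die "Couldn't open $filename for writing: $!\n";
print {$out} "<root><item/></root>\n";
close $out
    or die "Couldn't close $filename: $!\n";

# Read the first line back through a second lexical filehandle.
open my $in, '<', $filename
    or die "Couldn't open the input file $filename: $!\n";
my $line = <$in>;
close $in;

print $line;
```

Compared with die("Sorry!"), a failure here tells you which file was involved and why the open failed.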

The advantage to something like XML::Simple is that you don't have to invent a fragile and probably flawed regexp approach to parsing something that is quite difficult to parse correctly. XML::Simple dumps the XML file into a hash. If you're trying to match multiple things at once, you just have to ask: can I get what I'm after by diving into a hash instead? I think the answer is probably yes. But if a hash-based representation of your XML file isn't helpful, XML::Twig gives a tree-based representation instead. One of those two strategies ought to satisfy most basic needs. If you have to dig deeper, XML::Parser gives a lower-level hook into the parsing mechanics. But I doubt you need to dig that deep.
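To make "matching multiple things at once" concrete, here is a rough sketch of both strategies. The XML and the element and attribute names are invented for illustration, and it assumes XML::Simple and XML::Twig are both installed from CPAN.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);   # CPAN modules; both assumed to be installed
use XML::Twig;

# Hypothetical sample data.
my $xml = <<'XML';
<library>
  <book id="b1" lang="en"><title>Perl Basics</title></book>
  <book id="b2" lang="fr"><title>XML en pratique</title></book>
  <dvd id="d1" lang="en"><title>Regex Talk</title></dvd>
</library>
XML

# Hash-based: dive into the structure and filter two element types at once.
my $lib = XMLin($xml, ForceArray => 1, KeyAttr => []);
my @english_ids =
    map  { $_->{id} }
    grep { $_->{lang} eq 'en' }
    @{ $lib->{book} }, @{ $lib->{dvd} };

# Tree-based: one handler per element or attribute condition, all in one pass.
# Each handler receives the twig and the matched element as its arguments.
my (@twig_ids, @titles);
my $twig = XML::Twig->new(
    twig_handlers => {
        'book[@lang="en"]' => sub { push @twig_ids, $_[1]->att('id') },
        'dvd'              => sub { push @twig_ids, $_[1]->att('id') },
        'title'            => sub { push @titles,   $_[1]->text },
    },
);
$twig->parse($xml);

print "XML::Simple: @english_ids\n";
print "XML::Twig:   @twig_ids\n";
print "Titles:      ", join(', ', @titles), "\n";
```

So yes, either module can match several elements or attributes in one go: with XML::Simple it is ordinary grep and map over the hash, and with XML::Twig it is a matter of registering several handlers.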

Hope this helps...

Dave
