Re: Help required inText manipulation

Your post makes me dizzy. Can you provide a solution for that first?

Where did you get this code?

open(B, ">A.xml") or die("Sorry!");

I'm sorry too. Rather than apologize, why not spit out a meaningful and useful message, or at least include the contents of $! in it?

Next let's look at this:

$_=~s/\s\s//gi;
$_=~s/\t//gi;
$_=~s/^\n$//gi;
$_=~s/\n//gi;

Could you explain why it's a good idea to use case-insensitivity when you're matching whitespace, tabs, and newlines? (Hint: It isn't, and you shouldn't indiscriminately do so.)
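
To make the point concrete, here is a minimal sketch of the same cleanup without /i. The sample string and the helper name strip_whitespace are made up for illustration and are not from the original post. Note that \s already covers spaces, tabs, and newlines, so a single substitution does the work of all four lines above. It also removes every space inside text content, which may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every whitespace character from a string.
# No /i modifier, because whitespace has no upper or lower case.
sub strip_whitespace {
    my ($text) = @_;
    $text =~ s/\s+//g;    # \s matches spaces, tabs, and newlines
    return $text;
}

# Made-up sample input, line-broken the way an XML file might be.
my $sample = "<a>\n\t<b>  value  </b>\n\n</a>\n";

print strip_whitespace($sample), "\n";    # prints <a><b>value</b></a>
```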

After that I just lost interest. But I can say that you should parse XML with an XML parser, not regexps. You'll have a better success rate with fewer brain cramps if you do. Have a look at XML::Simple or XML::Twig for starters.

And for heaven's sake, for projects starting in 2011 use lexical filehandles and the three-arg version of open.
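
Something along these lines is what I mean. It is only a sketch: the helper names write_text and read_text are invented for the example, and the filename is borrowed from your post. The parts that matter are the lexical filehandle, the separate mode argument, and a die message that names the file and includes $!.

```perl
use strict;
use warnings;

# Hypothetical helper: write a string to a file.
sub write_text {
    my ($path, $content) = @_;
    # Lexical filehandle ($fh) and three-arg open: the mode is its own argument.
    open my $fh, '>', $path
        or die "Cannot open '$path' for writing: $!\n";
    print {$fh} $content;
    close $fh
        or die "Cannot close '$path': $!\n";
    return;
}

# Hypothetical helper: read a whole file into a string.
sub read_text {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Cannot open '$path' for reading: $!\n";
    local $/;    # slurp mode: read everything in one go
    my $content = <$fh>;
    close $fh;
    return $content;
}

write_text('A.xml', "<root/>\n");
print read_text('A.xml');
```

If the open fails, the message tells you which file and why, instead of just "Sorry!".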

Dave

Re^2: Help required inText manipulation by thirilog (Acolyte) on Apr 12, 2011 at 08:46 UTC
Thanks for your comments, Dave. Yes, you may be correct. The requirement I'm working on is something that should work across multiple lines at the same time, so I removed all the whitespace, tabs, and newlines to get the complete content on a single line. I probably should not use the ..gi; thanks, I will update them. Regarding open(B, ">A.xml") or die("Sorry!"); I was just trying to create an intermediate file without whitespace, tabs, and newlines. If the message was not clear, I apologize! Using XML::Simple and XML::Twig, can we match multiple elements or attributes at a time? Thanks
Re^3: Help required inText manipulation by davido (Cardinal) on Apr 12, 2011 at 09:24 UTC
The primary reason you're trying to remove as much whitespace as possible (including, and in particular, newlines) is probably so that your XML tags don't get line-broken. And this is probably important because you're parsing XML tags using regular expressions. That entire issue and the resulting data contortion are avoidable by using a real XML parser. XML::Simple is one of the easiest parsers to use for simple tasks, but there are others.

The /g modifier is necessary if you stick with the regexp solution, but the /i modifier only applies to characters that have some notion of upper/lower case. Space doesn't have such a notion, so the /i modifier is unnecessary, and in fact does impact performance (though probably not enough to care about). The point is to not wield modifiers without considering what they're being used for.

The three-argument version of open is considered to be a safer programming practice. So is the use of lexical filehandles as opposed to global typeglob filehandles. For example: `open my $infile, '<', $filename or die "Couldn't open the input file $filename: $!\n";` Which reminds me, you should get in the habit of using meaningful messages in die statements. That will aid in debugging.

The advantage of something like XML::Simple is that you don't have to invent a fragile and probably flawed regexp approach to parsing something that is quite difficult to parse correctly. XML::Simple dumps the XML file into a hash. If you're trying to match multiple things at once, you just have to ask: can I get what I'm after by diving into a hash instead? I think the answer is probably yes. But if a hash-based representation of your XML file isn't helpful, XML::Twig gives a tree-based representation instead. One of those two strategies ought to satisfy most basic needs. If you have to dig deeper, XML::Parser gives a lower-level hook into the parsing mechanics. But I doubt you need to dig that deep.

Hope this helps...

Dave
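
As a rough illustration of the hash-diving idea, here is a minimal sketch using XML::Simple (which has to be installed from CPAN). The XML sample, the element and attribute names, and the helper name books_from_xml are all invented for the example, since I don't know what your real file looks like. The ForceArray and KeyAttr options are there so that book always comes back as a plain array of hashes, whether there is one book or many.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: returns one hash reference per <book> element.
sub books_from_xml {
    my ($xml) = @_;
    my $data = XMLin(
        $xml,
        ForceArray => ['book'],    # always an array, even for a single <book>
        KeyAttr    => [],          # don't fold the list into a hash keyed on id
    );
    return @{ $data->{book} };
}

# Made-up sample document; line breaks inside it are not a problem.
my $xml = <<'XML';
<catalog>
  <book id="b1" lang="en">
    <title>First</title>
  </book>
  <book id="b2" lang="fr">
    <title>Second</title>
  </book>
</catalog>
XML

# Two attributes and one child element are read from each book in one pass.
for my $book (books_from_xml($xml)) {
    print "$book->{id} ($book->{lang}): $book->{title}\n";
}
```

No whitespace stripping and no intermediate file are needed, because the parser copes with tags spread over several lines on its own.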