A small suggestion based on your samples above: maybe you could use a dictionary to identify non-English blocks of text. Something like: if more than X% of the "words" in a block of text are not found in a dictionary, then assume a higher likelihood that the block should be left unchanged.
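
Here is a rough sketch of that heuristic, written in Python purely for illustration; the same logic ports directly to Perl. Everything named here is an assumption of mine and not something from your code: the function names `unknown_word_ratio` and `should_leave_unchanged`, the 50% default threshold, and the tiny stand-in word set (a real run would load a proper word list).

```python
import re

# Matches simple word-like tokens, allowing one internal apostrophe ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def unknown_word_ratio(text, dictionary):
    """Return the fraction of word-like tokens in text that are not in dictionary.

    A block with no word-like tokens at all (pure punctuation, digits)
    is treated as fully unknown, on the assumption that it is not prose.
    """
    words = WORD_RE.findall(text.lower())
    if not words:
        return 1.0
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown / len(words)


def should_leave_unchanged(text, dictionary, threshold=0.5):
    """Guess whether a block is non-English and should be left alone.

    threshold is the hypothetical X% cutoff, expressed as a fraction.
    """
    return unknown_word_ratio(text, dictionary) >= threshold


# Stand-in dictionary for demonstration only.
SAMPLE_WORDS = {"the", "cat", "sat", "on", "mat"}

if __name__ == "__main__":
    for block in ("The cat sat on the mat.", "my $x = foo_bar(); print qq{zz}"):
        print(repr(block), should_leave_unchanged(block, SAMPLE_WORDS))
```

The threshold would need tuning against real samples, since ordinary prose that is full of module names or jargon will also score a fair number of "unknown" words.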
Also, note that your links to modules above would work better with a "[cpan://..." style link, like:
Text::Wrap
XML::LibXML
Text::Autoformat
Joe