in reply to How to check the encoding format of an XML
A quick first check is whether the input contains any non-ASCII bytes at all:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count the input lines that contain at least one byte outside 7-bit ASCII.
    my $non_ascii = 0;
    while (<>) {
        $non_ascii++ if /[^\x00-\x7f]/;
    }
    warn "input contains non-ASCII\n" if $non_ascii;

Beyond that, if there is non-ASCII content, working out the actual nature of that content (what character encoding, what language) may take some guessing. Encode::Guess could be helpful, depending on what language and character encoding are actually present.
People who know enough to use XML with non-ASCII data usually also know to use UTF-8 encoding, and if your data falls into this category, Encode::Guess will work fine to confirm that (byte patterns in UTF-8 are quite distinctive and unmistakable). But if it's one or another single-byte encoding (any of the cp125* or iso-8859-* character sets), you'll need to know what the intended language is in order to help Encode::Guess come up with the right answer.
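Here is a rough sketch of how that plays out with Encode::Guess. The helper name guess_name and the sample byte strings are made up for illustration, and the suspect lists are just one possible choice; adjust them to whatever languages you expect in your data. guess_encoding() returns an encoding object on success and a plain error string otherwise, so the helper checks ref() to tell the two apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper (not part of Encode::Guess): returns the name of the
# guessed encoding, or undef when the guess fails or is ambiguous.
sub guess_name {
    my ( $bytes, @suspects ) = @_;
    my $enc = guess_encoding( $bytes, @suspects );
    return ref($enc) ? $enc->name : undef;
}

# Made-up sample data: the same French phrase as raw bytes in two encodings.
my $utf8_bytes   = "caf\xc3\xa9 au lait";    # UTF-8
my $latin1_bytes = "caf\xe9 au lait";        # ISO-8859-1

# UTF-8 is among the default suspects, so no hint is needed.
printf "utf8 sample, no hint:        %s\n",
    guess_name($utf8_bytes) // 'no answer';

# A single-byte encoding is not recognized without a hint ...
printf "latin1 sample, no hint:      %s\n",
    guess_name($latin1_bytes) // 'no answer';

# ... is resolved when exactly one single-byte suspect is supplied ...
printf "latin1 sample, one suspect:  %s\n",
    guess_name( $latin1_bytes, 'iso-8859-1' ) // 'no answer';

# ... and stays ambiguous when two single-byte suspects both fit the bytes.
printf "latin1 sample, two suspects: %s\n",
    guess_name( $latin1_bytes, 'iso-8859-1', 'cp1252' ) // 'no answer';
```

The last case is the reason you need to know the intended language: nearly any byte sequence is valid in several single-byte character sets, so the module cannot choose between them on its own.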