
Most versions of Microsoft Internet Explorer contain heuristics to guess the encoding of web pages whose encoding is unknown. They work statistically, based on the letter frequencies of different languages.
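To make the letter-frequency idea concrete, here is a rough Python sketch. It is not MSIE's actual algorithm. It simply decodes the bytes with each candidate encoding and prefers the one whose output contains the largest share of common Russian letters. The candidate list, the `оеаинтсрвл` letter set (roughly the ten most frequent letters in Russian prose), the function names, and the rule of accepting valid UTF-8 outright are all assumptions made for illustration.

```python
from typing import Optional

# Candidate single-byte Cyrillic encodings (an assumption; adjust for your data).
CANDIDATES = ["cp1251", "koi8-r", "cp866", "iso8859-5"]

# Roughly the ten most frequent letters in Russian prose.
COMMON_LETTERS = set("оеаинтсрвл")


def letter_score(text: str) -> float:
    """Share of characters in text that are common Russian letters."""
    if not text:
        return 0.0
    hits = sum(ch in COMMON_LETTERS for ch in text.lower())
    return hits / len(text)


def guess_encoding(data: bytes) -> Optional[str]:
    """Return the candidate encoding whose decoding looks most like Russian."""
    # Bytes that decode cleanly as UTF-8 are very likely UTF-8
    # (pure ASCII also lands here, which is harmless).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    best, best_score = None, -1.0
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:  # these bytes are not valid in enc
            continue
        score = letter_score(text)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = enc, score
    return best


if __name__ == "__main__":
    sample = "она видела его в окне"  # "she saw him in the window"
    for enc in ("cp1251", "koi8-r", "utf-8"):
        print(enc, "->", guess_encoding(sample.encode(enc)))
```

UTF-8 is checked first because single-byte decodings of UTF-8 Cyrillic can score deceptively well: iso8859-5, for instance, turns each UTF-8 letter from а to п into "а" followed by the matching capital letter.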

You could try wrapping your texts in basic HTML tags, loading them into MSIE, and seeing which encoding it detects and whether all the texts are detected with the same encoding. (I assume you have at least a rudimentary knowledge of Russian, so you can tell if the heuristics have got it wrong and produced rubbish.)
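With many files, a small script can do the wrapping. This is a minimal sketch: it copies each file's raw bytes into a bare HTML page with no `<meta charset>` declaration, so the browser has to guess the encoding itself. The `wrap_bytes` and `wrap_file` names and the `.html` output naming are illustrative choices, not anything from the original thread.

```python
import sys
from pathlib import Path

HEADER = b"<html><body><pre>"
FOOTER = b"</pre></body></html>\n"


def wrap_bytes(raw: bytes) -> bytes:
    """Wrap raw text bytes in minimal HTML without declaring a charset."""
    # &, < and > are plain ASCII bytes in UTF-8 and in the usual single-byte
    # Cyrillic encodings, so escaping them at the byte level is safe.
    escaped = raw.replace(b"&", b"&amp;").replace(b"<", b"&lt;").replace(b">", b"&gt;")
    return HEADER + escaped + FOOTER


def wrap_file(src: Path) -> Path:
    """Write the wrapped copy next to src (foo.txt -> foo.html) and return its path."""
    dst = src.with_suffix(".html")
    dst.write_bytes(wrap_bytes(src.read_bytes()))
    return dst


if __name__ == "__main__":
    # Usage: python wrap_html.py file1.txt file2.txt ...
    for name in sys.argv[1:]:
        print(wrap_file(Path(name)))
```

The original bytes are left untouched apart from the three escaped characters, so whatever encoding the browser picks is a guess about your data, not about a conversion the script made.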

If that does not work, or if your documents all have different encodings, then you will need to come up with some heuristics of your own. My suggestion would be to try all the likely possibilities (using ikegami's code) and compare each output with a wordlist of common Russian words, taken from your system's spellchecker dictionary.
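ikegami's code is not reproduced in this post, so the following is only a minimal Python sketch of the same trial-decoding idea combined with a wordlist check. It decodes the bytes with each candidate encoding, splits the result into words, and ranks encodings by the share of words found in the dictionary. The candidate list, the `load_wordlist`/`rank_encodings` names, and the assumption that the dictionary is a Hunspell-style `.dic` file (`word/FLAGS` lines after an initial word count) are all assumptions for illustration.

```python
import re
from typing import List, Set, Tuple

# Encodings worth trying for Russian text (an assumption; adjust as needed).
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "cp866", "iso8859-5"]


def load_wordlist(path: str, encoding: str = "utf-8") -> Set[str]:
    """Read a word-per-line dictionary such as a Hunspell .dic file.

    Assumed format: first line is a word count, then 'word' or 'word/FLAGS'.
    Pass the file's own encoding (for Hunspell, the SET line of the .aff file).
    """
    words = set()
    with open(path, encoding=encoding, errors="replace") as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            word = fields[0].split("/", 1)[0]  # drop affix flags
            if word and not word.isdigit():    # skip the word-count line
                words.add(word.lower())
    return words


def rank_encodings(data: bytes, words: Set[str]) -> List[Tuple[float, str]]:
    """Return (share of known words, encoding) pairs, best first."""
    results = []
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:  # these bytes are not valid in enc
            continue
        tokens = re.findall(r"\w+", text.lower())  # \w matches Cyrillic letters
        if not tokens:
            continue
        hits = sum(tok in words for tok in tokens)
        results.append((hits / len(tokens), enc))
    # sort() is stable, so ties keep the CANDIDATES order
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Calling `rank_encodings(Path(doc).read_bytes(), load_wordlist(dic_path, dic_encoding))`, with whatever dictionary path and encoding your system actually uses, should show one clear winner per file. If the top two scores are close, that file is worth checking by eye.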

In reply to Re^3: Decoding Russian text by chrestomanci
in thread Decoding Russian text by vit
