in reply to unknown encoding
For something on the order of 100 MB that's a lot of work, and as simple as the task is, I'd just write it in C. But if you want to keep it in Perl, there's one bug and a few optimizations that come to mind:
However, I think you're right that the whole task needs to be made clearer. You say it's unknown what the encoding is supposed to be, but are you sure you're dealing with an 8-bit character set? As you wrote it, it would probably work for ASCII but not much else: anything from the Latin-x family (and many other charsets) may contain characters >126. The "ISO 8859 Alphabet Soup" might help you visualize what you want to check for: czyborra.com/charsets/iso8859.html
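To make the ">126" point concrete, here is a rough sketch in C of how one might count such bytes in a file. This is not the code from the original question (which isn't shown here); the names `byte_stats`, `count_bytes` and `scan_file` are made up for illustration, and the 126 cutoff is simply the one mentioned above. It assumes the data is treated as raw bytes with no decoding at all.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tally of the two byte classes discussed above. */
struct byte_stats {
    size_t low;   /* bytes 0x00..0x7E (<= 126) */
    size_t high;  /* bytes 0x7F..0xFF (> 126), e.g. Latin-x letters */
};

/* Classify each byte in buf and add the counts to *st. */
void count_bytes(const unsigned char *buf, size_t len, struct byte_stats *st)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 126)
            st->high++;
        else
            st->low++;
    }
}

/* Read a file in binary mode, chunk by chunk, and tally its bytes.
 * Returns 0 on success, -1 if the file can't be opened or read. */
int scan_file(const char *path, struct byte_stats *st)
{
    unsigned char buf[65536];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_bytes(buf, n, st);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}
```

Initialize a `struct byte_stats` to zeros, call `scan_file()` on the input, and look at `high`: if it is nonzero, the file is not plain ASCII. That alone won't tell you which 8-bit charset (or multi-byte encoding) you actually have.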
Edit: fixed character range typo as per jimw54321's comment
Re^2: unknown encoding
by jimw54321 (Acolyte) on Oct 31, 2011 at 17:19 UTC
by Marshall (Canon) on Oct 31, 2011 at 18:28 UTC
by jimw54321 (Acolyte) on Oct 31, 2011 at 19:07 UTC
by Marshall (Canon) on Oct 31, 2011 at 19:51 UTC
by Lotus1 (Vicar) on Oct 31, 2011 at 20:30 UTC
by mbethke (Hermit) on Oct 31, 2011 at 18:19 UTC