Here's another approach that may be a little faster than my previous one (but it will be nowhere near 3 - 4 seconds for 256 MB!).

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'A man, a plan, a canal: Panama!';
 ;;
 my @char_counts;
 $#char_counts = 255;
 ++$char_counts[ ord(substr $s, $_, 1) ] for 0 .. length($s) - 1;
 die 'oops...' if $#char_counts != 255;
 ;;
 printf qq{'%s' (0x%02x) == $char_counts[$_] (%6.3f%%) \n},
   chr, $_, ($char_counts[$_] / length $s) * 100
   for grep defined($char_counts[$_]), 0 .. $#char_counts;
 "
' ' (0x20) == 6 (19.355%)
'!' (0x21) == 1 ( 3.226%)
',' (0x2c) == 2 ( 6.452%)
':' (0x3a) == 1 ( 3.226%)
'A' (0x41) == 1 ( 3.226%)
'P' (0x50) == 1 ( 3.226%)
'a' (0x61) == 9 (29.032%)
'c' (0x63) == 1 ( 3.226%)
'l' (0x6c) == 2 ( 6.452%)
'm' (0x6d) == 2 ( 6.452%)
'n' (0x6e) == 4 (12.903%)
'p' (0x70) == 1 ( 3.226%)

... how HxD is able to do it ...

... is by writing the code in C or some such compiled language — at least, I'd be willing to bet doughnuts to dollars that's the case. You, too, can do this with Inline::C! (Update: See also Inline::C::Cookbook.) In fact, the array-based approach in the code example above should, I think, convert very neatly to C. The learning curve for Inline::C is not too bad (assuming you know C!) and well worth the effort if you have a need for speed! (I need to brush up on Inline::C myself, so if I have some time later, I may play around with this.)
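For what it's worth, here is a rough sketch of how the array-based counting might look in plain C. This is only the counting core, not working Inline::C glue, and I haven't benchmarked it. The names count_bytes, byte_percent and BYTE_VALUES are made up for illustration. One difference from the Perl version: unseen bytes end up with a count of 0 rather than undef, so a caller would skip zero entries instead of using grep defined(...).

```c
#include <stddef.h>
#include <string.h>

#define BYTE_VALUES 256   /* one slot per possible byte value (hypothetical name) */

/* Tally how often each byte value occurs in buf[0 .. len-1].
 * counts must have room for BYTE_VALUES entries; it is zeroed first. */
void count_bytes(const unsigned char *buf, size_t len,
                 unsigned long counts[BYTE_VALUES])
{
    memset(counts, 0, BYTE_VALUES * sizeof counts[0]);
    for (size_t i = 0; i < len; ++i)
        ++counts[buf[i]];   /* the byte value itself is the array index */
}

/* Percentage of the len bytes that equal the given byte value.
 * Returns 0.0 for an empty buffer to avoid dividing by zero. */
double byte_percent(const unsigned long counts[BYTE_VALUES],
                    unsigned char byte, size_t len)
{
    return len ? 100.0 * (double)counts[byte] / (double)len : 0.0;
}
```

For a large file you would presumably read it in chunks and accumulate into one counts array (dropping the memset from the per-chunk call), then compute the percentages against the total size at the end.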


In reply to Re^5: Computing the percentage of certain characters in a file by AnomalousMonk
in thread Computing the percentage of certain characters in a file by james28909
