a) see what I'm doing wrong,

The first thing you are doing wrong is that you are comparing apples and oranges. Take your 2nd benchmark.

cmpthese( 10, {
    slurp_substr => sub { 
        open (FH, "<$filename");
        my $i = 0;
        while ( <FH>) {
            while ($ch = substr($_,$i++,1)){
            }
        }
        close FH;
    },
    slurp_simpleregex => sub {
        my $len=0;
        open (FH, "<$filename");
        while ( <FH>){
            $_ =~ /(.)$/;
        }
        close FH;
    },
    slurp_length => sub {
        my $len=0;
        open (FH, "<$filename");
        while ( <FH>){
            $len += length($_);
        }
        close FH;
    },
});

slurp_substr()
This reads the whole file, record by record, and then appears to set the (global) variable $ch to each character in each record.
But you're setting the variable $i outside the main loop, incrementing it for each char in the record, and never resetting it.
Hence, for the second and subsequent records, $i will start at whatever value it had at the end of the previous record. If the first record is longer than the rest, it will do nothing for any record other than the first.
Both the slurp_substr() and raw_slurp_substr() routines in the 1st benchmark are similarly afflicted.
slurp_simpleregex()
Your regex says: put the last character of each record into $1. You're simply ignoring every character except the last in each record.
slurp_length()
This is the most mysterious of all. You read each record in the file and accumulate the lengths of those records into the variable $len.
You never access any of the characters?
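
As a rough illustration of the slurp_substr() problem, here is a minimal sketch of a per-record loop in which the index restarts at 0 for every record. The helper name count_chars_substr and the counter are hypothetical stand-ins for whatever per-character work you really need. It also loops on the record length rather than on the truth of the substr() result, since a literal "0" character is false in Perl and would end the original inner loop early.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every character of every record with substr()
# and return how many characters were visited.
sub count_chars_substr {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        # $i exists only inside this loop, so it restarts at 0 per record.
        for my $i ( 0 .. length($line) - 1 ) {
            my $ch = substr $line, $i, 1;
            $count++;    # stand-in for real per-character work
        }
    }
    close $fh;
    return $count;
}
```

With the index scoped to the record, every record is walked from its first character, regardless of how long the previous record was.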

The first rule of benchmarking is that you need to ensure that you are comparing like with like.

The second rule is that you need to make sure that what you are benchmarking is useful and relevant to your final program.

In these tests, you are doing neither.
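
To make "like with like" concrete, here is one possible sketch, not the only way to do it. Every candidate visits every character and returns the same summary value (a sum of character codes, chosen here purely for illustration), and the results are checked against each other before anything is timed. The temporary file, its contents, and the candidate names are assumptions made up for the example.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a small throwaway test file (a stand-in for the real data).
my ( $out, $filename ) = tempfile( UNLINK => 1 );
print {$out} "The quick brown fox jumps over the lazy dog 0123456789\n"
    for 1 .. 5_000;
close $out;

# Each candidate touches EVERY character and returns the same kind of
# result, so the timings compare equivalent work.
my %candidates = (
    substr => sub {
        open my $fh, '<', $filename or die "Cannot open $filename: $!";
        my $sum = 0;
        while ( my $line = <$fh> ) {
            $sum += ord( substr( $line, $_, 1 ) ) for 0 .. length($line) - 1;
        }
        close $fh;
        return $sum;
    },
    regex => sub {
        open my $fh, '<', $filename or die "Cannot open $filename: $!";
        my $sum = 0;
        while ( my $line = <$fh> ) {
            # /s lets . match the newline too; /g walks the whole record.
            $sum += ord($1) while $line =~ /(.)/gs;
        }
        close $fh;
        return $sum;
    },
    split => sub {
        open my $fh, '<', $filename or die "Cannot open $filename: $!";
        my $sum = 0;
        while ( my $line = <$fh> ) {
            $sum += ord($_) for split //, $line;
        }
        close $fh;
        return $sum;
    },
);

# Check that all candidates agree before trusting any timings.
my %result = map { $_ => $candidates{$_}->() } keys %candidates;
my ($expected) = values %result;
for my $name ( sort keys %result ) {
    die "$name disagrees: $result{$name} != $expected\n"
        if $result{$name} != $expected;
}

cmpthese( 10, \%candidates );
```

If one candidate returned a different sum, that would be a sign it is skipping or double-counting characters, and its timing would not be comparable with the others.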

That's much worse than what I've seen (but haven't tested here) in C.

If your point is that Perl is slower than C, you're right.

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re: Re: character-by-character in a huge file by BrowserUk
in thread character-by-character in a huge file by mushnik
