I didn't understand the /s and /m regex modifiers until furry_marmot's explanation. perlre says this about /ms:

'let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string'

I couldn't think of an example that needs this. Do you have an example case for /ms, like the 'little princess' example?
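
In the meantime, here is a toy sketch of the mechanics as I understand them. The string is made up for illustration, and it is not the practical use case I am asking about.

```perl
use strict;
use warnings;

# Made-up four-line string, purely for illustration.
my $text = "alpha 1\nbeta 2\nbeta 3\ngamma 4\n";

# No modifiers: "." stops at a newline, "^" matches only at the start of the string.
my ($plain) = $text =~ /^(.*)/;
my ($none)  = $text =~ /^beta(.*)/;      # no match, so $none stays undef

# /m: "^" and "$" also match just after and just before embedded newlines.
my ($m) = $text =~ /^beta(.*)$/m;

# /s: "." matches newlines too, but "^" is still start-of-string only.
my ($s) = $text =~ /^alpha(.*)/s;

# /ms: line anchors plus a "." that crosses newlines, here
# "from the first line starting with beta up to the line starting with gamma".
my ($ms) = $text =~ /^beta(.*?)^gamma/ms;

for my $pair ([plain => $plain], [none => $none], [m => $m], [s => $s], [ms => $ms]) {
    my ($label, $value) = @$pair;
    print "$label: ", (defined $value ? "[$value]" : "undef"), "\n";
}
```

Only the last pattern needs both modifiers: without /m the "^" before beta and gamma cannot match in the middle of the string, and without /s the "." cannot cross the newline between the two beta lines.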

As for the paragraph mode used in this example, I had seen this approach in awk scripts, but this is the first time I have met it ($/ = '') in Perl.
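
A minimal sketch of paragraph mode on its own, comparable to setting RS="" in awk. The message text is made up, and it is read through an in-memory filehandle so that no real mailbox is needed.

```perl
use strict;
use warnings;

# Made-up message: a header block, blank lines, then two body paragraphs.
my $message = "From: alice\@example.com\nSubject: hello\n\n\n"
            . "Body paragraph one.\n\n"
            . "Body paragraph two.\n";

# Open the string as an in-memory file.
open my $fh, '<', \$message or die "Cannot open in-memory file: $!";

my @blocks;
{
    local $/ = '';      # paragraph mode: a run of blank lines ends a record
    @blocks = <$fh>;    # each element is one blank-line-separated block
}
close $fh;

printf "%d blocks\n", scalar @blocks;
print "header block:\n$blocks[0]";
```

Reading just the first record, as the script below does with `$_ = <$fh>`, returns the header block without reading the rest of the message.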

People sometimes say regexes are slow, so I tried using the index function instead of a regex, but it does not seem to improve the time. I simplified this example to pick up only the From address, and the index version needs utf8 handling for index and substr.

use strict;
use warnings;

use File::Find;
use Data::Dumper;

my %addresses;

sub test1 {
    my ($from);
    find(sub {
        return unless -f $_;

        open my $fh, '<', $_ or die;
            local $/ = '';      # "Paragraph" mode, reads a block of text to next \n\n
            $_ = <$fh>;         # Read Header block

            ($from)= $_ =~ /^From:(.*)/m; # /m to anchor
            #print "$from\n";

        close $fh;
    }, glob('./009_mailtest/*'));

    #print Dumper \%addresses;

}
sub test2{
    binmode(STDOUT,":utf8");
    my ($from,$bgn,$end,$len);

    find(sub {
        return unless -f $_;

        open my $fh, '<:utf8', $_ or die;
            local $/ = '';      # "Paragraph" mode, reads a block of text to next \n\n
            $_ = <$fh>;         # Read Header block

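            # Assumes a "From:" header exists: index returns -1 if it is not found.
            $bgn=index($_,"From:",0) + length("From:");
            $end=index($_,chr(10),$bgn);    # search from $bgn so an empty From: line still ends at its own newline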
            $len=$end - $bgn;
            $from=substr($_, $bgn, $len);
            #print "$from\n";
        close $fh;
    }, glob('./009_mailtest/*'));
}

my($start,$end);
$start=(times)[0];
test1();
$end=(times)[0];
print "with regex=" . ($end - $start) . "sec\n";

$start=(times)[0];
test2();
$end=(times)[0];
print "without regex=" . ($end - $start) . "sec\n";

The result for my 319 MB test mailbox was:

with regex=0.296875sec
without regex=0.34375sec
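
The (times)[0] figures cover the whole run, including the File::Find traversal and opening each file, so the extraction step may be only a small part of them. Below is a rough sketch that compares only the two extractions, using the core Benchmark module on a made-up header block. The sub names from_regex and from_index are mine, and the guard for a missing header is an addition that test2 does not have.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up header block standing in for one message's headers.
my $header = join "\n",
    'Received: from mail.example.com',
    'From: alice@example.com',
    'Subject: hello',
    '', '';

# Regex version, as in test1.
sub from_regex {
    my ($from) = $header =~ /^From:(.*)/m;
    return $from;
}

# index/substr version, as in test2, plus a guard for a missing header
# (index returns -1 when "From:" is not found).
sub from_index {
    my $pos = index $header, 'From:';
    return if $pos < 0;
    my $bgn = $pos + length 'From:';
    my $end = index $header, "\n", $bgn;
    $end = length $header if $end < 0;
    return substr $header, $bgn, $end - $bgn;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { regex => \&from_regex, index => \&from_index });
```

Note that index is not line-anchored the way the /m regex is, so it would also find "From:" inside a header such as "X-From:".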

In reply to Re^2: Looking for ideas on how to optimize this specialized grep by remiah
in thread Looking for ideas on how to optimize this specialized grep by afresh1
