I didn't understand the s and m regex modifiers until furry_marmot's explanation. man perlre says this about /ms:

'let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string'

I couldn't think of an example that needs this. Do you have an example case for /ms, like the 'little princess' example?
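Here is a minimal sketch of one case where both modifiers matter. It uses a made-up string, and the BEGIN/END markers are only for illustration. The task is to pull out the lines between a line that is exactly BEGIN and a line that is exactly END. /m lets ^ match at the start of each line, which matters because BEGIN is not at the start of the string. /s lets .*? run across the newlines in between.

```perl
use strict;
use warnings;

# Made-up sample text; the BEGIN/END markers are only for illustration.
my $text = "intro line\nBEGIN\nfirst body line\nsecond body line\nEND\ntrailer\n";

# /m: ^ and $ also match just after/before embedded newlines.
# /s: . also matches "\n", so .*? can span several lines.
my ($body) = $text =~ /^BEGIN\n(.*?)^END$/ms;
print defined $body ? $body : "no match\n";

# Dropping either modifier makes the same pattern fail on this text.
my $only_m = $text =~ /^BEGIN\n(.*?)^END$/m ? 1 : 0;   # .*? cannot cross "\n"
my $only_s = $text =~ /^BEGIN\n(.*?)^END$/s ? 1 : 0;   # ^ only matches at string start
print "only /m: $only_m, only /s: $only_s\n";
```

With /ms the capture is the two body lines. With only one of the modifiers, neither match succeeds.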

As for the block mode in this example: I had seen this approach in awk scripts, but this is the first time I've met it ($/ = '') in Perl.
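For anyone who hasn't met paragraph mode: setting $/ to the empty string makes <$fh> return everything up to the next blank line, and runs of blank lines count as a single separator. For a mail file, that first record is the header block. The sketch below assumes a made-up message held in an in-memory filehandle, so no mailbox is needed:

```perl
use strict;
use warnings;

# Made-up message: header block, one blank line, then the body.
my $mail = "From: alice\@example.com\nSubject: hi\n\nBody line one\nBody line two\n";

# Open a read handle on the string itself (in-memory file).
open my $fh, '<', \$mail or die "cannot open in-memory file: $!";

my $header = do {
    local $/ = '';   # paragraph mode; $/ is restored when the do block ends
    <$fh>;           # first record = header block, up to the blank line
};
close $fh;

print $header;
```

Wrapping `local $/` in a do block keeps paragraph mode from leaking into later reads elsewhere in the program.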

People sometimes say regexes are slow, so I tried using the index function instead of a regex, but it doesn't seem to improve the time. I simplified this example to pick up only the From address, and the index version needs UTF-8 handling for index and substr.
    use strict;
    use warnings;
    use File::Find;
    use Data::Dumper;

    my %addresses;

    sub test1 {
        my ($from);
        find(sub {
            return unless -f $_;
            open my $fh, '<', $_ or die;
            local $/ = '';   # "Paragraph" mode, reads a block of text to next \n\n
            $_ = <$fh>;      # Read Header block
            ($from) = $_ =~ /^From:(.*)/m;   # /m to anchor ^ at each line start
            #print "$from\n";
            close $fh;
        }, glob('./009_mailtest/*'));
        #print Dumper \%addresses;
    }

    sub test2 {
        binmode(STDOUT, ":utf8");
        my ($from, $bgn, $end, $len);
        find(sub {
            return unless -f $_;
            open my $fh, '<:utf8', $_ or die;
            local $/ = '';   # "Paragraph" mode, reads a block of text to next \n\n
            $_ = <$fh>;      # Read Header block
            $bgn  = index($_, "From:", 0) + length("From:");
            $end  = index($_, chr(10), $bgn + 1);
            $len  = $end - $bgn;
            $from = substr($_, $bgn, $len);
            #print "$from\n";
            close $fh;
        }, glob('./009_mailtest/*'));
    }

    my ($start, $end);
    $start = (times)[0];
    test1();
    $end = (times)[0];
    print "with regex=" . ($end - $start) . "sec\n";
    $start = (times)[0];
    test2();
    $end = (times)[0];
    print "without regex=" . ($end - $start) . "sec\n";
The results for my 319 MB test mailbox were:
with regex=0.296875sec
without regex=0.34375sec
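Both subs above spend much of their time in File::Find, open and readline, and only test2 uses the :utf8 layer. The 0.30 vs 0.34 second gap may therefore reflect I/O more than regex versus index. One way to time only the extraction is the core Benchmark module's cmpthese, run on a single in-memory header. The sample header and the helper names from_regex/from_index are made up for this sketch. Also note that index finds "From:" anywhere, for example inside "Resent-From:", whereas /^From:/m only matches at the start of a line.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up header block, standing in for one record read with $/ = ''.
my $header = "Received: from mx.example.com\nFrom: Alice <alice\@example.com>\nSubject: test\n\n";

# Regex version: /m lets ^ match at the start of any line.
sub from_regex {
    my ($h) = @_;
    my ($from) = $h =~ /^From:(.*)/m;
    return $from;
}

# index/substr version: finds the first "From:" anywhere in the block.
sub from_index {
    my ($h) = @_;
    my $bgn = index($h, "From:");
    return if $bgn < 0;                 # no From: header found
    $bgn += length("From:");
    my $end = index($h, "\n", $bgn);
    $end = length($h) if $end < 0;      # last line without trailing newline
    return substr($h, $bgn, $end - $bgn);
}

# Make sure both extractors agree before timing them.
die "extractors disagree\n" unless from_regex($header) eq from_index($header);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { from_regex($header) },
    index => sub { from_index($header) },
});
```

The table cmpthese prints gives rates per second for each version on this one header. Real mail headers may behave differently, so it is worth rerunning with headers taken from the actual mailbox.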
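On the UTF-8 point, index and substr count characters on a decoded string and bytes on an undecoded one. Either representation works as long as the offsets come from the same string that is passed to substr. The sketch below uses a made-up non-ASCII name and the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Made-up header line; \x{00F6} is "ö", one character when decoded.
my $chars = "From: J\x{00F6}rg <jorg\@example.com>\n";
my $bytes = encode('UTF-8', $chars);   # same line as UTF-8 octets ("ö" = 2 bytes)

printf "length: %d characters, %d bytes\n", length($chars), length($bytes);

# Extract the value after "From: " from each representation, using offsets
# computed on that same string.
my $skip       = length("From: ");
my $from_chars = substr($chars, $skip, index($chars, "\n") - $skip);
my $from_bytes = substr($bytes, $skip, index($bytes, "\n") - $skip);

# Decoding the byte result afterwards gives the same text as the character result.
print decode('UTF-8', $from_bytes) eq $from_chars ? "same\n" : "different\n";
```

This should print "length: 30 characters, 31 bytes" and then "same". The :utf8 layer therefore mainly decides whether the extracted address comes out as decoded text or as raw bytes.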

In reply to Re^2: Looking for ideas on how to optimize this specialized grep by remiah
in thread Looking for ideas on how to optimize this specialized grep by afresh1
