My recollection (aided by a quick trip to perldoc -q space and another into Friedl's "Mastering Regular Expressions") is that the "whitespace" character class includes an ordinary space (0x20), a tab and a newline. Update: perlrequick says:

"\s is a whitespace character and represents [\ \t\r\n\f]"

Hence, I suspect that if the data OP is dealing with has embedded tab chars, the count will be unreliable.
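
To make that suspicion concrete, here is a minimal sketch (not anyone's posted solution) comparing three ways of counting: everything \s matches, literal spaces and tabs only, and the leading whitespace run. The helper names and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count every character matched by \s.
sub count_ws_regex {
    my ($str) = @_;
    my $n = () = $str =~ /\s/g;    # list assignment in scalar context gives the match count
    return $n;
}

# Hypothetical helper: count only literal spaces and tabs.
# An empty replacement list (and no /d) makes tr/// count without changing anything.
sub count_space_tab {
    my ($str) = @_;
    return ( $str =~ tr/ \t// );
}

# Hypothetical helper: length of the leading run of whitespace.
sub count_leading_ws {
    my ($str) = @_;
    my ($lead) = $str =~ /^(\s*)/;
    return length $lead;
}

# Made-up sample line: leading tab and two spaces, a space+tab pair, trailing newline.
my $sample = "\t  now \tis the time\n";
printf "all \\s characters: %d\n", count_ws_regex($sample);
printf "spaces and tabs:   %d\n", count_space_tab($sample);
printf "leading \\s run:    %d\n", count_leading_ws($sample);
```

The first two counts differ whenever the data holds whitespace other than spaces and tabs (here, the trailing newline), so which one is "right" depends on what the OP means by white space.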

FTR, brian d. foy's remarks in perlfaq4.pod either ignore this case or indicate there's something wrong with my understanding (in which case, correction would be welcome).

So, in the spirit of self-education, I tried this little experiment (Update: all tabs in the original are hard tabs; here they are replaced by multiple spaces):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @var= ('now is the time',                    #space, tab between "now" and "is"
              ' for all good men',                  #leading tab, no space
              'to come to the aid of their party.'  #space, tab before party
             );

    #hbm's method:
    my $count = 0;
    my $linecount = 0;
    for my $var(@var) {
        $linecount = $var =~ tr/ \t/ \t/;
        print "\$linecount: $linecount\n";
        $count += $linecount;
    }
    print "$count \n";

    #hbm's method with \t (tr doesn't know from "\s")
    my $count_s = 0;
    my $linecount_s = 0;
    for my $var(@var) {
        $linecount_s = $var =~ tr/\t/|\t/;
        print "\$linecount_s: $linecount_s\n";
        $count_s += $linecount_s;
    }
    print "$count_s \n";

    =head OUTPUT
    $linecount: 4
    $linecount: 4
    $linecount: 8
    16 # WTF? with tabs converted to spaces, I count 17 as I have my tabs set.
    $linecount_s: 1
    $linecount_s: 1
    $linecount_s: 1
    3
    =cut

Which largely undermines my supposition above.

Update 20090212 00:35

Ignore the comment in line 37 (the "WTF?" remark after the total of 16 in the OUTPUT section). That's not a Perl issue (nor a reflection of my inability to count, but it's waaaaay OT and way complicated). But having elaborated the code in this manner (still using the same array):

    # and now using \s & /g
    # output smells bad
    print "and now using \\s only, with /g modifier\n";
    my $count_s = 0;
    my $linecount_s = 0;
    for my $var(@var) {
        $linecount_s = scalar ( $var =~ s/\s/_/g );
        print "\$linecount_s: $linecount_s, \$var after substitution: $var\n";
        $count_s += $linecount_s;
    }
    print "\$count_s: $count_s \n\n";

the output of that snippet confounds me:

    and now using \s only, with /g modifier
    $linecount_s: 3, $var after substitution: now_|is_the_time
    $linecount_s: 3, $var after substitution: |for_all_good_men
    $linecount_s: 7, $var after substitution: to_come_to_the_aid_of_their_|party.
    $count_s: 13

because -- while substituting "_" (only) for \s I now find pipes in the output where \t existed in @var. WTF????

More in the next node below, but it does NOT explain the pipes. :-(
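
One possible explanation for the pipes, offered as a sketch rather than a confirmed diagnosis: a foreach loop variable is an alias for each array element, and tr/\t/|\t/ has a non-empty replacement list, so it translates each tab to "|" in @var itself while it counts. If the s/\s/_/g loop then ran over the same array, the tabs would already be gone, which would also fit the total dropping from 16 to 13 (three tabs fewer). The sub names and test strings below are hypothetical; they use "\t" escapes rather than the original hard-tab literals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the tr/\t/|\t/ loop: counts tabs, but also
# rewrites each tab to "|" in the caller's array, because $line aliases the element.
sub count_tabs_translating {
    my ($lines) = @_;
    my $total = 0;
    for my $line (@$lines) {
        $total += ( $line =~ tr/\t/|\t/ );    # only "\t" => "|" is used; the extra "\t" is ignored
    }
    return $total;
}

# Hypothetical helper: an empty replacement list counts without modifying.
sub count_tabs_only {
    my ($lines) = @_;
    my $total = 0;
    for my $line (@$lines) {
        $total += ( $line =~ tr/\t// );
    }
    return $total;
}

# Illustrative data, not the original @var.
my @changed   = ( "now \tis the time", "\tfor all good men" );
my @unchanged = @changed;

print "translating count: ", count_tabs_translating( \@changed ),   "\n";
print "count-only count:  ", count_tabs_only( \@unchanged ), "\n";
print "after translating: $_\n" for @changed;      # tabs now show up as pipes
print "after count-only:  $_\n" for @unchanged;    # tabs still intact
```

If that is what happened, counting with tr/\t// (or re-initializing @var before the \s loop) should make the pipes disappear.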


In reply to Re: counting leading white spaces by ww
in thread counting leading white spaces by Spooky
