in reply to counting leading white spaces

My recollection (aided by a quick trip to perldoc -q space and another into Friedl's "Mastering Regular Expressions") is that the "whitespace" character class includes an ordinary space (0x20), a tab and a newline. Update: perlrequick says:
"\s is a whitespace character and represents

[\ \t\r\n\f]"

Hence, I suspect that if the data OP is dealing with has embedded tab chars, the count will be unreliable.
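
A quick way to check that suspicion, as a minimal sketch: the helper names is_ws and count_ws and the sample data are mine (not from perlrequick or the OP), and the "\t" escape stands in for a hard tab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the single character in $char matches \s.
sub is_ws {
    my ($char) = @_;
    return $char =~ /\A\s\z/ ? 1 : 0;
}

# Hypothetical helper: number of \s characters anywhere in $str.
sub count_ws {
    my ($str) = @_;
    my $n = () = $str =~ /\s/g;    # list-assignment idiom counts the matches
    return $n;
}

my @samples = (
    [ 'space',     ' '  ],
    [ 'tab',       "\t" ],
    [ 'newline',   "\n" ],
    [ 'return',    "\r" ],
    [ 'form feed', "\f" ],
    [ 'letter x',  'x'  ],
);

for my $pair (@samples) {
    my ( $name, $char ) = @$pair;
    printf "%-10s %s\n", $name,
        is_ws($char) ? 'matches \s' : 'does not match \s';
}

# Assumed sample line: a space and a tab between "now" and "is".
print count_ws("now \tis the time"), "\n";
```

If tabs should not count as one character each, the tally has to be taken over an explicit class such as [ ] rather than \s.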

FTR, brian d. foy's remarks in perlfaq4.pod either ignore this case or indicate there's something wrong with my understanding (in which case, correction would be welcome).

So, in the spirit of self-education, I tried this little experiment (Update: all tabs in the original are hard tabs; here they are replaced by multiple spaces):

#!/usr/bin/perl
use strict;
use warnings;

my @var = ('now is the time',                      # space, tab between "now" and "is"
           ' for all good men',                    # leading tab, no space
           'to come to the aid of their party.'    # space, tab before party
          );

# hbm's method:
my $count = 0;
my $linecount = 0;

for my $var (@var) {
    $linecount = $var =~ tr/ \t/ \t/;
    print "\$linecount: $linecount\n";
    $count += $linecount;
}
print "$count \n";

# hbm's method with \t (tr doesn't know from "\s")
my $count_s = 0;
my $linecount_s = 0;

for my $var (@var) {
    $linecount_s = $var =~ tr/\t/|\t/;
    print "\$linecount_s: $linecount_s\n";
    $count_s += $linecount_s;
}
print "$count_s \n";

=head OUTPUT

$linecount: 4
$linecount: 4
$linecount: 8
16     # WTF? with tabs converted to spaces, I count 17 as I have my tabs set.

$linecount_s: 1
$linecount_s: 1
$linecount_s: 1
3

=cut

Which largely undermines my supposition above.

Update 20090212 00:35

Ignore the "WTF?" comment in line 37 (the one in the OUTPUT section). That's not a Perl issue (nor a reflection of my inability to count, but it's waaaaay OT and way complicated). But having elaborated the code in this manner (still using the same array):

# and now using \s & /g
# output smells bad
print "and now using \\s only, with /g modifier\n";
my $count_s = 0;
my $linecount_s = 0;
for my $var (@var) {
    $linecount_s = scalar ( $var =~ s/\s/_/g );
    print "\$linecount_s: $linecount_s, \$var after substitution: $var\n";
    $count_s += $linecount_s;
}
print "\$count_s: $count_s \n\n";

the output of that snippet confounds me:

and now using \s only, with /g modifier
$linecount_s: 3, $var after substitution: now_|is_the_time
$linecount_s: 3, $var after substitution: |for_all_good_men
$linecount_s: 7, $var after substitution: to_come_to_the_aid_of_their_|party.
$count_s: 13

because -- while substituting "_" (only) for \s I now find pipes in the output where \t existed in @var. WTF????

More in the next node below, but it does NOT explain the pipes. :-(
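
One possible reading of the pipes, offered as a sketch rather than a verdict: in tr/\t/|\t/ the search list is a single tab and the replacement list starts with "|", so every tab is translated to a pipe. A foreach loop variable is an alias for the array element, so that translation lands in @var itself. If the \s snippet ran in the same script after that loop, there would be no tabs left for \s to match, which would fit the 3/3/7 counts. The array below is an assumption: it uses "\t" escapes in double quotes as a stand-in for the hard tabs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: "\t" escapes stand in for the hard tabs of the original data.
my @var = (
    "now \tis the time",
    "\tfor all good men",
    "to come to the aid of their \tparty.",
);

# Same operator as the second counting loop. The search list is "\t" and
# the replacement list is "|\t"; only the first replacement character is
# paired with the tab, so each tab becomes "|". Because $var aliases the
# array element, @var itself is modified.
my $translated = 0;
for my $var (@var) {
    $translated += ( $var =~ tr/\t/|\t/ );
}
print "tabs translated: $translated\n";

# With the tabs gone, \s only finds the ordinary spaces and the pipes
# pass through untouched.
my $replaced = 0;
for my $var (@var) {
    $replaced += ( $var =~ s/\s/_/g );
    print "$var\n";
}
print "whitespace replaced: $replaced\n";
```

For a pure count, tr/\t// (empty replacement list, no /d) returns the same tally and leaves the string alone.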

Re^2: counting leading white spaces
by ww (Archbishop) on Feb 12, 2009 at 05:34 UTC
    Further experiments:
    #!/usr/bin/perl
    use strict;
    use warnings;

    print $0 . "\n\n";

    my @var = ('now is the time',                      # space, tab between now and is, 4 in entire line (3 spaces & 1 tab)
               ' for all good men',                    # leading tab, no space, 4 in entire line (1 tab & 3 spaces)
               'to come to the aid of their party.'    # space, tab before party, 8 (7 spaces & 1 tab)
              );                                       # Sixteen total spaces & tabs

    # now using \s & /g
    print "and now using \\s only, with /g modifier\n";
    my $count_s = 0;
    my $linecount_s = 0;
    for my $var (@var) {
        $linecount_s = scalar ( $var =~ s/\s/_/g );
        print "\$linecount_s: $linecount_s, \$var after substitution: $var\n";
        $count_s += $linecount_s;
    }
    print "\$count_s: $count_s \n\n";

    # and now using \s and \t
    print "Now, simple count (Note new array \@var1) on LEADING \\s or \\t:\n";
    my @var1 = (' Now is the time',               # NO leading spaces, two leading tabs
                ' for all men to come to the',    # one leading space
                '     of their party.'            # 5 leading spaces
               );
    my $matchcount = 0;
    for my $var1 (@var1) {
        if ( $var1 =~ /^(\s*)/ && length($1) ) {
            $matchcount += length($1);
        }
    }
    print "Total \$matchcount: $matchcount (of tabs and spaces)";

    __END__

    =head OUTPUT from countspaces2.pl (initially using the same @var as in the previous node)

    and now using \s only, with /g modifier
    $linecount_s: 4, $var after substitution: now__is_the_time
    $linecount_s: 4, $var after substitution: _for_all_good_men
    $linecount_s: 8, $var after substitution: to_come_to_the_aid_of_their__party.
    $count_s: 16     # Contrast the result from countspaces.pl (previous node)
                     # Correct count and "_" replaced each "\t"

    Now, simple count (Note new array @var1) on LEADING \s or \t:
    Total $matchcount: 8 (of tabs and spaces)     # Correct

    =cut
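
If the leading count ever needs to distinguish tabs from spaces, something along these lines might do. It is only a sketch: leading_ws is a hypothetical helper, the @lines data is an assumption that mirrors the comments on @var1 with "\t" escapes in place of hard tabs, and it deliberately looks at [ \t] rather than the full \s class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (total, spaces, tabs) for the leading run
# of spaces and tabs only; other \s characters are not counted.
sub leading_ws {
    my ($line) = @_;
    my ($lead) = $line =~ /^([ \t]*)/;      # always matches, possibly empty
    my $tabs   = ( $lead =~ tr/\t// );      # tr with empty replacement only counts
    my $spaces = ( $lead =~ tr/ // );
    return ( length $lead, $spaces, $tabs );
}

# Assumed data, following the comments on @var1 above.
my @lines = (
    "\t\tNow is the time",             # two leading tabs
    " for all men to come to the",     # one leading space
    "     of their party.",            # five leading spaces
);

my $total = 0;
for my $line (@lines) {
    my ( $all, $spaces, $tabs ) = leading_ws($line);
    print "$all leading ($spaces spaces, $tabs tabs)\n";
    $total += $all;
}
print "Total: $total\n";
```

Counting each tab as one character matches what length($1) does in the script above; expanding tabs to columns would be a separate step.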