Spooky has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks, is there a function or subroutine out there that counts the leading whitespace in a field? I'm hoping there is, but if not, how would I do that? Thanks!

Re: counting leading white spaces
by ikegami (Patriarch) on Feb 11, 2009 at 19:58 UTC

    "Counting whitespace" doesn't make much sense. Did you mean "counting whitespace characters"?

    /^(\s*)/ && length($1)
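Below is a minimal sketch that wraps the expression above in a subroutine. The names `leading_ws` and `leading_ws_offset` are hypothetical. Because `/^(\s*)/` always matches, the capture is simply empty when there is no leading whitespace. The second variant skips the capture and reads the end offset of the match from Perl's built-in `@+` array.

```perl
use strict;
use warnings;

# Number of \s characters (spaces, tabs, newlines, ...) at the start
# of $s, using capture + length as in the reply above.
sub leading_ws {
    my ($s) = @_;
    return $s =~ /^(\s*)/ ? length($1) : 0;
}

# Same count without a capture: /^\s*/ always matches starting at
# offset 0, and $+[0] holds the offset where that match ended.
sub leading_ws_offset {
    my ($s) = @_;
    $s =~ /^\s*/;
    return $+[0];
}

print leading_ws("   foo"), "\n";             # 3
print leading_ws_offset("\t\tbar baz"), "\n"; # 2
```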

    Update: Presuming a scalar context,

    () = /\G\s/g

    also works.

      What does () = /\G\s/g do?
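A sketch of how the idiom works, as far as standard Perl semantics go (the sub name `leading_ws_countof` is hypothetical). In list context, `/\G\s/g` returns one element per match. `\G` requires each match to begin exactly where the previous one ended (offset 0 for the first), so matching stops at the first non-whitespace character instead of skipping ahead. A list assignment evaluated in scalar context yields the number of elements on its right-hand side, so `my $n = () = ...` stores the count.

```perl
use strict;
use warnings;

sub leading_ws_countof {
    my ($s) = @_;
    # () = LIST in scalar context evaluates to the number of elements
    # in LIST, i.e. the number of consecutive \s matches from offset 0.
    my $count = () = $s =~ /\G\s/g;
    return $count;
}

# The interior space in "foo bar" is not counted: \G stops the run.
print leading_ws_countof("  \tfoo bar"), "\n";   # 3
```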
Re: counting leading white spaces
by ww (Archbishop) on Feb 11, 2009 at 22:39 UTC
    My recollection (aided by a quick trip to perldoc -q space and another into Friedl's "Mastering Regular Expressions") is that the "whitespace" characters include an ordinary space (0x20), a tab and a newline. Update: perlrequick says:

    "\s is a whitespace character and represents [\ \t\r\n\f]"

    Hence, I suspect that if the OP's data has embedded tab characters, the count will be unreliable.
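`\s` counts a tab as a single character, so a character count can disagree with the indentation an editor shows. If the visual width is what's wanted, one option is to expand tabs to tab stops with the core `Text::Tabs` module and then count leading spaces. This is a sketch under the assumption of fixed tab stops (8 columns by default). The sub name `indent_width` is hypothetical.

```perl
use strict;
use warnings;
use Text::Tabs;   # core module; provides expand() and $Text::Tabs::tabstop

# Visual indentation width: expand tabs to the next tab stop,
# then count the leading spaces of the result.
sub indent_width {
    my ($line, $width) = @_;
    # Localize the package variable itself; localizing the imported
    # alias $tabstop would not affect expand().
    local $Text::Tabs::tabstop = $width // 8;
    my $expanded = expand($line);
    $expanded =~ /^( *)/;
    return length $1;
}

print indent_width("\t  foo"), "\n";      # 10 with 8-column tab stops
print indent_width("  \tfoo", 4), "\n";   # 4: the tab only pads to column 4
```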

    FTR, brian d foy's remarks in perlfaq4.pod either ignore this case or indicate that there's something wrong with my understanding (in which case, a correction would be welcome).

    So, in the spirit of self-education, I tried this little experiment (Update: all tabs in the original were hard tabs, which don't display as such here; the comments mark where they were):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @var= ("now \tis the time",                     # space, tab between "now" and "is"
              "\tfor all good men",                    # leading tab, no space
              "to come to the aid of their \tparty."   # space, tab before "party"
             );

    # hbm's method:
    my $count = 0;
    my $linecount = 0;
    for my $var(@var) {
        $linecount = $var =~ tr/ \t/ \t/;
        print "\$linecount: $linecount\n";
        $count += $linecount;
    }
    print "$count \n";

    # hbm's method with \t (tr doesn't know from "\s")
    my $count_s = 0;
    my $linecount_s = 0;
    for my $var(@var) {
        $linecount_s = $var =~ tr/\t/|\t/;
        print "\$linecount_s: $linecount_s\n";
        $count_s += $linecount_s;
    }
    print "$count_s \n";

    =head1 OUTPUT

    $linecount: 4
    $linecount: 4
    $linecount: 8
    16    # WTF? with tabs converted to spaces, I count 17 as I have my tabs set.
    $linecount_s: 1
    $linecount_s: 1
    $linecount_s: 1
    3

    =cut
    This largely undermines my supposition above.

    Update 20090212 00:35

    Ignore the "WTF?" comment next to the count of 16 above. That's not a Perl issue (nor a reflection of my inability to count; it's waaaaay OT and way complicated). But having extended the code in this manner (still using the same array):

    # and now using \s & /g (output smells bad)
    print "and now using \\s only, with /g modifier\n";
    my $count_s = 0;
    my $linecount_s = 0;
    for my $var(@var) {
        $linecount_s = scalar ( $var =~ s/\s/_/g );
        print "\$linecount_s: $linecount_s, \$var after substitution: $var\n";
        $count_s += $linecount_s;
    }
    print "\$count_s: $count_s \n\n";

    the output of that snippet confounds me:

    and now using \s only, with /g modifier
    $linecount_s: 3, $var after substitution: now_|is_the_time
    $linecount_s: 3, $var after substitution: |for_all_good_men
    $linecount_s: 7, $var after substitution: to_come_to_the_aid_of_their_|party.
    $count_s: 13

    because, while substituting only "_" for \s, I now find pipes in the output where \t existed in @var. WTF?

    More in the next node below, but it does NOT explain the pipes. :-(
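Here is a likely explanation for the pipes. It assumes, as the update says, that the `s/\s/_/g` loop was appended to the first script and ran on the same `@var`. `for my $var (@var)` aliases `$var` to each array element, so the earlier `tr/\t/|\t/` loop did more than count tabs. It also replaced each tab in `@var` with `|`, because only the first character of the replacement list pairs with the single search character. By the time `s/\s/_/g` ran, the tabs were gone. That also fits the counts dropping from 4, 4, 8 to 3, 3, 7. A small sketch of the effect:

```perl
use strict;
use warnings;

# foreach aliases $var to each element, so a tr/// that has a
# replacement list rewrites the array itself.
my @var = ("now \tis the time", "\tfor all good men");

my $tabs = 0;
for my $var (@var) {
    $tabs += ($var =~ tr/\t/|\t/);   # counts tabs AND turns each into '|'
}
print "$tabs tabs; now: @var\n";     # 2 tabs; now: now |is the time |for all good men

# A later s/\s/_/g therefore sees no tabs: 3 matches here, not 4.
my $subs = ($var[0] =~ s/\s/_/g);
print "$subs: $var[0]\n";            # 3: now_|is_the_time

# Counting without side effects: an empty replacement list (no /d)
# leaves the string as it was and just returns the count.
my @clean = ("now \tis the time");
my $n = ($clean[0] =~ tr/\t//);
print "$n tab, string unchanged\n";
```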

      Further experiments:
      #!/usr/bin/perl
      use strict;
      use warnings;

      print $0 . "\n\n";

      my @var= ("now \tis the time",                     # space, tab between now and is, 4 in entire line (3 spaces & 1 tab)
                "\tfor all good men",                    # leading tab, no space, 4 in entire line (1 tab & 3 spaces)
                "to come to the aid of their \tparty."   # space, tab before party, 8 (7 spaces & 1 tab)
               );                                        # Sixteen total spaces & tabs

      # now using \s & /g
      print "and now using \\s only, with /g modifier\n";
      my $count_s = 0;
      my $linecount_s = 0;
      for my $var(@var) {
          $linecount_s = scalar ( $var =~ s/\s/_/g );
          print "\$linecount_s: $linecount_s, \$var after substitution: $var\n";
          $count_s += $linecount_s;
      }
      print "\$count_s: $count_s \n\n";

      # and now using \s and \t
      print "Now, simple count (Note new array \@var1) on LEADING \\s or \\t:\n";
      my @var1 = ("\t\tNow is the time",              # NO leading spaces, two leading tabs
                  " for all men to come to the",       # one leading space
                  "     of their party."               # 5 leading spaces
                 );
      my $matchcount = 0;
      for my $var1(@var1) {
          if ( $var1 =~ /^(\s*)/ && length($1) ) {
              $matchcount += length($1);
          }
      }
      print "Total \$matchcount: $matchcount (of tabs and spaces)";

      __END__

      =head1 OUTPUT from countspaces2.pl (initially using the same @var as in the previous node)

      and now using \s only, with /g modifier
      $linecount_s: 4, $var after substitution: now__is_the_time
      $linecount_s: 4, $var after substitution: _for_all_good_men
      $linecount_s: 8, $var after substitution: to_come_to_the_aid_of_their__party.
      $count_s: 16    # Contrast the result from countspaces.pl (previous node)
                      # Correct count and "_" replaced each "\t"

      Now, simple count (Note new array @var1) on LEADING \s or \t:
      Total $matchcount: 8 (of tabs and spaces)    # Correct

      =cut
Re: counting leading white spaces
by hbm (Hermit) on Feb 11, 2009 at 20:08 UTC
    Perhaps tr?
    $_ = "    \ta"; print tr/ \t/ \t/;   # 4 spaces + 1 tab: prints "5"

    Or count and delete:

    $_ = "    \ta"; print tr/ \t//d, $_;   # prints "5a"

      A problem with tr/// is that it counts all occurrences, not only the leading ones...

        Good point; bad oversight on my part. Below, I take extra effort to count only leading spaces...

        use strict; use warnings; AUTOLOAD{$.=0;$_=@_[$??$;:$.];@_=reverse split//;++$.while+pop=~m,\s,;y, , ,s,warn$.,$_,$/}&* ($_)for(<DATA>) __DATA__ Initiate Novice Acolyte Sexton Beadle Scribe Monk Pilgrim Friar Hermit Chaplain Deacon Curate Priest Vicar Parson Prior Monsignor Abbot Canon Chancellor Bishop Archbishop Cardinal Sage Saint Apostle Pope

        See also JAPH at the firing range
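This picks up the earlier point that tr/// counts every occurrence. One way to keep the tr/// counting idea while looking only at the leading run is to capture that run first and apply tr/// to the capture. The sketch below uses a made-up sample string. It also splits the leading count into tabs and spaces.

```perl
use strict;
use warnings;

my $line = "  \tfoo bar\tbaz";     # made-up sample: 2 spaces + tab, then interior whitespace

my $all = ($line =~ tr/ \t//);         # every space and tab in the string: 5
my ($lead) = $line =~ /^([ \t]*)/;     # just the leading run: "  \t"
my $leading = ($lead =~ tr/ \t//);     # 3
my $tabs    = ($lead =~ tr/\t//);      # 1 of the 3 is a tab

# An empty replacement list (no /d) only counts; $line is unchanged.
print "all: $all, leading: $leading (tabs: $tabs)\n";
```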

Re: counting leading white spaces
by Anonymous Monk on Nov 09, 2015 at 07:41 UTC
    my $linecopy = $originalLine;
    $linecopy =~ s/^\s+//;                                    # remove leading whitespace
    my $length = length($originalLine) - length($linecopy);
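The same length-difference idea fits in one subroutine if the `/r` modifier is available. `/r` was added in Perl 5.14 and makes `s///` return a modified copy instead of changing the original. The sub name `leading_ws_by_length` is hypothetical.

```perl
use strict;
use warnings;

# Strip leading whitespace from a copy, then subtract lengths.
# s///r (Perl 5.14+) returns the modified copy and leaves
# $original_line untouched; with no leading whitespace it returns
# the string unchanged, so the difference is 0.
sub leading_ws_by_length {
    my ($original_line) = @_;
    my $stripped = $original_line =~ s/^\s+//r;
    return length($original_line) - length($stripped);
}

print leading_ws_by_length("  \t indented"), "\n";   # 4
```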
Re: counting leading white spaces
by Daga (Initiate) on Apr 27, 2011 at 09:34 UTC

    Hi, you can use the method below for this:

    sub countLeadingTabSpaces {
        my($data) = @_;
        my @string=split(/\t/,$data);
        my $numberofLeadingTabs=0;
        my $size = @string;
        my $TabSpace = "";
        my $loopBreak = 0;
        for (my $count = 0; $count < $size; $count++) {
            if($loopBreak==0) {
                if($string[$count] =~/([A-Za-z0-9@])/) {
                    $loopBreak = 1;
                }else{
                    ++$numberofLeadingTabs;
                }
            }
        }
        print "=>NumberofLeadingTabs: ".$numberofLeadingTabs;
        return $numberofLeadingTabs;
    }

      Your code has an error. It reports the wrong number of leading tabs if there are some characters in the line that you didn't cater for:

      sub countLeadingTabSpaces {
          my($data) = @_;
          my @string=split(/\t/,$data);
          my $numberofLeadingTabs=0;
          my $size = @string;
          my $TabSpace = "";
          my $loopBreak = 0;
          for (my $count = 0; $count < $size; $count++) {
              if($loopBreak==0) {
                  if($string[$count] =~/([A-Za-z0-9@])/) {
                      $loopBreak = 1;
                  }else{
                      ++$numberofLeadingTabs;
                  }
              }
          }
          print "=>NumberofLeadingTabs: ".$numberofLeadingTabs;
          return $numberofLeadingTabs;
      }

      countLeadingTabSpaces("\t\t!\tfoo");   # prints "3" but only has two leading tabs!

      It's better to avoid splitting the string and then checking whether each piece contains a certain set of characters; instead, just count the tab characters at the start:

      my $numberofLeadingTabs = 0;
      if ($data =~ /^(\t+)/) {
          my $tabs = $1;
          $numberofLeadingTabs = length $tabs;
      }
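Here is a sketch that wraps the suggested regex in a subroutine and runs it on the input from the bug report above. The sub name `count_leading_tabs` is hypothetical. The point is that the first non-tab character, whatever it is, ends the run, so `"!"` no longer gets counted.

```perl
use strict;
use warnings;

# Count only the tabs at the very start of $data; the first non-tab
# character ('!', a space, a letter, ...) ends the run.
sub count_leading_tabs {
    my ($data) = @_;
    return $data =~ /^(\t+)/ ? length($1) : 0;
}

print count_leading_tabs("\t\t!\tfoo"), "\n";   # 2
print count_leading_tabs("foo\tbar"), "\n";     # 0
```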
Re: counting leading white spaces
by Anonymous Monk on Apr 27, 2011 at 09:33 UTC

    Hi, you can use the method below for this:

    sub countLeadingTabSpaces {
        my($data) = @_;
        my @string=split(/\t/,$data);
        my $numberofLeadingTabs=0;
        my $size = @string;
        my $TabSpace = "";
        my $loopBreak = 0;
        for (my $count = 0; $count < $size; $count++) {
            if($loopBreak==0) {
                if($string[$count] =~/([A-Za-z0-9@])/) {
                    $loopBreak = 1;
                }else{
                    ++$numberofLeadingTabs;
                }
            }
        }
        print "=>NumberofLeadingTabs: ".$numberofLeadingTabs;
        return $numberofLeadingTabs;
    }