Checking for a space followed by a word character is simple, though there are a few similar patterns, and one of them may suit your needs better than the others:

    $string =~ / \w/;   # a space followed by a word character
    $string =~ /\s\w/;  # any whitespace character followed by a word character
    $string =~ /\s\S/;  # any whitespace character followed by a non-whitespace character
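To see where those three patterns actually disagree, here is a small sketch. The sample strings and the `matching_patterns` helper are made up for illustration and are not part of the original question:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the patterns (as labels) that match $string.
sub matching_patterns {
    my ($string) = @_;
    my @hits;
    push @hits, '/ \w/'  if $string =~ / \w/;   # literal space, then word char
    push @hits, '/\s\w/' if $string =~ /\s\w/;  # any whitespace, then word char
    push @hits, '/\s\S/' if $string =~ /\s\S/;  # any whitespace, then non-whitespace
    return @hits;
}

# Made-up samples: plain space, tab, space before punctuation, trailing space.
for my $string ("John Doe", "John\tDoe", "John -Doe", "John ") {
    my @hits = matching_patterns($string);
    (my $shown = $string) =~ s/\t/\\t/g;        # make the tab visible when printing
    printf "%-12s %s\n", "'$shown'", @hits ? join(', ', @hits) : 'no match';
}
```

The tab-separated string is missed by the literal-space pattern, the "space then hyphen" string is only caught by `/\s\S/`, and a trailing space alone matches none of them.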

However, since you're applying a regex here, it might be just as efficient to go ahead and do the split and then see whether it split anything. That would take a bit more time on the lines that are a single word, but less time on the ones with multiple words:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my @s = ('John', 'John ', 'John Doe', 'John P. Doe'); # last 2 should match

    for (@s){
        my @v = split /\s+\b/;  # split on whitespace followed by a word boundary
        if(@v > 1 ){            # if the split did any splitting
            say;                # do stuff with the line or elements
        }
    }

Update: I thought I'd benchmark it (code below), and found that if 50% of the values needed to be split as in the example above, the two methods were equally fast:

                     Rate split and check check and split
    split and check 145/s              --             -1%
    check and split 146/s              1%              --

But when I made it so 75% of the values needed to be split, the "split everything and then check for a second element" method was the clear winner:

Rate check and split split and check check and split 112/s -- -17% split and check 136/s 21% --

So it looks like if less than half your lines will need to be split, check first, then split the ones that matched. If more than half will end up being split, just split them all and check for a second element in the resulting array, and go from there. (Incidentally, checking for the second element ($v[1]) was also a gain over checking the number of elements (@v>1) as I originally did.) Here's the benchmarking code:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;
    use Benchmark qw(:all);
    use Data::Printer;

    # my @s = ('John', 'John ', 'John Doe', 'John P. Doe') x 1000; # big array 50% need split
    my @s = ('John', 'John Poe', 'John Doe', 'John P. Doe') x 1000; # big array 75% need split

    cmpthese( 1000, {
        'split and check' => \&one,
        'check and split' => \&two,
    });

    sub one {
        for (@s){
            my @v = split /\s+\b/;  # split on a space followed by a word boundary
            if($v[1] ){             # if the split did any splitting
                # do stuff with the line or elements
            }
        }
    }

    sub two {
        for (@s){
            if (/\s\b/){                # if the line would be split
                my @v = split /\s+\b/;  # split it
                # do stuff with the line or elements
            }
        }
    }
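One caveat about the `$v[1]` check, offered as a side note rather than as part of the benchmark: it tests truthiness, so a second field of `0` would look like "no split". The sketch below compares it with `defined $v[1]` and `@v > 1`. The helper names and the `'John 0'` sample are hypothetical, and I have not benchmarked the `defined` variant.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helpers, one per way of asking "did the split produce a second field?"
sub was_split_truthy  { my @v = split /\s+\b/, shift; return $v[1]          ? 1 : 0 }
sub was_split_defined { my @v = split /\s+\b/, shift; return defined($v[1]) ? 1 : 0 }
sub was_split_count   { my @v = split /\s+\b/, shift; return @v > 1         ? 1 : 0 }

# 'John 0' is a made-up edge case: its second field is the false value "0".
for my $line ('John', 'John Doe', 'John 0') {
    printf "%-10s truthy=%d defined=%d count=%d\n", "'$line'",
        was_split_truthy($line), was_split_defined($line), was_split_count($line);
}
```

For name-like data the three checks agree, so the faster `$v[1]` test is fine there. If a field could legitimately be `0`, the `defined` or element-count form is the safer choice.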

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.


In reply to Re: Check for Spaces in a String by aaron_baugher
in thread Check for Spaces in a String by Anonymous Monk
