Amiru has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I hope someone can shed some light on my problem: I have a long string containing multiple whitespace characters. How can I formulate an expression that splits the string at the nth occurrence of whitespace, specifically at the 11th space? Thanks.

Replies are listed 'Best First'.
Re: Split String after nth occurrence of a charater
by davido (Cardinal) on Nov 08, 2011 at 05:39 UTC

    It sort of depends on what your string looks like, but here is one solution:

    use feature 'say';

    my $string = "one two three four five six seven eight nine ten eleven twelve thirteen";
    my( $left, $right );
    if( ( $left, $right ) = $string =~ m{\A((?:\S+\s){11})(.+)\z} ) {
        say "Left: ($left)";
        say "Right: ($right)";
    }

    Output:

    Left: (one two three four five six seven eight nine ten eleven )
    Right: (twelve thirteen)

    If you want to gobble that eleventh space, this will do it:

    m{\A((?:\S+\s){10}\S+)\s(.+)\z}
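
    As a rough sketch, the same idea could be wrapped in a sub that takes n as a parameter. The name split_at_nth is hypothetical, and the sketch assumes the string starts with a word and that words are separated by single whitespace characters, as in the example string above.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string in two at the $n-th whitespace
# character and drop that character. Assumes the string starts with a
# non-whitespace word and words are separated by single whitespace chars.
sub split_at_nth {
    my ( $string, $n ) = @_;
    my $before = $n - 1;    # separators that stay inside the left part
    if ( $string =~ m{\A((?:\S+\s){$before}\S+)\s(.+)\z} ) {
        return ( $1, $2 );
    }
    return;                 # no match, e.g. fewer than $n separators
}

my ( $left, $right ) = split_at_nth(
    "one two three four five six seven eight nine ten eleven twelve thirteen",
    11,
);
print "Left: ($left)\n";      # Left: (one two three four five six seven eight nine ten eleven)
print "Right: ($right)\n";    # Right: (twelve thirteen)
```

    When the string has fewer than n separators the sub returns an empty list, so the caller can test the result before using it.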

    Dave

      Thank you Dave, your regular expression gave me the start. Thanks again.
      Your expression is failing many of the points I pointed out here.
Re: Split String after nth occurrence of a charater
by GrandFather (Saint) on Nov 08, 2011 at 07:28 UTC

    Your specification is rather vague. Using the word split is likely to make a Perl user think you want to split your string into possibly more than two parts. The following does that trick:

    my $string = join ' ', 1 .. 30;
    my @parts = $string =~ /((?:\S*\s?){1,11})/g;
    pop @parts;
    print ">$_<\n" for @parts;

    Prints:

    >1 2 3 4 5 6 7 8 9 10 11 <
    >12 13 14 15 16 17 18 19 20 21 22 <
    >23 24 25 26 27 28 29 30<
    True laziness is hard work
      OK, I hadn't thought of this. I was going to keep splitting at the 11th space of the right-hand string, but this does it. Thank you.
Re: Split String after nth occurrence of a charater
by johngg (Canon) on Nov 08, 2011 at 09:24 UTC

    I'm not sure if this is anything like what you are aiming at. It uses a global match to find the whitespace positions then substr to divide the string.

    knoppix@Microknoppix:~$ perl -E '
    > $str = q{one word  two spaces    four   spaces one};
    > push @posns, [ $-[ 0 ], $+[ 0 ] ] while $str =~ m{\s+}g;
    > say qq{@$_} for @posns;
    > # Split string at fourth gap.
    > $left = substr $str, 0, $posns[ 3 ]->[ 0 ];
    > $right = substr $str, $posns[ 3 ]->[ 1 ];
    > say qq{>$left<};
    > say qq{>$right<};'
    3 4
    8 10
    13 14
    20 24
    28 31
    37 38
    >one word  two spaces<
    >four   spaces one<
    knoppix@Microknoppix:~$

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Split String after nth occurrence of a charater
by ansh batra (Friar) on Nov 08, 2011 at 07:14 UTC
    #!/usr/bin/perl
    my ($occ,$str);
    $str="hi there this is perl monks forum.";
    print "enter number of occurrence where you want to split\n";
    $occ=<STDIN>;
    chomp($occ);
    $str=~ /(\S+\s){$occ}/;
    print "left : $& \nRight: $'\n";
      Wrong on several counts:
      1. It doesn't count leading whitespace.
      2. It counts non-space whitespace as if it were a space.
      3. It only counts whitespace directly following non-whitespace.
      4. It doesn't actually split; the whitespace on which it should split is returned in $&.
      5. It doesn't actually split: it only returns one substring; if you were to fix this by adding /g, whitespace following the "11"th one will be lost.
      6. It only matches chunks that actually have an 11th space. If $str contained only 5 spaces, the match would fail; a split would still return something.
      Better would be (untested):
      @chunks = $str =~ /((?:[^ ]* ){10}[^ ]*|.+)/gs;
      but even that isn't quite split.
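
      For the two-part case, one possible sketch that addresses the points above counts literal spaces with index instead of a regex. The name split_at_nth_space is hypothetical, and "space" is assumed to mean only the character chr(32).

```perl
use strict;
use warnings;

# Hypothetical helper: split $str in two at the $n-th literal space
# (chr 32), wherever it occurs. Leading and consecutive spaces count too.
# If there are fewer than $n spaces, the whole string comes back as one
# chunk, much as split returns its input when the separator never matches.
sub split_at_nth_space {
    my ( $str, $n ) = @_;
    my $pos = -1;
    for ( 1 .. $n ) {
        $pos = index $str, ' ', $pos + 1;
        return ($str) if $pos < 0;    # ran out of spaces
    }
    # The $n-th space itself is dropped, as a split separator would be.
    return ( substr( $str, 0, $pos ), substr( $str, $pos + 1 ) );
}

my $str = join ' ', 1 .. 30;
print "<$_>\n" for split_at_nth_space( $str, 11 );
```

      To keep splitting at every 11th space, the sub could be called again on the second chunk until only one chunk comes back.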
Re: Split String after nth occurrence of a charater
by Anonymous Monk on Nov 08, 2011 at 21:28 UTC

    Literal split:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature 'say';    # needed for say

    # A lexical "my $_" no longer compiles on Perl 5.24+, so use the global $_.
    $_ = join ' ', 1..30;

    say "Split at 11th whitespace:";
    say " <$_>" for split /(?:\s+\S+){10}\K\s+/, $_, 2;

    say "Split every 11th whitespace:";
    say " <$_>" for split /(?:\s+\S+){10}\K\s+/;

    __END__
    Split at 11th whitespace:
     <1 2 3 4 5 6 7 8 9 10 11>
     <12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30>
    Split every 11th whitespace:
     <1 2 3 4 5 6 7 8 9 10 11>
     <12 13 14 15 16 17 18 19 20 21 22>
     <23 24 25 26 27 28 29 30>