Roger has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow Monks,

Recently I came across the problem of how to capture the nth word in a sentence. I quickly came up with the following answer -
use strict;
my $str = "Element1 Element2 Element3 Element4 Element5 Element6";
my $nth = @{[$str =~ m/\w+/g]}[3];
print "$nth\n";
And the output is -
Element4
In the above solution, $str =~ m/\w+/g returns a list of all the matches (the match is in list context). I constructed an anonymous array from it with @{[ ... ]} and took the 4th element (index 3, zero-based).

This solution works, though it is not efficient in memory usage.

I am interested to find out how other monks would go about solving the same problem.

Thanks in advance.

Replies are listed 'Best First'.
Re: Capturing the nth word in a string
by Zaxo (Archbishop) on Oct 14, 2003 at 03:27 UTC

    Your @{[$str =~ m/\w+/g]}[3] would be better written as ($str =~ m/\w+/g)[3]. I'd more likely write it as, my $nth = (split ' ', $str)[3]; though that's not exactly the same thing with respect to punctuation, etc. Whether that is as usable to you depends on your data.

    After Compline,
    Zaxo
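The punctuation difference mentioned above can be seen in a minimal sketch. The sample string is an assumption picked to contain punctuation; it is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen because it contains punctuation.
my $str = "Hello, world! It's nth-word time.";

# List-context match on \w+: punctuation separates words and is dropped.
my @by_regex = $str =~ m/\w+/g;

# split ' ': splits on runs of whitespace only, so punctuation stays attached.
my @by_split = split ' ', $str;

print "regex: $by_regex[3]\n";   # regex: s
print "split: $by_split[3]\n";   # split: nth-word
```

With \w+ the apostrophe and the hyphen both act as word boundaries, so the two approaches disagree about what the 4th word is.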

      Yes you are right. ($str =~ m/\w+/g)[3] is better than @{[$str =~ m/\w+/g]}[3]. I came up with my original solution because I first wanted to do @{$str =~ m/\w+/g}->[3], but that was obviously wrong. So I wrapped it inside @{[ ... ]} to make it work.

      And yes I would use split normally, but that $m = $str =~ m/\w+/g idiom and davido's meditation on TMTOWTDI inspired me to come up with something similar.

        You wanted to say:
        [$str =~ m/\w+/g]->[3]
Re: Capturing the nth word in a string
by delirium (Chaplain) on Oct 14, 2003 at 07:06 UTC
    Don't forget about the trusty command-line options; they can come in pretty handy. -a splits each input line on whitespace into the @F array.

    echo elem1 elem2 elem3 elem4 elem5 | perl -lane 'print $F[3];'
    elem4
    And speaking of Davido's thread, if you're shamelessly going for extra XP, there's always the ever-present @ARGV array:

    perl -le 'print $ARGV[3]' elem1 elem2 elem3 elem4 elem5
    elem4
Re: Capturing the nth word in a string
by grantm (Parson) on Oct 14, 2003 at 07:46 UTC

    Perhaps since you're only interested in one 'word' you could save Perl the bother of capturing all of them:

    print +( $str =~ /(?:\w+\W+){3}(\w+)/ ), "\n";   # unary + stops print from taking (...) as its whole argument list

    Of course having n (3 in this case) buried inside the regex doesn't exactly make for a general solution.

      You don't have to hard code the '3' here. It can easily be a variable. This is a good solution.
      Yes! Indeed this is an excellent solution. Thanks for the effort, I will write it down somewhere for future reference. :-)

      my ($nth) = $str =~ /(?:\w+\W+){3}(\w+)/; # capture 4th word
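Putting n in a variable, as suggested above, might look like the sketch below. nth_word is a hypothetical helper name, and treating $n as 1-based is an assumption made for this example.

```perl
use strict;
use warnings;

# nth_word() is a hypothetical helper; $n is 1-based (an assumption).
sub nth_word {
    my ($str, $n) = @_;
    return if $n < 1;
    my $skip = $n - 1;    # number of words to step over
    # The quantifier is interpolated into the pattern before it is compiled.
    my ($word) = $str =~ /^\W*(?:\w+\W+){$skip}(\w+)/;
    return $word;         # undef when the string has fewer than $n words
}

my $str = "Element1 Element2 Element3 Element4 Element5 Element6";
print nth_word($str, 4), "\n";   # Element4
```

Because nothing is required after the captured (\w+), the last word is found as well, and asking for a word past the end returns undef rather than a partial match.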
Re: Capturing the nth word in a string
by pg (Canon) on Oct 14, 2003 at 03:57 UTC
    Yet another way:
    $_ = "ele1 ele2 ele3 ele4 ele5 ele6"; m/(\w+\s*){$ARGV[0]}/; print $1;
    Pass in a parameter to indicate which word you want (you can easily turn this into a function). This does not create an array and saves space, assuming the only thing you care about is the nth word.
      Nice. Tiny changes, though... want nth char, not (n+1)th... also, you want \s+, not \s*... and might as well decide that by "word" we mean a run of non-whitespace (that is... probably a bad idea to mix \w and \S... so I'll just go with \S).
      [me@host bin]$ perl -we '$n = $ARGV[0]; $_ = "a b csadf sdfas ddd"; /(?:\S+\s+){@{[$n-1]}}(\S+)/; print "$1\n"' 1
      a
      [me@host bin]$ perl -we '$n = $ARGV[0]; $_ = "a b csadf sdfas ddd"; /(?:\S+\s+){@{[$n-1]}}(\S+)/; print "$1\n"' 2
      b
      [me@host bin]$ perl -we '$n = $ARGV[0]; $_ = "a b csadf sdfas ddd"; /(?:\S+\s+){@{[$n-1]}}(\S+)/; print "$1\n"' 4
      sdfas
      [me@host bin]$

      ------------
      :Wq
      Not an editor command: Wq
        The reason I used \s* rather than \s+ is to make it work for the last word too, as there is no space after that one.
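The \s* versus \s+ trade-off can be checked with a small sketch. The two-word string and the variable names are assumptions made for illustration; the point is what happens when n is larger than the number of words.

```perl
use strict;
use warnings;

my $str = "ele1 ele2";   # hypothetical input with only two words
my $n   = 3;             # ask for a third word that does not exist

# \s* may match nothing, so the engine can backtrack and split "ele2"
# into "ele" + "2" to satisfy three repetitions.
my ($loose_hit) = $str =~ /(\w+\s*){$n}/;

# \s+ between words, with no separator required after the captured word,
# still reaches the last word but cannot split one.
my $skip = $n - 1;
my ($tight_hit) = $str =~ /^\s*(?:\S+\s+){$skip}(\S+)/;

print defined $loose_hit ? "loose: '$loose_hit'\n" : "loose: no match\n";   # loose: '2'
print defined $tight_hit ? "tight: '$tight_hit'\n" : "tight: no match\n";   # tight: no match
```

Keeping the separator mandatory between words and leaving the final capture unterminated seems to cover the last-word case without accepting fragments.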
Re: Capturing the nth word in a string
by pg (Canon) on Oct 14, 2003 at 03:34 UTC
    $_ = "ele1 ele2 ele3 ele4 ele5 ele6"; my @abc = split(/\s+/); print $abc[3];
Re: Capturing the nth word in a string
by Jasper (Chaplain) on Oct 14, 2003 at 10:59 UTC
    I tried to make a solution that captured the nth word, where n is the more human, 1-based index (1 being first, etc.).
    $str = "this is a sentence of words and I want the nth one";
    $n = 4; # the word we want
    $a = 0;
    $a = 1 + index $str, $", $a while --$n;
    $word = substr $str, $a, -$a + index $str, $", $a;
    print "$word";
    prints "sentence". It seems very complicated, though.

    Doesn't work for the last word. Bah! Unless I change it to index"$str ", $", $a everywhere. Nuts.
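One way to handle the last word without appending a space is to treat a failed index lookup as the end of the string. This is only a sketch: nth_word_by_index is a hypothetical name, $n is 1-based, and words are assumed to be separated by single spaces as in the example above.

```perl
use strict;
use warnings;

# nth_word_by_index() is a hypothetical helper; single-space separators
# and a 1-based $n are assumptions.
sub nth_word_by_index {
    my ($str, $n) = @_;
    my $start = 0;
    for (2 .. $n) {                        # step over the first $n-1 spaces
        my $sp = index $str, ' ', $start;
        return if $sp < 0;                 # fewer than $n words
        $start = $sp + 1;
    }
    my $end = index $str, ' ', $start;
    $end = length $str if $end < 0;        # last word: no trailing space
    return substr $str, $start, $end - $start;
}

my $str = "this is a sentence of words and I want the nth one";
print nth_word_by_index($str, 4),  "\n";   # sentence
print nth_word_by_index($str, 12), "\n";   # one
```

No array of words is built; only two offsets are tracked.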
      Here's another possibility
      $num = 4;
      for ($str =~ /.?/g) {
          $h{space} += !/\S/;
          $word .= $_ if $h{space} == $num-1 && /\S/ .. !/\S/
      }
      print $word;
Re: Capturing the nth word in a string
by exussum0 (Vicar) on Oct 14, 2003 at 12:01 UTC
    I understand your curiosity, and I totally support it. But for everyone, not just you: efficiency alone isn't a strong argument here. On a small sentence of, say, 20 words, your solution might be nearly as fast, and use nearly as little memory, as any of the others.

    Unless you are doing something that requires user interaction or is time dependent, if it works and is somewhat readable, your solution is great. If you were dealing with 5e10 words, yeah, efficiency would really matter. Who would want their machine to run out of memory?

    It's like comparing the bubble sort to the merge sort to the quick sort on a small set of data. Oddly enough, bubble sort does really, really well there since its overhead is so low. But once the size gets significant, it does shitty. And if you are performing an operation on a small set of data, sorting or otherwise, it doesn't really matter. Your solution is readable without your explanation. Hooray! :)

    Play that funky music white boy..
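Rather than guessing, the candidates from this thread can be timed with the core Benchmark module. This is a rough sketch: the 20-word input and the labels are assumptions, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical 20-word input, matching the size mentioned above.
my $str = join ' ', map { "Element$_" } 1 .. 20;

# Three approaches from this thread, each returning the 4th word.
my %candidates = (
    match_all => sub { my $w = ($str =~ m/\w+/g)[3]; $w },
    split     => sub { my $w = (split ' ', $str)[3]; $w },
    skip_re   => sub { my ($w) = $str =~ /(?:\w+\W+){3}(\w+)/; $w },
);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

cmpthese prints a table of rates and relative percentages, which makes it easy to see whether the difference matters at a given input size.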
Re: Capturing the nth word in a string
by Anonymous Monk on Oct 16, 2003 at 08:50 UTC
    Use the split function with whitespace as the delimiter. Then you can access your string as an array of words.