author104 has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am looking to extract a specific number of words from a string imported from a text file. This is a web script.

Example.

Text file contains "Mary had a little lamb"

x=3

I then want to pull the first three words, "Mary had a " and display it on a web page.

What I am asking is similar to the $str function, but with words, instead of string length.

How do I do this?

Thanks in advance

Author104

Re: Extract a specific number of words from a string
by hippo (Archbishop) on Nov 25, 2017 at 16:19 UTC

    TIMTOWTDI. Here's one with split and join:

    use strict;
    use warnings;
    use Test::More tests => 3;

    my $in = 'Mary had a little lamb';
    is (nwords ($in, 1), 'Mary');
    is (nwords ($in, 2), 'Mary had');
    is (nwords ($in, 3), 'Mary had a');

    sub nwords {
        my ($src, $num) = @_;
        return join " ", (split (/ /, $src, $num + 1))[0 .. $num - 1];
    }
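
A variation on the split and join idea, sketched on the assumption that the text file may contain tabs, newlines or runs of spaces between words. The name first_words is made up for illustration. It uses the special-cased pattern ' ' (a single-space string), which makes split break on any run of whitespace and skip leading whitespace, so the result is normalised to single spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $num whitespace-separated words
# of $src, joined by single spaces.
sub first_words {
    my ($src, $num) = @_;
    return '' if $num < 1;

    # split ' ' splits on runs of whitespace and ignores leading
    # whitespace; with no limit, trailing empty fields are dropped too.
    my @words = split ' ', $src;

    # Keep at most $num words (fewer if the input is shorter).
    splice @words, $num if @words > $num;

    return join ' ', @words;
}

print first_words("  Mary   had\ta little lamb\n", 3), "\n";   # Mary had a
```

By contrast, splitting on / / produces an empty field for every extra space, and a leading space yields an empty first field, so this form may be the safer one when the input is not known to be single-spaced.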
Re: Extract a specific number of words from a string
by AnomalousMonk (Archbishop) on Nov 25, 2017 at 16:38 UTC

    Essentially the same as 1nickt's approach, except the regex for a word is defined in one place so you can play with it until you get it right, and matches are extracted to an array:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $str = 'mary had a little lamb';
     ;;
     my $word = qr{ [[:alpha:]]+ }xms;
     ;;
     my @words = ($str =~ m{ $word }xmsg)[0 .. 2];
     dd \@words;
    "
    ["mary", "had", "a"]


    Give a man a fish:  <%-{-{-{-<

Re: Extract a specific number of words from a string
by 1nickt (Canon) on Nov 25, 2017 at 16:09 UTC

    Hi, $str is a variable, maybe you meant substr?

    Try this:

    $ perl -Mstrict -wE 'my $str = "mary had a little lamb"; say for ($str =~ /(\S+)/g)[0..2]'
    ( See perlrequick, perlretut )

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Extract a specific number of words from a string
by johngg (Canon) on Nov 25, 2017 at 23:09 UTC

    I note that your expected output retains the trailing space after the third word. If that was intentional then you could use a regular expression that uses zero-width look-around assertions to match the starting position of each word, recording them in an array using pos. You can then use substr to pull out everything up to the start of the first word you wish to discard.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    my @strs = (
        q{Mary had a little lamb},
        q{ its fleece was white as snow},
    );
    my $nWords = 3;
    my $qrWordStart = qr{(?x)
        (?: (?<= \A ) | (?<= \W ) )   # Preceded by either beginning of
                                      # string or non-word character
        (?= \w )                      # Followed by a word character
    };
    foreach my $str ( @strs )
    {
        my @posns;
        push @posns, pos $str while $str =~ m{$qrWordStart}g;
        say qq{->@{ [ substr $str, 0, $posns[ $nWords ] ] }<-};
    }'
    ->Mary had a <-
    -> its fleece was <-
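
One case the one-liner above does not cover is a string with no word after the requested number, where there is no cut-off position to pass to substr. A minimal sketch of the same pos and substr idea wrapped in a sub, assuming the whole string should be returned in that case. The name leading_words is hypothetical, and the word-start test is written as a single negative look-behind, (?<! \w ), which should be equivalent to the two look-behinds used above.

```perl
use strict;
use warnings;

# Hypothetical helper: return $str up to (not including) the start of
# word number $n_words + 1, keeping the original spacing intact.
sub leading_words {
    my ($str, $n_words) = @_;
    return '' if $n_words < 1;

    # Record the offset of every word start: a zero-width position that
    # is not preceded by a word character and is followed by one.
    my @posns;
    push @posns, pos $str while $str =~ m{ (?<! \w ) (?= \w ) }xg;

    # $n_words words or fewer: nothing to cut off, return everything.
    return $str if @posns <= $n_words;

    return substr $str, 0, $posns[$n_words];
}

print '->', leading_words('Mary had a little lamb', 3), "<-\n";   # ->Mary had a <-
print '->', leading_words('Mary had', 3), "<-\n";                 # ->Mary had<-
```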

    I hope this is of interest.

    Update: Added information about using look-arounds and commented the regex accordingly.

    Cheers,

    JohnGG