c0d34w4y has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

The code below is a demo of my attempt at
writing a little function to retrieve the first N
sentences from a string. Do you have any
suggestions for other ways to implement
something similar?

Cheers,
c0d34w4y

my $s = "foo. bar. foobar. foo bar.";
my $p = get_nsentences($s,2);
# $p = "foo. bar.";
print $p;
print "done\n";

## SUBS
# get n first sentences from a given string.
sub get_nsentences {
    my $source_txt = $_[0];
    my $ret_txt = "";
    # i know this looks twisted...
    for($i=$_[1]||1;$i>0 && length($source_txt);$i--) {
        $source_txt =~ s/([^\.]+\.)/(($ret_txt.=$1)&&"")/e;
    }
    return $ret_txt;
}
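
The loop above removes one sentence from the source string on each pass through s///e. As a rough sketch of the same idea without modifying the input, the match can be walked with //g in scalar context. The name first_n_by_scan is hypothetical, and the \G anchor is an assumption of this sketch: it stops at the first position where the pattern fails rather than skipping ahead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): same "non-periods followed by a
# period" pattern as get_nsentences, but matched with //gc so the string is
# scanned in place instead of being edited.
sub first_n_by_scan {
    my ($text, $n) = @_;
    $n ||= 1;                      # default to one sentence, as in the original
    my $out = '';
    # \G continues from where the previous match ended; /c keeps that
    # position if a match fails.
    while ($n-- > 0 && $text =~ /\G([^.]+\.)/gc) {
        $out .= $1;
    }
    return $out;
}

print first_n_by_scan("foo. bar. foobar. foo bar.", 2), "\n";   # prints "foo. bar."
```

Asking for more sentences than the string holds simply returns everything that matched.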


--
print join(" ", map { sprintf "%#02x", $_ }unpack("C*",pack("L",0x12345678)))

Replies are listed 'Best First'.
Re: Retreaving N first sentences from text.
by Beatnik (Parson) on Dec 15, 2001 at 03:48 UTC
Re: Retreaving N first sentences from text.
by Zaxo (Archbishop) on Dec 15, 2001 at 04:05 UTC

    Lingua::EN::Sentence has lots of re's:

    use Lingua::EN::Sentence qw( get_sentences );

    sub nsentences {
        my ($num, @text) = @_;
        my $sentences = get_sentences(join " ",@text);
        join " ", @{$sentences}[0..$num-1];
    }

    After Compline,
    Zaxo

      I knew my code was far from perfect ;-)
      Thanks for the great suggestions... I'm going to
      change my code a bit now. I think the Lingua::EN::Sentence module
      might be of some use.
      


      --
      print join(" ", map { sprintf "%#02x", $_ }unpack("C*",pack("L",0x12345678)))
Re: Retreaving N first sentences from text.
by dws (Chancellor) on Dec 15, 2001 at 03:58 UTC
    There was a good article in TPJ a few issues back that included a discussion of the inner workings of Text::Sentence, and how it recognizes sentences.

Re: Retreaving N first sentences from text.
by dragonchild (Archbishop) on Dec 15, 2001 at 03:49 UTC
    Uhhh.. if you know that all sentences are separated by the '.', then why not do something like:
    sub get_nsentences {
        my ($source_text, $n) = @_;
        my @sentences = split /\.\s*/, $source_text;
        my $ret_text = (join '.', @sentences[0..$n-1]) . '.';
        return $ret_text;
    }
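
One thing to watch with the split approach: on the sample string from the question, splitting on /\.\s*/ and joining on '.' gives "foo.bar." rather than "foo. bar.", because the split consumes the whitespace too. A possible variant, still assuming periods are the only terminators, splits on the whitespace that follows a period so each sentence keeps its own period. The name first_n_by_split is hypothetical, and clamping the count is an addition of this sketch.

```perl
use strict;
use warnings;

# Hypothetical variant (not from the thread): split on whitespace that
# follows a period, so the period stays attached to its sentence.
sub first_n_by_split {
    my ($text, $n) = @_;
    $n ||= 1;
    my @sentences = split /(?<=\.)\s+/, $text;
    # Clamp the count so the slice never reaches past the end of the list.
    $n = @sentences if $n > @sentences;
    return join ' ', @sentences[0 .. $n - 1];
}

print first_n_by_split("foo. bar. foobar. foo bar.", 2), "\n";   # prints "foo. bar."
```

Joining on a single space normalizes whatever whitespace originally separated the sentences.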

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Broken: She said "I believe this will be the winter of my discontent. Then again, I may drink some egg-nog. Who knows?".
      How many sentences are there?
      You could use a character class with the /g regex modifier to get past the 'only periods' assumption....
      sub get_nsentences {
          my $text = shift;
          my $count = shift || 1;
          join '', ($text =~ /.*?[\.!?]/sg)[0..$count-1]
      }
      This keeps the punctuation with the "sentence" so you don't have to keep track of what it was.... i.e. you can join with '', not with '.'

      -Blake
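
A sketch of the /g approach with one extra guard: if the caller asks for more sentences than the text contains, the list slice would otherwise hand undef values to join. The name first_sentences is hypothetical. The pattern is the same as above, so every '.', '!' or '?' still counts as a boundary; on the quoted "She said" example it finds four chunks, the last being just the closing quote and period.

```perl
use strict;
use warnings;

# Hypothetical variant (not from the thread): collect every chunk ending in
# '.', '!' or '?', then trim the list to at most $count entries.
sub first_sentences {
    my ($text, $count) = @_;
    $count ||= 1;
    my @found = $text =~ /.*?[.!?]/sg;           # list context: all matches
    $#found = $count - 1 if @found > $count;     # truncate, never extend
    return join '', @found;
}

print first_sentences("Hi there! How are you? Fine.", 2), "\n";   # prints "Hi there! How are you?"
```

Because each chunk keeps its leading whitespace and its terminator, joining on the empty string reproduces the original spacing.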