chiburashka has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I wanted to ask you: if I have taken the text that's in a file and stored it in an array with @all=<DAT>; and then split all the lines on "." and ";" with @y = split(/"[(.,;)]"/, @all);

Now, my question is how to join the lines together so that every index in the array contains one whole line, one that starts with a capital letter (which has to come after nothing, "." or ";") and ends with "." or ";"?

And is there anything I'm doing wrong?

"You can stop an individual, but you can't stop us all. After all, we are all alike."

Replies are listed 'Best First'.
Re: sorting text into sentences.
by Random_Walk (Prior) on Sep 06, 2004 at 08:15 UTC
    Perhaps you should take a look at Lingua::EN::Sentence. It handles some common sentence-splitting problems such as acronyms and abbreviations. Something like this may do all you need.
    use Lingua::EN::Sentence qw(get_sentences);
    open FH, "SomeFile.txt" or die "can't open: $!\n";
    my $all;             # declared outside the block so it is still visible below
    {
        local $/;        # slurp mode: read the whole file in one go
        $all = <FH>;
    }
    my $sentences = get_sentences($all);    ## Get the sentences (an array reference).

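    Looking at your code, I don't think split will work like this (splitting an entire array in one go). split expects a single string, so @all is evaluated in scalar context and you end up splitting the number of elements rather than the text. You will need to foreach through @all and split each element individually, or possibly play some fun and games with map.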

    my @all = <>;
    print "all is:\n";
    foreach (@all) {
        print "$_";
        foreach (split /([.,;])/, $_) { print "$_\n" }
    }
    With the parentheses around the split pattern, the punctuation that split acted on is stored in your array too.
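    As a rough illustration of how those captured delimiters could be put to use, the sketch below glues each chunk back onto the punctuation that followed it. The sample text and the names @pieces and @sentences are made up for the example, and it splits only on "." and ";".

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the original post.
my $text = "First part. Second part; third part.";

# Capturing parentheses make split return the delimiters as well:
# "First part", ".", " Second part", ";", " third part", "."
my @pieces = split /([.;])/, $text;

# Walk the list two items at a time: a chunk, then its delimiter.
my @sentences;
while (@pieces) {
    my $chunk = shift @pieces;
    my $punct = @pieces ? shift @pieces : '';
    $chunk =~ s/^\s+//;    # drop the space left over after the previous delimiter
    push @sentences, $chunk . $punct if length $chunk;
}

print "$_\n" for @sentences;
```

    Note that this keeps every chunk, whether or not it starts with a capital letter; that check would still have to be added.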

    From your later description I think you may want to look at splitting on something like (\.|;)\s*[A-Z], and as you are slurping in the entire file and not making use of the line breaks, perhaps you want to try

    local $/; my $all=<>;
    then split $all into the array of lines.

    Cheers,
    R.
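    To make the difference between reading line by line and slurping concrete, here is a small sketch. It reads from an in-memory string instead of a real file so that it is self-contained; the sample text and variable names are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical file contents, held in a string so the sketch needs no real file.
my $data = "This sentence starts here\nand ends on the next line. Short one;\nLast.\n";

# List context: one array element per line, so the first sentence is cut in two.
open my $fh, '<', \$data or die "can't open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# Slurp: with $/ undefined, a single read returns the whole file as one string.
open $fh, '<', \$data or die "can't open in-memory file: $!";
my $all;
{
    local $/;
    $all = <$fh>;
}
close $fh;

print scalar(@lines), " lines, ", length($all), " characters in one string\n";
```

    Because local $/ is confined to the bare block, the record separator is back to normal for any later reads.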
Re: sorting text into sentences.
by lidden (Curate) on Sep 06, 2004 at 09:28 UTC
    Using 'A zero-width positive look-ahead assertion' and 'A zero-width positive look-behind assertion.' in the split should do what you want. Read "perldoc perlre".
    my $text;
    {
        local $/ = undef;
        open my $fh, 'file_name' or die "Damn $!";
        $text = <$fh>;
    }
    $text =~ tr/\n/ /;
    my @arr = split /(?<=[.,;])\s*(?=[A-Z])/, $text;
    foreach (@arr) {
        print ':', $_, ":\n";
    }
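    For anyone unsure why the assertions matter, the following sketch compares a plain delimiter pattern with the look-around version on a made-up string. It uses only "." and ";" as terminators, which is an assumption based on the rules stated in the question.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $text = "One thing. Another thing; A third thing.";

# Plain pattern: whatever the pattern matches is thrown away,
# including the punctuation and the capital letter.
my @plain = split /[.;]\s*[A-Z]/, $text;

# Look-behind and look-ahead only assert what surrounds the split point;
# just the whitespace in between is consumed.
my @kept = split /(?<=[.;])\s*(?=[A-Z])/, $text;

print "plain: [$_]\n" for @plain;
print "kept:  [$_]\n" for @kept;
```

    The plain version loses the "A" of "Another", while the look-around version leaves every sentence intact.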
Re: sorting text into sentences.
by davidj (Priest) on Sep 06, 2004 at 07:35 UTC
    I really want to help you, but I do not understand what you are asking. Could you please clarify your question?

    davidj

      Thanks for the will to help. Look, I open a file that contains text (e.g. perlref.txt), and I eventually want to build an array in which every index contains one line, according to the following rules:
      1. Every line should start with a capital letter.
      2. Every line should end with "." or ";".
      3. Before every line's beginning there should be the character "." or ";" (let's even assume that before the first line there's a dot).

      "If you know the right question to ask, you already know the answer."
        This will do it for you.

        Sample text file:
        C:\Temp>type t.txt
        This is a test sentence. And this is another one; Also so is this. And by the way, this is the fourth sentence.
        Now the perl code:
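        #!/usr/bin/perl
        use strict;
        use warnings;

        my $s;
        my @arr;
        open(FILE, "<t.txt") or die "can't open t.txt: $!";
        while (<FILE>) {
            chomp $_;
            $s .= $_;    # build one long string from all the lines
        }
        # each match: a capital letter, then as little as possible up to "." or ";"
        @arr = $s =~ m/[A-Z].+?[.;]/g;
        foreach (@arr) {
            print $_, "\n";
        }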
        Now the output:
        C:\Temp>t.pl
        This is a test sentence.
        And this is another one;
        Also so is this.
        And by the way, this is the fourth sentence.
        As you can see, each array position in @arr contains 1 sentence (as you have defined it).
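        One thing to watch for: if a sentence wraps across two lines of the file, chomping and concatenating runs the last word of one line into the first word of the next. A possible variation, sketched here with made-up input lines held in an array instead of a file, is to append a space after each line:

```perl
use strict;
use warnings;

# Hypothetical input where the first sentence wraps across two lines.
my @lines = ("This sentence wraps\n", "onto a second line. And this one does not;\n");

my $s = '';
foreach my $line (@lines) {
    chomp $line;
    $s .= $line . ' ';    # the space keeps "wraps" and "onto" apart
}

# Same pattern as above: capital letter, minimal run, then "." or ";".
my @arr = $s =~ m/[A-Z].+?[.;]/g;
print "$_\n" for @arr;
```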

        Hope this helps,

        davidj