Re: Perl Split
by Corion (Patriarch) on Mar 13, 2006 at 12:49 UTC
You don't show us your data, but my guess is that your data has a space after every full stop. Of course, when you don't have anything to discard, you could simply avoid the split and just match what you want to keep:
use strict;
use warnings;
while (my $xline = <DATA>) {
    chomp $xline;
    # .+? (not .*?) so that no empty "sentence" is matched at the end of the line
    my @sentences = ($xline =~ /(.+?(?:\.|$))\s*/g);
    print "---\n";
    print "$_\n" for @sentences;
    print "---\n";
}
__DATA__
First line. Second line. Third line.
Fourth line. Fifth line
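A side note on that kind of pattern, as a small sketch: when the part before `(?:\.|$)` is allowed to match zero characters (`.*?`), a list-context `//g` match also finds an empty match at the very end of the string, so the list picks up a trailing empty element. Requiring at least one character (`.+?`) avoids that. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $line = 'Fourth line. Fifth line';

# .*? may match zero characters, so after the last sentence the //g loop
# also succeeds once more with an empty match at the end of the string.
my @lazy_star = ($line =~ /(.*?(?:\.|$))\s*/g);

# .+? needs at least one character, so there is no empty match at the end.
my @lazy_plus = ($line =~ /(.+?(?:\.|$))\s*/g);

print scalar(@lazy_star), "\n";   # 3 (the last element is '')
print scalar(@lazy_plus), "\n";   # 2
```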
Re: Perl Split
by McDarren (Abbot) on Mar 13, 2006 at 13:07 UTC
One of my favourite debugging tools (apart from print) is Data::Dumper::Simple.
If you are new to Perl (or even if you aren't), I'd really recommend getting into the habit of using it. It can very quickly and easily show you what's going on with your data - and it becomes especially useful once you start working with more complex data structures.
With regards to your current problem, as Corion points out, you're probably just forgetting about the whitespace after each period (full stop).
To demonstrate how Data::Dumper::Simple could have helped you to easily see this for yourself, consider the following:
#!/usr/bin/perl -w
use strict;
use Data::Dumper::Simple;
my @lines;
while (<DATA>) {
chomp;
@lines = split (/(?<=\.)/, $_);
}
print Dumper(@lines);
__DATA__
First line. Second Line. Third line. Fourth line.
Which outputs:
@lines = (
'First line.',
' Second Line.',
' Third line.',
' Fourth line.'
);
So straight away it becomes apparent what's going on: the space after each period is kept, at the start of the next element.
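If installing Data::Dumper::Simple is a hurdle, the core Data::Dumper module can show much the same thing. A minimal sketch: setting `$Data::Dumper::Useqq` makes whitespace such as `"\n"` visible, and the name list passed to `Dump` labels the output instead of the default `$VAR1`. The input line deliberately keeps its trailing newline (no chomp), just to show how it turns up.

```perl
use strict;
use warnings;
use Data::Dumper;   # ships with Perl, nothing to install

$Data::Dumper::Useqq  = 1;   # double-quoted output, so "\n" and "\t" show up as escapes
$Data::Dumper::Indent = 1;   # one element per line, fixed indentation

my $text  = "First line. Second Line. Third line.\n";   # note: not chomped
my @lines = split /(?<=\.)/, $text;

# '*lines' asks Dump to label the array as @lines rather than $VAR1
my $dump = Data::Dumper->Dump([\@lines], ['*lines']);
print $dump;
```

The leading spaces show up inside the quotes, and the leftover newline appears as its own `"\n"` element.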
Hope this helps,
Darren :)
Thanks again, all, for the input.
I have never used Data::Dumper before.
Please excuse my ignorance, but once it's unzipped, where do the various components go?
Don't feel you need to excuse ignorance -- we were all there once (even if some of us forget from time to time)!
It sounds like you're asking how to install Data::Dumper::Simple. For that, let me suggest you read A Guide to Installing Modules from the Tutorials section.
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
If you are using ActiveState Perl on Win32, then all you probably need to do is open a command window and type "ppm install Data::Dumper::Simple".
On a *nix system, it would be: perl -MCPAN -e 'install Data::Dumper::Simple' (the quotes matter; without them the shell passes only "install" to -e).
But there is of course a lot more to understand about installing and using modules than that, so yes, as xdg suggests, check out the tutorials :)
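Once a module is installed, one quick way to check that Perl can actually find it is to try loading it. A sketch: `require` is given a file name here, so the module name is converted to its path form first. It only checks that the file loads from `@INC`; it does not import anything.

```perl
use strict;
use warnings;

my %installed;
for my $module (qw(Data::Dumper Data::Dumper::Simple)) {
    # require with a string needs the file form: Data::Dumper -> Data/Dumper.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    # eval traps the error require throws when the file is not found
    $installed{$module} = eval { require $file; 1 } ? 1 : 0;
    print "$module: ", ($installed{$module} ? 'found' : 'not found'), "\n";
}
```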
Re: Perl Split
by Samy_rio (Vicar) on Mar 13, 2006 at 12:52 UTC
Hi, here is one way to do it:
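use strict;
use warnings;
# Slurp all of DATA at once; $/ is undefined only inside the do block
my $xline = do { local $/; <DATA> };
# Split after each period and drop the whitespace (including newlines) that follows it
my @xline = split(/(?<=\.)\s*/, $xline);
foreach (@xline) {
    print "$_\n";
}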
__DATA__
This is my first line. This is my second line. This is my third line.
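One thing to watch with slurp mode: chomp removes a trailing copy of whatever `$/` currently holds, so with `$/` undefined it removes nothing and the final newline stays put. chomp returns the number of characters it removed, which makes this easy to check in a small sketch:

```perl
use strict;
use warnings;

my $text = "One line.\n";

# Normal mode: $/ is "\n", so chomp strips the newline and returns 1
my $removed_normal = do { my $copy = $text; chomp $copy };

# Slurp mode: $/ is undef, so chomp has nothing to strip and returns 0
my $removed_slurp = do { local $/; my $copy = $text; chomp $copy };

print "$removed_normal $removed_slurp\n";   # 1 0
```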
Regards, Velusamy R. eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';
Depending on the properties of the data, one might be better off insisting on whitespace after a period to define a full stop:
$_ = "Next line (2.) contains a url. The url is perlmonks.org. Send them \$20.00.\n";
@sentences = split(/(?<=\.)\s+/);
print "$_\n" for ( @sentences );
Of course, things like "Mr. Smith", etc. will make it... a little more complicated. (Full stops not followed by space "are left as an exercise for the reader.")
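For the "Mr. Smith" case, one rough approach is to refuse to split after a short, hand-picked list of abbreviations. A sketch, with the abbreviation list and the sample sentence made up for illustration: older Perls only allow fixed-width lookbehind, so each abbreviation gets its own `(?<!...)` rather than one alternation. Any abbreviation missing from the list will still cause a wrong split.

```perl
use strict;
use warnings;

my $text = "Mr. Smith met Dr. Jones. They talked. Then they left.";

# Illustrative abbreviation list only (Mr., Mrs., Dr.); real text needs far more.
# Split on whitespace that follows a period, unless that period ends one of them.
my @sentences = split /(?<!\bMr\.)(?<!\bMrs\.)(?<!\bDr\.)(?<=\.)\s+/, $text;

print "$_\n" for @sentences;
```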
Re: Perl Split
by ww (Archbishop) on Mar 13, 2006 at 13:12 UTC
perldoc -f split
It's not possible to be positive without your data, but I suspect it is like this:
This is my first line. This is my second line. This is....
Oh! See the space after each period. Guess what your split does with it. Right!
And, in apology for the flippancy of the above, a confession: I puzzled considerably over the same thing when I first encountered it. FWIW, a good solid understanding of regexen will make using split (and Perl in general) easier, more productive, and a darn sight more fun.
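To make "guess what your split does with it" concrete, here is a sketch comparing three patterns on a sample line. Whatever the pattern matches is the separator, and split throws it away; a lookbehind matches no characters, so on its own it throws nothing away. The original poster's exact split isn't shown in the thread, so these are just representative choices.

```perl
use strict;
use warnings;

my $text = 'This is my first line. This is my second line.';

my @drop_dot  = split /\./,         $text;   # periods removed, spaces kept
my @keep_both = split /(?<=\.)/,    $text;   # periods and spaces kept
my @keep_dot  = split /(?<=\.)\s+/, $text;   # periods kept, spaces removed

for my $result (\@drop_dot, \@keep_both, \@keep_dot) {
    # Brackets make leading and trailing spaces visible
    print join(' ', map { "[$_]" } @$result), "\n";
}
```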
Re: Perl Split
by thundergnat (Deacon) on Mar 13, 2006 at 15:05 UTC
Why don't you just change the input record separator?
This will have trouble with abbreviations, but then again, so will yours...
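use warnings;
use strict;
{
    local $/ = '.';                 # each record now ends at a period
    while (<DATA>) {
        s/\cM\cJ|\cM|\cJ/ /g;       # clean up newlines (any OS)
        s/^\s+//;                   # and leading spaces
        s/\s\s+/ /g;                # and double spaces
        next unless length;         # skip the empty record left after the final period
        print "$_\n";
    }
}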
__DATA__
This is my first line. This is my second line. This is
my third line. This is my fourth line. This is my fifth
line. This is my sixth line. This is my seventh line.
This is my eighth line. This is my ninth line. This is
my tenth line.
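If the text is already in a string rather than in a file, the same record-separator trick still works by opening a filehandle on the string. A sketch; `sentences_from_string` is a made-up name, and like the code above it will also split on abbreviations, since `$/` is a literal string, not a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the $/ = '.' idea to text held in a string
sub sentences_from_string {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @sentences;
    {
        local $/ = '.';               # a record now ends at each literal period
        while (my $record = <$fh>) {
            $record =~ s/\s+/ /g;     # collapse newlines and runs of spaces
            $record =~ s/^ | $//g;    # trim the single leading/trailing space
            push @sentences, $record if length $record;
        }
    }
    close $fh;
    return @sentences;
}

my @sentences = sentences_from_string("This is my first line. This is\nmy second line.\n");
print "$_\n" for @sentences;
```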