Re: Strip user-defined words with regexp
by Limbic~Region (Chancellor) on Mar 04, 2004 at 14:38 UTC
|
Marcello,
You do not say what marks the end of a sentence. In English, there are several ways to do this (period, question mark, exclamation mark, etc.). Also, your regex does not look like it should work according to your specification. Note that the code below will also break on a sentence such as "How are you today Dr. Smith?"
#!/usr/bin/perl
use strict;
use warnings;
my $msg = "One bright day in the middle of the night,\n";
$msg .= "two dead men got up to fight.\n";
$msg .= "Back to Back they faced each other,\n";
$msg .= "drew their swords and shot each other.\n";
$msg .= "A deaf police man heard this noise,\n";
$msg .= "came and killed those two dead boys.\n";
$msg .= "If you don't believe this lie is true,\n";
$msg .= "ask the blind man - he saw it too!\n";
$msg =~ tr/\n//d;
for my $sentence ( split /[.!?]/ , $msg ) {
    if ( $sentence =~ /^\s*([a-zA-Z0-9]+)\s+([a-zA-Z0-9]+)\s+/ ) {
        print "$1 $2\n";
    }
}
__END__
One bright
Back to
A deaf
If you
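The "Dr. Smith" caveat can be seen directly. This is a minimal sketch, assuming the same split pattern as the code above; the variable names `$text` and `@pieces` are made up for the illustration.

```perl
use strict;
use warnings;

# The period after "Dr" is treated as a sentence terminator,
# so one sentence is cut into two pieces.
my $text   = "How are you today Dr. Smith?";
my @pieces = split /[.!?]/, $text;

print scalar(@pieces), " pieces\n";
print "<$_>\n" for @pieces;
```

The second piece is " Smith", which is not a sentence of its own. Handling abbreviations needs more than a character-class split.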
Cheers - L~R
Re: Strip user-defined words with regexp
by rnahi (Curate) on Mar 04, 2004 at 14:37 UTC
|
I would do it this way. I don't know if it deserves a high grade, but it gets the job done :).
my $count = 0;
while ( $message =~ /([A-Za-z0-9]+)/g) {
    last if $count++ > 1;
    print "$1\n";
}
|
rnahi,
"but it gets the job done :)."
Sorry to nitpick, but actually it doesn't. The first two words of each sentence were wanted.
L~R
Update: Edited to clarify; the added text is in italics. I am still wrong, though, because I misinterpreted the OP's requirements.
|
I think my original post was not clear enough.
I am looking for at most the first two words. So it may be zero, one or two.
I've modified his code to:
my $firstWord = undef;
my $secondWord = undef;
my $i = 0;
while ($message =~ /([A-Za-z0-9]+)/g) {
    if ($i == 0) {
        $firstWord = $1;
    }
    elsif ($i == 1) {
        $secondWord = $1;
        last;
    }
    $i++;
}
which does exactly what I was looking for. I had only been trying to do it in a single regexp.
Thanks, Marcel
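For the "one regexp" wish, one possible approach is a global match in list context. This is only a sketch, not something posted in the thread; `first_two_words` is a hypothetical helper name, and the character class is the one used in the loop above.

```perl
use strict;
use warnings;

# first_two_words() is a hypothetical name used for this sketch.
sub first_two_words {
    my ($message) = @_;
    # In list context, //g returns every captured word; the
    # two-element assignment keeps the first two and drops the rest.
    my ( $first, $second ) = $message =~ /([A-Za-z0-9]+)/g;
    return ( $first, $second );
}

my ( $first, $second ) = first_two_words("_\$ TEST..1..");
print defined($first)  ? $first  : '(none)', "\n";
print defined($second) ? $second : '(none)', "\n";
```

A missing word comes back as undef, which covers the zero, one or two cases. The trade-off is that the regex scans the whole message instead of stopping after the second word.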
Re: Strip user-defined words with regexp
by BrowserUk (Patriarch) on Mar 04, 2004 at 16:01 UTC
|
my( $word1, $word2 ) = ( grep $_, split /[^A-Za-z0-9]+/, $message )[0,1];
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
|
Out of curiosity: why not just
my ($word1, $word2) = split(/[^A-Za-z0-9]+/, $message);
?
Marcel
|
$message = "\n +_ ABC1_\n2 3 4";
print join'|', ( grep $_, split /[^A-Za-z0-9]+/ , $message )[0,1];
ABC1|2
Without the grep:
$message = "\n +_ ABC1_\n2 3 4";
print join'|', split /[^A-Za-z0-9]+/ , $message;
|ABC1|2|3|4
You'll notice the null leading element.
The list slice is pretty redundant, but it does make it obvious that you only want the first two.
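One caveat with `grep $_`: it filters on truth, so a word consisting of the single character "0" is dropped along with the empty leading field. A minimal sketch, assuming a made-up input string, that compares it with filtering on length instead:

```perl
use strict;
use warnings;

# Hypothetical input whose first word is the single character "0".
my $message = "0 apples, 7 pears";

# grep $_ keeps only true values, so the string "0" is discarded.
my @by_truth  = ( grep { $_ }     split /[^A-Za-z0-9]+/, $message )[0,1];

# grep { length } keeps every non-empty field, including "0".
my @by_length = ( grep { length } split /[^A-Za-z0-9]+/, $message )[0,1];

print join('|', @by_truth),  "\n";
print join('|', @by_length), "\n";
```

The first line printed is `apples|7` and the second is `0|apples`. Whether that matters depends on whether "0" can appear as a word in the input.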
Re: Strip user-defined words with regexp
by Happy-the-monk (Canon) on Mar 04, 2004 at 14:32 UTC
|
What makes up or defines a sentence? The newline? The dot if it's not followed by a word character?
|
Sorry, the term phrase is probably better. I am looking for the first two words in a phrase, the phrase can end with anything and can contain newlines, etc etc.
Re: Strip user-defined words with regexp
by halley (Prior) on Mar 04, 2004 at 14:32 UTC
|
This kinda sounds like homework. You might try typing the following command at your command prompt.
perldoc perlre
Check out the \w match symbol, and ask yourself why you're using all these * in a regex when the problem as stated says +.
-- [ e d @ h a l l e y . c c ]
|
I knew somebody was going to say this...
It's not. I have an application that has to decide what to do based on the first two words of a phrase. The phrase can be anything; it might even be only one word. Examples:
my $message = "test one";
my $message = "test";
my $message = "_\$ test...";
my $message = "_\$ TEST..1..";
my $message = "_\$\nTEST1.\n.1.2.3";
BTW: \w is not helping me here, since I do not want the underscore character.
|
if ( $message =~ /([a-zA-Z0-9]+)[^a-zA-Z0-9]*([a-zA-Z0-9]*)/ ) {
    print "$1 $2\n";
}
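To check this against the sample messages from the post above, here is a small sketch. `first_two` is a hypothetical wrapper around the same regex, and the `$` in the samples is escaped so that it stays a literal character inside double quotes.

```perl
use strict;
use warnings;

# first_two() is a hypothetical helper wrapping the regex above.
sub first_two {
    my ($message) = @_;
    return $message =~ /([a-zA-Z0-9]+)[^a-zA-Z0-9]*([a-zA-Z0-9]*)/
        ? ( $1, $2 )
        : ();
}

my @samples = (
    "test one",
    "test",
    "_\$ test...",
    "_\$ TEST..1..",
    "_\$\nTEST1.\n.1.2.3",
);

for my $message (@samples) {
    my ( $first, $second ) = first_two($message);
    print "[$first] [$second]\n";
}
```

This prints `[test] [one]`, `[test] []`, `[test] []`, `[TEST] [1]` and `[TEST1] [1]`. When there is no second word, `$2` is an empty string rather than undef, because the second group is allowed to match nothing.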