fadingjava has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using the following code to extract single words (filtering out all punctuation marks) from a text; compound words like the-hell are counted as one word. I have to use this script on letters, e-mails, text documents, etc. The string I used here is just an example. Is there a better way to do it?
@single = qw(I sit here and think- what the-hell am i doing with perl?);
foreach (@single) {
    $_=~ s/([^A-Za-z0-9-])//i;
}
Also, when I extract the text from a database into a variable and do it this way:
$a= "I sit here and think- what the-hell am i doing with perl?";
qw($a);
it only returns the variable name "a". How can I get the value of the variable there? I want it to return the single words in the text contained in $a. Alternatively, if I have to extract single words from a text (excluding punctuation marks), how do I do it?

janitored by ybiC: Retitle from atrocious "HELP WITH REGULAR EXPRESSIONS!!!!!!"

Re: Split a string into individual words
by pzbagel (Chaplain) on Feb 06, 2004 at 01:11 UTC

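    Basically, qw() is the wrong construct in your second bit of code. qw() treats whatever is between the delimiters as literal text and does not interpolate variables, so qw($a) gives you the one-element list ('$a'), and your s/// then strips the '$' and leaves 'a'. What you want is split.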

    # replace
    qw($a);
    # with
    my @words = split(/\s+/, $a);

    Later
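
A minimal sketch of the split approach, assuming the text is already in a scalar (the name $text is hypothetical). It also shows what qw($a) actually produces. The trimming step after the split is an addition of this sketch, not part of the reply above: it removes punctuation only from the ends of each token, so the hyphen inside the-hell is kept.

```perl
use strict;
use warnings;

# $text stands in for a value fetched from the database (hypothetical name).
my $text = "I sit here and think- what the-hell am i doing with perl?";

# qw() does not interpolate: this list holds the literal string '$a'.
my @literal = qw($a);

# split works on the value of the variable.
my @words = split /\s+/, $text;

# Trim non-alphanumerics from both ends of each token, so "think-"
# becomes "think" while the inner hyphen of "the-hell" survives.
s/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$//g for @words;

# Drop tokens that were empty or pure punctuation.
@words = grep { length } @words;

print "$_\n" for @words;
```

Trimming after the split keeps the two concerns separate: whitespace decides where tokens end, and the substitution decides which characters belong to a word.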

Re: Split a string into individual words
by Roger (Parson) on Feb 06, 2004 at 00:21 UTC
    # --- Part one ---
    @single = qw(I sit here and think- what the-hell am i doing with perl?);
    foreach (@single) {
        s/[,.?-]//g;    # Updated: strip punctuation
        print "$_\n";
    }

    # --- Part two ---
    $a= "I sit here and think- what the-hell am i doing with perl?";
    $a=~ s/[,.?-]//g;
    print "$_\n" for $a =~ /(\w+)/g;

    And the output -
    I sit here and think what thehell am i doing with perl

      Roger,

      Your code fails to keep 'the-hell' as one word, which the original poster stated was a requirement. Also, it doesn't answer the original poster's question about why qw($a) only leaves them with 'a' after s///.

        Ah, thanks for pointing out the bit about 'the-hell'.
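
One possible way to keep 'the-hell' as a single word is to match the words directly instead of deleting punctuation first. This is a sketch, not code from the thread, and it assumes a word is a run of ASCII letters or digits, optionally joined to further runs by single hyphens. The variable name $text is hypothetical.

```perl
use strict;
use warnings;

my $text = "I sit here and think- what the-hell am i doing with perl?";

# A word is a run of letters/digits, optionally followed by
# hyphen-joined runs. A trailing hyphen ("think-") is not matched
# because the hyphen must be followed by another letter or digit.
my @words = $text =~ /[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*/g;

print "$_\n" for @words;
```

Under this definition an apostrophe splits a word (don't becomes don and t), so the character class would need adjusting for real letters and e-mails.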

Re: Split a string into individual words
by zentara (Cardinal) on Feb 08, 2004 at 14:18 UTC
    Well, since no one mentioned the word boundary method, I submit the following:
    #!/usr/bin/perl
    $sentence = $a = "I sit here and think- what the-hell am i doing with perl?";
    @segments = split(/\b/, $sentence);          # split at every word boundary
    #@segments = grep { $_ ne '' } @segments;    # remove empty elements
    @segments = grep { $_ ne ' ' } @segments;    # remove space elements
    #print "$#segments\n";
    print "$sentence\n";
    $" = "\n";
    print "@segments\n";
    I don't know the best way of getting rid of the newlines. Output:
    I sit here and think - what the - hell am i doing with perl ?
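
A sketch of one way to clean up the result of the word-boundary split, assuming only segments that contain at least one word character are wanted. The grep filter is an addition here, not part of the post above. Note that this method still splits 'the-hell' into two words, because a hyphen is itself a word boundary.

```perl
use strict;
use warnings;

my $sentence = "I sit here and think- what the-hell am i doing with perl?";

# Split at every word boundary, then keep only the segments that
# contain a word character. Spaces, "- ", "-" and "?" are dropped.
my @segments = grep { /\w/ } split /\b/, $sentence;

print "$_\n" for @segments;
```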