Seeking a better way to do it

by starX (Chaplain)
on Feb 01, 2013 at 04:55 UTC

starX has asked for the wisdom of the Perl Monks concerning the following question:

Most esteemed monks, I seek your wisdom in finding a more elegant solution to a problem I'm working on. Given a string: "Enter Iago, Othello, and others", I want to extract "Iago" and "Othello" to a data structure. My solution is as follows:

    use strict;
    use warnings;

    # I'm actually reading this from another source, but am hand coding
    # the string here for demonstration purposes.
    my $char_list = "Iago, Othello, and others";
    my @words = split /\W/, $char_list;
    foreach my $word (@words) {
        if ($word =~ m/[A-Z]\w+/) {
            my @entering_chars;
            push @entering_chars, $word;
        }
    }

My present solution works, but it seems like I'm taking a lot of unnecessary steps to get there. If anyone would care to explain how to do this with a regex, or some other method less dependent on a loop, I would much appreciate it.

Update: correction. I'm not looking to capture "Enter," but it's also been split off from the string by the time I get here.

Replies are listed 'Best First'.
Re: Seeking a better way to do it
by BrowserUk (Patriarch) on Feb 01, 2013 at 05:39 UTC

    Taking your spec as read:

    @words = 'Enter Iago, Othello, and others' =~ m[(\b[A-Z]\w+\b)]g;;
    print @words;;
    Enter Iago Othello

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Seeking a better way to do it
by AnomalousMonk (Archbishop) on Feb 01, 2013 at 06:25 UTC

    A few approaches, some already covered by others:

    • The split approach produces a lot of 'noise' (even when using a more reasonable /\W+/ split pattern) that must be removed with further processing.
    • A tricky approach is to try to figure out just what a 'player' is and define a regex to extract those substrings.
    • Maybe the easiest and most reliable approach is to go to the dramatis personae list at the beginning of the play, look at all the players found there, and make a regex of that.

    >perl -wMstrict -le
    "my $char_list = 'Exit Cassio; Enter Iago, Othello, and others';
     ;;
     my @words = split /\W+/, $char_list;
     printf qq{'$_' } for @words;
     print '';
     ;;
     ;;
     my $not_player = qr{ (?! Enter | Exit) }xms;
     my $player     = qr{ \b $not_player [[:upper:]] [[:lower:]]+ }xms;
     ;;
     my @players = $char_list =~ m{ $player }xmsg;
     printf qq{'$_' } for @players;
     print '';
     ;;
     ;;
     my @dramatis_personae = qw(Cassio Iago Othello);
     my ($character) =
         map qr{ \b (?: $_) \b }xms,
         join '|',
         @dramatis_personae
         ;
     ;;
     @players = $char_list =~ m{ $character }xmsg;
     printf qq{'$_' } for @players;
    "
    'Exit' 'Cassio' 'Enter' 'Iago' 'Othello' 'and' 'others'
    'Cassio' 'Iago' 'Othello'
    'Cassio' 'Iago' 'Othello'
Re: Seeking a better way to do it
by vinoth.ree (Monsignor) on Feb 01, 2013 at 05:15 UTC

    Use grep!!!

    my $char_list = "Enter Iago, Othello, and others";
    my @Word_List = grep { /[A-Z]\w+/ } split(/\W/, $char_list);
    print "@Word_List\n";

    Update:

    A simple way with a regular expression:

    my $char_list = "Enter Iago, Othello, and others";
    my @Word_List;
    @Word_List = ($char_list =~ /([A-Z]\w+)/g);
    print "@Word_List\n";

      That would also match camelCase and Some_shell; you'd want

      my @wordlist = ($string =~ m{\b ([A-Z][a-z]+) \b}gx);

      as BrowserUK also posted


      Enjoy, Have FUN! H.Merijn
      This works assuming OP also wants to return 'Enter'. This is left out of the requirements, but I'm not sure if it is an oversight or not.
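If 'Enter' is not wanted, one possible variation on the patterns above is to reject known stage directions with a negative lookahead. This is only a sketch: the `$direction` list is a hypothetical set of words to skip, not something given in the thread, and it assumes names are a single capitalized word.

```perl
use strict;
use warnings;

my $char_list = "Enter Iago, Othello, and others";

# Hypothetical list of stage directions to skip; extend as needed.
my $direction = qr{ Enter | Exit | Exeunt }x;

# Capture capitalized words that are not one of the stage directions.
# In list context with /g, the match returns every captured group.
my @entering_chars =
    $char_list =~ m{ \b (?! (?:$direction) \b ) ( [A-Z][a-z]+ ) \b }gx;

print "@entering_chars\n";
```

For the sample string this leaves 'Iago' and 'Othello' in `@entering_chars`.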
Re: Seeking a better way to do it
by 7stud (Deacon) on Feb 01, 2013 at 05:37 UTC
    use strict;
    use warnings;
    use 5.012;  #for say()

    my $text = "Enter Iago, Othello, and others";

    while ($text =~ /
            \s+        #A space one or more times
            (          #Start of $1
                [^,]+  #Not a comma, one or more times
            )          #End of $1
            ,          #A comma
        /gxms          #(g)lobal matching plus standard xms
    ) {
        say $1;
    }

    --output:--
    Iago
    Othello

    Given a string: Enter Iago, Othello, and others I want to extract "Iago" and "Othello" to a data structure.

    No data structure :(. Reputation--.
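To address the missing data structure, the same comma-based pattern could push each capture onto an array instead of printing it. A minimal sketch, assuming the names are always followed by a comma as in the sample string; `@names` is an illustrative variable name.

```perl
use strict;
use warnings;

my $text = "Enter Iago, Othello, and others";

# Collect each capture instead of printing it.
my @names;
while ($text =~ / \s+ ( [^,]+ ) , /gx) {
    push @names, $1;
}

print join('|', @names), "\n";
```

The loop could also be replaced by a list-context match, `my @names = $text =~ / \s+ ( [^,]+ ) , /gx;`, which should collect the same captures.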
Re: Seeking a better way to do it
by frozenwithjoy (Priest) on Feb 01, 2013 at 05:20 UTC
    What is your rule for extracting 'Iago' and 'Othello'? Do you also mean to extract 'Enter' (another word that starts w/ a cap)?

    You said your solution works, but you write plit instead of split and I get other errors. Can you update it with functional code?

      I think the OP meant 'Iago' and 'Othello', so maybe something similar to this:
      #!/usr/bin/perl -l
      use strict;
      use warnings;

      my ($char_list) = "Enter Iago, Othello, and others";
      my (@words) = split( /\W/, $char_list, 0 );
      foreach my $word (@words) {
          if ( $word =~ m/[F-Z]\w+/g ) {
              push my (@entering_chars), $word;
              print "@entering_chars";
          }
      }

        But starX's script also gives the word "Enter" in the output. I guess he/she needs to extract words that contain an uppercase letter.

        Ya, we really need more info. Limiting the first letter to F through Z is not very portable. If you already know the words you want (and choose the letters accordingly), you might as well name the words specifically.
Re: Seeking a better way to do it
by starX (Chaplain) on Feb 01, 2013 at 11:08 UTC
    Thanks, everyone, that was what I was looking for.
      Uh, out of curiosity - which was what you were looking for?



      Time flies like an arrow. Fruit flies like a banana.
