PerlMonks  

Vowel search

by Noob@Perl (Novice)
on Jun 11, 2014 at 20:15 UTC ([id://1089589]=perlquestion)

Noob@Perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, looking for some Perl enlightenment. I'm creating a script that reads the file lines into an array and then searches for and prints the words that have 2 consecutive vowels in them. This is what I have so far:

#!/usr/local/bin/perl
use strict;
use warnings;
use 5.10.0;

my $file = "faketext.txt";
open (FH, "< $file") or die "Can't open $file $!";
my @lines;
while (<FH>) {
    push @lines, $_;
    foreach $file(@lines)
        if(($file =~ /a/) && ($file =~ /e/) && ($file =~ /i/) && ($file =~ /o/) && ($file =~ /u/)) {
            print $file;
        }

However, it is giving me an error and not printing the words with consecutive vowels. Guidance, please?

Update:

#!/usr/local/bin/perl
use strict;
use warnings;
use 5.10.0;

my $file = "faketext.txt";
open (FH, "< $file") or die "Can't open $file $!";
my @lines;
while (<FH>) {
    push @lines, $_;
    foreach $file(@lines);
        if ($file = ~ /[aeiou]{2}/) {
            print $file;
        }
}
syntax error at vowels.pl line 11, near ");"
Execution of vowels.pl aborted due to compilation errors.

Re: Vowel search
by kennethk (Abbot) on Jun 11, 2014 at 20:30 UTC
    Please read How do I post a question effectively?. In particular, the code you've posted does not compile; if the compilation failure is your problem, please make sure to include the error message in your post. The compilation error is caused by several missing curly brackets.

    If I make assumptions about where the curly brackets belong based upon your indentation, you have made an odd choice for reading your lines: you will rescan every earlier line each time a new one is read. If you want to take in all the data at the start (wholly unnecessary here), it could be done much more cleanly as:

    my @lines = <FH>;
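The rescanning is easy to see by counting visits: with the foreach inside the while, line k of n is examined n - k + 1 times, so n lines cost n(n+1)/2 visits instead of n. The following is a minimal sketch, not the OP's code; it reads from an in-memory string instead of faketext.txt, and the sub names and the `%visits` counter are hypothetical, added only for illustration.

```perl
use strict;
use warnings;

# Sample text standing in for faketext.txt (an assumption for the demo).
my $sample = "tree\nbook\ncat\n";

# The loop shape from the original post: the foreach runs inside the while,
# so every line read so far is examined again each time a new line arrives.
sub count_nested_visits {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    my (@lines, %visits);
    while (my $line = <$fh>) {
        push @lines, $line;
        foreach my $seen (@lines) {
            $visits{$seen}++;
        }
    }
    close $fh;
    return \%visits;
}

# Slurping first and looping once examines each line exactly once.
sub count_single_pass_visits {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    my @lines = <$fh>;
    close $fh;
    my %visits;
    $visits{$_}++ for @lines;
    return \%visits;
}

# Note: identical lines would share a hash key; the sample lines are distinct.
my $nested = count_nested_visits($sample);
my $single = count_single_pass_visits($sample);
for my $line (sort keys %$nested) {
    chomp(my $label = $line);
    printf "%-5s nested: %d  single pass: %d\n",
        $label, $nested->{$line}, $single->{$line};
}
```

With three lines, the first line is visited three times in the nested version and once in the single pass.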

    Second, naming the variable that holds your current line $file is odd at best: it reuses the variable holding the file name, which could easily lead to confusion.

    Lastly, your use of regular expressions is unnecessarily computationally intensive, and it doesn't actually require the vowels to be adjacent. A read through perlretut would likely be enlightening. You probably mean something closer to

    if ($file =~ /[aeiou]{2}/i) { print $file; }
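To see the difference between the two tests, here is a small sketch comparing the original five-pattern condition with the adjacent-vowel pattern on a few made-up words. The sub names are hypothetical.

```perl
use strict;
use warnings;

# The original condition: every vowel must appear somewhere,
# in any order and at any distance from the others.
sub has_all_vowels {
    my ($s) = @_;
    return ($s =~ /a/ && $s =~ /e/ && $s =~ /i/ && $s =~ /o/ && $s =~ /u/) ? 1 : 0;
}

# The suggested condition: two vowels side by side, ignoring case.
sub has_adjacent_vowels {
    my ($s) = @_;
    return $s =~ /[aeiou]{2}/i ? 1 : 0;
}

for my $word (qw(uncomplimentary book Ouch cat)) {
    printf "%-16s all five: %d  adjacent pair: %d\n",
        $word, has_all_vowels($word), has_adjacent_vowels($word);
}
```

"uncomplimentary" passes the original test without having a single adjacent pair, while "book" and "Ouch" fail it despite having one.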
    Of course, this doesn't handle the conditional nature of y as a vowel. I assume you have plans to write a machine learning script to train against a dictionary so it can develop heuristics for resolving the ambiguity. Vowel should provide a sufficiently thorough background.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      if ($file =~ /[aeiou]{2}/i) { ... }

      Another small point: case-insensitive matching (enabled by the /i regex modifier, which I see you snuck in there) imposes a run-time penalty that will become noticeable, at a wild guess, for files of more than several thousand lines. Maybe use
          $file =~ m{ [AaEeIiOoUu]{2} }xms
      to avoid this overhead. Of course, another approach would be to normalize the case of all lines (e.g., with lc) before matching...
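Whatever their relative speed, the three approaches mentioned (the /i modifier, an explicit mixed-case class, and lowercasing first) should agree on which lines match. A minimal sketch that checks this on made-up sample lines; the sub and hash names are hypothetical, and the performance claim itself is not measured here.

```perl
use strict;
use warnings;

# Three ways to find two adjacent vowels regardless of case.
my %matchers = (
    'i modifier'     => sub { $_[0] =~ /[aeiou]{2}/i },
    'explicit class' => sub { $_[0] =~ m{ [AaEeIiOoUu]{2} }xms },
    'lc first'       => sub { lc($_[0]) =~ /[aeiou]{2}/ },
);

# Return the lines for which the given matcher succeeds.
sub matching_lines {
    my ($matcher, @lines) = @_;
    return grep { $matcher->($_) } @lines;
}

my @sample = ("The QUEEN arrived", "A cat sat", "Out of time", "rhythm");

for my $name (sort keys %matchers) {
    my @hits = matching_lines($matchers{$name}, @sample);
    print "$name: @hits\n";
}
```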

        Well, if we're micro-optimizing, then either we want to do this…

        [AaEeIiOoUu][AaEeIiOoUu]

        …instead of this…

        [AaEeIiOoUu]{2}

        …or we want to rewrite the program in C.
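If the speed difference really matters, measuring beats guessing. Below is a sketch using the core Benchmark module's `cmpthese`; the word list and candidate labels are made up, and no timings are shown because they depend on the perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up word list, repeated so each run does a measurable amount of work.
my @words = (qw(queue rhythm Aorta strength BOOK cat idea)) x 1000;

# Each candidate counts the words that contain two adjacent vowels.
my %candidates = (
    quantifier => sub { scalar grep { /[AaEeIiOoUu]{2}/ } @words },
    doubled    => sub { scalar grep { /[AaEeIiOoUu][AaEeIiOoUu]/ } @words },
    i_modifier => sub { scalar grep { /[aeiou]{2}/i } @words },
);

# Sanity check: the candidates must agree before their speed means anything.
my %counts   = map { $_ => $candidates{$_}->() } keys %candidates;
my %distinct = map { $_ => 1 } values %counts;
die "Candidates disagree\n" unless scalar(keys %distinct) == 1;

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```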

        I did add that in an edit, as it occurred to me that it was an oversight the OP would likely make if a breadcrumb were not left. Of course, the updated code does not include it, so unfortunately I was not obvious enough. If you're concerned with match performance, it's probably more reasonable to use
        $file =~ m{ [AaEeIiOoUu][aeiou] }x
        since the simplified English the OP is likely attacking doesn't support two leading capital characters.


      Of course, this doesn't handle the conditional nature of y as a vowel.

      In how many ordinary words is y one of a pair of two consecutive vowels?

      I assume you have plans to write a machine learning script to train against a dictionary so it can develop heuristics for resolving the ambiguity.

      I assume the novice Perl programmer with the PerlMonks username Noob@Perl (noob == newbie == neophyte) has no such plans.

        Jim:

        Hey, guy, today a bit of playing with my grey matter suggests they may be fairly common.... ;^D

        roboticus@sparky:~$ grep -i -E 'y[aeiou]|[aeiou]y' /usr/share/dict/american-english | wc -l
        3244
        roboticus@sparky:~$ wc -l /usr/share/dict/american-english
        99171 /usr/share/dict/american-english
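The same count can be taken in Perl. A minimal sketch: it reads whatever files are named on the command line (for example /usr/share/dict/american-english, if your system has it), applies the same case-insensitive pattern as the grep command above, and reports how many lines match. The sub name is hypothetical.

```perl
use strict;
use warnings;

# True (1) when 'y' sits next to one of a, e, i, o, u, ignoring case,
# mirroring: grep -i -E 'y[aeiou]|[aeiou]y'
sub has_y_vowel_pair {
    my ($word) = @_;
    return $word =~ /y[aeiou]|[aeiou]y/i ? 1 : 0;
}

# Only read input when files are named, so nothing blocks waiting on STDIN.
if (@ARGV) {
    my ($total, $hits) = (0, 0);
    while (my $line = <>) {
        $total++;
        $hits += has_y_vowel_pair($line);
    }
    print "$hits of $total lines match\n";
}
```

Usage: `perl yvowels.pl /usr/share/dict/american-english` (the script name is hypothetical).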

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        How many words in English are ordinary? And how does one define a vowel? In the word thigh, the vowels are i, g, and h. That's certainly more than two consecutive vowel characters, though it's only one vowel sound.

        You'll have to excuse my poor attempt at humor: I was trying to illustrate how poorly constrained the spec actually is, and to highlight a distinct lack of effort on what strongly resembles a homework assignment.



      Thanks for the references; I'll start reading these too, along with what I already have. :)

Re: Vowel search
by Jim (Curate) on Jun 11, 2014 at 21:23 UTC

    Consider the implicit suggestions in this refactoring of your Perl script.

    #!/usr/local/bin/perl

    use strict;
    use warnings;

    use autodie qw( open close );

    @ARGV == 1 or die "Usage: perl $0 file\n";

    my $file = shift;

    open my $fh, '<:encoding(UTF-8)', $file;
    my @lines = <$fh>;
    close $fh;

    for my $line (@lines) {
        print $line if $line =~ m/[aeiou][aeiou]/i;
    }

    exit 0;

    Since your script is using regular expression pattern matching, it's important that it knows the correct character encoding of the text in the input text file. I've assumed it's in the UTF-8 character encoding form of the Unicode coded character set. If it's in some other character encoding (e.g., Windows-1252), then you need to modify the second argument of open.
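As an illustration of changing that mode string, here is a sketch that writes a few Windows-1252 bytes to a temporary file and reads them back through the `:encoding(cp1252)` layer (cp1252 is the name Perl's Encode module uses for Windows-1252). The use of File::Temp, the sample text, and the `read_lines` sub are assumptions for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open a file, decode it with the given encoding layer, and
# return its lines without trailing newlines.
sub read_lines {
    my ($file, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $file
        or die "Can't open $file: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Demo data (an assumption): "café idea" in Windows-1252, where é is the single byte 0xE9.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "caf\xE9 idea\n";
close $out or die "Can't close $path: $!";

# Only the mode string passed to open changes for a Windows-1252 file.
my ($line) = read_lines($path, 'cp1252');
printf "%d characters, adjacent vowels: %s\n",
    length $line, ($line =~ /[aeiou]{2}/i ? 'yes' : 'no');
```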

    I'm creating a script that reads the file lines into an array and then searches for and prints the words that have 2 consecutive vowels in them

    Neither your script nor my refactoring of it is doing exactly this. Both print whole lines on which there are two consecutive vowels anywhere on the line. The following script parses each line into words (where "words" are contiguous strings of non-whitespace characters) and then prints each word that has two consecutive vowels in it (where "vowels" are the Latin letters A/a, E/e, I/i, O/o and U/u).

    #!/usr/local/bin/perl

    use strict;
    use warnings;

    use open qw( :encoding(UTF-8) :std );

    @ARGV or die "Usage: perl $0 file ...\n";

    while (my $line = <ARGV>) {
        chomp $line;

        my @words = split ' ', $line;

        for my $word (@words) {
            print "$word\n" if $word =~ m/[aeiou][aeiou]/i;
        }
    }

    exit 0;
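One detail of that script worth studying: `split ' '` (a single-space string) is a special case that splits on runs of whitespace and skips leading whitespace, whereas `split /\s+/` yields an empty first field when the line starts with whitespace. A small sketch on a made-up line:

```perl
use strict;
use warnings;

my $line = "   too  good   to be true";

# split ' ' behaves like awk: leading whitespace is skipped.
my @awk_style = split ' ', $line;

# split /\s+/ keeps an empty leading field when the line starts with whitespace.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style), " fields vs ", scalar(@regex_style), " fields\n";

# Keep only the words containing two adjacent vowels, as the script does.
my @hits = grep { /[aeiou][aeiou]/i } @awk_style;
print "@hits\n";
```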
      This hit the nail on the head. I just tweaked it a little and managed to print the words from the file. I really appreciate your help, Jim. I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!
        I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!

        It sounds like you're doing wrong what I do wrong: reading too much and coding too little. It's my worst bad habit. This is precisely why I tried to help you a bit by refactoring your code. Study each of the changes I made very carefully, asking yourself "Why did he do that?" for each one. I promise you'll learn several helpful lessons.

        Are you using a good programmer's text editor with Perl syntax highlighting? If not, do. One of the obvious problems you're having is with trivial syntax errors (i.e., typos). A good Perl text editor will help you avoid these.

Re: Vowel search
by GotToBTru (Prior) on Jun 11, 2014 at 20:19 UTC

    Your code is asking for lines that have an a AND an e AND an i AND an o AND a u, anywhere and in any order. What you probably want is

    if ($file =~ /[aeiou]{2}/) { print $file; }

    [aeiou] matches any one of those letters and {2} specifies that it should match two in a row.
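Note that the quantifier repeats the whole character class, so `[aeiou]{2}` matches any vowel followed by any vowel (it is equivalent to `[aeiou][aeiou]`), not only a doubled letter. Requiring the same vowel twice would take a capture and a backreference. A quick sketch on made-up words; the sub names are hypothetical.

```perl
use strict;
use warnings;

# {2} repeats the whole class: any vowel followed by any vowel.
sub any_two_vowels   { return $_[0] =~ /[aeiou]{2}/  ? 1 : 0 }

# A capture plus the backreference \1 requires the same vowel twice.
sub same_vowel_twice { return $_[0] =~ /([aeiou])\1/ ? 1 : 0 }

for my $word (qw(read book idea cat)) {
    printf "%-5s any two: %d  same vowel twice: %d\n",
        $word, any_two_vowels($word), same_vowel_twice($word);
}
```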

    1 Peter 4:10
      I switched to what you suggested, though I'm still getting an error; I've added the error message to the parent post.

        You're missing the closing brace } of the if block, that's all.

Re: Vowel search
by Anonymous Monk on Jun 11, 2014 at 20:29 UTC

    What error are you getting?

    The code sample you provided does not compile. Although we can try to guess what you meant, it'd be much better if you provided working code we can use to reproduce your problem exactly.

    GotToBTru has already given you guidance on the regex issue.

      Sorry, I've updated the parent post with the error.

        You're missing things in a few places (note I'm making a few assumptions about your code, but this seems to make the most sense to me):

        1. The closing curly brace of the while, which should come right after the push statement.
        2. The opening curly brace of the foreach, which should be on that line instead of the semicolon.
        3. The closing curly brace of the foreach (or of the if, depending on how you look at it) ...and you just edited the node to take care of that one.

        As described in How do I change/delete my post?, please mark the changes you have made to your node, because currently GotToBTru's and kennethk's responses don't make sense any more.

Re: Vowel search
by andal (Hermit) on Jun 12, 2014 at 11:37 UTC

    In the "Update" part, look at line 11 of your code. It reads "foreach $file(@lines);". The ';' at the end is not acceptable there, so perl gives you an error. You need { instead.

    It looks like you're also missing the closing } for your while loop. It should come before the foreach.
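Putting those fixes together (closing the while right after the push, and replacing the semicolon after the foreach with an opening brace), the updated program would look roughly like this. It is a sketch, not the OP's final code: it also uses `my $line` as the loop variable instead of reusing `$file`, writes `=~` without the stray space, wraps the logic in a hypothetical `lines_with_vowel_pairs` sub, and only opens faketext.txt if it exists.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return the lines read from $fh that contain two adjacent lowercase vowels.
sub lines_with_vowel_pairs {
    my ($fh) = @_;
    my @lines;
    while (<$fh>) {
        push @lines, $_;
    }                               # the while loop closes right after the push
    my @matches;
    foreach my $line (@lines) {     # an opening brace here, not a semicolon
        if ($line =~ /[aeiou]{2}/) {
            push @matches, $line;
        }                           # closes the if
    }                               # closes the foreach
    return @matches;
}

my $file = "faketext.txt";
if (-e $file) {
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    my @found = lines_with_vowel_pairs($fh);
    print @found;
    close $fh;
}
```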

Node Type: perlquestion [id://1089589]
Approved by toolic
Front-paged by Jim