Vowel search

Noob@Perl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Vowel search by kennethk (Abbot) on Jun 11, 2014 at 20:30 UTC
Please read "How do I post a question effectively?". In particular, the code you've posted does not compile; if the compilation is your error, please make sure to include the error message in your post. The compilation error is related to a number of omitted curly brackets. If I make assumptions about where to stick the curly brackets based upon your indentation, you have made an odd choice for reading your lines: you will actually scan lines multiple times. If you want to take in all the data at the start (wholly unnecessary here), it could be done much more cleanly as: `my @lines = <FH>;` Second, your choice to name the variable containing your current line `$file` is odd at best, and conflicts with the file name variable, which could easily lead to confusion. Lastly, your use of regular expressions is needlessly computationally intensive, and doesn't actually require that the vowels be adjacent. A read through of perlretut would likely be enlightening. You probably mean something closer to `if ($file =~ /[aeiou]{2}/i) { print $file; }` Of course, this doesn't handle the conditional nature of `y` as a vowel. I assume you have plans to write a machine learning script to train against a dictionary so it can develop heuristics for resolving the ambiguity. "Vowel" should provide a sufficiently thorough background. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re^2: Vowel search by AnomalousMonk (Archbishop) on Jun 11, 2014 at 21:52 UTC
`if ($file =~ /[aeiou]{2}/i) { ... }` Another small point: case-insensitive matching (enabled by the `/i` regex modifier, which I see you snuck in there) imposes a run-time penalty which will become noticeable, at a wild guess, for files of more than several thousand lines. Maybe use `$file =~ m{ [AaEeIiOoUu]{2} }xms` to avoid this overhead. Of course, another approach would be to common-case all lines before matching...
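Rather than relying on a wild guess, the two patterns can be timed against each other. The following is a minimal sketch using the core `Benchmark` module; the sample lines and the helper name `count_matches` are made up for illustration, and the outcome will depend on the Perl version and the data, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical sample data: 5000 short lines, half of which contain a vowel pair.
my @lines = map { $_ % 2 ? "The QUICK fox reads a book\n" : "dry myth by my crypt\n" } 1 .. 5000;

# Count the lines that match a precompiled pattern.
sub count_matches {
    my ($re, $lines) = @_;
    my $count = 0;
    for my $line (@$lines) {
        $count++ if $line =~ $re;
    }
    return $count;
}

my $re_i        = qr/[aeiou]{2}/i;        # case-insensitive modifier
my $re_explicit = qr/[AaEeIiOoUu]{2}/;    # both cases spelled out in the class

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    case_insensitive => sub { count_matches( $re_i,        \@lines ) },
    explicit_class   => sub { count_matches( $re_explicit, \@lines ) },
} );
```

Both patterns should select the same lines; only the timing columns printed by `cmpthese` are of interest.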
Re^3: Vowel search by Jim (Curate) on Jun 11, 2014 at 22:04 UTC
Well, if we're micro-optimizing, then either we want to do this… `[AaEeIiOoUu][AaEeIiOoUu]` …instead of this… `[AaEeIiOoUu]{2}` …or we want to rewrite the program in C.
Re^4: Vowel search by AnomalousMonk (Archbishop) on Jun 11, 2014 at 22:47 UTC
Re^5: Vowel search (benchmarks--) by tye (Sage) on Jun 12, 2014 at 01:43 UTC
Re^5: Vowel search by Jim (Curate) on Jun 11, 2014 at 23:13 UTC
Re^3: Vowel search by kennethk (Abbot) on Jun 12, 2014 at 15:24 UTC
I did add that in post, as it occurred to me that was an oversight the OP would likely make if a bread crumb were not left. Of course, the updated code does not include it, so I was unfortunately not obvious enough. If you're concerned with the match performance, it's probably more reasonable to use `$file =~ m{ [AaEeIiOoUu][aeiou] }x` since the simplified English the OP is likely attacking doesn't support two leading capital characters. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re^4: Vowel search by AnomalousMonk (Archbishop) on Jun 12, 2014 at 23:25 UTC
Re^2: Vowel search by Jim (Curate) on Jun 11, 2014 at 21:47 UTC
"Of course, this doesn't handle the conditional nature of y as a vowel." In how many ordinary words is y one of a pair of two consecutive vowels? "I assume you have plans to write a machine learning script to train against a dictionary so it can develop heuristics for resolving the ambiguity." I assume the novice Perl programmer with the PerlMonks username Noob@Perl (noob == newbie == neophyte) has no such plans.
Re^3: Vowel search by roboticus (Chancellor) on Jun 12, 2014 at 15:27 UTC
Jim: Hey, guy, today a bit of playing with my grey matter suggests they may be fairly common.... ;^D

    roboticus@sparky:~$ grep -i -E 'y[aeiou]|[aeiou]y' /usr/share/dict/american-english | wc -l
    3244
    roboticus@sparky:~$ wc -l /usr/share/dict/american-english
    99171 /usr/share/dict/american-english

...roboticus When your only tool is a hammer, all problems look like your thumb.
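The same count can be sketched in Perl, in keeping with the rest of the thread. This is only an illustration: `count_y_pairs` and the sample word list are hypothetical, and a real run would read a word list from a file. Note that the pattern only finds `y` next to one of a, e, i, o, u; it does not decide whether that `y` is acting as a vowel.

```perl
use strict;
use warnings;

# Count the words in which "y" sits directly next to one of a, e, i, o, u.
sub count_y_pairs {
    my @words = @_;
    return scalar grep { /y[aeiou]|[aeiou]y/i } @words;
}

# Hypothetical in-memory sample; a real run would read a dictionary file instead.
my @sample = qw( grey playing today rhythm sky yellow cat );
printf "%d of %d sample words pair y with a vowel\n",
    count_y_pairs(@sample), scalar @sample;
```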
Re^4: Vowel search by Jim (Curate) on Jun 12, 2014 at 18:00 UTC
Re^5: Vowel search by roboticus (Chancellor) on Jun 12, 2014 at 21:26 UTC
Re^3: Vowel search by kennethk (Abbot) on Jun 12, 2014 at 15:19 UTC
How many words in English are ordinary? And how does one define a vowel? In the word `thigh`, the vowels are `i`, `g`, and `h`. That's certainly more than two consecutive vowel characters, though it's only one vowel sound. You'll have to excuse a poor attempt at humor, attempting to illustrate how poorly constrained the spec is in actuality, and trying to highlight a distinct lack of effort on what strongly resembles a homework assignment. #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re^4: Vowel search by Jim (Curate) on Jun 12, 2014 at 19:19 UTC
Re^2: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 22:52 UTC
Thanks for the references. I shall begin reading these too, along with what I already have. :)
Re: Vowel search by Jim (Curate) on Jun 11, 2014 at 21:23 UTC
Consider the implicit suggestions in this refactoring of your Perl script.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use autodie qw( open close );

    @ARGV == 1 or die "Usage: perl $0 file\n";

    my $file = shift;

    open my $fh, '<:encoding(UTF-8)', $file;
    my @lines = <$fh>;
    close $fh;

    for my $line (@lines) {
        print $line if $line =~ m/[aeiou][aeiou]/i;
    }

    exit 0;

Since your script is using regular expression pattern matching, it's important that it knows the correct character encoding of the text in the input text file. I've assumed it's in the UTF-8 character encoding form of the Unicode coded character set. If it's in some other character encoding (e.g., Windows-1252), then you need to modify the second argument of `open`.

"I'm creating a script that reads the file lines into an array and then searches and print the words that have 2 consecutive vowels in them"

Neither your script nor my refactoring of it is doing exactly this. They're both printing whole lines on which there are two consecutive vowels anywhere on the line. The following script parses each line into words (where "words" are contiguous strings of non-whitespace characters) and then prints each word that has two consecutive vowels in it (where "vowels" are the Latin letters A/a, E/e, I/i, O/o and U/u).

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use open qw( :encoding(UTF-8) :std );

    @ARGV or die "Usage: perl $0 file ...\n";

    while (my $line = <ARGV>) {
        chomp $line;
        my @words = split ' ', $line;
        for my $word (@words) {
            print "$word\n" if $word =~ m/[aeiou][aeiou]/i;
        }
    }

    exit 0;
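The word-level matching described above can also be pulled into a small function and tried on a single line without an input file. This is a minimal sketch: the function name `vowel_pair_words` and the sample line are hypothetical and not part of Jim's scripts.

```perl
use strict;
use warnings;

# Return the whitespace-separated words of $line that contain two adjacent
# vowels (a, e, i, o, u in either case).
sub vowel_pair_words {
    my ($line) = @_;
    return grep { /[aeiou][aeiou]/i } split ' ', $line;
}

# Hypothetical sample line, not taken from the original poster's file.
my $line = "The quick brown fox read a good book\n";
print "$_\n" for vowel_pair_words($line);
```

Splitting on `' '` discards leading and trailing whitespace, so the trailing newline needs no separate `chomp` in this sketch.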
Re^2: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 22:20 UTC
This hit the nail on the head. Just tweaked it a little and managed to `print` the words from the file. I really appreciate your help, Jim. I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!
Re^3: Vowel search by Jim (Curate) on Jun 11, 2014 at 22:35 UTC
"I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!" It sounds like you're doing wrong what I do wrong: Read too much and code too little. It's my worst bad habit. This is precisely why I tried to help you a bit by refactoring your code. Study each of the changes I made very carefully, asking yourself "Why did he do that?" for each one. I promise you you'll learn several helpful lessons. Are you using a good programmer's text editor with Perl syntax highlighting? If not, do. One of the obvious problems you're having is with trivial syntax errors (i.e., typos). A good Perl text editor will help you avoid these.
Re^4: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 22:41 UTC
Re^5: Vowel search by Jim (Curate) on Jun 11, 2014 at 22:48 UTC
Re: Vowel search by GotToBTru (Prior) on Jun 11, 2014 at 20:19 UTC
Your code is asking for files that have an a AND an e AND an i AND an o AND a u. What you probably want is `if ($file =~ /[aeiou]{2}/) { print $file; }` `[aeiou]` matches any one of those letters and `{2}` specifies that it should match two in a row. 1 Peter 4:10
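To see what the character class and the `{2}` quantifier do together, the pattern can be tried on a handful of words. This is a minimal sketch: `has_vowel_pair` and the sample words are hypothetical, and the pattern is the lowercase-only one suggested above, so uppercase input does not match without `/i`.

```perl
use strict;
use warnings;

# Return 1 when $text contains two lowercase vowels side by side, else 0.
sub has_vowel_pair {
    my ($text) = @_;
    return $text =~ /[aeiou]{2}/ ? 1 : 0;
}

# Hypothetical sample words: some with adjacent vowels, some with vowels
# that are present but separated, and one in uppercase.
for my $word (qw( book banana queue sky AUDIO )) {
    printf "%-6s %s\n", $word, has_vowel_pair($word) ? 'match' : 'no match';
}
```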
Re^2: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 20:51 UTC
I switched to what you suggested, though I'm still getting an error; the error is stated in the edited parent post.
Re^3: Vowel search by Jim (Curate) on Jun 11, 2014 at 20:56 UTC
You're missing the closing brace `}` of the `if` block, that's all.
Re^4: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 21:00 UTC
Re^5: Vowel search by kennethk (Abbot) on Jun 11, 2014 at 21:04 UTC
Re: Vowel search by Anonymous Monk on Jun 11, 2014 at 20:29 UTC
What error are you getting? The code sample you provided does not compile. Although we can try to guess what you meant, it'd be much better if you provided working code we can use to reproduce your problem exactly. GotToBTru has already given you guidance on the regex issue.
Re^2: Vowel search by Noob@Perl (Novice) on Jun 11, 2014 at 20:52 UTC
Sorry, I updated the parent post with the error.
Re^3: Vowel search by Anonymous Monk on Jun 11, 2014 at 21:06 UTC
You're missing things in a few places (note I'm making a few assumptions on your code but this seems to make the most sense to me):
1. The closing curly brace of the `while`, which should be after the `push` statement.
2. The opening curly of the `foreach`, which should be on that line instead of the semicolon.
3. The closing curly of the `foreach` (or `if`, depending how you look at it) ...and you just edited the node to take care of that one.
As described in "How do I change/delete my post?", please mark the changes you have made to your node, because currently GotToBTru's and kennethk's responses don't make sense any more.
Re: Vowel search by andal (Hermit) on Jun 12, 2014 at 11:37 UTC
In the "Update" part, look at line 11 of your code. It reads "foreach $file(@lines);". This ';' at the end is not acceptable, so perl gives you an error. You need { instead. It looks like you're also missing the closing } for your while. It should be before the foreach.
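The brace placement described in the replies can be sketched as follows. This is a hypothetical reconstruction: the original poster's code is not shown in this thread, so the function name, the variable names, and the layout are guesses based only on what the replies mention (a `while` with a `push`, a `foreach` over `@lines`, and an `if` with the regex). It reads from an in-memory string so that it runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical reconstruction; names and layout are guesses, not the
# original poster's code.
sub lines_with_vowel_pairs {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        push @lines, $line;
    }    # closing brace of the while, right after the push

    my @hits;
    foreach my $line (@lines) {    # an opening brace here, not a semicolon
        if ( $line =~ /[aeiou]{2}/i ) {
            push @hits, $line;
        }    # closing brace of the if
    }    # closing brace of the foreach
    return @hits;
}

# Read from an in-memory string instead of a real file.
my $text = "a good book\ndry myth\nsee the sea\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print lines_with_vowel_pairs($fh);
close $fh;
```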


	PerlMonks