Re: Vowel search
by kennethk (Abbot) on Jun 11, 2014 at 20:30 UTC
Please read "How do I post a question effectively?". In particular, the code you've posted does not compile; if the compilation failure is the error you're asking about, please make sure to include the error message in your post. The compilation error is caused by a number of omitted curly brackets.
If I make assumptions about where to stick the curly brackets based upon your indentation, you have made an odd choice for reading your lines: you will actually scan lines multiple times. If you want to take in all the data at the start (wholly unnecessary here), it could be done much more cleanly as:
my @lines = <FH>;
Second, your choice to name your variable containing your current line $file is odd at best, and conflicts with the file name variable, which could easily lead to confusion.
Lastly, your use of regular expressions is unnecessarily computationally intensive, and doesn't actually require that the vowels be adjacent. A read through perlretut would likely be enlightening. You probably mean something closer to
if ($file =~ /[aeiou]{2}/i) {
    print $file;
}
Of course, this doesn't handle the conditional nature of y as a vowel. I assume you have plans to write a machine learning script to train against a dictionary so it can develop heuristics for resolving the ambiguity. Vowel should provide a sufficiently thorough background.
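The points above (read each line once, give the line variable a descriptive name, test for adjacent vowels with a single character class) can be put together in a minimal sketch. The sub name lines_with_adjacent_vowels and the in-memory sample text are assumptions made for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Return the lines from an open filehandle that contain two adjacent vowels.
# Each line is read and tested exactly once; nothing is scanned twice.
# (Hypothetical helper name, used only for this sketch.)
sub lines_with_adjacent_vowels {
    my ($fh) = @_;
    my @matches;
    while (my $line = <$fh>) {
        push @matches, $line if $line =~ /[aeiou]{2}/i;
    }
    return @matches;
}

# Stand-in for the real input file: an in-memory filehandle.
my $sample = "The quick brown fox\njumps over\nA lazy dog\nEach day\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print lines_with_adjacent_vowels($fh);
close $fh;
```

With a real file, only the open call would change; the loop stays the same.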
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
if ($file =~ /[aeiou]{2}/i) { ... }
Another small point: case-insensitive matching (enabled by the /i regex modifier, which I see you snuck in there) imposes a run-time penalty which will become noticeable, at a wild guess, for files of more than several thousand lines. Maybe use
$file =~ m{ [AaEeIiOoUu]{2} }xms
to avoid this overhead. Of course, another approach would be to common-case all lines before matching...
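For comparison, here is a small sketch of the three approaches mentioned: the /i modifier, the explicit mixed-case class, and common-casing with lc before matching. The sub names are made up for this example, and the sketch only checks that the three agree on a few ASCII samples; it makes no claim about which is faster.

```perl
use strict;
use warnings;

# Three ways to ask "does this text contain two adjacent vowels,
# ignoring case?" Each returns 1 or 0. (Sub names are hypothetical.)
sub match_with_flag  { my ($text) = @_; return $text =~ /[aeiou]{2}/i ? 1 : 0 }
sub match_with_class { my ($text) = @_; return $text =~ m{ [AaEeIiOoUu]{2} }xms ? 1 : 0 }
sub match_lowercased { my ($text) = @_; return lc($text) =~ /[aeiou]{2}/ ? 1 : 0 }

for my $text ('book', 'BOOK', 'Aardvark', 'sky', 'banana') {
    printf "%-8s flag=%d class=%d lowercased=%d\n",
        $text, match_with_flag($text), match_with_class($text), match_lowercased($text);
}
```

To find out whether the difference matters for a given input, the core Benchmark module (for example its cmpthese function) can time the three subs against real data.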
Well, if we're micro-optimizing, then either we want to do this…
[AaEeIiOoUu][AaEeIiOoUu]
…instead of this…
[AaEeIiOoUu]{2}
…or we want to rewrite the program in C.
I did add that in post, as it occurred to me that was an oversight the OP would likely make if a bread crumb were not left. Of course, the updated code does not include it, so I was unfortunately not obvious enough. If you're concerned with the match performance, it's probably more reasonable to use
$file =~ m{ [AaEeIiOoUu][aeiou] }x
since the simplified English the OP is likely attacking doesn't support two leading capital characters.
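A quick sketch of what that narrower pattern accepts and rejects, compared with the fully case-insensitive form. The sub names and sample words are assumptions for illustration only.

```perl
use strict;
use warnings;

# Narrower pattern: the second vowel must be lowercase.
sub matches_narrow { my ($word) = @_; return $word =~ m{ [AaEeIiOoUu][aeiou] }x ? 1 : 0 }

# Fully case-insensitive pattern, for comparison.
sub matches_full   { my ($word) = @_; return $word =~ m{ [aeiou]{2} }xi ? 1 : 0 }

for my $word (qw(Aardvark book BOOK sky)) {
    printf "%-8s narrow=%d full=%d\n", $word, matches_narrow($word), matches_full($word);
}
```

The narrow form skips all-caps text such as BOOK, which is fine only if the input really never contains it.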
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
roboticus@sparky:~$ grep -i -E 'y[aeiou]|[aeiou]y' /usr/share/dict/american-english | wc -l
3244
roboticus@sparky:~$ wc -l /usr/share/dict/american-english
99171 /usr/share/dict/american-english
...roboticus
When your only tool is a hammer, all problems look like your thumb.
How many words in English are ordinary? And how does one define a vowel? In the word thigh, the vowels are i, g, and h. That's certainly more than two consecutive vowel characters, though it's only one vowel sound.
You'll have to excuse a poor attempt at humor, attempting to illustrate how poorly constrained the spec is in actuality, and trying to highlight a distinct lack of effort on what strongly resembles a homework assignment.
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re: Vowel search
by Jim (Curate) on Jun 11, 2014 at 21:23 UTC
#!/usr/local/bin/perl
use strict;
use warnings;
use autodie qw( open close );
@ARGV == 1 or die "Usage: perl $0 file\n";
my $file = shift;
open my $fh, '<:encoding(UTF-8)', $file;
my @lines = <$fh>;
close $fh;
for my $line (@lines) {
    print $line if $line =~ m/[aeiou][aeiou]/i;
}
exit 0;
Since your script is using regular expression pattern matching, it's important that it knows the correct character encoding of the text in the input text file. I've assumed it's in the UTF-8 character encoding form of the Unicode coded character set. If it's in some other character encoding (e.g., Windows-1252), then you need to modify the second argument of open.
I'm creating a script that reads the file lines into an array and then searches and print the words that have 2 consecutive vowels in them
Neither your script nor my refactoring of it is doing exactly this. They're both printing whole lines on which there are two consecutive vowels anywhere on the line. The following script parses each line into words (where "words" are contiguous strings of non-whitespace characters) and then prints each word that has two consecutive vowels in it (where "vowels" are the Latin letters A/a, E/e, I/i, O/o and U/u).
#!/usr/local/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std );
@ARGV or die "Usage: perl $0 file ...\n";
while (my $line = <ARGV>) {
    chomp $line;
    my @words = split ' ', $line;
    for my $word (@words) {
        print "$word\n" if $word =~ m/[aeiou][aeiou]/i;
    }
}
exit 0;
This hit the nail on the head. Just tweaked it a little and managed to print the words from the file. I really appreciate your help, Jim. I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!
I don't know why I'm having such a hard time with Perl... I've read and read and it just flies through my brain!
It sounds like you're doing wrong what I do wrong: Read too much and code too little. It's my worst bad habit. This is precisely why I tried to help you a bit by refactoring your code. Study each of the changes I made very carefully, asking yourself "Why did he do that?" for each one. I promise you you'll learn several helpful lessons.
Are you using a good programmer's text editor with Perl syntax highlighting? If not, do. One of the obvious problems you're having is with trivial syntax errors (i.e., typos). A good Perl text editor will help you avoid these.
Re: Vowel search
by GotToBTru (Prior) on Jun 11, 2014 at 20:19 UTC
Your code is asking for files that have an a AND an e AND an i AND an o AND a u. What you probably want is
if ($file =~ /[aeiou]{2}/) {
print $file;
}
[aeiou] matches any one of those letters and {2} specifies that it should match two in a row.
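To make the difference concrete, here is a minimal sketch contrasting the two tests. The has_all_vowels sub is only a guess at the kind of "a AND e AND i AND o AND u" check described above, not the original poster's code, and both sub names are made up for this example.

```perl
use strict;
use warnings;

# True only when every one of a, e, i, o, u occurs somewhere in the text.
# (Hypothetical stand-in for the AND-style test; not the original code.)
sub has_all_vowels {
    my ($text) = @_;
    return ($text =~ /a/ && $text =~ /e/ && $text =~ /i/
         && $text =~ /o/ && $text =~ /u/) ? 1 : 0;
}

# True when any two vowels sit next to each other.
sub has_two_in_a_row {
    my ($text) = @_;
    return $text =~ /[aeiou]{2}/ ? 1 : 0;
}

for my $word (qw(communicate book education rhythm)) {
    printf "%-12s all_vowels=%d two_in_a_row=%d\n",
        $word, has_all_vowels($word), has_two_in_a_row($word);
}
```

"communicate" contains all five vowels but no two of them adjacent, while "book" has an adjacent pair but only one distinct vowel, so the two tests disagree on both words.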
I switched to what you suggested, though I'm still getting an error; the error is stated in the edited parent post.
Re: Vowel search
by Anonymous Monk on Jun 11, 2014 at 20:29 UTC
What error are you getting?
The code sample you provided does not compile. Although we can try to guess what you meant, it'd be much better if you provided working code we can use to reproduce your problem exactly.
GotToBTru has already given you guidance on the regex issue.
Sorry, updated the parent post with the error.
You're missing things in a few places (note I'm making a few assumptions on your code but this seems to make the most sense to me):
1. The closing curly brace of the while, which should be after the push statement.
2. The opening curly of the foreach, which should be on that line instead of the semicolon.
3. The closing curly of the foreach (or if, depending how you look at it) ...and you just edited the node to take care of that one.
As described in "How do I change/delete my post?", please mark the changes you have made to your node, because currently GotToBTru's and kennethk's responses don't make sense any more.
Re: Vowel search
by andal (Hermit) on Jun 12, 2014 at 11:37 UTC
In the "Update" part look at line 11 of your code. It reads
"foreach $file(@lines);".
This ';' at the end is not acceptable, so perl gives you an error. You need { instead.
It looks like you're also missing the closing } for your while. It should come before the foreach.
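Since the original code isn't shown in this thread, the following is only a guessed reconstruction of the overall shape, with the braces placed as described. The variable names, the sample text and the in-memory filehandle are all assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real input file (an assumption for this sketch).
my $text = "The quick brown fox\njumps over\nEach day\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    push @lines, $line;
}   # the while block closes here, right after the push
close $fh;

my @found;
foreach my $line (@lines) {   # an opening brace here, not a semicolon
    if ($line =~ /[aeiou]{2}/i) {
        push @found, $line;
    }
}   # closes the foreach

print @found;
```

Running perl -c on a script checks its syntax without executing it, which catches unbalanced braces like these early.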