optikool has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to open a dictionary file and read a list of words from it. Using "while (my $words = <DICT>)", I'm trying to match words that end in "a" with "$words =~ /a$/i" and assign each match to an array with $matches[$counter] = $words. However, I know that there is a \n after each word, and the pattern does not match. When I try to chomp $words to remove the \n, the word is also removed. I also tried matching /a\n/i, without success. I can match the newline character on its own, but that just prints out every word. Can anybody tell me the best way to get around this issue while still using regular expressions? Thanks!

Replies are listed 'Best First'.
Re: Regular Expression
by Zaxo (Archbishop) on May 18, 2003 at 05:41 UTC
    while (my $words = <DICT>) { chomp $words; push @matches, $words if $words =~ /a$/ }
    or just  my @matches = grep { chomp; /a$/ } <DICT>;

    Update: With CountZero's caveat, the second example will work fine on windows. That is Perl's internal grep, not the unix system utility of the same name. Perl chomp removes whatever $/ is from the end of a string - by default that is the native newline.

    After Compline,
    Zaxo

      The grep solution slurps the whole dictionary file as one list and can make for a huge process if the dictionary is large.

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Thanks for the information, Zaxo. The first example cuts down a lot of the code I had planned to use. I can't really use the second example because I need this to work on both Windows and Unix, though eventually it will be Unix only. However, the words are still missing when the search is started. Here is the code...
      use strict;
      use warnings;

      my @matches = ();
      open (DICT, "dictionary.txt") or die "Dictionary.txt: $!\n";
      while (my $words = <DICT>) {
          chomp $words;
          push @matches, $words . " 1" if $words =~ /a$/;
          push @matches, $words . " 2" if $words =~ /.*[i].*[i].*[i].*[i].*[i].*/gi;
          push @matches, $words . " 3" if $words =~ /[^aeiou]/gi;
          my $reverse = reverse($words);
          push @matches, $words . " 4" if $words eq $reverse;
      }
      close (DICT);
      foreach (@matches) {
          print $_;
      }
        Regarding this issue:

        I need this to work on both windows and unix

        The safest way to remove line terminations across platforms is:

        s/[\r\n]+//;
        For that matter, you could probably just do s/\s+//g; based on the assumption that the only white space to be found in your dictionary file is the line breaks.
        You might want to check to see if the words are there inside of the while loop. A simple print statement like
        print "----$words-----\n";
        inserted after the chomp statement may detect the problem.
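
        A minimal sketch of the line-ending advice above, assuming the dictionary may arrive with Unix (\n), Windows (\r\n), or bare \r terminators. The helper name strip_eol and the sample words are made up for illustration and are not from the thread. Unlike the unanchored s/[\r\n]+//, this version anchors the substitution with \z so that only the terminator at the end of the string is removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove a trailing LF, CRLF, or CR.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # anchored with \z, so only the terminator goes
    return $line;
}

# Sample lines standing in for dictionary entries with mixed line endings.
my @raw     = ("alpha\n", "beet\r\n", "gamma\r", "door");
my @cleaned = map { strip_eol($_) } @raw;

# With the terminators gone, the original "ends in a" test behaves the same everywhere.
my @ends_a = grep { /a\z/i } @cleaned;

print "$_\n" for @ends_a;
```

        Working on a copy inside the helper leaves the caller's string untouched, which makes it easy to print the before and after values side by side when debugging.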
Re: Regular Expression
by Abigail-II (Bishop) on May 18, 2003 at 09:23 UTC
    Could you please show us the code that is failing? (And keep it short.) Because /a$/i is supposed to match strings that end with "a\n". And that chomping removes the entire word doesn't make any sense. I think you made a mistake in your code, and are blaming the built-ins for it.

    Abigail
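
    The point about /a$/i can be checked with a small sketch. The sample word is invented for illustration. It also shows one case where the match does fail: a Windows-style "\r\n" ending, where the "\r" sits between the "a" and the newline.

```perl
use strict;
use warnings;

my $line = "banana\n";    # as read from a file, newline still attached

# $ matches at the very end of the string or just before a final newline.
my $dollar_matches = $line =~ /a$/i ? 1 : 0;

# \z matches only at the very end, so the newline gets in the way.
my $z_matches = $line =~ /a\z/i ? 1 : 0;

# With a CRLF ending, the character before the final newline is "\r", not "a".
my $crlf_matches = "banana\r\n" =~ /a$/i ? 1 : 0;

printf "%-22s %d\n", 'a$  on "banana\n":',   $dollar_matches;
printf "%-22s %d\n", 'a\z on "banana\n":',   $z_matches;
printf "%-22s %d\n", 'a$  on "banana\r\n":', $crlf_matches;
```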

Re: Regular Expression
by arthas (Hermit) on May 18, 2003 at 18:25 UTC
    chomp() shouldn't really remove the whole word; even a weird setting of $/ wouldn't make that happen.

    You can try to use chop(), which trims ONLY the last character no matter what it is but, really, a look at the code would be a better choice: it's probably not chomp that eats your words. ;-) Michele.
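
    A short sketch of the difference between chomp() and chop(). The thread never shows the failing code, so the "lost word" case below is only a guess at one common slip, not a diagnosis: chomp returns the number of characters it removed, so assigning its return value back to the variable replaces the word with a count.

```perl
use strict;
use warnings;

my $word = "zebra\n";

# chomp edits its argument in place and returns how many characters it removed.
my $removed = chomp(my $chomped = $word);    # $chomped is "zebra", $removed is 1

# A common slip (assumed here, not confirmed by the thread): keeping the
# return value instead of the string, which overwrites the word with 1.
my $oops = "zebra\n";
$oops = chomp($oops);

# chop removes the last character whatever it is, newline or not.
my $chopped = "zebra";
chop $chopped;                               # now "zebr"

print "chomped: [$chomped] removed: $removed\n";
print "oops:    [$oops]\n";
print "chopped: [$chopped]\n";
```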
      Thanks a lot for everybody's input. I'm not sure why the chomp function was not working correctly, but it is working correctly now, which solved most of the other problems I was having. I have one more question... Is there a way to match words with double letters, e.g. keenness or mississippi? Also, is there a way to match words whose letters are in alphabetical order, e.g. abby or abit? Thanks so much for your help. =0)
        For double letters:
        if ($word =~ /(.)\1/) ...
        All letters in alphabetical order (not regex):
        if (lc($word) eq join "", sort split //, lc($word)) ...

        antirice    
        The first rule of Perl club is - use Perl
        The ith rule of Perl club is - follow rule i - 1 for i > 1
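
        The two checks from the reply above can be wrapped into small subs for trying out on sample words. The sub names has_double_letter and is_alphabetical are made up for this sketch, and the double-letter pattern is narrowed from (.) to ([a-z]) on the assumption that the dictionary holds plain ASCII words.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any letter is immediately repeated.
sub has_double_letter {
    my ($word) = @_;
    return $word =~ /([a-z])\1/i ? 1 : 0;    # \1 refers back to the captured letter
}

# Hypothetical helper: true if the letters are already in sorted order.
sub is_alphabetical {
    my ($word) = @_;
    my $lower = lc $word;
    my @chars = split //, $lower;
    return $lower eq join('', sort @chars) ? 1 : 0;
}

for my $word (qw(keenness mississippi abby abit zebra)) {
    printf "%-12s double=%d alphabetical=%d\n",
        $word, has_double_letter($word), is_alphabetical($word);
}
```

        Note that repeated letters still count as alphabetical order here, which is why "abby" passes both checks.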