better has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm trying to compare two arrays. I know there are many ways to do this. After some experimenting, I ended up with use List::MoreUtils qw {any};

It works fine in a test environment, with two small arrays defined within the script:

my @cleanwords = qw (hut hat);
my @allwords = qw (hit het hat);
foreach my $var (@cleanwords) {
    if (any { $var eq $_} @allwords) {
        print "Found: ",$var, "\n";
    } else {
        print $var," not found\n";
    }
}

But the process doesn't work after reading a text file (a list of words) into an array called @allwords. For reading I use:

use File::Slurp::Tiny 'read_file';
my @allwords = read_file ($ref);

The array is definitely filled with all the words of the text file. But no matches are found.

I think the problem might be, the whole list is probably "slurped" into one element of the array only.

 print scalar @allwords;

reports back: '1'

But then why are all the words printed nicely one beneath the other when doing:

foreach (@allwords){ print "allwords: ",$_,"\n"; }

Result is:

allwords: hastig

hastige

hastigem

hastigen

hastiger

hat

???

It looks almost like a display of separate elements. Almost, because 'allwords:' should be printed in front of every word.

Well, I think I have to learn more about creating arrays from a file.

It's a special challenge, given what the script is meant to do, because @allwords should contain all words of the German language.

The idea behind it is to check which elements of a list of random letter combinations of varying size are proper German words.

Any helpful comments appreciated

Horst

P.S.: A step forward: I tried IO::ALL and scalar @allwords reports 10 elements, which means all the words of the text file made it into the array. But still no matches found.

Replies are listed 'Best First'.
Re: Search element of array in another array
by toolic (Bishop) on Mar 26, 2015 at 19:43 UTC
    I think the problem might be, the whole list is probably "slurped" into one element of the array only.
    Yes. According to File::Slurp::Tiny, read_file:
    Reads file $filename into a scalar. By default it returns this scalar.

    One way is to split:

    my @allwords = split /\n/, read_file($ref);
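
    A minimal sketch of the split approach, using only core Perl. The string $content is a hypothetical stand-in for the scalar that read_file returns, and the pattern /\r?\n/ is a variation on the /\n/ shown above, on the assumption that the file may have Windows (CRLF) line endings:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the single scalar returned by read_file;
# the CRLF line endings are an assumption about the word list file.
my $content = "hit\r\nhet\r\nhat\r\n";

# Split on LF, optionally preceded by CR, so neither character
# remains attached to the words. Trailing empty fields are dropped.
my @allwords = split /\r?\n/, $content;

print scalar(@allwords), "\n";      # prints 3
print join(',', @allwords), "\n";   # prints hit,het,hat
```

    With plain /\n/ the same input would yield three elements too, but each would still end in "\r" and so would not compare equal to "hat".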
Re: Search element of array in another array
by AnomalousMonk (Archbishop) on Mar 26, 2015 at 23:25 UTC

    Once you have your list of dictionary words read from its file and all cleaned up (whitespace, newlines, etc. fixed up), the next step is to realize that very fast lookup of this sort can be had from a hash:

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump;
     ;;
     my @cleanwords = qw(hut Hat foo HIC);
     my @allwords = qw(hit Het HAT HiC hAc hoc);
     ;;
     my %dict = map { $_ => 1 } map canonicalize($_), @allwords ;
     dd \%dict;
     ;;
     for my $word (@cleanwords) {
       my $common = canonicalize($word);
       printf qq{word '$word' %sin dictionary \n},
         exists $dict{$common} ? '' : 'NOT ';
       }
     ;;
     sub canonicalize { return lc $_[0]; }
    "
    { hac => 1, hat => 1, het => 1, hic => 1, hit => 1, hoc => 1 }
    word 'hut' NOT in dictionary
    word 'Hat' in dictionary
    word 'foo' NOT in dictionary
    word 'HIC' in dictionary
    This approach works very simply and quickly for dictionary sizes up to a few tens of millions of words. But how many words does German have, anyway?

    Update: Added a few more words to both word lists in example code.
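
    The same hash lookup can be fed from a file rather than a literal list. This is only a sketch under stated assumptions: the temporary file and its three words are made up for illustration, and CRLF line endings are assumed for the word list:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical word list, written with CRLF endings for illustration.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "hit\r\nhet\r\nHat\r\n";
close $out or die "Cannot close $path: $!";

# Build the lookup hash line by line, removing the line ending
# (LF or CRLF) and lowercasing each word before storing it.
my %dict;
open my $in, '<', $path or die "Cannot open $path: $!";
while (my $line = <$in>) {
    $line =~ s/\r?\n\z//;
    next unless length $line;
    $dict{ lc $line } = 1;
}
close $in;

for my $word (qw(hut hat)) {
    printf "word '%s' %sin dictionary\n",
        $word, exists $dict{ lc $word } ? '' : 'NOT ';
}
```

    Reading line by line avoids holding both the raw file and the hash in memory at once, which may matter for a very large word list.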


    Give a man a fish:  <%-(-(-(-<

Re: Search element of array in another array
by Anonymous Monk on Mar 26, 2015 at 19:47 UTC

    See the Basic debugging checklist: use Data::Dumper or Data::Dump to see what your data structures really look like. I'm guessing you will indeed find that your array contains only one element. The reason seems to be in the File::Slurp::Tiny docs, which say of read_file: "Reads file $filename into a scalar. By default it returns this scalar.", implying that the entire file is read and returned as one scalar. You should probably try using the read_lines function instead. You should probably also turn on the chomp option.

      Thanks, Anonymous Monk, for pointing me to Data::Dumper. I checked both arrays.

      The result shows clearly that the structure of the elements is different. An example:

      print Dumper @allwords;

      ...

      $VAR9 = 'hat

      ';

      ...

      print Dumper @cleanwords;

      ...

      $VAR6 = 'hat';

      ...

      I selected the word which should match. In @allwords the closing single quotation mark appears on a new line. Maybe it's caused by newline? and I should chomp the elements?

      I chomped it like this:

      chomp (@allwords);

      Result:

      ';AR9 = 'hat

      The single quotation mark of the preceding word appears on the same line.

        Please mark updates to your nodes, see here for why.

        I chomped it like this: ... ';AR9 = 'hat ... The single quotation mark of the preceding word appears on the same line.

        I'd guess the file is terminated by CRLF and your input record separator $/ is a plain LF. One approach to fix this is to set $/ = "\r\n"; before reading the file and the chomp. If you were using a regular open, you could use the :crlf I/O layer: open my $handle, '<:crlf', 'filename.txt' or die $!; (see PerlIO).

        Maybe it's caused by newline? and I should chomp the elements?

        Yes and yes! :-)

        As mentioned below, try setting $Data::Dumper::Useqq=1; before calling Dumper.
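
        To make the difference visible, here is a small sketch that reads the same CRLF file twice, once with the :raw layer and once with the :crlf layer, and dumps both results with Useqq turned on. The temporary file and its contents are hypothetical; only core modules are used:

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

$Data::Dumper::Useqq = 1;   # show control characters as escapes like \r

# Hypothetical CRLF-terminated word list, written for illustration.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "hastig\r\nhat\r\n";
close $out or die "Cannot close $path: $!";

# Raw read: chomp removes only the "\n", so each word keeps its "\r".
open my $raw, '<:raw', $path or die "Cannot open $path: $!";
chomp(my @raw_words = <$raw>);
close $raw;

# Read through the :crlf layer: "\r\n" arrives as "\n", so chomp is enough.
open my $fh, '<:crlf', $path or die "Cannot open $path: $!";
chomp(my @words = <$fh>);
close $fh;

print Dumper(\@raw_words);   # elements still end in \r
print Dumper(\@words);       # elements are the bare words
```

        A stray "\r" also explains the odd ';AR9 = 'hat output: the terminal moves the cursor back to the start of the line before printing the closing quote and semicolon.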

      You are right. I tried read_lines before, but it didn't work either. Thanks for your hint.

Re: Search element of array in another array
by Anonymous Monk on Mar 26, 2015 at 19:55 UTC
    P.S.: A step forward: I tried IO::ALL and scalar @allwords reports 10 elements ... But still no matches found.

    Perhaps you need to chomp @allwords?

    Again, Data::Dumper (especially its Useqq option) or Data::Dump are useful to see the actual strings.

    BTW, be careful with IO::All, it is easy to create ETOOMUCHMAGIC errors with it...

      $Data::Dumper::Useqq = 1;

      This is really amazing!

      @allwords

      $VAR9 = "hat\r";

      @cleanwords

      $VAR6 = "hat";

      How can I get rid of the \r? I vaguely remember I had this problem before...

        Probably this:
        s/\r//g;
        (if the variable is contained in $_), or, otherwise, something like this:
        $var =~ s/\r//g;
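
        Applied to a whole array, the substitution can be combined with the original comparison loop. This is a sketch with made-up data; it uses any from the core List::Util module (available in version 1.33 and later) in place of List::MoreUtils, which is an assumption about the installed Perl:

```perl
use strict;
use warnings;
use List::Util qw(any);

my @cleanwords = qw(hut hat);

# Hypothetical contents of @allwords after chomp on a CRLF file.
my @allwords = ("hit\r", "het\r", "hat\r");

# The statement modifier aliases $_ to each element in turn,
# so the substitution edits the array in place.
s/\r\z// for @allwords;

for my $var (@cleanwords) {
    if (any { $var eq $_ } @allwords) {
        print "Found: $var\n";
    } else {
        print "$var not found\n";
    }
}
```

        This prints "hut not found" followed by "Found: hat".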

        Je suis Charlie.