Ok, here is my script after some corrections:

#!/usr/bin/perl
#This script is used to extract words not found in the dictionary file from corpus data. For this, we use binary search. Linear search would take too long and use too many resources.
use strict;
use warnings;
#Use encoding because of special characters.
use encoding "utf8";
use open IN => "utf8";
use open OUT => "utf8";
binmode STDIN => "utf8";
binmode STDOUT => "utf8";

my $wordlist = shift;
my @allwords; #array containing all dictionary words.

#First, I open the dictionary file. I then push all words into the @allwords array.
open WORDLIST, $wordlist;
while (<WORDLIST>){
    chomp;
    s/\r//;
    my $word = $_;
    push (@allwords,$word)
}
close WORDLIST;

#I then sort the array in alphabetical order.
my @sorted_wordlist = sort {$a cmp $b} @allwords;

#I create a subroutine to use binary search.
sub binary_search {
    my ($array, $target) = @_; #Unpack the arguments: $array is the sorted word list and $target is the word we are looking for.
    my ($low, $high) = (0, @$array - 1); #Declare high and low indexes. Low index = 0 and high index = last index of the array.
    while ($low < $high) { # If high index is higher than the low index, keep the window open.
        my $cur = int(($low+$high)/2); #The middle index: the sum of the low and high indexes, divided by 2.
        if ($array->[$cur] lt $target) {
            $low = $cur + 1; #The middle word sorts before the target, so search the upper half.
        }
        elsif ($array->[$cur] gt $target) {
            $high = $cur - 1; #The middle word sorts after the target, so search the lower half.
        }
        else{
            return $cur; #Got it!
        }
    }
    return; #It doesn't exist.
}

# Open the corpus data.
while (<>){
    chomp;
    s/\r//;
    my $corpus_word = $_; #Declare the read line as a corpus word.
    my $index = binary_search (\@sorted_wordlist, $corpus_word); #use the binary search to find the index
    if($index == 0){ #if index is not returned, then the word doesn't exist.
        print "$corpus_word\n";
    }
    else{
    }
}
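For the dictionary-loading part, a minimal sketch under more current conventions, assuming a UTF-8 encoded dictionary with one word per line. The encoding pragma has been deprecated for a long time and, as of Perl 5.26, its default mode is no longer supported, so `use open qw(:std :encoding(UTF-8))` is the usual replacement. The sketch also uses a three-argument open with `or die`: if the dictionary cannot be opened, the original silently continues with an empty list and reports every corpus word as missing. `load_sorted_words` is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;
use utf8;                               # only needed if this source file contains non-ASCII literals
use open qw(:std :encoding(UTF-8));     # UTF-8 layers for STDIN/STDOUT/STDERR and for files opened in this scope

# Hypothetical helper: read one word per line from $path and return a
# reference to the words sorted with the same string comparison (cmp)
# that the binary search will use later.
sub load_sorted_words {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    my @words;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;              # tolerate Windows (CRLF) line endings
        next if $line eq '';            # skip blank lines
        push @words, $line;
    }
    close $fh;
    return [ sort { $a cmp $b } @words ];
}
```

In the script this would replace the WORDLIST block, e.g. `my $sorted = load_sorted_words($wordlist);`, and the lookups would then take `$sorted` instead of `\@sorted_wordlist`.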

It did give me some results, but:
1) I'm getting words that ARE in the dictionary file.
2) It gives me a "Use of uninitialized value in numeric eq (==)" warning.

Looking at the O'Reilly book "Mastering Algorithms with Perl", it turned out that I had to write $high = $cur - 1, not $cur + 1, to adjust $high.
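With `$high = $cur - 1`, both `$low` and `$high` point at words that have not been compared yet, so the search window [$low, $high] is inclusive at both ends and the loop has to keep running while `$low <= $high`. With `$low < $high`, a one-element window is never checked. For example, looking up cherry in (apple, banana, cherry): low=0, high=2, cur=1; banana sorts before cherry, so low=2; now low == high, the loop exits, and cherry is reported as missing even though it is in the list. This is very likely where most of the false "missing" words come from. A sketch with the inclusive condition (using one `cmp` per step instead of `lt`/`gt`, which gives the same ordering as `sort { $a cmp $b }`):

```perl
use strict;
use warnings;

# Binary search over a sorted array ref. Returns the index of $target,
# or undef (scalar context) if it is absent. The window [$low, $high] is
# inclusive, hence $high = $cur - 1 and the loop condition $low <= $high.
sub binary_search {
    my ($array, $target) = @_;
    my ($low, $high) = (0, $#{$array});         # $#{$array} is -1 for an empty list
    while ($low <= $high) {
        my $cur = int(($low + $high) / 2);
        my $cmp = $array->[$cur] cmp $target;   # same ordering as sort { $a cmp $b }
        if    ($cmp < 0) { $low  = $cur + 1 }   # middle word sorts before target: search upper half
        elsif ($cmp > 0) { $high = $cur - 1 }   # middle word sorts after target: search lower half
        else             { return $cur }        # found
    }
    return;                                     # not found
}
```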
I don't know why I'm still getting words that exist in the dictionary file. I understand why I get the warning message, but I don't know how to make the code cleaner. Any ideas?
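On the warning: `binary_search` ends with a bare `return;`, which gives undef in scalar context. `undef == 0` is true, but Perl warns about it, which is the "Use of uninitialized value in numeric eq (==)" message. Index 0 is also a legitimate hit (the first word of the sorted dictionary), so `$index == 0` prints that word too whenever it shows up in the corpus. Testing `defined $index` separates "not found" from "found at 0". A sketch, assuming the inclusive `$low <= $high` loop (the `$low < $high` condition skips one-element windows); `missing_words` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Binary search with an inclusive window; returns an index or undef.
sub binary_search {
    my ($array, $target) = @_;
    my ($low, $high) = (0, $#{$array});
    while ($low <= $high) {
        my $cur = int(($low + $high) / 2);
        my $cmp = $array->[$cur] cmp $target;
        if    ($cmp < 0) { $low  = $cur + 1 }
        elsif ($cmp > 0) { $high = $cur - 1 }
        else             { return $cur }
    }
    return;    # undef in scalar context means "not found"
}

# Hypothetical helper: return the corpus words absent from the sorted
# dictionary. Index 0 is a valid hit, so check definedness, not == 0.
sub missing_words {
    my ($sorted_dict, $corpus) = @_;
    my @missing;
    for my $word (@$corpus) {
        my $index = binary_search($sorted_dict, $word);
        push @missing, $word unless defined $index;
    }
    return @missing;
}

# In the original main loop the same fix would read:
#   my $index = binary_search(\@sorted_wordlist, $corpus_word);
#   print "$corpus_word\n" unless defined $index;
```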

In reply to Re^6: Check word presence WITHOUT hashes or grep by gojippo
in thread Check word presence WITHOUT hashes or grep by gojippo
