Hello! I need some help, please. What I would like to do is the following: I have two text files, BilingualWordList.txt and FalseFriendsList.txt, whose columns are separated by tabs. They look like this:

```text
# BilingualWordList.txt (let's call it FileA)
vriendelik	aardig
irriterend	vervelend
losieshuis	pension
eksamen	examen
goed	braaf
damwal	dam
water	water
rekenaar	computer
outoritêr	outoritaire
wêreld	wereld
alle	alle
word	worden
angesien	overwegende
erkenning	erkenning
afrigter	trainer
```

```text
# FalseFriendsList.txt (let's call it FileB)
vriendelik	aardig
goed	braaf
damwal	dam
bruinmens	kleurling
kamera	fototoestel
jammer	sneu//spijten
japon	ochtendjas
losieshuis	pension
buffer	bumper
bruinmens	kleurling
brulpadda	brulkikker
jammerlik	zielig
buffer	bumper
irriterend	irritant//vervelend
kameelperd	giraf//giraffe
```
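Because both files have two tab-separated columns, one natural structure for FileB is a hash keyed on the Afrikaans word (first column), with the Dutch side split on `//` into a list of alternatives. This is a minimal sketch, assuming every line holds exactly one tab and that `//` only ever separates alternatives. The sub name `load_pairs` is hypothetical, and an in-memory filehandle stands in for FalseFriendsList.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: read "word<TAB>translation" lines from an open
# filehandle into a hash of word => [alternative translations].
sub load_pairs {
    my ($fh) = @_;
    my %pairs;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line endings
        next unless length $line;          # skip blank lines
        my ($word, $translation) = split /\t/, $line, 2;
        next unless defined $translation;  # skip lines without a tab
        # "irritant//vervelend" becomes ("irritant", "vervelend")
        $pairs{$word} = [ split m{//}, $translation ];
    }
    return \%pairs;
}

# Stand-in for FalseFriendsList.txt (a few lines from the sample above).
my $sample = "vriendelik\taardig\nirriterend\tirritant//vervelend\nkameelperd\tgiraf//giraffe\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my $fileb = load_pairs($in);
close $in;

for my $word (sort keys %$fileb) {
    print "$word => ", join(', ', @{ $fileb->{$word} }), "\n";
}
```

Duplicate keys such as `bruinmens` and `buffer` simply overwrite each other. That is harmless here because the duplicated lines are identical.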

I want to use FileB to search through FileA, looking for matches. The matches don't need to be 100% identical, though. As you can see, these two entries are almost exactly the same:

```text
#FileA
irriterend	vervelend
#FileB
irriterend	irritant//vervelend
```
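One way to express "almost the same" for the pair above is this: the first columns must be identical, and FileA's translation must equal one of the `//`-separated alternatives in FileB. The sketch below works under that assumed rule. The sub name `is_match` is hypothetical. Because it compares whole alternatives with `eq` rather than substrings, `vervelend` matches `irritant//vervelend`, but a fragment such as `verv` does not:

```perl
use strict;
use warnings;

# Hypothetical predicate: does a FileA pair match a FileB pair?
# Rule assumed here: same first column, and FileA's translation equals
# one of the "//"-separated alternatives in FileB's second column.
sub is_match {
    my ($a_word, $a_trans, $b_word, $b_trans) = @_;
    return 0 unless $a_word eq $b_word;
    for my $alt (split m{//}, $b_trans) {
        return 1 if $alt eq $a_trans;
    }
    return 0;
}

for my $case (
    [ 'irriterend', 'vervelend', 'irriterend', 'irritant//vervelend' ],
    [ 'vriendelik', 'aardig',    'vriendelik', 'aardig' ],
    [ 'irriterend', 'verv',      'irriterend', 'irritant//vervelend' ],
) {
    print join(' / ', @$case), ': ', (is_match(@$case) ? 'match' : 'no match'), "\n";
}
```

If substring matching is wanted instead, `index($alt, $a_trans) >= 0` does that without treating the words as regular expressions.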

Thus far, my code looks like this:

```perl
#!/usr/bin/perl -w
use strict;
#use warnings;
use open ':utf8';

#open files
open (FALSEF, "<FalseFriendsList.txt");
open (BILWL, "<BilingualWordList.txt");

#declare hashes
my %falsef;
my %existingfalsefriend;

#while the FF input exists
while (<FALSEF>) {
    #assign each line to $line
    my $line = $_;
    #chomp off the new line
    chomp $line;
    #increment $line
    $falsef{$line}++;
}

#declare variables
my $token;
my %hash;

#open output files
open (OUTPUT1, ">OutputFalseFriends.txt");
open (OUTPUT2, ">OutputUnsortedWordList.txt");

#while input is received
while (<BILWL>) {
    #assign each line to $line
    my $line = $_;
    #chomp off the new line
    chomp $line;
    #assign $line to the array
    my @wordlist = split/\t/,$line;
    #a for-loop to 'clean up' the words, to get rid of all the commas,
    #full stops, etc, except the apostrophes and hyphens
    for (my $x = 0; $x <= $#wordlist; $x++) {
        my $token = $wordlist[$x];
        if ($token =~ /(['\-\w]+)/) {
            #$word is now clean
            my $searchword = $1;
            #checks to see whether the word exists in the false friends list
            if (exists $hash{$searchword} || exists $falsef{$searchword}) {
                $existingfalsefriend{$searchword}++;
            }
            else {
                #print to unsorted.txt
                print OUTPUT2 "$searchword\n";
            }
        }
    }
}

my $searchword;
foreach my $searchword(sort keys %existingfalsefriend) {
    #sorts the matched words alphabetically
    my $value = $existingfalsefriend{$searchword};
    print OUTPUT1 "$searchword\t $value\n";
}
```

However, my output does not look the way I want it to. I want the matching lines to be written to OutputFalseFriends.txt and the non-matching lines to be written to OutputUnsortedWordList.txt, like this:

```text
#OutputFalseFriends.txt
vriendelik	aardig
losieshuis	pension
goed	braaf
damwal	dam
irriterend	irritant//vervelend
```

```text
#OutputUnsortedWordList.txt
eksamen	examen
water	water
rekenaar	computer
outoritêr	outoritaire
wêreld	wereld
alle	alle
word	worden
angesien	overwegende
erkenning	erkenning
afrigter	trainer
```

But OutputFalseFriends.txt is empty every time, and OutputUnsortedWordList.txt contains my whole input file, BilingualWordList.txt, just with every word on its own line. Here is a sample:

```text
goed
braaf
naak
bloot
damwal
dam
kombers
deken
homoseksueel
flikker
bronstig
geil
munisipaliteit
gemeente
```

Does anyone have any advice on how I can correct this, please?
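There is a likely reason why OutputFalseFriends.txt stays empty in the first script. The first loop stores each *whole line* of FalseFriendsList.txt as a hash key (for example `"vriendelik\taardig"`), while the second loop looks up *individual words* produced by `split /\t/`. A single word never equals a tab-containing line, so `exists $falsef{$searchword}` is always false. `%hash` is declared but never filled, so `exists $hash{$searchword}` is always false too. As a result, every cleaned word lands in OutputUnsortedWordList.txt, one per line. The short demonstration below reproduces the failed lookup and shows it succeeding once the key is the first column only:

```perl
use strict;
use warnings;

# One chomped line of FalseFriendsList.txt.
my $line = "vriendelik\taardig";

# What the first script does: the whole line becomes the hash key.
my %by_line;
$by_line{$line}++;

# Keying on the first column instead.
my %by_word;
my ($word, $translation) = split /\t/, $line, 2;
$by_word{$word} = $translation;

for my $search ('vriendelik', 'aardig') {
    printf "%-10s whole-line key: %s, first-column key: %s\n",
        $search,
        (exists $by_line{$search} ? 'found' : 'missing'),
        (exists $by_word{$search} ? 'found' : 'missing');
}
```

Once the lookup is keyed this way, the loop over FileA should also test the first column of each *line* and print the whole `$line`, rather than printing each cleaned word separately. That is what produces the two-column output shown above.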

!!!!!!!!!!!!!! UPDATE !!!!!!!!!!!!!!

I finally got my program to do what I wanted it to do! (Well, part 1 of the whole program I'm trying to code, that is. :p) Here is my code (same input as before, obviously):

```perl
#!/usr/bin/perl -w
use strict;
use warnings;
use open ':utf8';
use autodie;

#open FILE B
open (FALSEFRIENDINPUT, "<SNonCognatesAndFF.txt");

#declare hash
my %fileb;

#get a line from #FILEB
while (my $line = <FALSEFRIENDINPUT>) {
    #chomp off the new line
    chomp $line;
    # split the line on tab
    my ($filebkeys, $filebvalues) = split /\t/, $line;
    $fileb{$filebkeys} = $filebvalues;

    #open output files
    open (OUTPUT1, ">OutputMatchedFalseFriends.txt");
    open (OUTPUT2, ">OutputNonMatchedWords.txt");

    #open FILE A
    open (BILINGUALWL, "<BilingualWordList.1.0.0.IW.2012-06-20.txt");
    my %filea;

    #get a line from #FILEA
    while (my $line = <BILINGUALWL> ) {
        chomp $line;
        #split the line on tab
        my ($fileakeys, $fileavalues) = split /\t/, $line;
        #do first columns match?
        if ($fileb{$fileakeys}) {
            #does the second column value contain the other as a substring?
            if ($fileb{$fileakeys} =~ /$fileavalues/ or $fileavalues =~ /$fileb{$fileakeys}/) {
                #if yes, print it to OutputMatchedFalseFriends.txt
                print OUTPUT1 "$line\n";
                #loop to the next line
                next;
            }
        }
        else {
            #if not, print it to OutputNonMatchedWords.txt
            print OUTPUT2 "$line\n";
        }
    }
}
```

And here is my output:

```text
#OutputMatchedFalseFriends.txt
damwal	dam
bitsig	vinnig
bot	been
dikwels	vaak
aantreklik	knap
bees	rund
baas	chef
bestuur	directie
alles	alles
afrigter	trainer
```

```text
#OutputNonMatchedWords.txt (only a sample of a 73-line output)
vriendelik	aardig
polisieman	agent
net-net	amper
gedierte	beest
goed	braaf
naak	bloot
kombers	deken
homoseksueel	flikker
bronstig	geil
munisipaliteit	gemeente
menskop	hoofd
toedraai	inpakken
kiestand	kies
dierekop	kop
```
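In the updated script, the output `open` calls and the re-read of BilingualWordList sit inside the `while` loop over FileB. So both outputs are rewritten once per FileB line, each time against a `%fileb` that is still only partly filled. Two other details also stand out. A FileA line whose first column is in FileB but fails the substring test is written to neither file. And interpolating `$fileavalues` into a pattern would misbehave if a word ever contained regex metacharacters. The restructured sketch below assumes the same tab-separated two-column format. It loads FileB completely first, opens each output once, sends every FileA line to exactly one output, and uses `index` in place of the regexes. The sub name `split_matches` is hypothetical, and in-memory filehandles stand in for the real files so the sketch runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: route each FileA line to $match_fh or $other_fh.
sub split_matches {
    my ($fileb_fh, $filea_fh, $match_fh, $other_fh) = @_;

    # Phase 1: load all of FileB before looking at FileA.
    my %fileb;
    while (my $line = <$fileb_fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $fileb{$key} = $value if defined $value;
    }

    # Phase 2: a single pass over FileA; every line goes to exactly one output.
    while (my $line = <$filea_fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        my $b_value = defined $key ? $fileb{$key} : undef;
        if (   defined $value   && length $value
            && defined $b_value && length $b_value
            && (index($b_value, $value) >= 0 || index($value, $b_value) >= 0))
        {
            print {$match_fh} "$line\n";    # substring match in either direction
        }
        else {
            print {$other_fh} "$line\n";
        }
    }
    return;
}

# Stand-ins for the real files (lines taken from the samples above).
my $fileb_text = "vriendelik\taardig\nirriterend\tirritant//vervelend\nbuffer\tbumper\n";
my $filea_text = "vriendelik\taardig\nirriterend\tvervelend\neksamen\texamen\n";
my ($matched, $unmatched) = ('', '');

open my $fileb_fh, '<', \$fileb_text or die $!;
open my $filea_fh, '<', \$filea_text or die $!;
open my $match_fh, '>', \$matched   or die $!;
open my $other_fh, '>', \$unmatched or die $!;
split_matches($fileb_fh, $filea_fh, $match_fh, $other_fh);
close $_ for $fileb_fh, $filea_fh, $match_fh, $other_fh;

print "Matched:\n$matched", "Not matched:\n$unmatched";
```

For the real files, the handles would be opened with three-argument `open` before calling the sub, for example `open my $match_fh, '>', 'OutputMatchedFalseFriends.txt'`. Keeping `use open ':utf8'` (or an explicit `:encoding(UTF-8)` layer) ensures words such as `wêreld` are decoded correctly.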

I have only one question now, though. Sometimes (and quite randomly), when I run my program, I get one of the following messages (only one at a time, on a rotating basis):

```text
Can't open '>MatchedFalseFriends.txt' for writing: 'Invalid argument' at Script.ExtractionofCognates.1.0.5.2012.06.28.pl line 25
#and
Can't open '>OutputNonMatchedWords.txt' for writing: 'Invalid argument' at Script.ExtractionofCognates.1.0.5.2012.06.28.pl line 25
```
Why would this be? Does anyone know? It doesn't hamper the output in any way, though.
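In the updated script, the two output `open` calls sit inside the `while` loop over FileB, so each output file is opened for writing once per FileB line. Because `use autodie` is in effect, any single failing open among those many stops the script with exactly this form of message. The post does not make clear why an individual open fails with "Invalid argument". On Windows, one plausible but unconfirmed cause is another process (a virus scanner or indexer, say) briefly holding a file that was just closed, which would also fit the seemingly random timing. Either way, opening each output once, before the loop, removes the repeated opens. The small sketch below shows another effect of reopening with `>` on every pass: each reopen discards what the previous pass wrote, so the file on disk only reflects the most recent pass. It uses a temporary file from the core module File::Temp; the file name is generated, not one from the post:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for an output file such as OutputMatchedFalseFriends.txt.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Mimic the updated script: reopen the output with '>' on every pass of a loop.
for my $pass (1 .. 3) {
    open my $out, '>', $path or die "Can't open '$path': $!";
    print {$out} "written on pass $pass\n";
    close $out or die "Can't close '$path': $!";
}

# Read the file back: only the last pass survives, because '>' truncates.
open my $in, '<', $path or die "Can't read '$path': $!";
my @lines = <$in>;
close $in;
print @lines;    # prints a single line: "written on pass 3"
```

In the real script, moving the output opens (and the open of BilingualWordList) above the `while` loop over FileB means each file is opened exactly once per run.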

Oh, and a big thank you to everyone for their input, code examples and opinions - especially Athanasius and aaron_baugher. I appreciate it. :)


In reply to Comparing / Searching through Hashes by SayWhat?!
