SayWhat?! has asked for the wisdom of the Perl Monks concerning the following question:
Hello! I asked this question in the Chatterbox, and although someone gave me a very good answer, I’m still stuck. I have two text files, each containing a tab-delimited Afrikaans and Dutch bilingual word list. E.g.:

BilingualWordList.txt

    net-net	amper
    gedierte	beest
    wêreld	wereld
    alle	alle
    regte	rechten

and FalseFriends.txt

    'n	een
    Augustus	augustus
    Christelik	christelijk
    afdraaipad	afrit//afslag
    afdraende	helling

What I would like to do is the following: enter both files into separate arrays.
    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use open ':utf8';

    open (INPUT1, "<FalseFriends.txt");
    open (INPUT2, "<BilingualWordList.txt");

    while (<INPUT1>) {
        my $line = $_;
        chomp $line;
        my @words = $line;
    }

    while (<INPUT2>) {
        my $line2 = $_;
        chomp $line2;
        my @words2 = $line2;
    }
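One thing worth noting about the code above: `@words` and `@words2` are declared with `my` inside the loop bodies, so each one holds a single line and disappears when the loop iteration ends. A minimal sketch of one way to collect all the lines, assuming a hypothetical helper named `read_lines` and an in-memory string standing in for FalseFriends.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle into an array,
# one element per line, with the trailing newline removed.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';     # skip blank lines
        push @lines, $line;      # append instead of overwriting
    }
    return @lines;
}

# In-memory sample standing in for FalseFriends.txt (tab-delimited).
my $sample = "Augustus\taugustus\nChristelik\tchristelijk\nafdraende\thelling\n";
open(my $in, '<', \$sample) or die "Cannot open sample: $!";
my @words = read_lines($in);     # declared outside any loop, so it survives
close $in;

print scalar(@words), " lines read\n";
```

With a real file, the `open` call would take the file name instead of `\$sample`; checking its return value with `or die` makes a missing file obvious straight away.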
Now I want to loop through both arrays and see if there are any matches between the two arrays. However, the match does not need to be 100%. Eg.: In BilingualWordList.txt, there could be something like: “beddinkie bedjie”, and in FalseFriends.txt, there could be: “beddinkie bedjie//perkje”. Thus, they would be a match (or a partial match, if you’d like). Or you could get “kombers deken” in both files, and they would also be a match. I tried the loop like this, but the only results I get are a bunch of 0’s. Why is this?
    open (OUTPUT1, ">FF.txt");
    open (OUTPUT2, ">Unmatched.txt");

    for (my $falsefriend = 0; $falsefriend <= $#words; $falsefriend++) {
        for (my $bilingualword = 0; $bilingualword <= $#words; $bilingualword++) {
            if ($falsefriend eq $bilingualword) {
                print OUTPUT1 "$falsefriend\n";
            }
            else {
                print OUTPUT2 "$bilingualword\n";
            }
        }
    }
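The loop above compares the loop counters (`$falsefriend` and `$bilingualword`) rather than the array elements they index, so it prints numbers instead of words. A rough sketch of the partial match described earlier, assuming that "partial" means the Afrikaans words are equal and at least one of the `//`-separated Dutch alternatives is shared. The helper names `index_false_friends` and `is_match` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: index false-friend lines by Afrikaans word.
# Each value is a hash of the Dutch alternatives, split on "//".
sub index_false_friends {
    my (@lines) = @_;
    my %index;
    for my $line (@lines) {
        my ($afr, $dutch) = split /\t/, $line, 2;
        next unless defined $dutch;
        $index{$afr}{$_} = 1 for split m{//}, $dutch;
    }
    return \%index;
}

# Hypothetical helper: a pair matches when its Afrikaans word is listed and
# at least one of its Dutch alternatives also appears in the false-friend entry.
sub is_match {
    my ($index, $line) = @_;
    my ($afr, $dutch) = split /\t/, $line, 2;
    return 0 unless defined $dutch && exists $index->{$afr};
    for my $alt (split m{//}, $dutch) {
        return 1 if $index->{$afr}{$alt};
    }
    return 0;
}

# Sample lines taken from the examples in the question.
my @false_friends = ("beddinkie\tbedjie//perkje", "kombers\tdeken");
my @bilingual     = ("beddinkie\tbedjie", "kombers\tdeken", "regte\trechten");

my $index = index_false_friends(@false_friends);
my (@matched, @unmatched);
for my $line (@bilingual) {
    if (is_match($index, $line)) { push @matched, $line }
    else                         { push @unmatched, $line }
}

print "matched:   $_\n" for @matched;      # would go to FF.txt
print "unmatched: $_\n" for @unmatched;    # would go to Unmatched.txt
```

Using a hash lookup instead of two nested loops also means each bilingual line is written to exactly one output, rather than once per false-friend entry.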
Now I want to take Unmatched.txt and read it into a hash, so that the Afrikaans words (column 1) are the keys and the Dutch words (column 2) are the values. I then want to compare each key to its value. If there is a 100% match, both the key and the value need to be written to IdenticalCognates.txt. How would I go about doing this?
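A small sketch of that last step, assuming each line of Unmatched.txt still holds both tab-separated columns. The helper name `pairs_to_hash` and the variable names are illustrative only, and the sample lines stand in for the file contents:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Afrikaans<TAB>Dutch" lines into a hash.
sub pairs_to_hash {
    my (@lines) = @_;
    my %dutch_for;
    for my $line (@lines) {
        my ($afr, $dutch) = split /\t/, $line, 2;
        next unless defined $dutch;
        $dutch_for{$afr} = $dutch;    # a repeated key overwrites the earlier value
    }
    return %dutch_for;
}

# Sample lines standing in for the contents of Unmatched.txt.
my %dutch_for = pairs_to_hash("alle\talle", "regte\trechten", "gedierte\tbeest");

# Keep the keys whose value is spelled exactly the same.
my @identical = grep { $_ eq $dutch_for{$_} } sort keys %dutch_for;

# These lines would be written to IdenticalCognates.txt.
print "$_\t$dutch_for{$_}\n" for @identical;
```

Note that `eq` is case-sensitive, so a pair such as "Augustus" and "augustus" would not count as identical unless both sides are passed through `lc` first.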
UPDATE!!! Hello again! Thank you so much for your responses. I tried a few things during the day, and decided on using hashes instead. I wrote this piece of code, which is supposed to compare the two input files. It executes, so there are no real errors, but the output is not what I hoped for. The output is supposed to be a FalseFriends.txt file and an Unsorted.txt file. However, when the code is executed, I only get data in Unsorted.txt, and that data is exactly the first column of my BilingualWordList input file. What am I doing wrong? Could anyone help me out, please?
    #!/usr/bin/perl -w
    use strict;
    #use warnings;
    use open ':utf8';

    #open files
    open (FALSEF, "<SNonCognatesAndFF.txt");
    open (BILWL, "<BilingualWordList.1.0.0.IW.2012-06-20.txt");

    #declare hashes
    my %falsef;
    my %existingfalsefriend;

    #while the FF input exists
    while (<FALSEF>) {
        #assign each line to $line
        my $line = $_;
        #chomp off the new line
        chomp $line;
        #increment $line
        $falsef{$line}++;
    }

    #declare variables
    my $token;
    my %hash;

    #open output files
    open (OUTPUT1, ">YayOutputFalseFriends.txt");
    open (OUTPUT2, ">AhhUnsortedWordList.txt");

    #while input is received
    while (<BILWL>) {
        #assign each line to $line
        my $line = $_;
        #chomp off the new line
        chomp $line;
        #assign $line to the array
        my @wordlist = $line; #split/\t/, $line;
        #a for-loop to 'clean up' the words, to get rid of all the commas,
        #full stops, etc, except the apostrophes and hyphens
        for (my $x = 0; $x <= $#wordlist; $x++) {
            my $token = $wordlist[$x];
            if ($token =~ /('?\w+)/) {
                #$word is now clean
                my $searchword = $1;
                #checks to see whether the word exists in the false friends list
                if (exists $hash{$searchword} || exists $falsef{$searchword}) {
                    my $existingfalsefriend;
                    $existingfalsefriend{$searchword}++;
                }
                else {
                    #print to unsorted.txt
                    print OUTPUT2 "$searchword\n";
                }
            }
        }
    }

    my $searchword;
    foreach my $searchword(sort keys %existingfalsefriend) { #sorts the matched words alphabetically
        my $value = $existingfalsefriend{$searchword};
        print OUTPUT1 "$searchword\t $value\n";
    }
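Reading the code above, three details seem to explain the symptom. The `split` is commented out, so `@wordlist` holds the whole line as a single element. The regex `/('?\w+)/` then captures only the first run of word characters, which is the first column. And `%falsef` is keyed by entire lines, so a single word is never found in it (while `%hash` is never filled at all). A hedged sketch of one possible adjustment, keeping the names `%falsef`, `%existingfalsefriend` and `$searchword` from the post and using made-up sample arrays in place of the two input files:

```perl
use strict;
use warnings;

# Sample data standing in for the two input files (tab-delimited).
my @false_friend_lines = ("Augustus\taugustus", "afdraende\thelling");
my @bilingual_lines    = ("Augustus\taugustus", "alle\talle");

# Key the lookup by the Afrikaans word (column 1), not by the whole line.
my %falsef;
for my $line (@false_friend_lines) {
    my ($afr) = split /\t/, $line;
    $falsef{$afr}++ if defined $afr;
}

my (%existingfalsefriend, @unsorted);
for my $line (@bilingual_lines) {
    my ($afr) = split /\t/, $line;    # actually split the line into columns
    # Keep a leading apostrophe, word characters and hyphens.
    next unless defined $afr && $afr =~ /('?[\w-]+)/;
    my $searchword = $1;
    if (exists $falsef{$searchword}) {
        $existingfalsefriend{$searchword}++;
    }
    else {
        push @unsorted, $searchword;
    }
}

print "false friend: $_\t$existingfalsefriend{$_}\n" for sort keys %existingfalsefriend;
print "unsorted:     $_\n" for @unsorted;
```

This sketch matches on the Afrikaans column only. If the Dutch column should be taken into account as well, the lookup would need to store it too. Re-enabling `use warnings` is also likely to surface problems like these earlier.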
Replies are listed 'Best First'.

Re: Comparing arrays
by McA (Priest) on Jun 27, 2012 at 08:39 UTC

Re: Comparing arrays
by zeni (Beadle) on Jun 27, 2012 at 09:27 UTC
by zeni (Beadle) on Jun 27, 2012 at 09:34 UTC
by muba (Priest) on Jun 27, 2012 at 12:28 UTC

Re: Comparing arrays
by Athanasius (Archbishop) on Jun 28, 2012 at 07:40 UTC