    $words[0] = "believe";
    $words[1] = "beleive";
    $words[2] = "beeliv";
    $words[3] = "pelief";

The first entry in the list is always the correct spelling. I would like to find the mistakes in entries 1-3 by checking each of them against the reference word. I would like to obtain output like this:

    0-1: ie ~ ei
    0-2: e ~ ee; ie ~ i; v ~
    0-3: b ~ p; ve ~ f

So far I have written very long and clumsy code, which I have changed several times. I paste it here, but it does not actually work (its output is also very different from what I would like to have):
    $words[0] = "believe";
    $words[1] = "beleive";
    $words[2] = "beeliv";
    $words[3] = "pelief";

    $reference_word = $words[0];

    for ($n = 1; $n<$#words; $n++) {
        $z = 0;
        $l_count = 0;
        $r_count = 0;
        $l_common = "";
        $r_common = "";
        @char_a = split (//, $words[0]);
        @char_b = split (//, $words[$n]);

        #finding the largest part in common between two words on the left
        for ($i=0;$i<=$#char_a;$i++) {
            #for ($j=0;$j<=$#char_b;$j++) {
            if ($char_a[$i] eq $char_b[$i]) {
                $l_count++;
                $l_common = $l_common.$char_a[$i];
            }
            else {
                last
            }
            #}
        }

        #finding the largest part in common between two words on the right
        #check parity of elements in the arrays
        if ($#char_a > $#char_b) {
            print "---PARITY BROKEN\n";
            $diff = $#char_a > $#char_b;
            for ($k=1;$k<=$diff;$k++) {
                unshift (@char_b, "#")
            }
        }
        for ($i=$#char_a;$i>=0;$i--) {
            #for ($j=$#char_b;$j>=0;$j--) {
            if ($char_a[$i] eq $char_b[$i]) {
                $r_count++;
                $r_common = $r_common.$char_a[$i];
            }
            else {
                last
            }
            #}
        }
        $r_common = reverse $r_common;
        print "$words[$n] ~ $words[$m] -> L_COMMON: >>$l_common<< -- R_COMMON: >>$r_common<< L_COUNT: $l_count - R_COUNT: $r_count\n";

        if ($l_count ne $total_char) {
            $lenght_n = length($words[$n]);
            $lenght_m = length($words[$m]);
            $diff = "";
            #print "1 -- TOTAL_CHAR: $total_char -- L_COUNT: $l_count\n";

            #CASE1: magillum ~ magilla -> l_count= 6 r_count = 0 -> um ~ a --- also ibilam ~ igilu
            if (!$r_common) {
                $xx = $total_char - $l_count;
                print "CASE1 -- TOTAL_CHAR: $total_char -- L_COUNT: $l_count -- R_COUNT IS 0 -- TOT-LEFT: $xx\n";
                $var1 = substr ($words[$n], $l_count);
                $var2 = substr ($words[$m], $l_count);
                $diff = $var1."~".$var2;
                $difference[$z] = "RIGHT_".$diff;
                print "CASE1 DIFFERENCE: $difference[$z] --- Z = $z\n";
                $z++;
                $length_var1 = length ($var1);
                $length_var2 = length ($var2);
                if ($length_var1 > 2 || $length_var2 >2) {
                    print "CASE1: LONG SEQUENCE FOUND IN VAR1 OR VAR2 --- LENGTH_VAR1 = $length_var1 LENGTH_VAR2 = $length_var2\n";
                    #chopping first and last characters from var1 and var2
                    #at this point we know that they do not match, ex. bilam ~ gilu
                    $left_var = substr ($var1, 0, 1)."~".substr ($var2, 0, 1);
                    $right_var = substr ($var1, -1)."~".substr ($var2, -1);
                    $difference[$z-1] ="LEFT_$left_var";
                    $difference[$z] ="RIGHT_$right_var";
                    $words[$n] = substr ($var1, 1, -1);
                    $words[$m] = substr ($var2, 1, -1);
                    $z++;
                    foreach $d (@difference) {
                        print "-----NEW DIFFERENCE:$d\n";
                    }
                    goto START;
                }
            }
        }

        #CASE2: zahadin ~ sumhadin -> l_count = 0 r_count = 5
        if (!$l_common) {
            $xx = $total_char - $r_count;
            print "CASE2 -- TOTAL_CHAR: $total_char -- R_COUNT: $r_count -- TOT-LEFT: $xx\n";
            $var1 = substr ($words[$n], -$lenght_n, -($r_count));
            $var2 = substr ($words[$m], -$lenght_m, -($r_count));
            $diff = $var1."~".$var2;
            $difference[$z] = $diff;
            print "CASE2 DIFFERENCE: $difference[$z] --- Z = $z\n";
            $z++;
            $length_var1 = length ($var1);
            $length_var2 = length ($var2);
            if ($length_var1 > 2 || $length_var2 >2) {
                print "CASE2: LONG SEQUENCE FOUND IN VAR1 OR VAR2 --- LENGTH_VAR1 = $length_var1 LENGTH_VAR2 = $length_var2\n";
                #chopping first and last characters from var1 and var2
                #at this point we know that they do not match, ex.
                $left_var = substr ($var1, 0, 1)."~".substr ($var2, 0, 1);
                $right_var = substr ($var1, -1)."~".substr ($var2, -1);
                $difference[$z-1] ="$left_var";
                $difference[$z] ="$right_var";
                $words[$n] = substr ($var1, 1, -1);
                $words[$m] = substr ($var2, 1, -1);
                $z++;
                foreach $d (@difference) {
                    print "-----NEW DIFFERENCE:$d\n";
                }
                goto START;
            }
        }

        #CASE3: ibila ~ igila -> l_count = 1 r_count = 3
        if (($r_common) && ($l_common)) {
            print "CASE3 -- TOTAL_CHAR: $total_char -- R_COUNT: $r_count -- TOT-LEFT: $xx\n";
            $var1 = substr ($words[$n], $l_count, ($lenght_n - $r_count - $l_count));
            $var2 = substr ($words[$m], $l_count, ($lenght_m - $r_count - $l_count));
            $diff = $var1."~".$var2;
            $difference[$z] = $diff;
            print "CASE3 DIFFERENCE: $difference[$z] --- Z = $z\n";
            $z++;
            $length_var1 = length ($var1);
            $length_var2 = length ($var2);
            if ($length_var1 > 2 || $length_var2 >2) {
                print "CASE2: LONG SEQUENCE FOUND IN VAR1 OR VAR2 --- LENGTH_VAR1 = $length_var1 LENGTH_VAR2 = $length_var2\n";
                #chopping first and last characters from var1 and var2
                #at this point we know that they do not match, ex.
                $left_var = substr ($var1, 0, 1)."~".substr ($var2, 0, 1);
                $right_var = substr ($var1, -1)."~".substr ($var2, -1);
                $difference[$z-1] ="$left_var";
                $difference[$z] ="$right_var";
                $words[$n] = substr ($var1, 1, -1);
                $words[$m] = substr ($var2, 1, -1);
                $z++;
                foreach $d (@difference) {
                    print "-----NEW DIFFERENCE:$d\n";
                }
                goto START;
            }
        }

        foreach $element (@difference) {
            print "ELEMENT-->>$element<<-\n";
        }
    }
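A possible simpler direction, offered only as a minimal sketch and not as a fix of the code above: instead of stripping a common prefix and suffix, align each misspelling with the reference word through a longest-common-subsequence (LCS) table and report every run of letters that falls outside the alignment. The helper name `spelling_diffs` and the tie-breaking rule are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): returns a list of
# [reference_part, variant_part] pairs, one per stretch where $variant
# departs from $ref, using an LCS alignment of the two words.
sub spelling_diffs {
    my ($ref, $variant) = @_;
    my @r = split //, $ref;
    my @v = split //, $variant;

    # $lcs[$i][$j] = LCS length of the first $i letters of @r
    # and the first $j letters of @v.
    my @lcs = map { [ (0) x (@v + 1) ] } 0 .. @r;
    for my $i (1 .. @r) {
        for my $j (1 .. @v) {
            if ($r[$i-1] eq $v[$j-1]) {
                $lcs[$i][$j] = $lcs[$i-1][$j-1] + 1;
            }
            else {
                my ($up, $left) = ($lcs[$i-1][$j], $lcs[$i][$j-1]);
                $lcs[$i][$j] = $up >= $left ? $up : $left;
            }
        }
    }

    # Walk back from the bottom-right corner to recover the edit script:
    # '=' letter in both words, '-' only in $ref, '+' only in $variant.
    # Ties are resolved in favour of '-' (an arbitrary assumption).
    my ($i, $j) = (scalar @r, scalar @v);
    my @ops;
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0 && $r[$i-1] eq $v[$j-1]) {
            unshift @ops, ['=', $r[$i-1]];
            $i--; $j--;
        }
        elsif ($j == 0 || ($i > 0 && $lcs[$i-1][$j] >= $lcs[$i][$j-1])) {
            unshift @ops, ['-', $r[$i-1]];
            $i--;
        }
        else {
            unshift @ops, ['+', $v[$j-1]];
            $j--;
        }
    }

    # Merge neighbouring '-' and '+' letters into one "ref ~ variant" pair.
    my @diffs;
    my ($del, $ins) = ('', '');
    for my $op (@ops, ['=', '']) {    # the sentinel flushes the final run
        my ($type, $char) = @$op;
        if    ($type eq '-') { $del .= $char }
        elsif ($type eq '+') { $ins .= $char }
        elsif (length($del) || length($ins)) {
            push @diffs, [$del, $ins];
            ($del, $ins) = ('', '');
        }
    }
    return @diffs;
}

my @words = ("believe", "beleive", "beeliv", "pelief");
for my $n (1 .. $#words) {
    my @diffs = spelling_diffs($words[0], $words[$n]);
    print "0-$n: ", join("; ", map { "$_->[0] ~ $_->[1]" } @diffs), "\n";
}
```

For the last word this should print `0-3: b ~ p; ve ~ f`. A swap such as ie/ei will not come out as a single `ie ~ ei` pair with this approach, because an LCS alignment always matches one of the two swapped letters and reports the other as an insertion plus a deletion; grouping those back into one swap would need an extra step.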
In reply to Help finding mistakes in spellings using Perl by shamat