G'day csthflk,
Firstly, here's working code (written and run on Mac OS X) that does what you want.
See the Notes at the end for details of what I did differently and why.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use charnames ':full';

my $in_map    = 'pm_unicode_1061453_map2.txt';
my $in_words  = 'pm_unicode_1061453_greekwords1.txt';
my $out_greek = 'pm_unicode_1061453_greek_out.txt';

# Captures the key (first field) and the charname (last field) of a map line.
my $in_map_re = qr{^([^#]+)\s###[^#]+###\s([^#]+?)\s*$};

# Build the key => charname lookup; lines that don't match are skipped.
open my $in_map_fh, '<', $in_map;
my %uni_map = map { /$in_map_re/ ? ($1 => $2) : () } <$in_map_fh>;
close $in_map_fh;

open my $in_words_fh, '<', $in_words;
open my $out_greek_fh, '>:utf8', $out_greek;

while (<$in_words_fh>) {
    chomp;
    my @word_chars = split '';
    my $greek_word = '';
    my $key = '';

    while (@word_chars) {
        $key .= shift @word_chars;
        # Keep accumulating until the key is a known map entry ...
        next unless exists $uni_map{$key};
        # ... and adding the next character would not give another known entry.
        next if @word_chars && exists $uni_map{join '' => $key, $word_chars[0]};
        $greek_word .= charnames::string_vianame($uni_map{$key});
        $key = '';
    }

    # Use length so that a leftover key of '0' is still reported.
    die "Can't find charname for '$key'" if length $key;
    print $out_greek_fh "$greek_word\n";
}

close $in_words_fh;
close $out_greek_fh;
I downloaded the input files with wget.
They have the same line ending discrepancy that graff noted (above).
Here's the output.
There are some issues with posting Unicode inside <code>...</code> tags; I've used <pre>...</pre> tags here.
$ cat pm_unicode_1061453_greek_out.txt
Θεωροῦντες
δὲ
τὴν
τοῦ
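If you want to see the one-character lookahead from the conversion loop in isolation, here's a cut-down sketch. The three-entry %uni_map and the to_greek() wrapper are my own inventions for illustration (the real map comes from the map file); the loop body is the same idea as in the script. Note that it only peeks one character ahead, so it isn't a general longest-match tokeniser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical in-memory map; the real script reads this from the map file.
my %uni_map = (
    'a'  => 'GREEK SMALL LETTER ALPHA',
    'a/' => 'GREEK SMALL LETTER ALPHA WITH TONOS',
    'b'  => 'GREEK SMALL LETTER BETA',
);

# Hypothetical helper wrapping the inner loop so it can be called per word.
sub to_greek {
    my ($word) = @_;
    my @chars = split //, $word;
    my ($greek, $key) = ('', '');

    while (@chars) {
        $key .= shift @chars;
        # Not a known key yet: keep accumulating characters.
        next unless exists $uni_map{$key};
        # Known key, but key plus the next character is also known: wait.
        next if @chars && exists $uni_map{ $key . $chars[0] };
        $greek .= charnames::string_vianame($uni_map{$key});
        $key = '';
    }

    die "Can't find charname for '$key'" if length $key;
    return $greek;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_greek('ba/a'), "\n";
```

With this map, 'a/' is converted as one unit because 'a' followed by '/' is itself a key, while the final 'a' is converted on its own.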
Notes:
-
Use strict and warnings in all your scripts. Turn off a limited subset of their functionality, in a limited scope, when it's unwanted and you understand what you're doing and why.
-
I've used autodie to trap I/O errors.
I'd recommend doing this: it's much easier than the alternative, and your script doesn't become littered with '... or die "Some custom message: $!";' code. If you choose not to do this, you'll need to handcraft every one of those checks yourself.
Just looking at your open statements: you don't check whether one of them (OUT) worked at all; the other two (MAP and IN) have '... or die "!";' ('!' should be '$!' and there's no message).
-
Use lexical filehandles and the 3-argument form of open.
See my code for examples and the doco for further examples and discussion.
-
map is often used to create a hash.
As you can see, it uses a lot less code than your while loop.
It's pretty straightforward, but ask if you don't understand some part of what I did here.
-
For generating the Unicode characters, I've used charnames::string_vianame().
This meant I didn't need an extra function (i.e. chr) to convert the code point to a string.
-
Note how I've only needed a single print statement to populate the output file.
Whenever you find yourself writing the same, or nearly identical, code more than once, consider whether there's a better algorithm; if not, use a subroutine (one place to make mistakes, fixes, enhancements, etc.).
-
Depending on how far along you are with your project, and whether you have control of the map2.txt file, you might like to look at charnames: CUSTOM ALIASES, which would allow you to get rid of all that mapping code completely and just replace "use charnames ':full';" with "use charnames ':alias' => 'file';". It's a little more complicated than that; the details are explained in the doco.
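To make the note about map and hashes a bit more concrete, here's a small side-by-side sketch. The sample lines are made up (in particular, I'm only assuming what the middle field looks like), and a plain loop stands in for your while loop; the regex is the one from my script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $in_map_re = qr{^([^#]+)\s###[^#]+###\s([^#]+?)\s*$};

# Hypothetical sample lines; the content of the middle field is assumed.
my @lines = (
    "a ### 03B1 ### GREEK SMALL LETTER ALPHA\n",
    "# a line that does not match the pattern\n",
    "b ### 03B2 ### GREEK SMALL LETTER BETA  \n",
);

# Loop version: test each line and assign one key at a time.
my %by_loop;
for my $line (@lines) {
    next unless $line =~ $in_map_re;
    $by_loop{$1} = $2;
}

# map version: each line yields either a (key, value) pair or an empty list.
my %by_map = map { /$in_map_re/ ? ($1 => $2) : () } @lines;

print "$_ => $by_map{$_}\n" for sort keys %by_map;
```

Both hashes end up with the same two entries; returning the empty list from the map block is what skips the non-matching line.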