G'day csthflk,
Firstly, here's working code (written and run on Mac OS X) that does what you want.
See the Notes at the end for details of what I did differently and why.
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use charnames ':full';

my $in_map    = 'pm_unicode_1061453_map2.txt';
my $in_words  = 'pm_unicode_1061453_greekwords1.txt';
my $out_greek = 'pm_unicode_1061453_greek_out.txt';

# Map line: key, "###", ignored middle field, "###", Unicode character name.
my $in_map_re = qr{^([^#]+)\s###[^#]+###\s([^#]+?)\s*$};

open my $in_map_fh, '<', $in_map;
my %uni_map = map { /$in_map_re/ ? ($1 => $2) : () } <$in_map_fh>;
close $in_map_fh;

open my $in_words_fh, '<', $in_words;
open my $out_greek_fh, '>:utf8', $out_greek;

while (<$in_words_fh>) {
    chomp;
    my @word_chars = split '';
    my $greek_word = '';
    my $key = '';

    while (@word_chars) {
        $key .= shift @word_chars;
        # Keep extending the key until it matches a map entry ...
        next unless exists $uni_map{$key};
        # ... and prefer the longer key if one more character also matches.
        next if @word_chars && exists $uni_map{join '' => $key, $word_chars[0]};
        $greek_word .= charnames::string_vianame($uni_map{$key});
        $key = '';
    }

    # Use length so that a leftover key of '0' is still reported.
    die "Can't find charname for '$key'" if length $key;
    print $out_greek_fh "$greek_word\n";
}

close $in_words_fh;
close $out_greek_fh;
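The inner while loop above is a longest-match scan with one character of lookahead: it emits a character only when the current key is in the map and the key plus the next input character is not. Here's a minimal, self-contained sketch of just that part. The three map entries and the transliterate() name are made up for illustration (they are not taken from the map file), and string_vianame() needs Perl 5.14 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical entries for illustration only; the real keys live in the map file.
my %demo_map = (
    'a'  => 'GREEK SMALL LETTER ALPHA',
    'a/' => 'GREEK SMALL LETTER ALPHA WITH OXIA',
    'd'  => 'GREEK SMALL LETTER DELTA',
);

# Convert one input word using a key => character-name map.
sub transliterate {
    my ($word, $map) = @_;
    my @chars = split //, $word;
    my ($out, $key) = ('', '');

    while (@chars) {
        $key .= shift @chars;
        # No entry yet: keep accumulating input characters.
        next unless exists $map->{$key};
        # A longer key also matches: take one more character first.
        next if @chars && exists $map->{ $key . $chars[0] };
        $out .= charnames::string_vianame($map->{$key});
        $key = '';
    }

    die "Can't find charname for '$key'" if length $key;
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print transliterate('da/', \%demo_map), "\n";
```

With these entries, 'a' followed by '/' becomes a single accented character rather than an alpha plus an unmapped '/'. Note the lookahead is only one character deep, so a three-character key is only found if its two-character prefix is either also a key or not a key at all.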
I downloaded the input files with wget.
They have the same line ending discrepancy that graff noted (above).
Here's the output.
There are some issues with posting Unicode text inside <code>...</code> tags, so I've used <pre>...</pre> tags here.
$ cat pm_unicode_1061453_greek_out.txt
Θεωροῦντες
δὲ
τὴν
τοῦ
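As a cross-check on how a character name turns into one of the characters shown above, here's a small sketch comparing charnames::string_vianame() with the two-step chr(charnames::vianame()) route. The name used is the one for the first letter of the output; I'm assuming Perl 5.14 or later, where string_vianame() is available.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ':full';

# Name of the first character in the output above.
my $name = 'GREEK CAPITAL LETTER THETA';

# One step: name to character string.
my $direct = charnames::string_vianame($name);

# Two steps: name to code point, then code point to character.
my $via_code_point = chr charnames::vianame($name);

binmode STDOUT, ':encoding(UTF-8)';
printf "%s is U+%04X; both routes %s\n",
    $direct, ord $direct,
    ($direct eq $via_code_point ? 'agree' : 'differ');
```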
Notes:
- Use strict and warnings in all your scripts. Turn off a limited subset of their functionality, in a limited scope, when it's unwanted and you understand what you're doing and why.
- I've used autodie to trap I/O errors. I would recommend doing this: it's much easier than the alternative, and your script does not become littered with "... or die "Some custom message: $!";" code. If you choose not to do this, you'll need to handcraft every one of those checks yourself. Just looking at your open statements: you don't check whether one of them (OUT) worked at all; the other two (MAP and IN) have "... or die "!";" ('!' should be '$!' and there's no message).
- Use lexical filehandles and the 3-argument form of open. See my code for examples and the doco for further examples and discussion.
- map is often used to create a hash. As you can see, it uses a lot less code than your while loop. It's pretty straightforward, but ask if you don't understand some part of what I did here.
- For generating the Unicode characters, I've used charnames::string_vianame(). This meant I didn't need an extra function (i.e. chr) to convert the code point to a string.
- Note how I've only needed a single print statement to populate the output file. Whenever you find yourself writing the same (or nearly identical) code more than once, consider whether there's a better algorithm; if not, use a subroutine (one place to make mistakes, fixes, enhancements, etc.).
- Depending on how far along you are with your project, and whether you have control of the map2.txt file, you might like to look at charnames: CUSTOM ALIASES, which would allow you to get rid of all that mapping code completely and just replace "use charnames ':full';" with "use charnames ':alias' => 'file';". It's a little more complicated than that and explained in the doco.
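To make the autodie note concrete, here's a rough sketch that opens a file that doesn't exist in two ways: with a hand-written check that includes $!, and with autodie enabled in a limited scope. The path is hypothetical, and the eval is only there so the script can show the autodie error instead of dying.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist.
my $missing = '/nonexistent/dir/hypothetical_input.txt';

# Hand-written check: the message and $! must be supplied every time.
my $manual_error = '';
if (!open my $fh, '<', $missing) {
    $manual_error = "Can't open '$missing' for reading: $!";
}

# autodie: a failed open throws an exception without any extra code.
my $autodie_error;
{
    use autodie;
    eval { open my $fh, '<', $missing; 1 } or $autodie_error = $@;
}

print "manual:  $manual_error\n";
print "autodie: $autodie_error\n";
```

The autodie error is an autodie::exception object, so it can be inspected (for example with its matches() method) as well as printed.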