A couple of minor points about this script (unrelated to the main theme of the thread):

First, @ARGV is your friend -- use it to get input and output file names from the command line. Here's one way to do it:

    my $Usage = "Usage: $0 infile outfile\n";

    # open input and output files
    die $Usage unless ( @ARGV == 2 );
    open( IN, $ARGV[0] ) or die "Unable to read $ARGV[0]: $!\n$Usage";
    open( OUT, ">$ARGV[1]" ) or die "Unable to write $ARGV[1]: $!\n$Usage";
    ...

You have problems in both of your "until (open(...))" loops, which would be avoided if you used @ARGV (because you wouldn't need those loops at all). In your first "until" loop, if there ever really is a failure to open the output file, there's no exit from that loop -- not good. As for the second one (for getting an input file name), you forgot to "chomp" the user input that you read inside the loop, which means the loop will never succeed (unless a file name happens to contain a final newline character) -- also not good.
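
For what it's worth, here is a minimal sketch of the same @ARGV check written with three-argument open and lexical filehandles. The sub name open_files is made up for illustration and is not part of the original script; treat it as one possible arrangement, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): validate the
# argument list and return opened input and output filehandles.
sub open_files {
    my @args  = @_;
    my $usage = "Usage: $0 infile outfile\n";

    die $usage unless @args == 2;
    my ( $infile, $outfile ) = @args;

    # Three-argument open keeps the mode separate from the file name.
    open( my $in,  '<', $infile )  or die "Unable to read $infile: $!\n$usage";
    open( my $out, '>', $outfile ) or die "Unable to write $outfile: $!\n$usage";

    return ( $in, $out );
}
```

It would be called as my ( $in, $out ) = open_files(@ARGV); near the top of the script, with no prompting loop needed.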

For that matter, you could do without open statements altogether -- just use while (<>) to read input (from a named file or from stdin), and just print to STDOUT. Let the users decide if and when to redirect these to or from a disk file (as opposed to, say, piping data to or from other processes):

    converter.pl < some.input > some.output
    # or
    some_process | converter.pl | another_process
    # or any combination of the above...
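
A rough sketch of what such a filter could look like follows. The names run_filter and convert_line are assumptions for illustration only, and convert_line just upper-cases its input as a placeholder for the real conversion.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-line conversion; it only
# upper-cases the line so that the sketch has something visible to do.
sub convert_line {
    my ($line) = @_;
    return uc $line;
}

# <> reads from the files named in @ARGV, or from STDIN if @ARGV is empty.
sub run_filter {
    while ( my $line = <> ) {
        print STDOUT convert_line($line);    # the shell decides where STDOUT goes
    }
    return;
}
```

Calling run_filter() as the last statement of converter.pl would make all of the command lines above work without any open statements in the script.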

As for the main "while()" loop, it can be expressed more compactly without loss of clarity:

    while (<IN>) {
        my @chars = split //;
        for (@chars) {    # $_ now holds one char per iteration
            my $out = ( exists $name{$_} ) ? $name{$_} : $_;
            print $out;
        }
    }
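
If it helps to see the lookup in isolation, the same per-character logic can be sketched as a small sub. The two-entry %name table below is made-up sample data (the real table comes from the original script), and translate_line is a hypothetical name.

```perl
use strict;
use warnings;

# Made-up sample mapping for illustration; the real %name table
# comes from the original script.
my %name = ( a => 'alpha', b => 'beta' );

# Return the line with every mapped character replaced and every
# other character passed through unchanged.
sub translate_line {
    my ($line) = @_;
    return join '', map { exists $name{$_} ? $name{$_} : $_ } split //, $line;
}
```

The main loop then shrinks to a single statement such as print translate_line($_) while <IN>; assuming IN has been opened as shown earlier.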

Finally, you may want to look at "perldoc enc2xs", which gives a nice clear explanation about how to roll your own encoding modules that can be used in combination with Encode (i.e. on a par with "iso-8859-1" or "koi8-r"), to convert back and forth between Unicode and your own particular non-Unicode character set. It's actually pretty simple, provided that your mapping involves just one character of output for each character of input (which is not true for the OP that started this thread, unfortunately).
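
To give an idea of what "on a par with" means in practice, here is a small sketch using Encode with the stock "koi8-r" encoding. The assumption is that an encoding built with enc2xs would be named in exactly the same place once its module is loaded; the sub name koi8r_to_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "koi8-r" stands in for a custom encoding name here; a module built
# with enc2xs would be loaded first and then named the same way.
sub koi8r_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode( 'koi8-r', $bytes );    # bytes -> Perl characters
    return encode( 'UTF-8', $chars );          # characters -> UTF-8 bytes
}
```

Going the other way is the mirror image: decode from UTF-8, then encode to the non-Unicode character set.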

If you're the same Anonymous Monk who posted the first reply to the script, I don't expect this will help with the problem you mentioned (only handling small files) -- maybe you need to start your own SoPW thread on that...


In reply to Re: Re: Orthography Translation using Regex by graff
in thread Orthography Translation using Regex by Baz
