Q1: How do I rewrite my script so that I get paragraph-sized chunks getting sent to google regardless of line feed encoding?
You're right that paragraph mode ($/ = "") doesn't work properly when reading a CRLF file on *NIX: the lines between paragraphs still contain a CR, so Perl doesn't see them as empty. You could enable the :crlf PerlIO layer, which leaves plain LF as is but converts CRLF to LF, so that paragraph mode works: open my $fh, '<:crlf', $file
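To make that concrete, here is a minimal, self-contained sketch of reading a CRLF file paragraph by paragraph through the :crlf layer. The sample text, the temporary file, and the variable names are made up for illustration and are not taken from your script. The part that sends each chunk off for translation is left out.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file with CRLF line endings.
# :raw makes sure nothing gets translated on the way out.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "First paragraph,\r\nline two.\r\n\r\nSecond paragraph.\r\n";
close $out or die "Cannot close $file: $!";

# Read it back. The :crlf layer converts CRLF to LF, so the
# separator line really is empty and paragraph mode can see it.
open my $fh, '<:crlf', $file or die "Cannot open $file: $!";
my @paragraphs;
{
    local $/ = "";               # paragraph mode, limited to this block
    while (my $para = <$fh>) {
        chomp $para;             # in paragraph mode this strips all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

printf "%d paragraphs\n", scalar @paragraphs;
```

Localizing $/ inside a block keeps paragraph mode from leaking into any other reads in the program, and each element of @paragraphs is then one chunk you could hand to the translation call.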
By the way, to pick two nits: Module names in all lowercase are reserved (by convention) for pragmas, so I'd name your module Translate. Also, you're not checking your open for errors.
Q2: Do I really need all of this to extract one paragraph of translation?
In general I'd say get it fully working first, and leave the simplification of the code for a little later :-)
In reply to Re: chunking up texts correctly for online translation
by haukex
in thread chunking up texts correctly for online translation
by Aldebaran