enable the :crlf PerlIO layer

Thx haukex, that worked. Even so, my requests to Google exceeded their rate limit, so I had to slow things down. I added sleep time and a way to keep track of how long each file takes to translate.

    for my $file (@texts) {
        local $/ = "";    # paragraph mode: read one blank-line-separated chunk at a time
        open my $fh, '<:crlf', $file or die "Can't open $file: $!";
        my $base_name = path("$file")->basename;
        my $out_file  = path( $out_dir, $base_name )->touchpath;
        say "out_file is $out_file";
        ## time it
        use Benchmark;
        my $t0 = Benchmark->new;
        while (<$fh>) {
            print "New Paragraph: $_";
            my $r = get_trans( $wgt, $_ );
            for my $trans_rh ( @{ $r->{data}->{translations} } ) {
                my $result = $trans_rh->{translatedText};
                say "result is $result ";
                my @lines = split /\n/, $result;
                push @lines, "\n";
                path("$out_file")->append_utf8(@lines);
                sleep(1);    # pause between requests to stay under the rate limit
            }
        }
        my $t1 = Benchmark->new;
        my $td = timediff( $t1, $t0 );
        print "$file took:", timestr($td), "\n";
        sleep(3);
        close $fh;
    }

84-0.txt is Shelley's Frankenstein, which is about 450 k in length. Of the $300 credit they give anyone to sign up for their API, I used 7 cents of it, so I'm down to $297.22 left. It made for an interesting way to skim both the original and the translation. This ballparks 20 minutes as an outer limit:

    /home/bob/Documents/meditations/castaways/Translate1/data/84-0.txt took:1180 wallclock secs (23.34 usr + 1.36 sys = 24.70 CPU)
    $

Q3: What do the usr and sys numbers mean?
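To poke at Q3 myself, here is a minimal sketch. As far as I understand it, usr is CPU time spent running the process's own code, sys is CPU time the kernel spends on the process's behalf, and wallclock is elapsed real time. The sleep and the summing loop are stand-ins I made up for illustration; they are not part of my translation script.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

my $t0 = Benchmark->new;

sleep 2;    # waiting: wallclock time passes, but almost no CPU is used

my $sum = 0;
$sum += $_ for 1 .. 2_000_000;    # computing: this is charged as usr CPU time

my $t1 = Benchmark->new;
my $td = timediff( $t1, $t0 );

# Same report format as in my script: wallclock, then usr + sys = CPU
print "took: ", timestr($td), "\n";
```

If that reading is right, it would explain why my run shows 1180 wallclock seconds but under 25 CPU seconds: nearly all of the time goes to sleeping and waiting on the network.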

Module names in all lowercase are reserved (by convention) for pragmas, so I'd name your module Translate. Also, you're not checking your open for errors.

I did fix both of these but went with Translate1. The reason I did this is that I know there is going to be a Translate2 that will not work with Translate1. I've heard such naming called "trampolining," and that it is something to be avoided. Q4: Am I supposed to avoid such collisions by using version numbers or clever use of git? The features of the package change quickly, and sometimes I have to roll back to something that actually worked.

I found that I had to go back to make clean every time I made a change in the script, so I wrote a little helper bash script:

    $ cat 1.google.sh
    #!/bin/bash
    pwd
    make clean
    perl Makefile.PL
    make
    make test
    make install
    ls
    cd blib
    cd script
    ./3.my_script.pl
    $

I offer this as a keystroke reduction mechanism, not wanting to be OT.

The translations went well with the exception of certain characters. Let's look at a couple of paragraphs with differing tags. Here is output with pre tags:

New Paragraph: €œAre you mad, my friend?€ said he. €œOr whither does your
senseless curiosity lead you? Would you also create for yourself and the
world a demoniacal enemy? Peace, peace! Learn my miseries and do not seek
to increase your own.€

result is - Ты злишься, друг мой? - спросил он. Или куда ты
бессмысленное любопытство приведет тебя? Не могли бы вы также создать для себя и
мир демонический враг? Мир, мир! Узнай мои страдания и не ищи
увеличить свой собственный. 

 
New Paragraph: Frankenstein discovered that I made notes concerning his history; he asked
to see them and then himself corrected and augmented them in many places,
but principally in giving the life and spirit to the conversations he held
with his enemy. €œSince you have preserved my narration,€ said
he, €œI would not that a mutilated one should go down to
posterity.€

result is Франкенштейн обнаружил, что я делал заметки, касающиеся его истории; он спросил
чтобы увидеть их, а затем сам исправить и дополнить их во многих местах,
но главным образом в том, чтобы дать жизнь и дух разговорам, которые он вел
со своим врагом. "Так как вы сохранили мое повествование", сказал
он, Я бы не хотел, чтобы изуродованный
posterity.

Here is what the 1st paragraph looks like in code tags:

New Paragraph: &#128;&#156;Are you mad, my friend?&#128; said he. &#128;&#156;Or whither does your senseless curiosity lead you? Would you also create for yourself and the world a demoniacal enemy? Peace, peace! Learn my miseries and do not seek to increase your own.&#128;

For some reason, Shelley quotes paragraphs as a matter of course, and the quotation marks are getting garbled as I read them in under these conditions:

    #!/usr/bin/perl -w
    use 5.011;
    use WWW::Google::Translate;
    use Data::Dumper;
    use open OUT => ':utf8';
    use Path::Tiny;
    use lib ".";
    use translate;
    binmode STDOUT, 'utf8';
    use POSIX qw(strftime);

Google sometimes gives the correct rendering of quotes in Russian. They do it somewhat like this: « ».

Q5: How do I change my script so that these characters are rendered correctly? They look right as I read them in gedit.
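One thing I could try, sketched below. The bytes 128 and 156 in the code-tag output look like the tail of a UTF-8 encoded left curly quote (E2 80 9C), so my guess is that the input is UTF-8 and I am reading it as raw bytes, which then get encoded a second time on output. The sketch assumes the texts really are UTF-8; read_paragraphs and the temp file are names and data I made up for the demo, and the only real change is adding :encoding(UTF-8) to the open layers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file: curly quotes as raw UTF-8 bytes, CRLF line ending.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "\xE2\x80\x9CAre you mad, my friend?\xE2\x80\x9D said he.\r\n";
close $tmp_fh or die "close failed: $!";

# Hypothetical helper: read a file paragraph by paragraph, translating CRLF
# and decoding the bytes as UTF-8 so that each quote becomes one character.
sub read_paragraphs {
    my ($file) = @_;
    local $/ = "";    # paragraph mode
    open my $fh, '<:crlf:encoding(UTF-8)', $file
      or die "Can't open $file: $!";
    my @paras = <$fh>;
    close $fh;
    return @paras;
}

my @paras = read_paragraphs($tmp_name);

# Encode exactly once on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "New Paragraph: $_" for @paras;
```

With decoding on input and a single encoding step on output, the quotes should survive the round trip, provided the assumption about the input encoding holds.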

Finally, as I look at the arguments in Makefile.PL:

    my %WriteMakefileArgs = (
        NAME               => 'Translate1',
        AUTHOR             => q{gilligan <gilligan@island.coconut>},
        VERSION_FROM       => 'lib/Translate1.pm',
        LICENSE            => 'artistic_2',
        MIN_PERL_VERSION   => '5.006',
        CONFIGURE_REQUIRES => {
            'ExtUtils::MakeMaker' => '0',
        },
        TEST_REQUIRES => {
            'Test::More' => '0',
        },
        PREREQ_PM => {
            #'ABC'              => '1.6',
            #'Foo::Bar::Module' => '5.0401',
        },
        EXE_FILES => ['lib/3.my_script.pl'],
        dist      => { COMPRESS => 'gzip -9f', SUFFIX => 'gz', },
        clean     => { FILES    => 'Translate1-*' },
    );

Q6: How would I determine which version of WWW::Google::Translate to require?
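One approach I could take for Q6, offered as my own assumption rather than anything confirmed in this thread: require at least the version I have installed and tested against. The sketch below prints the installed version of a few modules. installed_version is a helper name I made up; it relies only on require and the VERSION method that every package inherits.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded (or declares no version).
sub installed_version {
    my ($module) = @_;
    ( my $path = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    my $version = eval { require $path; $module->VERSION };
    return $version;
}

for my $module (qw(WWW::Google::Translate Path::Tiny Test::More)) {
    my $version = installed_version($module);
    printf "%-25s %s\n", $module,
      defined $version ? $version : '(not installed)';
}
```

Whatever number this prints for WWW::Google::Translate on my machine could then go into PREREQ_PM in place of the commented-out placeholders.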

Thank you for your comments,


In reply to Re^2: chunking up texts correctly for online translation by Aldebaran
in thread chunking up texts correctly for online translation by Aldebaran
