Hello Monks,

I've been developing a primitive translation package along the lines of _Intermediate Perl_, specifically using module-starter and customizing an appropriate Makefile.PL. I may have put files in loopy places due to inexperience, but I am getting preliminary results: of my 3 test files, 2 are translated faithfully. I have output and source to show, to motivate a couple of questions.

Here are the driver script and the package:

#!/usr/bin/perl -w
use 5.011;
use WWW::Google::Translate;
use Data::Dumper;
use open OUT => ':utf8';
use Path::Tiny;
use lib ".";
use translate;
binmode STDOUT, ':utf8';
use POSIX qw(strftime);

### values to initialize (customize these to suit)
my $ini_path = '/home/bob/Documents/html_template_data/3.values.ini';
my $sub_hash = "google";
my ( $from, $to ) = ( 'en', 'ru' );    # put defaults here
my $input_directory =
  '/home/bob/Documents/meditations/castaways/translate/data';
my $output_appendage = "output";

## get values for google from an .ini file
my $key = get_config( $ini_path, $sub_hash );

say "Would you like to see the possibilities?";
my $prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq 'y' ) {    # accept y or Y
    show_lang_codes();
}

say "Would you like to change the from language?";
$prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq 'y' ) {
    $from = get_lang($from);
}

say "Would you like to change the to language?";
$prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq 'y' ) {
    $to = get_lang($to);
}

# create output directory
say "Creating output directory as nephew of input";
say "using localtime for uniqueness";
my $munge   = strftime( "%d-%m-%Y-%H-%M-%S.txt", localtime );
my $parent  = path($input_directory)->parent;
my $out_dir = path( $parent, $output_appendage, $munge );

my $wgt = WWW::Google::Translate->new(
    {
        key            => $key,
        default_source => $from,
        default_target => $to,
    }
);

my @texts = path($input_directory)->children(qr/\.txt$/);
say "texts are @texts";

for my $file (@texts) {
    local $/ = "";    # paragraph mode: read one paragraph per record
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $base_name = path($file)->basename;
    my $out_file  = path( $out_dir, $base_name )->touchpath;
    say "out_file is $out_file";
    while (<$fh>) {
        print "New Paragraph: $_";
        my $r = get_trans( $wgt, $_ );
        for my $trans_rh ( @{ $r->{data}->{translations} } ) {

            #print $trans_rh->{translatedText}, "\n";
            my $result = $trans_rh->{translatedText};
            say "result is $result ";
            my @lines = split /\n/, $result;
            push @lines, "\n";
            path($out_file)->append_utf8(@lines);
        }
    }
    close $fh;
}

package translate;
use 5.011;    # enables say
use strict;
use warnings;
use Config::Tiny;
use Data::Dumper;
use open OUT => ':encoding(UTF-8)';
use Path::Tiny;
use WWW::Google::Translate;
require Exporter;
our @ISA = qw(Exporter);

# reverse_trans is listed for export but not defined in this file yet
our @EXPORT =
  qw( get_config get_trans get_lang show_lang_codes reverse_trans );
our $VERSION = '0.01';

=head1 SYNOPSIS

    use translate;
    my $key = get_config('path-to-ini-file', $sub_hash);
    my $from = get_lang($from_default);
    my $to = get_lang($to_default);
    my $trans_output_file = get_trans($input_file, $from, $to, $key);
    my $reverse = reverse_trans($trans_output_file, $to, $from, $key);

=cut

# Read the API key for section $sub_hash from the .ini file
sub get_config {
    my ( $ini_path, $sub_hash ) = @_;
    say "ini path is $ini_path";
    say "sub_hash is $sub_hash";
    my $Config = Config::Tiny->read( $ini_path, 'utf8' )
      or die "Cannot read $ini_path: ", Config::Tiny->errstr;
    say Dumper $Config;
    my $key = $Config->{$sub_hash}{'api_key_1'};
    return $key;
}

# Ask whether to replace the given language code; return the code to use
sub get_lang {
    my $lang = shift;
    say "Would you like to change languages?";
    my $prompt1 = <STDIN>;
    chomp $prompt1;
    if ( lc($prompt1) eq 'y' ) {    # accept y or Y
        say "enter new lang: ";
        $prompt1 = <STDIN>;
        chomp $prompt1;
        $lang = $prompt1;
    }
    return $lang;
}

# Print the list of language codes shipped with the package
sub show_lang_codes {
    my $path_to_langs = path( "my_data", "lang_data", "1.langlist.txt" );
    my $data          = $path_to_langs->slurp_utf8;
    say "$data";
}

# Send one paragraph to the translator object and return its response
sub get_trans {
    my ( $wgt, $paragraph ) = @_;
    my $r = $wgt->translate( { q => $paragraph } );
    return $r;
}

1;    # End of translate

This is typical output for my first two input files:

New Paragraph:         Did the last version also pass all tests? Were the changes required? Were new tests added to cover the changes?

result is         Последняя версия также прошла все тесты? Были ли необходимы изменения? Были ли добавлены новые тесты для покрытия изменений?

It chunks up nicely and formats well in the output file too.

My final input file did not. It is Shelley's Frankenstein from Gutenberg texts online. With my input record separator as it is, it seems to slurp in the entire book at once:

, source => en, format => text unsuccessful translate POST for 450783 bytes: Request payload size exceeds the limit: 204800 bytes. check that BOB-THINKPAD-SL510 has API Access for this API key at https://console.developers.google.com/cloud-resource-manager $

I took a look at it in a hex editor: the Shelley text uses 0D 0A (CR LF) as line endings, while the first two files use 0A (LF) alone. Is this not the Unix versus Windows line-ending problem? If so, it must be a well-worn path.
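To check my reading of the symptom, here is a minimal sketch that counts the records paragraph mode produces for the same two paragraphs with LF and with CRLF endings. The helper count_paragraphs and the sample strings are made up for illustration; it reads from in-memory filehandles rather than my real data.

```perl
use strict;
use warnings;

# count_paragraphs is a hypothetical helper, not part of my package.
# It counts the records that paragraph mode ($/ = "") yields for a string.
sub count_paragraphs {
    my ($text) = @_;
    local $/ = "";    # paragraph mode: a record ends at a run of "\n\n"
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while ( defined( my $record = <$fh> ) ) {
        $count++;
    }
    close $fh;
    return $count;
}

# Same two paragraphs, once with Unix and once with Windows line endings
my $unix = "First paragraph.\n\nSecond paragraph.\n";
my $dos  = "First paragraph.\r\n\r\nSecond paragraph.\r\n";

printf "LF endings:   %d record(s)\n", count_paragraphs($unix);
printf "CRLF endings: %d record(s)\n", count_paragraphs($dos);
```

With CRLF endings the blank line is "\r\n\r\n", which never contains two consecutive "\n", so I would expect the whole text to come back as a single record. That would match the one huge POST above.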

Q1: How do I rewrite my script so that paragraph-sized chunks are sent to Google regardless of the line-ending convention?
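For concreteness, this is roughly the behavior I am after, as a sketch only: normalize the line endings first, then split on blank lines. The helper split_paragraphs is a hypothetical name, and the choice to treat whitespace-only lines as blank is my assumption. It may well not be the idiomatic way.

```perl
use strict;
use warnings;

# split_paragraphs is a hypothetical helper, not part of my package.
# It takes a whole file as one string and returns its paragraphs,
# whether the lines ended in LF, CRLF, or bare CR.
sub split_paragraphs {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # normalize CRLF and bare CR to LF

    # Split on blank lines (which may hold spaces) and drop empty chunks
    return grep { /\S/ } split /\n\s*\n/, $text;
}

# Made-up sample with Windows line endings
my $dos_text   = "One.\r\n\r\nTwo.\r\nStill two.\r\n";
my @paragraphs = split_paragraphs($dos_text);
printf "%d paragraphs\n", scalar @paragraphs;
```

In the driver I imagine feeding it path($file)->slurp_utf8 in place of the paragraph-mode read loop. A paragraph that is still larger than the 204800-byte limit in the error message would need further splitting, which this sketch does not attempt.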

my $r = get_trans( $wgt, $_ );
for my $trans_rh ( @{ $r->{data}->{translations} } ) {

    #print $trans_rh->{translatedText}, "\n";
    my $result = $trans_rh->{translatedText};
    say "result is $result ";
    my @lines = split /\n/, $result;
    push @lines, "\n";
    path("$out_file")->append_utf8(@lines);
}

Q2: Do I really need all of this to extract one paragraph of translation?
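To show what I mean by less, here is a sketch of the shorter form I have in mind. It assumes the response is a hash ref shaped like { data => { translations => [ { translatedText => ... } ] } }, which is only what my loop relies on, not something I have checked against the module's documentation. The helper translated_texts and the mock response are hypothetical.

```perl
use strict;
use warnings;

# translated_texts is a hypothetical helper, not part of my package.
# It pulls every translatedText out of a response with the assumed shape
# and returns an empty list if the translations are missing.
sub translated_texts {
    my ($r) = @_;
    return map { $_->{translatedText} } @{ $r->{data}{translations} // [] };
}

# Mock response for illustration only; this is not real API output
my $mock = {
    data => {
        translations => [ { translatedText => 'translated paragraph' } ],
    },
};

print "result is $_\n" for translated_texts($mock);
```

If that is sound, the body of my inner loop might shrink to a single append_utf8 call on the mapped list, but I would like to hear whether I am missing a reason for the longer form.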

Thanks for your comment,


In reply to chunking up texts correctly for online translation by Aldebaran
