Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I've been developing a primitive translation package along the lines of _Intermediate Perl_, specifically using module-starter and customizing an appropriate Makefile.PL. I may have put files in loopy places due to inexperience, but I am getting preliminary results. Of my 3 test files, 2 are translated faithfully. I have output and source to show, to motivate a couple of questions:
Here are the driver script and the package:
#!/usr/bin/perl -w
use 5.011;
use WWW::Google::Translate;
use Data::Dumper;
use open OUT => ':utf8';
use Path::Tiny;
use lib ".";
use translate;
binmode STDOUT, ':utf8';
use POSIX qw(strftime);

### values to initialize (customize these to suit)
my $ini_path = qw( /home/bob/Documents/html_template_data/3.values.ini );
my $sub_hash = "google";
my ( $from, $to ) = ( 'en', 'ru' );    #put defaults here
my $input_directory = qw( /home/bob/Documents/meditations/castaways/translate/data );
my $output_appendage = "output";

## get values for google from an .ini file
my $key = get_config( $ini_path, $sub_hash );

# lc(...) eq "y" accepts both "y" and "Y"; the bitwise ( "y" | "Y" ) is just "y"
say "Would you like to see the possibilities?";
my $prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq "y" ) {
    show_lang_codes();
}

say "Would you like to change the from language?";
$prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq "y" ) {
    $from = get_lang($from);
}

say "Would you like to change the to language?";
$prompt1 = <STDIN>;
chomp $prompt1;
if ( lc($prompt1) eq "y" ) {
    $to = get_lang($to);
}

# create output directory
say "Creating output directory as nephew of input";
say "using localtime for uniqueness";
my $munge = strftime( "%d-%m-%Y-%H-%M-%S\.txt", localtime );
my $parent = path($input_directory)->parent;
my $out_dir = path( $parent, $output_appendage, $munge );

my $wgt = WWW::Google::Translate->new(
    {
        key            => $key,
        default_source => $from,
        default_target => $to,
    }
);

my @texts = path("$input_directory")->children(qr/\.txt$/);
say "texts are @texts";

for my $file (@texts) {
    local $/ = "";    # paragraph mode: records end at runs of blank lines
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $base_name = path("$file")->basename;
    my $out_file = path( $out_dir, $base_name )->touchpath;
    say "out_file is $out_file";
    while (<$fh>) {
        print "New Paragraph: $_";
        my $r = get_trans( $wgt, $_ );
        for my $trans_rh ( @{ $r->{data}->{translations} } ) {
            #print $trans_rh->{translatedText}, "\n";
            my $result = $trans_rh->{translatedText};
            say "result is $result ";
            my @lines = split /\n/, $result;
            push @lines, "\n";
            path("$out_file")->append_utf8(@lines);
        }
    }
    close $fh;
}
package translate;
use 5.006;
use strict;
use warnings;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw( get_config get_trans get_lang show_lang_codes reverse_trans);
our $VERSION = '0.01';

=head1 SYNOPSIS

    use translate;
    my $key = get_config('path-to-ini-file', $sub_hash);
    my $from = get_lang($from_default);
    my $to = get_lang($to_default);
    my $trans_output_file = get_trans($input_file, $from, $to, $key);
    my $reverse = reverse_trans($trans_output_file, $to, $from, $key);

=cut

sub get_config {
    use Config::Tiny;
    use Data::Dumper;
    use open OUT => ':encoding(UTF-8)';
    use Path::Tiny;
    use 5.011;
    my ( $ini_path, $sub_hash ) = @_;
    say "ini path is $ini_path";
    say "sub_hash is $sub_hash";
    my $Config = Config::Tiny->new;
    $Config = Config::Tiny->read( $ini_path, 'utf8' );
    say Dumper $Config;
    my $key = $Config->{$sub_hash}{'api_key_1'};
    return $key;
}

sub get_lang {
    use Path::Tiny;
    use 5.011;
    my $lang = shift;
    say "Would you like to change languages?";
    my $prompt1 = <STDIN>;
    chomp $prompt1;
    # lc(...) eq "y" accepts both "y" and "Y"
    if ( lc($prompt1) eq "y" ) {
        say "enter new lang: ";
        $prompt1 = <STDIN>;
        chomp $prompt1;
        $lang = $prompt1;
    }
    return $lang;
}

sub show_lang_codes {
    use Path::Tiny;
    use 5.011;
    my $path_to_langs = path( "my_data", "lang_data", "1.langlist.txt" );
    my $data = $path_to_langs->slurp_utf8;
    say "$data";
}

sub get_trans {
    use Path::Tiny;
    use 5.011;
    use WWW::Google::Translate;
    my ( $wgt, $paragraph ) = @_;
    my $r = $wgt->translate( { q => $paragraph } );
    return $r;
}

1;    # End of translate
This is typical output for my first two input files:
New Paragraph: Did the last version also pass all tests? Were the changes required? Were new tests added to cover the changes?
result is Последняя версия также прошла все тесты? Были ли необходимы изменения? Были ли добавлены новые тесты для покрытия изменений?
It chunks up nicely and formats well in the output file too.
My final input file did not. It is Shelley's Frankenstein from Gutenberg texts online. With my input record separator as it is, it seems to slurp in the entire book at once:
, source => en, format => text
unsuccessful translate POST for 450783 bytes: Request payload size exceeds the limit: 204800 bytes.
check that BOB-THINKPAD-SL510 has API Access for this API key
at https://console.developers.google.com/cloud-resource-manager
$
I took a look at it in a hex editor, and the Shelley text has 0D 0A as line endings, while the first two have 0A. Is this not the Unix versus Windows line-ending problem? If so, it must be a well-worn path.
Q1: How do I rewrite my script so that I get paragraph-sized chunks getting sent to google regardless of line feed encoding?
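One possible direction for Q1, offered as a minimal sketch rather than a definitive fix: read the whole file as a string, normalize CRLF (0D 0A) and lone CR to LF, and then split on blank lines. The sub name split_paragraphs and the sample string are hypothetical and are not part of the package above.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs regardless of whether
# the file uses LF, CRLF, or lone-CR line endings.
sub split_paragraphs {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # normalize CRLF and lone CR to LF
    my @paragraphs;
    # A paragraph break is a newline followed by one or more blank lines.
    for my $chunk ( split /\n(?:[ \t]*\n)+/, $text ) {
        $chunk =~ s/\A\s+|\s+\z//g;    # trim leading/trailing whitespace
        push @paragraphs, $chunk if length $chunk;
    }
    return @paragraphs;
}

# Made-up sample with Windows line endings.
my $sample = "First line\r\nsecond line\r\n\r\nNext paragraph\r\n";
my @paragraphs = split_paragraphs($sample);
print scalar(@paragraphs), " paragraphs\n";
```

In the driver, the text could come from path($file)->slurp_utf8, with each returned paragraph passed to get_trans in place of the records read in paragraph mode. A single paragraph could still exceed the 204800-byte limit named in the error message, so a size check may still be worthwhile.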
my $r = get_trans( $wgt, $_ );
for my $trans_rh ( @{ $r->{data}->{translations} } ) {
    #print $trans_rh->{translatedText}, "\n";
    my $result = $trans_rh->{translatedText};
    say "result is $result ";
    my @lines = split /\n/, $result;
    push @lines, "\n";
    path("$out_file")->append_utf8(@lines);
}
Q2: Do I really need all of this to extract one paragraph of translation?
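For Q2, a hedged sketch of a shorter extraction, assuming only the response shape that the loop above already dereferences ($r->{data}{translations} holding hashes with a translatedText key). The sub name translated_texts and the stand-in hash are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every translatedText string from a response
# shaped like { data => { translations => [ { translatedText => ... } ] } }.
# An absent translations list yields an empty result instead of dying.
sub translated_texts {
    my ($r) = @_;
    return map { $_->{translatedText} } @{ $r->{data}{translations} || [] };
}

# Made-up stand-in for the hash that get_trans() returns.
my $r = { data => { translations => [ { translatedText => "first paragraph" } ] } };
my @texts = translated_texts($r);
print scalar(@texts), " translation(s)\n";
```

With such a helper, the loop body could shrink to something like $out_file->append_utf8("$_\n") for translated_texts($r); since touchpath already returns a Path::Tiny object. This keeps any newlines inside the translated text, whereas the split /\n/ version drops them, so the output files would differ slightly.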
Thanks for your comment,