
Nowadays DeepL is considered to produce better results than Google Translate. It even has an HTTP API, for which you can register for free (for some value of free):

Thx for your reply, kikuchiyo; I think this is gonna work out for me. They do make you put a credit card on record, but I'm willing to offer that kind of skin in this game. DeepL seems more trustworthy than Google. I'm excited to see what capabilities this service can provide.

I don't know about this "trans" command of yours, but coding a wrapper script for a HTTP API is trivial in Perl.

Well, I don't know about "trivial." Maybe for corion and bliako, but I'm a garden-variety human who fumbles the ball and needs to consult. I was proud that I remembered corion's curl converter, from which I got this:

#!perl
use strict;
use warnings;
use HTTP::Tiny;

my $ua = HTTP::Tiny->new( 'verify_SSL' => '1' );
my $res = $ua->request(
    'POST' => 'https://api-free.deepl.com/v2/translate',
    {
        headers => {
            'Authorization'  => 'DeepL-Auth-Key redacted',
            'Content-Length' => '37',
            'Accept'         => '*/*',
            'Content-Type'   => 'application/x-www-form-urlencoded',
            'User-Agent'     => 'curl/7.55.1'
        },
        content => "text=Hello\x252C\x2520world!&target_lang=DE"
    },
);

__END__
Created from curl command line
curl -X POST 'https://api-free.deepl.com/v2/translate' -H 'Authorization: DeepL-Auth-Key redacted' -d 'text=Hello%2C%20world!' -d 'target_lang=DE'
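The generated script hard-codes the percent-escapes and a Content-Length of 37, which only fits that one sentence. A minimal sketch, assuming you let HTTP::Tiny do the encoding instead: its www_form_urlencode method UTF-8 encodes and percent-escapes each field, and HTTP::Tiny sets Content-Length itself when the body is a plain string. The helper name form_body is hypothetical. Building "text=$para" by hand, as in a loop over paragraphs, leaves an & or an accented letter unescaped, which would likely garble the request.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: build an application/x-www-form-urlencoded body
# for DeepL's /v2/translate endpoint from ordinary Perl strings.
sub form_body {
    my ( $text, $target_lang ) = @_;
    # An array ref keeps the fields in the given order (a hash ref would
    # be sorted). Each value is UTF-8 encoded, then unsafe characters
    # are percent-escaped.
    return HTTP::Tiny->new->www_form_urlencode(
        [ text => $text, target_lang => $target_lang ]
    );
}

print form_body( 'Hello, world!', 'DE' ), "\n";
print form_body( "caf\x{e9}", 'ES' ), "\n";    # "café" as characters
```

In the request itself you could then drop the hand-written Content-Length header, or skip the helper entirely and call $ua->post_form($url, [ text => ..., target_lang => ... ], { headers => { Authorization => ... } }), which builds the same body and sets the Content-Type for you.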

But I run into trouble decoding the json:

fritz@laptop:~/Documents$ ./3.trans.pl
Hello neighbor on Watercress lane,
{"translations":[{"detected_source_language":"EN","text":"Hola vecino de Watercress lane,"}]}content is {"translations":[{"detected_source_language":"EN","text":"Hola vecino de Watercress lane,"}]}
data is HASH(0x55fbf7cf4140)
...
Anyways, I start getting letters saying that I have not complied with this declaration, which had the bizarre predicate that we had to come to their residence to prove that we had complied. One thing I can promise you: I will never cross their threshold, because I don't want to know them at all based on what they stuffed into my mailbox.
{"translations":[{"detected_source_language":"EN","text":"De todos modos, empiezo a recibir cartas diciendo que no he cumplido con esta declaración, que tenía el extraño predicado de que teníamos que ir a su residencia para demostrar que habíamos cumplido. Una cosa puedo prometer: Nunca cruzaré su umbral, porque no quiero conocerlos en absoluto basándome en lo que me metieron en el buzón."}]}content is {"translations":[{"detected_source_language":"EN","text":"De todos modos, empiezo a recibir cartas diciendo que no he cumplido con esta declaración, que tenía el extraño predicado de que teníamos que ir a su residencia para demostrar que habíamos cumplido. Una cosa puedo prometer: Nunca cruzaré su umbral, porque no quiero conocerlos en absoluto basándome en lo que me metieron en el buzón."}]}
data is HASH(0x55fbf8768858)
fritz@laptop:~/Documents$ ^C

Source:

#!/usr/bin/perl
use v5.030;    # strictness implied
use warnings;
use Path::Tiny;
use HTTP::Tiny;
use JSON::MaybeXS;

my $file_in  = path("/home/fritz/Desktop/1.enchanto.txt");
my $file_out = path('/home/fritz/Desktop/1.enc_trans.txt');
my $lang     = 'es';
my $guts     = $file_in->slurp_utf8;
my @spl      = split( '\n', $guts );
my $ua       = HTTP::Tiny->new( 'verify_SSL' => '1' );

for my $para (@spl) {
    say $para;
    my $payload    = "text=$para&target_lang=$lang";
    my $payloadlen = length($payload);
    my $response   = $ua->request(
        'POST' => 'https://api-free.deepl.com/v2/translate',
        {
            headers => {
                'Authorization'  => 'DeepL-Auth-Key redacted',
                'Content-Length' => $payloadlen,
                'Accept'         => '*/*',
                'Content-Type'   => 'application/x-www-form-urlencoded',
                'User-Agent'     => 'curl/7.55.1'
            },
            content => $payload,
        },
    );
    die "Failed!\n" unless $response->{success};
    print $response->{content} if length $response->{content};
    my $content = $response->{content};
    say "content is $content";
    my $data = decode_json($content);
    say "data is $data";
    $file_out->spew_utf8( $para, $data );
}
__END__

I typically use bliako's software for this, but I couldn't reconcile that with HTTP::Tiny:

use LWP::UserAgent;
use HTTP::Request;
use Data::Roundtrip;
...
my $req = HTTP::Request->new( ...
$response = $ua->request($req);
die "Error fetching: " . $response->status_line
    unless $response->is_success;
my $content = $response->decoded_content;
my $data = Data::Roundtrip::json2perl($content);
die "failed to parse received data:\n$content\n"
    unless exists $data->{'elevation'};
return $data->{'elevation'};

In particular I don't see how to do this without these modules:

my $content = $response->decoded_content;
my $data = Data::Roundtrip::json2perl($content);
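With HTTP::Tiny there is no decoded_content method: the response is a plain hash ref, and $response->{content} holds the raw body bytes. Since DeepL answers with UTF-8 JSON, decode_json (which expects UTF-8 bytes) does the job of decoded_content plus json2perl in one step. A sketch using the core JSON::PP module (JSON::MaybeXS exports a compatible decode_json); the function name parse_deepl_response and the canned response are hypothetical, so it runs without a network connection.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: turn an HTTP::Tiny response hash into Perl data.
sub parse_deepl_response {
    my ($res) = @_;
    # HTTP::Tiny reports the outcome in {success}, {status} and {reason}.
    die "Error fetching: $res->{status} $res->{reason}\n"
        unless $res->{success};
    # {content} is raw bytes; decode_json decodes UTF-8 and parses JSON.
    my $data = decode_json( $res->{content} );
    die "failed to parse received data:\n$res->{content}\n"
        unless ref $data eq 'HASH' && exists $data->{translations};
    return $data;
}

# A canned response shaped like the output above.
my $fake = {
    success => 1,
    status  => 200,
    reason  => 'OK',
    content => '{"translations":[{"detected_source_language":"EN","text":"Hola vecino de Watercress lane,"}]}',
};
my $data = parse_deepl_response($fake);
print $data->{translations}[0]{text}, "\n";
```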

Anyways, I'm elated that I already have Spanish I don't understand, and I hope someone can help me over the finish line with the JSON.

Cheers from the Rocky Mountains,

Re^3: De-googleizing translation scripts
by bliako (Abbot) on Nov 06, 2022 at 09:06 UTC
    data is HASH(0x55fbf8768858)

    You are receiving a JSON string from the remote server with your script (great!); with HTTP::Tiny it is stored in $response->{content}. Then you correctly convert that string, using decode_json(), into a Perl data structure and store it in the variable $data, in this case a reference to a HASH. You can use this data structure ($data) as usual, e.g. my $text1 = $data->{'translations'}->[0]->{'text'}. The data structure is this, for my case:

    { 'translations' => [ { 'text' => 'vencino hola', 'detected_source_language' => 'ES' } ] };
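    Putting that access path to work in the original loop, a sketch of the write-out step. In the original, spew_utf8 rewrites the whole output file on every pass (so only the last paragraph survives) and is handed the hash ref itself; this version appends each paragraph together with its translated text. The $translate callback is a hypothetical stand-in for the HTTP::Tiny request (it should return the raw JSON body), and core open with an :encoding layer replaces Path::Tiny so the sketch has no extra dependencies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Translate each non-blank line of $in_path and write
# "original\ntranslation\n\n" blocks to $out_path.
# $translate: code ref taking one paragraph and returning DeepL's raw
# JSON response body (bytes), e.g. $response->{content}.
sub translate_file {
    my ( $in_path, $out_path, $translate ) = @_;
    open my $in,  '<:encoding(UTF-8)', $in_path  or die "open $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path or die "open $out_path: $!";
    while ( my $para = <$in> ) {
        chomp $para;
        next unless $para =~ /\S/;    # skip blank lines instead of sending empty text
        my $data = decode_json( $translate->($para) );
        # hash -> 'translations' array -> first element -> 'text'
        my $text = $data->{translations}[0]{text};
        print {$out} "$para\n$text\n\n";
    }
    close $out or die "close $out_path: $!";
    return;
}
```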

    If your question is how to print this data structure ($data) and get something meaningful instead of data is HASH(0x55fbf8768858), there are lots of choices. I know of two: Data::Dumper's Dumper() and Data::Roundtrip's perl2dump()*, which you mentioned already. Pick your poison.
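    For a quick look, core Data::Dumper is enough. A sketch using a hand-built copy of the structure from the output above; the Sortkeys, Terse and Indent settings are optional and only make the dump easier to read.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $data = {
    translations => [
        { text => 'Hola vecino de Watercress lane,', detected_source_language => 'EN' },
    ],
};

# Sorted keys, no "$VAR1 = " prefix, fixed two-space indentation.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Terse    = 1;
local $Data::Dumper::Indent   = 1;

my $dump = Dumper($data);
print "data is $dump";
```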

    Of course, you can write your own "data dumper"; that would be a nice climb up Recursion Peak, and the Monastery is right behind you.
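    A first step up that peak, as a sketch: a recursive dumper that only knows hash refs, array refs and plain scalars (no objects, code refs, cycle detection, or escaping of quotes inside strings). The name my_dump is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Recurse into hashes and arrays, indenting two spaces per level.
sub my_dump {
    my ( $thing, $depth ) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;
    if ( ref $thing eq 'HASH' ) {
        return "{\n"
            . join( '', map { "$pad  $_ => " . my_dump( $thing->{$_}, $depth + 1 ) . ",\n" }
                        sort keys %$thing )
            . "$pad}";
    }
    if ( ref $thing eq 'ARRAY' ) {
        return "[\n"
            . join( '', map { "$pad  " . my_dump( $_, $depth + 1 ) . ",\n" } @$thing )
            . "$pad]";
    }
    # Base case: a plain scalar, wrapped in single quotes as-is.
    return defined $thing ? "'$thing'" : 'undef';
}

my $data = { translations => [ { text => 'Hola', detected_source_language => 'EN' } ] };
print my_dump($data), "\n";
```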

    Note that you have included an auth-key in your SCSE. You don't want that. *They* have now linked your CC, your translations and your monk handle and thus your comments. Brrrr (but hey the danger is not with "They" but with evil dictators outside Western Democracies /sic/ /sarcasm-off)
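    One common way to keep the key out of posted code is to read it from the environment at run time. A sketch; the variable name DEEPL_AUTH_KEY and the helper deepl_headers are assumptions, not something DeepL prescribes. The header format DeepL-Auth-Key <key> is the one the curl example above uses.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build request headers without hard-coding the secret.
# DEEPL_AUTH_KEY is an assumed variable name, set in your shell beforehand,
# e.g.  export DEEPL_AUTH_KEY='...'
sub deepl_headers {
    my $key = $ENV{DEEPL_AUTH_KEY};
    die "Set DEEPL_AUTH_KEY in the environment first\n"
        unless defined $key && length $key;
    return { 'Authorization' => "DeepL-Auth-Key $key" };
}
```

    It would then slot into a request as { headers => deepl_headers() }, so the posted script never contains the key.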

    bw, bliako

    Edit: *) Data::Roundtrip depends on Data::Dumper, so it would be simpler to use the latter. But the former offers data converters and an easy way to "not-bloody-escape-unicode" which the latter does incessantly, to my eyeballs' irritation.
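    Data::Dumper tends to print non-ASCII characters in decoded strings as \x{...} escapes. If the goal is only a readable view of decoded JSON, one alternative (not Data::Roundtrip's API) is to re-encode it as pretty JSON with core JSON::PP: without ->ascii it leaves non-ASCII characters alone, and ->utf8 makes the output UTF-8 bytes ready to print. The sample bytes are a shortened version of the response above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;

# Raw UTF-8 bytes, as they would arrive in $response->{content}.
my $bytes = qq({"translations":[{"detected_source_language":"EN","text":"Nunca cruzar\xc3\xa9 su umbral"}]});

my $json   = JSON::PP->new->utf8->pretty->canonical;
my $data   = $json->decode($bytes);    # Perl characters inside
my $pretty = $json->encode($data);     # indented, keys sorted, UTF-8 bytes out
print $pretty;
```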

      "an easy way to "not-bloody-escape-unicode" which the latter does incessantly, to my eyeballs' irritation."

      See Data::Dumper::AutoEncode.

      Hope this helps!


      The way forward always starts with a minimal test.
      Note that you have included an auth-key in your SCSE. You don't want that.

      Thanks to everyone who told me directly that I had left my fly undone. I was amazed that the example code worked right out of the gate... I guess it wasn't just an example; I got a little fooled, as I hadn't searched it for my key yet (now changed). It's a pretty slick operation at DeepL.

      First of all, bliako, thank you for your response, and it's good to hear from you. I had feared for your welfare with your proximity to ...Charybdis, but you sound none the worse for wear. I got some better results and then tried to extend it, make it more bliako-esque, and didn't quite get there. The writeup will be better in readmores:

      una velada agradable para el monasterio (a pleasant evening to the Monastery),

        *They* be dragons, so let's leave it at that.

        I think this calls for creating separate package(s) and a small script which takes user input from the command line, like translate.pl --infile 'xyz' --outfile 'aaa' --verbose 1. I said "maybe more packages" because what I see above is a mix: some app-specific functions, like http_tiny (I would call that fetch_from_server or something), and some more general-purpose functions, like get_secrets, which is general because it reads a config file and looks for some user-specified keys. You can therefore reuse it for other apps you create in the future.
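        To illustrate what "general-purpose" means here, a hypothetical get_secrets (the real one is in the readmore and not reproduced): it reads a plain "name = value" file and returns only the keys the caller asks for, so nothing in it is DeepL-specific. The file format is an assumption; keep such a file readable only by you (e.g. chmod 600).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical general-purpose reader for a "name = value" config file.
# Blank lines and lines starting with '#' are ignored. Returns a hash ref
# with only the requested keys; dies if any of them is missing.
sub get_secrets {
    my ( $path, @wanted ) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my %all;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*(?:#|$)/;
        my ( $k, $v ) = $line =~ /^\s*([^=\s]+)\s*=\s*(.*?)\s*$/ or next;
        $all{$k} = $v;
    }
    close $fh;
    my @missing = grep { !exists $all{$_} } @wanted;
    die "Missing in $path: @missing\n" if @missing;
    return { map { $_ => $all{$_} } @wanted };
}
```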

        More concretely, app-specific functions go to (say) Net::API::DeepL and general-purpose ones go to Aldebaran::Util. That's just a first thought; other Monks may have better suggestions. But the gist is to separate code into packages and aim at reusing code (e.g. from your Aldebaran::Util) in any other scripts you produce in the future.

        Once you have these packages, you create the simplest script to "drive" them, and here Getopt::Long will be useful: it makes parsing CLI user input (the --infile xxx above) easy-peasy.

        If you are still with me, then you need to start this properly:

        Module::Starter provides the CLI command module-starter which creates a skeleton project/app directory: module-starter --module='Net::API::DeepL' --builder='ExtUtils::MakeMaker' --author='Aldebaran' (optionally add --email=xyz@... if you want to publish this and get feedback, perhaps CPAN).

        Note: I have always used ExtUtils::MakeMaker. Disclaimer: I have never tried the alternatives, as it covers my needs just fine.

        Now you have a directory Net-API-DeepL, and in it will be your main file, lib/Net/API/DeepL.pm. It will have a skeleton POD and be ready for you to insert your subs (those related to DeepL, not the general ones). So add one sub in there (like http_tiny).

        Your immediate next step will be to create test(s) for the sub you just added. Well, the wise ones will say that your first step is creating the test and only then creating your http_tiny()!

        Create file t/10-tiny_http.t which may contain (just a suggestion):

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use utf8;    # if you must

        our $VERSION = '0.01';

        use Test::More;
        use Test2::Plugin::UTF8;    # gets rid of the "Wide character" warning in TAP output!

        use Net::API::DeepL qw/http_tiny/;    # import our new module

        my $results = http_tiny(...);
        # this is what a test looks like:
        ok(defined $results, "http_tiny() : called and got defined results");
        ok(ref($results) eq 'HASH', "http_tiny() : results is a HASHref");
        # etc etc etc

        done_testing();    # epilogue

        And you are ready to test your module:

        perl Makefile.PL
        make all
        make test

        You can create other test files for different subs; don't stuff everything into one test file. All test files will be run automatically by make test, in alphabetical order (that's why we prefix their names with numbers).

        Once your subs are tested, create the driver script. This will be your main entry point to Net::API::DeepL from the command line, e.g. translate.pl --infile ...

        Here is a skeleton which demonstrates the use of Getopt::Long to parse CLI parameters.

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use utf8;

        use Getopt::Long;
        use Net::API::DeepL qw/http_tiny/;    # import our new module

        my ($infile, $outfile, $verbose);
        $verbose = 0;
        if( ! Getopt::Long::GetOptions(
            'infile=s'  => \$infile,
            'outfile=s' => \$outfile,
            'verbose=i' => \$verbose,
            'help'      => sub {
                print STDERR "Usage : $0 --infile xx --outfile xx [--verbose N]\n";
                exit(0)
            },
        ) ){ die "error, something wrong with the command-line parameters." }
        die "parameters needed!" unless $infile and $outfile;

        my $results = http_tiny($infile, $outfile, ...);
        # at this point consider adding all your parameters into a hash and
        # pass that to http_tiny($options) instead of passing a long list which
        # may contain optional parameters.
        die unless $results;
        print "$0 : done, success.\n";

        Well, that's something to get you started: create project dir, add module functionality, create tests, create driver script. I have omitted more details on file lib/Net/API/DeepL.pm like how to export http_tiny(). Will do that when you are ready.
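        For the export question, a minimal sketch of lib/Net/API/DeepL.pm using the core Exporter module's import together with @EXPORT_OK, which is what lets the use Net::API::DeepL qw/http_tiny/; lines in the test and driver script work. The body of http_tiny is illustrative only: it translates a single string, takes an options hash as the driver comment suggests, and its ua and url options are hypothetical hooks for testing. JSON::PP is used here; JSON::MaybeXS would work the same way.

```perl
package Net::API::DeepL;

use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

our $VERSION = '0.01';

# Exporter supplies import(); callers may request anything in @EXPORT_OK,
# as in:  use Net::API::DeepL qw/http_tiny/;
use Exporter 'import';
our @EXPORT_OK = qw(http_tiny);

# Sketch: translate one string. Required options: auth_key, text, target_lang.
sub http_tiny {
    my ($opts) = @_;
    for my $req (qw(auth_key text target_lang)) {
        die "http_tiny(): missing '$req'\n" unless defined $opts->{$req};
    }
    my $ua  = $opts->{ua}  // HTTP::Tiny->new( verify_SSL => 1 );
    my $url = $opts->{url} // 'https://api-free.deepl.com/v2/translate';
    # post_form URL-encodes the fields and sets the form Content-Type.
    my $res = $ua->post_form(
        $url,
        [ text => $opts->{text}, target_lang => $opts->{target_lang} ],
        { headers => { Authorization => "DeepL-Auth-Key $opts->{auth_key}" } },
    );
    die "http_tiny(): $res->{status} $res->{reason}\n" unless $res->{success};
    return decode_json( $res->{content} );    # a HASHref, as the test expects
}

1;
```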

        Also, you may want to think about creating a database of past translations, which your module fills in as data is fetched from the server, so that you don't translate things twice. But with free-text translation this is not going to be worth it.
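        The smallest version of that idea is an in-memory cache for one run, sketched below: wrap whatever translate function you have so each (language, text) pair is sent only once. The name make_cached_translator is hypothetical; to remember translations across runs you would persist the hash, for example with the core Storable module's store and retrieve.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Wrap a translate function so repeated (text, lang) pairs hit the
# server only once per run. $translate takes ($text, $lang) and returns
# the translated text.
sub make_cached_translator {
    my ($translate) = @_;
    my %seen;    # "lang\0text" => translated text
    return sub {
        my ( $text, $lang ) = @_;
        my $key = "$lang\0$text";
        # //= calls the server only when nothing defined is cached yet.
        $seen{$key} //= $translate->( $text, $lang );
        return $seen{$key};
    };
}
```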

        Edit: Also, you may want to consider using an OO approach. This allows storing some data in your translator object (of class Net::API::DeepL), e.g. your credentials. If you need to do multiple translations, this will be ideal:

        my $config = get_secrets();
        my $trans = Net::API::DeepL->new($config);
        my @results;
        for my $totranslate (@$translations){
            push @results, $trans->http_tiny($totranslate);
        }
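        A sketch of what such a class might look like, so that the loop only passes the text: the constructor stores the key, target language and user agent once, and the http_tiny method reuses them. The config keys (auth_key, target_lang, ua, url) and the 'ES' default are assumptions for illustration, and this version returns just the translated text rather than the whole hash.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Net::API::DeepL;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Store everything a translation needs in the object once.
sub new {
    my ( $class, $config ) = @_;
    die "new(): config needs an auth_key\n"
        unless ref $config eq 'HASH' && defined $config->{auth_key};
    my $self = {
        auth_key    => $config->{auth_key},
        target_lang => $config->{target_lang} // 'ES',
        url         => $config->{url} // 'https://api-free.deepl.com/v2/translate',
        ua          => $config->{ua}  // HTTP::Tiny->new( verify_SSL => 1 ),
    };
    return bless $self, $class;
}

# Translate one string with the stored settings; returns the translated text.
sub http_tiny {
    my ( $self, $text ) = @_;
    my $res = $self->{ua}->post_form(
        $self->{url},
        [ text => $text, target_lang => $self->{target_lang} ],
        { headers => { Authorization => "DeepL-Auth-Key $self->{auth_key}" } },
    );
    die "DeepL request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json( $res->{content} )->{translations}[0]{text};
}

package main;
```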

        Others may want to give their advice or comment on anything I mentioned above, as nothing is written in stone; please do.

        bw, bliako

Re^3: De-googleizing translation scripts
by karlgoethebier (Abbot) on Nov 06, 2022 at 13:13 UTC

    See here for some more ideas for your client. See also WWW::Curl, if I didn't mention it already.

    And remember from the DeepL API:

    "… You should not put the key in publicly-distributed code… If your authentication key becomes compromised, you can recreate a new key and discard the old one in your account settings."

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»