qnguyen has asked for the wisdom of the Perl Monks concerning the following question:

I thought I had solved this problem, but I'm still having issues using Spanish characters with a hash. I don't get a result if I use $spanishword = &trim (" Espere para verificar el estado del operador. ");, but I do get a result if I use $spanishword = &trim (" Forward ");. I know it's a problem with encoding. I tried saving my .pl and .txt files as both ANSI and UTF-8, but it still does not work. Is the version of Perl I am using (ActivePerl 5.8.8 Build 819) causing the problem? If someone can get this to work, could they tell me which encoding and Perl version they are using?
#!/usr/bin/perl
use warnings;
use strict;
## LAN filename for translation
my $filetwo="test2.txt";
## Translated debug file
my $fileout="out.txt";
my $linein;
my $englishword;
my $spanishword;
my %langhash;
my @arrayin;
## Open Lan file to read in the English and Spanish words
open (FILEIN2, $filetwo) or die "Can't open debug file to be translated.\n";;
open (FILEOUT, ">$fileout") or die "Can't open output file.\n";;
## Go through each line and replace
while ($linein = <FILEIN2>) {
    chomp $linein;
    @arrayin = split /:/, $linein;  ##creates an array with split function using : as separator
    if (exists $arrayin[0] && $arrayin[0] eq "Default") {
        $englishword = $arrayin[1];
    }
    if (exists $arrayin[0] && $arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
        $spanishword = $arrayin[1];
    }
    if (exists $arrayin[0] && $arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
        $langhash{$spanishword} = $englishword;
    }
}
## while (($spanishword, $englishword) = each %langhash){  ##Print entire hash tableS
##     print "$englishword => $spanishword\n";
## }
close(FILEIN2);
close(FILEOUT);
##while (($spanishword, $englishword) = each %langhash){  ##Print entire hash tableS
##    print "$englishword => $spanishword\n";
##    print "$langhash{$spanishword}\n";
## }
$spanishword = &trim (" Espere para verificar el estado del operador. ");
if (exists $langhash{$spanishword}){
    print "$langhash{$spanishword}\n";
}
else {
    print "No match\n";
}
sub trim {
    for (my $s = $_[0]) {
        s/^\s+//;
        s/\s+$//;
        return $_;
    }
}
Here is an example of my test2.txt file:
Comment:No Translation Needed
Default:digit incorrect.
Translate: FALSE
Spanish_LatinAmerican|es_MX:dígito verificador incorrecto.
Comment: This message is spoken to an operator when the "release license" voice command is used. The message confirms to the operator that the license was released.
Default:Say ready.
Translate: FALSE
Spanish_LatinAmerican|es_MX:listo.
Comment: This message is spoken to an operator when the "release license" voice command is used. The message confirms to the operator that the license was released.
Default:Reverse
Translate: FALSE
Spanish_LatinAmerican|es_MX:Forward
Comment: The task is telling the operator that they specified an incorrect location.
Default: Incorrect location.
Spanish_LatinAmerican|es_MX: Ubicación incorrecta.
Comment: Tells the operator that the operator status is being checked.
Default: Please wait to check operator status
Spanish_LatinAmerican|es_MX: Espere para verificar el estado del operador.

Re: Still having problems with spanish characters
by almut (Canon) on Sep 28, 2007 at 17:58 UTC

    Try

    ...
    if (exists $arrayin[0] && $arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
        $spanishword = trim( $arrayin[1] );
    }

    i.e., the hash key must be trimmed the same way as the lookup string; otherwise they won't match...

    To debug such issues yourself, simply dump the entire hash using Data::Dumper:

    use Data::Dumper;
    print Dumper \%langhash;

    You would've seen that the stored key has a leading space: " Espere ..."
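A minimal sketch of the mismatch, assuming a single record from test2.txt inlined in the script instead of read from the file. Nothing in it involves encoding: the key stored as-is keeps its leading space, so the trimmed lookup string only matches once the key is trimmed too:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Same behaviour as the suggested trim(): strip leading and trailing whitespace from a copy.
sub trim {
    my ($string) = @_;
    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;
    return $string;
}

# The last Spanish record from test2.txt, inlined here instead of read from the file.
my (undef, $text) = split /:/, 'Spanish_LatinAmerican|es_MX: Espere para verificar el estado del operador.', 2;

# Key stored as-is (what the OP's loop does) vs. key trimmed before storing.
my %untrimmed = ( $text       => 'Please wait to check operator status' );
my %trimmed   = ( trim($text) => 'Please wait to check operator status' );

my $lookup = trim(' Espere para verificar el estado del operador. ');

# Dumper shows the untrimmed key with its leading space: ' Espere para verificar ...'
print Dumper \%untrimmed;

my $before = exists $untrimmed{$lookup} ? 'match' : 'No match';
my $after  = exists $trimmed{$lookup}   ? 'match' : 'No match';
print "untrimmed key: $before\ntrimmed key:   $after\n";
```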

Re: Still having problems with spanish characters
by Anonymous Monk on Sep 28, 2007 at 20:15 UTC
    Also, the trim() function does not seem correct: it is passed a parameter that is not used, has a useless for-loop, and it alters and returns the $_ variable (and I can't figure out what $_ is at that point). Is this really what you intend?

    sub trim {
        for (my $s = $_[0]) {  # passed param assigned, not used
            s/^\s+//;          # substitutions on $_
            s/\s+$//;
            return $_;
        }
    }

    instead, try something like

    sub trim {
        my ($string) = @_;
        $string =~ s{ \A \s+ }{}xms;
        $string =~ s{ \s+ \z }{}xms;
        return $string;
    }

    Also also, do not invoke a function with an ampersand, i.e., &trim($string);, unless you know what this really does. Use something like trim($string); or trim("  string to be trimmed  "); instead.
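For reference, a minimal sketch of the two things a leading & changes in Perl 5: &name(...) ignores the subroutine's prototype, and a bare &name; with no parentheses hands the caller's current @_ to the callee. The subroutine names are made up for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper with a ($) prototype: its single argument is evaluated in scalar context.
sub count_args($) { return scalar @_ }

my @list = ('a', 'b', 'c');

my $with_proto = count_args(@list);    # prototype applies: scalar(@list), so 1 argument
my $no_proto   = &count_args(@list);   # & skips the prototype: all 3 elements are passed

# Hypothetical helper without a prototype.
sub how_many { return scalar @_ }

sub pass_through {
    my $explicit = how_many();   # empty parentheses: 0 arguments
    my $implicit = &how_many;    # no parentheses: how_many sees pass_through's own @_
    return ($explicit, $implicit);
}

my ($explicit, $implicit) = pass_through('x', 'y');

print "count_args(\@list) got $with_proto, &count_args(\@list) got $no_proto\n";
print "how_many() got $explicit, &how_many got $implicit\n";
```

Neither effect matters for the OP's &trim(" ... ") call, which is why plain trim(...) is the safer habit: it behaves the way the call looks.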

      sub trim {
          for (my $s = $_[0]) {  # passed param assigned, not used
              s/^\s+//;          # substitutions on $_
              ...

      Actually, it works... it's only written in a somewhat weird fashion :)

      The passed parameter is assigned to $s, which is then aliased to $_ by the for loop... The seemingly superfluous $s is needed because otherwise, when trim() is called with a string literal as the OP does, you'd get the error "Modification of a read-only value attempted". But I agree there are more readable versions...
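A small sketch of the aliasing described above. trim_alias is the OP's trim() renamed; trim_no_copy is a hypothetical variant that drops $s. It illustrates that the copy is what lets the loop work on a literal, while without it the caller's variable is modified in place and a literal argument dies:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The OP's trim(): the copy in $s is aliased to $_ by the one-element for loop,
# so the substitutions change the copy, and return leaves on the first (only) pass.
sub trim_alias {
    for (my $s = $_[0]) {
        s/^\s+//;
        s/\s+$//;
        return $_;
    }
}

# Hypothetical variant without the copy: $_ is aliased straight to $_[0],
# i.e. to whatever the caller passed in.
sub trim_no_copy {
    for ($_[0]) {
        s/^\s+//;
        s/\s+$//;
        return $_;
    }
}

my $alias_result = trim_alias("  Forward  ");   # works on a literal

my $word = "  Forward  ";
trim_no_copy($word);                            # trims the caller's $word in place

# Passing a literal: the substitution hits a read-only value and dies.
my $literal_ok = eval { trim_no_copy("  Forward  "); 1 };
my $error      = $@;

my $status = $literal_ok ? "no error\n" : "error: $error";
print "[$alias_result] [$word] $status";
```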

        it's only written in a somewhat weird fashion

        somewhat weird...?!?!?

Re: Still having problems with spanish characters
by graff (Chancellor) on Sep 29, 2007 at 14:43 UTC
    Speaking of things being written in a "somewhat weird fashion", I find this bothersome (even as corrected by almut):
    while ($linein = <FILEIN2>) {
        chomp $linein;
        @arrayin = split /:/, $linein;
        if (exists $arrayin[0] && $arrayin[0] eq "Default") {
            $englishword = trim( $arrayin[1] );
        }
        if (exists $arrayin[0] && $arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
            $spanishword = trim( $arrayin[1] );
        }
        if (exists $arrayin[0] && $arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
            $langhash{$spanishword} = $englishword;
        }
    }
    I think this would make more sense:
    while (<FILEIN2>) {
        next unless ( /.:\s*\S/ );  # necessary/sufficient condition for what follows
        my ( $lang, $text ) = split /:/, $_, 2;
        if ( $lang eq "Default" ) {
            $englishword = trim( $text );  # takes care of chomping
        }
        elsif ( $lang eq "Spanish_LatinAmerican|es_MX" ) {
            $spanishword = trim( $text );
            $langhash{$spanishword} = $englishword;
        }
    }
    It seemed to me that the OP's code was doing too many unnecessary tests with all those "if" conditions, and they weren't entirely appropriate to the task: you only check for the existence of the first array element, and then do something with the second array element whether it exists or not.
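A self-contained sketch of the loop above, assuming the records are read from an in-memory filehandle rather than test2.txt. The second record, whose text contains colons, is made up to show why the limit of 2 on split matters: without it, everything after the second colon would be dropped from $text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub trim {
    my ($string) = @_;
    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;   # also removes the trailing newline, so no chomp is needed
    return $string;
}

# Records in the test2.txt layout; the second one is hypothetical.
my $data = <<'END';
Comment: Tells the operator that the operator status is being checked.
Default: Please wait to check operator status
Spanish_LatinAmerican|es_MX: Espere para verificar el estado del operador.
Comment: Hypothetical record whose text contains a colon.
Default: Status: ready
Translate: FALSE
Spanish_LatinAmerican|es_MX: Estado: listo
END

open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my ($englishword, $spanishword, %langhash);
while (<$fh>) {
    next unless /.:\s*\S/;                 # a label, a colon, then some text
    my ($lang, $text) = split /:/, $_, 2;  # limit 2: colons inside the text survive
    if ($lang eq 'Default') {
        $englishword = trim($text);
    }
    elsif ($lang eq 'Spanish_LatinAmerican|es_MX') {
        $spanishword = trim($text);
        $langhash{$spanishword} = $englishword;
    }
    # Comment: and Translate: lines fall through untouched
}
close $fh;

for my $key ('Espere para verificar el estado del operador.', 'Estado: listo') {
    my $english = exists $langhash{$key} ? $langhash{$key} : 'No match';
    print "$key => $english\n";
}
```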

    And I would agree with the Anonymonk who thinks that "weird" is an understatement when describing a subroutine with a for loop that does a "return" on each iteration... {grin}

      ... "weird" is an understatement when describing a subroutine with a for loop that does a "return" on each iteration...

      Maybe one should mention (in defense of the OP being wrongfully picked on) that this particular piece of code had been suggested as is by one of our Grandmasters...
      </Mother-Teresa-mode>

        Ah! Well, then... point well taken, to be sure. And now that you mention it, the first term that sprang to my mind on seeing that snippet was: clever.

        (And as we all know, one person's cleverness might sometimes go in directions where another person might prefer not to follow. ;)