qnguyen has asked for the wisdom of the Perl Monks concerning the following question:

I've built the hash table below, and I need to replace the Spanish words inside the ##...## markers with their English translations. Some words will not have translations, so the output will be a mix of Spanish and English words. What is the most efficient way to do this, given that my logs are very large? The code I have so far CANNOT replace the individual words between the double pound signs, only the entire string. Would it be better to use a regex or split/join for this? Any examples would be helpful.
For example, given this line:

    (5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro totes##;pp

I would like the following result:

    (5/4/07 1:45:29 PM CDT) 7819953:##1 5 4 totes##;pp
    #!/usr/bin/perl
    use warnings;
    use strict;

    ## LAN filename for translation
    my $filetwo="VoiceLink only Soriana.txt";
    ## Debug file translated
    my $fileout3="fileout3.txt";
    ## Debug file to be translated, must be saved as ANSI
    my $linein3="Log_097102053_2007-05-04_20-34-51.txt";
    ## FIL file with translations
    my $linein = "COM_211_RAD_07A_ES_MX.FIL";

    my $englishword;
    my $spanishword;
    my %langhash;
    my @arrayin;

    open (FILEIN, $linein) or die "Can't open first file.\n";

    ## Read in each line from FIL file and create a hash table
    while ($linein = <FILEIN>) {
        chomp $linein;
        @arrayin = split /\|/, $linein;    ## creates an array with split function using | as separator
        $englishword = trim( $arrayin[0] );
        $spanishword = trim( $arrayin[1] );
        print "$spanishword -> $englishword\n";    ## test print
        $langhash{$spanishword} = $englishword;
    }
    close(FILEIN);

    ## Open LAN file to read in the English and Spanish words
    open (FILEIN2, $filetwo) or die "Can't open debug file to be translated.\n";

    ## Go through each line and replace
    while (<FILEIN2>) {
        next unless ( /.:\s*\S/ );
        my ( $lang, $text ) = split /:/, $_, 2;
        if ( $lang eq "Default" ) {
            $englishword = trim( $text );    # takes care of chomping
        }
        elsif ( $lang eq "Spanish_LatinAmerican|es_MX" ) {
            $spanishword = trim( $text );
            $langhash{$spanishword} = $englishword;
        }
    }
    ## while (($spanishword, $englishword) = each %langhash){    ## Test print of entire hash table
    ##     print "$englishword => $spanishword\n";
    ## }
    close(FILEIN2);

    ## Open debug file to convert from Spanish to English
    open (FILEIN3, $linein3) or die "Can't open debug file.\n";
    open (FILEOUT3, ">$fileout3") or die "Can't open output file.\n";

    ## Go through each line and replace
    while ($linein3 = <FILEIN3>) {
        chomp $linein3;
        if ($linein3 =~ /##\s*([^#]+?)\s*##/) {
            if (exists ($langhash{$1})) {
                $linein3 =~ s/##\s*([^#]+?)\s*##/##$langhash{$1}##/;
                print FILEOUT3 "$linein3\n";
            }
            else {
                $linein3 =~ s/##\s*([^#]+?)\s*##/##$1##/;
                print FILEOUT3 "$linein3\n";
            }
        }
        else {
            print FILEOUT3 "$linein3\n";    ## print unmodified line where no double pound exists
        }
    }
    close(FILEIN3);
    close(FILEOUT3);

    sub trim {
        for (my $s = $_[0]) {
            s/^\s+//;
            s/\s+$//;
            return $_;
        }
    }
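The last loop in the code above looks up the whole captured string in %langhash, so a field like `uno cinco cuatro totes` never matches a key. One way to try the split/join route is to keep the same pattern but compute the replacement per word with the /e modifier. This is a minimal sketch, assuming %langhash has already been built as in the script (a small literal subset stands in for it here); `translate_words` and `translate_line` are hypothetical helper names. Note that `split ' '` collapses runs of whitespace and drops leading and trailing blanks, so spacing inside the field gets normalized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subset of the Spanish -> English table (%langhash in the post).
my %langhash = (
    barra => '/', cero  => 0, uno  => 1, dos   => 2, tres => 3, cuatro => 4,
    cinco => 5,   seis  => 6, siete => 7, ocho => 8, nueve => 9,
);

# Split the captured text on whitespace, translate each word that has an
# entry, keep the others unchanged, and join back with single spaces.
sub translate_words {
    my ($text, $dict) = @_;
    return join ' ',
        map { exists $dict->{$_} ? $dict->{$_} : $_ } split ' ', $text;
}

# Same pattern as the original loop, but the replacement is computed
# per word via /e; /g also handles lines with more than one ##...## field.
sub translate_line {
    my ($line, $dict) = @_;
    $line =~ s/##\s*([^#]+?)\s*##/'##' . translate_words($1, $dict) . '##'/ge;
    return $line;
}

my @sample = (
    '(5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro totes##;pp',
    '(5/4/07 1:48:33 PM CDT) 8007951:##Tome tres empaques ##;pp',
);
print translate_line($_, \%langhash), "\n" for @sample;
```

For the two sample lines this prints `...7819953:##1 5 4 totes##;pp` and `...8007951:##Tome 3 empaques##;pp`. Lines without a ##...## field pass through unchanged, so in the real script the if/else in the FILEIN3 loop could collapse to a single `print FILEOUT3 translate_line($linein3, \%langhash), "\n";`.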
Here is an example of the hash table:
    barra  -> /
    cero   -> 0
    uno    -> 1
    dos    -> 2
    tres   -> 3
    cuatro -> 4
    cinco  -> 5
    seis   -> 6
    siete  -> 7
    ocho   -> 8
    nueve  -> 9
I have the following log:

    (5/4/07 1:44:44 PM CDT) 7782719:##uno seis uno ##;pp
    (5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro##;pp
    (5/4/07 1:45:29 PM CDT) 7821389:##uno seis uno ##;pp
    (5/4/07 1:48:01 PM CDT) 7976990:##uno seis uno ##;pp
    (5/4/07 1:48:01 PM CDT) 7979657:## tres empaques##;pp
    (5/4/07 1:48:16 PM CDT) 7980901:##uno seis uno ##;pp
    (5/4/07 1:48:33 PM CDT) 8007951:##Tome tres empaques ##;pp
    (5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##
    (5/4/07 1:48:48 PM CDT) 8011336:##Selección completa. ##;pp
    (5/4/07 1:48:48 PM CDT) 8012745:## andén Anden uno tres cinco##;pp
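If the original spacing inside ##...## should be kept, a nested regex substitution avoids split/join altogether. Below is a minimal sketch, assuming the same Spanish-to-English %langhash (a small literal subset again) and a hypothetical `translate_line` helper. The outer s///ge finds each ##...## field; the inner s///ge replaces each run of non-space characters that has a translation and leaves unknown words such as `empaques` or `BEEP` alone. The outer capture is copied into a lexical first, because the inner substitution overwrites `$1`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subset of the Spanish -> English table.
my %langhash = (
    barra => '/', cero  => 0, uno  => 1, dos   => 2, tres => 3, cuatro => 4,
    cinco => 5,   seis  => 6, siete => 7, ocho => 8, nueve => 9,
);

# Rewrite every ##...## field: each word with an entry in $dict is replaced,
# every other word and all whitespace between words are left exactly as-is.
sub translate_line {
    my ($line, $dict) = @_;
    $line =~ s{##(.*?)##}{
        my $field = $1;
        $field =~ s/(\S+)/exists $dict->{$1} ? $dict->{$1} : $1/ge;
        "##$field##";
    }ge;
    return $line;
}

my @sample = (
    '(5/4/07 1:44:44 PM CDT) 7782719:##uno seis uno ##;pp',
    '(5/4/07 1:48:01 PM CDT) 7979657:## tres empaques##;pp',
    '(5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##',
);
print translate_line($_, \%langhash), "\n" for @sample;
```

This prints `##1 6 1 ##;pp`, `## 3 empaques##;pp` and the unchanged BEEP line. Since it works on one line at a time, it can be dropped into a `while (my $line = <$in>)` loop so that even very large logs are streamed rather than read into memory.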

Re: Substitution in conjunction with split or regx
by toolic (Bishop) on Oct 30, 2007 at 17:17 UTC
    If I understand your question correctly, the following code should translate certain Spanish words into numerical values:

    Produces this output:

        (5/4/07 1:44:44 PM CDT) 7782719:##1 6 1##;pp
        (5/4/07 1:45:29 PM CDT) 7819953:##1 5 4##;pp
        (5/4/07 1:45:29 PM CDT) 7821389:##1 6 1##;pp
        (5/4/07 1:48:01 PM CDT) 7976990:##1 6 1##;pp
        (5/4/07 1:48:01 PM CDT) 7979657:## 3 empaques##;pp
        (5/4/07 1:48:16 PM CDT) 7980901:##1 6 1##;pp
        (5/4/07 1:48:33 PM CDT) 8007951:##Tome 3 empaques##;pp
        (5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##
        (5/4/07 1:48:48 PM CDT) 8011336:##Selección completa.##;pp
        (5/4/07 1:48:48 PM CDT) 8012745:## andén Anden 1 3 5##;pp

    This may not be the most efficient approach for large files, but our fellow Monks will no doubt be able to help you optimize it.
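Which variant is faster depends on the Perl version and on the data, so it is worth measuring rather than guessing. A sketch using the core Benchmark module's `cmpthese`, comparing a split/join substitution with a nested-regex substitution on one sample log line; the sub names `lookup`, `by_split` and `by_regex` and the small %langhash literal are assumptions for illustration only. A negative count such as -1 tells `cmpthese` to run each piece of code for at least that many CPU seconds before printing a comparison table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical subset of the Spanish -> English table.
my %langhash = (
    barra => '/', cero  => 0, uno  => 1, dos   => 2, tres => 3, cuatro => 4,
    cinco => 5,   seis  => 6, siete => 7, ocho => 8, nueve => 9,
);

# Translate one word if it has an entry, otherwise return it unchanged.
sub lookup { exists $langhash{ $_[0] } ? $langhash{ $_[0] } : $_[0] }

# Variant 1: split the field on whitespace, map each word, join with single spaces.
sub by_split {
    my ($line) = @_;
    $line =~ s/##\s*([^#]+?)\s*##/'##' . join(' ', map { lookup($_) } split ' ', $1) . '##'/ge;
    return $line;
}

# Variant 2: nested substitution that keeps the original spacing.
sub by_regex {
    my ($line) = @_;
    $line =~ s{##(.*?)##}{
        my $field = $1;
        $field =~ s/(\S+)/lookup($1)/ge;
        "##$field##";
    }ge;
    return $line;
}

my $line = '(5/4/07 1:48:48 PM CDT) 8012745:## Anden uno tres cinco##;pp';

# Run each variant for at least 1 CPU second and print the rates side by side.
cmpthese(-1, {
    split_join   => sub { by_split($line) },
    nested_regex => sub { by_regex($line) },
});
```

Timings on a single short line are only a rough guide; running the same comparison over a slice of a real log file would give a more representative picture.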